
This course is in active development. Preview the scope below and create a free account to be notified the moment it goes live.
Data Science Fundamentals
The Data Science Fundamentals Certificate covers the foundational concepts of data science: the data lifecycle, statistics, machine learning, data engineering basics, visualization, and the governance considerations specific to data products. Everything is covered at conceptual depth, targeted at non-practitioners.
Who Should Take This
Auditors, business analysts, IT generalists, and managers who work alongside data scientists and need a working vocabulary. Assumes basic computing and statistical literacy. Learners finish able to discuss data science projects with practitioners and recognize common patterns and pitfalls.
Course Outline
1. Data Lifecycle (3 topics)
Data Sources and Ingestion
- Identify common data sources: transactional databases, APIs, log streams, file uploads, third-party feeds, sensors.
- Identify ingestion patterns: batch, micro-batch, streaming, change data capture (CDC).
Data Cleaning and Preparation
- Identify common data-quality problems: missing values, outliers, inconsistent formats, duplicate records, schema drift.
- Apply data-cleaning techniques: imputation, outlier handling, deduplication, schema validation, normalization.
- Analyze a data-quality scenario where 30% of a key column is missing and identify whether to drop the column, impute, or restructure the analysis.
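The missing-data scenario above can be sketched as a simple triage rule. This is an illustrative sketch only; the threshold values and function names are assumptions, and real cutoffs depend on the analysis at hand.

```python
# Hypothetical triage rule for a partially missing column.
# The thresholds are illustrative, not prescriptive.
def missing_fraction(column):
    """Fraction of None entries in a column."""
    return sum(v is None for v in column) / len(column)

def triage(column, drop_above=0.5, impute_above=0.05):
    """Pick a coarse strategy from the missingness rate alone."""
    frac = missing_fraction(column)
    if frac > drop_above:
        return "drop column"   # too sparse to trust imputation
    if frac > impute_above:
        return "impute"        # e.g. median fill, and flag imputed rows
    return "use as-is"

ages = [34, None, 51, None, 29, 41, None, 38, 45, 60]  # 30% missing
print(triage(ages))  # "impute" under these example thresholds
```

In practice the missingness *mechanism* (random vs. systematic) matters as much as the rate, which is why the scenario also allows restructuring the analysis.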
Exploration and Feature Engineering
- Identify exploratory data analysis (EDA) and its common steps: distribution plots, correlation review, missingness inspection, target-distribution check.
- Identify feature engineering as the creation of model-suitable inputs from raw data, and identify common operations: encoding, scaling, aggregation, binning.
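Two of the feature-engineering operations above, encoding and scaling, can be sketched in a few lines of pure Python (library pipelines such as scikit-learn do the same with fitted transformers; these helper names are illustrative):

```python
# Minimal sketches of one-hot encoding and min-max scaling.
def one_hot(values):
    """Encode a categorical column as 0/1 indicator vectors."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]

def min_max_scale(values):
    """Rescale a numeric column to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(one_hot(["red", "blue", "red"]))  # [[0, 1], [1, 0], [0, 1]]
print(min_max_scale([10, 20, 40]))      # [0.0, 0.333..., 1.0]
```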
2. Statistics (3 topics)
Descriptive Statistics
- Identify common descriptive statistics: mean, median, mode, standard deviation, variance, percentile.
- Apply descriptive-statistics interpretation: identify when median is more informative than mean (skewed distributions, outliers).
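The median-vs-mean point above is easy to see with one outlier in otherwise similar data (the salary figures are illustrative):

```python
# One extreme value drags the mean far from the typical value,
# while the median stays put.
import statistics

salaries = [42_000, 45_000, 47_000, 50_000, 52_000, 1_000_000]
print(statistics.mean(salaries))    # 206000   -- dominated by the outlier
print(statistics.median(salaries))  # 48500.0  -- closer to a "typical" salary
```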
Distributions and Sampling
- Identify common distributions: normal, binomial, Poisson, exponential, log-normal — and identify a real-world example of each.
- Identify sampling concepts: random sampling, stratified sampling, sampling bias, sample size.
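The stratified-sampling concept above can be sketched as sampling within each group, so small groups keep their population share. A minimal sketch, with an illustrative dataset and a fixed seed for repeatability:

```python
# Sample a fixed fraction within each stratum rather than from the
# whole population, so minority groups are not lost by chance.
import random
from collections import defaultdict

def stratified_sample(rows, key, frac, seed=0):
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * frac))  # keep at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

rows = [{"region": "north"}] * 90 + [{"region": "south"}] * 10
picked = stratified_sample(rows, key=lambda r: r["region"], frac=0.1)
print(len(picked))  # 10: nine north rows plus one south row
```

A plain random sample of 10 rows could easily contain zero "south" rows; the stratified version cannot.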
Inference and Causality
- Identify hypothesis testing concepts: null hypothesis, alternative hypothesis, p-value, significance level, statistical power.
- Distinguish correlation and causation and identify common spurious-correlation pitfalls.
- Analyze a 'correlation reported as causation' headline and identify the missing controlled experiment or causal inference required.
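One concrete way to see where a p-value comes from is a permutation test: shuffle the group labels and ask how often a difference this large appears by chance. This is a sketch with made-up data, not the only (or always the best) test for a given scenario:

```python
# Permutation test: the p-value is the fraction of random label
# shuffles that produce a group difference at least as large as
# the one actually observed.
import random
import statistics

def permutation_p_value(a, b, n_perm=5000, seed=0):
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(perm_a) - statistics.mean(perm_b)) >= observed:
            hits += 1
    return hits / n_perm

control = [12, 14, 11, 13, 12, 15, 13, 12]
treated = [16, 18, 17, 15, 19, 16, 18, 17]
print(permutation_p_value(control, treated))  # small: unlikely to be chance
```

Note that a small p-value says the difference is unlikely under random assignment of labels; it does not by itself establish *why* the groups differ, which is exactly the correlation-vs-causation point above.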
3. Machine Learning (3 topics)
Learning Paradigms
- Distinguish supervised (labeled), unsupervised (unlabeled), and reinforcement learning and identify a representative use case for each.
- Identify the typical supervised tasks (classification, regression) and unsupervised tasks (clustering, dimensionality reduction).
Common Algorithms
- Identify linear regression, logistic regression, decision trees, random forests, gradient boosting, k-means, and neural networks at conceptual depth.
- Apply algorithm-selection guidance: linear models for small interpretable problems, gradient boosting for tabular data, neural networks for unstructured data (text, images).
Deep Learning and Foundation Models
- Identify deep learning as multi-layer neural networks and identify common architectures (CNN for images, RNN/Transformer for sequences).
- Identify foundation models and LLMs as pretrained models that can be adapted to many downstream tasks.
4. Model Evaluation (3 topics)
Splitting and Cross-Validation
- Identify train/test split and k-fold cross-validation, and identify the role of a held-out test set in unbiased evaluation.
- Identify common pitfalls: target leakage, train-test contamination, ignoring time order in time-series data.
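The k-fold idea above can be sketched as index bookkeeping: each row serves as held-out test data exactly once. A minimal sketch (round-robin fold assignment is one of several valid schemes; for time-series data you would split chronologically instead, per the pitfall above):

```python
# Generate train/test index pairs for k-fold cross-validation.
def k_fold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train_idx), test_idx

for train, test in k_fold_indices(n=6, k=3):
    print(train, test)
# [1, 2, 4, 5] [0, 3]
# [0, 2, 3, 5] [1, 4]
# [0, 1, 3, 4] [2, 5]
```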
Classification Metrics
- Identify accuracy, precision, recall, F1, ROC-AUC, and the confusion matrix.
- Apply metric-selection guidance: precision when false positives are expensive, recall when false negatives are expensive, F1 when both matter.
- Analyze a classifier with 99% accuracy on imbalanced data (1% positive class) and explain why accuracy is misleading.
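The imbalanced-accuracy pitfall above is easy to make concrete: a degenerate model that always predicts "negative" scores 99% accuracy on a 1%-positive dataset yet finds zero positives. A minimal sketch:

```python
# Confusion-matrix counts and the accuracy-vs-recall gap on
# imbalanced data (toy labels, degenerate "always negative" model).
def confusion(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1] * 1 + [0] * 99   # 1% positive class
y_pred = [0] * 100            # model that never predicts positive

tp, fp, fn, tn = confusion(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if tp + fn else 0.0
print(accuracy)  # 0.99 -- looks great
print(recall)    # 0.0  -- catches no positives at all
```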
Regression and Other Metrics
- Identify common regression metrics: MAE, RMSE, MAPE, R² — and identify when each is appropriate.
- Identify drift and degradation as the post-deployment failure modes that require ongoing monitoring.
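The regression metrics listed above can be computed by hand on toy data; libraries report the same quantities (the numbers here are illustrative):

```python
# MAE, RMSE, MAPE, and R-squared from first principles.
import math

y_true = [100, 200, 300, 400]
y_pred = [110, 190, 310, 380]

errors = [p - t for t, p in zip(y_true, y_pred)]
mae = sum(abs(e) for e in errors) / len(errors)             # average magnitude
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # punishes big misses
mape = sum(abs(e) / t for t, e in zip(y_true, errors)) / len(errors)
mean_t = sum(y_true) / len(y_true)
ss_res = sum(e * e for e in errors)
ss_tot = sum((t - mean_t) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot                                    # variance explained

print(mae)            # 12.5
print(round(r2, 3))   # 0.986
```

Note the relationship visible even here: RMSE exceeds MAE whenever errors vary in size, and MAPE is undefined when a true value is zero, which is one reason each metric has its place.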
5. Data Engineering and Visualization (3 topics)
Pipelines and Storage
- Identify ETL/ELT pipelines and identify common tools: Airflow, dbt, Spark, Fivetran, AWS Glue.
- Distinguish data warehouses (Snowflake, BigQuery, Redshift), data lakes (S3, GCS, ADLS), and lakehouses (Databricks, Iceberg, Delta).
Streaming and Real-Time
- Identify streaming data systems: Kafka, Kinesis, Pub/Sub — and identify their typical use cases (CDC, telemetry, event-driven pipelines).
- Apply batch-vs-streaming selection guidance: latency requirements, ordering needs, replayability.
Visualization
- Identify common chart types and their appropriate use: bar (categorical comparison), line (trends over time), scatter (correlation), heatmap (matrix relationships), box (distribution comparison).
- Analyze a misleading chart (truncated y-axis, dual axes, deceptive coloring) and propose a corrected version.
6. Governance and Ethics (3 topics)
Data Governance
- Identify data governance topics: data ownership, stewardship, lineage, data dictionary, master-data management.
- Identify the value of data lineage in audit and root-cause investigation.
Privacy and Sensitive Data
- Identify privacy-by-design concepts: minimization, anonymization, pseudonymization, k-anonymity at conceptual depth.
- Apply privacy-controls selection for a customer-analytics workload subject to GDPR and CCPA, balancing utility with minimization.
Model Ethics and Bias
- Identify common sources of model bias: training-data imbalance, label bias, proxy variables, deployment-context drift.
- Identify model-governance practices: model cards, evaluation across demographic slices, monitoring for drift, human-in-the-loop for high-stakes decisions.
- Analyze a hiring-screening model that produced disparate outcomes across demographic groups and identify the diagnostic and remediation steps.
7. Data Science in Practice (8 topics)
Working with Stakeholders
- Identify common stakeholder personas: business sponsor, subject-matter expert, data engineer, ML engineer, end-user, regulator.
- Apply requirements-elicitation for a data-science project: convert vague business asks into measurable success criteria, identify the acceptable range of model errors.
- Analyze a 'we just need a model that predicts churn' request and identify the missing scope (data access, labels, evaluation, deployment, monitoring).
Project Lifecycle and Reproducibility
- Identify CRISP-DM and TDSP as standard data-science process frameworks and identify their typical phases.
- Identify reproducibility concerns: random seeds, environment pinning, dataset versioning (DVC, LakeFS, Delta), experiment tracking (MLflow, Weights & Biases).
- Apply experiment-tracking discipline so that 6 months later a stakeholder can answer 'how was this number produced?' without rerunning the world.
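The experiment-tracking discipline above boils down to recording, for every run, the parameters, the metrics, and a fingerprint of the data. Tools like MLflow record the same fields; the function and field names here are illustrative, not any tool's API:

```python
# Append one record per experiment run so "how was this number
# produced?" is answerable later without rerunning anything.
import json
import hashlib

def log_run(params, metrics, dataset_fingerprint, path="run_log.jsonl"):
    record = {
        "params": params,             # hyperparameters and seed used
        "metrics": metrics,           # evaluation results
        "data": dataset_fingerprint,  # which data produced the number
    }
    with open(path, "a") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record

# Hash the dataset contents so the record pins an exact version.
data_hash = hashlib.sha256(b"customers_2024q1.csv contents").hexdigest()[:12]
run = log_run({"model": "gbm", "seed": 7}, {"auc": 0.81}, data_hash)
print(run["data"])  # the fingerprint ties the metric to its data
```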
Productionizing Models
- Identify deployment patterns: batch scoring, online inference, edge inference, embedded.
- Identify MLOps concepts: CI/CD for models, model registry, A/B testing, canary deploy, shadow mode.
- Apply 'shadow mode' deployment guidance for a high-stakes model: route requests to both old and new model, log differences, compare before cutover.
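The shadow-mode pattern described above can be sketched in a few lines: serve the old model's answer, run the candidate on the same request, and log disagreements for offline review. The models here are toy stand-ins, not real scoring functions:

```python
# Shadow-mode serving sketch: users get the production model's
# answer; the candidate's answer is only logged and compared.
def old_model(x):
    return x > 0.5   # current production rule (illustrative)

def new_model(x):
    return x > 0.4   # candidate rule under evaluation (illustrative)

def serve(x, shadow_log):
    served = old_model(x)    # users still receive the old answer
    shadowed = new_model(x)  # candidate runs on the same request
    if served != shadowed:
        shadow_log.append((x, served, shadowed))
    return served

log = []
for request in [0.3, 0.45, 0.6, 0.42]:
    serve(request, log)
print(log)  # [(0.45, False, True), (0.42, False, True)]
```

Reviewing the disagreement log before cutover is the whole point: the candidate is evaluated on live traffic without ever affecting a user.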
Generative AI in Data Science
- Identify how foundation models change data-science practice: code generation, EDA assistants, synthetic data, data labeling, summarization of analytic results.
- Apply generative AI safely in a data-science workflow: never paste sensitive data into public LLMs, use enterprise-licensed providers, validate generated outputs.
Data Science Career
- Identify common data-science career paths: data analyst, data scientist, ML engineer, data engineer, applied scientist, research scientist.
- Identify the spectrum of practitioner roles: business-analyst-leaning data scientist (SQL, viz, statistics) vs ML-engineer-leaning (production systems, MLOps).
Continuous Learning
- Identify continuous-learning sources: arXiv (cs.LG, cs.CL, stat.ML), NeurIPS / ICML / ACL papers, fast.ai courses, Andrej Karpathy's series, Distill.pub archive.
- Apply hands-on practice: Kaggle competitions, reproducing published results, contributing to open-source ML libraries.
- Analyze the rapid-AI-evolution information overload and propose a focus strategy: pick depth areas, time-box exploration, accept that some advances you will not master.
Communicating Data Science
- Identify common data-science communication failures: jargon overload, no business framing, 'all the metrics' instead of the relevant ones, ignoring uncertainty.
- Apply audience-aware reporting: executive summary with one chart, technical appendix with full evaluation, code repo for reviewers.
Ethics in Practice
- Identify common practical ethics scenarios: sensitive features in models, opt-out for users, transparency about AI involvement, dual-use concerns.
- Apply structured ethical-review for a proposed model: who is affected, what failure modes exist, what redress is available, what oversight will be in place.
Scope
Included Topics
- Data lifecycle: collection, ingestion, cleaning, exploration, modeling, deployment, monitoring.
- Statistical concepts: descriptive vs inferential, distributions, hypothesis testing, correlation vs causation.
- Machine learning concepts: supervised, unsupervised, reinforcement learning at conceptual depth.
- Common algorithms: linear/logistic regression, decision trees, random forests, gradient boosting, neural networks.
- Model evaluation: train/test split, cross-validation, accuracy/precision/recall/F1, ROC-AUC, confusion matrix.
- Data engineering basics: pipelines, data warehouses, data lakes, lakehouses, streaming.
- Visualization concepts and common chart types.
- Data governance: quality, lineage, privacy, ethics, model governance.
Not Covered
- Hands-on Python/R/SQL coding.
- Deep mathematical foundations.
Data Science Fundamentals is coming soon
Adaptive learning that maps your knowledge and closes your gaps.