Data Science Fundamentals
Coming Soon
Expected availability will be announced soon

This course is in active development. Preview the scope below and create a free account to be notified the moment it goes live.

ISACA Certificates › Associate › Coming Soon

Data Science Fundamentals

The Data Science Fundamentals Certificate covers the foundational concepts of data science: the data lifecycle, statistics, machine learning, data engineering basics, visualization, and the governance considerations specific to data products. It targets conceptual depth for non-practitioners rather than hands-on practice.

Who Should Take This

Auditors, business analysts, IT generalists, and managers who work alongside data scientists and need a working vocabulary. Assumes basic computing and statistical literacy. Learners finish able to discuss data science projects with practitioners and recognize common patterns and pitfalls.

What's Included in AccelaStudy® AI

Adaptive Knowledge Graph
Practice Questions
Lesson Modules
Console Simulator Labs
Exam Tips & Strategy
13 Activity Formats

Course Outline

1. Data Lifecycle
3 topics

Data Sources and Ingestion

  • Identify common data sources: transactional databases, APIs, log streams, file uploads, third-party feeds, sensors.
  • Identify ingestion patterns: batch, micro-batch, streaming, change data capture (CDC).

Data Cleaning and Preparation

  • Identify common data-quality problems: missing values, outliers, inconsistent formats, duplicate records, schema drift.
  • Apply data-cleaning techniques: imputation, outlier handling, deduplication, schema validation, normalization.
  • Analyze a data-quality scenario where 30% of a key column is missing and decide whether to drop the column, impute, or restructure the analysis.
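Although the course itself stays conceptual, the drop-vs-impute trade-off in the scenario above can be sketched in a few lines of Python. The `ages` column and the 50% threshold are illustrative assumptions, not course material:

```python
from statistics import median

# Hypothetical column with 30% missing values (None), as in the scenario above.
ages = [34, None, 51, 28, None, 45, 39, None, 62, 30]

missing_rate = sum(v is None for v in ages) / len(ages)

if missing_rate > 0.5:
    # Too sparse to trust: drop the column or restructure the analysis.
    cleaned = None
else:
    # Moderate missingness: impute with the median of observed values,
    # which is robust to outliers (unlike the mean).
    fill = median(v for v in ages if v is not None)
    cleaned = [fill if v is None else v for v in ages]
```

In practice the threshold depends on how informative the column is and whether the missingness itself carries signal.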

Exploration and Feature Engineering

  • Define exploratory data analysis (EDA) and identify its common steps: distribution plots, correlation review, missingness inspection, target-distribution check.
  • Define feature engineering as the creation of model-suitable inputs from raw data and identify common operations: encoding, scaling, aggregation, binning.
2. Statistics
3 topics

Descriptive Statistics

  • Identify common descriptive statistics: mean, median, mode, standard deviation, variance, percentile.
  • Apply descriptive-statistics interpretation: identify when median is more informative than mean (skewed distributions, outliers).
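A quick illustration of why the median is more informative under skew; the salary figures are hypothetical:

```python
from statistics import mean, median

# Hypothetical salaries: one executive outlier skews the distribution.
salaries = [48_000, 52_000, 50_000, 55_000, 47_000, 1_000_000]

avg = mean(salaries)    # pulled far above what a typical employee earns
mid = median(salaries)  # still reflects the typical employee
```

One extreme value drags the mean past 200,000 while the median stays near the center of the other five salaries.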

Distributions and Sampling

  • Identify common distributions (normal, binomial, Poisson, exponential, log-normal) and give a real-world example of each.
  • Identify sampling concepts: random sampling, stratified sampling, sampling bias, sample size.

Inference and Causality

  • Identify hypothesis testing concepts: null hypothesis, alternative hypothesis, p-value, significance level, statistical power.
  • Distinguish correlation and causation and identify common spurious-correlation pitfalls.
  • Analyze a 'correlation reported as causation' headline and identify the controlled experiment or causal-inference step that is missing.
3. Machine Learning
3 topics

Learning Paradigms

  • Distinguish supervised (labeled), unsupervised (unlabeled), and reinforcement learning and identify a representative use case for each.
  • Identify the typical supervised tasks (classification, regression) and unsupervised tasks (clustering, dimensionality reduction).

Common Algorithms

  • Identify linear regression, logistic regression, decision trees, random forests, gradient boosting, k-means, and neural networks at conceptual depth.
  • Apply algorithm-selection guidance: linear models for small interpretable problems, gradient boosting for tabular data, neural networks for unstructured data (text, images).

Deep Learning and Foundation Models

  • Define deep learning as multi-layer neural networks and identify common architectures (CNNs for images, RNNs and Transformers for sequences).
  • Identify foundation models and LLMs as pretrained models that can be adapted to many downstream tasks.
4. Model Evaluation
3 topics

Splitting and Cross-Validation

  • Identify train/test split and k-fold cross-validation, and explain the role of a held-out test set in unbiased evaluation.
  • Identify common pitfalls: target leakage, train-test contamination, ignoring time order in time-series data.
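A minimal sketch of k-fold index splitting, assuming a simple round-robin assignment. Real libraries such as scikit-learn offer richer variants, including time-series-aware splitters that respect time order:

```python
def k_fold_indices(n, k):
    """Partition indices 0..n-1 into k disjoint folds; each fold serves
    once as the validation set while the remaining folds train."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, val

for train, val in k_fold_indices(10, 5):
    # No index appears in both sets: this is exactly the train-test
    # contamination the bullet above warns about.
    assert not set(train) & set(val)
```

For time-series data a round-robin split like this would leak the future into training; the folds must follow time order instead.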

Classification Metrics

  • Identify accuracy, precision, recall, F1, ROC-AUC, and the confusion matrix.
  • Apply metric-selection guidance: precision when false positives are expensive, recall when false negatives are expensive, F1 when both matter.
  • Analyze a classifier with 99% accuracy on imbalanced data (1% positive class) and explain why accuracy is misleading.
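The 99%-accuracy scenario in the last bullet can be reproduced with a toy test set (the labels below are fabricated for illustration):

```python
# Hypothetical imbalanced test set: 1% positive class, as in the scenario.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # a useless model that always predicts negative

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

# Accuracy is 0.99, yet recall is 0.0: the model never finds a positive.
```

This is why recall (or precision-recall curves) matter on imbalanced problems: the always-negative model looks excellent on accuracy alone.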

Regression and Other Metrics

  • Identify common regression metrics (MAE, RMSE, MAPE, R²) and when each is appropriate.
  • Identify drift and degradation as the post-deployment failure modes that require ongoing monitoring.
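A small illustration of how RMSE punishes large errors harder than MAE; the values are made up:

```python
from math import sqrt

# Hypothetical predictions with one large miss on the last point.
y_true = [10, 12, 11, 13, 50]
y_pred = [11, 12, 10, 13, 20]

errors = [t - p for t, p in zip(y_true, y_pred)]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = sqrt(sum(e * e for e in errors) / len(errors))

# RMSE is roughly double MAE here because squaring amplifies
# the single 30-unit miss far more than the small errors.
```

When large errors are disproportionately costly, RMSE is the better headline metric; when all errors cost about the same, MAE is easier to interpret.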
5. Data Engineering and Visualization
3 topics

Pipelines and Storage

  • Identify ETL/ELT pipelines and common tools: Airflow, dbt, Spark, Fivetran, AWS Glue.
  • Distinguish data warehouses (Snowflake, BigQuery, Redshift), data lakes (S3, GCS, ADLS), and lakehouses (Databricks, Iceberg, Delta).

Streaming and Real-Time

  • Identify streaming data systems (Kafka, Kinesis, Pub/Sub) and their typical use cases: CDC, telemetry, event-driven pipelines.
  • Apply batch-vs-streaming selection guidance: latency requirements, ordering needs, replayability.

Visualization

  • Identify common chart types and their appropriate use: bar (categorical comparison), line (trends over time), scatter (correlation), heatmap (matrix relationships), box (distribution comparison).
  • Analyze a misleading chart (truncated y-axis, dual axes, deceptive coloring) and propose a corrected version.
6. Governance and Ethics
3 topics

Data Governance

  • Identify data governance topics: data ownership, stewardship, lineage, data dictionary, master-data management.
  • Identify the value of data lineage in audit and root-cause investigation.

Privacy and Sensitive Data

  • Identify privacy-by-design concepts: minimization, anonymization, pseudonymization, k-anonymity at conceptual depth.
  • Apply privacy-controls selection for a customer-analytics workload subject to GDPR and CCPA, balancing utility with minimization.
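The k-anonymity idea above can be made concrete with a toy check: a dataset is k-anonymous when every combination of quasi-identifiers appears at least k times. The records below are fabricated for illustration:

```python
from collections import Counter

# Hypothetical records: (age band, ZIP prefix) are the quasi-identifiers.
records = [
    ("30-39", "941"), ("30-39", "941"), ("30-39", "941"),
    ("40-49", "100"), ("40-49", "100"),
    ("50-59", "606"),  # a group of one: re-identifiable
]

def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return min(Counter(rows).values()) >= k
```

The lone ("50-59", "606") record breaks even k = 2; generalizing the ZIP prefix or widening the age band would restore anonymity at the cost of utility, which is exactly the balance the objective describes.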

Model Ethics and Bias

  • Identify common sources of model bias: training-data imbalance, label bias, proxy variables, deployment-context drift.
  • Identify model-governance practices: model cards, evaluation across demographic slices, monitoring for drift, human-in-the-loop for high-stakes decisions.
  • Analyze a hiring-screening model that produced disparate outcomes across demographic groups and identify the diagnostic and remediation steps.
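Evaluation across demographic slices reduces to computing the same metric separately per group. A toy sketch with hypothetical screening outcomes (group label, true label, predicted label):

```python
# Hypothetical outcomes per applicant: (group, y_true, y_pred).
outcomes = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 0), ("B", 1, 0), ("B", 1, 1), ("B", 0, 0),
]

def recall_by_group(rows):
    """Recall (true positives / actual positives) per demographic slice."""
    result = {}
    for group in {g for g, _, _ in rows}:
        tp = sum(1 for g, t, p in rows if g == group and t == 1 and p == 1)
        fn = sum(1 for g, t, p in rows if g == group and t == 1 and p == 0)
        result[group] = tp / (tp + fn)
    return result
```

A gap between slices (here group A's recall is double group B's) is the disparate outcome to diagnose: trace it back to training-data imbalance, label bias, or proxy variables before remediating.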
7. Data Science in Practice
8 topics

Working with Stakeholders

  • Identify common stakeholder personas: business sponsor, subject-matter expert, data engineer, ML engineer, end-user, regulator.
  • Apply requirements-elicitation for a data-science project: convert vague business asks into measurable success criteria, identify the acceptable range of model errors.
  • Analyze a 'we just need a model that predicts churn' request and identify the missing scope (data access, labels, evaluation, deployment, monitoring).

Project Lifecycle and Reproducibility

  • Identify CRISP-DM and TDSP as standard data-science process frameworks and their typical phases.
  • Identify reproducibility concerns: random seeds, environment pinning, dataset versioning (DVC, LakeFS, Delta), experiment tracking (MLflow, Weights & Biases).
  • Apply experiment-tracking discipline so that six months later a stakeholder can answer 'how was this number produced?' without rerunning everything.

Productionizing Models

  • Identify deployment patterns: batch scoring, online inference, edge inference, embedded.
  • Identify MLOps concepts: CI/CD for models, model registry, A/B testing, canary deploy, shadow mode.
  • Apply 'shadow mode' deployment guidance for a high-stakes model: route requests to both old and new model, log differences, compare before cutover.
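A shadow-mode router can be sketched in a few lines; both models here are hypothetical stand-ins for a real serving stack:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("shadow")

def old_model(x):
    """The trusted production model (hypothetical)."""
    return x * 2

def new_model(x):
    """The candidate model, running in shadow (hypothetical)."""
    return x * 2 + 1

def serve(x):
    """Serve the old model's answer; score the new one silently and log
    any disagreement for pre-cutover comparison."""
    live = old_model(x)
    shadow = new_model(x)
    if shadow != live:
        log.info("disagreement on %r: live=%r shadow=%r", x, live, shadow)
    return live  # users only ever see the old model's output
```

Aggregating the logged disagreements over time gives the evidence needed to decide whether the new model is safe to cut over.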

Generative AI in Data Science

  • Identify how foundation models change data-science practice: code generation, EDA assistants, synthetic data, data labeling, summarization of analytic results.
  • Apply generative AI safely in a data-science workflow: never paste sensitive data into public LLMs, use enterprise-licensed providers, validate generated outputs.

Data Science Career

  • Identify common data-science career paths: data analyst, data scientist, ML engineer, data engineer, applied scientist, research scientist.
  • Identify the spectrum of practitioner roles: business-analyst-leaning data scientist (SQL, viz, statistics) vs ML-engineer-leaning (production systems, MLOps).

Continuous Learning

  • Identify continuous-learning sources: arXiv (cs.LG, cs.CL, stat.ML), NeurIPS / ICML / ACL papers, fast.ai courses, Andrej Karpathy's series, Distill.pub archive.
  • Apply hands-on practice: Kaggle competitions, reproducing published results, contributing to open-source ML libraries.
  • Analyze the information overload caused by rapid AI evolution and propose a focus strategy: pick depth areas, time-box exploration, and accept that you will not master every advance.

Communicating Data Science

  • Identify common data-science communication failures: jargon overload, no business framing, 'all the metrics' instead of the relevant ones, ignoring uncertainty.
  • Apply audience-aware reporting: executive summary with one chart, technical appendix with full evaluation, code repo for reviewers.

Ethics in Practice

  • Identify common practical ethics scenarios: sensitive features in models, opt-out for users, transparency about AI involvement, dual-use concerns.
  • Apply structured ethical-review for a proposed model: who is affected, what failure modes exist, what redress is available, what oversight will be in place.

Scope

Included Topics

  • Data lifecycle: collection, ingestion, cleaning, exploration, modeling, deployment, monitoring.
  • Statistical concepts: descriptive vs inferential, distributions, hypothesis testing, correlation vs causation.
  • Machine learning concepts: supervised, unsupervised, reinforcement learning at conceptual depth.
  • Common algorithms: linear/logistic regression, decision trees, random forests, gradient boosting, neural networks.
  • Model evaluation: train/test split, cross-validation, accuracy/precision/recall/F1, ROC-AUC, confusion matrix.
  • Data engineering basics: pipelines, data warehouses, data lakes, lakehouses, streaming.
  • Visualization concepts and common chart types.
  • Data governance: quality, lineage, privacy, ethics, model governance.

Not Covered

  • Hands-on Python/R/SQL coding.
  • Deep mathematical foundations.

Data Science Fundamentals is coming soon

Adaptive learning that maps your knowledge and closes your gaps.

Create Free Account to Be Notified