Fabric Data Engineer
The DP-700 course teaches candidates how to design, build, and operate lakehouses, data pipelines, semantic models, and governance in Microsoft Fabric, ensuring scalable, secure, and performant data solutions.
Who Should Take This
Data engineers, BI developers, and analytics professionals with roughly one year of hands-on experience building data solutions on Microsoft Fabric should pursue this certification. It validates their ability to implement lakehouses, orchestrate pipelines, create semantic models, and enforce security and governance, advancing their careers toward associate-level expertise.
What's Covered
1. Creating and configuring Fabric lakehouses, implementing Delta Lake tables, managing data ingestion with shortcuts and pipelines, and organizing medallion architecture.
2. Implementing data transformations using Spark notebooks, Dataflow Gen2, and data pipelines; orchestrating data processing workflows.
3. Monitoring Fabric workloads, optimizing Spark job performance, managing capacity and consumption, and implementing data quality checks.
Exam Structure
Question Types
- Multiple Choice
- Multiple Response
- Case Studies
Scoring Method
Scaled score from 1 to 1,000; a score of 700 is required to pass
Delivery Method
Proctored exam, 40-60 questions, 100 minutes
Prerequisites
None required. DP-900 and experience with Microsoft Fabric recommended.
Recertification
Renew annually via free Microsoft Learn renewal assessment
What's Included in AccelaStudy® AI
Course Outline
76 learning goals
Domain 1: Implement and Manage Lakehouses
3 topics
Implement and manage a lakehouse
- Identify the components of a Microsoft Fabric lakehouse including OneLake storage, the Tables section, the Files section, and the SQL analytics endpoint and describe how each component supports data engineering workflows.
- Describe the Delta Lake table format including transaction log, ACID transactions, time travel, and schema enforcement and explain how Delta Lake provides reliability guarantees for lakehouse tables.
- Identify supported data ingestion methods for a lakehouse including file upload, COPY INTO, Dataflow Gen2, pipelines, notebooks, and OneLake shortcuts and describe when each method is appropriate.
- Create a Fabric lakehouse, configure OneLake storage, ingest data using file upload and COPY INTO commands, and organize data across the Tables and Files sections for structured and unstructured content.
- Manage Delta Lake tables by creating managed and unmanaged tables, applying schema evolution with merge schema and overwrite schema options, and performing table maintenance operations including OPTIMIZE and VACUUM.
- Evaluate data ingestion strategies and determine the optimal approach among file upload, COPY INTO, shortcuts, pipelines, and Dataflow Gen2 based on data volume, frequency, source location, and transformation requirements.
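The ingestion-strategy evaluation above can be sketched as a small decision helper. This is a hypothetical rule of thumb with illustrative thresholds, not official Fabric guidance; the function name and cutoffs are assumptions made for the example.

```python
def choose_ingestion_method(size_gb: float, recurring: bool,
                            external_source: bool,
                            needs_transformation: bool) -> str:
    """Illustrative rule of thumb for picking a lakehouse ingestion method.

    Thresholds are assumptions for the sketch, not Microsoft guidance.
    """
    if external_source and not needs_transformation:
        return "OneLake shortcut"      # reference data in place, no copy
    if needs_transformation:
        # Low-code shaping suits Dataflow Gen2; heavier logic suits
        # a pipeline that orchestrates Spark notebooks
        return "Dataflow Gen2" if size_gb < 10 else "pipeline + notebook"
    if recurring:
        return "pipeline Copy Data"    # scheduled, repeatable ingestion
    return "file upload" if size_gb < 1 else "COPY INTO"
```

A one-off 100 MB CSV would land on "file upload", while a recurring untransformed load would route through a pipeline Copy Data activity.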
Implement and manage data transformation
- Identify Spark notebook capabilities in Fabric including cell types, language support for PySpark, Spark SQL, and Scala, magic commands, and notebook parameterization for reusable data transformation logic.
- Describe V-Order optimization and Z-Order indexing for Delta Lake tables and explain how each technique improves read performance through file-level sorting and column co-location strategies.
- Implement data transformations using PySpark DataFrames by applying select, filter, groupBy, join, withColumn, and window function operations to cleanse, reshape, and enrich lakehouse data.
- Implement data transformations using Spark SQL by writing SELECT statements with joins, aggregations, CTEs, and MERGE INTO operations to perform upserts and slowly changing dimension updates on Delta tables.
- Configure Delta Lake optimization by applying V-Order write optimization, Z-Order indexing on high-cardinality filter columns, and file compaction using OPTIMIZE to improve downstream query performance.
- Analyze transformation performance scenarios and determine the optimal combination of PySpark DataFrame operations, Spark SQL queries, V-Order, and Z-Order strategies based on data size, query patterns, and join characteristics.
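The MERGE INTO upsert pattern from the Spark SQL goal above can be sketched in plain Python (no Spark required): matched keys are updated, unmatched keys are inserted. The function name and dict-keyed "table" are assumptions for the sketch, not Delta Lake's actual implementation.

```python
def merge_upsert(target: dict, updates: list, key: str) -> dict:
    """Plain-Python sketch of MERGE INTO semantics on a table keyed
    by a business key: WHEN MATCHED THEN UPDATE,
    WHEN NOT MATCHED THEN INSERT."""
    merged = dict(target)           # leave the original "table" untouched
    for row in updates:
        merged[row[key]] = row      # update if key exists, else insert
    return merged
```

In Spark SQL the same intent reads `MERGE INTO target USING updates ON target.id = updates.id WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *`.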
Implement and manage lakehouse schemas
- Describe the medallion architecture including bronze, silver, and gold layers and explain how each layer progressively refines data quality from raw ingestion through cleansed to business-ready aggregated datasets.
- Identify star schema components including fact tables, dimension tables, surrogate keys, and conformed dimensions and describe how dimensional modeling supports analytical query patterns in a lakehouse.
- Implement a medallion architecture in a Fabric lakehouse by creating bronze tables for raw ingestion, silver tables with data cleansing and deduplication, and gold tables with business-level aggregations and dimensional models.
- Build star schema tables in a lakehouse by creating fact tables with foreign keys and measures, dimension tables with surrogate keys and descriptive attributes, and implementing slowly changing dimension patterns for historical tracking.
- Evaluate schema design tradeoffs between normalized and denormalized models, wide fact tables versus conformed dimensions, and medallion layer granularity to optimize a lakehouse schema for both query performance and data quality.
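The bronze-to-silver-to-gold progression described above can be illustrated with a minimal plain-Python sketch, assuming tiny in-memory rows in place of Delta tables; the sample data and function names are invented for the example.

```python
# bronze: raw ingested rows, possibly duplicated and dirty
bronze = [
    {"order_id": 1, "amount": "100.0", "region": "east"},
    {"order_id": 1, "amount": "100.0", "region": "east"},   # duplicate
    {"order_id": 2, "amount": "250.5", "region": "west"},
    {"order_id": 3, "amount": None,    "region": "east"},   # invalid
]

def to_silver(rows):
    """Cleanse and deduplicate: drop null amounts, cast types,
    keep the first row per business key."""
    seen, silver = set(), []
    for r in rows:
        if r["amount"] is None or r["order_id"] in seen:
            continue
        seen.add(r["order_id"])
        silver.append({**r, "amount": float(r["amount"])})
    return silver

def to_gold(rows):
    """Aggregate to a business-ready total per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals
```

Each layer only reads the one below it, so data quality issues are repaired once, in silver, rather than in every downstream report.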
Domain 2: Implement and Manage Data Engineering with Microsoft Fabric
3 topics
Implement and manage dataflows
- Identify Dataflow Gen2 capabilities including Power Query Online editor, M language transformations, data source connectors, and data destinations and describe how Dataflow Gen2 differs from traditional Power BI dataflows.
- Create Dataflow Gen2 transformations using the Power Query Online editor to connect to data sources, apply column transformations, merge and append queries, and configure data destinations to lakehouse tables or warehouse tables.
- Evaluate incremental refresh strategies for Dataflow Gen2 by comparing range parameter configurations, refresh policies with detection of data changes, and append-only versus partition-based approaches to optimize refresh performance for large datasets.
- Analyze dataflow design decisions and evaluate the tradeoffs between Dataflow Gen2 and Spark notebooks for data transformation scenarios based on data volume, transformation complexity, developer skill set, and refresh frequency requirements.
Implement and manage data pipelines
- Identify Fabric data pipeline components including activities, parameters, variables, expressions, control flow constructs, and triggers and describe how pipelines orchestrate data movement and transformation workflows.
- Differentiate pipeline activity types including Copy Data, Dataflow, Notebook, Stored Procedure, ForEach, If Condition, and Web activities and determine when each activity type is appropriate for data orchestration scenarios.
- Build data pipelines by adding Copy Data and Notebook activities, configuring activity parameters and dependencies, implementing ForEach and If Condition control flow, and connecting activities with success and failure dependency paths.
- Configure pipeline scheduling using triggers with tumbling window, schedule, and event-based patterns and implement pipeline parameters with dynamic expressions for reusable and parameterized orchestration workflows.
- Implement error handling in pipelines by configuring retry policies, timeout settings, failure dependency paths, and alert notifications to build resilient data orchestration workflows that recover gracefully from transient failures.
- Analyze pipeline orchestration scenarios and evaluate the tradeoffs among scheduling patterns, error handling strategies, and activity sequencing to design resilient multi-step data engineering workflows for production environments.
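The retry-policy and failure-path behavior described above can be sketched in plain Python. This is an illustrative model of how a pipeline activity's retry settings behave, not Fabric's actual scheduler; the function and parameter names are assumptions.

```python
import time

def run_with_retry(activity, retries=3, base_delay=1.0, on_failure=None):
    """Sketch of a pipeline retry policy: retry transient failures with
    exponential backoff, then route to a failure dependency path
    (e.g. an alert activity) once retries are exhausted."""
    for attempt in range(retries + 1):
        try:
            return activity()
        except Exception as exc:
            if attempt == retries:
                if on_failure:
                    on_failure(exc)            # failure dependency path
                raise
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

A transient source timeout would be absorbed by the backoff, while a persistent failure still surfaces after triggering the alert path.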
Implement and manage Spark jobs
- Identify Spark job definition components in Fabric including main definition file, reference files, command-line arguments, Spark configuration settings, and lakehouse references and describe how job definitions enable scheduled Spark execution.
- Create and configure Spark job definitions by specifying the main Python or JAR file, setting command-line arguments, attaching lakehouse references, and adding custom library dependencies for production-ready batch processing.
- Configure Spark session settings including executor and driver memory, core allocation, dynamic allocation, and custom Spark properties to optimize resource utilization for data engineering workloads of varying size and complexity.
- Analyze Spark job execution by interpreting Spark UI stage and task metrics, evaluating resource utilization in the Fabric monitoring hub, and diagnosing common failures including out-of-memory errors and data skew to determine corrective actions.
- Analyze Spark job performance scenarios and evaluate session configuration tradeoffs among memory allocation, executor count, and parallelism settings to optimize throughput and resource efficiency for large-scale data processing.
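The parallelism tuning discussed above often starts from partition sizing. The helper below encodes a common community rule of thumb (target roughly 128 MB per shuffle partition); the name and default are assumptions for the sketch, not an official Fabric formula.

```python
def suggest_shuffle_partitions(shuffle_bytes: int,
                               target_partition_mb: int = 128,
                               min_partitions: int = 1) -> int:
    """Illustrative heuristic: choose spark.sql.shuffle.partitions so
    each partition handles roughly target_partition_mb of shuffle data."""
    target = target_partition_mb * 1024 * 1024
    return max(min_partitions, -(-shuffle_bytes // target))  # ceil division
```

For a 10 GiB shuffle this suggests 80 partitions; adaptive query execution can then coalesce small partitions at runtime.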
Domain 3: Implement and Manage Semantic Models
2 topics
Design and build semantic models
- Identify semantic model components in Microsoft Fabric including tables, columns, relationships, hierarchies, calculated columns, measures, and calculation groups and describe how semantic models serve as the analytical layer above lakehouse data.
- Describe relationship types in semantic models including one-to-many, many-to-many, and bi-directional cross-filtering and explain how relationship cardinality and filter direction affect DAX query behavior and measure calculations.
- Create a semantic model by connecting to lakehouse tables, defining relationships between fact and dimension tables, configuring table properties, and organizing columns into display folders for a user-friendly analytical experience.
- Implement DAX measures and calculated columns by writing SUM, AVERAGE, CALCULATE, FILTER, ALL, and time intelligence functions to create business metrics that respond to slicer and filter context in analytical reports.
- Configure calculation groups and format strings to standardize measure calculations across time intelligence patterns, currency conversions, and percentage variations without duplicating individual measures for each variation.
- Evaluate semantic model design decisions and compare relationship patterns, measure implementation approaches, and calculation group strategies to determine the optimal model structure for complex analytical requirements.
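How a measure responds to filter context, and how CALCULATE with ALL removes part of that context, can be sketched in plain Python. This models the idea only; the sample table and function names are invented, and real DAX evaluation is far richer.

```python
sales = [
    {"year": 2023, "region": "east", "amount": 100},
    {"year": 2023, "region": "west", "amount": 200},
    {"year": 2024, "region": "east", "amount": 150},
]

def total_sales(rows, filter_context):
    """A measure sums over the rows that survive the filter context."""
    return sum(r["amount"] for r in rows
               if all(r[col] == val for col, val in filter_context.items()))

def sales_all_regions(rows, filter_context):
    """Sketch of CALCULATE([Total Sales], ALL('Sales'[region])):
    remove the region filter, keep the rest of the context."""
    ctx = {k: v for k, v in filter_context.items() if k != "region"}
    return total_sales(rows, ctx)
```

With a slicer context of year 2023 and region east, the base measure returns 100 while the ALL-modified measure returns the 2023 total across regions, 300, which is the building block of "percent of total" patterns.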
Optimize semantic models
- Identify semantic model optimization techniques including aggregation tables, user-defined aggregations, automatic aggregations, and composite models and describe how each technique reduces query latency for large datasets.
- Configure incremental refresh for semantic models by defining range parameters, setting refresh and archive policies, and enabling detect data changes to minimize refresh duration and capacity consumption for large fact tables.
- Implement aggregation tables by creating pre-aggregated summary tables, configuring aggregation mappings to detail tables, and setting storage mode to optimize query performance for high-granularity datasets.
- Analyze DAX query performance using Performance Analyzer and DAX Studio to identify storage engine and formula engine bottlenecks, evaluate query execution patterns, and determine optimization strategies including variable usage and measure refactoring.
- Analyze semantic model performance metrics and evaluate the tradeoffs among import mode, DirectQuery, composite models, and aggregation strategies to determine the optimal configuration for datasets balancing refresh latency and query speed.
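The incremental refresh policy above (archive a long history, reprocess only a recent window) can be sketched with stdlib dates. The function name and return shape are assumptions for the example, not the service's API.

```python
from datetime import date, timedelta

def refresh_windows(today: date, archive_years: int, refresh_days: int):
    """Sketch of an incremental refresh policy: keep archive_years of
    history in storage, but only reprocess the last refresh_days."""
    archive_start = today.replace(year=today.year - archive_years)
    refresh_start = today - timedelta(days=refresh_days)
    return {"archive": (archive_start, refresh_start),
            "refresh": (refresh_start, today)}
```

A "2 years archived, 7 days refreshed" policy on a billion-row fact table then touches only a week of partitions per refresh instead of the full history.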
Domain 4: Monitor and Optimize Data Engineering Solutions
3 topics
Monitor Fabric capacity and workloads
- Identify Fabric capacity concepts including capacity units, SKU tiers, throttling behavior, smoothing, bursting, and the capacity metrics app and describe how capacity management affects data engineering workload performance.
- Configure autoscale behavior for Fabric capacity by setting scale-up triggers, cooldown periods, and maximum capacity limits and manage variable workload demand patterns while controlling cost implications.
- Configure the Fabric capacity metrics app to monitor capacity utilization, identify throttled workloads, and track compute consumption trends across workspaces and item types for proactive capacity planning.
- Manage workload distribution across Fabric capacity by configuring workspace assignments, scheduling heavy workloads during off-peak hours, and implementing workload management strategies to avoid throttling and performance degradation.
- Analyze capacity utilization patterns and evaluate workload scheduling, autoscale configuration, and capacity SKU selection to optimize the balance between performance requirements and cost for multi-team Fabric environments.
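The smoothing and throttling concepts above can be modeled in a few lines: usage is averaged over a trailing window, so a short burst does not throttle but sustained overconsumption does. The window length and functions are illustrative assumptions, not Fabric's actual smoothing algorithm.

```python
def smoothed_utilization(cu_usage: list, window: int) -> list:
    """Sketch of capacity smoothing: each point is the average capacity-unit
    usage over the trailing window, spreading short bursts out."""
    out = []
    for i in range(len(cu_usage)):
        span = cu_usage[max(0, i - window + 1): i + 1]
        out.append(sum(span) / len(span))
    return out

def is_throttled(cu_usage, capacity_cu, window=4):
    """Throttle only when SMOOTHED usage exceeds purchased capacity."""
    return any(u > capacity_cu for u in smoothed_utilization(cu_usage, window))
```

A raw spike of 60 CU against a 40 CU capacity passes untouched once smoothed, while four consecutive readings of 50 CU trip the throttle.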
Optimize data processing
- Identify data partitioning strategies for lakehouse tables including date-based partitioning, hash partitioning, and partition pruning and describe how partitioning reduces data scanned during query execution.
- Configure caching mechanisms in Fabric including Spark result caching, Delta table caching, and SQL analytics endpoint caching to reduce compute consumption and improve response times for repeated query patterns across lakehouse workloads.
- Implement table partitioning strategies by creating partitioned Delta tables, configuring partition columns based on query filter patterns, and applying partition pruning to reduce data scan volume for time-series and categorical query workloads.
- Optimize Spark job performance by tuning shuffle partition count, broadcast join thresholds, adaptive query execution settings, and data serialization formats to minimize job execution time and resource consumption.
- Evaluate SQL analytics endpoint query performance by analyzing query execution plans, identifying table scan and join bottlenecks, and determining the appropriate statistics updates and indexing strategies to improve query response times.
- Analyze data processing optimization scenarios and evaluate the tradeoffs among partitioning strategies, caching configurations, Spark tuning parameters, and file compaction schedules to design a holistic performance optimization plan for a lakehouse workload.
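Partition pruning, as described above, means applying the filter to partition values before any file is opened. A minimal sketch, assuming a date-partitioned layout with invented file names:

```python
# Files laid out by a date partition column, as in a partitioned Delta table
partitions = {
    "2024-01-01": ["part-0001.parquet", "part-0002.parquet"],
    "2024-01-02": ["part-0003.parquet"],
    "2024-01-03": ["part-0004.parquet", "part-0005.parquet"],
}

def prune(partitions: dict, predicate) -> list:
    """Partition pruning: evaluate the filter against partition values
    first, so only files in matching partitions are ever scanned."""
    return [f for value, files in partitions.items()
            if predicate(value) for f in files]

files_to_scan = prune(partitions, lambda d: d >= "2024-01-02")
```

Here the predicate eliminates the 2024-01-01 partition outright, cutting the scan from five files to three without reading any data.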
Implement data quality monitoring
- Identify data quality monitoring capabilities in Fabric including data profiling, column statistics, null detection, distribution analysis, and data lineage tracing and describe how each capability supports data quality governance.
- Implement data validation rules in notebooks and Dataflow Gen2 by applying null checks, range validations, referential integrity checks, and duplicate detection to identify and quarantine data quality issues during ingestion and transformation.
- Configure error handling and dead-letter patterns for data quality failures by routing invalid records to error tables, logging quality metrics, and implementing automated alerting for data quality threshold violations.
- Implement data lineage tracking using Fabric lineage views and Purview integration to trace data flow from source through lakehouse transformations to semantic model consumption for impact analysis and root cause investigation.
- Assess data quality monitoring strategies and determine the appropriate combination of validation rules, profiling checks, lineage tracking, and alerting thresholds to maintain data integrity across a multi-layer lakehouse architecture.
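The validate-and-quarantine pattern above can be sketched as a generic rule runner. The rule names and row shape are assumptions for the example; in practice the same logic would run inside a notebook or Dataflow Gen2 step.

```python
def validate(rows, rules):
    """Apply named validation rules; valid rows pass through, invalid
    rows are quarantined with the list of rules they violated."""
    valid, quarantined = [], []
    for r in rows:
        failed = [name for name, check in rules.items() if not check(r)]
        if failed:
            quarantined.append({**r, "failed_rules": failed})
        else:
            valid.append(r)
    return valid, quarantined

rules = {
    "amount_not_null": lambda r: r.get("amount") is not None,
    "amount_in_range": lambda r: r.get("amount") is None
                                 or 0 <= r["amount"] <= 10_000,
}
```

Quarantined rows carry their failure reasons, so an error table built from them doubles as the quality metric log that alerting thresholds can watch.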
Domain 5: Implement and Manage Data Security and Governance
3 topics
Implement data security
- Identify Fabric workspace roles including Admin, Member, Contributor, and Viewer and describe how each role grants permissions for managing workspace items, data access, and sharing capabilities.
- Differentiate item-level permissions, row-level security, column-level security, object-level security, and sensitivity labels in Fabric and determine how each mechanism provides granular data access control beyond workspace roles.
- Configure workspace roles by assigning users and security groups to appropriate roles and manage item-level permissions by granting read, write, and reshare access to specific lakehouse, warehouse, and semantic model items.
- Implement row-level security by creating DAX security roles with filter expressions, assigning users to roles, and testing RLS behavior to ensure users only access authorized data rows in semantic models and SQL analytics endpoints.
- Configure column-level security and sensitivity labels by restricting access to sensitive columns in warehouse tables and applying Microsoft Information Protection sensitivity labels to classify and protect data assets according to organizational policy.
- Analyze data security scenarios and evaluate the tradeoffs among workspace roles, item permissions, row-level security, column-level security, and sensitivity labels to design a layered security model that satisfies organizational compliance and least-privilege requirements.
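The row-level security behavior described above can be sketched by treating each role as a row predicate, analogous to a DAX filter expression such as `[region] = "east"`. The role names and data are hypothetical.

```python
# Hypothetical roles: each carries a row filter predicate
roles = {
    "east_sales": lambda row: row["region"] == "east",
    "all_sales":  lambda row: True,
}

def rows_for_user(rows, user_roles):
    """RLS sketch: a user sees a row if ANY of their assigned roles'
    filter expressions allows it (roles are additive)."""
    preds = [roles[r] for r in user_roles]
    return [row for row in rows if any(p(row) for p in preds)]
```

The "any role grants access" behavior matters when testing RLS: adding a user to a broad role silently widens what their narrow role appeared to restrict.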
Implement data governance
- Identify Microsoft Purview integration capabilities with Fabric including data catalog, data lineage, data classification, and endorsement and describe how Purview provides centralized governance for the Fabric data estate.
- Manage endorsement levels in Fabric by applying promoted and certified badges to data assets and configuring endorsement workflows to communicate data asset quality, trustworthiness, and readiness for production consumption across the organization.
- Configure Purview integration by registering Fabric workspaces, scanning data assets for classification, and reviewing data catalog entries to enable organization-wide data discovery and governance for lakehouse and semantic model assets.
- Manage data asset endorsement by promoting lakehouse tables and semantic models to promoted or certified status and configuring endorsement policies to communicate data quality and reliability to downstream consumers.
- Evaluate data governance strategies and determine the appropriate combination of Purview integration, endorsement policies, data classification, and lineage tracking to establish comprehensive data governance for a multi-workspace Fabric deployment.
Manage data lifecycle
- Identify OneLake shortcut types including internal shortcuts to other Fabric lakehouses and external shortcuts to ADLS Gen2, Amazon S3, and Google Cloud Storage and describe how shortcuts enable cross-source data access without data duplication.
- Configure database mirroring in Fabric by selecting source databases from Azure SQL Database, Azure Cosmos DB, or Snowflake and managing near-real-time replication of external data into OneLake for analytical consumption.
- Create OneLake shortcuts to external data sources by configuring connection credentials, specifying source paths, and validating data access to enable cross-cloud and cross-workspace data federation without copying data into the lakehouse.
- Manage mirrored database monitoring by tracking replication latency, validating data freshness against service-level objectives, and troubleshooting replication failures for mirrored tables in OneLake analytical workloads.
- Evaluate data retention and archival strategies by comparing Delta Lake VACUUM schedules, table versioning history depths, and lifecycle policy configurations to determine the optimal balance between storage cost, data recovery capability, and compliance requirements.
- Analyze data lifecycle scenarios and evaluate the tradeoffs among OneLake shortcuts, database mirroring, direct data ingestion, and archival strategies to design a data lifecycle management plan that optimizes cost, freshness, and compliance.
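The VACUUM retention tradeoff above hinges on one rule: an unreferenced data file may be deleted only once it is older than the retention window (Delta Lake's default is 168 hours). A minimal sketch of that eligibility check, with invented file names:

```python
from datetime import datetime, timedelta

def vacuum_candidates(unreferenced_files: dict, retention_hours: float,
                      now: datetime) -> list:
    """Sketch of Delta Lake VACUUM eligibility: files no longer referenced
    by the current table version are deletable only once older than the
    retention window. unreferenced_files maps path -> last-modified time."""
    cutoff = now - timedelta(hours=retention_hours)
    return [path for path, modified in unreferenced_files.items()
            if modified < cutoff]
```

Shortening the window reclaims storage sooner but also shortens how far back time travel can reach, which is exactly the cost-versus-recovery balance the goal above asks you to weigh.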
Hands-On Labs
Practice in a simulated cloud console or Python code sandbox — no account needed. Each lab runs entirely in your browser.
Certification Benefits
Salary Impact
Related Job Roles
Industry Recognition
Microsoft Azure certifications are among the most valued in enterprise IT, with Microsoft holding the second-largest cloud market share globally and serving as the dominant platform in enterprise and hybrid cloud environments.
Scope
Included Topics
- All domains and task statements in the Microsoft Fabric Data Engineer Associate (DP-700) exam guide: Domain 1 Implement and manage lakehouses (25-30%), Domain 2 Implement and manage data engineering with Microsoft Fabric (20-25%), Domain 3 Implement and manage semantic models (15-20%), Domain 4 Monitor and optimize data engineering solutions (15-20%), Domain 5 Implement and manage data security and governance (10-15%).
- Foundational to intermediate data engineering skills for Microsoft Fabric workloads, including lakehouse creation, Delta Lake table management, Spark notebook development, DataFrame transformations, medallion architecture design, Dataflow Gen2 development, pipeline orchestration, semantic model creation with DAX, capacity monitoring, query optimization, data security, and Purview governance integration.
- Service-selection and configuration reasoning for common Fabric data engineering scenarios requiring hands-on provisioning, data ingestion, transformation pipeline development, semantic model optimization, and security configuration within the Microsoft Fabric unified analytics platform.
- Key Microsoft Fabric components for data engineers: Lakehouse, Warehouse, Notebooks (PySpark/Spark SQL), DataFrames, Delta Lake, V-Order, Z-Order, medallion architecture, star schemas, Dataflow Gen2, Power Query, data pipelines, pipeline activities, Spark job definitions, semantic models, DAX, relationships, aggregations, incremental refresh, Fabric capacity, capacity metrics app, workspace roles, item permissions, row-level security, column-level security, sensitivity labels, Microsoft Purview, OneLake, OneLake shortcuts, mirroring, data lineage.
Not Covered
- Implementation detail depth expected only for Microsoft Fabric advanced or specialty scenarios such as custom Spark pool configuration at the infrastructure level or Azure Synapse Analytics dedicated SQL pool administration.
- Low-level Spark internals, custom Spark connector development, and advanced distributed systems programming beyond what DP-700 expects of a data engineer.
- Current list prices, promotional discounts, and region-specific pricing values for Microsoft Fabric capacity units that change frequently over time.
- Power BI report and dashboard creation beyond what is required for semantic model optimization, including advanced Power BI visual design, paginated reports, and Power BI embedded scenarios.
- Azure services and features not emphasized by the DP-700 exam guide, including Azure Synapse Analytics dedicated SQL pools, Azure Data Factory standalone, Azure Databricks standalone, and Azure HDInsight.
Official Exam Page
Learn more at Microsoft Learn
Ready to master DP-700?
Adaptive learning that maps your knowledge and closes your gaps.
Subscribe to Access