Microsoft DP-203 Exam (page: 9)
Microsoft Data Engineering on Azure
Updated on: 01-Aug-2025

Viewing Page 9 of 75

You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse Analytics dedicated SQL pool.

You plan to keep a record of changes to the available fields.
The supplier data contains the following columns.


Which three additional columns should you add to the data to create a Type 2 SCD? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

  1. surrogate primary key
  2. effective start date
  3. business key
  4. last modified date
  5. effective end date
  6. foreign key

Answer(s): A,B,E



HOTSPOT (Drag and Drop is not supported)
You have a Microsoft SQL Server database that uses a third normal form schema.
You plan to migrate the data in the database to a star schema in an Azure Synapse Analytics dedicated SQL pool.
You need to design the dimension tables. The solution must optimize read operations.

What should you include in the solution? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
Hot Area:

  1. See Explanation section for answer.

Answer(s): A

Explanation:



Box 1: Denormalize to a second normal form
Denormalization is the process of transforming higher normal forms to lower normal forms via storing the join of higher normal form relations as a base relation. Denormalization increases the performance in data retrieval at cost of bringing update anomalies to a database.

Box 2: New identity columns
The collapsing relations strategy can be used in this step to collapse classification entities into component entities to obtain flat dimension tables with single-part keys that connect directly to the fact table. The singlepart key is a surrogate key generated to ensure it remains unique over time.
Example:


Note: A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design data warehouse models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.


Reference:

https://www.mssqltips.com/sqlservertip/5614/explore-the-role-of-normal-forms-in-dimensional-modeling/ https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tablesidentity



HOTSPOT (Drag and Drop is not supported)
You plan to develop a dataset named Purchases by using Azure Databricks. Purchases will contain the following columns:
-ProductID
-ItemPrice
-LineTotal
-Quantity
-StoreID
-Minute
-Month
-Hour
-Year
-Day
You need to store the data to support hourly incremental load pipelines that will vary for each Store ID. The solution must minimize storage costs.

How should you complete the code? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.
Hot Area:

  1. See Explanation section for answer.

Answer(s): A

Explanation:



Box 1: partitionBy
We should overwrite at the partition level.

Example:
df.write.partitionBy("y","m","d")
.mode(SaveMode.Append)
.parquet("/data/hive/warehouse/db_name.db/" + tableName)

Box 2: ("StoreID", "Year", "Month", "Day", "Hour", "StoreID")

Box 3: parquet("/Purchases")


Reference:

https://intellipaat.com/community/11744/how-to-partition-and-write-dataframe-in-spark-without-deletingpartitions-with-no-new-data



You are designing a partition strategy for a fact table in an Azure Synapse Analytics dedicated SQL pool. The table has the following specifications:
-Contain sales data for 20,000 products.
-Use hash distribution on a column named ProductID.
-Contain 2.4 billion records for the years 2019 and 2020.

Which number of partition ranges provides optimal compression and performance for the clustered columnstore index?

  1. 40
  2. 240
  3. 400
  4. 2,400

Answer(s): A

Explanation:

Each partition should have around 1 millions records. Dedication SQL pools already have 60 partitions. We have the formula: Records/(Partitions*60)= 1 million
Partitions= Records/(1 million * 60)
Partitions= 2.4 x 1,000,000,000/(1,000,000 * 60) = 40

Note: Having too many partitions can reduce the effectiveness of clustered columnstore indexes if each partition has fewer than 1 million rows. Dedicated SQL pools automatically partition your data into 60 databases. So, if you create a table with 100 partitions, the result will be 6000 partitions.


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool



HOTSPOT (Drag and Drop is not supported)
You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL pool. You create a table by using the Transact-SQL statement shown in the following exhibit.


Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.

NOTE: Each correct selection is worth one point.
Hot Area:

  1. See Explanation section for answer.

Answer(s): A

Explanation:



Box 1: Type 2
A Type 2 SCD supports versioning of dimension members. Often the source system doesn't store versions, so the data warehouse load process detects and manages changes in a dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to a version of the dimension member. It also includes columns that define the date range validity of the version (for example, StartDate and EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members.

Incorrect Answers:
A Type 1 SCD always reflects the latest values, and when changes in source data are detected, the dimension table data is overwritten.

Box 2: a business key
A business key or natural key is an index which identifies uniqueness of a row based on columns that exist naturally in a table according to businessrules. For example business keys are customer code in a customer table, composite of sales order header number and sales order item line number within a sales order details table.


Reference:

https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types



Viewing Page 9 of 75



Share your comments for Microsoft DP-203 exam with other users:

SB 7/21/2023 3:18:00 AM

please provide the dumps
UNITED STATES


David 8/2/2023 8:20:00 AM

it is amazing
Anonymous


User 8/3/2023 3:32:00 AM

quesion 178 about "a banking system that predicts whether a loan will be repaid is an example of the" the answer is classification. not regresion, you should fix it.
EUROPEAN UNION


quen 7/26/2023 10:39:00 AM

please upload apache spark dumps
Anonymous


Erineo 11/2/2023 5:34:00 PM

q14 is b&c to reduce you will switch off mail for every single alert and you will switch on daily digest to get a mail once per day, you might even skip the empty digest mail but i see this as a part of the daily digest adjustment
Anonymous


Paul 10/21/2023 8:25:00 AM

i think it is good question
Anonymous


Unknown 8/15/2023 5:09:00 AM

good for students who wish to give certification.
INDIA


Ch 11/20/2023 10:56:00 PM

is there a google drive link to the images? the links in questions are not working.
AUSTRALIA


Joey 5/16/2023 5:25:00 AM

very promising, looks great, so much wow!
Anonymous


alaska 10/24/2023 5:48:00 AM

i scored 87% on the az-204 exam. thanks! i always trust
GERMANY


nnn 7/9/2023 11:09:00 PM

good need more
Anonymous


User-sfdc 12/29/2023 7:21:00 AM

sample questions seems good
Anonymous


Tamer dam 8/4/2023 10:21:00 AM

huawei is ok
UNITED STATES


YK 12/11/2023 1:10:00 AM

good one nice
JAPAN


de 8/28/2023 2:38:00 AM

please continue
GERMANY


DMZ 6/25/2023 11:56:00 PM

this exam dumps just did the job. i donot want to ruffle your feathers but your exam dumps and mock test engine is amazing.
UNITED KINGDOM


Jose 8/30/2023 6:14:00 AM

nice questions
PORTUGAL


Tar01 7/24/2023 7:07:00 PM

the explanation are really helpful
Anonymous


DaveG 12/15/2023 4:50:00 PM

just passed my exam yesterday on my first attempt. these dumps were extremely helpful in passing first time. the questions were very, very similar to these questions!
Anonymous


A.K. 6/30/2023 6:34:00 AM

cosmos db is paas not saas
Anonymous


S Roychowdhury 6/26/2023 5:27:00 PM

what is the percentage of common questions in gcp exam compared to 197 dump questions? are they 100% matching with real gcp exam?
Anonymous


Bella 7/22/2023 2:05:00 AM

not able to see questions
Anonymous


Scott 9/8/2023 7:19:00 AM

by far one of the best sites for free questions. i have pass 2 exams with the help of this website.
CANADA


donald 8/19/2023 11:05:00 AM

excellent question bank.
Anonymous


Ashwini 8/22/2023 5:13:00 AM

it really helped
Anonymous


sk 5/13/2023 2:07:00 AM

excelent material
INDIA


Christopher 9/5/2022 10:54:00 PM

the new versoin of this exam which i downloaded has all the latest questions from the exam. i only saw 3 new questions in the exam which was not in this dump.
CANADA


Sam 9/7/2023 6:51:00 AM

question 8 - can cloudtrail be used for storing jobs? based on aws - aws cloudtrail is used for governance, compliance and investigating api usage across all of our aws accounts. every action that is taken by a user or script is an api call so this is logged to [aws] cloudtrail. something seems incorrect here.
UNITED STATES


Tanvi Rajput 8/14/2023 10:55:00 AM

question 13 tda - c01 answer : quick table calculation -> percentage of total , compute using table down
UNITED KINGDOM


PMSAGAR 9/19/2023 2:48:00 AM

pls share teh dump
UNITED STATES


zazza 6/16/2023 10:47:00 AM

question 44 answer is user risk
ITALY


Prasana 6/23/2023 1:59:00 AM

please post the questions for preparation
Anonymous


test user 9/24/2023 3:15:00 AM

thanks for the questions
AUSTRALIA


Draco 7/19/2023 5:34:00 AM

please reopen it now ..its really urgent
UNITED STATES