Microsoft DP-203 Exam (page: 14)
Microsoft Data Engineering on Azure
Updated on: 28-Feb-2026

Viewing Page 14 of 75

You need to implement a Type 3 slowly changing dimension (SCD) for product category data in an Azure Synapse Analytics dedicated SQL pool.

You have a table that was created by using the following Transact-SQL statement.


Which two columns should you add to the table? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.






Answer(s): B,E

Explanation:

A Type 3 SCD supports storing two versions of a dimension member as separate columns. The table includes a column for the current value of a member plus either the original or previous value of the member. So Type 3 uses additional columns to track one key instance of history, rather than storing additional rows to track each change like in a Type 2 SCD.
This type of tracking may be used for one or two columns in a dimension table. It is not common to use it for many members of the same table. It is often used in combination with Type 1 or Type 2 members.


Reference:

https://k21academy.com/microsoft-azure/azure-data-engineer-dp203-q-a-day-2-live-session-review/



DRAG DROP (Drag and Drop is not supported)
You have an Azure subscription.

You plan to build a data warehouse in an Azure Synapse Analytics dedicated SQL pool named pool1 that will contain staging tables and a dimensional model. Pool1 will contain the following tables.


You need to design the table storage for pool1. The solution must meet the following requirements:

-Maximize the performance of data loading operations to Staging.WebSessions.
-Minimize query times for reporting queries against the dimensional model.

Which type of table distribution should you use for each table? To answer, drag the appropriate table distribution types to the correct tables. Each table distribution type may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

  1. See Explanation section for answer.

Answer(s): A

Explanation:



Box 1: Replicated
The best table storage option for a small table is to replicate it across all the Compute nodes.

Box 2: Hash
Hash-distribution improves query performance on large fact tables.

Box 3: Round-robin
Round-robin distribution is useful for improving loading speed.


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute



HOTSPOT (Drag and Drop is not supported)
You have an Azure Synapse Analytics dedicated SQL pool.

You need to create a table named FactInternetSales that will be a large fact table in a dimensional model. FactInternetSales will contain 100 million rows and two columns named SalesAmount and OrderQuantity. Queries executed on FactInternetSales will aggregate the values in SalesAmount and OrderQuantity from the last year for a specific product. The solution must minimize the data size and query execution time.

How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

  1. See Explanation section for answer.

Answer(s): A

Explanation:



Box 1: (CLUSTERED COLUMNSTORE INDEX
CLUSTERED COLUMNSTORE INDEX
Columnstore indexes are the standard for storing and querying large data warehousing fact tables. This index uses column-based data storage and query processing to achieve gains up to 10 times the query performance in your data warehouse over traditional row-oriented storage. You can also achieve gains up to 10 times the data compression over the uncompressed data size. Beginning with SQL Server 2016 (13.x) SP1, columnstore indexes enable operational analytics: the ability to run performant real-time analytics on a transactional workload.

Note: Clustered columnstore index
A clustered columnstore index is the physical storage for the entire table.


To reduce fragmentation of the column segments and improve performance, the columnstore index might store some data temporarily into a clustered index called a deltastore and a B-tree list of IDs for deleted rows. The deltastore operations are handled behind the scenes. To return the correct query results, the clustered columnstore index combines query results from both the columnstore and the deltastore.

Box 2: HASH([ProductKey])
A hash distributed table distributes rows based on the value in the distribution column. A hash distributed table is designed to achieve high performance for queries on large tables.

Choose a distribution column with data that distributes evenly

Incorrect:
* Not HASH([OrderDateKey]). Is not a date column. All data for the same date lands in the same distribution. If several users are all filtering on the same date, then only 1 of the 60 distributions do all the processing work

* A replicated table has a full copy of the table available on every Compute node. Queries run fast on replicated tables since joins on replicated tables don't require data movement. Replication requires extra storage, though, and isn't practical for large tables.

* A round-robin table distributes table rows evenly across all distributions. The rows are distributed randomly. Loading data into a round-robin table is fast. Keep in mind that queries can require more data movement than the other distribution methods.


Reference:

https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute



You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1. Table1 contains the following:

-One billion rows
-A clustered columnstore index
-A hash-distributed column named Product Key
-A column named Sales Date that is of the date data type and cannot be null

Thirty million rows will be added to Table1 each month.

You need to partition Table1 based on the Sales Date column. The solution must optimize query performance and data loading.

How often should you create a partition?

  1. once per month
  2. once per year
  3. once per day
  4. once per week

Answer(s): B

Explanation:

Need a minimum 1 million rows per distribution. Each table is 60 distributions. 30 millions rows is added each month. Need 2 months to get a minimum of 1 million rows per distribution in a new partition.

Note: When creating partitions on clustered columnstore tables, it is important to consider how many rows belong to each partition. For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is needed. Before partitions are created, dedicated SQL pool already divides each table into 60 distributions.

Any partitioning added to a table is in addition to the distributions created behind the scenes. Using this example, if the sales fact table contained 36 monthly partitions, and given that a dedicated SQL pool has 60 distributions, then the sales fact table should contain 60 million rows per month, or 2.1 billion rows when all months are populated. If a table contains fewer than the recommended minimum number of rows per partition, consider using fewer partitions in order to increase the number of rows per partition.


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition



You have an Azure Databricks workspace that contains a Delta Lake dimension table named Table1.

Table1 is a Type 2 slowly changing dimension (SCD) table.
You need to apply updates from a source table to Table1.

Which Apache Spark SQL operation should you use?

  1. CREATE
  2. UPDATE
  3. ALTER
  4. MERGE

Answer(s): D

Explanation:

The Delta provides the ability to infer the schema for data input which further reduces the effort required in managing the schema changes. The Slowly Changing Data(SCD) Type 2 records all the changes made to each key in the dimensional table. These operations require updating the existing rows to mark the previous values of the keys as old and then inserting new rows as the latest values. Also, Given a source table with the updates and the target table with dimensional data, SCD Type 2 can be expressed with the merge.

Example:
// Implementing SCD Type 2 operation using merge function
customersTable
.as("customers")
.merge(
stagedUpdates.as("staged_updates"),
"customers.customerId = mergeKey")
.whenMatched("customers.current = true AND customers.address <> staged_updates.address")
.updateExpr(Map(
"current" -> "false",
"endDate" -> "staged_updates.effectiveDate"))
.whenNotMatched()
.insertExpr(Map(
"customerid" -> "staged_updates.customerId",
"address" -> "staged_updates.address",
"current" -> "true",
"effectiveDate" -> "staged_updates.effectiveDate",
"endDate" -> "null"))
.execute()
}


Reference:

https://www.projectpro.io/recipes/what-is-slowly-changing-data-scd-type-2-operation-delta-table-databricks



Viewing Page 14 of 75



Share your comments for Microsoft DP-203 exam with other users:

ethiopia 8/2/2023 2:18:00 AM

seems good..
ETHIOPIA


whoAreWeReally 12/19/2023 8:29:00 PM

took the test last week, i did have about 15 - 20 word for word from this site on the test. (only was able to cram 600 of the questions from this site so maybe more were there i didnt review) had 4 labs, bgp, lacp, vrf with tunnels and actually had to skip a lab due to time. lots of automation syntax questions.
EUROPEAN UNION


vs 9/2/2023 12:19:00 PM

no comments
Anonymous


john adenu 11/14/2023 11:02:00 AM

nice questions bring out the best in you.
Anonymous


Osman 11/21/2023 2:27:00 PM

really helpful
Anonymous


Edward 9/13/2023 5:27:00 PM

question #50 and question #81 are exactly the same questions, azure site recovery provides________for virtual machines. the first says that it is fault tolerance is the answer and second says disater recovery. from my research, it says it should be disaster recovery. can anybody explain to me why? thank you
CANADA


Monti 5/24/2023 11:14:00 PM

iam thankful for these exam dumps questions, i would not have passed without this exam dumps.
UNITED STATES


Anon 10/25/2023 10:48:00 PM

some of the answers seem to be inaccurate. q10 for example shouldnt it be an m custom column?
MALAYSIA


PeterPan 10/18/2023 10:22:00 AM

are the question real or fake?
Anonymous


CW 7/11/2023 3:19:00 PM

thank you for providing such assistance.
UNITED STATES


Mn8300 11/9/2023 8:53:00 AM

nice questions
Anonymous


Nico 4/23/2023 11:41:00 PM

my 3rd purcahse from this site. these exam dumps are helpful. very helpful.
ITALY


Chere 9/15/2023 4:21:00 AM

found it good
Anonymous


Thembelani 5/30/2023 2:47:00 AM

excellent material
Anonymous


vinesh phale 9/11/2023 2:51:00 AM

very helpfull
UNITED STATES


Bhagiii 11/4/2023 7:04:00 AM

well explained.
Anonymous


Rahul 8/8/2023 9:40:00 PM

i need the pdf, please.
CANADA


CW 7/11/2023 2:51:00 PM

a good source for exam preparation
UNITED STATES


Anchal 10/23/2023 4:01:00 PM

nice questions
INDIA


J Nunes 9/29/2023 8:19:00 AM

i need ielts general training audio guide questions
BRAZIL


Ananya 9/14/2023 5:16:00 AM

please make this content available
UNITED STATES


Swathi 6/4/2023 2:18:00 PM

content is good
Anonymous


Leo 7/29/2023 8:45:00 AM

latest dumps please
INDIA


Laolu 2/15/2023 11:04:00 PM

aside from pdf the test engine software is helpful. the interface is user-friendly and intuitive, making it easy to navigate and find the questions.
UNITED STATES


Zaynik 9/17/2023 5:36:00 AM

questions and options are correct, but the answers are wrong sometimes. so please check twice or refer some other platform for the right answer
Anonymous


Massam 6/11/2022 5:55:00 PM

90% of questions was there but i failed the exam, i marked the answers as per the guide but looks like they are not accurate , if not i would have passed the exam given that i saw about 45 of 50 questions from dump
Anonymous


Anonymous 12/27/2023 12:47:00 AM

answer to this question "what administrative safeguards should be implemented to protect the collected data while in use by manasa and her product management team? " it should be (c) for the following reasons: this administrative safeguard involves controlling access to collected data by ensuring that only individuals who need the data for their job responsibilities have access to it. this helps minimize the risk of unauthorized access and potential misuse of sensitive information. while other options such as (a) documenting data flows and (b) conducting a privacy impact assessment (pia) are important steps in data protection, implementing a "need to know" access policy directly addresses the issue of protecting data while in use by limiting access to those who require it for legitimate purposes. (d) is not directly related to safeguarding data during use; it focuses on data transfers and location.
INDIA


Japles 5/23/2023 9:46:00 PM

password lockout being the correct answer for question 37 does not make sense. it should be geofencing.
Anonymous


Faritha 8/10/2023 6:00:00 PM

for question 4, the righr answer is :recover automatically from failures
UNITED STATES


Anonymous 9/14/2023 4:27:00 AM

question number 4s answer is 3, option c. i
UNITED STATES


p das 12/7/2023 11:41:00 PM

very good questions
UNITED STATES


Anna 1/5/2024 1:12:00 AM

i am confused about the answers to the questions. are the answers correct?
KOREA REPUBLIC OF


Bhavya 9/13/2023 10:15:00 AM

very usefull
Anonymous


Rahul Kumar 8/31/2023 12:30:00 PM

need certification.
CANADA