Microsoft DP-203 Exam (page: 15)
Microsoft Data Engineering on Azure
Updated on: 28-Feb-2026

Viewing Page 15 of 75

You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use in an analytical workload.

You need to recommend a format for the transformed files. The solution must meet the following requirements:

-Contain information about the data types of each column in the files.
-Support querying a subset of columns in the files.
-Support read-heavy analytical workloads.
-Minimize the file size.

What should you recommend?

  1. JSON
  2. CSV
  3. Apache Avro
  4. Apache Parquet

Answer(s): D

Explanation:

Parquet, an open-source file format for Hadoop, stores nested data structures in a flat columnar format.

Compared to a traditional approach where data is stored in a row-oriented approach, Parquet file format is more efficient in terms of storage and performance.

It is especially good for queries that read particular columns from a “wide” (with many columns) table since only needed columns are read, and IO is minimized.

Incorrect:
Not C:
The Avro format is the ideal candidate for storing data in a data lake landing zone because:

1. Data from the landing zone is usually read as a whole for further processing by downstream systems (the row-based format is more efficient in this case).

2. Downstream systems can easily retrieve table schemas from Avro files (there is no need to store the schemas separately in an external meta store).

3. Any source schema change is easily handled (schema evolution).


Reference:

https://www.clairvoyant.ai/blog/big-data-file-formats



Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.

You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.

You need to prepare the files to ensure that the data copies quickly.

Solution: You modify the files to ensure that each row is less than 1 MB.
Does this meet the goal?

  1. Yes
  2. No

Answer(s): A

Explanation:

Polybase loads rows that are smaller than 1 MB.

Note on Polybase Load: PolyBase is a technology that accesses external data stored in Azure Blob storage or Azure Data Lake Store via the T-SQL language.

Extract, Load, and Transform (ELT)
Extract, Load, and Transform (ELT) is a process by which data is extracted from a source system, loaded into a data warehouse, and then transformed.

The basic steps for implementing a PolyBase ELT for dedicated SQL pool are:

-Extract the source data into text files.
-Land the data into Azure Blob storage or Azure Data Lake Store.
-Prepare the data for loading.
-Load the data into dedicated SQL pool staging tables using PolyBase.
-Transform the data.
-Insert the data into production tables.


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-service-capacity-limits
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview



You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.
You need to create the table to meet the following requirements:

-Provide the fastest query time.
-Minimize data movement during queries.

Which type of table should you use?

  1. replicated
  2. hash distributed
  3. heap
  4. round-robin

Answer(s): A

Explanation:

A replicated table has a full copy of the table accessible on each Compute node. Replicating a table removes the need to transfer data among Compute nodes before a join or aggregation. Since the table has multiple copies, replicated tables work best when the table size is less than 2 GB compressed. 2 GB is not a hard limit.


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables



You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool.

You need to create a surrogate key for the table. The solution must provide the fastest query performance.

What should you use for the surrogate key?

  1. a GUID column
  2. a sequence object
  3. an IDENTITY column

Answer(s): C

Explanation:

Use IDENTITY to create surrogate keys using dedicated SQL pool in AzureSynapse Analytics.

Note: A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design data warehouse models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity



HOTSPOT (Drag and Drop is not supported)
You have an Azure Data Lake Storage Gen2 account that contains a container named container1. You have an Azure Synapse Analytics serverless SQL pool that contains a native external table named dbo.Table1. The source data for dbo.Table1 is stored in container1. The folder structure of container1 is shown in the following exhibit.


The external data source is defined by using the following statement.


For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.

  1. See Explanation section for answer.

Answer(s): A

Explanation:




Box 1: Yes
In the serverless SQL pool you can also use recursive wildcards /logs/** to reference Parquet or CSV files in any sub-folder beneath the referenced folder.

Box 2: Yes
Box 3: No


Reference:

https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables



Viewing Page 15 of 75



Share your comments for Microsoft DP-203 exam with other users:

Sarah 6/13/2023 1:42:00 PM

anyone use this? the question dont seem to follow other formats and terminology i have been studying im getting worried
CANADA


Shuv 10/3/2023 8:19:00 AM

good questions
UNITED STATES


Reb974 8/5/2023 1:44:00 AM

hello are these questions valid for ms-102
CANADA


Mchal 7/20/2023 3:38:00 AM

some questions are wrongly answered but its good nonetheless
POLAND


Sonbir 8/8/2023 1:04:00 PM

how to get system serial number using intune
Anonymous


Manju 10/19/2023 1:19:00 PM

is it really helpful to pass the exam
Anonymous


LeAnne Hair 8/24/2023 12:47:00 PM

#229 in incorrect - all the customers require an annual review
UNITED STATES


Abdul SK 9/28/2023 11:42:00 PM

kindy upload
Anonymous


Aderonke 10/23/2023 12:53:00 PM

fantastic assessment on psm 1
UNITED KINGDOM


SAJI 7/20/2023 2:51:00 AM

56 question correct answer a,b
Anonymous


Raj Kumar 10/23/2023 8:52:00 PM

thank you for providing the q bank
CANADA


piyush keshari 7/7/2023 9:46:00 PM

true quesstions
Anonymous


B.A.J 11/6/2023 7:01:00 AM

i can´t believe ms asks things like this, seems to be only marketing material.
Anonymous


Guss 5/23/2023 12:28:00 PM

hi, could you please add the last update of ns0-527
Anonymous


Rond65 8/22/2023 4:39:00 PM

question #3 refers to vnet4 and vnet5. however, there is no vnet5 listed in the case study (testlet 2).
UNITED STATES


Cheers 12/13/2023 9:55:00 AM

sometimes it may be good some times it may be
GERMANY


Sumita Bose 7/21/2023 1:01:00 AM

qs 4 answer seems wrong- please check
AUSTRALIA


Amit 9/7/2023 12:53:00 AM

very detailed explanation !
HONG KONG


FisherGirl 5/16/2022 10:36:00 PM

the interactive nature of the test engine application makes the preparation process less boring.
NETHERLANDS


Chiranthaka 9/20/2023 11:15:00 AM

very useful.
Anonymous


SK 7/15/2023 3:51:00 AM

complete question dump should be made available for practice.
Anonymous


Gamerrr420 5/25/2022 9:38:00 PM

i just passed my first exam. i got 2 exam dumps as part of the 50% sale. my second exam is under work. once i write that exam i report my result. but so far i am confident.
AUSTRALIA


Kudu hgeur 9/21/2023 5:58:00 PM

nice create dewey stefen
CZECH REPUBLIC


Anorag 9/6/2023 9:24:00 AM

i just wrote this exam and it is still valid. the questions are exactly the same but there are about 4 or 5 questions that are answered incorrectly. so watch out for those. best of luck with your exam.
CANADA


Nathan 1/10/2023 3:54:00 PM

passed my exam today. this is a good start to 2023.
UNITED STATES


1 10/28/2023 7:32:00 AM

great sharing
Anonymous


Anand 1/20/2024 10:36:00 AM

very helpful
UNITED STATES


Kumar 6/23/2023 1:07:00 PM

thanks.. very helpful
FRANCE


User random 11/15/2023 3:01:00 AM

i registered for 1z0-1047-23 but dumps qre available for 1z0-1047-22. help me with this...
UNITED STATES


kk 1/17/2024 3:00:00 PM

very helpful
UNITED STATES


Raj 7/24/2023 10:20:00 AM

please upload oracle 1z0-1110-22 exam pdf
INDIA


Blessious Phiri 8/13/2023 11:58:00 AM

becoming interesting on the logical part of the cdbs and pdbs
Anonymous


LOL what a joke 9/10/2023 9:09:00 AM

some of the answers are incorrect, i would be wary of using this until an admin goes back and reviews all the answers
UNITED STATES


Muhammad Rawish Siddiqui 12/9/2023 7:40:00 AM

question # 267: federated operating model is also correct.
SAUDI ARABIA