Microsoft DP-203 Exam (page: 15)
Microsoft Data Engineering on Azure
Updated on: 06-Aug-2025


You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use in an analytical workload.

You need to recommend a format for the transformed files. The solution must meet the following requirements:

- Contain information about the data types of each column in the files.
- Support querying a subset of columns in the files.
- Support read-heavy analytical workloads.
- Minimize the file size.

What should you recommend?

  A. JSON
  B. CSV
  C. Apache Avro
  D. Apache Parquet

Answer(s): D

Explanation:

Parquet is an open-source columnar file format that originated in the Hadoop ecosystem and stores nested data structures in a flat columnar layout. The file metadata includes the schema, so the data type of each column travels with the file.

Compared with a traditional row-oriented format, Parquet is more efficient in both storage and performance: columnar data compresses well, which minimizes file size.

It is especially good for queries that read a subset of columns from a “wide” (many-column) table, since only the needed columns are read and I/O is minimized.
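
As a minimal sketch of this column pruning, the following serverless SQL pool query reads only two columns from a set of Parquet files, so only those column chunks are scanned. The storage account, container, path, and column names are hypothetical:

-- Read a subset of columns from Parquet files in a serverless SQL pool.
-- Account, container, path, and column names are illustrative only.
SELECT product_id, sale_amount
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/container1/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;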

Incorrect:
Not C:
The Avro format is the ideal candidate for storing data in a data lake landing zone because:

1. Data from the landing zone is usually read as a whole for further processing by downstream systems (the row-based format is more efficient in this case).

2. Downstream systems can easily retrieve table schemas from Avro files (there is no need to store the schemas separately in an external meta store).

3. Any source schema change is easily handled (schema evolution).


Reference:

https://www.clairvoyant.ai/blog/big-data-file-formats



Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.

You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.

You need to prepare the files to ensure that the data copies quickly.

Solution: You modify the files to ensure that each row is less than 1 MB.
Does this meet the goal?

  A. Yes
  B. No

Answer(s): A

Explanation:

PolyBase can only load rows that are smaller than 1 MB, so modifying the files to keep every row under 1 MB makes the data eligible for a fast PolyBase load.

Note on PolyBase load: PolyBase is a technology that accesses external data stored in Azure Blob storage or Azure Data Lake Store via the T-SQL language.

Extract, Load, and Transform (ELT)
Extract, Load, and Transform (ELT) is a process by which data is extracted from a source system, loaded into a data warehouse, and then transformed.

The basic steps for implementing a PolyBase ELT for dedicated SQL pool are:

- Extract the source data into text files.
- Land the data into Azure Blob storage or Azure Data Lake Store.
- Prepare the data for loading.
- Load the data into dedicated SQL pool staging tables by using PolyBase (sketched below).
- Transform the data.
- Insert the data into production tables.
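
A minimal sketch of the PolyBase load step follows. All object names, paths, and formats are hypothetical, and a database scoped credential may be required depending on how access to the storage account is configured:

-- 1. Point at the landed text files (names and locations are illustrative).
CREATE EXTERNAL DATA SOURCE LandingZone
WITH (
    TYPE = HADOOP,
    LOCATION = 'abfss://container1@mystorageaccount.dfs.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
);

-- 2. Define an external table over the landed files.
CREATE EXTERNAL TABLE dbo.SalesStage_ext (
    SaleId int,
    SaleDescription varchar(8000)   -- each row must stay under 1 MB
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = LandingZone,
    FILE_FORMAT = PipeDelimitedText
);

-- 3. Load into a staging table with CTAS; transform and insert into
--    production tables afterward.
CREATE TABLE dbo.SalesStaging
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS
SELECT * FROM dbo.SalesStage_ext;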


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-service-capacity-limits
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview



You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.
You need to create the table to meet the following requirements:

- Provide the fastest query time.
- Minimize data movement during queries.

Which type of table should you use?

  A. replicated
  B. hash distributed
  C. heap
  D. round-robin

Answer(s): A

Explanation:

A replicated table has a full copy of the table accessible on each Compute node. Replicating a table removes the need to transfer data among Compute nodes before a join or aggregation, which minimizes data movement and yields the fastest query time for small dimension tables. Because every node stores a full copy, replicated tables work best when the table is less than 2 GB compressed (2 GB is not a hard limit).
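
A minimal sketch of such a table in a dedicated SQL pool (table and column names are hypothetical):

-- Replicated dimension table: a full copy on every Compute node,
-- so joins against it require no data movement.
CREATE TABLE dbo.DimProduct (
    ProductKey int NOT NULL,
    ProductName nvarchar(100) NOT NULL
)
WITH (
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);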


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables



You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool.

You need to create a surrogate key for the table. The solution must provide the fastest query performance.

What should you use for the surrogate key?

  A. a GUID column
  B. a sequence object
  C. an IDENTITY column

Answer(s): C

Explanation:

Use the IDENTITY property to create surrogate keys in a dedicated SQL pool in Azure Synapse Analytics.

Note: A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design data warehouse models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.
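
A minimal sketch, with hypothetical table and column names:

-- Surrogate key generated by the IDENTITY property.
-- Note: in a dedicated SQL pool, IDENTITY values are unique but not
-- guaranteed to be contiguous, because rows are spread across
-- distributions; the IDENTITY column also can't serve as the
-- hash distribution column.
CREATE TABLE dbo.DimCustomer (
    CustomerKey int IDENTITY(1, 1) NOT NULL,
    CustomerName nvarchar(100) NOT NULL
)
WITH (
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
);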


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity



HOTSPOT (Drag and Drop is not supported)
You have an Azure Data Lake Storage Gen2 account that contains a container named container1. You have an Azure Synapse Analytics serverless SQL pool that contains a native external table named dbo.Table1. The source data for dbo.Table1 is stored in container1. The folder structure of container1 is shown in the following exhibit.

[Exhibit not reproduced.]
The external data source is defined by using the following statement.

[Exhibit not reproduced.]
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.

  A. See Explanation section for answer.

Answer(s): A

Explanation:

Box 1: Yes
In a serverless SQL pool you can also use recursive wildcards such as /logs/** to reference Parquet or CSV files in any subfolder beneath the referenced folder.

Box 2: Yes
Box 3: No
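
As an illustration of the recursive wildcard, a native external table over every Parquet file in a folder tree might be declared as follows. The data source, file format, table, and column names are hypothetical and assume the data source and file format already exist:

-- Assumes an existing external data source and Parquet file format.
CREATE EXTERNAL FILE FORMAT ParquetFileFormat
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.LogEvents (
    EventDate date,
    EventMessage varchar(1000)
)
WITH (
    LOCATION = 'logs/**',          -- recursive: includes all subfolders
    DATA_SOURCE = MyDataSource,
    FILE_FORMAT = ParquetFileFormat
);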


Reference:

https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables


