Microsoft DP-203 Exam (page: 15)
Microsoft Data Engineering on Azure
Updated on: 12-Jan-2026

Viewing Page 15 of 75

You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use in an analytical workload.

You need to recommend a format for the transformed files. The solution must meet the following requirements:

-Contain information about the data types of each column in the files.
-Support querying a subset of columns in the files.
-Support read-heavy analytical workloads.
-Minimize the file size.

What should you recommend?

  A. JSON
  B. CSV
  C. Apache Avro
  D. Apache Parquet

Answer(s): D

Explanation:

Parquet, an open-source file format for Hadoop, stores nested data structures in a flat columnar format.

Compared to a traditional row-oriented format, the Parquet file format is more efficient in terms of both storage and query performance.

It is especially good for queries that read particular columns from a "wide" table (one with many columns), since only the needed columns are read and I/O is minimized.
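To illustrate the column-pruning benefit, a Synapse serverless SQL pool query over the transformed Parquet files might look like the following sketch; the storage URL and column names are hypothetical:

-- Only the two referenced columns are read from the Parquet files;
-- the remaining columns are skipped, which reduces I/O.
SELECT TOP 100 CustomerId, OrderTotal
FROM OPENROWSET(
    BULK 'https://contosolake.dfs.core.windows.net/data/transformed/*.parquet',
    FORMAT = 'PARQUET'
) AS src;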

Incorrect:
Not C:
The Avro format is the ideal candidate for storing data in a data lake landing zone because:

1. Data from the landing zone is usually read as a whole for further processing by downstream systems (the row-based format is more efficient in this case).

2. Downstream systems can easily retrieve table schemas from Avro files (there is no need to store the schemas separately in an external meta store).

3. Any source schema change is easily handled (schema evolution).


Reference:

https://www.clairvoyant.ai/blog/big-data-file-formats



Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.

After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.

You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.

You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.

You need to prepare the files to ensure that the data copies quickly.

Solution: You modify the files to ensure that each row is less than 1 MB.
Does this meet the goal?

  A. Yes
  B. No

Answer(s): A

Explanation:

PolyBase can only load rows that are smaller than 1 MB, so reducing every row to less than 1 MB allows the copy to use a fast PolyBase load.

Note on PolyBase load: PolyBase is a technology that accesses external data stored in Azure Blob storage or Azure Data Lake Store via the T-SQL language.

Extract, Load, and Transform (ELT)
Extract, Load, and Transform (ELT) is a process by which data is extracted from a source system, loaded into a data warehouse, and then transformed.

The basic steps for implementing a PolyBase ELT for dedicated SQL pool are:

-Extract the source data into text files.
-Land the data into Azure Blob storage or Azure Data Lake Store.
-Prepare the data for loading.
-Load the data into dedicated SQL pool staging tables using PolyBase.
-Transform the data.
-Insert the data into production tables.
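The following is a minimal sketch of the external-table and staging-load steps in a dedicated SQL pool; all object names, the storage location, and the column list are hypothetical, and a database-scoped credential may also be required for the external data source:

-- External objects that let PolyBase read the landed text files.
CREATE EXTERNAL DATA SOURCE LandingZone
WITH (TYPE = HADOOP,
      LOCATION = 'abfss://staging@contosolake.dfs.core.windows.net');

CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

CREATE EXTERNAL TABLE dbo.SalesRaw_External
(
    SaleId   INT,
    SaleDate DATE,
    Amount   DECIMAL(18, 2)
)
WITH (LOCATION = '/sales/',
      DATA_SOURCE = LandingZone,
      FILE_FORMAT = PipeDelimitedText);

-- Parallel PolyBase load into a staging table (step 4), ready for transformation.
CREATE TABLE dbo.Sales_Staging
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS SELECT * FROM dbo.SalesRaw_External;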


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-service-capacity-limits
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview



You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.
You need to create the table to meet the following requirements:

-Provide the fastest query time.
-Minimize data movement during queries.

Which type of table should you use?

  A. replicated
  B. hash distributed
  C. heap
  D. round-robin

Answer(s): A

Explanation:

A replicated table has a full copy of the table accessible on each Compute node. Replicating a table removes the need to transfer data among Compute nodes before a join or aggregation. Because multiple copies of the table are maintained, replicated tables work best when the table size is less than 2 GB compressed (2 GB is not a hard limit).
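A minimal sketch of such a dimension table in a dedicated SQL pool; the table and column names are hypothetical:

CREATE TABLE dbo.DimProduct
(
    ProductKey  INT           NOT NULL,
    ProductName NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,       -- full copy on every Compute node, so joins need no data movement
    CLUSTERED COLUMNSTORE INDEX     -- default storage, well suited to read-heavy analytics
);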


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables



You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool.

You need to create a surrogate key for the table. The solution must provide the fastest query performance.

What should you use for the surrogate key?

  A. a GUID column
  B. a sequence object
  C. an IDENTITY column

Answer(s): C

Explanation:

Use the IDENTITY property to create surrogate keys in a dedicated SQL pool in Azure Synapse Analytics.

Note: A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design data warehouse models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.
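A minimal sketch, with hypothetical table and column names; note that IDENTITY values in a dedicated SQL pool are unique but not guaranteed to be contiguous:

CREATE TABLE dbo.DimCustomer
(
    CustomerKey  INT IDENTITY(1, 1) NOT NULL,  -- surrogate key generated at load time
    CustomerId   NVARCHAR(20)       NOT NULL,  -- natural (business) key from the source system
    CustomerName NVARCHAR(100)      NULL
)
WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX);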


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity



HOTSPOT (Drag and Drop is not supported)
You have an Azure Data Lake Storage Gen2 account that contains a container named container1. You have an Azure Synapse Analytics serverless SQL pool that contains a native external table named dbo.Table1. The source data for dbo.Table1 is stored in container1. The folder structure of container1 is shown in the following exhibit.


The external data source is defined by using the following statement.


For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.

  A. See Explanation section for answer.

Answer(s): A

Explanation:




Box 1: Yes
In a serverless SQL pool, you can also use the recursive wildcard /logs/** to reference Parquet or CSV files in any subfolder beneath the referenced folder.

Box 2: Yes
Box 3: No
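For illustration, a native external table whose LOCATION uses the recursive wildcard might be declared as follows; the column list, data source name, and file format name are hypothetical stand-ins for the objects defined in the exhibit:

CREATE EXTERNAL TABLE dbo.Table1
(
    EventTime DATETIME2,
    Message   NVARCHAR(4000)
)
WITH (
    LOCATION = '/logs/**',         -- recursive wildcard: every file in every sub-folder of /logs/
    DATA_SOURCE = MyDataSource,    -- assumed to point at container1
    FILE_FORMAT = ParquetFormat    -- assumed Parquet external file format
);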


Reference:

https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables


