You are designing an Azure Data Lake Storage solution that will transform raw JSON files for use in an analytical workload.
You need to recommend a format for the transformed files. The solution must meet the following requirements:
- Contain information about the data types of each column in the files.
- Support querying a subset of columns in the files.
- Support read-heavy analytical workloads.
- Minimize the file size.
What should you recommend?
Answer(s): D
Parquet, an open-source file format for Hadoop, stores nested data structures in a flat columnar format.
Compared to a traditional approach where data is stored row by row, the Parquet file format is more efficient in terms of storage and performance.
It is especially good for queries that read particular columns from a "wide" table (one with many columns), since only the needed columns are read and IO is minimized.
Incorrect:
Not C: The Avro format is the ideal candidate for storing data in a data lake landing zone because:
1. Data from the landing zone is usually read as a whole for further processing by downstream systems (the row-based format is more efficient in this case).
2. Downstream systems can easily retrieve table schemas from Avro files (there is no need to store the schemas separately in an external meta store).
3. Any source schema change is easily handled (schema evolution).
https://www.clairvoyant.ai/blog/big-data-file-formats
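The difference between a row-oriented and a columnar layout can be illustrated with a small toy sketch. This is not Parquet itself and does not use any Parquet library; the sample records and the helper names `to_columnar` and `read_columns` are hypothetical, chosen only to show why a columnar layout can carry type information per column and serve a subset of columns without touching the rest.

```python
# Hypothetical raw JSON-like records (row-oriented): each record holds every field.
raw_rows = [
    {"id": 1, "name": "a", "amount": 10.5},
    {"id": 2, "name": "b", "amount": 20.0},
    {"id": 3, "name": "c", "amount": 7.25},
]


def to_columnar(rows):
    """Pivot rows into a schema plus one list of values per column.

    Assumes every row has the same keys as the first row.
    """
    # Record a type name per column, taken from the first row.
    schema = {key: type(value).__name__ for key, value in rows[0].items()}
    # Store each column's values contiguously.
    columns = {key: [row[key] for row in rows] for key in schema}
    return schema, columns


def read_columns(columns, wanted):
    """Return only the requested columns; the others are never read."""
    return {name: columns[name] for name in wanted}


schema, columns = to_columnar(raw_rows)
subset = read_columns(columns, ["id", "amount"])
print(schema)
print(subset)
```

In a real columnar file format the per-column storage also makes compression more effective, because values of the same type sit next to each other; the sketch above only models the layout, not the encoding.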
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is less than 1 MB.
Does this meet the goal?
Answer(s): A
PolyBase loads rows that are smaller than 1 MB.
Note on PolyBase Load: PolyBase is a technology that accesses external data stored in Azure Blob storage or Azure Data Lake Store via the T-SQL language.
Extract, Load, and Transform (ELT)
Extract, Load, and Transform (ELT) is a process by which data is extracted from a source system, loaded into a data warehouse, and then transformed.
The basic steps for implementing a PolyBase ELT for dedicated SQL pool are:
- Extract the source data into text files.
- Land the data into Azure Blob storage or Azure Data Lake Store.
- Prepare the data for loading.
- Load the data into dedicated SQL pool staging tables using PolyBase.
- Transform the data.
- Insert the data into production tables.
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-service-capacity-limits
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/load-data-overview
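As part of preparing the data for loading, the row-size condition from the explanation above could be checked before the copy with a short script. This is a minimal sketch, not part of any Azure tooling: the helper name `oversized_rows` and the sample rows are hypothetical, and the code assumes that 1 MB means 1,048,576 bytes and that row size is measured on the UTF-8 encoded text.

```python
# Assumption: the 1 MB row limit is treated as 1,048,576 bytes.
MAX_ROW_BYTES = 1024 * 1024


def oversized_rows(lines, limit=MAX_ROW_BYTES, encoding="utf-8"):
    """Return the 1-based numbers of rows whose encoded size is not below the limit."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        if len(line.encode(encoding)) >= limit
    ]


# Hypothetical sample: a header, a small row, and one row above the limit.
sample = [
    "id,description",
    "1," + "x" * 10,
    "2," + "y" * (MAX_ROW_BYTES + 100),
]
too_big = oversized_rows(sample)
print(too_big)
```

Rows reported by such a check would need to be shortened or split before the load so that every row stays under the limit.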
You plan to create a dimension table in Azure Synapse Analytics that will be less than 1 GB.
You need to create the table to meet the following requirements:
- Provide the fastest query time.
- Minimize data movement during queries.
Which type of table should you use?
A replicated table has a full copy of the table accessible on each Compute node. Replicating a table removes the need to transfer data among Compute nodes before a join or aggregation. Since the table has multiple copies, replicated tables work best when the table size is less than 2 GB compressed. 2 GB is not a hard limit.
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables
You are designing a dimension table in an Azure Synapse Analytics dedicated SQL pool.
You need to create a surrogate key for the table. The solution must provide the fastest query performance.
What should you use for the surrogate key?
Answer(s): C
Use IDENTITY to create surrogate keys in a dedicated SQL pool in Azure Synapse Analytics.
Note: A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design data warehouse models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity
HOTSPOT (Drag and Drop is not supported)
You have an Azure Data Lake Storage Gen2 account that contains a container named container1. You have an Azure Synapse Analytics serverless SQL pool that contains a native external table named dbo.Table1. The source data for dbo.Table1 is stored in container1. The folder structure of container1 is shown in the following exhibit.
The external data source is defined by using the following statement.
For each of the following statements, select Yes if the statement is true. Otherwise, select No.
NOTE: Each correct selection is worth one point.
Box 1: Yes
In the serverless SQL pool you can also use recursive wildcards /logs/** to reference Parquet or CSV files in any sub-folder beneath the referenced folder.
Box 2: Yes
Box 3: No
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables