Databricks Databricks-Certified-Data-Analyst-Associate Exam (page: 2)
Databricks Certified Data Analyst Associate
Updated on: 09-Apr-2026

A data analyst has set up a SQL query to run every four hours on a SQL endpoint, but the SQL endpoint is taking too long to start up with each run.

Which of the following changes can the data analyst make to reduce the start-up time for the endpoint while managing costs?

  1. Reduce the SQL endpoint cluster size
  2. Increase the SQL endpoint cluster size
  3. Turn off the Auto stop feature
  4. Increase the minimum scaling value
  5. Use a Serverless SQL endpoint

Answer(s): E

Explanation:

A Serverless SQL endpoint is a type of SQL endpoint that does not require a dedicated cluster to run queries. Instead, it uses a shared pool of resources that can scale up and down automatically based on the demand. This means that a Serverless SQL endpoint can start up much faster than a SQL endpoint that uses a cluster, and it can also save costs by only paying for the resources that are used. A Serverless SQL endpoint is suitable for ad-hoc queries and exploratory analysis, but it may not offer the same level of performance and isolation as a SQL endpoint that uses a cluster. Therefore, a data analyst should consider the trade-offs between speed, cost, and quality when choosing between a Serverless SQL endpoint and a SQL endpoint that uses a cluster.


Reference:

Databricks SQL endpoints, Serverless SQL endpoints, SQL endpoint clusters



A data engineering team has created a Structured Streaming pipeline that processes data in micro- batches and populates gold-level tables. The microbatches are triggered every minute.

A data analyst has created a dashboard based on this gold-level data. The project stakeholders want to see the results in the dashboard updated within one minute or less of new data becoming available within the gold-level tables.

Which of the following cautions should the data analyst share prior to setting up the dashboard to complete this task?

  1. The required compute resources could be costly
  2. The gold-level tables are not appropriately clean for business reporting
  3. The streaming data is not an appropriate data source for a dashboard
  4. The streaming cluster is not fault tolerant
  5. The dashboard cannot be refreshed that quickly

Answer(s): A

Explanation:

A Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables every minute requires a high level of compute resources to handle the frequent data ingestion, processing, and writing. This could result in a significant cost for the organization, especially if the data volume and velocity are large. Therefore, the data analyst should share this caution with the project stakeholders before setting up the dashboard and evaluate the trade-offs between the desired refresh rate and the available budget. The other options are not valid cautions because:

B) The gold-level tables are assumed to be appropriately clean for business reporting, as they are the final output of the data engineering pipeline. If the data quality is not satisfactory, the issue should be addressed at the source or silver level, not at the gold level.

C) The streaming data is an appropriate data source for a dashboard, as it can provide near real-time insights and analytics for the business users. Structured Streaming supports various sources and sinks for streaming data, including Delta Lake, which can enable both batch and streaming queries on the same data.

D) The streaming cluster is fault tolerant, as Structured Streaming provides end-to-end exactly-once fault-tolerance guarantees through checkpointing and write-ahead logs. If a query fails, it can be restarted from the last checkpoint and resume processing.

E) The dashboard can be refreshed within one minute or less of new data becoming available in the gold-level tables, as Structured Streaming can trigger micro-batches as fast as possible (every few seconds) and update the results incrementally. However, this may not be necessary or optimal for the business use case, as it could cause frequent changes in the dashboard and consume more resources.


Reference:

Streaming on Databricks, Monitoring Structured Streaming queries on Databricks, A look at the new Structured Streaming UI in Apache Spark 3.0, Run your first Structured Streaming workload



Which of the following approaches can be used to ingest data directly from cloud-based object storage?

  1. Create an external table while specifying the DBFS storage path to FROM
  2. Create an external table while specifying the DBFS storage path to PATH
  3. It is not possible to directly ingest data from cloud-based object storage
  4. Create an external table while specifying the object storage path to FROM
  5. Create an external table while specifying the object storage path to LOCATION

Answer(s): E

Explanation:

External tables are tables that are defined in the Databricks metastore using the information stored in a cloud object storage location. External tables do not manage the data, but provide a schema and a table name to query the data. To create an external table, you can use the CREATE EXTERNAL TABLE statement and specify the object storage path to the LOCATION clause. For example, to create an external table named ext_table on a Parquet file stored in S3, you can use the following statement:

SQL

CREATE EXTERNAL TABLE ext_table (

col1 INT,

col2 STRING

)

STORED AS PARQUET

LOCATION 's3://bucket/path/file.parquet'

AI-generated code. Review and use carefully. More info on FAQ.


Reference:

External tables



A data analyst wants to create a dashboard with three main sections: Development, Testing, and Production. They want all three sections on the same dashboard, but they want to clearly designate the sections using text on the dashboard.

Which of the following tools can the data analyst use to designate the Development, Testing, and Production sections using text?

  1. Separate endpoints for each section
  2. Separate queries for each section
  3. Markdown-based text boxes
  4. Direct text written into the dashboard in editing mode
  5. Separate color palettes for each section

Answer(s): C

Explanation:

Markdown-based text boxes are useful as labels on a dashboard. They allow the data analyst to add text to a dashboard using the %md magic command in a notebook cell and then select the dashboard icon in the cell actions menu. The text can be formatted using markdown syntax and can include headings, lists, links, images, and more. The text boxes can be resized and moved around on the dashboard using the float layout option.


Reference:

Dashboards in notebooks, How to add text to a dashboard in Databricks



A data analyst needs to use the Databricks Lakehouse Platform to quickly create SQL queries and data visualizations. It is a requirement that the compute resources in the platform can be made serverless, and it is expected that data visualizations can be placed within a dashboard.

Which of the following Databricks Lakehouse Platform services/capabilities meets all of these requirements?

  1. Delta Lake
  2. Databricks Notebooks
  3. Tableau
  4. Databricks Machine Learning
  5. Databricks SQL

Answer(s): E

Explanation:

Databricks SQL is a serverless data warehouse on the Lakehouse that lets you run all of your SQL and BI applications at scale with your tools of choice, all at a fraction of the cost of traditional cloud data warehouses1. Databricks SQL allows you to create SQL queries and data visualizations using the SQL Analytics UI or the Databricks SQL CLI2. You can also place your data visualizations within a dashboard and share it with other users in your organization3. Databricks SQL is powered by Delta Lake, which provides reliability, performance, and governance for your data lake4.


Reference:

Databricks SQL

Query data using SQL Analytics

Visualizations in Databricks notebooks

Delta Lake



Viewing Page 2 of 10



Share your comments for Databricks Databricks-Certified-Data-Analyst-Associate exam with other users:

Corey 12/29/2023 5:06:00 PM

question 35 is incorrect, the correct answer is c, it even states so: explanation: when a vm is infected with ransomware, you should not restore the vm to the infected vm. this is because the ransomware will still be present on the vm, and it will encrypt the files again. you should also not restore the vm to any vm within the companys subscription. this is because the ransomware could spread to other vms in the subscription. the best way to restore a vm that is infected with ransomware is to restore it to a new azure vm. this will ensure that the ransomware is not present on the new vm.
Anonymous


Rajender 10/18/2023 3:54:00 AM

i would like to take psm1 exam.
Anonymous


Blessious Phiri 8/14/2023 9:53:00 AM

cbd and pdb are key to the database
SOUTH AFRICA


Alkaed 10/19/2022 10:41:00 AM

the purchase and download process is very much streamlined. the xengine application is very nice and user-friendly but there is always room for improvement.
NETHERLANDS


Dave Gregen 9/4/2023 3:17:00 PM

please upload p_sapea_2023
SWEDEN


Sarah 6/13/2023 1:42:00 PM

anyone use this? the question dont seem to follow other formats and terminology i have been studying im getting worried
CANADA


Shuv 10/3/2023 8:19:00 AM

good questions
UNITED STATES


Reb974 8/5/2023 1:44:00 AM

hello are these questions valid for ms-102
CANADA


Mchal 7/20/2023 3:38:00 AM

some questions are wrongly answered but its good nonetheless
POLAND


Sonbir 8/8/2023 1:04:00 PM

how to get system serial number using intune
Anonymous


Manju 10/19/2023 1:19:00 PM

is it really helpful to pass the exam
Anonymous


LeAnne Hair 8/24/2023 12:47:00 PM

#229 in incorrect - all the customers require an annual review
UNITED STATES


Abdul SK 9/28/2023 11:42:00 PM

kindy upload
Anonymous


Aderonke 10/23/2023 12:53:00 PM

fantastic assessment on psm 1
UNITED KINGDOM


SAJI 7/20/2023 2:51:00 AM

56 question correct answer a,b
Anonymous


Raj Kumar 10/23/2023 8:52:00 PM

thank you for providing the q bank
CANADA


piyush keshari 7/7/2023 9:46:00 PM

true quesstions
Anonymous


B.A.J 11/6/2023 7:01:00 AM

i can“t believe ms asks things like this, seems to be only marketing material.
Anonymous


Guss 5/23/2023 12:28:00 PM

hi, could you please add the last update of ns0-527
Anonymous


Rond65 8/22/2023 4:39:00 PM

question #3 refers to vnet4 and vnet5. however, there is no vnet5 listed in the case study (testlet 2).
UNITED STATES


Cheers 12/13/2023 9:55:00 AM

sometimes it may be good some times it may be
GERMANY


Sumita Bose 7/21/2023 1:01:00 AM

qs 4 answer seems wrong- please check
AUSTRALIA


Amit 9/7/2023 12:53:00 AM

very detailed explanation !
HONG KONG


FisherGirl 5/16/2022 10:36:00 PM

the interactive nature of the test engine application makes the preparation process less boring.
NETHERLANDS


Chiranthaka 9/20/2023 11:15:00 AM

very useful.
Anonymous


SK 7/15/2023 3:51:00 AM

complete question dump should be made available for practice.
Anonymous


Gamerrr420 5/25/2022 9:38:00 PM

i just passed my first exam. i got 2 exam dumps as part of the 50% sale. my second exam is under work. once i write that exam i report my result. but so far i am confident.
AUSTRALIA


Kudu hgeur 9/21/2023 5:58:00 PM

nice create dewey stefen
CZECH REPUBLIC


Anorag 9/6/2023 9:24:00 AM

i just wrote this exam and it is still valid. the questions are exactly the same but there are about 4 or 5 questions that are answered incorrectly. so watch out for those. best of luck with your exam.
CANADA


Nathan 1/10/2023 3:54:00 PM

passed my exam today. this is a good start to 2023.
UNITED STATES


1 10/28/2023 7:32:00 AM

great sharing
Anonymous


Anand 1/20/2024 10:36:00 AM

very helpful
UNITED STATES


Kumar 6/23/2023 1:07:00 PM

thanks.. very helpful
FRANCE


User random 11/15/2023 3:01:00 AM

i registered for 1z0-1047-23 but dumps qre available for 1z0-1047-22. help me with this...
UNITED STATES