Databricks Certified Associate Developer for Apache Spark 3.0 Exam (page 3)
Updated on: 09-Apr-2026

Which of the following statements about RDDs is incorrect?

  1. An RDD consists of a single partition.
  2. The high-level DataFrame API is built on top of the low-level RDD API.
  3. RDDs are immutable.
  4. RDD stands for Resilient Distributed Dataset.
  5. RDDs are great for precisely instructing Spark on how to do a query.

Answer(s): A

Explanation:

An RDD consists of a single partition.
Quite the opposite: Spark partitions RDDs and distributes the partitions across multiple nodes.



Which of the elements that are labeled with a circle and a number contain an error or are misrepresented?

  1. 1, 10
  2. 1, 8
  3. 10
  4. 7, 9, 10
  5. 1, 4, 6, 9

Answer(s): B

Explanation:

1: Correct – This should just read "API" or "DataFrame API". The DataFrame is not part of the SQL API. To make a DataFrame accessible via SQL, you first need to create a DataFrame view. That view can then be accessed via SQL.
4: Although "K_38_INU" looks odd, it is a completely valid name for a DataFrame column.
6: No, StringType is a correct type.
7: Although a StringType may not be the most efficient way to store a phone number, there is nothing fundamentally wrong with using this type here.
8: Correct – TreeType is not a type that Spark supports.
9: No, Spark DataFrames support ArrayType variables. In this case, the variable would represent a sequence of elements with type LongType, which is also a valid type for Spark DataFrames.
10: There is nothing wrong with this row.
More info: Data Types - Spark 3.1.1 Documentation (https://bit.ly/3aAPKJT)



Which of the following describes characteristics of the Spark UI?

  1. Via the Spark UI, workloads can be manually distributed across executors.
  2. Via the Spark UI, stage execution speed can be modified.
  3. The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.
  4. There is a place in the Spark UI that shows the property spark.executor.memory.
  5. Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Answer(s): D

Explanation:

There is a place in the Spark UI that shows the property spark.executor.memory.
Correct – you can see Spark properties such as spark.executor.memory in the Environment tab.
Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.
Wrong – Jobs, Stages, Storage, Executors, and SQL are all tabs in the Spark UI. DAGs can be inspected in the "Jobs" tab in the job details or in the Stages or SQL tabs, but there is no separate DAGs tab.
Via the Spark UI, workloads can be manually distributed across executors.
No, the Spark UI is meant for inspecting the inner workings of Spark, which ultimately helps you understand, debug, and optimize Spark jobs.
Via the Spark UI, stage execution speed can be modified.
No, see above.
The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.
No, there is no Scheduler tab.



Which of the following statements about broadcast variables is correct?

  1. Broadcast variables are serialized with every single task.
  2. Broadcast variables are commonly used for tables that do not fit into memory.
  3. Broadcast variables are immutable.
  4. Broadcast variables are occasionally dynamically updated on a per-task basis.
  5. Broadcast variables are local to the worker node and not shared across the cluster.

Answer(s): C

Explanation:

Broadcast variables are local to the worker node and not shared across the cluster.
This is wrong because broadcast variables are meant to be shared across the cluster. As such, they are never just local to the worker node, but available to all worker nodes.
Broadcast variables are commonly used for tables that do not fit into memory.
This is wrong because broadcast variables can only be broadcast because they are small and do fit into memory.
Broadcast variables are serialized with every single task.
This is wrong because they are cached on every machine in the cluster, precisely so that they do not have to be serialized with every single task.
Broadcast variables are occasionally dynamically updated on a per-task basis.
This is wrong because broadcast variables are immutable – they are never updated.
More info: Spark – The Definitive Guide, Chapter 14



Which of the following is a viable way to improve Spark's performance when dealing with large amounts of data, given that there is only a single application running on the cluster?

  1. Increase values for the properties spark.default.parallelism and spark.sql.shuffle.partitions
  2. Decrease values for the properties spark.default.parallelism and spark.sql.partitions
  3. Increase values for the properties spark.sql.parallelism and spark.sql.partitions
  4. Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions
  5. Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions

Answer(s): A

Explanation:

Decrease values for the properties spark.default.parallelism and spark.sql.partitions
No, these values need to be increased.
Increase values for the properties spark.sql.parallelism and spark.sql.partitions
Wrong, there is no property spark.sql.parallelism.
Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions
See above.
Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions
The property spark.dynamicAllocation.maxExecutors is only in effect if dynamic allocation is enabled via the spark.dynamicAllocation.enabled property, which is disabled by default. Dynamic allocation can be useful when running multiple applications on the same cluster in parallel. However, in this case there is only a single application running on the cluster, so enabling dynamic allocation would not yield a performance benefit.
More info: Practical Spark Tips For Data Scientists | Experfy.com and Basics of Apache Spark Configuration Settings | by Halil Ertan | Towards Data Science (https://bit.ly/3gA0A6w ,
https://bit.ly/2QxhNTr)
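In practice, the two properties from the correct answer would typically be raised at submission time. A sketch of a spark-submit invocation, assuming an application file my_app.py; the value 200 is illustrative, not a tuned recommendation:

```shell
# Minimal sketch: raising parallelism for a single large application.
spark-submit \
  --conf spark.default.parallelism=200 \
  --conf spark.sql.shuffle.partitions=200 \
  my_app.py
```

A common rule of thumb is to size these values to a small multiple of the total number of executor cores in the cluster.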



