Databricks Certified Associate Developer for Apache Spark 3.0 Exam (Page 4 of 37)
Updated on: 02-Jan-2026

Which of the following describes a shuffle?

  1. A shuffle is a process that is executed during a broadcast hash join.
  2. A shuffle is a process that compares data across executors.
  3. A shuffle is a process that compares data across partitions.
  4. A shuffle is a Spark operation that results from DataFrame.coalesce().
  5. A shuffle is a process that allocates partitions to executors.

Answer(s): C

Explanation:

A shuffle is a Spark operation that results from DataFrame.coalesce().
No. DataFrame.coalesce() does not result in a shuffle, since it only merges existing partitions without redistributing data.
A shuffle is a process that allocates partitions to executors.
No. A shuffle redistributes data across partitions; it is not the mechanism that assigns partitions to executors.
A shuffle is a process that is executed during a broadcast hash join.
No. Broadcast hash joins avoid shuffles and yield performance benefits if at least one of the two tables is small (<= 10 MB by default). Broadcast hash joins can avoid shuffles because, instead of exchanging partitions between executors, they broadcast the small table to all executors, which then perform the rest of the join operation locally.
A shuffle is a process that compares data across executors.
No. In a shuffle, data is compared across partitions, not executors.
More info: Spark Repartition & Coalesce - Explained (https://bit.ly/32KF7zS)



Which of the following describes Spark's Adaptive Query Execution?

  1. Adaptive Query Execution features include dynamically coalescing shuffle partitions, dynamically injecting scan filters, and dynamically optimizing skew joins.
  2. Adaptive Query Execution is enabled in Spark by default.
  3. Adaptive Query Execution reoptimizes queries at execution points.
  4. Adaptive Query Execution features are dynamically switching join strategies and dynamically optimizing skew joins.
  5. Adaptive Query Execution applies to all kinds of queries.

Answer(s): D

Explanation:

Adaptive Query Execution features include dynamically coalescing shuffle partitions, dynamically injecting scan filters, and dynamically optimizing skew joins.
This is almost correct. All of these features, except for dynamically injecting scan filters, are part of Adaptive Query Execution. Dynamically injecting scan filters for join operations to limit the amount of data to be considered in a query is part of Dynamic Partition Pruning and not of Adaptive Query Execution.
Adaptive Query Execution reoptimizes queries at execution points.
No, Adaptive Query Execution reoptimizes queries at materialization points.
Adaptive Query Execution is enabled in Spark by default.
No. In Spark 3.0 and 3.1, Adaptive Query Execution is disabled by default and needs to be enabled through the spark.sql.adaptive.enabled property. (It only became enabled by default in Spark 3.2.)
Adaptive Query Execution applies to all kinds of queries.
No, Adaptive Query Execution applies only to queries that are not streaming queries and that contain at least one exchange (typically expressed through a join, aggregate, or window operator) or one subquery.
More info: How to Speed up SQL Queries with Adaptive Query Execution, Learning Spark, 2nd Edition, Chapter 12 (https://bit.ly/3tOh8M1)



The code block displayed below contains an error. The code block is intended to join DataFrame itemsDf with the larger DataFrame transactionsDf on column itemId. Find the error.
Code block:
transactionsDf.join(itemsDf, "itemId", how="broadcast")

  1. The syntax is wrong, how= should be removed from the code block.
  2. The join method should be replaced by the broadcast method.
  3. Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.
  4. The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.
  5. broadcast is not a valid join type.

Answer(s): E

Explanation:

broadcast is not a valid join type.
Correct! The code block should read transactionsDf.join(broadcast(itemsDf), "itemId"). This implies an inner join (the default in DataFrame.join()); since the question does not specify a join type, this is a valid choice.
The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.
This option does not apply here, since the syntax around broadcasting is incorrect.
Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.
No, it is enabled by default, since the spark.sql.autoBroadcastJoinThreshold property is set to 10 MB by default. If that property were set to -1, broadcast joining would be disabled.
The join method should be replaced by the broadcast method.
No, DataFrame has no broadcast() method.
The syntax is wrong, how= should be removed from the code block.
No, using the keyword argument how= is perfectly acceptable.
More info: Performance Tuning - Spark 3.1.1 Documentation (https://bit.ly/3gCz34r)



Which of the following code blocks efficiently converts DataFrame transactionsDf from 12 into 24 partitions?

  1. transactionsDf.repartition(24, boost=True)
  2. transactionsDf.repartition()
  3. transactionsDf.repartition("itemId", 24)
  4. transactionsDf.coalesce(24)
  5. transactionsDf.repartition(24)

Answer(s): E

Explanation:

transactionsDf.coalesce(24)
No, the coalesce() method can only reduce, not increase, the number of partitions.
transactionsDf.repartition()
No, repartition() requires a numPartitions argument.
transactionsDf.repartition("itemId", 24)
No, here the cols and numPartitions arguments have been mixed up. If the code block read transactionsDf.repartition(24, "itemId"), it would be a valid solution.
transactionsDf.repartition(24, boost=True)
No, there is no boost argument in the repartition() method.



Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?

  1. transactionsDf.dropna("any")
  2. transactionsDf.dropna(thresh=4)
  3. transactionsDf.drop.na("",2)
  4. transactionsDf.dropna(thresh=2)
  5. transactionsDf.dropna("",4)

Answer(s): B

Explanation:

transactionsDf.dropna(thresh=4)
Correct. Note that when the thresh keyword argument is given, the how argument is ignored. Figuring out the right value for thresh can be difficult, especially under pressure in the exam: thresh=4 keeps only rows with at least 4 non-null values, which in a 6-column DataFrame drops exactly the rows that have missing data in at least 3 columns.
transactionsDf.dropna(thresh=2)
Almost right. See the comment about thresh for the correct answer above.
transactionsDf.dropna("any")
No, this would remove all rows that have at least one missing value.
transactionsDf.drop.na("",2)
No, drop.na is not a proper DataFrame method.
transactionsDf.dropna("",4)
No, this does not work and will throw an error in Spark because Spark cannot understand the first argument.
More info: pyspark.sql.DataFrame.dropna — PySpark 3.1.1 documentation (https://bit.ly/2QZpiCp)
Static notebook | Dynamic notebook: See test 1, Question 20 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/20.html, https://bit.ly/sparkpracticeexams_import_instructions)


