Databricks Certified Associate Developer for Apache Spark Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions in PDF

Free Databricks Databricks Certified Associate Developer for Apache Spark 3.0 Dumps Questions (page: 4)

Which of the following describes a shuffle?

  1. A shuffle is a process that is executed during a broadcast hash join.
  2. A shuffle is a process that compares data across executors.
  3. A shuffle is a process that compares data across partitions.
  4. A shuffle is a Spark operation that results from DataFrame.coalesce().
  5. A shuffle is a process that allocates partitions to executors.

Answer(s): C

Explanation:

A shuffle is a Spark operation that results from DataFrame.coalesce(). No. DataFrame.coalesce() does not result in a shuffle.
A shuffle is a process that allocates partitions to executors. This is incorrect.
A shuffle is a process that is executed during a broadcast hash join.
No, broadcast hash joins avoid shuffles and yield performance benefits if at least one of the two tables is small in size (<= 10 MB by default). Broadcast hash joins can avoid shuffles because instead of exchanging partitions between executors, they broadcast a small table to all executors that then perform the rest of the join operation locally.
A shuffle is a process that compares data across executors.
No, in a shuffle, data is compared across partitions, and not executors. More info: Spark Repartition & Coalesce - Explained (https://bit.ly/32KF7zS)



Which of the following describes Spark's Adaptive Query Execution?

  1. Adaptive Query Execution features include dynamically coalescing shuffle partitions, dynamically injecting scan filters, and dynamically optimizing skew joins.
  2. Adaptive Query Execution is enabled in Spark by default.
  3. Adaptive Query Execution reoptimizes queries at execution points.
  4. Adaptive Query Execution features are dynamically switching join strategies and dynamically optimizing skew joins.
  5. Adaptive Query Execution applies to all kinds of queries.

Answer(s): D

Explanation:

Adaptive Query Execution features include dynamically coalescing shuffle partitions, dynamically injecting scan filters, and dynamically optimizing skew joins.
This is almost correct. All of these features, except for dynamically injecting scan filters, are part of Adaptive Query Execution. Dynamically injecting scan filters for join operations to limit the amount of data to be considered in a query is part of Dynamic Partition Pruning and not of Adaptive Query Execution.
Adaptive Query Execution reoptimizes queries at execution points.
No, Adaptive Query Execution reoptimizes queries at materialization points. Adaptive Query Execution is enabled in Spark by default.
No, Adaptive Query Execution is disabled in Spark needs to be enabled through the spark.sql.adaptive.enabled property.
Adaptive Query Execution applies to all kinds of queries.
No, Adaptive Query Execution applies only to queries that are not streaming queries and that contain at least one exchange (typically expressed through a join, aggregate, or window operator) or one subquery.
More info: How to Speed up SQL Queries with Adaptive Query Execution, Learning Spark, 2nd Edition, Chapter 12 (https://bit.ly/3tOh8M1)



The code block displayed below contains an error. The code block is intended to join DataFrame itemsDf with the larger DataFrame transactionsDf on column itemId. Find the error.
Code block:
transactionsDf.join(itemsDf, "itemId", how="broadcast")

  1. The syntax is wrong, how= should be removed from the code block.
  2. The join method should be replaced by the broadcast method.
  3. Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.
  4. The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.
  5. broadcast is not a valid join type.

Answer(s): E

Explanation:

broadcast is not a valid join type.
Correct! The code block should read transactionsDf.join(broadcast(itemsDf), "itemId"). This would imply an inner join (this is the default in DataFrame.join()), but since the join type is not given in the question, this would be a valid choice.
The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.
This option does not apply here, since the syntax around broadcasting is incorrect.
Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.
No, it is enabled by default, since the spark.sql.autoBroadcastJoinThreshold property is set to 10 MB by default. If that property would be set to -1, then broadcast joining would be disabled.
More info: Performance Tuning - Spark 3.1.1 Documentation (https://bit.ly/3gCz34r) The join method should be replaced by the broadcast method.
No, DataFrame has no broadcast() method.
The syntax is wrong, how= should be removed from the code block. No, having the keyword argument how= is totally acceptable.



Which of the following code blocks efficiently converts DataFrame transactionsDf from 12 into 24 partitions?

  1. transactionsDf.repartition(24, boost=True)
  2. transactionsDf.repartition()
  3. transactionsDf.repartition("itemId", 24)
  4. transactionsDf.coalesce(24)
  5. transactionsDf.repartition(24)

Answer(s): E

Explanation:

transactionsDf.coalesce(24)
No, the coalesce() method can only reduce, but not increase the number of partitions. transactionsDf.repartition()
No, repartition() requires a numPartitions argument. transactionsDf.repartition("itemId", 24)
No, here the cols and numPartitions argument have been mixed up. If the code block would be transactionsDf.repartition(24, "itemId"), this would be a valid solution. transactionsDf.repartition(24, boost=True)
No, there is no boost argument in the repartition() method.



Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?

  1. transactionsDf.dropna("any")
  2. transactionsDf.dropna(thresh=4)
  3. transactionsDf.drop.na("",2)
  4. transactionsDf.dropna(thresh=2)
  5. transactionsDf.dropna("",4)

Answer(s): B

Explanation:

transactionsDf.dropna(thresh=4)
Correct. Note that by only working with the thresh keyword argument, the first how keyword argument is ignored. Also, figuring out which value to set for thresh can be difficult, especially when under pressure in the exam. Here, I recommend you use the notes to create a "simulation" of what different values for thresh would do to a DataFrame. Here is an explanatory image why thresh=4 is the correct answer to the question:

transactionsDf.dropna(thresh=2)
Almost right. See the comment about thresh for the correct answer above. transactionsDf.dropna("any")
No, this would remove all rows that have at least one missing value. transactionsDf.drop.na("",2)
No, drop.na is not a proper DataFrame method. transactionsDf.dropna("",4)
No, this does not work and will throw an error in Spark because Spark cannot understand the first argument.
More info: pyspark.sql.DataFrame.dropna — PySpark 3.1.1 documentation (https://bit.ly/2QZpiCp) Static notebook | Dynamic notebook: See test 1, Question: 20 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/20.html ,
https://bit.ly/sparkpracticeexams_import_instructions)



Share your comments for Databricks Databricks Certified Associate Developer for Apache Spark 3.0 exam with other users:

S
s_123
8/12/2023 4:28:00 PM

do we need c# coding to be az204 certified

B
Blessious Phiri
8/15/2023 3:38:00 PM

excellent topics covered

M
Manasa
12/5/2023 3:15:00 AM

are these really financial cloud questions and answers, seems these are basic admin question and answers

N
Not Robot
5/14/2023 5:33:00 PM

are these comments real

K
kriah
9/4/2023 10:44:00 PM

please upload the latest dumps

E
ed
12/17/2023 1:41:00 PM

a company runs its workloads on premises. the company wants to forecast the cost of running a large application on aws. which aws service or tool can the company use to obtain this information? pricing calculator ... the aws pricing calculator is primarily used for estimating future costs

M
Muru
12/29/2023 10:23:00 AM

looks interesting

T
Tech Lady
10/17/2023 12:36:00 PM

thanks! that’s amazing

M
Mike
8/20/2023 5:12:00 PM

the exam dumps are helping me get a solid foundation on the practical techniques and practices needed to be successful in the auditing world.

N
Nobody
9/18/2023 6:35:00 PM

q 14 should be dmz sever1 and notepad.exe why does note pad have a 443 connection

M
Muhammad Rawish Siddiqui
12/4/2023 12:17:00 PM

question # 108, correct answers are business growth and risk reduction.

E
Emmah
7/29/2023 9:59:00 AM

are these valid chfi questions

M
Mort
10/19/2023 7:09:00 PM

question: 162 should be dlp (b)

E
Eknath
10/4/2023 1:21:00 AM

good exam questions

N
Nizam
6/16/2023 7:29:00 AM

I have to say this is really close to real exam. Passed my exam with this.

P
poran
11/20/2023 4:43:00 AM

good analytics question

A
Antony
11/23/2023 11:36:00 AM

this looks accurate

E
Ethan
8/23/2023 12:52:00 AM

question 46, the answer should be data "virtualization" (not visualization).

N
nSiva
9/22/2023 5:58:00 AM

its useful.

R
Ranveer
7/26/2023 7:26:00 PM

Pass this exam 3 days ago. The PDF version and the Xengine App is quite useful.

S
Sanjay
8/15/2023 10:22:00 AM

informative for me.

T
Tom
12/12/2023 8:53:00 PM

question 134s answer shoule be "dlp"

A
Alex
11/7/2023 11:02:00 AM

in 72 the answer must be [sys_user_has_role] table.

F
Finn
5/4/2023 10:21:00 PM

i appreciated the mix of multiple-choice and short answer questions. i passed my exam this morning.

A
AJ
7/13/2023 8:33:00 AM

great to find this website, thanks

C
Curtis Nakawaki
6/29/2023 9:11:00 PM

examination questions seem to be relevant.

U
Umashankar Sharma
10/22/2023 9:39:00 AM

planning to take psm test

E
ED SHAW
7/31/2023 10:34:00 AM

please allow to download

A
AD
7/22/2023 11:29:00 AM

please provide dumps

A
Ayyjayy
11/6/2023 7:29:00 AM

is the answer to question 15 correct ? i feel like the answer should be b

B
Blessious Phiri
8/12/2023 11:56:00 AM

its getting more technical

J
Jeanine J
7/11/2023 3:04:00 PM

i think these questions are what i need.

A
Aderonke
10/23/2023 2:13:00 PM

helpful assessment

T
Tom
1/5/2024 2:32:00 AM

i am confused about the answers to the questions. do you know if the answers are correct?

AI Tutor 👋 I’m here to help!