Databricks Certified Associate Developer for Apache Spark Certified Associate Developer for Apache Spark Exam Questions in PDF

Free Databricks Certified Associate Developer for Apache Spark Dumps Questions (page: 7)

Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?

  1. itemsDf.withColumn(["supplier", "manufacturer"])
  2. itemsDf.withColumn("supplier").alias("manufacturer")
  3. itemsDf.withColumnRenamed("supplier", "manufacturer")
  4. itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))
  5. itemsDf.withColumnsRenamed("supplier", "manufacturer")

Answer(s): C

Explanation:

itemsDf.withColumnRenamed("supplier", "manufacturer")

Correct! This uses the relatively trivial DataFrame method withColumnRenamed for renaming column supplier to column manufacturer.
Note that the Question: asks for
"a copy of DataFrame itemsDf". This may be confusing if you are not familiar with Spark yet. RDDs (Resilient Distributed Datasets) are the foundation of Spark DataFrames and are immutable. As such, DataFrames are immutable, too. Any command that changes anything in the DataFrame therefore necessarily returns a copy, or a new version, of it that has the changes applied. itemsDf.withColumnsRenamed("supplier", "manufacturer")
Incorrect. Spark's DataFrame API does not have a withColumnsRenamed() method. itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))
No. Watch out – although the col() method works for many methods of the DataFrame API, withColumnRenamed is not one of them. As outlined in the documentation linked below, withColumnRenamed expects strings. itemsDf.withColumn(["supplier", "manufacturer"])
Wrong. While DataFrame.withColumn() exists in Spark, it has a different purpose than renaming columns. withColumn is typically used to add columns to DataFrames, taking the name of the new column as a first, and a Column as a second argument. Learn more via the documentation that is linked below.
itemsDf.withColumn("supplier").alias("manufacturer")
No. While DataFrame.withColumn() exists, it requires 2 arguments. Furthermore, the alias() method on DataFrames would not help the cause of renaming a column much. DataFrame.alias() can be useful in addressing the input of join statements. However, this is far outside of the scope of this question. If you are curious nevertheless, check out the link below.
More info: pyspark.sql.DataFrame.withColumnRenamed — PySpark 3.1.1 documentation, pyspark.sql.DataFrame.withColumn — PySpark 3.1.1 documentation, and pyspark.sql.DataFrame.alias —

PySpark 3.1.2 documentation (https://bit.ly/3aSB5tm , https://bit.ly/2Tv4rbE , https://bit.ly/2RbhBd2)
Static notebook | Dynamic notebook: See test 1, Question: 31 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/31.html ,
https://bit.ly/sparkpracticeexams_import_instructions)



Which of the following code blocks returns DataFrame transactionsDf sorted in descending order by column predError, showing missing values last?

  1. transactionsDf.sort(asc_nulls_last("predError"))
  2. transactionsDf.orderBy("predError").desc_nulls_last()
  3. transactionsDf.sort("predError", ascending=False)
  4. transactionsDf.desc_nulls_last("predError")
  5. transactionsDf.orderBy("predError").asc_nulls_last()

Answer(s): C

Explanation:

transactionsDf.sort("predError", ascending=False)
Correct! When using DataFrame.sort() and setting ascending=False, the DataFrame will be sorted by the specified column in descending order, putting all missing values last. An alternative, although not listed as an answer here, would be transactionsDf.sort(desc_nulls_last("predError")). transactionsDf.sort(asc_nulls_last("predError"))
Incorrect. While this is valid syntax, the DataFrame will be sorted on column predError in ascending order and not in descending order, putting missing values last. transactionsDf.desc_nulls_last("predError")
Wrong, this is invalid syntax. There is no method DataFrame.desc_nulls_last() in the Spark API. There is a Spark function desc_nulls_last() however (link see below). transactionsDf.orderBy("predError").desc_nulls_last()
No. While transactionsDf.orderBy("predError") is correct syntax (although it sorts the DataFrame by column predError in ascending order) and returns a DataFrame, there is no method DataFrame.desc_nulls_last() in the Spark API. There is a Spark function desc_nulls_last() however (link see below).
transactionsDf.orderBy("predError").asc_nulls_last()
Incorrect. There is no method DataFrame.asc_nulls_last() in the Spark API (see above).
More info: pyspark.sql.functions.desc_nulls_last — PySpark 3.1.2 documentation and pyspark.sql.DataFrame.sort — PySpark 3.1.2 documentation (https://bit.ly/3g1JtbI , https://bit.ly/2R90NCS)
Static notebook | Dynamic notebook: See test 1, Question: 32 (
Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/32.html ,
https://bit.ly/sparkpracticeexams_import_instructions)



The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively.

Find the error.
Code block:
transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")

  1. The "outer" argument should be eliminated, since "outer" is the default join type.
  2. The join type needs to be appended to the join() operator, like join().outer() instead of listing it as the last argument inside the join() call.
  3. The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.itemId == transactionsDf.productId.
  4. The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.col("itemId")
    == transactionsDf.col("productId").
  5. The "outer" argument should be eliminated from the call and join should be replaced by joinOuter.

Answer(s): C

Explanation:

Correct code block:
transactionsDf.join(itemsDf, itemsDf.itemId == transactionsDf.productId, "outer") Static notebook | Dynamic notebook: See test 1, Question: 33 (
Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/33.html ,
https://bit.ly/sparkpracticeexams_import_instructions)



Which of the following code blocks performs a join in which the small DataFrame transactionsDf is sent to all executors where it is joined with DataFrame itemsDf on columns storeId and itemId, respectively?

  1. itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.storeId, "right_outer")
  2. itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.storeId, "broadcast")
  3. itemsDf.merge(transactionsDf, "itemsDf.itemId == transactionsDf.storeId", "broadcast")
  4. itemsDf.join(broadcast(transactionsDf), itemsDf.itemId == transactionsDf.storeId)
  5. itemsDf.join(transactionsDf, broadcast(itemsDf.itemId == transactionsDf.storeId))

Answer(s): D

Explanation:

The issue with all answers that have "broadcast" as very last argument is that "broadcast" is not a valid join type. While the entry with "right_outer" is a valid statement, it is not a broadcast join. The item where broadcast() is wrapped around the equality condition is not valid code in Spark. broadcast() needs to be wrapped around the name of the small DataFrame that should be broadcast.

More info: Learning Spark, 2nd Edition, Chapter 7
Static notebook | Dynamic notebook: See test 1, Question: 34 ( Databricks import instructions)
tion and explanation?



Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?

  1. DataFrame.repartition(12)
  2. DataFrame.coalesce(6).shuffle()
  3. DataFrame.coalesce(6)
  4. DataFrame.coalesce(6, shuffle=True)
  5. DataFrame.repartition(6)

Answer(s): E

Explanation:

DataFrame.repartition(6)
Correct. repartition() always triggers a full shuffle (different from coalesce()). DataFrame.repartition(12)
No, this would just leave the DataFrame with 12 partitions and not 6. DataFrame.coalesce(6)
coalesce does not perform a full shuffle of the data. Whenever you see "full shuffle", you know that you are not dealing with coalesce(). While coalesce() can perform a partial shuffle when required, it will try to minimize shuffle operations, so the amount of data that is sent between executors.
Here, 12 partitions can easily be repartitioned to be 6 partitions simply by stitching every two partitions into one.
DataFrame.coalesce(6, shuffle=True) and DataFrame.coalesce(6).shuffle() These statements are not valid Spark API syntax.
More info: Spark Repartition & Coalesce - Explained and Repartition vs Coalesce in Apache Spark - Rock the JVM Blog



Share your comments for Databricks Certified Associate Developer for Apache Spark exam with other users:

J
john adenu
11/14/2023 11:02:00 AM

nice questions bring out the best in you.

O
Osman
11/21/2023 2:27:00 PM

really helpful

E
Edward
9/13/2023 5:27:00 PM

question #50 and question #81 are exactly the same questions, azure site recovery provides________for virtual machines. the first says that it is fault tolerance is the answer and second says disater recovery. from my research, it says it should be disaster recovery. can anybody explain to me why? thank you

M
Monti
5/24/2023 11:14:00 PM

iam thankful for these exam dumps questions, i would not have passed without this exam dumps.

A
Anon
10/25/2023 10:48:00 PM

some of the answers seem to be inaccurate. q10 for example shouldnt it be an m custom column?

P
PeterPan
10/18/2023 10:22:00 AM

are the question real or fake?

C
CW
7/11/2023 3:19:00 PM

thank you for providing such assistance.

M
Mn8300
11/9/2023 8:53:00 AM

nice questions

N
Nico
4/23/2023 11:41:00 PM

my 3rd purcahse from this site. these exam dumps are helpful. very helpful.

C
Chere
9/15/2023 4:21:00 AM

found it good

T
Thembelani
5/30/2023 2:47:00 AM

excellent material

V
vinesh phale
9/11/2023 2:51:00 AM

very helpfull

B
Bhagiii
11/4/2023 7:04:00 AM

well explained.

R
Rahul
8/8/2023 9:40:00 PM

i need the pdf, please.

C
CW
7/11/2023 2:51:00 PM

a good source for exam preparation

A
Anchal
10/23/2023 4:01:00 PM

nice questions

J
J Nunes
9/29/2023 8:19:00 AM

i need ielts general training audio guide questions

A
Ananya
9/14/2023 5:16:00 AM

please make this content available

S
Swathi
6/4/2023 2:18:00 PM

content is good

L
Leo
7/29/2023 8:45:00 AM

latest dumps please

L
Laolu
2/15/2023 11:04:00 PM

aside from pdf the test engine software is helpful. the interface is user-friendly and intuitive, making it easy to navigate and find the questions.

Z
Zaynik
9/17/2023 5:36:00 AM

questions and options are correct, but the answers are wrong sometimes. so please check twice or refer some other platform for the right answer

M
Massam
6/11/2022 5:55:00 PM

90% of questions was there but i failed the exam, i marked the answers as per the guide but looks like they are not accurate , if not i would have passed the exam given that i saw about 45 of 50 questions from dump

A
Anonymous
12/27/2023 12:47:00 AM

answer to this question "what administrative safeguards should be implemented to protect the collected data while in use by manasa and her product management team? " it should be (c) for the following reasons: this administrative safeguard involves controlling access to collected data by ensuring that only individuals who need the data for their job responsibilities have access to it. this helps minimize the risk of unauthorized access and potential misuse of sensitive information. while other options such as (a) documenting data flows and (b) conducting a privacy impact assessment (pia) are important steps in data protection, implementing a "need to know" access policy directly addresses the issue of protecting data while in use by limiting access to those who require it for legitimate purposes. (d) is not directly related to safeguarding data during use; it focuses on data transfers and location.

J
Japles
5/23/2023 9:46:00 PM

password lockout being the correct answer for question 37 does not make sense. it should be geofencing.

F
Faritha
8/10/2023 6:00:00 PM

for question 4, the righr answer is :recover automatically from failures

A
Anonymous
9/14/2023 4:27:00 AM

question number 4s answer is 3, option c. i

P
p das
12/7/2023 11:41:00 PM

very good questions

A
Anna
1/5/2024 1:12:00 AM

i am confused about the answers to the questions. are the answers correct?

B
Bhavya
9/13/2023 10:15:00 AM

very usefull

R
Rahul Kumar
8/31/2023 12:30:00 PM

need certification.

D
Diran Ole
9/17/2023 5:15:00 PM

great exam prep

V
Venkata Subbarao Bandaru
6/24/2023 8:45:00 AM

i require dump

D
D
7/15/2023 1:38:00 AM

good morning, could you please upload this exam again,

AI Tutor 👋 I’m here to help!