Databricks Certified Associate Developer for Apache Spark 3.0 Exam (page 7)
Updated on: 02-Jan-2026

Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?

  1. itemsDf.withColumn(["supplier", "manufacturer"])
  2. itemsDf.withColumn("supplier").alias("manufacturer")
  3. itemsDf.withColumnRenamed("supplier", "manufacturer")
  4. itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))
  5. itemsDf.withColumnsRenamed("supplier", "manufacturer")

Answer(s): C

Explanation:

itemsDf.withColumnRenamed("supplier", "manufacturer")

Correct! This uses the relatively trivial DataFrame method withColumnRenamed for renaming column supplier to column manufacturer.
Note that the question asks for "a copy of DataFrame itemsDf". This may be confusing if you are not familiar with Spark yet. RDDs (Resilient Distributed Datasets) are the foundation of Spark DataFrames and are immutable. As such, DataFrames are immutable, too. Any command that changes anything in the DataFrame therefore necessarily returns a copy, or a new version, of it that has the changes applied.

itemsDf.withColumnsRenamed("supplier", "manufacturer")
Incorrect. Spark's DataFrame API does not have a withColumnsRenamed() method.

itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))
No. Watch out – although the col() method works with many methods of the DataFrame API, withColumnRenamed is not one of them. As outlined in the documentation linked below, withColumnRenamed expects strings.

itemsDf.withColumn(["supplier", "manufacturer"])
Wrong. While DataFrame.withColumn() exists in Spark, it has a different purpose than renaming columns. withColumn is typically used to add columns to DataFrames, taking the name of the new column as its first argument and a Column as its second. Learn more via the documentation linked below.

itemsDf.withColumn("supplier").alias("manufacturer")
No. While DataFrame.withColumn() exists, it requires two arguments. Furthermore, the alias() method on DataFrames would not help the cause of renaming a column much. DataFrame.alias() can be useful for aliasing the inputs of join statements. However, this is far outside the scope of this question. If you are curious nevertheless, check out the link below.
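
For reference, a minimal sketch (not from the exam; the SparkSession setup, the data, and the itemId column are assumptions) showing that withColumnRenamed() returns a renamed copy while the original DataFrame keeps its columns:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rename-example").getOrCreate()

# Hypothetical itemsDf with a supplier column
itemsDf = spark.createDataFrame(
    [(1, "Acme Corp"), (2, "Globex")],
    ["itemId", "supplier"],
)

renamedDf = itemsDf.withColumnRenamed("supplier", "manufacturer")

print(itemsDf.columns)    # ['itemId', 'supplier'] -- original is unchanged
print(renamedDf.columns)  # ['itemId', 'manufacturer'] -- renamed copy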
More info: pyspark.sql.DataFrame.withColumnRenamed — PySpark 3.1.1 documentation, pyspark.sql.DataFrame.withColumn — PySpark 3.1.1 documentation, and pyspark.sql.DataFrame.alias — PySpark 3.1.2 documentation (https://bit.ly/3aSB5tm, https://bit.ly/2Tv4rbE, https://bit.ly/2RbhBd2)
Static notebook | Dynamic notebook: See test 1, Question 31 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/31.html, https://bit.ly/sparkpracticeexams_import_instructions)



Which of the following code blocks returns DataFrame transactionsDf sorted in descending order by column predError, showing missing values last?

  1. transactionsDf.sort(asc_nulls_last("predError"))
  2. transactionsDf.orderBy("predError").desc_nulls_last()
  3. transactionsDf.sort("predError", ascending=False)
  4. transactionsDf.desc_nulls_last("predError")
  5. transactionsDf.orderBy("predError").asc_nulls_last()

Answer(s): C

Explanation:

transactionsDf.sort("predError", ascending=False)
Correct! When using DataFrame.sort() and setting ascending=False, the DataFrame will be sorted by the specified column in descending order, putting all missing values last. An alternative, although not listed as an answer here, would be transactionsDf.sort(desc_nulls_last("predError")). transactionsDf.sort(asc_nulls_last("predError"))
Incorrect. While this is valid syntax, the DataFrame will be sorted on column predError in ascending order and not in descending order, putting missing values last. transactionsDf.desc_nulls_last("predError")
Wrong, this is invalid syntax. There is no method DataFrame.desc_nulls_last() in the Spark API. There is a Spark function desc_nulls_last() however (link see below). transactionsDf.orderBy("predError").desc_nulls_last()
No. While transactionsDf.orderBy("predError") is correct syntax (although it sorts the DataFrame by column predError in ascending order) and returns a DataFrame, there is no method DataFrame.desc_nulls_last() in the Spark API. There is a Spark function desc_nulls_last() however (link see below).
transactionsDf.orderBy("predError").asc_nulls_last()
Incorrect. There is no method DataFrame.asc_nulls_last() in the Spark API (see above).
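
As a sketch (the data and every column name except predError are assumptions), the correct answer and the desc_nulls_last() alternative can be tried like this:

from pyspark.sql import SparkSession
from pyspark.sql.functions import desc_nulls_last

spark = SparkSession.builder.appName("sort-example").getOrCreate()

# Hypothetical transactionsDf with a missing value in predError
transactionsDf = spark.createDataFrame(
    [(1, 3.2), (2, None), (3, 7.5)],
    ["transactionId", "predError"],
)

# Descending sort; Spark's descending order puts NULLs last by default.
transactionsDf.sort("predError", ascending=False).show()

# Equivalent, with the NULL placement spelled out explicitly:
transactionsDf.sort(desc_nulls_last("predError")).show()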
More info: pyspark.sql.functions.desc_nulls_last — PySpark 3.1.2 documentation and pyspark.sql.DataFrame.sort — PySpark 3.1.2 documentation (https://bit.ly/3g1JtbI , https://bit.ly/2R90NCS)
Static notebook | Dynamic notebook: See test 1, Question 32 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/32.html, https://bit.ly/sparkpracticeexams_import_instructions)



The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively.

Find the error.
Code block:
transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")

  1. The "outer" argument should be eliminated, since "outer" is the default join type.
  2. The join type needs to be appended to the join() operator, like join().outer() instead of listing it as the last argument inside the join() call.
  3. The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.itemId == transactionsDf.productId.
  4. The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.col("itemId") == transactionsDf.col("productId").
  5. The "outer" argument should be eliminated from the call and join should be replaced by joinOuter.

Answer(s): C

Explanation:

Correct code block:
transactionsDf.join(itemsDf, itemsDf.itemId == transactionsDf.productId, "outer")
Because the join columns have different names in the two DataFrames, the join condition must be a single boolean Column expression; a list of two separate Column objects is not a valid join condition.
Static notebook | Dynamic notebook: See test 1, Question 33 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/33.html, https://bit.ly/sparkpracticeexams_import_instructions)
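
A minimal sketch of the corrected outer join (the DataFrames' contents and the remaining columns are assumptions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("outer-join-example").getOrCreate()

# Hypothetical DataFrames; only itemId and productId matter for the join
transactionsDf = spark.createDataFrame([(1, 10), (2, 99)], ["transactionId", "productId"])
itemsDf = spark.createDataFrame([(10, "pen"), (20, "ruler")], ["itemId", "itemName"])

# The join condition is a single boolean Column expression, not a list of columns
joinedDf = transactionsDf.join(itemsDf, itemsDf.itemId == transactionsDf.productId, "outer")
joinedDf.show()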



Which of the following code blocks performs a join in which the small DataFrame transactionsDf is sent to all executors where it is joined with DataFrame itemsDf on columns storeId and itemId, respectively?

  1. itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.storeId, "right_outer")
  2. itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.storeId, "broadcast")
  3. itemsDf.merge(transactionsDf, "itemsDf.itemId == transactionsDf.storeId", "broadcast")
  4. itemsDf.join(broadcast(transactionsDf), itemsDf.itemId == transactionsDf.storeId)
  5. itemsDf.join(transactionsDf, broadcast(itemsDf.itemId == transactionsDf.storeId))

Answer(s): D

Explanation:

The issue with all answers that have "broadcast" as the very last argument is that "broadcast" is not a valid join type. In addition, DataFrame.merge() does not exist in the Spark DataFrame API (it is a pandas method). While the answer with "right_outer" is a valid statement, it is not a broadcast join. The answer in which broadcast() is wrapped around the equality condition is not valid Spark code: broadcast() needs to be wrapped around the name of the small DataFrame that should be broadcast.
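
A minimal sketch of the correct broadcast join (the schemas and data are assumptions); broadcast() wraps the small DataFrame, not the join condition:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-example").getOrCreate()

# Hypothetical DataFrames; transactionsDf plays the role of the small DataFrame
itemsDf = spark.createDataFrame([(1, "pen"), (2, "ruler")], ["itemId", "itemName"])
transactionsDf = spark.createDataFrame([(1, 5), (2, 3)], ["storeId", "amount"])

# transactionsDf is sent to all executors before the join is executed
joinedDf = itemsDf.join(broadcast(transactionsDf), itemsDf.itemId == transactionsDf.storeId)
joinedDf.show()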

More info: Learning Spark, 2nd Edition, Chapter 7
Static notebook | Dynamic notebook: See test 1, Question 34 (Databricks import instructions)



Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?

  1. DataFrame.repartition(12)
  2. DataFrame.coalesce(6).shuffle()
  3. DataFrame.coalesce(6)
  4. DataFrame.coalesce(6, shuffle=True)
  5. DataFrame.repartition(6)

Answer(s): E

Explanation:

DataFrame.repartition(6)
Correct. repartition() always triggers a full shuffle (different from coalesce()).

DataFrame.repartition(12)
No, this would just leave the DataFrame with 12 partitions, not 6.

DataFrame.coalesce(6)
Incorrect. coalesce() does not perform a full shuffle of the data. Whenever you see "full shuffle", you know that you are not dealing with coalesce(). While coalesce() can perform a partial shuffle when required, it tries to minimize shuffle operations, that is, the amount of data that is sent between executors. Here, 12 partitions can easily be reduced to 6 partitions simply by stitching every two partitions into one.

DataFrame.coalesce(6, shuffle=True) and DataFrame.coalesce(6).shuffle()
These statements are not valid Spark API syntax.
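
A minimal sketch (the DataFrame contents are assumptions) contrasting the two calls when going from 12 to 6 partitions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("repartition-example").getOrCreate()

df = spark.range(1000).repartition(12)
print(df.rdd.getNumPartitions())  # 12

repartitionedDf = df.repartition(6)  # full shuffle, redistributes all data
coalescedDf = df.coalesce(6)         # merges existing partitions, avoids a full shuffle
print(repartitionedDf.rdd.getNumPartitions())  # 6
print(coalescedDf.rdd.getNumPartitions())      # 6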
More info: Spark Repartition & Coalesce - Explained and Repartition vs Coalesce in Apache Spark - Rock the JVM Blog


