Databricks Certified Associate Developer for Apache Spark Exam Questions in PDF

Free Databricks Certified Associate Developer for Apache Spark Dumps Questions (page: 6)

Which of the following code blocks returns a DataFrame that has all columns of DataFrame transactionsDf and an additional column predErrorSquared which is the squared value of column predError in DataFrame transactionsDf?

  A. transactionsDf.withColumn("predError", pow(col("predErrorSquared"), 2))
  B. transactionsDf.withColumnRenamed("predErrorSquared", pow(predError, 2))
  C. transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))
  D. transactionsDf.withColumn("predErrorSquared", pow(predError, lit(2)))
  E. transactionsDf.withColumn("predErrorSquared", "predError"**2)

Answer(s): C

Explanation:

While only one of these code blocks works, the DataFrame API is pretty flexible when it comes to accepting columns into the pow() function. The following code blocks would also work:
transactionsDf.withColumn("predErrorSquared", pow("predError", 2))
transactionsDf.withColumn("predErrorSquared", pow("predError", lit(2)))
Static notebook | Dynamic notebook: See test 1, Question 26 (https://flrs.github.io/spark_practice_tests_code/#1/26.html). Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions



The code block displayed below contains an error. The code block should return a new DataFrame that only contains rows from DataFrame transactionsDf in which the value in column predError is at least 5. Find the error.
Code block:
transactionsDf.where("col(predError) >= 5")

  A. The argument to the where method should be "predError >= 5".
  B. Instead of where(), filter() should be used.
  C. The expression returns the original DataFrame transactionsDf and not a new DataFrame. To avoid this, the code block should be transactionsDf.toNewDataFrame().where("col(predError) >= 5").
  D. The argument to the where method cannot be a string.
  E. Instead of >=, the SQL operator GEQ should be used.

Answer(s): A

Explanation:

The argument to the where method should be "predError >= 5".
Correct. Inside an expression string, a column is referenced by its plain name. col() is a Python function from pyspark.sql.functions, not part of the expression syntax.
The argument to the where method cannot be a string.
It can be a string, no problem here.
Instead of where(), filter() should be used.
No, that does not matter. In PySpark, where() and filter() are equivalent.
Instead of >=, the SQL operator GEQ should be used.
Incorrect.
The expression returns the original DataFrame transactionsDf and not a new DataFrame. To avoid this, the code block should be transactionsDf.toNewDataFrame().where("col(predError) >= 5").
No, Spark returns a new DataFrame.
Static notebook | Dynamic notebook: See test 1, Question 27 (https://flrs.github.io/spark_practice_tests_code/#1/27.html). Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions



Which of the following code blocks saves DataFrame transactionsDf in location
/FileStore/transactions.csv as a CSV file and throws an error if a file already exists in the location?

  A. transactionsDf.write.save("/FileStore/transactions.csv")
  B. transactionsDf.write.format("csv").mode("error").path("/FileStore/transactions.csv")
  C. transactionsDf.write.format("csv").mode("ignore").path("/FileStore/transactions.csv")
  D. transactionsDf.write("csv").mode("error").save("/FileStore/transactions.csv")
  E. transactionsDf.write.format("csv").mode("error").save("/FileStore/transactions.csv")

Answer(s): E

Explanation:

Static notebook | Dynamic notebook: See test 1, Question 28 (https://flrs.github.io/spark_practice_tests_code/#1/28.html). Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions



The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which the column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows derived from rows in DataFrame itemsDf in which the column attributes contains the element cozy.

A sample of DataFrame itemsDf is below.

Code block:
itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

  A. 1. filter
     2. array_contains("cozy")
     3. select
     4. "itemId"
     5. explode
     6. "attributes"
  B. 1. where
     2. "array_contains(attributes, 'cozy')"
     3. select
     4. itemId
     5. explode
     6. attributes
  C. 1. filter
     2. "array_contains(attributes, 'cozy')"
     3. select
     4. "itemId"
     5. map
     6. "attributes"
  D. 1. filter
     2. "array_contains(attributes, cozy)"
     3. select
     4. "itemId"
     5. explode
     6. "attributes"
  E. 1. filter
     2. "array_contains(attributes, 'cozy')"
     3. select
     4. "itemId"
     5. explode
     6. "attributes"

Answer(s): E

Explanation:

The correct code block is:
itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes"))
The key here is understanding how to use array_contains(). You can either use it as an expression in a string, or you can import it from pyspark.sql.functions. In that case, the following would also work:
itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes"))
Static notebook | Dynamic notebook: See test 1, Question 29 (https://flrs.github.io/spark_practice_tests_code/#1/29.html). Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions



The code block displayed below contains an error. The code block should return the average of rows in column value grouped by unique storeId. Find the error.
Code block:
transactionsDf.agg("storeId").avg("value")

  A. Instead of avg("value"), avg(col("value")) should be used.
  B. The avg("value") should be specified as a second argument to agg() instead of being appended to it.
  C. All column names should be wrapped in col() operators.
  D. agg should be replaced by groupBy.
  E. "storeId" and "value" should be swapped.

Answer(s): D

Explanation:

Static notebook | Dynamic notebook: See test 1, Question 30 (https://flrs.github.io/spark_practice_tests_code/#1/30.html). Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions


