Cloudera CCA175 Exam (page: 2)
Cloudera CCA Spark and Hadoop Developer Exam
Updated on: 31-Mar-2026

Viewing Page 2 of 21

Problem Scenario 74 : You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table : (order_id, order_date, order_customer_id, order_status)
Columns of order_items table : (order_item_id, order_item_order_id, order_item_product_id,
order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish the following activities.

1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective directories
p89_orders and p89_order_items.
2. Join these data sets using order_id in Spark and Python.
3. Now fetch selected columns from the joined data: order_id, order_date and the amount collected
on this order.
4. Calculate the total orders placed for each date, and produce the output sorted by date.

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:
Step 1: Import each table individually.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=orders --target-dir=p89_orders -m 1
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p89_order_items -m 1
Note: Make sure there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to HDFS.
Step 2: Read the data from one of the partitions created by the above commands.
hadoop fs -cat p89_orders/part-m-00000
hadoop fs -cat p89_order_items/part-m-00000

Step 3: Load the above two directories as RDDs using Spark and Python (open a pyspark terminal and do the following).
orders = sc.textFile("p89_orders")
orderItems = sc.textFile("p89_order_items")
Step 4: Convert the RDDs into key-value pairs (order_id as the key and the full line as the value).
# First field is order_id
ordersKeyValue = orders.map(lambda line: (int(line.split(",")[0]), line))
# Second field is order_id (order_item_order_id)
orderItemsKeyValue = orderItems.map(lambda line: (int(line.split(",")[1]), line))
Step 5: Join both RDDs using order_id.
joinedData = orderItemsKeyValue.join(ordersKeyValue)
# print the joined data
for line in joinedData.collect():
    print(line)
The format of joinedData is as below:
[order_id, 'all columns from orderItemsKeyValue', 'all columns from ordersKeyValue']
Step 6: Now fetch the selected values: order_id, order_date and the amount collected on this order.
revenuePerOrderPerDay = joinedData.map(lambda row: (row[0], row[1][1].split(",")[1], float(row[1][0].split(",")[4])))
# print the result
for line in revenuePerOrderPerDay.collect():
    print(line)
Step 7: Select distinct order ids for each date.
# distinct (date, order_id)
distinctOrdersDate = joinedData.map(lambda row: row[1][1].split(",")[1] + ", " + str(row[0])).distinct()
for line in distinctOrdersDate.collect():
    print(line)
Step 8: Similar to word count, generate a (date, 1) record for each row.
newLineTuple = distinctOrdersDate.map(lambda line: (line.split(", ")[0], 1))
Step 9: Do the count for each key (date), to get the total orders per date.
totalOrdersPerDate = newLineTuple.reduceByKey(lambda a, b: a + b)
# print results
for line in totalOrdersPerDate.collect():
    print(line)

Step 10: Sort the results by date.
sortedData = totalOrdersPerDate.sortByKey().collect()
# print results
for line in sortedData:
    print(line)



Problem Scenario 34 : You have been given a file named spark6/user.csv.
Data is given below:
user.csv
id, topic, hits
Rahul, scala, 120
Nikita, spark, 80
Mithun, spark, 1
myself, cca175, 180
Now write Spark code in Scala which will remove the header part and create an RDD of values as below, for all rows. Also, if the id is "myself", then filter out that row.
Map(id -> Rahul, topic -> scala, hits -> 120)

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create the file in HDFS (we will do it using Hue). However, you can first create it in the local filesystem and then upload it to HDFS.
Step 2: Load the user.csv file from HDFS and create PairRDDs.
val csv = sc.textFile("spark6/user.csv")
Step 3: Split and clean the data.
val headerAndRows = csv.map(line => line.split(", ").map(_.trim))
Step 4: Get the header row.
val header = headerAndRows.first
Step 5: Filter out the header (we need to check if the first value matches the first header name).
val data = headerAndRows.filter(_(0) != header(0))
Step 6: Zip the values with the header to build maps (header/value pairs).
val maps = data.map(splits => header.zip(splits).toMap)

Step 7: Filter out the user "myself".
val result = maps.filter(map => map("id") != "myself")
Step 8: Save the output as a text file.
result.saveAsTextFile("spark6/result.txt")
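Putting the steps together, the following spark-shell sketch runs the whole flow and prints the result on the driver before it is written out in Step 8; it is a minimal sketch that assumes the spark6/user.csv file from Step 1 already exists in HDFS.
// Minimal end-to-end sketch (spark-shell); assumes spark6/user.csv from Step 1 exists in HDFS
val csv = sc.textFile("spark6/user.csv")
val headerAndRows = csv.map(line => line.split(", ").map(_.trim))
val header = headerAndRows.first
val data = headerAndRows.filter(_(0) != header(0))
val maps = data.map(splits => header.zip(splits).toMap)
val result = maps.filter(map => map("id") != "myself")
// Quick check on the driver; rows should look like Map(id -> Rahul, topic -> scala, hits -> 120)
result.collect().foreach(println)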



Problem Scenario 39 : You have been given two files
spark16/file1.txt
1, 9, 5
2, 7, 4
3, 8, 3
spark16/file2.txt
1, g, h
2, i, j
3, k, l
Load these two files as Spark RDDs and join them to produce the below results:
(1, ((9, 5), (g, h)))
(2, ((7, 4), (i, j)))
(3, ((8, 3), (k, l)))
And write a code snippet which will sum the second columns of the above joined results (5+4+3).

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create the files in HDFS using Hue.
Step 2: Create pair RDDs for both files.
val one = sc.textFile("spark16/file1.txt").map {
  _.split(", ", -1) match {
    case Array(a, b, c) => (a, (b, c))
  }
}
val two = sc.textFile("spark16/file2.txt").map {
  _.split(", ", -1) match {
    case Array(a, b, c) => (a, (b, c))
  }
}
Step 3: Join both RDDs.
val joined = one.join(two)
Step 4: Sum the second column values.
val sum = joined.map {
  case (_, ((_, num2), (_, _))) => num2.toInt
}.reduce(_ + _)
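For the sample files above, the intermediate join and the final sum can be checked quickly in the spark-shell; this is a minimal verification sketch that reuses the joined and sum values defined in the steps above.
// Inspect the joined RDD; for the sample data it contains tuples like (1,((9,5),(g,h)))
joined.collect().foreach(println)
// The second columns of the joined results are 5, 4 and 3, so this should print 12
println(sum)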



Problem Scenario 58 : You have been given the below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
val b = a.keyBy(_.length)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, Seq[String])] = Array((4, ArrayBuffer(lion)), (6, ArrayBuffer(spider)), (3, ArrayBuffer(dog, cat)), (5, ArrayBuffer(tiger, eagle)))

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.groupByKey.collect
groupByKey [Pair]
Very similar to groupBy, but instead of supplying a function, the key-component of each pair will automatically be presented to the partitioner.
Listing Variants
def groupByKey(): RDD[(K, Iterable[V])]
def groupByKey(numPartitions: Int): RDD[(K, Iterable[V])]
def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]
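To illustrate the "very similar to groupBy" point above, both of the following calls produce the same (Int, Iterable[String]) grouping for the sample RDD a; this is a small spark-shell sketch, and the order of the groups in the output may vary.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
// keyBy builds (length, word) pairs, then groupByKey groups the words per length
a.keyBy(_.length).groupByKey.collect
// groupBy takes the key function directly and yields the same grouping
a.groupBy(_.length).collect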



Problem Scenario 63 : You have been given the below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
operation1

Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, String)] = Array((4, lion), (3, dogcat), (7, panther), (5, tigereagle))

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.reduceByKey(_ + _).collect
reduceByKey [Pair] : This function provides the well-known reduce functionality in Spark. Please note that any function f you provide should be commutative and associative in order to generate reproducible results.
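As a quick end-to-end check, the snippet below reruns the scenario in the spark-shell; the values sharing a key are combined pairwise by the supplied function, which here is string concatenation, so each key ends up with its words joined together (the expected output from the question is shown as a comment, and group ordering may vary).
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
// Combine all values that share a key with the given function (string concatenation here)
b.reduceByKey(_ + _).collect
// Expected (per the question, modulo ordering): Array((4, lion), (3, dogcat), (7, panther), (5, tigereagle))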





