Cloudera CCA175 Exam (page: 2)
Cloudera CCA Spark and Hadoop Developer Exam
Updated on: 25-Dec-2025


Problem Scenario 74: You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table: (order_id, order_date, order_customer_id, order_status)
Columns of order_items table: (order_item_id, order_item_order_id, order_item_product_id,
order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish the following activities.

1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective
directories p89_orders and p89_order_items.
2. Join these datasets on order_id using Spark and Python.
3. Now fetch selected columns from the joined data: order_id, order_date, and the amount collected
on this order.
4. Calculate the total orders placed for each date, and produce the output sorted by date.

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:
Step 1: Import each table individually.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=orders --target-dir=p89_orders -m 1
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p89_order_items -m 1
Note: Make sure you don't have a space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to HDFS.
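Before reading the data, you can verify the imports; with -m 1 each target directory contains a single part file (a quick check using the same hadoop fs commands as in the steps below):
hadoop fs -ls p89_orders
hadoop fs -ls p89_order_items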
Step 2: Read the data from one of the part files created by the above commands.
hadoop fs -cat p89_orders/part-m-00000
hadoop fs -cat p89_order_items/part-m-00000

Step 3: Load the above two directories as RDDs using Spark and Python (open a pyspark terminal and do the following).
orders = sc.textFile("p89_orders")
orderItems = sc.textFile("p89_order_items")
Step 4: Convert each RDD into key-value pairs (order_id as the key and the whole line as the value).
# First field is order_id
ordersKeyValue = orders.map(lambda line: (int(line.split(",")[0]), line))
# Second field is order_id
orderItemsKeyValue = orderItems.map(lambda line: (int(line.split(",")[1]), line))
Step 5: Join both RDDs using order_id.
joinedData = orderItemsKeyValue.join(ordersKeyValue)
# print the joined data
for line in joinedData.collect():
    print(line)
The format of joinedData is as below:
(order_id, (all columns from orderItemsKeyValue, all columns from ordersKeyValue))
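For illustration, assuming the standard retail_db sample data from the quickstart VM, a single joined record would look roughly like this (first row of each table; values indicative):
(1, ('1,1,957,1,299.98,299.98', '1,2013-07-25 00:00:00.0,11599,CLOSED'))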
Step 6: Now fetch the selected values: order_id, order_date, and the amount collected on this order.
revenuePerOrderPerDay = joinedData.map(lambda row: (row[0], row[1][1].split(",")[1], float(row[1][0].split(",")[4])))
# print the result
for line in revenuePerOrderPerDay.collect():
    print(line)
Step 7: Select distinct order ids for each date.
# distinct (date, order_id)
distinctOrdersDate = joinedData.map(lambda row: row[1][1].split(",")[1] + "," + str(row[0])).distinct()
for line in distinctOrdersDate.collect():
    print(line)
Step 8: Similar to word count, generate a (date, 1) record for each row.
newLineTuple = distinctOrdersDate.map(lambda line: (line.split(",")[0], 1))
Step 9: Do the count for each key (date) to get the total orders per date.
totalOrdersPerDate = newLineTuple.reduceByKey(lambda a, b: a + b)
# print results
for line in totalOrdersPerDate.collect():
    print(line)

Step 10: Sort the results by date.
sortedData = totalOrdersPerDate.sortByKey().collect()
# print results
for line in sortedData:
    print(line)
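For reference, the per-step code above can be collected into a single runnable pyspark sketch (a minimal consolidation of Steps 3 through 10; it assumes the Step 1 imports have completed):
# consolidated sketch of Steps 3-10; input paths come from the Step 1 imports
orders = sc.textFile("p89_orders")
orderItems = sc.textFile("p89_order_items")
# key both datasets by order_id and join them
ordersKV = orders.map(lambda line: (int(line.split(",")[0]), line))
itemsKV = orderItems.map(lambda line: (int(line.split(",")[1]), line))
joined = itemsKV.join(ordersKV)
# one (date, order_id) pair per order, then count orders per date and sort by date
totalOrdersPerDate = (joined
    .map(lambda row: (row[1][1].split(",")[1], row[0]))
    .distinct()
    .map(lambda pair: (pair[0], 1))
    .reduceByKey(lambda a, b: a + b)
    .sortByKey())
for line in totalOrdersPerDate.collect():
    print(line)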



Problem Scenario 34: You have been given a file named spark6/user.csv.
Data is given below:
user.csv
id, topic, hits
Rahul, scala, 120
Nikita, spark, 80
Mithun, spark, 1
myself, cca175, 180
Now write Spark code in Scala which will remove the header and create an RDD of values as below for all rows. Also, if the id is "myself", filter out that row.
Map(id -> Rahul, topic -> scala, hits -> 120)

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create the file in HDFS (we will do this using Hue). However, you can first create it in the local filesystem and then upload it to HDFS, as sketched below.
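If you choose the local-filesystem route, a minimal upload looks like this (assuming user.csv is in your current local directory):
hadoop fs -mkdir -p spark6
hadoop fs -put user.csv spark6/user.csv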
Step 2: Load the user.csv file from HDFS.
val csv = sc.textFile("spark6/user.csv")
Step 3: Split and clean the data.
val headerAndRows = csv.map(line => line.split(",").map(_.trim))
Step 4: Get header row
val header = headerAndRows.first
Step 5: Filter out the header (check whether the first value matches the first header name).
val data = headerAndRows.filter(_(0) != header(0))
Step 6: Convert the splits to maps (header/value pairs).
val maps = data.map(splits => header.zip(splits).toMap)

Step 7: Filter out the user "myself".
val result = maps.filter(map => map("id") != "myself")
Step 8: Save the output as a text file.
result.saveAsTextFile("spark6/result.txt")
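For candidates taking the exam in Python, the same pipeline can be sketched in pyspark (an illustrative equivalent of the Scala solution above; the output path result_py is hypothetical, chosen to avoid overwriting the Scala output):
csv = sc.textFile("spark6/user.csv")
# split and trim each line
headerAndRows = csv.map(lambda line: [f.strip() for f in line.split(",")])
header = headerAndRows.first()
# drop the header row, zip the remaining rows with the header, filter out "myself"
data = headerAndRows.filter(lambda row: row[0] != header[0])
maps = data.map(lambda row: dict(zip(header, row)))
result = maps.filter(lambda m: m["id"] != "myself")
result.saveAsTextFile("spark6/result_py")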



Problem Scenario 39: You have been given two files:
spark16/file1.txt
1, 9, 5
2, 7, 4
3, 8, 3
spark16/file2.txt
1, g, h
2, i, j
3, k, l
Load these two files as Spark RDDs and join them to produce the below results:
(1, ((9, 5), (g, h)))
(2, ((7, 4), (i, j)))
(3, ((8, 3), (k, l)))
And write a code snippet which will sum the second columns of the above joined results (5 + 4 + 3).

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create the files in HDFS using Hue.
Step 2: Create a pair RDD for each of the files.
val one = sc.textFile("spark16/file1.txt").map {
  _.split(", ", -1) match {
    case Array(a, b, c) => (a, (b, c))
  }
}
val two = sc.textFile("spark16/file2.txt").map {
  _.split(", ", -1) match {
    case Array(a, b, c) => (a, (b, c))
  }
}
Step 3: Join both the RDDs.
val joined = one.join(two)
Step 4: Sum the second column values.
val sum = joined.map {
  case (_, ((_, num2), (_, _))) => num2.toInt
}.reduce(_ + _)
With the sample files above, num2 takes the values 5, 4 and 3, so sum evaluates to 12.
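The equivalent in pyspark, for Python candidates (an illustrative sketch, not part of the original Scala solution):
def parse(line):
    # "1, 9, 5" -> ("1", ("9", "5"))
    a, b, c = [f.strip() for f in line.split(",")]
    return (a, (b, c))

one = sc.textFile("spark16/file1.txt").map(parse)
two = sc.textFile("spark16/file2.txt").map(parse)
joined = one.join(two)
# second element of file1's value tuple: 5, 4 and 3
total = joined.map(lambda kv: int(kv[1][0][1])).sum()
print(total)  # 12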



Problem Scenario 58: You have been given the below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
val b = a.keyBy(_.length)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, Seq[String])] = Array((4, ArrayBuffer(lion)), (6, ArrayBuffer(spider)), (3, ArrayBuffer(dog, cat)), (5, ArrayBuffer(tiger, eagle)))

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.groupByKey.collect
groupByKey [Pair]
Very similar to groupBy, but instead of supplying a function, the key component of each pair will automatically be presented to the partitioner.
Listing Variants:
def groupByKey(): RDD[(K, Iterable[V])]
def groupByKey(numPartitions: Int): RDD[(K, Iterable[V])]
def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]
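For candidates working in Python, the same grouping can be sketched in pyspark (an illustrative equivalent, not part of the original Scala solution):
a = sc.parallelize(["dog", "tiger", "lion", "cat", "spider", "eagle"], 2)
b = a.keyBy(len)  # pair each word with its length as the key
# groupByKey gathers all values that share a key into one iterable
for k, v in b.groupByKey().collect():
    print(k, list(v))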



Problem Scenario 63: You have been given the below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
operation1

Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, String)] = Array((4, lion), (3, dogcat), (7, panther), (5, tigereagle))

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.reduceByKey(_ + _).collect
reduceByKey [Pair]: This function provides the well-known reduce functionality in Spark. Please note that any function f you provide should be commutative in order to generate reproducible results.
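Again for Python candidates, an equivalent pyspark sketch (illustrative only; string concatenation is associative, and the per-key order of values follows the partition layout):
a = sc.parallelize(["dog", "tiger", "lion", "cat", "panther", "eagle"], 2)
b = a.map(lambda x: (len(x), x))
# reduceByKey merges all values of a key with the supplied function
print(b.reduceByKey(lambda x, y: x + y).collect())
# e.g. [(4, 'lion'), (3, 'dogcat'), (7, 'panther'), (5, 'tigereagle')] (order may vary)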


