Problem Scenario 74 : You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table : (order_id, order_date, order_customer_id, order_status)
Columns of order_items table : (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish the following activities.
1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective directories p89_orders and p89_order_items.
2. Join these data using order_id in Spark and Python.
3. Now fetch selected columns from the joined data: order id, order date, and the amount collected on this order.
4. Calculate the total orders placed for each date, and produce the output sorted by date.
Solution:
Step 1: Import each table separately.
    sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=orders --target-dir=p89_orders -m 1
    sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p89_order_items -m 1
Note : Make sure there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to HDFS.
Step 2: Read the data from one of the partitions created by the above commands.
    hadoop fs -cat p89_orders/part-m-00000
    hadoop fs -cat p89_order_items/part-m-00000
Step 3: Load these two directories as RDDs using Spark and Python (open a pyspark terminal and do the following).
    orders = sc.textFile("p89_orders")
    orderItems = sc.textFile("p89_order_items")
Step 4: Convert each RDD into key-value pairs (order_id as the key and the whole line as the value).
    # Sqoop's default output separates fields with "," (no space after the comma)
    # In orders, the first field is order_id
    ordersKeyValue = orders.map(lambda line: (int(line.split(",")[0]), line))
    # In order_items, the second field is the order id
    orderItemsKeyValue = orderItems.map(lambda line: (int(line.split(",")[1]), line))
Step 5: Join both RDDs using order_id.
    joinedData = orderItemsKeyValue.join(ordersKeyValue)
    # print the joined data
    for line in joinedData.collect():
        print(line)
Format of joinedData is as below.
    (order_id, ('all columns from orderItemsKeyValue', 'all columns from ordersKeyValue'))
Step 6: Now fetch the selected values: order id, order date, and amount collected on this order.
    revenuePerOrderPerDay = joinedData.map(lambda row: (row[0], row[1][1].split(",")[1], float(row[1][0].split(",")[4])))
    # print the result
    for line in revenuePerOrderPerDay.collect():
        print(line)
Step 7: Select distinct order ids for each date.
    # distinct "date,order_id" strings
    distinctOrdersDate = joinedData.map(lambda row: row[1][1].split(",")[1] + "," + str(row[0])).distinct()
    for line in distinctOrdersDate.collect():
        print(line)
Step 8: Similar to word count, generate a (date, 1) record for each row.
    newLineTuple = distinctOrdersDate.map(lambda line: (line.split(",")[0], 1))
Step 9: Do the count for each key (date) to get the total orders per date.
    totalOrdersPerDate = newLineTuple.reduceByKey(lambda a, b: a + b)
    # print results
    for line in totalOrdersPerDate.collect():
        print(line)
Step 10: Sort the results by date.
    sortedData = totalOrdersPerDate.sortByKey().collect()
    # print results
    for line in sortedData:
        print(line)
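The join and counting logic of Steps 4 to 10 can be checked without a cluster. The following is a minimal plain-Python sketch, not Spark code: the sample rows are hypothetical, written in the comma-separated layout that Sqoop produces by default, and lists and dicts stand in for RDDs.

```python
from collections import defaultdict

# Hypothetical sample rows (assumed data, not taken from retail_db).
orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "4,2013-07-26 00:00:00.0,8827,CLOSED",
]
order_items = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
    "4,4,897,2,49.98,24.99",
]

# Step 4: key each line by order id.
orders_by_id = {int(line.split(",")[0]): line for line in orders}
items_by_id = defaultdict(list)
for line in order_items:
    items_by_id[int(line.split(",")[1])].append(line)

# Step 5: inner join on order id, one row per (item, order) pair.
joined = [
    (order_id, (item, orders_by_id[order_id]))
    for order_id, items in items_by_id.items()
    if order_id in orders_by_id
    for item in items
]

# Step 6: (order id, order date, item subtotal).
revenue_rows = [
    (order_id, order.split(",")[1], float(item.split(",")[4]))
    for order_id, (item, order) in joined
]

# Steps 7 to 10: distinct (date, order id) pairs, counted per date, sorted.
distinct_pairs = {(order.split(",")[1], order_id) for order_id, (item, order) in joined}
orders_per_date = defaultdict(int)
for date, _ in distinct_pairs:
    orders_per_date[date] += 1
sorted_counts = sorted(orders_per_date.items())

for row in sorted_counts:
    print(row)
```

Order 2 has two items, so it appears twice in the joined data; the distinct step is what keeps it from being counted twice for its date. For the same reason, the Step 6 rows hold one subtotal per order item rather than one total per order.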
Problem Scenario 34 : You have been given a file named spark6/user.csv.
Data is given below:
user.csv
id, topic, hits
Rahul, scala, 120
Nikita, spark, 80
Mithun, spark, 1
myself, cca175, 180
Now write Spark code in Scala which will remove the header part and create an RDD of values as below, for all rows. Also, if id is "myself", filter out that row.
Map(id -> om, topic -> scala, hits -> 120)
Solution :
Step 1: Create the file in HDFS (we will do this using Hue). Alternatively, you can first create it in the local filesystem and then upload it to HDFS.
Step 2: Load the user.csv file from HDFS and create an RDD.
    val csv = sc.textFile("spark6/user.csv")
Step 3: Split and clean the data.
    val headerAndRows = csv.map(line => line.split(",").map(_.trim))
Step 4: Get the header row.
    val header = headerAndRows.first
Step 5: Filter out the header (we need to check whether the first value matches the first header name).
    val data = headerAndRows.filter(_(0) != header(0))
Step 6: Convert the splits to a map (header/value pairs).
    val maps = data.map(splits => header.zip(splits).toMap)
Step 7: Filter out the user "myself".
    val result = maps.filter(map => map("id") != "myself")
Step 8: Save the output as a text file.
    result.saveAsTextFile("spark6/result.txt")
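The header-zip idea in Steps 3 to 7 can be traced on the sample data. This is a plain-Python sketch under the assumption that the file holds exactly the five lines shown in the problem; a list stands in for the RDD and a dict for the Scala Map.

```python
# Assumed in-memory copy of spark6/user.csv.
lines = [
    "id, topic, hits",
    "Rahul, scala, 120",
    "Nikita, spark, 80",
    "Mithun, spark, 1",
    "myself, cca175, 180",
]

# Step 3: split each line on commas and trim surrounding whitespace.
header_and_rows = [[field.strip() for field in line.split(",")] for line in lines]

# Step 4: the first row is the header.
header = header_and_rows[0]

# Step 5: drop any row whose first field equals the first header name.
data = [row for row in header_and_rows if row[0] != header[0]]

# Step 6: pair header names with the values of each row.
maps = [dict(zip(header, row)) for row in data]

# Step 7: drop the user "myself".
result = [m for m in maps if m["id"] != "myself"]

for m in result:
    print(m)
```

Because the header is filtered by value rather than by position, a data row whose first field happened to be the literal string "id" would also be dropped.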
Problem Scenario 39 : You have been given two files.
spark16/file1.txt
1, 9, 5
2, 7, 4
3, 8, 3
spark16/file2.txt
1, g, h
2, i, j
3, k, l
Load these two files as Spark RDDs and join them to produce the results below.
(1, ((9, 5), (g, h)))
(2, ((7, 4), (i, j)))
(3, ((8, 3), (k, l)))
Then write a code snippet which will sum the second column of the above joined results (5+4+3).
Solution :
Step 1: Create the files in HDFS using Hue.
Step 2: Create a pair RDD for each file.
    val one = sc.textFile("spark16/file1.txt").map {
      _.split(", ", -1) match { case Array(a, b, c) => (a, (b, c)) }
    }
    val two = sc.textFile("spark16/file2.txt").map {
      _.split(", ", -1) match { case Array(a, b, c) => (a, (b, c)) }
    }
Step 3: Join both RDDs.
    val joined = one.join(two)
Step 4: Sum the second column values.
    val sum = joined.map { case (_, ((_, num2), (_, _))) => num2.toInt }.reduce(_ + _)
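The shape of the joined records and the final sum can be verified with a small plain-Python sketch. It assumes the two files contain exactly the lines shown in the problem and that keys are unique within each file, so dicts can stand in for the pair RDDs.

```python
# Assumed contents of spark16/file1.txt and spark16/file2.txt.
file1 = ["1, 9, 5", "2, 7, 4", "3, 8, 3"]
file2 = ["1, g, h", "2, i, j", "3, k, l"]


def to_pair(line):
    """Turn 'a, b, c' into (a, (b, c)), like the Scala pattern match."""
    a, b, c = line.split(", ")
    return a, (b, c)


one = dict(to_pair(line) for line in file1)
two = dict(to_pair(line) for line in file2)

# Inner join on the key: (key, (value_from_one, value_from_two)).
joined = [(key, (one[key], two[key])) for key in one if key in two]

# Sum the second field of the value that came from file1.
total = sum(int(num2) for _, ((_, num2), _) in joined)

for record in joined:
    print(record)
print(total)
```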
Problem Scenario 58 : You have been given the code snippet below.
    val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
    val b = a.keyBy(_.length)
    operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
    Array[(Int, Seq[String])] = Array((4, ArrayBuffer(lion)), (6, ArrayBuffer(spider)), (3, ArrayBuffer(dog, cat)), (5, ArrayBuffer(tiger, eagle)))
Solution :
    b.groupByKey.collect
groupByKey [Pair]
Very similar to groupBy, but instead of supplying a function, the key component of each pair will automatically be presented to the partitioner.
Listing Variants
    def groupByKey(): RDD[(K, Iterable[V])]
    def groupByKey(numPartitions: Int): RDD[(K, Iterable[V])]
    def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]
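What keyBy followed by groupByKey computes can be sketched in plain Python. This is an emulation with a list and a dict, not the Spark API; the keys are sorted here only to give a stable display order, whereas the order Spark returns depends on partitioning.

```python
from collections import defaultdict

words = ["dog", "tiger", "lion", "cat", "spider", "eagle"]

# keyBy(_.length): pair every word with its length as the key.
pairs = [(len(word), word) for word in words]

# groupByKey: collect all values that share a key.
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(value)

grouped = sorted(groups.items())
for key, values in grouped:
    print(key, values)
```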
Problem Scenario 63 : You have been given the code snippet below.
    val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
    val b = a.map(x => (x.length, x))
    operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
    Array[(Int, String)] = Array((4, lion), (3, dogcat), (7, panther), (5, tigereagle))
Solution :
    b.reduceByKey(_ + _).collect
reduceByKey [Pair] : This function provides the well-known reduce functionality in Spark. Please note that any function f you provide should be commutative and associative in order to generate reproducible results.
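The reproducibility warning matters for this very example, because string concatenation is associative but not commutative. The plain-Python sketch below emulates reduceByKey with a hypothetical helper named reduce_by_key (not a Spark function) and feeds it the same pairs in two different orders.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, f):
    """Hypothetical helper: group values by key, then fold each group with f."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(f, values) for key, values in groups.items()}


def concat(a, b):
    return a + b


words = ["dog", "tiger", "lion", "cat", "panther", "eagle"]
pairs = [(len(word), word) for word in words]

forward = reduce_by_key(pairs, concat)
backward = reduce_by_key(list(reversed(pairs)), concat)

print(forward)
print(backward)
```

With the original order, key 3 reduces to dogcat; with the reversed order it reduces to catdog. Keys with a single value, such as 4 and 7, are unaffected.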