Problem Scenario 71: Write a Spark script in Python that reads the file "Content.txt" (on HDFS) with the content shown below. Split each row into a (key, value) pair, where the key is the first word of the line and the value is the entire line. Filter out the empty lines, and save the key-value pairs in "problem86" as a sequence file (on HDFS).
Part 2: Save as a sequence file where the key is null and the value is the entire line. Read back the stored sequence files.
Content.txt
Hello this is ABCTECH.com
This is XYZTECH.com
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce
Answer(s): A
Solution:
Step 1: Import SparkContext and SparkConf.
from pyspark import SparkContext, SparkConf
Step 2: Load the data from HDFS.
contentRDD = sc.textFile("Content.txt")
Step 3: Keep only the non-empty lines.
nonempty_lines = contentRDD.filter(lambda x: len(x) > 0)
Step 4: Split each line on the first space (remember: it is mandatory to convert the result to a tuple), then save it as a sequence file.
words = nonempty_lines.map(lambda x: tuple(x.split(' ', 1)))
words.saveAsSequenceFile("problem86")
Step 5: Check the contents of the directory problem86.
hdfs dfs -cat problem86/part*
Step 6: Create key-value pairs where the key is null, and save them.
nonempty_lines.map(lambda line: (None, line)).saveAsSequenceFile("problem86_1")
Step 7: Read back the sequence file data using Spark.
seqRDD = sc.sequenceFile("problem86_1")
Step 8: Print the content to validate it.
for line in seqRDD.collect():
    print(line)
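The filter-and-split logic can be checked without a cluster. The following is a minimal plain-Python sketch, not Spark code: the helper names `to_pairs` and `to_null_keyed` are hypothetical, and the sample list (including the empty line) is an assumption made for illustration. It keeps the entire line as the value, as the problem statement asks; note that `x.split(' ', 1)` in the solution instead yields the remainder of the line as the second element.

```python
# Plain-Python sketch of the transformation (no Spark required).
# The sample lines mirror Content.txt; the empty string is an assumed blank line.
sample_lines = [
    "Hello this is ABCTECH.com",
    "",
    "This is XYZTECH.com",
    "Apache Spark Training",
]


def to_pairs(lines):
    """Drop empty lines, then key each line by its first word."""
    non_empty = [line for line in lines if len(line) > 0]
    return [(line.split(" ", 1)[0], line) for line in non_empty]


def to_null_keyed(lines):
    """Drop empty lines, then pair each line with a null (None) key."""
    return [(None, line) for line in lines if len(line) > 0]


pairs = to_pairs(sample_lines)
null_keyed = to_null_keyed(sample_lines)

for key, value in pairs:
    print(key, "->", value)
```

In the Spark version the same two lambdas would be passed to `filter` and `map` on the RDD.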
Problem Scenario 12: You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following.
1. Create a table in retail_db with the following definition.
CREATE table departments_new (department_id int(11), department_name varchar(45), created_date TIMESTAMP DEFAULT NOW());
2. Now insert records from the departments table into departments_new.
3. Now import data from the departments_new table to HDFS.
4. Insert the following 5 records into the departments_new table.
Insert into departments_new values(110, "Civil", null);
Insert into departments_new values(111, "Mechanical", null);
Insert into departments_new values(112, "Automobile", null);
Insert into departments_new values(113, "Pharma", null);
Insert into departments_new values(114, "Social Engineering", null);
5. Now do the incremental import based on the created_date column.
Solution:
Step 1: Log in to the MySQL database.
mysql --user=retail_dba --password=cloudera
show databases;
use retail_db;
show tables;
Step 2: Create the table as given in the problem statement.
CREATE table departments_new (department_id int(11), department_name varchar(45), created_date TIMESTAMP DEFAULT NOW());
show tables;
Step 3: Insert records from the departments table into departments_new.
insert into departments_new select a.*, null from departments a;
Step 4: Import data from the departments_new table to HDFS.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments_new \
--target-dir /user/cloudera/departments_new \
--split-by department_id
Step 5: Check the imported data.
hdfs dfs -cat /user/cloudera/departments_new/part*
Step 6: Insert the following 5 records into the departments_new table.
Insert into departments_new values(110, "Civil", null);
Insert into departments_new values(111, "Mechanical", null);
Insert into departments_new values(112, "Automobile", null);
Insert into departments_new values(113, "Pharma", null);
Insert into departments_new values(114, "Social Engineering", null);
commit;
Step 7: Import the incremental data based on the created_date column. Replace the --last-value below with the timestamp of the previous import.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments_new \
--target-dir /user/cloudera/departments_new \
--append \
--check-column created_date \
--incremental lastmodified \
--split-by department_id \
--last-value "2016-01-30 12:07:37.0"
Step 8: Check the imported values.
hdfs dfs -cat /user/cloudera/departments_new/part*
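To see what the incremental import is expected to pick up, here is a minimal plain-Python sketch of the selection idea: rows whose check column is more recent than the last recorded value are selected. It is not Sqoop's implementation; the helper name `rows_for_incremental_import`, the sample rows, and their timestamps are all hypothetical, and the exact boundary handling (strictly greater than) is a simplifying assumption.

```python
from datetime import datetime


def rows_for_incremental_import(rows, check_column, last_value):
    """Return rows whose check column is more recent than last_value.

    Simplified model of an incremental import keyed on a timestamp column.
    """
    return [row for row in rows if row[check_column] > last_value]


# Hypothetical table contents: one row from the first import,
# two rows inserted afterwards.
rows = [
    {"department_id": 2, "department_name": "Fitness",
     "created_date": datetime(2016, 1, 30, 12, 7, 37)},
    {"department_id": 110, "department_name": "Civil",
     "created_date": datetime(2016, 1, 30, 12, 11, 24)},
    {"department_id": 111, "department_name": "Mechanical",
     "created_date": datetime(2016, 1, 30, 12, 11, 25)},
]

last_value = datetime(2016, 1, 30, 12, 7, 37)
new_rows = rows_for_incremental_import(rows, "created_date", last_value)

for row in new_rows:
    print(row["department_id"], row["department_name"])
```

Only the rows inserted after the recorded timestamp are selected, which is why the value passed as the last value has to come from the previous import.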
Problem Scenario 29: Please accomplish the following exercises using HDFS command line options.
1. Create a directory in HDFS named hdfs_commands.
2. Create a file in HDFS named data.txt in hdfs_commands.
3. Now copy this data.txt file to the local filesystem; while copying, make sure the file properties (e.g. file permissions) are not changed.
4. Now create a file in a local directory named data_local.txt and move this file to HDFS in the hdfs_commands directory.
5. Create a file data_hdfs.txt in the hdfs_commands directory and copy it to the local filesystem.
6. Create a file in the local filesystem named file1.txt and put it to HDFS.
Solution:
Step 1: Create the directory.
hdfs dfs -mkdir hdfs_commands
Step 2: Create a file in HDFS named data.txt in hdfs_commands.
hdfs dfs -touchz hdfs_commands/data.txt
Step 3: Copy data.txt to the local filesystem, preserving file properties such as permissions (the -p option).
hdfs dfs -copyToLocal -p hdfs_commands/data.txt /home/cloudera/Desktop/HadoopExam
Step 4: Create a file in the local directory named data_local.txt and move it to HDFS in the hdfs_commands directory.
touch data_local.txt
hdfs dfs -moveFromLocal /home/cloudera/Desktop/HadoopExam/data_local.txt hdfs_commands/
Step 5: Create a file data_hdfs.txt in the hdfs_commands directory and copy it to the local filesystem.
hdfs dfs -touchz hdfs_commands/data_hdfs.txt
hdfs dfs -get hdfs_commands/data_hdfs.txt /home/cloudera/Desktop/HadoopExam/
Step 6: Create a file in the local filesystem named file1.txt and put it to HDFS.
touch file1.txt
hdfs dfs -put /home/cloudera/Desktop/HadoopExam/file1.txt hdfs_commands/
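The difference between the copy, move, and put steps can be illustrated with a purely local analogue. The sketch below does not talk to HDFS at all: two temporary directories stand in for the HDFS directory and the local directory (both are assumptions for illustration), `shutil.copy2` plays the role of a permission-preserving copy, `shutil.move` the role of a move that removes the source, and `shutil.copy` the role of a put that leaves the source in place.

```python
import os
import shutil
import stat
import tempfile

# Two temporary directories stand in for HDFS and the local filesystem.
workdir = tempfile.mkdtemp()
hdfs_dir = os.path.join(workdir, "hdfs_commands")
local_dir = os.path.join(workdir, "local")
os.makedirs(hdfs_dir)
os.makedirs(local_dir)

# Analogue of creating an empty file: data.txt with specific permissions.
data_txt = os.path.join(hdfs_dir, "data.txt")
open(data_txt, "w").close()
os.chmod(data_txt, 0o640)

# Analogue of a property-preserving copy: copy2 keeps the mode bits.
local_copy = shutil.copy2(data_txt, local_dir)
copied_mode = stat.S_IMODE(os.stat(local_copy).st_mode)

# Analogue of a move: the source file disappears after the transfer.
data_local = os.path.join(local_dir, "data_local.txt")
open(data_local, "w").close()
shutil.move(data_local, hdfs_dir)

# Analogue of a put: the source file stays where it was.
file1 = os.path.join(local_dir, "file1.txt")
open(file1, "w").close()
shutil.copy(file1, hdfs_dir)

hdfs_listing = sorted(os.listdir(hdfs_dir))
local_listing = sorted(os.listdir(local_dir))
print(oct(copied_mode), hdfs_listing, local_listing)
```

After the run, data_local.txt exists only in the stand-in HDFS directory, while file1.txt exists in both.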
Problem Scenario 86: In continuation of the previous question, please accomplish the following activities.
1. Select maximum, minimum, average, standard deviation, and total quantity.
2. Select minimum and maximum price for each product code.
3. Select maximum, minimum, average, standard deviation, and total quantity for each product code; however, make sure average and standard deviation have at most two decimal places.
4. Select all the product codes and average prices only where the product count is more than or equal to 3.
5. Select maximum, minimum, average, and total of all the products for each code. Also produce the same across all the products.
Solution:
Step 1: Select maximum, minimum, average, standard deviation, and total quantity.
val results = sqlContext.sql("""SELECT MAX(price) AS MAX, MIN(price) AS MIN, AVG(price) AS Average, STD(price) AS STD, SUM(quantity) AS total_products FROM products""")
results.show()
Step 2: Select minimum and maximum price for each product code.
val results = sqlContext.sql("""SELECT code, MAX(price) AS `Highest Price`, MIN(price) AS `Lowest Price` FROM products GROUP BY code""")
results.show()
Step 3: Select maximum, minimum, average, standard deviation, and total quantity for each product code; however, make sure average and standard deviation have at most two decimal places.
val results = sqlContext.sql("""SELECT code, MAX(price), MIN(price), CAST(AVG(price) AS DECIMAL(7, 2)) AS `Average`, CAST(STD(price) AS DECIMAL(7, 2)) AS `Std Dev`, SUM(quantity) FROM products GROUP BY code""")
results.show()
Step 4: Select all the product codes and average prices only where the product count is more than or equal to 3.
val results = sqlContext.sql("""SELECT code AS `Product Code`, COUNT(*) AS `Count`, CAST(AVG(price) AS DECIMAL(7, 2)) AS `Average` FROM products GROUP BY code HAVING Count >= 3""")
results.show()
Step 5: Select maximum, minimum, average, and total of all the products for each code. Also produce the same across all the products.
val results = sqlContext.sql("""SELECT code, MAX(price), MIN(price), CAST(AVG(price) AS DECIMAL(7, 2)) AS `Average`, SUM(quantity) FROM products GROUP BY code WITH ROLLUP""")
results.show()
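The grouping logic behind these queries can be worked through by hand on a small data set. The following plain-Python sketch is an illustration only, not Spark code: the `products` rows are hypothetical, `statistics.stdev` (sample standard deviation) is assumed to correspond to `STD`, and `round(..., 2)` stands in for the `DECIMAL(7, 2)` cast. It mirrors the per-code aggregates, the count filter of Step 4, and the extra grand-total row that a rollup adds.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical rows: (code, price, quantity).
products = [
    ("PEN", 1.23, 5000),
    ("PEN", 1.25, 8000),
    ("PEN", 1.23, 2000),
    ("PEC", 0.48, 10000),
    ("PEC", 0.50, 8000),
]


def aggregate(rows):
    """Compute the aggregates used in the queries for one group of rows."""
    prices = [price for _, price, _ in rows]
    return {
        "count": len(rows),
        "max": max(prices),
        "min": min(prices),
        "average": round(mean(prices), 2),
        "std_dev": round(stdev(prices), 2),
        "total_quantity": sum(quantity for _, _, quantity in rows),
    }


# GROUP BY code
groups = defaultdict(list)
for row in products:
    groups[row[0]].append(row)
summary = {code: aggregate(rows) for code, rows in groups.items()}

# HAVING count >= 3
frequent = {code: s["average"] for code, s in summary.items() if s["count"] >= 3}

# Rollup adds one more row that aggregates across all codes.
grand_total = aggregate(products)

print(summary)
print(frequent)
print(grand_total)
```

The per-code dictionary corresponds to the grouped result, and `grand_total` corresponds to the additional row with a null code in the rollup output.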
Problem Scenario 9: You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following.
1. Import the departments table into a directory.
2. Import the departments table again into the same directory (the directory already exists, so the import should not overwrite it but append the results).
3. Also make sure the result fields are terminated by '|' and lines are terminated by '\n'.
Solution:
Step 1: Clean the HDFS file system; if these directories exist, remove them.
hadoop fs -rm -R departments
hadoop fs -rm -R categories
hadoop fs -rm -R products
hadoop fs -rm -R orders
hadoop fs -rm -R order_items
hadoop fs -rm -R customers
Step 2: Import the departments table as per the requirement.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments \
--target-dir=departments \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
-m 1
Step 3: Check the imported data.
hdfs dfs -ls departments
hdfs dfs -cat departments/part-m-00000
Step 4: Import the data again; this time it needs to be appended.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments \
--target-dir departments \
--append \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
-m 1
Step 5: Check the results again.
hdfs dfs -ls departments
hdfs dfs -cat departments/part-m-00001
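The expected shape of the output files can be sketched in plain Python. This is an illustration under assumptions, not Sqoop's behaviour in detail: the helper `format_records` and the two sample rows are hypothetical, and a dictionary stands in for the target directory. It shows records written with '|' between fields and '\n' after each line, and an appended import landing in a second part file instead of replacing the first.

```python
def format_records(rows, field_sep="|", line_sep="\n"):
    """Render rows as delimited text: fields joined by field_sep,
    each record followed by line_sep."""
    return "".join(
        field_sep.join(str(value) for value in row) + line_sep for row in rows
    )


# Hypothetical rows from the departments table.
departments = [(2, "Fitness"), (3, "Footwear")]

# A dictionary stands in for the target directory on HDFS.
directory = {}

# First import with a single mapper: one part file.
directory["part-m-00000"] = format_records(departments)

# Second import with append: a new part file is added,
# and the existing one is left untouched.
directory["part-m-00001"] = format_records(departments)

for name in sorted(directory):
    print(name)
    print(directory[name], end="")
```

This matches the checks in Steps 3 and 5, which read part-m-00000 after the first import and part-m-00001 after the appended one.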