Snowflake DSA-C02 Exam (page: 2)
Snowflake SnowPro Advanced: Data Scientist Certification
Updated on: 12-Feb-2026

Viewing Page 2 of 14

Which ones are the key actions in the data collection phase of Machine learning included?

  1. Label
  2. Ingest and Aggregate
  3. Probability
  4. Measure

Answer(s): A,B

Explanation:

The key actions in the data collection phase include:
Label: Labeled data is the raw data that was processed by adding one or more meaningful tags so that a model can learn from it. It will take some work to label it if such information is missing (manually or automatically).
Ingest and Aggregate: Incorporating and combining data from many data sources is part of data collection in AI.
Data collection
Collecting data for training the ML model is the basic step in the machine learning pipeline. The predictions made by ML systems can only be as good as the data on which they have been trained. Following are some of the problems that can arise in data collection:
Inaccurate data. The collected data could be unrelated to the problem statement. Missing data. Sub-data could be missing. That could take the form of empty values in columns or missing images for some class of prediction.
Data imbalance. Some classes or categories in the data may have a disproportionately high or low number of corresponding samples. As a result, they risk being under-represented in the model. Data bias. Depending on how the data, subjects and labels themselves are chosen, the model could propagate inherent biases on gender, politics, age or region, for example. Data bias is difficult to detect and remove.
Several techniques can be applied to address those problems:

Pre-cleaned, freely available datasets. If the problem statement (for example, image classification, object recognition) aligns with a clean, pre-existing, properly formulated dataset, then take ad- vantage of existing, open-source expertise.
Web crawling and scraping. Automated tools, bots and headless browsers can crawl and scrape websites for data.
Private data. ML engineers can create their own data. This is helpful when the amount of data required to train the model is small and the problem statement is too specific to generalize over an open-source dataset.
Custom data. Agencies can create or crowdsource the data for a fee.



Which ones are the type of visualization used for Data exploration in Data Science?

  1. Heat Maps
  2. Newton AI
  3. Feature Distribution by Class
  4. 2D-Density Plots
  5. Sand Visualization

Answer(s): A,D,E

Explanation:

Type of visualization used for exploration:
· Correlation heatmap
· Class distributions by feature
· Two-Dimensional density plots.
All the visualizations are interactive, as is standard for Plotly.
For More details, please refer the below link:
https://towardsdatascience.com/data-exploration-understanding-and-visualization-72657f5eac41



Which one is not the feature engineering techniques used in ML data science world?

  1. Imputation
  2. Binning
  3. One hot encoding
  4. Statistical

Answer(s): D

Explanation:

Feature engineering is the pre-processing step of machine learning, which is used to transform raw data into features that can be used for creating a predictive model using Machine learning or statistical Modelling.

What is a feature?
Generally, all machine learning algorithms take input data to generate the output. The input data re- mains in a tabular form consisting of rows (instances or observations) and columns (variable or at- tributes), and these attributes are often known as features. For example, an image is an instance in computer vision, but a line in the image could be the feature. Similarly, in NLP, a document can be an observation, and the word count could be the feature. So, we can say a feature is an attribute that impacts a problem or is useful for the problem.
What is Feature Engineering?
Feature engineering is the pre-processing step of machine learning, which extracts features from raw data. It helps to represent an underlying problem to predictive models in a better way, which as a result, improve the accuracy of the model for unseen data. The predictive model contains predictor variables and an outcome variable, and while the feature engineering process selects the most useful predictor variables for the model.
Some of the popular feature engineering techniques include:
1. Imputation
Feature engineering deals with inappropriate data, missing values, human interruption, general errors, insufficient data sources, etc. Missing values within the dataset highly affect the performance of the algorithm, and to deal with them "Imputation" technique is used. Imputation is responsible for handling irregularities within the dataset.
For example, removing the missing values from the complete row or complete column by a huge percentage of missing values. But at the same time, to maintain the data size, it is required to impute the missing data, which can be done as:

For numerical data imputation, a default value can be imputed in a column, and missing values can be filled with means or medians of the columns.
For categorical data imputation, missing values can be interchanged with the maximum occurred value in a column.
2. Handling Outliers
Outliers are the deviated values or data points that are observed too away from other data points in such a way that they badly affect the performance of the model. Outliers can be handled with this feature engineering technique. This technique first identifies the outliers and then remove them out. Standard deviation can be used to identify the outliers. For example, each value within a space has a definite to an average distance, but if a value is greater distant than a certain value, it can be considered as an outlier. Z-score can also be used to detect outliers.
3. Log transform
Logarithm transformation or log transform is one of the commonly used mathematical techniques in machine learning. Log transform helps in handling the skewed data, and it makes the distribution more approximate to normal after transformation. It also reduces the effects of outliers on the data, as because of the normalization of magnitude differences, a model becomes much robust.
4. Binning
In machine learning, overfitting is one of the main issues that degrade the performance of the model and which occurs due to a greater number of parameters and noisy data. However, one of the popular techniques of feature engineering, "binning", can be used to normalize the noisy data. This process involves segmenting different features into bins.
5. Feature Split
As the name suggests, feature split is the process of splitting features intimately into two or more parts and performing to make new features. This technique helps the algorithms to better understand and learn the patterns in the dataset.
The feature splitting process enables the new features to be clustered and binned, which results in extracting useful information and improving the performance of the data models.
6. One hot encoding
One hot encoding is the popular encoding technique in machine learning. It is a technique that converts the categorical data in a form so that they can be easily understood by machine learning algorithms and hence can make a good prediction. It enables group the of categorical data without losing any information.



Skewness of Normal distribution is ___________

  1. Negative
  2. Positive
  3. 0
  4. Undefined

Answer(s): C

Explanation:

Since the normal curve is symmetric about its mean, its skewness is zero. This is a theoretical explanation for mathematical proofs, you can refer to books or websites that speak on the same in detail.



What is the formula for measuring skewness in a dataset?

  1. MEAN - MEDIAN
  2. MODE - MEDIAN
  3. (3(MEAN - MEDIAN))/ STANDARD DEVIATION
  4. (MEAN - MODE)/ STANDARD DEVIATION

Answer(s): C

Explanation:

Since the normal curve is symmetric about its mean, its skewness is zero. This is a theoretical expla- nation for mathematical proofs, you can refer to books or websites that speak on the same in detail.



Viewing Page 2 of 14



Share your comments for Snowflake DSA-C02 exam with other users:

DMZ 6/25/2023 11:56:00 PM

this exam dumps just did the job. i donot want to ruffle your feathers but your exam dumps and mock test engine is amazing.
UNITED KINGDOM


Jose 8/30/2023 6:14:00 AM

nice questions
PORTUGAL


Tar01 7/24/2023 7:07:00 PM

the explanation are really helpful
Anonymous


DaveG 12/15/2023 4:50:00 PM

just passed my exam yesterday on my first attempt. these dumps were extremely helpful in passing first time. the questions were very, very similar to these questions!
Anonymous


A.K. 6/30/2023 6:34:00 AM

cosmos db is paas not saas
Anonymous


S Roychowdhury 6/26/2023 5:27:00 PM

what is the percentage of common questions in gcp exam compared to 197 dump questions? are they 100% matching with real gcp exam?
Anonymous


Bella 7/22/2023 2:05:00 AM

not able to see questions
Anonymous


Scott 9/8/2023 7:19:00 AM

by far one of the best sites for free questions. i have pass 2 exams with the help of this website.
CANADA


donald 8/19/2023 11:05:00 AM

excellent question bank.
Anonymous


Ashwini 8/22/2023 5:13:00 AM

it really helped
Anonymous


sk 5/13/2023 2:07:00 AM

excelent material
INDIA


Christopher 9/5/2022 10:54:00 PM

the new versoin of this exam which i downloaded has all the latest questions from the exam. i only saw 3 new questions in the exam which was not in this dump.
CANADA


Sam 9/7/2023 6:51:00 AM

question 8 - can cloudtrail be used for storing jobs? based on aws - aws cloudtrail is used for governance, compliance and investigating api usage across all of our aws accounts. every action that is taken by a user or script is an api call so this is logged to [aws] cloudtrail. something seems incorrect here.
UNITED STATES


Tanvi Rajput 8/14/2023 10:55:00 AM

question 13 tda - c01 answer : quick table calculation -> percentage of total , compute using table down
UNITED KINGDOM


PMSAGAR 9/19/2023 2:48:00 AM

pls share teh dump
UNITED STATES


zazza 6/16/2023 10:47:00 AM

question 44 answer is user risk
ITALY


Prasana 6/23/2023 1:59:00 AM

please post the questions for preparation
Anonymous


test user 9/24/2023 3:15:00 AM

thanks for the questions
AUSTRALIA


Draco 7/19/2023 5:34:00 AM

please reopen it now ..its really urgent
UNITED STATES


Megan 4/14/2023 5:08:00 PM

these practice exam questions were exactly what i needed. the variety of questions and the realistic exam-like environment they created helped me assess my strengths and weaknesses. i felt more confident and well-prepared on exam day, and i owe it to this exam dumps!
UNITED KINGDOM


abdo casa 8/9/2023 6:10:00 PM

thank u it very instructuf
Anonymous


Danny 1/15/2024 9:10:00 AM

its helpful?
INDIA


hanaa 10/3/2023 6:57:00 PM

is this dump still valid???
Anonymous


Georgio 1/19/2024 8:15:00 AM

question 205 answer is b
Anonymous


Matthew Dievendorf 5/30/2023 9:37:00 PM

question 39, should be answer b, directions stated is being sudneted from /21 to a /23. a /23 has 512 ips so 510 hosts. and can make 4 subnets out of the /21
Anonymous


Adhithya 8/11/2022 12:27:00 AM

beautiful test engine software and very helpful. questions are same as in the real exam. i passed my paper.
UNITED ARAB EMIRATES


SuckerPumch88 4/25/2022 10:24:00 AM

the questions are exactly the same in real exam. just make sure not to answer all them correct or else they suspect you are cheating.
UNITED STATES


soheib 7/24/2023 7:05:00 PM

question: 78 the right answer i think is d not a
Anonymous


srija 8/14/2023 8:53:00 AM

very helpful
EUROPEAN UNION


Thembelani 5/30/2023 2:17:00 AM

i am writing this exam tomorrow and have dumps
Anonymous


Anita 10/1/2023 4:11:00 PM

can i have the icdl excel exam
Anonymous


Ben 9/9/2023 7:35:00 AM

please upload it
Anonymous


anonymous 9/20/2023 11:27:00 PM

hye when will post again the past year question for this h13-311_v3 part since i have to for my test tommorow…thank you very much
Anonymous


Randall 9/28/2023 8:25:00 PM

on question 22, option b-once per session is also valid.
Anonymous