Amazon AWS Certified Machine Learning - Specialty (MLS-C01)
Updated on: 09-Feb-2026

A Machine Learning Specialist is developing a custom video recommendation model for an application. The dataset used to train this model is very large, with millions of data points, and is hosted in an Amazon S3 bucket. The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and would exceed the attached 5 GB Amazon EBS volume on the notebook instance.

Which approach allows the Specialist to use all the data to train the model?

  A. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
  B. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to the instance. Train on a small amount of the data to verify the training code and hyperparameters. Go back to Amazon SageMaker and train using the full dataset.
  C. Use AWS Glue to train a model using a small subset of the data to confirm that the data will be compatible with Amazon SageMaker. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
  D. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to train the full dataset.

Answer(s): A
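
Pipe input mode streams the training data from Amazon S3 to the training container instead of copying it to local storage first, which is why the notebook's small EBS volume is not a constraint. As a rough sketch only, the request for such a job could be assembled as below. The dict follows the shape of the boto3 SageMaker `create_training_job` call; the job name, image URI, role ARN, bucket paths, and instance settings are all hypothetical placeholders, and the chosen algorithm is assumed to support Pipe mode.

```python
def build_training_request(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri):
    """Build a request dict shaped for boto3's SageMaker create_training_job call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            # "Pipe" streams data from S3; "File" would download it first.
            "TrainingInputMode": "Pipe",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Instance type, count, and volume size are illustrative assumptions.
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# All values below are hypothetical placeholders, not real resources.
request = build_training_request(
    job_name="example-video-recommender",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
)
print(request["AlgorithmSpecification"]["TrainingInputMode"])
```

With credentials configured, the dict could be submitted with `boto3.client("sagemaker").create_training_job(**request)`.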



A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is stored in Amazon RDS.

Which approach should the Specialist use for training a model using that data?

  A. Write a direct connection to the SQL database within the notebook and pull data in.
  B. Push the data from Microsoft SQL Server to Amazon S3 using AWS Data Pipeline and provide the S3 location within the notebook.
  C. Move the data to Amazon DynamoDB and set up a connection to DynamoDB within the notebook to pull data in.
  D. Move the data to Amazon ElastiCache using AWS DMS and set up a connection within the notebook to pull data in for fast access.

Answer(s): B



A Machine Learning Specialist receives customer data for an online shopping website. The data includes demographics, past visits, and locality information. The Specialist must develop a machine learning approach to identify the customer shopping patterns, preferences, and trends to enhance the website for better service and smart recommendations.

Which solution should the Specialist recommend?

  A. Latent Dirichlet Allocation (LDA) for the given collection of discrete data to identify patterns in the customer database.
  B. A neural network with a minimum of three layers and random initial weights to identify patterns in the customer database.
  C. Collaborative filtering based on user interactions and correlations to identify patterns in the customer database.
  D. Random Cut Forest (RCF) over random subsamples to identify patterns in the customer database.

Answer(s): C
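
Collaborative filtering recommends items to a user based on what similar users interacted with. The toy sketch below shows one common variant (user-based filtering with cosine similarity); it is only an illustration, not the method any particular AWS service uses. The user names, item names, and ratings are made up.

```python
import math

# Hypothetical interaction data: user -> {item: rating}.
ratings = {
    "ana": {"video_a": 5, "video_b": 4},
    "ben": {"video_a": 5, "video_b": 5, "video_c": 4},
    "cy": {"video_a": 1, "video_c": 2, "video_d": 5},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, all_ratings):
    """Rank items the user has not rated, weighted by similarity to other users."""
    scores = {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        similarity = cosine(all_ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in all_ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ana", ratings))
```

Here "ana" rates items much like "ben" does, so the item that "ben" liked and "ana" has not seen ranks first.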



A Machine Learning Specialist is working with a large company to leverage machine learning within its products. The company wants to group its customers into categories based on which customers will and will not churn within the next 6 months. The company has labeled the data available to the Specialist.

Which machine learning model type should the Specialist use to accomplish this task?

  A. Linear regression
  B. Classification
  C. Clustering
  D. Reinforcement learning

Answer(s): B

Explanation:

The goal of classification is to determine the class or category to which a data point (here, a customer) belongs. For classification problems, data scientists train an algorithm on historical data with predefined target variables, also known as labels (churner/non-churner), which are the answers to be predicted. With classification, businesses can answer questions such as the following:

• Will this customer churn or not?
• Will a customer renew their subscription?
• Will a user downgrade a pricing plan?
• Are there any signs of unusual customer behavior?


Reference:

https://www.kdnuggets.com/2019/05/churn-prediction-machine-learning.html
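
To make the churn example concrete, here is a minimal sketch of a binary classifier trained on labeled data. It uses a hand-written logistic regression purely for illustration; the two features (scaled weekly logins and scaled support tickets) and all sample values are invented, and a real project would more likely use an existing library or a SageMaker built-in algorithm.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, lr=0.5, epochs=1000):
    """Fit logistic regression with full-batch gradient descent."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in samples:
            p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = p - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def predict(features, weights, bias):
    """Return 1 (churner) or 0 (non-churner)."""
    p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    return 1 if p >= 0.5 else 0


# Hypothetical labeled history: ([scaled_logins, scaled_tickets], churned).
history = [
    ([0.1, 0.9], 1), ([0.2, 0.8], 1), ([0.3, 0.7], 1),
    ([0.8, 0.2], 0), ([0.9, 0.1], 0), ([0.7, 0.3], 0),
]
weights, bias = train(history)
print(predict([0.15, 0.85], weights, bias), predict([0.85, 0.15], weights, bias))
```

The labels are what make this a classification problem: without the churned/not-churned column, only clustering would be possible.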



The displayed graph is from a forecasting model for testing a time series.


Considering the graph only, which conclusion should a Machine Learning Specialist make about the behavior of the model?

  A. The model predicts both the trend and the seasonality well.
  B. The model predicts the trend well, but not the seasonality.
  C. The model predicts the seasonality well, but not the trend.
  D. The model does not predict the trend or the seasonality well.

Answer(s): A



A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided.


Based on this information, which model would have the HIGHEST accuracy?

  A. Long short-term memory (LSTM) model with scaled exponential linear unit (SELU)
  B. Logistic regression
  C. Support vector machine (SVM) with non-linear kernel
  D. Single perceptron with tanh activation function

Answer(s): C



A Machine Learning Specialist at a company sensitive to security is preparing a dataset for model training. The dataset is stored in Amazon S3 and contains Personally Identifiable Information (PII).

The dataset:
• Must be accessible from a VPC only.
• Must not traverse the public internet.

How can these requirements be satisfied?

  A. Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC.
  B. Create a VPC endpoint and apply a bucket access policy that allows access from the given VPC endpoint and an Amazon EC2 instance.
  C. Create a VPC endpoint and use Network Access Control Lists (NACLs) to allow traffic between only the given VPC endpoint and an Amazon EC2 instance.
  D. Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an Amazon EC2 instance.

Answer(s): A
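
A bucket policy of the kind this answer describes typically denies every request that does not arrive through the designated VPC endpoint, using the `aws:SourceVpce` condition key. The sketch below builds such a policy as a Python dict and prints it as JSON. The bucket name and endpoint ID are hypothetical placeholders, and a real policy may need further statements or exceptions (for example, for administrators).

```python
import json


def vpc_endpoint_only_policy(bucket, vpce_id):
    """Deny all S3 actions on the bucket unless the request uses the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Placeholder values, not real resources.
policy = vpc_endpoint_only_policy("example-pii-bucket", "vpce-0example")
print(json.dumps(policy, indent=2))
```

NACLs and security groups filter network traffic but cannot express who may read a bucket, which is why the restriction is placed in the bucket policy.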



During mini-batch training of a neural network for a classification problem, a Data Scientist notices that training accuracy oscillates.

What is the MOST likely cause of this issue?

  A. The class distribution in the dataset is imbalanced.
  B. Dataset shuffling is disabled.
  C. The batch size is too big.
  D. The learning rate is very high.

Answer(s): D


Reference:

https://towardsdatascience.com/deep-learning-personal-notes-part-1-lesson-2-8946fe970b95
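
The effect of an overly high learning rate can be seen in a toy setting. The sketch below runs plain gradient descent on f(x) = x^2, which is a deliberate simplification and not a neural network; the learning rates are chosen only for illustration. With a small rate the parameter approaches the minimum from one side; with a large rate every step overshoots the minimum and lands on the opposite side; with a slightly larger rate still, the steps grow and training diverges.

```python
def descend(lr, steps=6, x0=1.0):
    """Run gradient descent on f(x) = x**2 and return the sequence of x values."""
    xs = [x0]
    for _ in range(steps):
        grad = 2 * xs[-1]  # derivative of x**2
        xs.append(xs[-1] - lr * grad)
    return xs


smooth = descend(0.1)       # steady approach to the minimum at 0
overshoot = descend(0.95)   # sign flips on every step
diverging = descend(1.05)   # sign flips and the magnitude grows

for name, path in [("lr=0.10", smooth), ("lr=0.95", overshoot), ("lr=1.05", diverging)]:
    print(name, [round(x, 3) for x in path])
```

In mini-batch training the same overshooting, combined with batch-to-batch noise, tends to show up as accuracy that jumps up and down rather than improving steadily.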


