Google Professional Machine Learning Engineer
Updated 17-Apr-2026
You are building a real-time prediction engine that streams files which may contain Personally Identifiable Information (PII) to Google Cloud. You want to use the Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the PII is not accessible by unauthorized individuals?
Answer(s): D
The Cloud DLP API is a service that allows users to inspect, classify, and de-identify sensitive data. It can be used to scan data in Cloud Storage, BigQuery, Cloud Datastore, and Cloud Pub/Sub. The best way to ensure that the PII is not accessible by unauthorized individuals is to use a quarantine bucket to store the data before scanning it with the DLP API. This way, the data is isolated from other applications and users until it is classified and moved to the appropriate bucket. The other options are not as secure or efficient, as they either expose the data to BigQuery before scanning, or scan the data after writing it to a non-sensitive bucket.
References: Cloud DLP documentation; Scanning and classifying Cloud Storage files
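The quarantine pattern above can be sketched as a DLP inspect-job request over everything in the quarantine bucket. This is a minimal illustration: the project and bucket names are placeholders, the infoTypes are just common examples, and in a real system the request body would be passed to the google-cloud-dlp client's create_dlp_job call rather than printed.

```python
# Hypothetical sketch: build a DLP inspect-job request that scans all objects
# in a quarantine bucket for common PII infoTypes before data is moved on.
# In production this dict would be sent via DlpServiceClient.create_dlp_job().

def build_dlp_inspect_job(project_id: str, quarantine_bucket: str) -> dict:
    """Return a request body scanning every object in the quarantine bucket."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "inspect_job": {
            "storage_config": {
                "cloud_storage_options": {
                    # "**" matches all objects, including nested "directories".
                    "file_set": {"url": f"gs://{quarantine_bucket}/**"}
                }
            },
            "inspect_config": {
                "info_types": [
                    {"name": "EMAIL_ADDRESS"},
                    {"name": "PHONE_NUMBER"},
                    {"name": "US_SOCIAL_SECURITY_NUMBER"},
                ],
                "min_likelihood": "POSSIBLE",
            },
        },
    }

request = build_dlp_inspect_job("my-project", "quarantine-bucket")
print(request["inspect_job"]["storage_config"]
             ["cloud_storage_options"]["file_set"]["url"])
```

Depending on the scan result, the pipeline would then move each file to either a sensitive or a non-sensitive bucket.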
You are designing an ML recommendation model for shoppers on your company's ecommerce website. You will use Recommendations AI to build, test, and deploy your system. How should you develop recommendations that increase revenue while following best practices?
Answer(s): B
Recommendations AI is a service that allows users to build, test, and deploy personalized product recommendations for their ecommerce websites. It uses Google's deep learning models to learn from user behavior and product data, and generate high-quality recommendations that can increase revenue, click-through rate, and customer satisfaction. One of the best practices for using Recommendations AI is to choose the right recommendation type for the business objective. The "Frequently Bought Together" recommendation type shows products that are often purchased together with the current product, and encourages users to add more items to their shopping cart. This can increase the average order value and the revenue for each transaction. The other options are not as effective or feasible for this objective. The "Other Products You May Like" recommendation type shows products that are similar to the current product, and may increase the click-through rate, but not necessarily the shopping cart size. Importing the user events and then the product catalog is not a recommended order, as it may cause data inconsistency and missing recommendations. The product catalog should be imported first, and then the user events. Using placeholder values for the product catalog is not a viable option, as it will not produce meaningful recommendations or reflect the real performance of the model.
References: Recommendations AI documentation; Choosing a recommendation type; Importing data to Recommendations AI
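The catalog-before-events ordering described above can be sketched as a two-step import plan. The endpoint paths follow the general shape of the Retail API's product and user-event import methods, but the project ID and exact paths here are illustrative assumptions, not a definitive client implementation.

```python
# Hypothetical sketch of the recommended import order for Recommendations AI:
# import the product catalog first, then user events, so events can be joined
# to known products. Paths mimic the Retail API shape for illustration only.

def import_plan(project_id: str) -> list:
    catalog = (f"projects/{project_id}/locations/global/"
               "catalogs/default_catalog")
    return [
        {   # Step 1: import the product catalog.
            "endpoint": f"{catalog}/branches/0/products:import",
        },
        {   # Step 2: import user events (views, add-to-carts, purchases).
            "endpoint": f"{catalog}/userEvents:import",
        },
    ]

steps = import_plan("my-project")
print([s["endpoint"].rsplit("/", 1)[-1] for s in steps])
```

Running the events import after the catalog import avoids the "unjoined event" problem the explanation warns about.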
You are designing an architecture with a serverless ML system to enrich customer support tickets with informative metadata before they are routed to a support agent. You need a set of models to predict ticket priority, predict ticket resolution time, and perform sentiment analysis to help agents make strategic decisions when they process support requests. Tickets are not expected to have any domain-specific terms or jargon. The proposed architecture has the following flow: [architecture diagram not shown]. Which endpoints should the Enrichment Cloud Functions call?
Vertex AI is a unified platform for building and deploying ML models on Google Cloud. It supports both custom and AutoML models, and provides various tools and services for ML development, such as Vertex Pipelines, Vertex Vizier, Vertex Explainable AI, and Vertex Feature Store. Vertex AI can be used to create models for predicting ticket priority and resolution time, as these are domain-specific tasks that require custom training data and evaluation metrics. Cloud Natural Language API is a pre-trained service that provides natural language understanding capabilities, such as sentiment analysis, entity analysis, syntax analysis, and content classification. Cloud Natural Language API can be used to perform sentiment analysis on the support tickets, as this is a general task that does not require domain-specific knowledge or jargon. The other options are not suitable for the given architecture. AutoML Natural Language and AutoML Vision are services that allow users to create custom natural language and vision models using their own data and labels. They are not needed for sentiment analysis, as Cloud Natural Language API already provides this functionality. Cloud Vision API is a pre-trained service that provides image analysis capabilities, such as object detection, face detection, text detection, and image labeling. It is not relevant for the support tickets, as they are not expected to have any images.
References: Vertex AI documentation; Cloud Natural Language API documentation
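For the sentiment-analysis part, the Cloud Natural Language API takes a simple JSON document payload. The sketch below builds that request body; an actual Enrichment Cloud Function would POST it to the `documents:analyzeSentiment` method (the sample ticket text is made up).

```python
# Minimal sketch of a Cloud Natural Language analyzeSentiment request body.
# This only constructs the JSON payload; the real call would POST it to
# https://language.googleapis.com/v1/documents:analyzeSentiment.

import json

def sentiment_request(ticket_text: str) -> str:
    body = {
        "document": {"type": "PLAIN_TEXT", "content": ticket_text},
        "encodingType": "UTF8",
    }
    return json.dumps(body)

payload = sentiment_request("My order arrived broken and support never replied.")
print(payload)
```

The response would include a document-level sentiment score and magnitude, which the function can attach to the ticket as metadata.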
You work with a data engineering team that has developed a pipeline to clean your dataset and save it in a Cloud Storage bucket. You have created an ML model and want to use the data to refresh your model as soon as new data is available. As part of your CI/CD workflow, you want to automatically run a Kubeflow Pipelines training job on Google Kubernetes Engine (GKE). How should you architect this workflow?
Answer(s): C
This option is the best way to architect the workflow, as it allows you to use event-driven and serverless components to automate the ML training process. Cloud Storage triggers are a feature that allows you to send notifications to a Pub/Sub topic when an object is created, deleted, or updated in a storage bucket. Pub/Sub is a service that allows you to publish and subscribe to messages on various topics. Pub/Sub-triggered Cloud Functions are a type of Cloud Functions that are invoked when a message is published to a specific Pub/Sub topic. Cloud Functions are a serverless platform that allows you to run code in response to events. By using these components, you can create a workflow that starts the training job on a GKE cluster as soon as a new file is available in the Cloud Storage bucket, without having to manage any servers or poll for changes. The other options are not as efficient or scalable as this option. Dataflow is a service that allows you to create and run data processing pipelines, but it is not designed to trigger ML training jobs on GKE. App Engine is a service that allows you to build and deploy web applications, but it is not suitable for polling Cloud Storage for new files, as it may incur unnecessary costs and latency. Cloud Scheduler is a service that allows you to schedule jobs at regular intervals, but it is not ideal for triggering ML training jobs based on data availability, as it may miss some files or run unnecessary jobs.
References: Cloud Storage triggers documentation; Pub/Sub documentation; Pub/Sub-triggered Cloud Functions documentation; Cloud Functions documentation; Kubeflow Pipelines documentation
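The event-driven flow above can be sketched as a Pub/Sub-triggered Cloud Function. Background functions receive the Pub/Sub message with a base64-encoded `data` field containing the Cloud Storage notification; the kfp client call that would actually start the Kubeflow run is shown only as a comment, and the bucket, file, and pipeline names are placeholders.

```python
# Sketch of a Pub/Sub-triggered Cloud Function (Python background-function
# signature) reacting to a Cloud Storage OBJECT_FINALIZE notification and
# kicking off a Kubeflow Pipelines run on GKE (shown as a comment).

import base64
import json

def on_new_training_data(event: dict, context=None) -> str:
    """Triggered by a Pub/Sub message from a Cloud Storage notification.
    Returns the GCS URI of the newly created file."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    gcs_uri = f"gs://{payload['bucket']}/{payload['name']}"
    # In the real function, start the training pipeline here, e.g.:
    # kfp.Client(host=KFP_HOST).create_run_from_pipeline_package(
    #     "train.yaml", arguments={"data_path": gcs_uri})
    return gcs_uri

# Simulate the message Cloud Storage would publish for a new object.
fake = {"data": base64.b64encode(
    json.dumps({"bucket": "clean-data", "name": "batch-042.csv"}).encode()
).decode()}
print(on_new_training_data(fake))
```

Because the function only runs when a new object lands in the bucket, there is no polling loop and no idle infrastructure to manage.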
You are developing models to classify customer support emails. You created models with TensorFlow Estimators using small datasets on your on-premises system, but you now need to train the models using large datasets to ensure high performance. You will port your models to Google Cloud and want to minimize code refactoring and infrastructure overhead for easier migration from on-prem to cloud. What should you do?
Answer(s): A
Vertex AI Platform is a unified platform for building and deploying ML models on Google Cloud. It supports both custom and AutoML models, and provides various tools and services for ML development, such as Vertex Pipelines, Vertex Vizier, Vertex Explainable AI, and Vertex Feature Store. Vertex AI Platform allows users to train their TensorFlow models using distributed training, which can speed up the training process and handle large datasets. Vertex AI Platform also minimizes code refactoring and infrastructure overhead, as it is compatible with TensorFlow Estimators and handles the provisioning, configuration, and scaling of the training resources automatically. The other options are not as suitable for this scenario. Dataproc is a service that allows users to create and run data processing pipelines using Apache Spark and Hadoop, but it is not designed for TensorFlow model training. Managed Instance Groups are a feature that allows users to create and manage groups of identical compute instances, but they require more configuration and management than Vertex AI Platform. Kubeflow Pipelines are a tool that allows users to create and run ML workflows on Google Kubernetes Engine, but they involve more complexity and code changes than Vertex AI Platform.
References: Vertex AI Platform documentation; Distributed training with Vertex AI Platform
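The distributed-training setup described above is typically expressed as a worker-pool specification for a Vertex AI custom job: one chief replica plus additional workers. The sketch below builds such a spec as plain dicts; the container image and machine types are placeholder assumptions, and the real job would be submitted through the Vertex AI client or gcloud.

```python
# Hypothetical sketch of Vertex AI custom-job worker-pool specs for
# distributed TensorFlow training: one chief plus N workers. The image URI
# and machine types are placeholders for illustration.

def worker_pool_specs(image_uri: str, workers: int = 2) -> list:
    chief = {
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 1,
        "container_spec": {"image_uri": image_uri},
    }
    pool = [chief]
    if workers:
        pool.append({
            "machine_spec": {"machine_type": "n1-standard-8"},
            "replica_count": workers,
            "container_spec": {"image_uri": image_uri},
        })
    return pool

specs = worker_pool_specs("gcr.io/my-project/estimator-train:latest")
print(sum(s["replica_count"] for s in specs))  # total training replicas
```

The same Estimator code that ran on-premises can run in the container; Vertex AI sets the `TF_CONFIG` environment variable so the Estimator's distribution strategy can discover the cluster.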
You work for a large technology company that wants to modernize their contact center. You have been asked to develop a solution to classify incoming calls by product so that requests can be more quickly routed to the correct support team. You have already transcribed the calls using the Speech-to-Text API. You want to minimize data preprocessing and development time. How should you build the model?
AutoML Natural Language is a service that allows users to create custom natural language models using their own data and labels. It supports various natural language tasks, such as text classification, entity extraction, and sentiment analysis. AutoML Natural Language can be used to build a model to classify incoming calls by product, as it can extract custom entities from the transcribed calls and assign them to predefined categories. AutoML Natural Language also minimizes data preprocessing and development time, as it handles the data preparation, model training, and evaluation automatically. The other options are not as suitable for this scenario. AI Platform Training built-in algorithms are a set of pre-defined algorithms that can be used to train ML models on AI Platform, but they do not support natural language processing tasks. Cloud Natural Language API is a pre-trained service that provides natural language understanding capabilities, such as sentiment analysis, entity analysis, syntax analysis, and content classification. However, it does not support custom entities or categories, and may not recognize the product names from the calls. Building a custom model to identify the product keywords and then running them through a classification algorithm would require more data preprocessing and development time, as well as more coding and testing.
References: AutoML Natural Language documentation; AI Platform Training built-in algorithms documentation; Cloud Natural Language API documentation
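The main preparation work for this approach is turning the transcripts into labeled training data. AutoML text classification accepts CSV rows of text and label; the sketch below produces that format with stdlib tools only. The transcripts and product labels are invented for illustration.

```python
# Sketch of preparing transcribed calls as labeled CSV rows for an AutoML
# Natural Language text-classification dataset (one "text,label" row per
# call). The transcripts and product labels below are made up.

import csv
import io

def to_automl_csv(examples: list) -> str:
    """Serialize (transcript, product_label) pairs as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for transcript, product in examples:
        writer.writerow([transcript, product])
    return buf.getvalue()

rows = to_automl_csv([
    ("my tablet screen stays black after the update", "tablet"),
    ("the router keeps dropping wifi every hour", "router"),
])
print(rows.splitlines()[0])
```

Once the CSV is uploaded to Cloud Storage, AutoML handles tokenization, model training, and evaluation without further preprocessing.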
You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects insurance applications from potential customers. What factors should you consider before building the model?
Before building an insurance approval model, an ML engineer should consider the factors of traceability, reproducibility, and explainability, as these are important aspects of responsible AI and fairness in a regulated domain. Traceability is the ability to track the provenance and lineage of the data, models, and decisions throughout the ML lifecycle. It helps to ensure the quality, reliability, and accountability of the ML system, and to comply with the regulatory and ethical standards. Reproducibility is the ability to recreate the same results and outcomes using the same data, models, and parameters. It helps to verify the validity, consistency, and robustness of the ML system, and to debug and improve the performance. Explainability is the ability to understand and interpret the logic, behavior, and outcomes of the ML system. It helps to increase the transparency, trust, and confidence of the ML system, and to identify and mitigate any potential biases, errors, or risks. The other options are not as relevant or comprehensive as this option. Redaction is the process of removing sensitive or confidential information from the data or documents, but it is not a factor that the ML engineer should consider before building the model, as it is more related to the data preparation and protection. Federated learning is a technique that allows training ML models on decentralized data without transferring the data to a central server, but it is not a factor that the ML engineer should consider before building the model, as it is more related to the model architecture and privacy preservation. Differential privacy is a method that adds noise to the data or the model outputs to protect the individual privacy of the data subjects, but it is not a factor that the ML engineer should consider before building the model, as it is more related to the model evaluation and deployment.
References: Responsible AI documentation; Traceability documentation; Reproducibility documentation; Explainability documentation
You work for a large hotel chain and have been asked to assist the marketing team in gathering predictions for a targeted marketing strategy. You need to make predictions about user lifetime value (LTV) over the next 30 days so that marketing can be adjusted accordingly. The customer dataset is in BigQuery, and you are preparing the tabular data for training with AutoML Tables. This data has a time signal that is spread across multiple columns. How should you ensure that AutoML fits the best model to your data?
This answer is correct because it allows AutoML Tables to handle the time signal in the data and split the data accordingly. This ensures that the model is trained on the historical data and evaluated on the more recent data, which is consistent with the prediction task. AutoML Tables can automatically detect and handle temporal features in the data, such as date, time, and duration. By specifying the Time column, AutoML Tables can also perform time-series forecasting and use the time signal to generate additional features, such as seasonality and trend.
References: [AutoML Tables: Preparing your training data]; [AutoML Tables: Time-series forecasting]
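The chronological split that a Time column induces can be illustrated directly: order the rows by time, then train on the oldest 80%, validate on the next 10%, and test on the most recent 10%. This is a simplified stdlib sketch with invented dates, not AutoML Tables' internal implementation.

```python
# Sketch of a chronological 80/10/10 split keyed on a time column, as used
# when a Time column is specified for tabular training data. Dates and the
# "ltv" values are illustrative.

def time_split(rows: list, time_key: str = "event_time"):
    """Sort rows by their time column, then split 80/10/10 chronologically."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    n = len(ordered)
    train_end = int(n * 0.8)
    val_end = int(n * 0.9)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]

rows = [{"event_time": f"2023-01-{d:02d}", "ltv": d} for d in range(1, 11)]
train, val, test = time_split(rows)
print(len(train), len(val), len(test))
```

Evaluating on the most recent slice mirrors the real prediction task: the model is judged on data from after its training window, exactly as it will be used to predict the next 30 days of LTV.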