Google Professional Machine Learning Engineer Exam (page: 3)
Google Professional Machine Learning Engineer
Updated on: 25-Aug-2025

You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on Al Platform for high-throughput online prediction.
Which architecture should you use?

  1. · Validate the accuracy of the model that you trained on preprocessed data · Create a new model that uses the raw data and is available in real time · Deploy the new model onto Al Platform for online prediction
  2. · Send incoming prediction requests to a Pub/Sub topic
    · Transform the incoming data using a Dataflow job
    · Submit a prediction request to Al Platform using the transformed data · Write the predictions to an outbound Pub/Sub queue
  3. · Stream incoming prediction request data into Cloud Spanner · Create a view to abstract your preprocessing logic.
    · Query the view every second for new records
    · Submit a prediction request to Al Platform using the transformed data · Write the predictions to an outbound Pub/Sub queue.
  4. · Send incoming prediction requests to a Pub/Sub topic
    · Set up a Cloud Function that is triggered when messages are published to the Pub/Sub topic.
    · Implement your preprocessing logic in the Cloud Function · Submit a prediction request to Al Platform using the transformed data · Write the predictions to an outbound Pub/Sub queue

Answer(s): D

Explanation:

Option A is incorrect because creating a new model that uses the raw data and is available in real time would require retraining the model and deploying it again, which is not efficient or scalable. Option B is incorrect because using a Dataflow job to transform the incoming data would introduce unnecessary latency and complexity for online prediction, which requires fast and simple processing. Option C is incorrect because using Cloud Spanner to stream and query the incoming data would incur high costs and overhead for online prediction, which does not need a relational database. Option D is correct because using a Cloud Function to preprocess the data and submit a prediction request to Al Platform is a simple and scalable solution for online prediction, which leverages the serverless and event-driven features of Cloud Functions.



You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets. Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to production, the model's accuracy dropped to 66%. How can you make your production model more accurate?

  1. Normalize the data for the training, and test datasets as two separate steps.
  2. Split the training and test data based on time rather than a random split to avoid leakage
  3. Add more data to your test set to ensure that you have a fair distribution and sample for testing
  4. Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and test sets.

Answer(s): B

Explanation:

When building a model to predict daily temperatures, it is important to split the training and test data based on time rather than a random split. This is because temperature data is likely to have temporal dependencies and patterns, such as seasonality, trends, and cycles. If the data is split randomly, there is a risk of data leakage, which occurs when information from the future is used to train or validate the model. Data leakage can lead to overfitting and unrealistic performance estimates, as the model may learn from data that it should not have access to. By splitting the data based on time, such as using the most recent data as the test set and the older data as the training set, the model can be evaluated on how well it can forecast future temperatures based on past data, which is the realistic scenario in production. Therefore, splitting the data based on time rather than a random split is the best way to make the production model more accurate.



You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention.
What should you do?

  1. Normalize the data using Google Kubernetes Engine
  2. Translate the normalization algorithm into SQL for use with BigQuery
  3. Use the normalizer_fn argument in TensorFlow's Feature Column API
  4. Normalize the data with Apache Spark using the Dataproc connector for BigQuery

Answer(s): B

Explanation:

Z-score normalization is a technique that transforms the values of a numeric variable into standardized units, such that the mean is zero and the standard deviation is one. Z-score normalization can help to compare variables with different scales and ranges, and to reduce the effect of outliers and skewness. The formula for z-score normalization is:
z (x - mu) / sigma where x is the original value, mu is the mean of the variable, and sigma is the standard deviation of the variable.
Dataflow is a service that allows you to create and run data processing pipelines on Google Cloud. You can use Dataflow to preprocess raw data prior to model training and prediction, such as applying z-score normalization on data stored in BigQuery. However, using Dataflow for this task may not be the most efficient option, as it involves reading and writing data from and to BigQuery, which can be time-consuming and costly. Moreover, using Dataflow requires manual intervention to update the pipeline whenever new training data is added.
A more efficient way to perform z-score normalization on data stored in BigQuery is to translate the normalization algorithm into SQL and use it with BigQuery. BigQuery is a service that allows you to analyze large-scale and complex data using SQL queries. You can use BigQuery to perform z-score normalization on your data using SQL functions such as AVG(), STDDEV_POP(), and OVER(). For example, the following SQL query can normalize the values of a column called temperature in a table called weather:
SELECT (temperature - AVG(temperature) OVER ()) / STDDEV_POP(temperature) OVER () AS normalized_temperature FROM weather;
By using SQL to perform z-score normalization on BigQuery, you can make the process more efficient by minimizing computation time and manual intervention. You can also leverage the scalability and performance of BigQuery to handle large and complex datasets. Therefore, translating the normalization algorithm into SQL for use with BigQuery is the best option for this use case.



You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge. How should you resolve the class imbalance problem?

  1. Use the class distribution to generate 10% positive examples
  2. Use a convolutional neural network with max pooling and softmax activation
  3. Downsample the data with upweighting to create a sample with 10% positive examples
  4. Remove negative examples until the numbers of positive and negative examples are equal

Answer(s): C

Explanation:

The class imbalance problem is a common challenge in machine learning, especially in classification tasks. It occurs when the distribution of the target classes is highly skewed, such that one class (the majority class) has much more examples than the other class (the minority class). The minority class is often the more interesting or important class, such as failure incidents, fraud cases, or rare diseases. However, most machine learning algorithms are designed to optimize the overall accuracy, which can be biased towards the majority class and ignore the minority class. This can result in poor predictive performance, especially for the minority class. There are different techniques to deal with the class imbalance problem, such as data-level methods, algorithm-level methods, and evaluation-level methods1. Data-level methods involve resampling the original dataset to create a more balanced class distribution. There are two main types of data-level methods: oversampling and undersampling. Oversampling methods increase the number of examples in the minority class, either by duplicating existing examples or by generating synthetic examples. Undersampling methods reduce the number of examples in the majority class, either by randomly removing examples or by using clustering or other criteria to select representative examples. Both oversampling and undersampling methods can be combined with upweighting or downweighting, which assign different weights to the examples according to their class frequency, to further balance the dataset.
For the use case of investigating failures of a production line component based on sensor readings, the best option is to downsample the data with upweighting to create a sample with 10% positive examples. This option involves randomly removing some of the negative examples (the majority class) until the ratio of positive to negative examples is 1:9, and then assigning higher weights to the positive examples to compensate for their low frequency. This option can create a more balanced dataset that can improve the performance of the classification models, while preserving the diversity and representativeness of the original data. This option can also reduce the computation time and memory usage, as the size of the dataset is reduced. Therefore, downsampling the data with upweighting to create a sample with 10% positive examples is the best option for this use case.


Reference:

A Systematic Study of the Class Imbalance Problem in Convolutional Neural Networks



You need to design a customized deep neural network in Keras that will predict customer purchases based on their purchase history. You want to explore model performance using multiple model architectures, store training data, and be able to compare the evaluation metrics in the same dashboard.
What should you do?

  1. Create multiple models using AutoML Tables
  2. Automate multiple training runs using Cloud Composer
  3. Run multiple training jobs on Al Platform with similar job names
  4. Create an experiment in Kubeflow Pipelines to organize multiple runs

Answer(s): D

Explanation:

Kubeflow Pipelines is a service that allows you to create and run machine learning workflows on Google Cloud using various features, model architectures, and hyperparameters. You can use Kubeflow Pipelines to scale up your workflows, leverage distributed training, and access specialized hardware such as GPUs and TPUs1. An experiment in Kubeflow Pipelines is a workspace where you can try different configurations of your pipelines and organize your runs into logical groups. You can use experiments to compare the performance of different models and track the evaluation metrics in the same dashboard2.
For the use case of designing a customized deep neural network in Keras that will predict customer purchases based on their purchase history, the best option is to create an experiment in Kubeflow Pipelines to organize multiple runs. This option allows you to explore model performance using multiple model architectures, store training data, and compare the evaluation metrics in the same dashboard. You can use Keras to build and train your deep neural network models, and then package them as pipeline components that can be reused and combined with other components. You can also use Kubeflow Pipelines SDK to define and submit your pipelines programmatically, and use Kubeflow Pipelines UI to monitor and manage your experiments. Therefore, creating an experiment in Kubeflow Pipelines to organize multiple runs is the best option for this use case.


Reference:

Kubeflow Pipelines documentation
Experiment | Kubeflow



Your team needs to build a model that predicts whether images contain a driver's license, passport, or credit card. The data engineering team already built the pipeline and generated a dataset composed of 10,000 images with driver's licenses, 1,000 images with passports, and 1,000 images with credit cards. You now have to train a model with the following label map: ['driversjicense', 'passport', 'credit_card'].
Which loss function should you use?

  1. Categorical hinge
  2. Binary cross-entropy
  3. Categorical cross-entropy
  4. Sparse categorical cross-entropy

Answer(s): C

Explanation:

Categorical cross-entropy is a loss function that is suitable for multi-class classification problems, where the target variable has more than two possible values. Categorical cross-entropy measures the difference between the true probability distribution of the target classes and the predicted probability distribution of the model. It is defined as:
L - sum(y_i * log(p_i))
where y_i is the true probability of class i, and p_i is the predicted probability of class i. Categorical cross-entropy penalizes the model for making incorrect predictions, and encourages the model to assign high probabilities to the correct classes and low probabilities to the incorrect classes. For the use case of building a model that predicts whether images contain a driver's license, passport, or credit card, categorical cross-entropy is the appropriate loss function to use. This is because the problem is a multi-class classification problem, where the target variable has three possible values: [`drivers_license', `passport', `credit_card']. The label map is a list that maps the class names to the class indices, such that `drivers_license' corresponds to index 0, `passport' corresponds to index 1, and `credit_card' corresponds to index 2. The model should output a probability distribution over the three classes for each image, and the categorical cross-entropy loss function should compare the output with the true labels. Therefore, categorical cross-entropy is the best loss function for this use case.



You are an ML engineer at a global car manufacturer. You need to build an ML model to predict car sales in different cities around the world.
Which features or feature crosses should you use to train city-specific relationships between car type and number of sales?

  1. Three individual features binned latitude, binned longitude, and one-hot encoded car type
  2. One feature obtained as an element-wise product between latitude, longitude, and car type
  3. One feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type
  4. Two feature crosses as a element-wise product the first between binned latitude and one-hot encoded car type, and the second between binned longitude and one-hot encoded car type

Answer(s): C

Explanation:

A feature cross is a synthetic feature that is obtained by combining two or more existing features, usually by taking their product or concatenation. A feature cross can help to capture the nonlinear and interaction effects between the original features, and improve the predictive performance of the model. A feature cross can be applied to different types of features, such as numeric, categorical, or geospatial features1.
For the use case of building an ML model to predict car sales in different cities around the world, the best option is to use one feature obtained as an element-wise product between binned latitude,

binned longitude, and one-hot encoded car type. This option involves creating a feature cross that combines three individual features: binned latitude, binned longitude, and one-hot encoded car type. Binning is a technique that transforms a continuous numeric feature into a discrete categorical feature by dividing its range into equal intervals, or bins. One-hot encoding is a technique that transforms a categorical feature into a binary vector, where each element corresponds to a possible category, and has a value of 1 if the feature belongs to that category, and 0 otherwise. By applying binning and one-hot encoding to the latitude, longitude, and car type features, the feature cross can capture the city-specific relationships between car type and number of sales, as each combination of bins and car types can represent a different city and its preference for a certain car type. For example, the feature cross can learn that a city with a latitude bin of [40, 50], a longitude bin of [-80, -70], and a car type of SUV has a higher number of sales than a city with a latitude bin of [-10, 0], a longitude bin of [10, 20], and a car type of sedan. Therefore, using one feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type is the best option for this use case.


Reference:

Feature Crosses | Machine Learning Crash Course



You trained a text classification model. You have the following SignatureDefs:



What is the correct way to write the predict request?

  1. data json.dumps({"signature_name": "serving_default'\ "instances": [fab', 'be1, 'cd']]})
  2. data json dumps({"signature_name": "serving_default"! "instances": [['a', 'b', "c", 'd', 'e', 'f']]})
  3. data json.dumps({"signature_name": "serving_default, "instances": [['a', 'b\ 'c'1, [d\ 'e\ T]]})
  4. data json dumps({"signature_name": f,serving_default", "instances": [['a', 'b'], [c\ 'd'], ['e\ T]]})

Answer(s): D

Explanation:

A predict request is a way to send data to a trained model and get predictions in return. A predict request can be written in different formats, such as JSON, protobuf, or gRPC, depending on the service and the platform that are used to host and serve the model. A predict request usually contains the following information:
The signature name: This is the name of the signature that defines the inputs and outputs of the model. A signature is a way to specify the expected format, type, and shape of the data that the model can accept and produce. A signature can be specified when exporting or saving the model, or it can be automatically inferred by the service or the platform. A model can have multiple signatures, but only one can be used for each predict request.
The instances: This is the data that is sent to the model for prediction. The instances can be a single instance or a batch of instances, depending on the size and shape of the data. The instances should match the input specification of the signature, such as the number, name, and type of the input tensors.
For the use case of training a text classification model, the correct way to write the predict request is D. data json.dumps({"signature_name": "serving_default", "instances": [[`a', `b'], [`c', `d'], [`e', `f']]}) This option involves writing the predict request in JSON format, which is a common and convenient format for sending and receiving data over the web. JSON stands for JavaScript Object Notation, and it is a way to represent data as a collection of name-value pairs or an ordered list of values. JSON can be easily converted to and from Python objects using the json module. This option also involves using the signature name "serving_default", which is the default signature name that is assigned to the model when it is saved or exported without specifying a custom signature name. The serving_default signature defines the input and output tensors of the model based on the SignatureDef that is shown in the image. According to the SignatureDef, the model expects an input tensor called "text" that has a shape of (-1, 2) and a type of DT_STRING, and produces an output tensor called "softmax" that has a shape of (-1, 2) and a type of DT_FLOAT. The -1 in the shape indicates that the dimension can vary depending on the number of instances, and the 2 indicates that the dimension is fixed at 2. The DT_STRING and DT_FLOAT indicate that the data type is string and float, respectively.
This option also involves sending a batch of three instances to the model for prediction. Each instance is a list of two strings, such as [`a', `b'], [`c', `d'], or [`e', `f']. These instances match the input specification of the signature, as they have a shape of (3, 2) and a type of string. The model will process these instances and produce a batch of three predictions, each with a softmax output that has a shape of (1, 2) and a type of float. The softmax output is a probability distribution over the two possible classes that the model can predict, such as positive or negative sentiment. Therefore, writing the predict request as data json.dumps({"signature_name": "serving_default", "instances": [[`a', `b'], [`c', `d'], [`e', `f']]}) is the correct and valid way to send data to the text classification model and get predictions in return.


Reference:

[json -- JSON encoder and decoder]



Viewing Page 3 of 37



Share your comments for Google Professional Machine Learning Engineer exam with other users:

jaya 12/17/2023 8:36:00 AM

good for students
UNITED STATES


Bsmaind 8/20/2023 9:23:00 AM

nice practice dumps
Anonymous


kumar 11/15/2023 11:24:00 AM

nokia 4a0-114 dumps
Anonymous


Vetri 10/3/2023 12:59:00 AM

great content and wonderful to have the answers with explanation
UNITED STATES


Ranjith 8/21/2023 3:39:00 PM

for question #118, the answer is option c. the screen shot is showing the drop down, but the answer is marked incorrectly please update . thanks for sharing such nice questions.
Anonymous


Eduardo Ramírez 12/11/2023 9:55:00 PM

the correct answer for the question 29 is d.
Anonymous


Dass 11/2/2023 7:43:00 AM

question no 22: correct answers: bc, 1 per session 1 per page 1 per component always
UNITED STATES


Reddy 12/14/2023 2:42:00 AM

these are pretty useful
Anonymous


Daisy Delgado 1/9/2023 1:05:00 PM

awesome
UNITED STATES


Atif 6/13/2023 4:09:00 AM

yes please upload
UNITED STATES


Xunil 6/12/2023 3:04:00 PM

great job whoever put this together, for the greater good! thanks!
Anonymous


Lakshmi 10/2/2023 5:26:00 AM

just started to view all questions for the exam
NETHERLANDS


rani 1/19/2024 11:52:00 AM

helpful material
Anonymous


Greg 11/16/2023 6:59:00 AM

hope for the best
UNITED STATES


hi 10/5/2023 4:00:00 AM

will post exam has finished
UNITED STATES


Vmotu 8/24/2023 11:14:00 AM

really correct and good analyze!
AZERBAIJAN


hicham 5/30/2023 8:57:00 AM

excellent thanks a lot
FRANCE


Suman C 7/7/2023 8:13:00 AM

will post once pass the cka exam
INDIA


Ram 11/3/2023 5:10:00 AM

good content
Anonymous


Nagendra Pedipina 7/13/2023 2:12:00 AM

q:32 answer has to be option c
INDIA


Tamer Barakat 12/7/2023 5:17:00 PM

nice questions
Anonymous


Daryl 8/1/2022 11:33:00 PM

i really like the support team in this website. they are fast in communication and very helpful.
UNITED KINGDOM


Curtis Nakawaki 6/29/2023 9:13:00 PM

a good contemporary exam review
UNITED STATES


x-men 5/23/2023 1:02:00 AM

q23, its an array, isnt it? starts with [ and end with ]. its an array of objects, not object.
UNITED STATES


abuti 7/21/2023 6:24:00 PM

cool very helpfull
Anonymous


Krishneel 3/17/2023 10:34:00 AM

i just passed. this exam dumps is the same one from prepaway and examcollection. it has all the real test questions.
INDIA


Regor 12/4/2023 2:01:00 PM

is this a valid prince2 practitioner dumps?
UNITED KINGDOM


asl 9/14/2023 3:59:00 PM

all are relatable questions
CANADA


Siyya 1/19/2024 8:30:00 PM

might help me to prepare for the exam
Anonymous


Ted 6/21/2023 11:11:00 PM

just paid and downlaod the 2 exams using the 50% sale discount. so far i was able to download the pdf and the test engine. all looks good.
GERMANY


Paul K 11/27/2023 2:28:00 AM

i think it should be a,c. option d goes against the principle of building anything custom unless there are no work arounds available
INDIA


ph 6/16/2023 12:41:00 AM

very legible
Anonymous


sephs2001 7/31/2023 10:42:00 PM

is this exam accurate or helpful?
Anonymous


ash 7/11/2023 3:00:00 AM

please upload dump, i have exam in 2 days
INDIA