Your team is building an application for a global bank that will be used by millions of customers. You built a forecasting model that predicts customers' account balances 3 days in the future. Your team will use the results in a new feature that will notify users when their account balance is likely to drop below $25. How should you serve your predictions?
Answer(s): D
Option D is correct because Firebase Cloud Messaging (FCM) is designed for scalable, per-user push notifications, which suits alerting an individual customer when the model predicts that their balance will cross the threshold. It supports targeting individual registered users and handles delivery to mobile and web clients globally.
A is incorrect because a Pub/Sub topic per user is neither scalable nor cost-effective for millions of users and adds maintenance overhead; Pub/Sub is better suited to decoupled, event-driven architectures than to direct per-user push notifications.
B is incorrect for the same reason as A: it creates an excessive number of topics, complicates authorization, and is not designed for delivering push notifications to end users.
C is incorrect because FCM already provides reliable per-user delivery as the integrated solution for user-targeted mobile notifications, so building a separate Firebase-based system is unnecessary.
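The per-user alerting logic described above can be sketched as follows. This is a minimal illustration, not the bank's implementation: the function names, the payload fields, and the `send` callback are hypothetical, and the actual FCM call is left to whatever thin wrapper around the FCM SDK you inject.

```python
from typing import Callable, Dict, List

THRESHOLD_USD = 25.0  # threshold taken from the scenario


def build_low_balance_alerts(
    predictions: Dict[str, float],
    device_tokens: Dict[str, str],
    threshold: float = THRESHOLD_USD,
) -> List[dict]:
    """Return one payload per user whose predicted balance is below threshold."""
    alerts = []
    for user_id, predicted_balance in predictions.items():
        token = device_tokens.get(user_id)
        if token is None or predicted_balance >= threshold:
            continue  # no registered device, or no predicted drop below threshold
        alerts.append({
            "token": token,  # hypothetical field: the user's FCM registration token
            "title": "Low balance warning",
            "body": f"Your balance may drop below ${threshold:.0f} within 3 days.",
        })
    return alerts


def send_alerts(alerts: List[dict], send: Callable[[dict], None]) -> int:
    """Hand each payload to `send` (assumed to wrap the FCM SDK) and count them."""
    for alert in alerts:
        send(alert)
    return len(alerts)
```

Note that the check is made per user against that user's own prediction, which is what makes per-user targeting the natural delivery mechanism.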
You work for an advertising company and want to understand the effectiveness of your company's latest advertising campaign. You have streamed 500 MB of campaign data into BigQuery. You want to query the table, and then manipulate the results of that query with a pandas dataframe in an AI Platform notebook. What should you do?
Answer(s): A
Option A is correct because AI Platform Notebooks supports the BigQuery cell magic, which runs a query and loads the results directly into a pandas DataFrame, integrating seamlessly with pandas workflows in notebooks.
B is incorrect because exporting to Google Drive and using the Drive API adds unnecessary steps and latency; it is not the most integrated or efficient path for notebook analysis.
C is incorrect because downloading a local CSV and uploading it to the notebook is manual, slower, and impractical for scalable analysis on large datasets.
D is incorrect because, although it uses a Cloud Storage export, it requires extra steps (gsutil, an intermediate CSV) and is not as streamlined as the built-in BigQuery cell magic for direct ingestion into pandas.
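As a rough sketch of what option A looks like in practice, the comments below show the cell magic form, and the function shows a plain-Python equivalent using the BigQuery client library. The table name and the `query_to_dataframe` helper are hypothetical; the `client` parameter exists only so the function can be exercised without credentials.

```python
# In a notebook, the cell magic form is (shown as comments because it is
# IPython syntax, not plain Python):
#
#   %load_ext google.cloud.bigquery
#
#   %%bigquery campaign_df
#   SELECT * FROM `my-project.ads.campaign_events`
#
# (`my-project.ads.campaign_events` is a hypothetical table name.)
# After the cell runs, `campaign_df` is a pandas DataFrame.


def query_to_dataframe(sql, client=None):
    """Run `sql` in BigQuery and return the result as a pandas DataFrame."""
    if client is None:
        # Imported lazily; requires the google-cloud-bigquery package
        # and application default credentials.
        from google.cloud import bigquery
        client = bigquery.Client()
    return client.query(sql).to_dataframe()
```

At 500 MB the result fits comfortably in notebook memory, which is part of why no intermediate export is needed.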
You are an ML engineer at a global car manufacturer. You need to build an ML model to predict car sales in different cities around the world. Which features or feature crosses should you use to train city-specific relationships between car type and number of sales?
Answer(s): C
Option C is correct because the element-wise product of binned latitude, binned longitude, and one-hot encoded car type creates a single crossed feature that captures the interaction between city-level geography and car type, enabling the model to learn distinct sales patterns per city and type. This matches how feature crosses are used to model non-linear interactions in tabular data without making the feature space unmanageably large.
A is incorrect because separate bin features plus car type do not explicitly encode their interactions.
B is incorrect because crossing raw latitude and longitude with car type lacks the geographic granularity that binning provides.
D is incorrect because two separate crosses (latitude × type and longitude × type) do not jointly capture the three-way interaction as effectively as a single three-way cross.
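To make the three-way cross concrete, here is a small standard-library sketch. The bin counts, the helper names, and the car type list are illustrative assumptions; a real pipeline would normally use a framework's crossing utilities. The resulting vector is the flattened product of the three one-hot encodings, so exactly one position is set per (latitude bin, longitude bin, car type) combination.

```python
from typing import List


def bin_index(value: float, low: float, high: float, n_bins: int) -> int:
    """Map a value in [low, high] to a bin index in [0, n_bins - 1]."""
    width = (high - low) / n_bins
    idx = int((value - low) // width)
    return min(max(idx, 0), n_bins - 1)  # clamp so value == high lands in the last bin


def crossed_feature(
    lat: float,
    lon: float,
    car_type: str,
    car_types: List[str],
    n_lat_bins: int = 4,   # illustrative bin counts, far too coarse for real cities
    n_lon_bins: int = 8,
) -> List[int]:
    """One-hot vector for the cross: binned lat x binned lon x car type."""
    lat_bin = bin_index(lat, -90.0, 90.0, n_lat_bins)
    lon_bin = bin_index(lon, -180.0, 180.0, n_lon_bins)
    type_idx = car_types.index(car_type)
    size = n_lat_bins * n_lon_bins * len(car_types)
    position = (lat_bin * n_lon_bins + lon_bin) * len(car_types) + type_idx
    vector = [0] * size
    vector[position] = 1
    return vector
```

Because each grid cell and car type pair gets its own weight, the model can learn, for example, that one car type sells well in one cell and poorly in another, which separate features cannot express.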
You work for a large technology company that wants to modernize their contact center. You have been asked to develop a solution to classify incoming calls by product so that requests can be more quickly routed to the correct support team. You have already transcribed the calls using the Speech-to-Text API. You want to minimize data preprocessing and development time. How should you build the model?
Answer(s): B
Option B is correct because AutoML Natural Language can learn from labeled text data (the transcripts) to perform custom text classification with minimal preprocessing and development time, which fits a rapid deployment workflow for routing by product.
A is incorrect because the built-in AI Platform Training algorithms require more model selection, feature engineering, and tuning, increasing development effort for a text classification task.
C is incorrect because the Cloud Natural Language API extracts general entities; it is not a trainable custom classifier tailored to the product categories in your data.
D is incorrect because a keyword-based classifier is brittle and takes significant effort to make robust for product routing, whereas AutoML Natural Language learns end to end from the data.
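The "minimal preprocessing" claim can be illustrated with a sketch of the only preparation step that is typically needed: pairing each transcript with its product label in a CSV file. The two-column `text,label` layout and the helper name are assumptions made for illustration; check the import format required by the product version you use.

```python
import csv
import io


def transcripts_to_training_csv(examples):
    """Build CSV text from (transcript_text, product_label) pairs.

    The two-column layout is an assumed import format, not a confirmed one.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for text, label in examples:
        cleaned = " ".join(text.split())  # collapse newlines and repeated spaces only
        if cleaned:  # skip empty transcripts
            writer.writerow([cleaned, label])
    return buffer.getvalue()
```

No tokenization, stemming, or feature extraction appears here, which is the contrast with the custom-model options.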
You are training a TensorFlow model on a structured dataset with 100 billion records stored in several CSV files. You need to improve the input/output execution performance. What should you do?
Answer(s): C
Option C is correct because TFRecords are optimized for TensorFlow data pipelines, enabling efficient streaming and parallel prefetching when stored in Cloud Storage. Converting the CSV files to TFRecord shards reduces parsing overhead and aligns the data layout with TensorFlow's input pipeline (tf.data), improving I/O throughput for large-scale training.
A is incorrect because BigQuery is optimized for analytical queries, not for streaming training data into TensorFlow; reading from BigQuery adds unnecessary ingestion overhead.
B is incorrect because Cloud Bigtable is a NoSQL database optimized for random access, not for the bulk sequential reads that a large training input pipeline performs.
D is incorrect because HDFS is not a native, managed Google Cloud storage option and adds operational complexity without benefits over Cloud Storage.
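A minimal sketch of the conversion and of the reading side, assuming every listed column is numeric and that the CSV is readable from the local filesystem. The helper names and the shard naming scheme are illustrative; at 100 billion records the conversion itself would run as a distributed job rather than in a single process like this.

```python
import csv


def shard_path(prefix, index, num_shards):
    """Name shards like 'train-00003-of-00010.tfrecord' (an illustrative scheme)."""
    return f"{prefix}-{index:05d}-of-{num_shards:05d}.tfrecord"


def csv_to_tfrecord_shards(csv_path, prefix, num_shards, float_columns):
    """Write the rows of one CSV file round-robin into TFRecord shards."""
    import tensorflow as tf  # imported lazily; requires the tensorflow package

    writers = [
        tf.io.TFRecordWriter(shard_path(prefix, i, num_shards))
        for i in range(num_shards)
    ]
    try:
        with open(csv_path, newline="") as f:
            for row_number, row in enumerate(csv.DictReader(f)):
                feature = {
                    name: tf.train.Feature(
                        float_list=tf.train.FloatList(value=[float(row[name])])
                    )
                    for name in float_columns
                }
                example = tf.train.Example(features=tf.train.Features(feature=feature))
                writers[row_number % num_shards].write(example.SerializeToString())
    finally:
        for writer in writers:
            writer.close()


def make_dataset(file_pattern, batch_size):
    """Read shards in parallel and prefetch so I/O overlaps with training."""
    import tensorflow as tf

    files = tf.data.Dataset.list_files(file_pattern)
    dataset = tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Sharding matters as much as the format: many moderately sized files let `num_parallel_reads` spread reads across shards instead of serializing on one large file.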
As the lead ML Engineer for your company, you are responsible for building ML models to digitize scanned customer forms. You have developed a TensorFlow model that converts the scanned images into text and stores them in Cloud Storage. You need to use your ML model on the aggregated data collected at the end of each day with minimal manual intervention. What should you do?
Answer(s): A
Option A is correct because batch prediction in Vertex AI is designed for offline scoring of large datasets stored in Cloud Storage and needs minimal manual intervention once it is scheduled or triggered. It fits end-of-day aggregated data workflows well.
B is incorrect because a serving pipeline in Compute Engine implies online, low-latency inference for individual requests, not efficient batch processing of daily aggregates.
C is incorrect because Cloud Functions are event-driven and best suited to small-scale, real-time predictions, not scalable batch workloads.
D is incorrect because deploying online infrastructure for each model version is unnecessary for daily batch inference and adds management overhead; batch prediction is more cost-effective and scalable for this use case.
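A daily batch job along these lines could be submitted with the Vertex AI SDK, as sketched below. The bucket layout, the display name, the JSONL input format, and the machine type are assumptions chosen for illustration; a scheduler would call `submit_daily_batch_job` once per day.

```python
import datetime


def daily_input_pattern(bucket, day):
    """Cloud Storage pattern for one day's inputs (hypothetical layout)."""
    return f"gs://{bucket}/scans/{day:%Y/%m/%d}/*.jsonl"


def daily_output_prefix(bucket, day):
    """Cloud Storage prefix for one day's predictions (hypothetical layout)."""
    return f"gs://{bucket}/predictions/{day:%Y/%m/%d}/"


def submit_daily_batch_job(model_resource_name, bucket, day):
    """Start a Vertex AI batch prediction job over one day's aggregated data."""
    # Imported lazily; requires the google-cloud-aiplatform package.
    from google.cloud import aiplatform

    model = aiplatform.Model(model_resource_name)
    return model.batch_predict(
        job_display_name=f"forms-{day:%Y%m%d}",
        gcs_source=daily_input_pattern(bucket, day),
        gcs_destination_prefix=daily_output_prefix(bucket, day),
        instances_format="jsonl",       # assumed input format
        machine_type="n1-standard-4",   # assumed machine type
        sync=False,                     # return without waiting for completion
    )
```

No endpoint is deployed at any point, which is where the cost and maintenance advantage over the online-serving options comes from.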
You recently joined an enterprise-scale company that has thousands of datasets. You know that there are accurate descriptions for each table in BigQuery, and you are searching for the proper BigQuery table to use for a model you are building on Vertex AI. How should you find the data that you need?
Answer(s): A
Option A is correct because Data Catalog acts as a centralized metadata repository for Google Cloud resources, including BigQuery datasets and tables, and supports keyword search over descriptions to locate relevant data for model training in Vertex AI.
B is incorrect because tagging Vertex AI resources helps with lineage but does not efficiently locate data across thousands of datasets; it also requires prior tagging and does not use the existing table descriptions for discovery.
C is incorrect because a manually maintained lookup table adds maintenance overhead, risks falling out of sync, and is not a first-class discovery mechanism.
D is incorrect because, although INFORMATION_SCHEMA lists tables, it is scoped per project and is not optimized for searching descriptions; it requires manual filtering rather than metadata-driven search.
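The keyword search described above might look like the following with the Data Catalog client library. The helper names are hypothetical, and the qualifiers in the query string (`type=table`, `system=bigquery`, `description:`) reflect my understanding of the search syntax, so verify them against the current documentation before relying on them.

```python
def build_table_query(keywords):
    """Compose a search string restricted to BigQuery tables.

    Space-separated terms are assumed to be combined with a logical AND.
    """
    terms = ["type=table", "system=bigquery"]
    terms += [f"description:{word}" for word in keywords]
    return " ".join(terms)


def search_tables(project_id, keywords):
    """Return linked resource names of tables whose descriptions match."""
    # Imported lazily; requires the google-cloud-datacatalog package.
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    results = client.search_catalog(
        request={
            "scope": {"include_project_ids": [project_id]},
            "query": build_table_query(keywords),
        }
    )
    return [result.linked_resource for result in results]
```

This only works because the scenario states that the table descriptions are accurate; the search is only as good as the metadata behind it.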
You started working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of 99% for training data after just a few experiments. You haven't explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?
Answer(s): B
Option B is correct because a near-perfect AUC on training data with minimal modeling effort strongly suggests data leakage; nested cross-validation helps detect leakage and provides an unbiased estimate of model performance.
A is incorrect because switching to a less complex algorithm mitigates overfitting by reducing capacity but does not address leakage, which is the likely issue here.
C is incorrect because, although removing features highly correlated with the target can mitigate leakage, the first priority in this scenario is detecting the leakage, not pruning features without validation.
D is incorrect because tuning hyperparameters to reduce AUC contradicts the goal of obtaining an unbiased performance estimate and does not address leakage.
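For time series, the nested scheme only guards against leakage if every split respects time order. The sketch below generates such splits with the standard library; the function names and the equal-sized forward-chaining folds are illustrative choices, not a prescribed method. The inner splits would be used for model selection and the outer test fold for the final estimate.

```python
def forward_splits(n_samples, n_folds):
    """Yield (train_idx, test_idx) where every train index precedes every test index."""
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        # The last fold absorbs any remainder so no sample is dropped.
        test_end = n_samples if k == n_folds else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


def nested_forward_splits(n_samples, outer_folds, inner_folds):
    """Yield (outer_train, outer_test, inner_splits) for nested validation.

    Inner splits are drawn only from the outer training window, so no
    information from the outer test period reaches model selection.
    """
    for outer_train, outer_test in forward_splits(n_samples, outer_folds):
        inner = list(forward_splits(len(outer_train), inner_folds))
        yield outer_train, outer_test, inner
```

If the AUC on the outer test folds falls far below the 99% seen on training data, that gap is the signal that something in the original setup leaked future or target information.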