You are preparing to train a regression model via automated machine learning. The available data has features with missing values, as well as categorical features with few discrete values.
You want to make sure that automated machine learning is configured as follows:
- Missing values must be automatically imputed.
- Categorical features must be encoded as part of the training task.
Which of the following actions should you take?
A. Set the featurization parameter to 'auto'.
B. Set the featurization parameter to 'off'.
C. Set the featurization parameter to 'on'.
D. Set the featurization parameter to 'FeaturizationConfig'.
Answer(s): A
Explanation:
featurization: str or FeaturizationConfig
Values: 'auto' / 'off' / FeaturizationConfig
Indicates whether the featurization step should be done automatically ('auto'), skipped ('off'), or customized (a FeaturizationConfig object).
With automatic featurization, the column type is detected automatically. Based on the detected column type, preprocessing/featurization is done as follows:
Categorical: Target encoding, one hot encoding, drop high cardinality categories, impute missing values.
Numeric: Impute missing values, cluster distance, weight of evidence.
DateTime: Several features such as day, seconds, minutes, hours etc.
Text: Bag of words, pre-trained Word embedding, text target encoding.
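As an illustration, the accepted string values can be sketched in Python. This is a minimal sketch, not the SDK's own validation logic: the helper build_automl_settings and the names train_ds and "target" are hypothetical, and the primary metric is an assumed choice for a regression task. The keyword names (task, primary_metric, featurization, training_data, label_column_name) are AutoMLConfig parameters.

```python
# Hypothetical helper (not part of the Azure ML SDK): builds keyword
# arguments that could be passed to AutoMLConfig for a regression run.
ALLOWED_FEATURIZATION_STRINGS = ("auto", "off")


def build_automl_settings(featurization="auto"):
    """Return AutoMLConfig keyword arguments for a regression task."""
    # Strings other than 'auto' or 'off' (for example 'on') are rejected here;
    # a FeaturizationConfig object would be passed through unchanged.
    if isinstance(featurization, str) and featurization not in ALLOWED_FEATURIZATION_STRINGS:
        raise ValueError(
            f"featurization must be one of {ALLOWED_FEATURIZATION_STRINGS} "
            f"or a FeaturizationConfig object, got {featurization!r}"
        )
    return {
        "task": "regression",
        # Assumed metric choice for illustration only.
        "primary_metric": "normalized_root_mean_squared_error",
        # 'auto' lets AutoML impute missing values and encode categoricals.
        "featurization": featurization,
    }


settings = build_automl_settings("auto")

# With the Azure ML SDK installed and a dataset available, the settings
# could be used roughly like this (train_ds and "target" are hypothetical):
#
#   from azureml.train.automl import AutoMLConfig
#   automl_config = AutoMLConfig(
#       training_data=train_ds,
#       label_column_name="target",
#       **settings,
#   )
```

Calling build_automl_settings("on") raises ValueError in this sketch, which mirrors the reason the 'on' option is not a valid choice: the documented values are 'auto', 'off', or a FeaturizationConfig object.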
Reference:
https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig
