Your team is building a data engineering and data science development environment.
The environment must support the following requirements:
- Support Python and Scala.
- Compose data storage, movement, and processing services into automated data pipelines.
- Use the same tool to orchestrate both data engineering and data science work.
- Support workload isolation and interactive workloads.
- Enable scaling across a cluster of machines.
You need to create the environment.
What should you do?
- A. Build the environment in Apache Hive for HDInsight and use Azure Data Factory for orchestration.
- B. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.
- C. Build the environment in Apache Spark for HDInsight and use Azure Container Instances for orchestration.
- D. Build the environment in Azure Databricks and use Azure Container Instances for orchestration.
Answer(s): B
Explanation:
Azure Databricks supports two cluster modes (see the cluster-creation sketch after this list):
- Standard: the default mode; supports Python, R, Scala, and SQL, and can autoscale across worker nodes.
- High Concurrency: optimized for concurrent, interactive workloads and provides workload isolation; supports Python, R, and SQL.
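For illustration, the minimal sketch below creates a Standard autoscaling cluster through the Databricks Clusters REST API (2.0). The workspace URL, access token, runtime version, and node type are placeholder assumptions, not values taken from the question.

```python
# Minimal sketch: create a Standard autoscaling Databricks cluster via the
# Clusters REST API (2.0). Workspace URL, token, Spark runtime, and node type
# are hypothetical placeholders -- substitute values from your own workspace.
import requests

WORKSPACE_URL = "https://<your-workspace>.azuredatabricks.net"  # assumption
TOKEN = "<personal-access-token>"                               # assumption

cluster_spec = {
    "cluster_name": "data-eng-and-ds",
    "spark_version": "13.3.x-scala2.12",   # example runtime; check your workspace
    "node_type_id": "Standard_DS3_v2",     # example Azure VM size
    "autoscale": {"min_workers": 2, "max_workers": 8},  # scale across machines
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])
```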
Azure Databricks is tightly integrated with Azure Data Factory: Data Factory pipelines can run Databricks notebook, JAR, and Python activities, so a single orchestration tool covers both data engineering and data science workloads.
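To show how the pieces fit together, here is a minimal PySpark sketch of a notebook cell that a Data Factory Databricks Notebook activity could execute on a schedule. The storage path, table name, and column names are assumptions used only for illustration.

```python
# Minimal PySpark sketch of a Databricks notebook cell that a Data Factory
# Databricks Notebook activity could run. Paths, columns, and table names are
# hypothetical -- adjust to your own storage layout.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Data engineering step: read raw events and aggregate them across the cluster.
raw = spark.read.json("abfss://raw@mystorageaccount.dfs.core.windows.net/events/")  # assumed path
daily = (
    raw.withColumn("event_date", F.to_date("event_time"))
       .groupBy("event_date", "event_type")
       .count()
)

# Persist the curated table for downstream data science notebooks to consume.
daily.write.mode("overwrite").format("delta").saveAsTable("curated.daily_event_counts")
```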
Incorrect Answers:
D: Azure Container Instances is suited to development and testing scenarios rather than production workloads, and it is not a pipeline orchestration service.
Reference:
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/data-science-and-machine-learning