Which of the following is one of the big performance advantages that Spark has over Hadoop?
- Spark achieves great performance by storing data in the DAG format, whereas Hadoop can only use parquet files.
- Spark achieves higher resiliency for queries since, different from Hadoop, it can be deployed on Kubernetes.
- Spark achieves great performance by storing data and performing computation in memory, whereas large jobs in Hadoop require a large amount of relatively slow disk I/O operations.
- Spark achieves great performance by storing data in the HDFS format, whereas Hadoop can only use parquet files.
- Spark achieves performance gains for developers by extending Hadoop's DataFrames with a user-friendly API.
Answer(s): C
Explanation:
Spark achieves great performance by storing data in the DAG format, whereas Hadoop can only use parquet files.
Wrong. There is no "DAG format". DAG stands for "directed acyclic graph", and the DAG is how Spark represents the computational steps of a job. It is true, however, that Hadoop MapReduce does not use a DAG.
Spark introduced the DAG in response to a limitation of Hadoop's MapReduce framework, in which data had to be written to and read from disk between processing steps.
Graph DAG in Apache Spark - DataFlair
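To make the disk-versus-memory point concrete, here is a toy Python model. It is plain Python, not Spark or Hadoop code, and the helper names `run_mapreduce_style` and `run_dag_style` are hypothetical. The same three transformations run twice: once writing each intermediate result to a file and reading it back before the next step (roughly what chained MapReduce jobs do), and once keeping intermediates in memory (the idea behind Spark's in-memory execution). It only counts disk round trips and does not measure real Spark or Hadoop performance.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

# Toy model only: plain Python, not the Spark or Hadoop API.
Stage = Callable[[List[int]], List[int]]


def run_mapreduce_style(data: List[int], stages: List[Stage], workdir: str) -> Tuple[List[int], int]:
    """Hypothetical helper: each stage's output is written to disk and
    read back by the next stage, like a chain of MapReduce jobs."""
    disk_round_trips = 0
    current = data
    for i, stage in enumerate(stages):
        result = stage(current)
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump(result, f)  # materialize the intermediate result on disk
        with open(path) as f:
            current = json.load(f)  # the next stage starts by reading it back
        disk_round_trips += 1
    return current, disk_round_trips


def run_dag_style(data: List[int], stages: List[Stage]) -> Tuple[List[int], int]:
    """Hypothetical helper: intermediates stay in memory between stages."""
    current = data
    for stage in stages:
        current = stage(current)  # no disk I/O between stages
    return current, 0


stages: List[Stage] = [
    lambda xs: [x * 2 for x in xs],            # "map": double every value
    lambda xs: [x for x in xs if x % 3 == 0],  # "filter": keep multiples of 3
    lambda xs: [sum(xs)],                      # "reduce": sum everything
]

with tempfile.TemporaryDirectory() as tmp:
    mr_result, mr_io = run_mapreduce_style(list(range(10)), stages, tmp)
dag_result, dag_io = run_dag_style(list(range(10)), stages)

print(mr_result, mr_io)    # [36] 3
print(dag_result, dag_io)  # [36] 0
```

Both runs produce the same answer; the only difference is the three disk round trips in the MapReduce-style run. On a real cluster those intermediate writes typically go to HDFS, which adds network and replication costs that this local sketch does not model.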
Spark achieves great performance by storing data in the HDFS format, whereas Hadoop can only use parquet files.
No. HDFS (the Hadoop Distributed File System) is a file system, not a file format. Spark can certainly store data in HDFS (as well as in other storage systems), but this is not a key performance advantage over Hadoop. Hadoop can also use multiple file formats, not only parquet.
Spark achieves higher resiliency for queries since, different from Hadoop, it can be deployed on Kubernetes.
No. Resiliency is not what the question asks about; the question is about performance improvements. Both Hadoop and Spark can be deployed on Kubernetes.
Spark achieves performance gains for developers by extending Hadoop's DataFrames with a user-friendly API.
No. DataFrames are a concept in Spark, but not in Hadoop.