You plan to create a Microsoft Azure Data Factory pipeline that will connect to an Azure HDInsight cluster that uses Apache Spark.
You need to recommend which file format must be used by the pipeline. The solution must meet the following requirements:
Store data in a columnar format
Support compression
Which file format should you recommend?
A. XML
B. AVRO
C. text
D. Parquet
Correct Answer: D
Explanation/Reference:
Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language. Apache Parquet supports compression.
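To make the two requirements concrete, here is a minimal sketch of the difference between a row-oriented and a column-oriented layout, with compression applied to each. It is a toy illustration using only the Python standard library, not the Parquet file format itself. The sample records, field names, and function names are made up for this example.

```python
import json
import zlib

# Hypothetical sample records; the field names are assumptions for illustration.
ROWS = [
    {
        "id": i,
        "country": ["US", "DE", "JP"][i % 3],
        "status": "active" if i % 5 else "inactive",
    }
    for i in range(1000)
]


def encode_row_oriented(rows):
    """Serialize record by record, as row-based formats such as text or Avro do."""
    return json.dumps(rows).encode("utf-8")


def encode_column_oriented(rows):
    """Serialize field by field, so all values of one column sit together."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    return json.dumps(columns).encode("utf-8")


def decode_column_oriented(payload):
    """Rebuild the original records from the column-oriented payload."""
    columns = json.loads(payload.decode("utf-8"))
    names = list(columns)
    return [
        dict(zip(names, values))
        for values in zip(*(columns[name] for name in names))
    ]


# Apply the same general-purpose compression (zlib) to both layouts.
row_bytes = zlib.compress(encode_row_oriented(ROWS))
col_bytes = zlib.compress(encode_column_oriented(ROWS))

print("row-oriented, compressed:   ", len(row_bytes), "bytes")
print("column-oriented, compressed:", len(col_bytes), "bytes")

# In the columnar layout, one column can be read without touching the others.
countries = json.loads(encode_column_oriented(ROWS))["country"]
```

Columnar layouts tend to compress well because similar values sit next to each other, and a query that needs only a few columns can skip the rest. The exact sizes printed above depend on the data, so treat them as an illustration, not a benchmark.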
Incorrect Answers:
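A: XML is a hierarchical markup format, not a columnar one, and it is not among the Azure Data Factory file formats listed in the note below.
B: Avro supports compression, but it is a row-based format, not a columnar one.
C: The text format is row-based, not columnar. Compressing a text file does not change its layout.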
Note: Azure Data Factory supports the following file format types:
Text format
JSON format
Parquet format
ORC format
Avro format
References: https://parquet.apache.org/