You are creating a new pipeline in Google Cloud to stream IoT data from Cloud Pub/Sub through Cloud Dataflow to BigQuery. While previewing the data, you notice that roughly 2% of the data appears to be corrupt. You need to modify the Cloud Dataflow pipeline to filter out this corrupt data. What should you do?
A. Add a SideInput that returns a Boolean if the element is corrupt.
B. Add a ParDo transform in Cloud Dataflow to discard corrupt elements.
C. Add a Partition transform in Cloud Dataflow to separate valid data from corrupt data.
D. Add a GroupByKey transform in Cloud Dataflow to group all of the valid data together and discard the rest.
B
Correct answer is B.
The question asks you to filter out the corrupt data, so use a ParDo transform with a DoFn that discards bad records, which leaves a PCollection of good data. If you wanted to keep the corrupt data instead, option C would also work, since Partition can split the PCollection into two (valid and corrupt) based on a per-record check.
B. ParDo is a Beam transform for generic parallel processing. ParDo is useful for common data processing operations, including:
a. Filtering a data set. You can use ParDo to consider each element in a PCollection and either output that element to a new collection or discard it.
b. Formatting or type-converting each element in a data set.
c. Extracting parts of each element in a data set.
d. Performing computations on each element in a data set.
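A minimal sketch of the filtering DoFn that option B describes. It is written in plain Python so it runs without Beam installed; in a real pipeline you would subclass `beam.DoFn` and apply it with `beam.ParDo`. The `is_corrupt` check and the record fields are hypothetical placeholders, not part of the question.

```python
def is_corrupt(record):
    # Hypothetical corruption check: here, a record missing "device_id"
    # is treated as corrupt. Replace with your own schema validation.
    return "device_id" not in record

class FilterCorrupt:
    """Mimics a Beam DoFn: process() yields zero or one output per input."""
    def process(self, element):
        # Yield only valid elements; corrupt ones are simply dropped,
        # which is exactly the "filter" use of ParDo.
        if not is_corrupt(element):
            yield element

def run_pardo(dofn, pcollection):
    # Stand-in for beam.ParDo: apply process() to every element.
    for element in pcollection:
        yield from dofn.process(element)

records = [
    {"device_id": "d1", "temp": 21.5},
    {"temp": 19.0},                      # corrupt: no device_id
    {"device_id": "d2", "temp": 22.1},
]
clean = list(run_pardo(FilterCorrupt(), records))  # only the two valid records remain
```

In the Beam SDK itself this would be `pcoll | beam.ParDo(FilterCorrupt())`, or even just `beam.Filter(lambda r: not is_corrupt(r))` for a simple predicate.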
A. A side input alone does not filter anything; you would still need a transform to act on the Boolean, so this does not help.
C. Partition is a Beam transform for PCollection objects that store the same data type. Partition splits a single PCollection into a fixed number of smaller collections. It separates the corrupt data rather than discarding it, so it does not directly answer the question.
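For completeness, a sketch of the Partition approach mentioned above, again in plain Python so it runs standalone (a real pipeline would use `beam.Partition` with a partition function returning an index in `0..n-1`). The corruption check is a hypothetical example.

```python
def partition_fn(record, num_partitions):
    # Route valid records to partition 0 and corrupt ones to partition 1.
    # Hypothetical check: a record missing "device_id" counts as corrupt.
    return 1 if "device_id" not in record else 0

def run_partition(fn, num_partitions, pcollection):
    # Stand-in for beam.Partition: bucket each element by fn's index.
    outputs = [[] for _ in range(num_partitions)]
    for element in pcollection:
        outputs[fn(element, num_partitions)].append(element)
    return outputs

records = [
    {"device_id": "d1", "temp": 21.5},
    {"temp": 19.0},                      # corrupt: no device_id
    {"device_id": "d2", "temp": 22.1},
]
valid, corrupt = run_partition(partition_fn, 2, records)
```

This is the shape you would want if, as the last comment suggests, the corrupt records should be kept (e.g. written to a dead-letter table) instead of discarded.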
D. GroupByKey is a Beam transform for processing collections of key/value pairs. It is a way to aggregate data that shares a key, not a filtering tool.
Hence, B.
The data is corrupt and of no use anyway. Answer: B.
Ideally, we should not discard corrupt elements. We should keep the corrupt data separately for later analysis.