What should you do?

Your team is responsible for developing and maintaining ETL pipelines in your company. One of your Dataflow jobs is failing because of errors in the input data, and you need to improve the reliability of the pipeline (including being able to reprocess all failing data).
What should you do?
A. Add a filtering step to skip these types of errors in the future, extract erroneous rows from logs.
B. Add a try… catch block to your DoFn that transforms the data, extract erroneous rows from logs.
C. Add a try… catch block to your DoFn that transforms the data, write erroneous rows to PubSub directly from the DoFn.
D. Add a try… catch block to your DoFn that transforms the data, use a sideOutput to create a PCollection that can be stored to PubSub later.

6 thoughts on “What should you do?”

  1. The answer is C. Check this post: https://medium.com/google-cloud/dead-letter-queues-simple-implementation-strategy-for-cloud-pub-sub-80adf4a4a800
    The error records are written directly to PubSub from the DoFn (or its equivalent in Python).
    You cannot directly write a PCollection to PubSub. You have to extract each record and write one at a time. Why do the additional work instead of writing it with PubSubIO in the DoFn itself?
    You can write the whole PCollection to BigQuery though, as explained in https://cloud.google.com/blog/products/gcp/handling-invalid-inputs-in-dataflow

  2. OK, this is a dos-and-don'ts question. As it says, if the failure is within the processing code of a DoFn, one way to handle this is to catch the exception, log an error, and then drop the input. But just logging the elements isn’t ideal because it doesn’t provide an easy way to see these malformed inputs and reprocess them later. So we can use a side output or send them somewhere else.

    A and B are out: you will not have the failing rows to work with later, which is a requirement.
    So it is between C and D: do we send the failing rows directly to PubSub, or do we set up a side output? Both seem valid to me, but D looks better and easier. Not 100% sure.
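
To make the catch-and-route idea in options C and D concrete, here is a minimal plain-Python sketch of the routing logic, with no Beam dependency. The `transform` function, the tag names, and the sample rows are all hypothetical. In an actual Beam Python pipeline the same try/except would presumably sit in `DoFn.process`, yielding `pvalue.TaggedOutput` for failing rows and splitting the results with `ParDo(...).with_outputs(...)`; this sketch only imitates that split with a dictionary.

```python
import json

# Hypothetical tag names for the two outputs.
MAIN, DEAD_LETTER = "main", "dead_letter"


def transform(raw):
    """Hypothetical transform: parse a JSON row and require an integer 'id'."""
    row = json.loads(raw)
    return {"id": int(row["id"])}


def route(raw):
    """Yield (tag, value): the transformed row, or the raw input plus the error."""
    try:
        yield MAIN, transform(raw)
    except Exception as exc:  # broad on purpose: any failing row is kept, not dropped
        yield DEAD_LETTER, {"raw": raw, "error": repr(exc)}


def run(rows):
    """Collect routed elements per tag, imitating a main output and a side output."""
    outputs = {MAIN: [], DEAD_LETTER: []}
    for raw in rows:
        for tag, value in route(raw):
            outputs[tag].append(value)
    return outputs


results = run(['{"id": "1"}', "not json", '{"name": "x"}'])
```

The dead-letter entries keep the untouched raw input next to the error, so they can be stored (in PubSub, BigQuery, or elsewhere) and replayed after a fix, which is the reprocessing requirement that rules out A and B.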
