Which job type and transforms should this pipeline use?

You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery. The reference data is small enough to fit in memory on a single worker. The pipeline should write enriched results to BigQuery for analysis. Which job type and transforms should this pipeline use?
A. Batch job, PubSubIO, side-inputs
B. Streaming job, PubSubIO, JdbcIO, side-outputs
C. Streaming job, PubSubIO, BigQueryIO, side-inputs
D. Streaming job, PubSubIO, BigQueryIO, side-outputs



5 thoughts on “Which job type and transforms should this pipeline use?”

C is correct. Streaming data from Pub/Sub will be written to BigQuery. Static data from BigQuery will be the side input to this pipeline. Hence, C.

The pipeline should write enriched results to BigQuery for analysis.

Based on this, we need BigQueryIO, so C.

Nowhere is streaming data mentioned; the question mentions static reference data, which is nothing but batch. Hence A is right.

Arun says:

12/15/2019 at 1:19 PM

Static reference data from BigQuery will go in as side inputs, data from Pub/Sub will arrive as streaming data through PubSubIO, and finally BigQueryIO is required to push the enriched data to BigQuery. So option C is correct.
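For anyone who wants to see what option C could look like in practice, here is a minimal sketch using the Beam Python SDK. It is only an illustration under assumptions, not a reference implementation: the join field "key", the helper names enrich and run, and the topic, ref_query and output_table parameters are all hypothetical, and the messages are assumed to be UTF-8 JSON. The output table is assumed to exist already.

```python
import json


def enrich(event, reference):
    """Merge one event with its matching reference row, if any.

    `reference` is the side input: a plain dict held in worker memory.
    The join field name "key" is an assumption made for this sketch.
    """
    extra = reference.get(event["key"], {})
    return {**event, **extra}


def run(topic, ref_query, output_table):
    # Beam is imported here so enrich() can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Pub/Sub is an unbounded source, so the job runs in streaming mode.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        # Static reference data, read from BigQuery and keyed for AsDict.
        reference = (
            p
            | "ReadReference" >> beam.io.ReadFromBigQuery(
                query=ref_query, use_standard_sql=True)
            | "KeyReference" >> beam.Map(lambda row: (row["key"], row))
        )
        (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(topic=topic)
            | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            # The reference data is passed to every call as a side input.
            | "Enrich" >> beam.Map(
                enrich, reference=beam.pvalue.AsDict(reference))
            | "Write" >> beam.io.WriteToBigQuery(
                output_table,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )
```

The side input works here because the reference data fits in memory on a single worker, as the question states. Side outputs would be the wrong tool: they split one transform's output into several collections and do not bring extra data into it.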


