TerramEarth has equipped all connected trucks with servers and sensors to collect telemetry data. Next year they want to use the data to train machine learning models. They want to store this data in the cloud while reducing costs.
What should they do?
A. Have the vehicle’s computer compress the data in hourly snapshots, and store it in a Google Cloud Storage (GCS) Nearline bucket
B. Push the telemetry data in real-time to a streaming Dataflow job that compresses the data, and store it in Google BigQuery
C. Push the telemetry data in real-time to a streaming Dataflow job that compresses the data, and store it in Cloud Bigtable
D. Have the vehicle’s computer compress the data in hourly snapshots, and store it in a GCS Coldline bucket
It's D. The key points are ‘next year’ and minimal storage costs.
D is correct
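For anyone wondering what "compress the data in hourly snapshots" could look like on the vehicle side, here is a minimal sketch using only the Python standard library. The record fields, the function name `compress_hourly_snapshot`, and the object naming scheme are all made up for illustration; the question does not say how the trucks format their data, and the upload to the bucket is left out.

```python
import gzip
import json
from datetime import datetime, timezone


def compress_hourly_snapshot(readings, hour_start):
    """Serialize one hour of readings as newline-delimited JSON and gzip it.

    Returns (object_name, compressed_bytes). The naming scheme is hypothetical.
    """
    lines = (json.dumps(r, sort_keys=True) for r in readings)
    payload = "\n".join(lines).encode("utf-8")
    object_name = hour_start.strftime("telemetry/%Y/%m/%d/%H.jsonl.gz")
    return object_name, gzip.compress(payload)


# Hypothetical sample data: 1000 readings from one truck.
readings = [
    {"truck_id": "T-001", "sensor": "engine_temp", "value": 90.0 + i % 5}
    for i in range(1000)
]
hour = datetime(2024, 1, 1, 13, tzinfo=timezone.utc)
name, blob = compress_hourly_snapshot(readings, hour)

raw_size = len(gzip.decompress(blob))  # size before compression
print(name, raw_size, len(blob))
```

Repetitive telemetry like this tends to compress well, which is why compressing on the vehicle before upload reduces both transfer and storage volume.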
The answer should be B. Why would you collect data in Coldline when the purpose is to analyze it down the line? Cost-wise, storing such a large volume in BigQuery should be similar to Coldline once the data access cost is factored in. ML can be run directly on BigQuery, instead of pulling the data from Coldline into another database and then applying ML algorithms to it, which would be more expensive.
BigQuery can’t store compressed files.
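The cost disagreement in this thread comes down to storage cost over time plus the cost of reading the data back once for training. A rough way to compare options is sketched below. The function name and every rate in it are placeholders, NOT real Google Cloud prices; plug in the numbers from the current pricing pages before drawing any conclusion.

```python
def total_cost(size_gb, months, storage_rate, retrieval_rate=0.0, full_reads=0):
    """Estimate cost as storage over time plus retrieval of the full dataset.

    storage_rate:   price per GB per month (placeholder, not a real price)
    retrieval_rate: price per GB read back (placeholder, not a real price)
    full_reads:     how many times the whole dataset is read
    """
    storage = size_gb * storage_rate * months
    retrieval = size_gb * retrieval_rate * full_reads
    return storage + retrieval


# Placeholder rates for two hypothetical options, 12 months, one full read.
cheap_storage_costly_reads = total_cost(100, 12, storage_rate=0.5,
                                        retrieval_rate=1.0, full_reads=1)
costly_storage_free_reads = total_cost(100, 12, storage_rate=1.0)

print(cheap_storage_costly_reads, costly_storage_free_reads)
```

With data that sits untouched for a year and is then read a small number of times, the storage term usually dominates, which is the reasoning behind the answers favoring D.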