You have data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy for this data that minimizes cost. How should you configure the BigQuery table?
A. Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data.
B. Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.
C. Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the data.
D. Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.
There is no option called multi-regional at the dataset level, and a point-in-time snapshot only keeps 7 days of data.
So I choose B.
C
For exam purposes it should be C. However, in real life, I think most companies will keep some sort of backup outside BigQuery, no matter how much Google has promised.
Answer is option C.
Highly available = multi-regional (US or EU multi-regions)
storage, backup, and recovery strategy of this data that minimizes cost = let BigQuery do it (no manual backups).
C:
highly available = multi-regional:
https://cloud.google.com/bigquery/docs/locations
recovery strategy of this data that minimizes cost = point-in-time snapshot:
https://cloud.google.com/solutions/bigquery-data-warehouse#backup-and-recovery
C
Since we need to define a strategy for this data that minimizes cost, use a point-in-time snapshot to recover the data. A scheduled query that makes copies of the data increases the cost: you pay the query cost of scanning the entire table plus the storage cost of each new table.
https://medium.com/weareservian/impact-of-dataset-locations-on-bigquery-query-execution-performance-ea1ffdf071fe
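To make the cost argument above concrete, here is a rough sketch of the arithmetic. The function name and every price in the usage are placeholders I made up for illustration, not real BigQuery pricing; plug in the current numbers from the pricing page for your region.

```python
def scheduled_copy_monthly_cost(
    table_gb: float,
    backups_per_month: int,
    retained_copies: int,
    scan_price_per_gb: float,           # placeholder, not a real BigQuery price
    storage_price_per_gb_month: float,  # placeholder, not a real BigQuery price
) -> float:
    """Estimate the monthly cost of backing up a table with a scheduled query.

    Each backup scans the whole table once (query cost), and every copy
    that is kept around is billed as extra storage.
    """
    scan_cost = table_gb * backups_per_month * scan_price_per_gb
    storage_cost = table_gb * retained_copies * storage_price_per_gb_month
    return scan_cost + storage_cost
```

For a hypothetical 100 GB table backed up 30 times a month with 7 copies retained, at placeholder prices of 0.5 per GB scanned and 0.25 per GB-month stored, this gives 1500 for scanning plus 175 for storage. The point-in-time option adds neither term, which is the whole argument for C over D.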
B is correct. There is no option to select regional/multi-regional when you create a dataset. There is nothing called regional or multi-regional; this is just to confuse us.
Dataset locations
You specify a location for storing your BigQuery data when you create a dataset. After you create the dataset, the location cannot be changed, but you can copy the dataset to a different region.
There are two types of locations:
A regional location is a specific geographic place, such as Tokyo. For more information, see Regional resources on the Geography and Regions page.
A multi-regional location is a large geographic area, such as the United States, that contains at least two geographic places. For more information, see Multi-regional resources on the Geography and Regions page.
https://cloud.google.com/bigquery/docs/locations
So, C
exactly
B and D excluded because of this:
–BigQuery automatically replicates data and keeps a seven-day history of changes, allowing you to easily restore and compare data from different times.
No need for scripted backup.
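For anyone wondering what "restore from history" looks like in practice, here is a minimal sketch that builds a query using BigQuery's `FOR SYSTEM_TIME AS OF` clause. The helper name and the table name in the usage are hypothetical, and actually running the query needs a BigQuery client, which is not shown here.

```python
from datetime import datetime, timezone


def build_time_travel_query(table: str, as_of: datetime) -> str:
    """Return a SELECT that reads `table` as it existed at `as_of`."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # Normalize to UTC and format it as a BigQuery timestamp string.
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S") + "+00:00"
    return (
        f"SELECT * FROM `{table}` "
        f"FOR SYSTEM_TIME AS OF TIMESTAMP('{stamp}')"
    )
```

Calling it with a hypothetical table such as `my_project.my_dataset.my_table` and a timestamp inside the seven-day history returns the query text. You could write the result to a new table to recover the data, with no scheduled copies involved.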
Now, BigQuery is a highly available service, but take a look at this… the SLA does not differentiate between multi-regional and regional.
https://cloud.google.com/bigquery/sla
But multiregion storage IS more resilient. So…if we give weight to the “recovery strategy of this data that minimizes cost” we should choose A. Otherwise C.
I understand highly available to mean multi-regional, because it replicates the storage across multiple regions…
As a backup and recovery strategy, the cheaper option is point-in-time, which lets you restore your table to any point within the last 7 days, or within 2 days if the table was deleted.
So C
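A quick sketch of the window check described above. The 7-day and 2-day limits are taken straight from this comment and may not match the current BigQuery behavior, so treat them as assumptions; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Limits as stated in the comment above (assumptions, check the current docs).
LIVE_TABLE_WINDOW = timedelta(days=7)
DELETED_TABLE_WINDOW = timedelta(days=2)


def can_restore(target: datetime, now: datetime, table_deleted: bool = False) -> bool:
    """Return True if `target` falls inside the point-in-time recovery window."""
    window = DELETED_TABLE_WINDOW if table_deleted else LIVE_TABLE_WINDOW
    age = now - target
    # The target must not be in the future and must not be older than the window.
    return timedelta(0) <= age <= window
```

With `now` at 2024-01-10, a target of 2024-01-05 is restorable for a live table but not for a deleted one, and a target of 2024-01-01 is outside both windows. Anything older than that is where a real backup outside the time travel window would be needed.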