What should you do?

You are working on a niche product in the image recognition domain. Your team has developed a model that is dominated by custom C++ TensorFlow ops your team has implemented. These ops are used inside your main training loop and are performing bulky matrix multiplications. It currently takes up to several days to train a model. You want to decrease this time significantly and keep the cost low by using an accelerator on Google Cloud. What should you do?
A. Use Cloud TPUs without any additional adjustment to your code.
B. Use Cloud TPUs after implementing GPU kernel support for your custom ops.
C. Use Cloud GPUs after implementing GPU kernel support for your custom ops.
D. Stay on CPUs, and increase the size of the cluster you’re training your model on.


2 thoughts on “What should you do?”

It cannot be A or B, because custom TensorFlow ops written in C++ are not supported on TPUs. D might be the safest option for running custom C++ code, but the question asks for significantly faster training, hence we choose C.

B
https://towardsdatascience.com/when-to-use-cpus-vs-gpus-vs-tpus-in-a-kaggle-competition-9af708a8c3eb

Price considerations when training models
While our comparisons treated the hardware equally, there is a sizeable difference in pricing. TPUs are ~5x as expensive as GPUs ($1.46/hr for an NVIDIA Tesla P100 GPU vs. $8.00/hr for a Google TPU v3 and $4.50/hr for a TPU v2, with “on-demand” access on GCP). If you are trying to optimize for cost, then it makes sense to use a TPU only if it will train your model at least 5 times as fast as the same model would train on a GPU.
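
To make the break-even reasoning concrete, here is a rough sketch of the arithmetic in Python. It uses only the hourly prices quoted above, which may well be out of date; the function names and the 48-hour baseline in the usage note are made up for illustration.

```python
# Hourly "on-demand" prices quoted in the article excerpt (USD).
# Treat these as a snapshot; current GCP pricing may differ.
PRICES_PER_HOUR = {
    "P100 GPU": 1.46,
    "TPU v2": 4.50,
    "TPU v3": 8.00,
}


def break_even_speedup(accelerator, baseline="P100 GPU"):
    """Speedup over the baseline at which both cost the same per training run."""
    return PRICES_PER_HOUR[accelerator] / PRICES_PER_HOUR[baseline]


def training_cost(hours_on_baseline, accelerator, speedup=1.0):
    """Total cost if the run takes hours_on_baseline / speedup on the accelerator."""
    return hours_on_baseline / speedup * PRICES_PER_HOUR[accelerator]


if __name__ == "__main__":
    for name in ("TPU v2", "TPU v3"):
        print(f"{name}: must be at least {break_even_speedup(name):.2f}x faster than a P100")
```

By these numbers a TPU v3 has to be about 5.5x faster than the P100 to break even, and a TPU v2 about 3.1x. For example, a hypothetical 48-hour GPU run costs `training_cost(48, "P100 GPU")`, roughly $70, and the same job on a TPU v3 only matches that if it finishes in under 9 hours or so.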
