Suppose there is a farmer who wants to plow his field. He has only a limited sum of money to buy livestock to get the job done: he can either buy 1,024 chickens or two strong oxen. The smart choice is the 1,024 chickens. The essential idea here is parallelism: we can solve large problems by breaking them into smaller pieces and then running those pieces at the same time.
Parallel computing used to be a niche technology restricted to supercomputers, but with advancements in GPUs the world has gone parallel. Modern computers are like the flock of chickens: they contain hundreds of processors that can each work on a piece of your problem in parallel. A high-end GPU contains over 3,000 arithmetic logic units (ALUs), which can perform 3,000 arithmetic operations simultaneously. GPUs can also handle tens of thousands of parallel pieces of work at the same time. These pieces of work are known as threads, and a modern GPU can run up to around 65,000 concurrent threads. But to achieve this kind of computing power, we need a programming technique different from the conventional sequential approach.
Here I’ll describe one such technique. Suppose you need to sum n numbers: what would be the step complexity of this problem?
O(n), if you go by the sequential approach. Parallel computing, however, can do this job in O(log n) steps. Hence a GPU can deliver an exponential reduction in step complexity when the problem involves a huge number of independent arithmetic operations. The programming technique is known as Reduce.
In sequential processing, the summation is done one element at a time. On a GPU, the pairwise partial sums at each step are computed simultaneously, as depicted in the illustration below.
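To make the idea concrete, here is a minimal Python sketch of a tree-based reduce. It runs serially, but each pass over the list stands in for one parallel step in which a GPU would sum all the pairs at once (assuming, for simplicity, that the input length is a power of two):

```python
def reduce_sum(values):
    """Tree-based reduction: each pass halves the list, mimicking one
    parallel step where every pair is summed by a separate thread."""
    values = list(values)
    steps = 0
    while len(values) > 1:
        # On a GPU, all of these pairwise additions would happen
        # simultaneously; here one list pass simulates one parallel step.
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        steps += 1
    return values[0], steps

total, steps = reduce_sum(range(1, 9))  # sum 1 + 2 + ... + 8
print(total, steps)  # 36 in 3 steps, i.e. log2(8) steps
```

Summing 8 numbers takes 3 passes instead of 7 sequential additions; for a million numbers it would take about 20 parallel steps, which is where the O(log n) step complexity comes from.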
GPUs have played an indispensable role in the advancement of deep learning through their parallelism: all the matrix multiplications it involves can now be done much faster. With the computational power and performance that GPUs bring to deep learning, a lot of uncharted territory remains to be explored.