Difference between CPU and GPU in parallel computing

Thursday, July 22, 2010

Difference between CPU and GPU in parallel computing

CPU frequency growth is now limited by physical matters and high power consumption. Their performance is often raised by increasing the number of cores. Present day processors may contain up to four cores (further growth will not be fast), and they are designed for common applications, they use MIMD (multiple instructions / multiple data). Each core works independently of the others, executing various instructions for various processes.
Special vector features (SSE2 and SSE3) for four-component (single floating-point precision) and two-component (double precision) vectors appeared in general-purpose processors because of increased requirements of graphics applications in the first place. That's why it's more expedient to use GPUs for certain tasks, as they are initially designed for them.
For example, NVIDIA chips are based on a multiprocessor with 8-10 cores and hundreds of ALUs, several thousand registers and some shared memory. Besides, a graphics card contains fast global memory, which can be accessed by all multiprocessors, local memory in each multiprocessor, and special memory for constants.
Most importantly, these several multiprocessor cores in a GPU are SIMD (single instruction, multiple data) cores. And these cores execute the same instructions simultaneously. This programming style is common for graphics algorithms and many scientific tasks, but it requires specific programming. In return, this approach allows to increase the number of execution units by simplifying them.
So, let's enumerate the main differences between CPU and GPU architectures. CPU cores are designed to execute a single thread of sequential instructions with maximum speed, and GPUs are designed for fast execution of many parallel instruction threads. General-purpose processors are optimized for high performance of a single command thread processing integer and floating-point numbers. Random memory access.
CPU engineers try to have their products execute as many instructions in parallel as possible to increase performance. So
Intel Pentium introduced superscalar execution of two instructions per cycle, and Pentium Pro added out-of-sequence execution of instructions. However, parallel execution of a sequential instruction thread has some basic limitations, and increasing the number of execution units does not yield proportional performance gains.

0 comments: