diff --git a/gpu/gpu.txt b/gpu/gpu.txt new file mode 100644 index 0000000..627662d --- /dev/null +++ b/gpu/gpu.txt @@ -0,0 +1,55 @@ +# Core GPU Design Tenets(Principles) + +i. Lots of simple computational units +ii. Explicitly parallel programming model(parallel programming) +iii. Optimizte for Throughput not latency +iv. More processors means more transistors; more transistors means faster computation, and less power consumption as transistors are getting smaller. But clock rate is kept constant. + + +### Throughput: + + Throughput is number of tasks done in time(tasks per hr) + +### Latency: + + Amount of time it takes to complete one task + +### Important of Programming in Parallel computating + + i. Parallel programming is harder than serial programming(CPU) + +#### CUDA program + i. Written in C with extensions + ii. Allows us to use CPU and GPU with one program + iii. Supports numerous languages + iv. CUDA term for "CPU" is "HOST" + v. CUDA term for "GPU" is "DEVICE" + vi. CUDA compiler will compile our program, split into pieces that will run on "CPU" and "GPU" and generate code for each. + vii. It assumes that "GPU" has a co-processor "CPU", it also assumes that both "GPU" and "CPU" have their own separate memories where they store data. + viii. In this relationship between "CPU" and "GPU", "CPU" is incharge; runs main program, and sends direction to the "GPU" to tell it what to do. + ix. "GPU" is responsible for the following: + i. Moving data from "CPU's" memory to "GPU's" memory. + ii. Moving data from "GPU's" memory to "CPU" memory + In C programming language, moving data from i. to ii. is called as "Mem copy", command; cudMemcpy + iii. Allocating memory in GPU, command; cudeMalloc + iv. Invoking programs in GPU that compute things parallel; these programs are called kernals. + +##### A Typical GPU Program + + i. CPU Allocates storage on GPU (Cuda Malloc) + ii. CPU Copies Input Data from CPU -> GPU (Cuda Memcpy) + iii. CPU Launches Kernal(s) on GPU to process the data (Kernal Launch) + iv. CPU Copies results back to CPU from GPU (cude Memcpy) + Cuda files are saved with .cu extension + + +##### Defining the GPU Computation + + i. Kernals look like Serial programs + ii. Write your program as If It will run on "one" thread + iii. The GPU will run that program on "Many" threads + iv. You need to specify how many threads you want to launch + +**** GPU is good at launching a large number of threads efficiently(may be thousands of threads), and running a large number of threads in parallel. + +