# Core GPU Design Tenets (Principles)

i. Lots of simple computational units
ii. Explicitly parallel programming model
iii. Optimize for throughput, not latency
iv. As transistors get smaller, more of them fit on a chip and each consumes less power; that budget goes into more processors rather than a faster clock, so the clock rate is kept roughly constant.


### Throughput

Throughput is the number of tasks completed per unit of time (e.g., tasks per hour).

### Latency

Latency is the amount of time it takes to complete one task.
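
For illustration (the numbers are made up): a bus carrying 40 passengers on a 2-hour trip has a latency of 2 hours but a throughput of 20 passengers per hour, while a car carrying 2 passengers in half an hour has a much lower latency but only 4 passengers per hour of throughput. GPUs are designed like the bus: each individual task may take longer, but far more tasks finish per unit of time.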

### Importance of Parallel Programming

i. Parallel programming is harder than serial (CPU) programming

#### CUDA program
i. Written in C with extensions
ii. Allows us to use the CPU and GPU within one program
iii. Supports numerous languages
iv. The CUDA term for the "CPU" is "host"
v. The CUDA term for the "GPU" is "device"
vi. The CUDA compiler compiles our program, splits it into the pieces that will run on the "CPU" and on the "GPU", and generates code for each.
vii. It assumes that the "GPU" is a co-processor to the "CPU", and that the "CPU" and "GPU" each have their own separate memory where they store data.
viii. In this relationship between "CPU" and "GPU", the "CPU" is in charge: it runs the main program and sends directions to the "GPU" to tell it what to do.
ix. The "CPU" can instruct the "GPU" to do the following (sketched in code below):
    i. Move data from the "CPU's" memory to the "GPU's" memory.
    ii. Move data from the "GPU's" memory to the "CPU's" memory.
        In CUDA C, both transfers (i. and ii.) are memory copies; the command is cudaMemcpy.
    iii. Allocate memory on the GPU; the command is cudaMalloc.
    iv. Invoke programs on the GPU that compute things in parallel; these programs are called kernels.
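
A minimal host-side sketch of items i.-iii. (the buffer name and its size are arbitrary choices, not from the notes); note that the direction of each transfer is given by the last argument of cudaMemcpy:

```c
#include <cuda_runtime.h>

int main(void) {
    float h_buf[64];   // "h_" prefix: host (CPU) memory, a common naming convention
    float *d_buf;      // "d_" prefix: device (GPU) memory

    cudaMalloc((void **) &d_buf, sizeof(h_buf));                      // iii. allocate GPU memory
    cudaMemcpy(d_buf, h_buf, sizeof(h_buf), cudaMemcpyHostToDevice);  // i.  copy CPU -> GPU
    cudaMemcpy(h_buf, d_buf, sizeof(h_buf), cudaMemcpyDeviceToHost);  // ii. copy GPU -> CPU
    cudaFree(d_buf);                                                  // release GPU memory
    return 0;
}
```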

##### A Typical GPU Program

i. CPU allocates storage on the GPU (cudaMalloc)
ii. CPU copies input data from CPU to GPU (cudaMemcpy)
iii. CPU launches kernel(s) on the GPU to process the data (kernel launch)
iv. CPU copies results back from GPU to CPU (cudaMemcpy)

CUDA files are saved with the .cu extension. A complete program following these four steps is sketched below.
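
A minimal but complete sketch of the four steps (the kernel, array size, and names are illustrative, not from the original notes); it squares each element of an array on the GPU:

```c
#include <stdio.h>

// Step iii's kernel: each thread squares one element.
__global__ void square(float *d_out, float *d_in) {
    int idx = threadIdx.x;   // this thread's index within the block
    float f = d_in[idx];
    d_out[idx] = f * f;
}

int main(void) {
    const int N = 64;
    const size_t bytes = N * sizeof(float);

    float h_in[N], h_out[N];
    for (int i = 0; i < N; i++) h_in[i] = (float) i;

    // i. CPU allocates storage on the GPU
    float *d_in, *d_out;
    cudaMalloc((void **) &d_in, bytes);
    cudaMalloc((void **) &d_out, bytes);

    // ii. CPU copies input data from CPU to GPU
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // iii. CPU launches the kernel on the GPU: 1 block of N threads
    square<<<1, N>>>(d_out, d_in);

    // iv. CPU copies results back from GPU to CPU
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++) printf("%.0f ", h_out[i]);
    printf("\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Compile with, e.g., `nvcc square.cu -o square` (nvcc is the CUDA compiler mentioned in vi. above).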


##### Defining the GPU Computation

i. Kernels look like serial programs
ii. Write your program as if it will run on "one" thread
iii. The GPU will run that program on "many" threads
iv. You need to specify how many threads you want to launch (see the sketch below)

**The GPU is good at launching a large number of threads efficiently (possibly thousands of threads), and at running a large number of threads in parallel.**
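
As a hedged sketch of point iv. (the kernel name, element count, and block size are illustrative choices), CUDA's launch syntax `kernel<<<blocks, threads_per_block>>>(...)` is where you specify how many threads to launch; when threads are spread across several blocks, each thread combines its block and thread IDs into a unique global index:

```c
__global__ void add_one(float *d_data, int n) {
    // Unique global index from block and thread IDs.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)   // guard: the launch may create more threads than there are elements
        d_data[idx] += 1.0f;
}

// Launch enough 256-thread blocks to cover n = 1000 elements (4 blocks = 1024 threads):
// add_one<<<(1000 + 255) / 256, 256>>>(d_data, 1000);
```

Note that the kernel body still reads like a serial program for a single element (points i.-iii. above); the parallelism comes entirely from the launch configuration.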

