# Core GPU Design Tenets (Principles)

i. Lots of simple computational units
ii. Explicitly parallel programming model
iii. Optimize for throughput, not latency
iv. As transistors get smaller, more of them fit on a chip and each consumes less power; that budget goes into more processors rather than a faster clock, so the clock rate is kept roughly constant.


### Throughput

Throughput is the number of tasks completed per unit of time (e.g., tasks per hour).

### Latency

Latency is the amount of time it takes to complete one task.
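
For illustration (the numbers are made up): a bus carrying 40 passengers on a 2-hour trip has a latency of 2 hours but a throughput of 20 passengers per hour, while a car carrying 2 passengers in half an hour has a much lower latency but only 4 passengers per hour of throughput. GPUs are designed like the bus: each individual task may take longer, but far more tasks finish per unit of time.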

### Importance of Parallel Programming

i. Parallel programming is harder than serial (CPU) programming

#### CUDA program
i. Written in C with extensions
ii. Allows us to use the CPU and GPU within one program
iii. Supports numerous languages
iv. The CUDA term for the "CPU" is "host"
v. The CUDA term for the "GPU" is "device"
vi. The CUDA compiler compiles our program, splits it into the pieces that will run on the "CPU" and on the "GPU", and generates code for each.
vii. It assumes that the "GPU" is a co-processor to the "CPU", and that the "CPU" and "GPU" each have their own separate memory where they store data.
viii. In this relationship between "CPU" and "GPU", the "CPU" is in charge: it runs the main program and sends directions to the "GPU" to tell it what to do.
ix. The "CPU" can instruct the "GPU" to do the following (sketched in code below):
    i. Move data from the "CPU's" memory to the "GPU's" memory.
    ii. Move data from the "GPU's" memory to the "CPU's" memory.
        In CUDA C, both transfers (i. and ii.) are memory copies; the command is cudaMemcpy.
    iii. Allocate memory on the GPU; the command is cudaMalloc.
    iv. Invoke programs on the GPU that compute things in parallel; these programs are called kernels.
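
A minimal host-side sketch of items i.-iii. (the buffer name and its size are arbitrary choices, not from the notes); note that the direction of each transfer is given by the last argument of cudaMemcpy:

```c
#include <cuda_runtime.h>

int main(void) {
    float h_buf[64];   // "h_" prefix: host (CPU) memory, a common naming convention
    float *d_buf;      // "d_" prefix: device (GPU) memory

    cudaMalloc((void **) &d_buf, sizeof(h_buf));                      // iii. allocate GPU memory
    cudaMemcpy(d_buf, h_buf, sizeof(h_buf), cudaMemcpyHostToDevice);  // i.  copy CPU -> GPU
    cudaMemcpy(h_buf, d_buf, sizeof(h_buf), cudaMemcpyDeviceToHost);  // ii. copy GPU -> CPU
    cudaFree(d_buf);                                                  // release GPU memory
    return 0;
}
```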

##### A Typical GPU Program

i. CPU allocates storage on the GPU (cudaMalloc)
ii. CPU copies input data from CPU to GPU (cudaMemcpy)
iii. CPU launches kernel(s) on the GPU to process the data (kernel launch)
iv. CPU copies results back from GPU to CPU (cudaMemcpy)

CUDA files are saved with the .cu extension. A complete program following these four steps is sketched below.
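
A minimal but complete sketch of the four steps (the kernel, array size, and names are illustrative, not from the original notes); it squares each element of an array on the GPU:

```c
#include <stdio.h>

// Step iii's kernel: each thread squares one element.
__global__ void square(float *d_out, float *d_in) {
    int idx = threadIdx.x;   // this thread's index within the block
    float f = d_in[idx];
    d_out[idx] = f * f;
}

int main(void) {
    const int N = 64;
    const size_t bytes = N * sizeof(float);

    float h_in[N], h_out[N];
    for (int i = 0; i < N; i++) h_in[i] = (float) i;

    // i. CPU allocates storage on the GPU
    float *d_in, *d_out;
    cudaMalloc((void **) &d_in, bytes);
    cudaMalloc((void **) &d_out, bytes);

    // ii. CPU copies input data from CPU to GPU
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    // iii. CPU launches the kernel on the GPU: 1 block of N threads
    square<<<1, N>>>(d_out, d_in);

    // iv. CPU copies results back from GPU to CPU
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++) printf("%.0f ", h_out[i]);
    printf("\n");

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Compile with, e.g., `nvcc square.cu -o square` (nvcc is the CUDA compiler mentioned in vi. above).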


##### Defining the GPU Computation

i. Kernels look like serial programs
ii. Write your program as if it will run on "one" thread
iii. The GPU will run that program on "many" threads
iv. You need to specify how many threads you want to launch (see the sketch below)

**The GPU is good at launching a large number of threads efficiently (possibly thousands of threads), and at running a large number of threads in parallel.**
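
As a hedged sketch of point iv. (the kernel name, element count, and block size are illustrative choices), CUDA's launch syntax `kernel<<<blocks, threads_per_block>>>(...)` is where you specify how many threads to launch; when threads are spread across several blocks, each thread combines its block and thread IDs into a unique global index:

```c
__global__ void add_one(float *d_data, int n) {
    // Unique global index from block and thread IDs.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)   // guard: the launch may create more threads than there are elements
        d_data[idx] += 1.0f;
}

// Launch enough 256-thread blocks to cover n = 1000 elements (4 blocks = 1024 threads):
// add_one<<<(1000 + 255) / 256, 256>>>(d_data, 1000);
```

Note that the kernel body still reads like a serial program for a single element (points i.-iii. above); the parallelism comes entirely from the launch configuration.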

