-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
1 changed file
with
55 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
# Core GPU Design Tenets(Principles) | ||
|
||
i. Lots of simple computational units | ||
ii. Explicitly parallel programming model(parallel programming) | ||
iii. Optimizte for Throughput not latency | ||
iv. More processors means more transistors; more transistors means faster computation, and less power consumption as transistors are getting smaller. But clock rate is kept constant. | ||
|
||
|
||
### Throughput: | ||
|
||
Throughput is number of tasks done in time(tasks per hr) | ||
|
||
### Latency: | ||
|
||
Amount of time it takes to complete one task | ||
|
||
### Important of Programming in Parallel computating | ||
|
||
i. Parallel programming is harder than serial programming(CPU) | ||
|
||
#### CUDA program | ||
i. Written in C with extensions | ||
ii. Allows us to use CPU and GPU with one program | ||
iii. Supports numerous languages | ||
iv. CUDA term for "CPU" is "HOST" | ||
v. CUDA term for "GPU" is "DEVICE" | ||
vi. CUDA compiler will compile our program, split into pieces that will run on "CPU" and "GPU" and generate code for each. | ||
vii. It assumes that "GPU" has a co-processor "CPU", it also assumes that both "GPU" and "CPU" have their own separate memories where they store data. | ||
viii. In this relationship between "CPU" and "GPU", "CPU" is incharge; runs main program, and sends direction to the "GPU" to tell it what to do. | ||
ix. "GPU" is responsible for the following: | ||
i. Moving data from "CPU's" memory to "GPU's" memory. | ||
ii. Moving data from "GPU's" memory to "CPU" memory | ||
In C programming language, moving data from i. to ii. is called as "Mem copy", command; cudMemcpy | ||
iii. Allocating memory in GPU, command; cudeMalloc | ||
iv. Invoking programs in GPU that compute things parallel; these programs are called kernals. | ||
|
||
##### A Typical GPU Program | ||
|
||
i. CPU Allocates storage on GPU (Cuda Malloc) | ||
ii. CPU Copies Input Data from CPU -> GPU (Cuda Memcpy) | ||
iii. CPU Launches Kernal(s) on GPU to process the data (Kernal Launch) | ||
iv. CPU Copies results back to CPU from GPU (cude Memcpy) | ||
Cuda files are saved with .cu extension | ||
|
||
|
||
##### Defining the GPU Computation | ||
|
||
i. Kernals look like Serial programs | ||
ii. Write your program as If It will run on "one" thread | ||
iii. The GPU will run that program on "Many" threads | ||
iv. You need to specify how many threads you want to launch | ||
|
||
**** GPU is good at launching a large number of threads efficiently(may be thousands of threads), and running a large number of threads in parallel. | ||
|
||
|