Deep neural network framework for GPU clusters:
- supports NVIDIA GPUDirect RDMA
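  To illustrate what this enables: with a CUDA-aware MPI build, GPUDirect RDMA moves data directly between GPU memories, so device pointers can be handed straight to MPI calls. A minimal sketch of that pattern (not this framework's API; buffer size and tags are illustrative):

  ```cpp
  // Sketch: passing device pointers directly to MPI. Requires a CUDA-aware
  // MPI build; GPUDirect RDMA then skips the host staging copy.
  // Illustrative only -- not part of this library.
  #include <mpi.h>
  #include <cuda_runtime.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      float *d_buf; // device memory, never explicitly copied to the host
      cudaMalloc(&d_buf, 1024 * sizeof(float));

      if (rank == 0)
          MPI_Send(d_buf, 1024, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
      else if (rank == 1)
          MPI_Recv(d_buf, 1024, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

      cudaFree(d_buf);
      MPI_Finalize();
      return 0;
  }
  ```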
- easy distributed computation:

  ```cpp
  Matrix C = dot(A, B);    // uses one GPU
  Matrix C = dotMPI(A, B); // uses all available GPUs on the board or in the network
  ```
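  One plausible way a call like dotMPI can distribute the product (an assumption for illustration, not a statement about the library's internals): split B column-wise across ranks, compute the local slice of C on each rank, then gather the slices. A CPU-level sketch of that scheme:

  ```cpp
  // Hedged sketch of a column-split distributed dot: rank r computes
  // A * B_slice_r, then all slices are gathered. Plain CPU code for clarity;
  // the library would run the local product on the GPU. Names are illustrative.
  #include <mpi.h>
  #include <vector>

  // A is rows x k (row-major, replicated on every rank); B_slice is this
  // rank's k x slice_cols column block of B. Returns all slices gathered,
  // laid out block-by-block (one contiguous block per rank).
  std::vector<float> dot_mpi_sketch(const std::vector<float> &A,
                                    const std::vector<float> &B_slice,
                                    int rows, int k, int slice_cols)
  {
      int size;
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      std::vector<float> C_slice(rows * slice_cols, 0.0f);
      for (int r = 0; r < rows; r++)              // local dense product
          for (int c = 0; c < slice_cols; c++)
              for (int i = 0; i < k; i++)
                  C_slice[r * slice_cols + c] += A[r * k + i] * B_slice[i * slice_cols + c];

      // every rank receives every slice; reordering into a row-major full C
      // is still needed afterwards (omitted for brevity)
      std::vector<float> C(static_cast<size_t>(rows) * slice_cols * size);
      MPI_Allgather(C_slice.data(), rows * slice_cols, MPI_FLOAT,
                    C.data(), rows * slice_cols, MPI_FLOAT, MPI_COMM_WORLD);
      return C;
  }
  ```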
- no delay between batches due to asynchronous memory copies to the GPU:

  ```cpp
  gpu.init_batch_allocator(X, y, 128);
  for (int i = 0; i < gpu.m_total_batches; i++)
  {
      gpu.allocate_next_batch_async();             // loads the next batch while you do computations
      result = gpu.dot(gpu.m_current_batch_X, w1); // do your computations here
      gpu.replace_current_batch_with_next();       // swaps in the next batch, which is already loaded
  }
  ```
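  Under the hood, this presumably overlaps the host-to-device copy of batch i+1 with the computation on batch i, i.e. double buffering on a separate CUDA stream. A minimal sketch of that pattern with plain CUDA, using illustrative names:

  ```cpp
  // Hedged sketch of double-buffered batch loading: copy batch i+1 on a side
  // stream while batch i is consumed. h_data should be pinned memory
  // (cudaMallocHost) for the copy to actually overlap with compute.
  #include <cuda_runtime.h>

  void process(const float *d_batch, int batch_elems); // your computation (assumed)

  void run_epoch(const float *h_data, int total_batches, int batch_elems)
  {
      cudaStream_t copy_stream;
      cudaStreamCreate(&copy_stream);

      float *d_buf[2];
      cudaMalloc(&d_buf[0], batch_elems * sizeof(float));
      cudaMalloc(&d_buf[1], batch_elems * sizeof(float));

      // preload the first batch synchronously
      cudaMemcpy(d_buf[0], h_data, batch_elems * sizeof(float), cudaMemcpyHostToDevice);

      for (int i = 0; i < total_batches; i++)
      {
          int cur = i % 2, nxt = (i + 1) % 2;
          if (i + 1 < total_batches)          // start loading the next batch in the background
              cudaMemcpyAsync(d_buf[nxt], h_data + (size_t)(i + 1) * batch_elems,
                              batch_elems * sizeof(float), cudaMemcpyHostToDevice, copy_stream);

          process(d_buf[cur], batch_elems);   // overlaps with the async copy

          cudaStreamSynchronize(copy_stream); // ensure the next batch has arrived
      }

      cudaFree(d_buf[0]);
      cudaFree(d_buf[1]);
      cudaStreamDestroy(copy_stream);
  }
  ```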
- distributed weights that are larger than a single GPU's memory:
  ```cpp
  ClusterNet gpus = ClusterNet(argc, argv, 12346);
  Matrix *batch = gpus.rand(128, 100000);                         // 49 MB
  Matrix *out1 = empty(128, 40000);                               // 19 MB
  Matrix *out2 = empty(128, 20000);                               // 9 MB
  Matrix *W1 = gpus.distributed_uniformSqrtWeight(100000, 40000); // 15258 MB
  Matrix *W2 = gpus.distributed_uniformSqrtWeight(40000, 20000);  // 3051 MB
  gpus.tick("Time taken");
  gpus.dotMPI(batch, W1, out1);
  gpus.dotMPI(out1, W2, out2);
  gpus.tock("Time taken");
  ```
  >>> Time taken: 117.704285 ms.
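  The size comments above are consistent with float32 matrices measured in MiB: W1 alone is 100000 × 40000 × 4 bytes ≈ 15258 MiB, far too large for one GPU, while a column-wise split across the cluster leaves only a fraction per device. A small check of that arithmetic (the 4-GPU count is illustrative, and how distributed_uniformSqrtWeight actually shards the matrix is an assumption):

  ```cpp
  // Verifies the size comments above and shows the per-GPU footprint under a
  // column-wise split. num_gpus is illustrative; the sharding scheme is an
  // assumption about the library, not documented behavior.
  #include <cstdio>

  int main()
  {
      const long long rows = 100000, cols = 40000; // W1's shape from the example
      const int num_gpus = 4;                      // illustrative cluster size

      double total_mib   = (double)rows * cols * sizeof(float) / (1024.0 * 1024.0);
      double per_gpu_mib = total_mib / num_gpus;   // cols/num_gpus columns per GPU

      printf("full W1: %.0f MiB, per-GPU slice: %.0f MiB\n", total_mib, per_gpu_mib);
      return 0;
  }
  ```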