Spring 2020
Instructors
Ana Gainaru (website)
Hongyang Sun (website)
Office: Jacobs and Featheringill Hall #382
Lectures: Featheringill Hall #134
Prerequisites
Basic scientific computing knowledge (e.g., matrix computations, sorting)
Basic knowledge of C/C++
Table of contents
The tentative schedule with PDF lecture notes is shown below (refresh this page to view the latest changes). The schedule will be updated when needed, and significant changes will be announced in class.
Lectures | Date | Slides | Additional material |
---|---|---|---|
Uniprocessors | 01/06 | Uniprocs.pdf | |
Overview of HPC | 01/08 | HPC.pdf | |
Introduction to parallel computing I | 01/13 | Parallel(I).pdf | |
Introduction to parallel computing II | 01/15 | Parallel(II).pdf | |
Introduction to C | 01/22 | IntroC.pdf | - C extensions for Python: https://cython.org/ - Code examples and documentation in the C_code folder |
Submitting jobs on a cluster | 01/27 | SLURM.pdf | ACCRE Tutorials: Link |
Shared-memory algorithms | 01/29 -- 02/10 | Shared.pdf | - Cilk code examples in the Cilk_code folder - Cilk installation guide on Brightspace with Homework 1 |
GPU | 02/12 | GPU.pdf | Nvidia's CUDA parallel computation API from Python: Link |
GPU Performance | 02/17 | GPU_perf.pdf | CUDA code examples in the CUDA_code folder. Short tutorial on how to run GPU programs on ACCRE can be found here: ACCRE_guide.pdf |
Interconnect and communication | 02/19 | Interconnect.pdf | |
Homework Review | 02/24 | ||
MPI | 02/26, 03/09 | MPI slides | MPI code examples in the MPI_code folder. MPI code examples used for the slides can be found here |
Distributed-memory algorithms I | 03/16 | Distributed(I).pdf | |
Midterm | 03/18 | ||
Distributed-memory algorithms II | 03/23, 03/25 | Distributed(II).pdf | |
Parallel File System | 03/30 | PFS.pdf | The two examples used in the slides can be found here: PFS_example.pdf |
Scheduling I | 04/01 | BatchScheduler(I).pdf | |
Scheduling II | 04/06 | BatchScheduler(II).pdf | Batch Scheduler Simulator: ScheduleFlow.zip. To run the examples from the slides: cd examples, then python generate_gif_example.py. The generated GIF files can be found in ../draw. Generating GIFs requires pdflatex and convert from ImageMagick. Updates on GitHub |
Fault tolerance | 04/08 | FaultTolerance.pdf | |
Graduate student presentations | 04/13 | Below | Papers and presentations in the research folder |
Graduate student presentations
Paper title | Presenter | Slides | Paper PDF |
---|---|---|---|
A Three-Dimensional Approach to Parallel Matrix Multiplication | Damin Xia | p1_slides.pdf | p1_paper.pdf |
Exposing Hidden Performance Opportunities in High Performance GPU Applications | Chloe Frame | p2_slides.pdf | p2_paper.pdf |
GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers | Quach Co | p3_slides.pdf | p3_paper.pdf |
Strong scaling of general-purpose molecular dynamics simulations on GPUs | Duncan Steward | p4_slides.pdf | p4_paper.pdf |
How Much Parallelism is There in Irregular Applications? | Xiaobo Liu | p5_slides.pdf | p5_paper.pdf |
The midterm material includes everything before the first MPI lecture (all lectures up to and including the one on 02/19). The topics include:
- Paradigms of parallel computing: data parallel, task parallel, pipelining, speed-ups (a short formula refresher follows this list)
- Parallel architecture concepts: SIMD, vector machines, array processors, MIMD, uniform shared memory, nonuniform shared memory, distributed memory, distributed shared memory
- SPMD versus MIMD style programming
- Shared memory algorithms and performance models
- Concepts of GPU programming and differences between GPU and CPU
- Interconnection networks
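As a quick refresher on the speed-up topic above (a standard textbook definition and bound, included here only as a study aid rather than a summary of the lecture notes), the speed-up of a parallel program and Amdahl's bound can be written as:

```latex
% Speed-up on p processors: sequential time divided by parallel time.
\[ S(p) = \frac{T_1}{T_p} \]
% Amdahl's law: if a fraction f of the work is inherently sequential,
% the speed-up is bounded regardless of how many processors are used.
\[ S(p) \le \frac{1}{f + \frac{1 - f}{p}} \le \frac{1}{f} \]
```

For example, if 10% of a program is inherently sequential (f = 0.1), the speed-up can never exceed 10x, no matter how many processors are used.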
The crash course on C is NOT included in the midterm topics.
The midterm is in written format, includes 6 questions, and lasts one hour. It is open notes.
Midterm date: March 18th, 2020 (online)
All assignments are mandatory and part of the final grade. Assignments and homework must be turned in by midnight on the due date. When turned in late, 5% will be deducted from the grade per day until the submission has been received, with a maximum extension of five days.
There will be 3 homeworks (equal weight in the final grade):
- Shared memory (Cilk programming)
- Link to homework 1: available on Brightspace
- Deadline: February 14th
- Shared memory (GPU programming)
- Link to homework 2: available on Brightspace
- Deadline: March 20th
- Distributed memory (MPI programming); an illustrative MPI sketch follows this list
- Link to homework 3: available on Brightspace
- Deadline: April 6th
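To give a flavor of the distributed-memory model used in homework 3, here is a minimal MPI sketch in C (an illustrative example only, not the actual assignment; the file name and token-passing pattern are made up for this page). Each process prints its rank, and then a token is passed around a ring and incremented by every rank:

```c
/* hello_ring.c -- illustrative MPI sketch (not the homework).
 * Compile: mpicc hello_ring.c -o hello_ring
 * Run:     mpirun -np 4 ./hello_ring   (or through a SLURM job on ACCRE)
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    if (size > 1) {
        int token;
        if (rank == 0) {
            /* Rank 0 starts the token and waits for it to come back. */
            token = 0;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Rank 0 got the token back with value %d\n", token);
        } else {
            /* Every other rank receives, increments, and forwards the token. */
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            token++;
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}
```

The actual assignment and the MPI_code examples linked in the schedule above remain the authoritative starting points.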
The project is currently suspended due to the cancellation of all in-person classes. More details on how this will move forward will be announced later.
Update: The project is canceled.
The project consists of parallelizing an algorithm (in any programming language) to solve a specific problem and analyzing the scalability and bottlenecks of the parallel code. The problem to be solved can be something from your field of study, a benchmark, or one discussed in class (e.g., sort, matrix multiplication). If a simple problem is chosen, it will have to be implemented with multiple methods (e.g., different algorithms in shared-memory CPU vs. GPU, or shared vs. distributed CPU), and the results will have to be compared with each other and explained. All chosen problems and algorithms must be approved.
Groups
This is a group project (2 or 3 students per team).
Deadline
Problem and algorithm choice must be sent for approval by March 16th with the complete list of team members.
Submission of the projects must be done before April 12th.
Each group will have to submit the codes and a small report describing the algorithm and code analysis.
Presentations
Presentations will be held in the last week of class.
The order of presentations will be decided by the instructors by the middle of March.
Graduate students need to choose a paper related to parallel/distributed programming, optimization, HPC, or large-scale architectures/applications, and present it in the last week of class. Chosen papers need to be approved. The following is a list of good candidate papers (email us your selection for approval before April 1st). Feel free to choose one of these or something related to HPC from your field of study.
- Milind Kulkarni, Martin Burtscher, Rajasekhar Inkulu, Keshav Pingali, and Calin Cascaval. "How Much Parallelism is There in Irregular Applications?" Principles and Practices of Parallel Programming (PPoPP), February 2009.
- A. Y. Grama, A. Gupta, and V. Kumar. "Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures," IEEE Parallel & Distributed Technology: Systems & Applications (see also IEEE Concurrency), Volume 1, Issue 3, August 1993, pp. 12-21.
- S. Srinivasan, R. Kettimuthu, V. Subramani, and P. Sadayappan. "Characterization of Backfilling Strategies for Parallel Job Scheduling," Proceedings of the International Conference on Parallel Processing Workshops, 2002.
- W. Daniel Hillis and Guy L. Steele. "Data Parallel Algorithms," Communications of the ACM, December 1986, pp. 1170-1183.
- David E. Culler, Richard M. Karp, David Patterson, Abhijit Sahay, Eunice E. Santos, Klaus Erik Schauser, Ramesh Subramonian, and Thorsten von Eicken. "LogP: A Practical Model of Parallel Computation," Communications of the ACM, Volume 39, Issue 11, November 1996, pp. 78-85.
- V. Nageshwara Rao and Vipin Kumar. "Superlinear Speedup in Parallel State-Space Search," Lecture Notes in Computer Science, Vol. 338, Proceedings of the Eighth Conference on Foundations of Software Technology and Theoretical Computer Science, pp. 161-174, 1988.
- R. C. Agarwal, S. M. Balle, F. G. Gustavson, M. Joshi, and P. Palkar. "A Three-Dimensional Approach to Parallel Matrix Multiplication," IBM Journal of Research and Development, Volume 39, Number 5, 1995.
- Fabrizio Petrini, Darren J. Kerbyson, and Scott Pakin. "The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q," Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, 2003.
- J. T. Daly. "A Higher Order Estimate of the Optimum Checkpoint Interval for Restart Dumps," Future Generation Computer Systems, Volume 22, Issue 3, pp. 303-312, February 2006.
The submission will be done on Brightspace:
- Students will have to prepare slides (1-2 slides for each of the following):
  - What problem the paper is trying to solve
  - The current state of the art related to the problem tackled by the paper
  - What the paper contributes that is new compared to the state of the art
  - Details on the implementation that solves the problem
  - Results
  - The student's opinion of the paper
- The slides will be uploaded on the website and will be accessible to all the other students
- Deadline: April 13th
The material in this section is not required for this class. The following books give a good, detailed overview of multiple HPC-related topics.
- "Software Optimization for High Performance Computing: Creating Faster Applications" by K.R. Wadleigh and I.L. Crawford, Hewlett-Packard professional books, Prentice Hall.
- "The Software Optimization Cookbook" (2nd ed.) by R. Gerber, A. Bik, K. Smith, and X. Tian, Intel Press.
- "Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers" (2nd ed.) by B. Wilkinson and M. Allen, Prentice Hall.
- "Parallel Scientific Computing in C++ and MPI" by G. Karniadakis and R. Kirby II, Cambridge University Press.
- "Scientific Parallel Computing" by L.R. Scott, T. Clark, and B. Bagheri, Princeton University Press.
- "Sourcebook of Parallel Programming" by J. Dongara, I. Foster, G. Fox, W. Gropp, K. Kennedy, L. Torczon, and A. White (eds), Morgan Kaufmann.
- "Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering" by I. Foster, Addison-Wesley Longman Publishing Co., Inc.
- "Introduction to Parallel Computing (2nd ed.)" by A. Grama, A. Gupta, G. Karypis, V. Kumar. Addison-Wesley Professional
- "Introduction to Algorithms (3rd ed.)" by T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. MIT Press and McGraw-Hill.
- "Parallel Algorithms" by H. Casanova, A. Legrand, Y. Robert. Chapman & Hall/CRC.
Honor Code Violations
All exams and assignments must be completed individually, unless stated otherwise. Copying solutions is considered cheating, and submitted source code listings will be compared. Keep a copy of your listings to provide evidence of creative work. Students are expected to uphold the Honor Code; any student involved in cheating is in violation of it. Consult the "Student Handbook" for more details on the Honor Code.
Component | Undergraduate credit | Graduate credit |
---|---|---|
Midterm exam | 40% | 40% |
Homework | 60% | 45% |
Research article presentation | n/a | 15% |
A | B | C |
---|---|---|
94-100% A+ | 76-81% B+ | 56-61% C+ |
88-93% A | 70-75% B | 50-55% C |
82-87% A- | 62-69% B- | 42-49% C- |