The course "Multicore and GPGPU Programming" provides a foundational understanding of parallel programming, focusing on developing high-performance, multi-threaded applications in both CPU and GPU environments. Beginning with a review of multicore processor architectures, caching mechanisms, and Non-Uniform Memory Access (NUMA) systems, students will learn the essentials of shared memory programming, synchronisation techniques, and the use of locks to ensure data integrity across threads.

Multicore and GPGPU Programming

Recommended experience
Intermediate level
Basic knowledge of C/C++ and computer architecture is recommended.
What you'll learn
Understand the fundamentals of multi-threaded programming and its applications in multicore systems.
Develop shared memory programs in OpenMP and distributed programming using MPI.
Gain a foundational understanding of GPGPU architecture and the CUDA programming model.
Details to know

124 assignments

There are 12 modules in this course
In this module, the learners will be introduced to the course and its syllabus, setting the foundation for their learning journey. The course's introductory video will provide them with insights into the valuable skills and knowledge they can expect to gain throughout the duration of this course. Additionally, the syllabus reading will comprehensively outline essential course components, including course values, assessment criteria, grading system, schedule, details of live sessions, and a recommended reading list that will enhance the learner’s understanding of the course concepts. Moreover, this module offers the learners the opportunity to connect with fellow learners as they participate in a discussion prompt designed to facilitate introductions and exchanges within the course community.
What's included
4 videos, 1 reading, 1 discussion prompt
4 videos• Total 51 minutes
- Course Introductory Video• 2 minutes
- Meet Your Instructor - Dr. Gargi Prabhu • 1 minute
- Meet Your Instructor - Dr. Kunal Korgaonkar• 1 minute
- Recording of Multicore and GPGPU Programming: Week 1 - Live Session on 25-05-23 18:32:50 [47:25]• 47 minutes
1 reading• Total 10 minutes
- Course Overview• 10 minutes
1 discussion prompt• Total 10 minutes
- Meet Your Peers• 10 minutes
In this module, students will gain foundational knowledge of parallel and multi-threaded programming, exploring the core principles that underlie the efficient utilisation of modern multi-core and many-core processors. Beginning with an overview of parallel programming concepts, this module covers different types of parallelism, including data parallelism, task parallelism, and pipeline parallelism. Students will also examine critical performance metrics like speedup, efficiency, and scalability, which help in evaluating the benefits and trade-offs of parallel approaches.
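The speedup and efficiency metrics named above can be made concrete with a short calculation. The following is a minimal sketch in C, assuming the standard textbook definitions (speedup is serial time divided by parallel time, efficiency is speedup divided by core count) and the usual statements of Amdahl's and Gustafson's laws, both of which appear in this module's video list. The function names are illustrative and are not taken from the course materials.

```c
/* Speedup: serial run time divided by parallel run time. */
double speedup(double t_serial, double t_parallel) {
    return t_serial / t_parallel;
}

/* Efficiency: speedup per core (1.0 would mean perfect linear scaling). */
double efficiency(double t_serial, double t_parallel, int cores) {
    return speedup(t_serial, t_parallel) / cores;
}

/* Amdahl's law: fixed problem size, where p is the fraction of the
   serial run time that can be parallelised (0 <= p <= 1). */
double amdahl_speedup(double p, int cores) {
    return 1.0 / ((1.0 - p) + p / cores);
}

/* Gustafson's law: the problem grows with the core count, and p is the
   parallel fraction of the run time measured on the parallel machine. */
double gustafson_speedup(double p, int cores) {
    return (1.0 - p) + p * cores;
}
```

For example, with p = 0.9 and 10 cores, Amdahl's law gives 1 / 0.19, roughly 5.26, while Gustafson's law gives 0.1 + 9 = 9.1. The gap reflects the different assumptions about whether the problem size stays fixed or scales with the machine.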
What's included
12 videos, 2 readings, 12 assignments, 1 discussion prompt
12 videos• Total 73 minutes
- Need for Ever-Increasing Performance• 8 minutes
- Parallel Systems and Parallel Programs• 8 minutes
- Concurrent, Parallel, Distributed Systems• 5 minutes
- Types of Parallelism: Data, Task and Pipeline Parallelism• 8 minutes
- Speedup and Efficiency• 5 minutes
- Amdahl’s Law • 5 minutes
- Gustafson’s Law • 5 minutes
- Scalability in Parallel Systems• 5 minutes
- Cost of Parallelisation• 7 minutes
- Sources of Overhead in Parallel Programs • 5 minutes
- Timing Parallel Programs: Methods and Best Practices• 7 minutes
- GPU Performance• 5 minutes
2 readings• Total 120 minutes
- Recommended Reading: Fundamentals of Parallel Computing• 60 minutes
- Recommended Reading: Introduction to Performance Metrics in Parallel Computing• 60 minutes
12 assignments• Total 36 minutes
- Need for Ever-Increasing Performance• 3 minutes
- Parallel Systems and Parallel Programs• 3 minutes
- Concurrent, Parallel, Distributed Systems• 3 minutes
- Types of Parallelism: Data, Task and Pipeline Parallelism• 3 minutes
- Speedup and Efficiency• 3 minutes
- Amdahl’s Law • 3 minutes
- Gustafson’s Law • 3 minutes
- Scalability in MIMD Systems• 3 minutes
- Cost of Parallelisation• 3 minutes
- Sources of Overhead in Parallel Programs• 3 minutes
- Taking Timings of Parallel Programs• 3 minutes
- GPU Performance• 3 minutes
1 discussion prompt• Total 30 minutes
- Why Parallelism? Revisiting the Roots of Multicore Programming• 30 minutes
This module provides an in-depth exploration of multicore processor architectures, examining the design principles, performance considerations, and challenges involved in building efficient multicore systems. Students will study how multiple cores interact within a processor, focusing on memory hierarchies, caching mechanisms, and the role of parallelism in improving computational performance.
What's included
15 videos, 2 readings, 15 assignments, 1 discussion prompt
15 videos• Total 160 minutes
- The Von Neumann Architecture• 7 minutes
- Processes, Multitasking, and Threads• 5 minutes
- The Basics of Caching• 7 minutes
- Virtual Memory• 7 minutes
- Instruction-Level Parallelism• 9 minutes
- Hardware Multithreading• 6 minutes
- Classifications of Parallel Computers• 6 minutes
- SIMD and MIMD Systems• 7 minutes
- Interconnection Networks: Shared Memory Systems• 6 minutes
- Interconnection Networks: Distributed Memory Systems• 8 minutes
- Cache Coherence• 8 minutes
- Shared-Memory vs. Distributed-Memory• 4 minutes
- Parallel Software: Coordinating Process and Threads• 11 minutes
- Distributed Memory Software• 7 minutes
- Recording of Multicore and GPGPU Programming: Week 2 - Live Session on 25-05-30 18:35:08 [02:05]• 62 minutes
2 readings• Total 100 minutes
- Recommended Reading: Architecture Background• 40 minutes
- Recommended Reading: Parallel Hardware and Software• 60 minutes
15 assignments• Total 114 minutes
- The Von Neumann Architecture• 3 minutes
- Processes, Multitasking, and Threads• 3 minutes
- The Basics of Caching• 3 minutes
- Virtual Memory• 3 minutes
- Instruction-Level Parallelism• 3 minutes
- Hardware Multithreading• 3 minutes
- Classifications of Parallel Computers• 3 minutes
- SIMD and MIMD Systems• 3 minutes
- Interconnection Networks: Shared Memory Systems• 3 minutes
- Interconnection Networks: Distributed Memory Systems• 6 minutes
- Cache Coherence• 3 minutes
- Shared-Memory vs. Distributed-Memory• 3 minutes
- Parallel Software: Coordinating Process and Threads• 12 minutes
- Distributed Memory Software• 3 minutes
- Graded Quiz - Modules 1 and 2 • 60 minutes
1 discussion prompt• Total 30 minutes
- From Von Neumann to Multicore: Evolving Architectures and Memory Realities• 30 minutes
This module introduces students to the architectural principles of General-Purpose GPU (GPGPU) systems and the CUDA programming model. It explores the hardware components, including Streaming Multiprocessors (SMs), CUDA cores, and memory hierarchy, which form the foundation of GPU computing. The module also provides an overview of the CUDA programming model, emphasising its thread hierarchy, grid, and block organisation. By understanding these fundamental concepts, students will develop the ability to harness GPU architecture for high-performance parallel computing.
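The grid and block organisation described above gives every thread a unique global index. The sketch below stays in plain C and emulates a one-dimensional CUDA launch on the CPU with two nested loops; in real CUDA code the loops disappear and each thread reads the built-in `blockIdx.x`, `blockDim.x`, and `threadIdx.x` variables instead. The function names and the vector-add workload are illustrative assumptions, not course code.

```c
#include <stddef.h>

/* Number of blocks needed to cover n elements (ceiling division). */
size_t grid_size(size_t n, size_t block_dim) {
    return (n + block_dim - 1) / block_dim;
}

/* CPU emulation of a 1D vector-add kernel launch.
   Outer loop = blocks in the grid, inner loop = threads in a block. */
void vec_add_emulated(const float *a, const float *b, float *c,
                      size_t n, size_t block_dim) {
    size_t grid_dim = grid_size(n, block_dim);
    for (size_t block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            /* Same formula a CUDA thread would use for its global index. */
            size_t i = block_idx * block_dim + thread_idx;
            /* Bounds guard: the last block may contain surplus threads. */
            if (i < n) {
                c[i] = a[i] + b[i];
            }
        }
    }
}
```

With 10 elements and a block size of 4, the grid needs 3 blocks, and the last two threads of the final block fall outside the array, which is why the bounds guard is there.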
What's included
15 videos, 2 readings, 14 assignments, 1 discussion prompt
15 videos• Total 127 minutes
- GPUs and GPGPU• 5 minutes
- GPU Architecture• 5 minutes
- Heterogeneous Computing• 4 minutes
- Paradigm of Heterogeneous Computing• 5 minutes
- Introduction to CUDA• 5 minutes
- Structure of a CUDA Program• 8 minutes
- Threads, Blocks, and Grid• 9 minutes
- Managing Memory• 7 minutes
- Writing and Verifying Your Kernel• 6 minutes
- Compiling and Running CUDA Program• 4 minutes
- Nvidia Compute Capabilities and Device Architecture• 6 minutes
- Timing Your Kernel• 7 minutes
- Organising Parallel Threads• 5 minutes
- Managing Devices• 4 minutes
- Recording of Multicore and GPGPU Programming: Week 3 - Live Session on 25-06-06 18:31:21 [44:50]• 45 minutes
2 readings• Total 75 minutes
- Recommended Reading: GPGPU Architecture and CUDA• 15 minutes
- Recommended Reading: Programming Model Overview• 60 minutes
14 assignments• Total 48 minutes
- GPUs and GPGPU• 6 minutes
- GPU Architecture• 3 minutes
- Heterogeneous Computing• 3 minutes
- Paradigm of Heterogeneous Computing• 3 minutes
- Introduction to CUDA• 3 minutes
- Structure of a CUDA Program• 3 minutes
- Threads, Blocks, and Grid• 6 minutes
- Managing Memory• 3 minutes
- Writing and Verifying Your Kernel• 3 minutes
- Compiling and Running CUDA Program• 3 minutes
- Nvidia Compute Capabilities and Device Architecture• 3 minutes
- Timing Your Kernel• 3 minutes
- Organising Parallel Threads• 3 minutes
- Managing Devices• 3 minutes
1 discussion prompt• Total 30 minutes
- Harnessing GPU Power: Exploring CUDA and the Architecture of Parallelism• 30 minutes
This module provides a comprehensive understanding of how CUDA executes programs on GPUs. It covers key concepts such as warps, warp scheduling, and resource partitioning, which are critical for understanding GPU hardware behaviour. The module delves into branch divergence and its impact on performance, offering strategies to minimise its effects. It also emphasises exposing parallelism effectively by leveraging CUDA’s hierarchical execution model. Students will learn how to design and optimise GPU programs by aligning with the underlying execution model to maximise efficiency and throughput.
What's included
15 videos, 2 readings, 15 assignments, 1 discussion prompt
15 videos• Total 135 minutes
- Introduction to CUDA Execution Model• 7 minutes
- Warps and Thread Blocks• 4 minutes
- Warp Divergence• 9 minutes
- Resource Partitioning• 6 minutes
- Latency Hiding• 10 minutes
- Occupancy• 5 minutes
- Synchronization• 4 minutes
- Scalability• 5 minutes
- Exposing Parallelism• 10 minutes
- Checking Active Warps with Nvprof• 6 minutes
- Checking Memory Operations with Nvprof• 7 minutes
- Avoiding Branch Divergence• 3 minutes
- The Parallel Reduction Problem and Thread Divergence• 7 minutes
- Improving Divergence in Parallel Reduction• 6 minutes
- Recording of Multicore and GPGPU Programming: Week 4 - Live Session on 25-06-13 18:32:39 [49:37]• 45 minutes
2 readings• Total 120 minutes
- Recommended Reading: Structure of a CUDA Program• 60 minutes
- Recommended Reading: Exposing Parallelism and Avoiding Branch Divergence• 60 minutes
15 assignments• Total 105 minutes
- Introduction to CUDA Execution Model• 3 minutes
- Warps and Thread Blocks • 3 minutes
- Warp Divergence• 3 minutes
- Resource Partitioning• 6 minutes
- Latency Hiding• 3 minutes
- Occupancy• 3 minutes
- Synchronization• 3 minutes
- Scalability• 3 minutes
- Exposing Parallelism• 3 minutes
- Checking Active Warps with Nvprof• 3 minutes
- Checking Memory Operations with Nvprof• 3 minutes
- Avoiding Branch Divergence• 3 minutes
- The Parallel Reduction Problem and Thread Divergence• 3 minutes
- Improving Divergence in Parallel Reduction• 3 minutes
- Graded Quiz - Modules 3 and 4 • 60 minutes
1 discussion prompt• Total 30 minutes
- Under the Hood: Warps, Divergence, and CUDA Execution Dynamics• 30 minutes
The CUDA Memory Model & Streams and Concurrency module introduces students to the intricacies of memory hierarchy in CUDA, including global, shared, and local memory. It emphasises the importance of memory coalescing and efficient memory access patterns to optimise performance on GPUs. The module also covers CUDA streams, explaining how concurrent kernel execution and memory operations can be managed to enhance parallelism. By understanding these concepts, students will gain the ability to design GPU programs that maximise throughput and minimise latency.
What's included
14 videos, 2 readings, 14 assignments, 1 discussion prompt, 1 ungraded lab
14 videos• Total 126 minutes
- Introduction to CUDA Memory Model• 8 minutes
- Memory Allocation and Deallocation• 6 minutes
- Zero Copy Memory• 4 minutes
- Unified Virtual Addressing and Unified Memory • 3 minutes
- Aligned and Coalesced Access• 6 minutes
- CUDA Shared Memory• 6 minutes
- Shared Memory Banks and Access Mode • 7 minutes
- Configuring the Amount of Shared Memory• 5 minutes
- Synchronisation• 9 minutes
- CUDA Streams• 7 minutes
- Stream Scheduling and Priorities• 6 minutes
- CUDA Events• 6 minutes
- Concurrent Kernel Execution• 6 minutes
- Recording of Multicore and GPGPU Programming: Week 5 - Live Session on 25-06-20 18:31:59 [47:36]• 48 minutes
2 readings• Total 120 minutes
- Recommended Reading: CUDA Memory Model• 60 minutes
- Recommended Reading: Streams and Concurrency• 60 minutes
14 assignments• Total 342 minutes
- Introduction to CUDA Memory Model• 3 minutes
- Memory Allocation and Deallocation• 3 minutes
- Zero Copy Memory• 3 minutes
- Unified Virtual Addressing and Unified Memory • 3 minutes
- Aligned and Coalesced Access• 3 minutes
- CUDA Shared Memory• 6 minutes
- Shared Memory Banks and Access Mode • 3 minutes
- Configuring the Amount of Shared Memory• 3 minutes
- Synchronisation• 3 minutes
- CUDA Streams• 3 minutes
- Stream Scheduling and Priorities• 3 minutes
- CUDA Events• 3 minutes
- Concurrent Kernel Execution• 3 minutes
- SGA-1: CUDA Programming and Performance Optimisation• 300 minutes
1 discussion prompt• Total 30 minutes
- Smart Memory and Seamless Concurrency: CUDA Memory and Streams• 30 minutes
1 ungraded lab• Total 60 minutes
- Hands-on lab: Parallel Matrix Addition Using CUDA• 60 minutes
This module explains in depth the difference between processes and threads and introduces multithreaded programming with the Pthreads library. Students are expected to learn the main functions of the Pthreads library and apply them to solve real-world problems with a multithreaded approach. The module also discusses the precautions to take when developing an algorithm that uses multithreading.
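One of the precautions this module's topics point to is protecting a critical section with a mutex. The following is a minimal sketch, assuming a shared counter as the protected resource; the thread count, iteration count, and function names are hypothetical and chosen only for illustration. It uses only standard Pthreads calls (`pthread_create`, `pthread_join`, `pthread_mutex_lock`, `pthread_mutex_unlock`).

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: each increment is a critical section guarded by the mutex. */
static void *add_to_counter(void *arg) {
    (void)arg; /* unused */
    for (int i = 0; i < INCREMENTS_PER_THREAD; ++i) {
        pthread_mutex_lock(&counter_lock);
        ++counter;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the threads, waits for all that were created, and returns
   the final counter value. */
long run_counter_demo(void) {
    pthread_t threads[NUM_THREADS];
    int created = 0;
    counter = 0;
    while (created < NUM_THREADS &&
           pthread_create(&threads[created], NULL, add_to_counter, NULL) == 0) {
        ++created;
    }
    for (int t = 0; t < created; ++t) {
        pthread_join(threads[t], NULL);
    }
    return counter;
}
```

If all four threads start, `run_counter_demo()` returns 400000. Without the lock and unlock calls, the increments would race and the result would typically be lower. On GCC or Clang the program is usually built with the `-pthread` flag.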
What's included
10 videos, 11 readings, 10 assignments, 1 discussion prompt
10 videos• Total 116 minutes
- Processes, Threads and Pthreads• 4 minutes
- Hello World!!• 9 minutes
- Matrix-Vector Multiplication• 13 minutes
- Critical Sections• 5 minutes
- Busy Waiting• 6 minutes
- Mutexes• 5 minutes
- Semaphores• 7 minutes
- Barriers and Condition Variables• 13 minutes
- Caches, Cache-Coherence and False Sharing• 9 minutes
- Recording of Multicore and GPGPU Programming: Week 6 - Live Session on 25-06-27 18:38:36 [43:53]• 44 minutes
11 readings• Total 295 minutes
- Recommended Reading: Processes, Threads and Pthreads• 10 minutes
- Recommended Reading: Hello World!!• 60 minutes
- Recommended Reading: Matrix-Vector Multiplication• 15 minutes
- Recommended Reading: Critical Sections• 30 minutes
- Recommended Reading: Busy Waiting• 20 minutes
- Recommended Reading: Mutexes• 15 minutes
- Recommended Reading: Semaphores• 30 minutes
- Recommended Reading: Barriers and Condition Variables• 30 minutes
- Recommended Reading: Read-Write Locks• 60 minutes
- Recommended Reading: Caches, Cache-Coherence and False Sharing• 15 minutes
- Lab Instruction Document• 10 minutes
10 assignments• Total 135 minutes
- Processes, Threads and Pthreads• 9 minutes
- Hello World!!• 9 minutes
- Matrix-Vector Multiplication• 9 minutes
- Critical Sections• 9 minutes
- Busy Waiting• 9 minutes
- Mutexes• 9 minutes
- Semaphores• 6 minutes
- Barriers and Condition Variables• 6 minutes
- Caches, Cache-Coherence and False Sharing• 9 minutes
- Graded Quiz - Modules 5 and 6 • 60 minutes
1 discussion prompt• Total 10 minutes
- Thread Synchronization and Shared Memory: Building Reliable Parallel Programs with Pthreads• 10 minutes
This module introduces students to distributed-memory programming using the Message Passing Interface (MPI). Students will learn about the functions provided by the MPI library and what each one does. This will enable them to write parallel programs and to convert serial code into parallel code using MPI functions.
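Converting serial code into parallel code starts from a working serial version. The sketch below is a serial odd-even transposition sort in C, the algorithm whose parallel form appears in this module's video list. It is an illustrative baseline under that assumption, not the course's own implementation, and it deliberately contains no MPI calls.

```c
#include <stddef.h>

/* Exchange two integers in place. */
static void swap_ints(int *x, int *y) {
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

/* Serial odd-even transposition sort.
   Runs n phases; n phases are enough to sort any input of length n. */
void odd_even_sort(int *a, size_t n) {
    for (size_t phase = 0; phase < n; ++phase) {
        /* Even phases compare pairs (0,1), (2,3), ...
           Odd phases compare pairs (1,2), (3,4), ... */
        size_t start = phase % 2;
        for (size_t i = start; i + 1 < n; i += 2) {
            if (a[i] > a[i + 1]) {
                swap_ints(&a[i], &a[i + 1]);
            }
        }
    }
}
```

The compare-and-swap operations within one phase touch disjoint pairs, so they do not depend on each other. That independence is what makes the algorithm a natural candidate for a message-passing version in which each process owns part of the array.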
What's included
7 videos, 9 readings, 7 assignments, 1 discussion prompt
7 videos• Total 70 minutes
- Introduction to MPI• 4 minutes
- MPI Setup and Communicator Functions• 6 minutes
- SPMD and Communication• 10 minutes
- Potential Pitfalls• 4 minutes
- Simple Serial Sorting Algorithm• 20 minutes
- Parallel Odd-Even Transposition Sort• 19 minutes
- Safety in MPI Programs• 7 minutes
9 readings• Total 125 minutes
- Recommended Reading: Introduction to MPI• 15 minutes
- Recommended Reading: MPI Setup and Communicator Functions• 15 minutes
- Recommended Reading: SPMD and Communication• 15 minutes
- Recommended Reading: Potential Pitfalls• 15 minutes
- Recommended Reading: Simple Serial Sorting Algorithm• 15 minutes
- Recommended Reading: Parallel Odd-Even Transposition Sort• 15 minutes
- Recommended Reading: Safety in MPI Programs • 15 minutes
- Lab: Practice Code• 10 minutes
- Lab: Practice Solution• 10 minutes
7 assignments• Total 63 minutes
- Introduction to MPI• 9 minutes
- MPI Setup and Communicator Functions• 9 minutes
- SPMD and Communication• 9 minutes
- Potential Pitfalls• 9 minutes
- Simple Serial Sorting Algorithm• 9 minutes
- Parallel Odd-Even Transposition Sort• 9 minutes
- Safety in MPI Programs• 9 minutes
1 discussion prompt• Total 30 minutes
- MPI in Action: Understanding Setup, Communication, and Parallel Sorting• 30 minutes
This module introduces the shared-memory programming model through OpenMP. Students will gain exposure to OpenMP's directives and library functions and learn how to use them to express shared-memory parallelism in code. Students will explore the foundational concepts of OpenMP through videos and readings, starting with the basics and progressing to more advanced topics such as reduction clauses, variable scoping, and mutual exclusion. Through worked examples like the Trapezoidal Rule and sorting functions, learners will understand how to parallelise loops, manage scheduling, and apply critical sections and locks for safe concurrent execution. The module also covers tasking in OpenMP and classic concurrency problems like producers and consumers.
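The trapezoidal rule and the reduction clause mentioned above fit together in a few lines. The following is a minimal sketch, assuming the integrand f(x) = x^2 purely for illustration; the function names are hypothetical. It uses the standard `parallel for` directive with a `reduction(+:sum)` clause.

```c
/* Hypothetical integrand for illustration: f(x) = x^2. */
static double f(double x) {
    return x * x;
}

/* Trapezoidal rule on [a, b] using n trapezoids of equal width. */
double trapezoid(double a, double b, int n) {
    double h = (b - a) / n;
    /* The two endpoints each contribute half a function value. */
    double sum = (f(a) + f(b)) / 2.0;

    /* Each thread accumulates into a private copy of sum; OpenMP adds
       the private copies to the original value when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i < n; ++i) {
        sum += f(a + i * h);
    }
    return h * sum;
}
```

Calling `trapezoid(0.0, 3.0, 1000)` approximates the integral of x^2 over [0, 3], whose exact value is 9. With GCC the directive is enabled by the `-fopenmp` flag; without it the pragma is ignored and the loop runs serially, giving the same result up to floating-point rounding.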
What's included
12 videos, 12 readings, 13 assignments, 1 discussion prompt
12 videos• Total 94 minutes
- Introduction to OpenMP• 5 minutes
- Programming in OpenMP• 10 minutes
- Trapezoidal Rule• 10 minutes
- Scope of Variables• 4 minutes
- Reduction Clause• 7 minutes
- Parallel-For Directive and Caveats in Them• 8 minutes
- Sorting Functions• 20 minutes
- Scheduling• 6 minutes
- Producers and Consumers• 6 minutes
- Termination, Startup and Atomic Directive• 7 minutes
- Critical Sections and Locks• 6 minutes
- Tasking• 5 minutes
12 readings• Total 152 minutes
- Recommended Reading: Introduction to OpenMP• 15 minutes
- Recommended Reading: Programming in OpenMP• 15 minutes
- Recommended Reading: Trapezoidal Rule• 15 minutes
- Recommended Reading: Scope of Variables• 15 minutes
- Recommended Reading: Reduction Clause• 15 minutes
- Recommended Reading: Parallel-For Directive and Caveats in Them• 15 minutes
- Recommended Reading: Sorting Functions• 15 minutes
- Recommended Reading: Scheduling • 15 minutes
- Recommended Reading: Producers and Consumers• 15 minutes
- Recommended Reading: Termination, Startup and Atomic Directive• 1 minute
- Recommended Reading: Critical Sections and Locks• 1 minute
- Recommended Reading: Tasking• 15 minutes
13 assignments• Total 168 minutes
- Introduction to OpenMP• 9 minutes
- Programming in OpenMP• 9 minutes
- Trapezoidal Rule• 9 minutes
- Scope of Variables• 9 minutes
- Reduction Clause• 9 minutes
- Parallel-For Directive and Caveats in Them• 9 minutes
- Sorting Functions• 9 minutes
- Scheduling• 9 minutes
- Producers and Consumers• 9 minutes
- Termination, Startup and Atomic Directive• 9 minutes
- Critical Sections and Locks• 9 minutes
- Tasking• 9 minutes
- Graded Quiz - Modules 7 and 8• 60 minutes
1 discussion prompt• Total 30 minutes
- Mastering OpenMP: From Parallel Patterns to Synchronisation• 30 minutes
This module will introduce the n-body problem in physics, examining its significance in simulating gravitational interactions among multiple particles. It will explore classical and modern algorithmic approaches to solving the n-body problem, followed by a discussion on their computational complexity. Emphasis will be placed on identifying opportunities for parallelisation, and students will analyse and implement efficient parallel solutions using the programming languages and parallel computing directives covered in the course.
What's included
13 videos, 13 readings, 13 assignments, 1 discussion prompt
13 videos• Total 107 minutes
- Introduction to N-body Problem• 8 minutes
- Serial Solutions to the N-body Problem• 16 minutes
- Parallelising Strategy• 13 minutes
- Parallelising Basic Solver Using OpenMP• 9 minutes
- Parallelising Reduced Solver Using OpenMP • 11 minutes
- Evaluating OpenMP Performance• 5 minutes
- Parallelising Basic Solver Using Pthreads • 4 minutes
- Parallelising Basic Solver Using MPI • 9 minutes
- Parallelising Reduced Solver Using MPI• 9 minutes
- Evaluating MPI Performance• 6 minutes
- Parallelising Basic Solver Using CUDA• 7 minutes
- Evaluating CUDA Solver and Improving Performance• 4 minutes
- Using Shared Memory for Solvers• 7 minutes
13 readings• Total 195 minutes
- Recommended Reading: Introduction to N-body Problem• 15 minutes
- Recommended Reading: Serial Solutions to the N-body Problem• 15 minutes
- Recommended Reading: Parallelising Strategy• 15 minutes
- Recommended Reading: Parallelising Basic Solver Using OpenMP• 15 minutes
- Recommended Reading: Parallelising Reduced Solver Using OpenMP• 15 minutes
- Recommended Reading: Evaluating OpenMP Performance• 15 minutes
- Recommended Reading: Parallelising Basic Solver Using Pthreads• 15 minutes
- Recommended Reading: Parallelising Basic Solver Using MPI• 15 minutes
- Recommended Reading: Parallelising Reduced Solver Using MPI• 15 minutes
- Recommended Reading: Evaluating MPI Performance• 15 minutes
- Recommended Reading: Parallelising Basic Solver Using CUDA• 15 minutes
- Recommended Reading: Evaluating CUDA Solver and Improving Performance• 15 minutes
- Recommended Reading: Using Shared Memory for Solvers• 15 minutes
13 assignments• Total 138 minutes
- Introduction to N-body Problem• 9 minutes
- Serial Solutions to the N-body Problem• 9 minutes
- Parallelising Strategy• 9 minutes
- Parallelising Basic Solver Using OpenMP• 9 minutes
- Parallelising Reduced Solver Using OpenMP• 9 minutes
- Evaluating OpenMP Performance• 9 minutes
- Parallelising Basic Solver Using Pthreads• 9 minutes
- Parallelising Basic Solver Using MPI• 30 minutes
- Parallelising Reduced Solver Using MPI• 9 minutes
- Evaluating MPI Performance• 9 minutes
- Parallelising Basic Solver Using CUDA• 9 minutes
- Evaluating CUDA Solver and Improving Performance• 9 minutes
- Using Shared Memory for Solvers• 9 minutes
1 discussion prompt• Total 30 minutes
- The N-Body Solver: Exploring Parallelism Across Models• 30 minutes
This module focuses on hands-on implementations of the Sample Sort algorithm using OpenMP, Pthreads, MPI, and CUDA. Students will explore the strengths and limitations of each parallel programming model through practical coding exercises. The module includes performance benchmarking and comparative analysis of the implementations to highlight trade-offs in scalability, efficiency, and suitability for different architectures. By the end of the module, students will have a strong grasp of each API and be equipped to make informed decisions about the most appropriate tool for a given parallel computing task.
What's included
8 videos, 9 readings, 10 assignments, 1 discussion prompt
8 videos• Total 61 minutes
- Sample Sort and Bucket Sort• 10 minutes
- Map• 17 minutes
- Implementing Sample Sort Using OpenMP: First Implementation• 5 minutes
- Implementing Sample Sort Using OpenMP: Second Implementation• 7 minutes
- Implementing Sample Sort Using Pthreads • 4 minutes
- Implementing Sample Sort Using MPI• 6 minutes
- Implementing Sample Sort Using MPI: Example• 5 minutes
- Implementing Sample Sort Using CUDA • 7 minutes
9 readings• Total 115 minutes
- Recommended Reading: Sample Sort and Bucket Sort• 15 minutes
- Recommended Reading: Map• 10 minutes
- Recommended Reading: Implementing Sample Sort Using OpenMP: First Implementation• 15 minutes
- Recommended Reading: Implementing Sample Sort Using OpenMP: Second Implementation• 15 minutes
- Recommended Reading: Implementing Sample Sort Using Pthreads• 10 minutes
- Recommended Reading: Implementing Sample Sort Using MPI• 15 minutes
- Recommended Reading: Implementing Sample Sort Using MPI: Example• 15 minutes
- Recommended Reading: Implementing Sample Sort Using CUDA• 10 minutes
- Recommended Reading: Which API?• 10 minutes
10 assignments• Total 432 minutes
- Sample Sort and Bucket Sort• 9 minutes
- Map (Quiz)• 9 minutes
- Implementing Sample Sort Using OpenMP: First Implementation• 9 minutes
- Implementing Sample Sort Using OpenMP: Second Implementation• 9 minutes
- Implementing Sample Sort Using Pthreads• 9 minutes
- Implementing Sample Sort Using MPI• 9 minutes
- Implementing Sample Sort Using MPI: Example• 9 minutes
- Implementing Sample Sort Using CUDA• 9 minutes
- Graded Quiz - Modules 9 and 10• 60 minutes
- SGA-2: Odd-Even Transposition Sort Parallelisation • 300 minutes
1 discussion prompt• Total 30 minutes
- Parallel Sample Sort Across Platforms• 30 minutes
Final Comprehensive Examination
What's included
1 assignment
1 assignment• Total 30 minutes
- Final Comprehensive Examination • 30 minutes
Offered by

Birla Institute of Technology & Science, Pilani (BITS Pilani) is one of only ten private universities in India to be recognised as an Institute of Eminence by the Ministry of Human Resource Development, Government of India. It has been consistently ranked high by both governmental and private ranking agencies for its innovative processes and capabilities, which have enabled it to impart quality education and emerge as the best private science and engineering institute in India. BITS Pilani has four campuses, in Pilani, Goa, Hyderabad, and Dubai, and has been offering bachelor's, master's, and certificate programmes for over 58 years, helping to launch the careers of over 1,00,000 professionals.