Demo Video

This video demonstrates the Barnes-Hut N-body Simulation with MPI.

Project Overview: Barnes-Hut N-body Simulation with MPI

This project implements the Barnes-Hut algorithm for simulating gravitational interactions between N bodies, using MPI (Message Passing Interface) for parallelization. Instead of computing all O(N²) pairwise forces, Barnes-Hut approximates groups of bodies that are far enough away by their center of mass and total mass, reducing the per-step cost to roughly O(N log N).

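For reference, the decision of when a group is "far enough away" is usually made with the standard Barnes-Hut acceptance test; the small sketch below illustrates it and is not taken from the project's code.

#include <stdio.h>

/* Standard Barnes-Hut opening criterion: a cell of width s whose centre of
   mass lies at distance d from the target body may be treated as a single
   point mass when s / d < theta (smaller theta: more accuracy, more work). */
static int can_approximate(double s, double d, double theta) {
    return s / d < theta;
}

int main(void) {
    /* a 10-unit-wide cell 40 units away, tested with theta = 0.35 */
    printf("%s\n", can_approximate(10.0, 40.0, 0.35) ? "approximate" : "recurse");
    return 0;
}
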
Implementation Challenges:

Parallelization: Efficiently dividing the workload across multiple MPI workers (one possible partitioning scheme is sketched after this list).

Communication Overhead: Managing the trade-off between parallel computation benefits and MPI communication overhead.

Optimizations: Implementing performance optimizations to reduce computational complexity.

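One simple way to divide the bodies, shown here only as an illustrative sketch and not the project's actual scheme, is a block decomposition in which the first N % nprocs ranks receive one extra body; the resulting counts and displacements have exactly the shape MPI_Allgatherv expects.

#include <stdio.h>

/* Illustrative block decomposition of N bodies over nprocs ranks. */
static void block_partition(int N, int nprocs, int *counts, int *displs) {
    for (int r = 0, offset = 0; r < nprocs; r++) {
        counts[r] = N / nprocs + (r < N % nprocs ? 1 : 0); /* bodies owned by rank r       */
        displs[r] = offset;                                /* index of rank r's first body */
        offset += counts[r];
    }
}

int main(void) {
    int counts[3], displs[3];
    block_partition(10, 3, counts, displs);                /* e.g. 10 bodies on 3 ranks */
    for (int r = 0; r < 3; r++)
        printf("rank %d: %d bodies starting at index %d\n", r, counts[r], displs[r]);
    return 0;
}
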
Approach

The Barnes-Hut algorithm was implemented in two phases:

Program Flow

The program follows this structure:

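The concrete step list is not reproduced here, so the sketch below shows only what a typical MPI Barnes-Hut main loop could look like; the flat double array, the constants, and the MPI_Allgather exchange are assumptions for illustration, not the project's actual code (which uses the struct Body shown next).

#include <mpi.h>
#include <stdlib.h>

#define N      1024   /* total number of bodies (example value)  */
#define STEPS  1000   /* simulation steps (illustrative)         */
#define FIELDS 5      /* px, py, vx, vy, mass packed per body    */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int local_n = N / nprocs;         /* assumes nprocs divides N; see the
                                         partitioning sketch above otherwise */
    double *all   = calloc((size_t)N * FIELDS, sizeof *all);
    double *local = all + (size_t)rank * local_n * FIELDS;   /* this rank's block */

    /* rank 0 would read the input file into all[] here */
    MPI_Bcast(all, N * FIELDS, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int step = 0; step < STEPS; step++) {
        /* 1. build the quadtree over all N bodies                     */
        /* 2. compute forces for the local_n bodies starting at local  */
        /* 3. integrate positions and velocities of the local block    */

        /* 4. exchange the updated blocks so every rank again holds the
              full, consistent body array for the next step            */
        MPI_Allgather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                      all, local_n * FIELDS, MPI_DOUBLE, MPI_COMM_WORLD);
    }

    free(all);
    MPI_Finalize();
    return 0;
}

In a real run, rank 0 would also write the final state to the output file; input parsing and error handling are omitted from this sketch.
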
Key Data Structures:

struct Body {
    int index;          // Body identifier
    double px, py;      // Position
    double mass;        // Mass
    double vx, vy;      // Velocity
    double fx, fy;      // Force (initialized to 0)
};

struct Node {
    struct Body* b;                  // Body stored in this node
    struct Node *NW, *NE, *SW, *SE;  // Child quadrants
    double minx, maxx, miny, maxy;   // Bounds of this quadrant
};
    
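As an illustration of how these structures could drive the force phase, here is a minimal sketch of the recursive Barnes-Hut traversal. It assumes, beyond what is shown above, that each internal node's b points to an aggregated pseudo-body holding the subtree's total mass and centre of mass, and it uses a textbook gravitational kernel with a small softening term; the project's actual force code may differ.

#include <math.h>
#include <stddef.h>

#define G 6.674e-11   /* gravitational constant (the project may use a scaled value) */

/* Accumulate the force exerted by bj (a real body or a pseudo-body) on bi. */
static void add_force(struct Body *bi, const struct Body *bj) {
    double dx = bj->px - bi->px, dy = bj->py - bi->py;
    double d  = sqrt(dx * dx + dy * dy) + 1e-9;          /* softening avoids divide-by-zero */
    double f  = G * bi->mass * bj->mass / (d * d);
    bi->fx += f * dx / d;
    bi->fy += f * dy / d;
}

/* Recursive traversal: approximate far-away subtrees, recurse otherwise. */
static void compute_force(const struct Node *n, struct Body *bi, double theta) {
    if (n == NULL || n->b == NULL || n->b == bi)
        return;
    int is_leaf = (n->NW == NULL && n->NE == NULL &&
                   n->SW == NULL && n->SE == NULL);
    double s  = n->maxx - n->minx;                       /* width of this quadrant */
    double dx = n->b->px - bi->px, dy = n->b->py - bi->py;
    double d  = sqrt(dx * dx + dy * dy);
    if (is_leaf || s / d < theta) {
        add_force(bi, n->b);                             /* treat subtree as one body */
    } else {
        compute_force(n->NW, bi, theta);
        compute_force(n->NE, bi, theta);
        compute_force(n->SW, bi, theta);
        compute_force(n->SE, bi, theta);
    }
}

Each simulation step, compute_force would be called once per locally owned body, with theta presumably supplied by the -t option (0.35 in the example run below), and positions then advanced from the accumulated fx, fy.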

Architecture & System Details

The implementation was tested on a system with the following specifications:

System Architecture

Architecture:         x86_64
CPU(s):               8
Thread(s) per core:   2
Core(s) per socket:   4
Socket(s):            1
Model name:           Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80 GHz

MPI Execution Details:

To run the program with multiple MPI workers:

HWLOC_COMPONENTS=-gl mpirun -n [number_of_workers] --oversubscribe ./nbody -i input/nb-10.txt -o output/output.txt -s 1000 -t 0.35 -d 0.005

Here --oversubscribe allows Open MPI to launch more ranks than there are cores, and HWLOC_COMPONENTS=-gl excludes hwloc's GL (display-probing) component, avoiding spurious warnings on headless machines. The remaining flags select the input and output files and appear to set the number of simulation steps (-s), the Barnes-Hut threshold θ (-t), and the timestep (-d).

Key Findings & Optimizations

Performance Patterns:

Optimizations Implemented:

Conclusion

This project successfully implemented a Barnes-Hut N-body simulation using MPI for parallelization. The results demonstrate the trade-offs between parallelization benefits and communication overhead in distributed computing environments. Optimal performance was achieved with 2-8 workers for medium-sized datasets and 4-8 workers for large datasets, closely matching the number of available physical cores on the test system.