Demo Video
This video demonstrates the project's GPU-accelerated K-means clustering implementations and their performance.
Project Overview: GPU-Accelerated K-means Clustering
The goal of this project was to significantly speed up the K-means clustering algorithm by leveraging the parallel processing power of a GPU. K-means, a popular method for grouping data points, can be computationally intensive, especially with large datasets. To address this, I implemented several variations of the algorithm using CUDA.
Here's a breakdown of the approach:
Key Implementations:
- Sequential (CPU): A standard, non-parallel K-means implementation used as a baseline.
- Sequential K-means++: A sequential version using K-means++ initialization, which helps the algorithm converge faster.
- GPU (Basic): A GPU-accelerated version using CUDA for parallel processing.
- GPU K-means++ (Basic): A GPU version with K-means++ initialization, enhancing convergence speed.
- GPU (Shared Memory): A GPU version utilizing shared memory for optimized data access.
- GPU K-means++ (Shared Memory): The most optimized version, combining GPU acceleration, K-means++ initialization, and shared memory.
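The core step that all the GPU variants parallelize is assigning each point to its nearest centroid; in the CUDA kernels, one thread typically handles one point. The per-point work is equivalent to this C++ sketch (the function name and the nested-vector layout are illustrative, not taken from the project):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Assign one point to its nearest centroid by squared Euclidean distance.
// In the CUDA kernels, each GPU thread runs this logic for one point index.
std::size_t nearest_centroid(const std::vector<float>& point,
                             const std::vector<std::vector<float>>& centroids) {
    std::size_t best = 0;
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        float dist = 0.0f;
        for (std::size_t d = 0; d < point.size(); ++d) {
            float diff = point[d] - centroids[c][d];
            dist += diff * diff;  // squared distance avoids a sqrt per pair
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = c;
        }
    }
    return best;
}
```

On the GPU, the outer loop over points disappears: each thread computes one assignment in parallel, which is why this step dominates the speedup over the sequential baseline.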
Technical Details:
- Data transfer between host (CPU) and device (GPU) memory.
- CUDA synchronization to ensure correct parallel execution.
- Optimization of thread and block configurations for the GPU.
- Use of shared memory to expedite repeated memory accesses.
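The shared-memory optimization exploits the fact that every point in a thread block reuses the same centroid table, so copying it once into fast on-chip `__shared__` memory (filled cooperatively by the block's threads, followed by `__syncthreads()`) amortizes the global-memory reads. The access pattern is equivalent to this host-side C++ sketch, where a local buffer stands in for shared memory (function name and row-major layout are illustrative assumptions):

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Mimics the CUDA shared-memory pattern: copy the centroid table once into a
// compact local buffer (standing in for a __shared__ array), then serve every
// distance computation for a batch of points from that buffer.
std::vector<std::size_t> assign_batch(const std::vector<float>& points,     // n x dim, row-major
                                      const std::vector<float>& centroids,  // k x dim, row-major
                                      std::size_t dim, std::size_t k) {
    // On the GPU, the block's threads fill this buffer cooperatively and
    // synchronize before any thread reads from it.
    std::vector<float> cached(centroids);  // stand-in for the __shared__ copy

    std::size_t n = points.size() / dim;
    std::vector<std::size_t> labels(n);
    for (std::size_t i = 0; i < n; ++i) {  // one CUDA thread per point
        float best_dist = std::numeric_limits<float>::max();
        std::size_t best = 0;
        for (std::size_t c = 0; c < k; ++c) {
            float dist = 0.0f;
            for (std::size_t d = 0; d < dim; ++d) {
                float diff = points[i * dim + d] - cached[c * dim + d];
                dist += diff * diff;
            }
            if (dist < best_dist) { best_dist = dist; best = c; }
        }
        labels[i] = best;
    }
    return labels;
}
```

Each point reads all k centroids every iteration, so caching them in shared memory turns k·dim global loads per point into a single cooperative load per block.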
Analysis:
- Measurement of iterations required for convergence.
- Recording of execution time per iteration.
- Analysis of data transfer overhead.
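A measurement harness along these lines can count iterations to convergence and record per-iteration wall time; the stopping rule (no label changed between iterations) and the `std::chrono` timer are illustrative assumptions, not the project's exact instrumentation:

```cpp
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

struct KMeansStats {
    std::size_t iterations = 0;
    std::vector<double> ms_per_iteration;  // wall time per iteration
};

// One Lloyd's-algorithm run on row-major data, instrumented as described:
// count iterations until convergence and time each iteration.
KMeansStats run_kmeans(const std::vector<float>& points, std::vector<float>& centroids,
                       std::size_t dim, std::size_t k, std::size_t max_iters = 100) {
    std::size_t n = points.size() / dim;
    std::vector<std::size_t> labels(n, k);  // k acts as an "unassigned" sentinel
    KMeansStats stats;

    for (std::size_t it = 0; it < max_iters; ++it) {
        auto t0 = std::chrono::steady_clock::now();
        bool changed = false;

        // Assignment step (the part the GPU versions parallelize per point).
        for (std::size_t i = 0; i < n; ++i) {
            float best_dist = std::numeric_limits<float>::max();
            std::size_t best = 0;
            for (std::size_t c = 0; c < k; ++c) {
                float dist = 0.0f;
                for (std::size_t d = 0; d < dim; ++d) {
                    float diff = points[i * dim + d] - centroids[c * dim + d];
                    dist += diff * diff;
                }
                if (dist < best_dist) { best_dist = dist; best = c; }
            }
            if (labels[i] != best) { labels[i] = best; changed = true; }
        }

        // Update step: move each centroid to the mean of its assigned points.
        std::vector<float> sums(k * dim, 0.0f);
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < n; ++i) {
            ++counts[labels[i]];
            for (std::size_t d = 0; d < dim; ++d)
                sums[labels[i] * dim + d] += points[i * dim + d];
        }
        for (std::size_t c = 0; c < k; ++c)
            if (counts[c] > 0)
                for (std::size_t d = 0; d < dim; ++d)
                    centroids[c * dim + d] = sums[c * dim + d] / counts[c];

        auto t1 = std::chrono::steady_clock::now();
        stats.ms_per_iteration.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
        ++stats.iterations;
        if (!changed) break;  // converged: no label moved this iteration
    }
    return stats;
}
```

For the GPU versions, the same per-iteration timing would additionally need CUDA synchronization before each timestamp, since kernel launches return asynchronously.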
Results:
The GPU implementations achieved a 6.3x speedup compared to the sequential versions. Furthermore, the K-means++ initialization method significantly improved convergence, reducing the number of iterations to just two in some cases.
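The convergence gain comes from K-means++'s seeding rule: after the first centroid is picked uniformly at random, each subsequent seed is chosen with probability proportional to its squared distance (D²) from the nearest seed picked so far, spreading the initial centroids across the data. A minimal sketch of that selection (function name and the caller-supplied RNG are illustrative assumptions):

```cpp
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// K-means++ seeding: pick the first centroid uniformly at random, then pick
// each next centroid with probability proportional to the squared distance
// (D^2) from the nearest centroid chosen so far.
std::vector<std::size_t> kmeanspp_seeds(const std::vector<float>& points,  // n x dim, row-major
                                        std::size_t dim, std::size_t k,
                                        std::mt19937& rng) {
    std::size_t n = points.size() / dim;
    std::vector<std::size_t> seeds;
    std::uniform_int_distribution<std::size_t> uniform(0, n - 1);
    seeds.push_back(uniform(rng));

    std::vector<float> d2(n, std::numeric_limits<float>::max());
    while (seeds.size() < k) {
        // Refresh each point's squared distance to its nearest chosen seed.
        std::size_t latest = seeds.back();
        for (std::size_t i = 0; i < n; ++i) {
            float dist = 0.0f;
            for (std::size_t d = 0; d < dim; ++d) {
                float diff = points[i * dim + d] - points[latest * dim + d];
                dist += diff * diff;
            }
            if (dist < d2[i]) d2[i] = dist;
        }
        // Sample the next seed in proportion to D^2 (far points are favored).
        std::discrete_distribution<std::size_t> pick(d2.begin(), d2.end());
        seeds.push_back(pick(rng));
    }
    return seeds;
}
```

Because the seeds start well separated, the first assignment step already lands close to the final partition, which is consistent with the observed drop to as few as two iterations.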
This project demonstrates the effectiveness of GPU acceleration for K-means clustering, delivering substantial speed improvements through efficient use of GPU resources.