Om Agrawal

Hey, I'm Om

I'm a CS student at UT Austin, passionate about building solutions to real problems.

Projects

Reproducing GPT-2 Model on 4 A100 GPUs

📄 See the blog post here

Full reproduction of the 124M-parameter GPT-2 model: built the transformer (blocks, MLP, causal multi-headed self-attention), loaded the released GPT-2 weights to verify correctness, then trained from scratch using bfloat16, torch.compile, FlashAttention, and DDP across 4 A100 GPUs. Trained on FineWeb-Edu (10B-token sample) and evaluated on HellaSwag, where the model beats the reference GPT-2 124M score.

PyTorch Transformers DDP FlashAttention

Building the LLM Stack from First Principles

📄 Technical blog

Worked through Andrej Karpathy's Neural Networks: Zero to Hero series, coding along with each lecture rather than just watching, and documented the journey in a technical blog. Built the language-model stack from first principles: an autograd engine, a character-level bigram model, an MLP, batch normalization, a WaveNet-inspired model, a character-level transformer, a BPE tokenizer, and finally GPT-2, pretrained on 4 A100 GPUs.

PyTorch Neural Networks Transformers

Arbor

📄 Access the paper here

A Git-like chat session management tool for coding agents. The implementation is available as an open-source MCP server, along with a modified mini-swe-agent integration used for benchmarking.

Python SWE-Agent MCP Coding Agents

Cornucopia

1st place + Best Use of GenAI at IEEE UT's 2025 Techathon

An AI-powered system that uses computer vision to track fridge inventory and suggest recipes, helping reduce food waste.

Python Gemini HTML CSS JavaScript Node.js PostgreSQL ESP32

Multithreaded Lock-Free Web Server

A multithreaded, lock-free web server written in Rust.

Rust

Barnes-Hut

An efficient N-body simulation using the Barnes-Hut algorithm, parallelized with MPI.

C++ MPI

2-Phase Commit in Rust

Implemented a simulation of the two-phase commit (2PC) protocol in Rust.

Rust

K-means on GPU

A GPU-accelerated k-means clustering algorithm written in CUDA.

C++ CUDA

BST Lightning

Engineered a high-performance, multithreaded algorithm for identifying equivalent binary search trees.

Go

DDP Training

Implemented Distributed Data Parallel (DDP) training of a VGG11 model on the CIFAR-10 dataset using AWS SageMaker, and analyzed how performance scales across multiple GPUs to weigh the gains from parallelism against the communication overhead of gradient synchronization.

PyTorch AWS SageMaker DistributedDataParallel NCCL

LoRA Parameter-Efficient Fine-Tuning

Applied LoRA for parameter-efficient fine-tuning of a TinyLlama model on the Guanaco dataset by injecting LoRA adapter modules into the MLP linear layers, and measured how LoRA rank and module size affected training latency.

PyTorch PEFT Transformers BitsAndBytes TRL
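The adapter idea can be sketched in a few lines. This is a minimal illustration in plain Python lists rather than PyTorch or PEFT, and the class `LoRALinear` with its `r`/`alpha` defaults is hypothetical, not the project's code: the frozen weight W is left untouched and a trainable low-rank update (alpha / r) · B · A is added to its output, with B initialized to zero so the adapted layer starts out identical to the frozen one.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a trainable low-rank update.

    Output is W x + (alpha / r) * B (A x). Only A and B would be trained.
    """

    def __init__(self, weight, r=4, alpha=8, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.r, self.scale = r, alpha / r
        # A: (r, d_in) with small random values; B: (d_out, r) all zeros,
        # so the low-rank update contributes nothing before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        # A has r * d_in entries, B has d_out * r entries.
        return self.r * (len(self.weight[0]) + len(self.weight))
```

The parameter savings follow from the shapes: for a square 4096 × 4096 projection, rank 8 trains 8 × (4096 + 4096) = 65,536 parameters instead of 16,777,216, about 0.4% of the full matrix. Raising the rank grows this count linearly, which is one reason rank is a natural knob to study against training latency.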

Activation-Aware Weight Quantization

Quantized the facebook/opt-1.3b model to 3 bits while preserving performance by protecting the top 1% of salient weights with a hardware-friendly scaling approach instead of mixed precision, and ran an ablation study to find the optimal scaling factors.

PyTorch Transformers Accelerate Datasets
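The scaling trick can be illustrated with a simplified sketch in the spirit of activation-aware weight quantization, written in plain Python and not taken from the project. It assumes per-row symmetric round-to-nearest quantization (real AWQ uses group-wise quantization with zero points and normalizes the scales), and all function names are hypothetical. Each input channel's weights are multiplied by s_j before quantizing and the input is divided by s_j, so the unquantized product is unchanged; the scales are s_j = (mean |x_j|)^alpha, and a grid search over alpha stands in for the ablation over scaling factors.

```python
def quantize_row(row, n_bits=3):
    """Symmetric round-to-nearest quantization of one weight row (absmax step)."""
    qmax = 2 ** (n_bits - 1) - 1
    step = max(abs(w) for w in row) / qmax or 1.0  # avoid a zero step
    return [max(-qmax - 1, min(qmax, round(w / step))) * step for w in row]


def layer_output(weight, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def quantized_output(weight, x, scales, n_bits=3):
    # Scale each input-channel column up by s_j, quantize, then fold 1/s_j
    # into the input so that (W * s) (x / s) equals W x before rounding.
    scaled = [[w * s for w, s in zip(row, scales)] for row in weight]
    q = [quantize_row(row, n_bits) for row in scaled]
    x_scaled = [xi / s for xi, s in zip(x, scales)]
    return layer_output(q, x_scaled)


def output_error(weight, calib, scales, n_bits=3):
    """Mean squared output error of the quantized layer over calibration inputs."""
    err = 0.0
    for x in calib:
        ref = layer_output(weight, x)
        out = quantized_output(weight, x, scales, n_bits)
        err += sum((a - b) ** 2 for a, b in zip(ref, out))
    return err / len(calib)


def search_scales(weight, calib, n_bits=3, grid=20):
    """Grid-search alpha in [0, 1]; alpha = 0 gives s = 1 (plain round-to-nearest)."""
    d_in = len(weight[0])
    # Average activation magnitude per input channel marks the salient channels.
    act = [sum(abs(x[j]) for x in calib) / len(calib) for j in range(d_in)]
    best = (float("inf"), 0.0, [1.0] * d_in)
    for i in range(grid + 1):
        alpha = i / grid
        scales = [max(a, 1e-8) ** alpha for a in act]
        err = output_error(weight, calib, scales, n_bits)
        if err < best[0]:
            best = (err, alpha, scales)
    return best  # (error, alpha, per-channel scales)
```

Because alpha = 0 reproduces plain round-to-nearest, the searched scales can never do worse than the unscaled baseline on the calibration data. In a deployed model the 1/s_j division would be fused into the preceding operation, which is what keeps the approach hardware-friendly compared with storing some weights at higher precision.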

Bhagavad-GPT

Built a custom retrieval-augmented generation (RAG) system centered on the Bhagavad Gita, grounding the model's answers in an authoritative knowledge base.

Python LangChain OpenAI API Flask Pinecone

Work Experience

Anduril

Incoming Software Engineering Intern

UT Networked Systems Research Group

Student Researcher
  • Designing a caching mechanism to speed up inference in video-generation diffusion models.
  • Working with postdoc Saurabh Agarwal and Prof. Aditya Akella.

ForeFlight

Software Engineering Intern
  • Engineering a multithreaded, Rust-based UDP client that implements a unified networking protocol for the ForeFlight app, standardizing communication across ~50 hardware devices and replacing disparate protocols with a single, scalable solution

Federal Aviation Administration

Intern, Office of Senior Technical Experts
  • Developed an algorithm that applies position and time calculations to 50 million data points of authoritative FAA ADS-B data to detect "jumps" in aircraft GPS locations
  • Built an application with Kepler.gl and Streamlit that displays geographic flight-track data and highlights the jumps detected by our "Jump" algorithm
  • Worked directly under the FAA's Chief Scientific Technical Advisor for Satellite Navigation Systems and collaborated with researchers from Stanford and Virginia Tech
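One common way to flag GPS jumps from position and time alone is to compute the speed implied by consecutive reports and flag any step that is physically impossible. The sketch below illustrates that idea under stated assumptions; it is hypothetical and not the FAA "Jump" algorithm. The `(t_seconds, lat_deg, lon_deg)` track format, the function names, and the 350 m/s threshold are all illustrative choices.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_jumps(track, max_speed_mps=350.0):
    """Return indices i where moving from report i-1 to i implies an impossible speed.

    track: list of (t_seconds, lat_deg, lon_deg) for one aircraft, sorted by time.
    max_speed_mps: hypothetical threshold (about 680 knots).
    """
    jumps = []
    for i in range(1, len(track)):
        t0, lat0, lon0 = track[i - 1]
        t1, lat1, lon1 = track[i]
        dt = t1 - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp; skip instead of dividing by zero
        if haversine_m(lat0, lon0, lat1, lon1) / dt > max_speed_mps:
            jumps.append(i)
    return jumps
```

A single spurious position usually produces two flags, one for the jump away from the true track and one for the jump back, so a real pipeline would likely group adjacent flags into one event.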

UT Austin Computer Science

Undergraduate Course Assistant, Discrete Mathematics
  • Helped students with topics including Propositional Logic, Proof Techniques, Graph Theory, and Asymptotic Notation
  • Conducted discussion sections, created practice questions, and graded assignments/exams