Machine Learning

I work on ML systems—the infrastructure layer between research and production. GPU kernels, distributed training, inference optimization. The stuff that determines whether a model actually runs fast or just runs. My background is in electrical engineering, so I think about these problems at the hardware level, not just the PyTorch API level.

I'm not interested in fine-tuning models in notebooks. I'm interested in understanding why training is slow and making it faster. Why inference costs what it does and making it cheaper. The systems work that most ML people never touch.

✦ Inference Systems ✦

I built Jinx, an LLM inference engine from scratch in JAX. Fused attention kernels, rotary embeddings, KV cache with paged memory management, continuous batching, chunked prefill, speculative decoding. Multiple quantization paths: int8, int4, and the normalized float formats. Not wrapping existing libraries—writing the kernels myself and understanding why things like prefix caching matter for real workloads.
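
To give a flavor of what that looks like below the API: here's a minimal rotary-embedding sketch in JAX. It's not the Jinx code, just the shape-level idea the fused kernels build on; the names and shapes are illustrative.

```python
import jax
import jax.numpy as jnp

def rotary_embedding(x, base=10000.0):
    """Apply rotary position embeddings to x of shape [seq, heads, head_dim].

    Pairs of channels are rotated by a position-dependent angle, so relative
    position shows up as a phase difference in the attention dot product.
    (Illustrative sketch, not the Jinx implementation.)
    """
    seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # One rotation rate per channel pair.
    inv_freq = 1.0 / (base ** (jnp.arange(half) / half))
    angles = jnp.arange(seq_len)[:, None] * inv_freq[None, :]   # [seq, half]
    cos = jnp.cos(angles)[:, None, :]                           # [seq, 1, half]
    sin = jnp.sin(angles)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position's angle.
    return jnp.concatenate([x1 * cos - x2 * sin,
                            x1 * sin + x2 * cos], axis=-1)

q = jax.random.normal(jax.random.PRNGKey(0), (128, 8, 64))  # [seq, heads, head_dim]
q_rot = rotary_embedding(q)
```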

I contribute to open-source inference engines working on routing, caching, and scheduler improvements. When latency spikes in production, I can trace it through the entire serving stack to find the root cause. That's a different skill than getting a model to run on your laptop.

✦ GPU Kernels ✦

I write CUDA kernels and I care about getting close to theoretical hardware limits. I've reimplemented attention kernels from scratch—the tiling, the online softmax trick, understanding why the backward pass is memory-bound even when forward isn't. I use Triton when I want to iterate fast and drop down to raw CUDA when I need more control.
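
The online softmax trick is easier to show than to describe. Here's a toy sketch of just the recurrence in plain JAX, with no tiling or fusion; the block size and names are arbitrary.

```python
import jax.numpy as jnp

def online_softmax(scores):
    """Streaming softmax over a 1-D score vector.

    Walk the scores one block at a time, keeping only a running max and a
    running sum, and rescale the sum whenever a new max appears. This is the
    bookkeeping that lets fused attention kernels avoid materializing the
    full score matrix.
    """
    block = 32
    m = -jnp.inf   # running max
    s = 0.0        # running sum of exp(score - m)
    for start in range(0, scores.shape[0], block):
        chunk = scores[start:start + block]
        m_new = jnp.maximum(m, chunk.max())
        # Rescale the old sum to the new max, then add this chunk's contribution.
        s = s * jnp.exp(m - m_new) + jnp.exp(chunk - m_new).sum()
        m = m_new
    return jnp.exp(scores - m) / s

x = jnp.linspace(-3.0, 5.0, 100)
reference = jnp.exp(x - x.max()) / jnp.exp(x - x.max()).sum()
assert jnp.allclose(online_softmax(x), reference, atol=1e-5)
```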

I profile with Nsight until I understand exactly where cycles go. Warp divergence, shared memory bank conflicts, register pressure, occupancy tuning—these aren't abstract concepts, they're the things I actually debug. My EE background helps here: I understand the GPU architecture at the hardware level, not just the programming model. I know what the tensor cores are doing, why async copy matters, how memory coalescing actually works.

I hang out with people who care about every instruction. The kind of engineers who measure everything and optimize based on data, not intuition. That's the environment I learn best in.

✦ Distributed Training ✦

I've implemented multi-GPU training from scratch: tensor parallelism with column and row splits, pipeline parallelism with microbatch scheduling, optimizer state sharding, gradient checkpointing. Not just configuring flags—actually implementing the communication patterns and understanding when different strategies win.
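
The column/row split is easiest to see at the shape level. A small JAX sketch, with plain array slices standing in for devices and no real collectives:

```python
import jax.numpy as jnp
from jax import random

# Shape-level sketch of tensor parallelism for a two-layer MLP: Y = relu(X @ W1) @ W2.
# "Devices" are just slices here; in a real setup each shard lives on its own GPU.
key = random.PRNGKey(0)
k1, k2, k3 = random.split(key, 3)
X  = random.normal(k1, (8, 16))    # activations
W1 = random.normal(k2, (16, 32))   # first layer: split by columns
W2 = random.normal(k3, (32, 16))   # second layer: split by rows

n_dev = 4
W1_shards = jnp.split(W1, n_dev, axis=1)   # column parallel
W2_shards = jnp.split(W2, n_dev, axis=0)   # row parallel

# Each "device" computes its slice of the hidden layer, then a partial output.
partials = [jnp.maximum(X @ w1, 0.0) @ w2 for w1, w2 in zip(W1_shards, W2_shards)]

# The column split needs no communication (relu is elementwise on local columns);
# the row split produces partial sums, so one all-reduce at the end finishes the layer.
Y_parallel = sum(partials)
Y_reference = jnp.maximum(X @ W1, 0.0) @ W2
assert jnp.allclose(Y_parallel, Y_reference, atol=1e-3)
```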

I understand collective communication at a level where I know when ring all-reduce beats a tree topology, how to overlap communication with compute, and why the synchronization points in a distributed backward pass matter. When I read papers about training large models, I can implement the core ideas and understand the edge cases they don't mention. My longer-term goal is a full training stack in C++ and CUDA without PyTorch: autograd, mixed precision, distributed backward, all of it.
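
Ring all-reduce itself is simple enough to simulate in one process, which is how I'd explain why it's bandwidth-friendly: a reduce-scatter phase followed by an all-gather phase. This is a sketch with jnp arrays standing in for device buffers, not anything NCCL-shaped.

```python
import jax.numpy as jnp
from jax import random

def ring_all_reduce(chunks_per_rank):
    """Simulate ring all-reduce on one process.

    chunks_per_rank[r][c] is chunk c held by rank r. Reduce-scatter passes
    partial sums around the ring for N-1 steps until each rank owns one fully
    reduced chunk; all-gather then circulates the finished chunks for another
    N-1 steps. Each rank sends roughly 2x its data regardless of N, which is
    why the ring wins on bandwidth for large tensors.
    """
    n = len(chunks_per_rank)
    data = [list(c) for c in chunks_per_rank]

    # Reduce-scatter: rank r sends chunk (r - step) % n to rank (r + 1) % n,
    # which accumulates it. Snapshot sends first so all transfers in a step
    # see the pre-step state, as they would on real hardware.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, idx, chunk in sends:
            data[(r + 1) % n][idx] = data[(r + 1) % n][idx] + chunk

    # All-gather: rank r forwards its finished chunk (r + 1 - step) % n.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, idx, chunk in sends:
            data[(r + 1) % n][idx] = chunk

    return data

ranks = [list(jnp.split(random.normal(random.PRNGKey(r), (8,)), 4)) for r in range(4)]
reduced = ring_all_reduce(ranks)
expected = sum(jnp.concatenate(r) for r in ranks)
for r in reduced:
    assert jnp.allclose(jnp.concatenate(r), expected, atol=1e-5)
```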

✦ Open Source & Infrastructure ✦

I contribute to ML infrastructure projects—PyTorch internals, quantization libraries, distributed training frameworks. Not documentation fixes, but contributions that require understanding how the systems actually work. When I look at a compiler like torch.compile, I'm thinking about how to make it better, not just how to use it.

I lead a reading group where we go through systems papers and implement them: mixture of experts routing, attention variants, parallelism strategies. The open-source ML infrastructure community is where the real engineering happens, and that's where I want to make an impact.
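
Mixture-of-experts routing, for instance, reduces to a short top-k gating computation once you strip away capacity limits and auxiliary losses. A minimal JAX sketch, with names and shapes that are mine:

```python
import jax
import jax.numpy as jnp

def moe_layer(tokens, gate_w, expert_w, top_k=2):
    """Minimal top-k mixture-of-experts routing (no capacity limits, no aux loss).

    tokens:   [num_tokens, d_model]
    gate_w:   [d_model, num_experts]          router weights
    expert_w: [num_experts, d_model, d_model] one linear layer per expert
    """
    logits = tokens @ gate_w                           # [tokens, experts]
    probs = jax.nn.softmax(logits, axis=-1)
    top_p, top_idx = jax.lax.top_k(probs, top_k)       # [tokens, top_k]
    top_p = top_p / top_p.sum(axis=-1, keepdims=True)  # renormalize over chosen experts

    # Scatter the chosen weights back into a dense [tokens, experts] gate matrix.
    rows = jnp.arange(tokens.shape[0])[:, None]
    gates = jnp.zeros_like(probs).at[rows, top_idx].set(top_p)

    # Dense dispatch for clarity: run every expert on every token, then weight.
    # A real implementation gathers tokens per expert instead of doing this.
    expert_out = jnp.einsum('td,edf->tef', tokens, expert_w)  # [tokens, experts, d_model]
    return jnp.einsum('te,tef->tf', gates, expert_out)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
x = jax.random.normal(k1, (16, 32))          # 16 tokens, d_model = 32
gate_w = jax.random.normal(k2, (32, 4))      # 4 experts
expert_w = jax.random.normal(k3, (4, 32, 32))
print(moe_layer(x, gate_w, expert_w).shape)  # (16, 32)
```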

✦ Research ✦

I've published peer-reviewed research on deep learning for network security (IEEE ICCCNT, with IBM) and did research at UCSC. I understand the difference between benchmark numbers and production performance: the data pipeline issues, distribution shift, all the infrastructure that papers don't talk about. ML engineering is mostly not about models; it's about everything else.

Being a grad student means I've developed some taste for what problems matter and what approaches are likely to work. When I read a new paper, I can usually tell where the method will break down in practice and what assumptions the authors are making that won't hold. That kind of judgment only comes from experience.