Hi there 👋

I have a strong foundation in research 🧪, with some experience building software 💻 under my belt. I earned my Master’s degree in Computer Science from UBC (Vancouver), where my research focused on trustworthy machine learning and led to a Spotlight acceptance at a top-tier machine learning conference, NeurIPS'24 (Link). For more details on my background and projects, please visit my webpage (link on the left).

🔭 During my master’s, I was part of a Systems research lab, which naturally led me to develop a deep interest in the intersection of systems and deep learning. Over time this interest has evolved, and I now find myself particularly drawn to the infrastructure and system-level challenges of deploying large language models (LLMs) in production. I’m especially fascinated by the components that make LLM inference faster; for instance, I am currently looking at caching layers, optimized scheduling and batching techniques, speculative decoding, and attention-mechanism optimizations via CUDA.
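
To make "speculative decoding" concrete, here is a minimal, self-contained sketch of the accept/reject loop at its core. The `draft_probs` and `target_probs` functions are toy stand-ins invented purely for illustration; this is not code from any of my projects.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50  # toy vocabulary size

def draft_probs(ctx):
    """Toy stand-in for a small, fast draft model (illustration only)."""
    p = rng.random(VOCAB)
    return p / p.sum()

def target_probs(ctx):
    """Toy stand-in for the large target model (illustration only)."""
    p = rng.random(VOCAB)
    return p / p.sum()

def speculative_step(ctx, k=4):
    """Draft k tokens cheaply, then verify them against the target model."""
    drafted, q = [], []
    for _ in range(k):
        probs = draft_probs(ctx + drafted)
        tok = int(rng.choice(VOCAB, p=probs))
        drafted.append(tok)
        q.append(probs)
    accepted = []
    for i, tok in enumerate(drafted):
        p = target_probs(ctx + accepted)
        # Accept the drafted token with probability min(1, p/q);
        # on rejection, resample from the normalized residual max(p - q, 0).
        if rng.random() < min(1.0, p[tok] / q[i][tok]):
            accepted.append(tok)
        else:
            residual = np.maximum(p - q[i], 0.0)
            accepted.append(int(rng.choice(VOCAB, p=residual / residual.sum())))
            break
    # (A full implementation also samples one bonus token from the target
    # distribution when all k drafted tokens are accepted.)
    return accepted

print(speculative_step(ctx=[1, 2, 3]))
```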

One of the projects I’m building along these lines is an open-source LLM cache, designed to intercept and reuse responses for repeated or similar queries and cut down on redundant computation. It’s implemented in Python with Redis as the backend, and includes support for semantic similarity search. Through this project, I’m diving into the trade-offs in cache design and performance across different workloads. Exploring how these low- and high-level system optimizations directly impact real-world latency and throughput is something I’m really excited about, and I’m currently looking to contribute to projects or teams that operate at the intersection of high-performance systems and cutting-edge ML.
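
To give a flavor of the design, here is an illustrative sketch of the two lookup paths, not the actual llm_cache code: the key names, threshold, and helper functions are made up for the example. An exact-match path hashes the prompt, and a semantic path scans cached embeddings for a close cosine-similarity hit.

```python
import hashlib
import json

import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

def _key(prompt: str) -> str:
    return "llmcache:" + hashlib.sha256(prompt.encode()).hexdigest()

def put(prompt: str, response: str, embedding: np.ndarray, ttl: int = 3600) -> None:
    """Cache a response under an exact key, and index its embedding."""
    r.set(_key(prompt), response, ex=ttl)  # TTL bounds staleness
    r.rpush("llmcache:index", json.dumps({"key": _key(prompt),
                                          "emb": embedding.tolist()}))

def get(prompt: str, embedding: np.ndarray, threshold: float = 0.9):
    """Try an exact-match lookup first; fall back to a semantic scan."""
    hit = r.get(_key(prompt))
    if hit is not None:
        return hit.decode()
    # Semantic path: linear scan over cached embeddings (fine for small
    # caches; a real system would use a proper vector index instead).
    for raw in r.lrange("llmcache:index", 0, -1):
        entry = json.loads(raw)
        emb = np.asarray(entry["emb"])
        sim = float(embedding @ emb /
                    (np.linalg.norm(embedding) * np.linalg.norm(emb) + 1e-9))
        if sim >= threshold:
            cached = r.get(entry["key"])
            if cached is not None:
                return cached.decode()
    return None
```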

🌱 Additionally, as I move forward in my career, I’m particularly drawn to early- to mid-stage startups where I can contribute to meaningful growth. I highly value a strong, dynamic founding team and an engineering culture that promotes both personal and professional development. I’m open to exploring different domains, as long as the product and roadmap are compelling, the team is solid, and there’s significant opportunity for learning and growth. If you think I’d be a great fit, feel free to reach out!

Pinned

  1. adaptive-randomized-smoothing

    Forked from ubc-systopia/adaptive-randomized-smoothing

    Python

  2. NPLM

    Neural network for word embeddings and language modeling

    Python

  3. PoU-Manifold

    Code to smoothly patch together local functions (linear/polynomial) defined on charts of a manifold to give a global approximation using Partitions of Unity (PoU)

    Python

  4. javaminining

    Code to scrape Git repositories and fetch commits in which new function arguments are added, using PyDriller

    Python

  5. ml_privacy_meter

    Forked from privacytrustlab/ml_privacy_meter

    Machine Learning Privacy Meter: A tool to quantify the privacy risks of machine learning models with respect to inference attacks

    Python

  6. llm_cache

    A simple, lightweight LLM cache using Redis to reduce latency and improve throughput

    Python