My research lies at the intersection of Computer Systems and Machine Learning. I design and build software infrastructure that makes large language models (LLMs) efficient, scalable, and programmable.

Recently, I have been focusing on programmable LLM serving systems to bridge the gap between static inference engines and dynamic application logic.

At Yale, I am fortunate to be advised by Prof. Lin Zhong.

News

Aug 2025
Cacheback accepted to EMNLP 2025 (Main).
Jul 2025
Pie accepted to SOSP 2025.
Apr 2025
Serve Programs, Not Prompts accepted to HotOS 2025.
Feb 2024
Prompt Cache accepted to MLSys 2024.

Publications

Cacheback: Speculative Decoding With Nothing But Cache
Zhiyao Ma*, In Gim*, and Lin Zhong
EMNLP 2025 Code (*Equal contribution)
Pie: A Programmable Serving System for Emerging LLM Applications
In Gim, Zhiyao Ma, Seung-Seob Lee, and Lin Zhong
SOSP 2025 Code
Serve Programs, Not Prompts
In Gim and Lin Zhong
HotOS 2025
Wiretapping LLMs: Network Side-Channel Attacks on Interactive LLM Services
Mahdi Soleimani, Grace Jia, In Gim, Seung-Seob Lee, and Anurag Khandelwal
Preprint 2025
Asynchronous LLM Function Calling
In Gim, Seung-Seob Lee, and Lin Zhong
Preprint 2024
Confidential Prompting: Protecting User Prompts from Cloud LLM Providers
In Gim*, Caihua Li*, and Lin Zhong
Preprint 2024 Code (*Equal contribution)
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
In Gim, Guojun Chen, Seung-Seob Lee, Nikhil Sarda, Anurag Khandelwal, and Lin Zhong
MLSys 2024 Code
Prior to Yale
Memory-Efficient DNN Training on Mobile Devices
In Gim and JeongGil Ko
MobiSys 2022 Code

Education

Yale University
2022 - Present
Ph.D. in Computer Science
Yale University
2024
M.S. in Computer Science
Yonsei University
2021
B.S. in Integrated Technology

Work Experience

Apple
Summer 2025
AI/ML Research Intern

Teaching

Principles of Computer System Design
Fall 2024
Yale CPSC 429
Intro to Systems Programming
Spring 2024
Yale CPSC 323

Invited Talks

Rethinking LLM Serving From the Application's Perspective
Oct 2025
Seoul National University, Yonsei University, NAVER Cloud, Furiosa AI
LLM Prompt Caching
Sep 2024
NVIDIA