Publications and preprints
Cacheback: Speculative Decoding With Nothing But Cache
Zhiyao Ma†, In Gim†, and Lin Zhong. To appear in EMNLP 2025 (Main). †Equal contributors
Pie: A Programmable Serving System for Emerging LLM Applications
In Gim, Zhiyao Ma, Seung-Seob Lee, and Lin Zhong. To appear in SOSP 2025
Serve Programs, Not Prompts
In Gim and Lin Zhong. HotOS 2025, May 2025 (pdf)
Wiretapping LLMs: Network Side-Channel Attacks on Interactive LLM Services
Mahdi Soleimani, Grace Jia, In Gim, Seung-Seob Lee, and Anurag Khandelwal. Preprint, February 2025 (pdf)
Asynchronous LLM Function Calling
In Gim, Seung-Seob Lee, and Lin Zhong. Preprint, December 2024 (pdf)
Confidential Prompting: Protecting User Prompts from Cloud LLM Providers
In Gim†, Caihua Li†, and Lin Zhong. Preprint, September 2024 (code, pdf). †Equal contributors
Prompt Cache: Modular Attention Reuse for Low-Latency Inference
In Gim, Guojun Chen, Seung-Seob Lee, Nikhil Sarda, Anurag Khandelwal, and Lin Zhong. MLSys 2024, May 2024 (code, pdf)
Memory-Efficient DNN Training on Mobile Devices
In Gim and JeongGil Ko. MobiSys 2022, July 2022 (code, pdf)
Fast Monte-Carlo Approximation of the Attention Mechanism
In Gim and JeongGil Ko. AAAI 2022, February 2022 (code, pdf)
Education
Yale University, 2022 - Present
Ph.D. in Computer Science
Yale University, 2024
M.S. in Computer Science
Yonsei University, 2021
B.S. in Integrated Technology (focus: Computer Science)
Work experience
Apple, Summer 2025
AI/ML Research Intern
Teaching
Fall 2024 — Principles of Computer System Design (CPSC 429), Teaching Fellow
Spring 2024 — Introduction to Systems Programming and Computer Organization (CPSC 323), Teaching Fellow