Welcome!
I am currently a Ph.D. student at Rensselaer Polytechnic Institute (RPI) in Troy, NY, and a visiting student at Caltech. Broadly speaking, I have an interdisciplinary background and am interested in the interplay between incentives/rewards (economics), algorithms (computer science), and learning (statistics).
My current work investigates how foundation models, such as large language models, can be leveraged for sequential decision making, integrating ideas from reinforcement learning, test-time compute, and adaptive search techniques. I am particularly interested in how foundation models can enhance autonomous agents’ ability to plan, reason, learn, and generalize in complex environments through self-improvement and post-training adaptations.
I like collaborations! Reach out if you've got a cool problem you'd like to chat about.
"Know what you know and know what you do not know. That is true wisdom."
-- Confucius
In modern terms: know the known knowns, known unknowns, and unknown unknowns. I see this as a guiding principle for research and a crucial challenge in building truly intelligent machines.
In my spare time, I enjoy playing and designing board games, reading science fiction, composing electronic music, playing grand strategy games, fencing, and squash. I find well-designed games to be not only elegant but also a deep source of inspiration for research in planning and reasoning.
Education
Rensselaer Polytechnic Institute (RPI), Troy, NY, U.S.
Ph.D. student in Computer Science

University of Chicago, Chicago, IL, U.S.
M.S. in Financial Mathematics

Reed College, Portland, OR, U.S.
B.S. in Mathematics and Economics

Internship Experience
- Research Intern, NEC Laboratories America, Princeton, NJ, May 2024 -- Jan 2025
Publications

Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search
Jonathan Light*, Min Cai, Weiqin Chen, Guanzhi Wang, Xiusi Chen, Wei Cheng, Yisong Yue, Ziniu Hu
International Conference on Learning Representations (ICLR), 2025
Covered by the State of AI Report 2024, published by Air Street Capital
In this paper, we propose STRATEGIST, a method that uses LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers high-quality feedback through self-play simulations with Monte Carlo tree search (MCTS) and LLM-based reflection, leading to more robust decision-making and better performance in games including the Game of Pure Strategy (GOPS) and The Resistance: Avalon.
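For readers who want a feel for the search backbone, here is a minimal, generic UCT-style MCTS loop. This is my own illustrative sketch, not the STRATEGIST implementation; the `game` interface (`legal_moves`, `step`, `is_terminal`, `reward`) is an assumption.

```python
import math
import random

# Minimal UCT-style Monte Carlo tree search over a generic turn-based
# game. Illustrative toy only; the `game` interface is assumed.

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.value = 0.0

def uct_child(node, c=1.4):
    # UCB1: trade off mean value (exploitation) against uncertainty.
    return max(
        node.children.values(),
        key=lambda ch: ch.value / (ch.visits + 1e-9)
        + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
    )

def mcts(game, root_state, n_sims=1000):
    root = Node(root_state)
    for _ in range(n_sims):
        node = root
        # Selection/expansion: descend, adding one new child per simulation.
        while not game.is_terminal(node.state):
            untried = [m for m in game.legal_moves(node.state)
                       if m not in node.children]
            if untried:
                move = random.choice(untried)
                node.children[move] = Node(game.step(node.state, move), node)
                node = node.children[move]
                break
            node = uct_child(node)
        # Simulation: random playout to a terminal state.
        state = node.state
        while not game.is_terminal(state):
            state = game.step(state, random.choice(game.legal_moves(state)))
        # Backpropagation.
        reward = game.reward(state)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Act with the most-visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)
```

In the paper, the uniform random playout is roughly where self-play and LLM-based reflection come in, providing the feedback that drives skill improvement.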

Scattered Forest Search: Smarter Code Space Optimization Improves LLM Inference Scaling
Jonathan Light*, Yue Wu, Yiyou Sun, Wenchao Yu, Yanchi Liu, Xujiang Zhao, Ziniu Hu, Haifeng Chen, Wei Cheng
International Conference on Learning Representations (ICLR), 2025
We propose a novel approach to scaling LLM inference for code generation. By framing code generation as a black-box optimization problem within code space, we introduce Scattered Forest Search to enhance diversity. Experiments show significant performance gains on HumanEval, MBPP, APPS, CodeContests, and Leetcode.
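As a hedged sketch of the idea (not the paper's algorithm or API), code generation can be framed as scatter-then-refine search over candidate programs; the `generate` and `score` callbacks below are hypothetical stand-ins for an LLM call and a unit-test harness.

```python
from typing import Callable

def scattered_search(
    problem: str,
    generate: Callable[[str, float], str],  # (prompt, temperature) -> code
    score: Callable[[str], float],          # code -> test pass rate in [0, 1]
    n_seeds: int = 8,
    n_rounds: int = 3,
    beam: int = 3,
) -> str:
    # Scatter: high-temperature sampling gives diverse starting points.
    pool = [generate(f"Solve:\n{problem}", 1.0) for _ in range(n_seeds)]
    for _ in range(n_rounds):
        ranked = sorted(pool, key=score, reverse=True)
        if score(ranked[0]) == 1.0:
            return ranked[0]  # all tests pass
        # Refine: branch off the top candidates with revision prompts,
        # keeping several branches alive so the search stays diverse.
        pool = ranked[:beam] + [
            generate(f"Solve:\n{problem}\nImprove this draft:\n{c}", 0.8)
            for c in ranked[:beam] for _ in range(2)
        ]
    return max(pool, key=score)
```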

PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making
Jonathan Light, Sixue Xing, Yuanzhe Liu, Weiqin Chen, Min Cai, Xiusi Chen, Guanzhi Wang, Wei Cheng, Yisong Yue, Ziniu Hu
Language Gamification Workshop @ NeurIPS, 2024
We propose PIANIST, a framework for decomposing the world model into intuitive components for zero-shot LLM generation in complex multi-agent decision-making tasks. Given only natural language descriptions of the game and input observations, our method can generate a working world model for fast and efficient MCTS simulation.
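To make the decomposition concrete, here is one way the components of a partially observable world model could be laid out in code. The field names are my own labels for the standard POMDP pieces, not PIANIST's exact interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass
class WorldModel:
    # Each component could be generated as code by an LLM from a
    # natural-language description of the game (illustrative sketch).
    initial_state: Callable[[], Any]
    legal_actions: Callable[[Any], Sequence[Any]]  # state -> actions
    transition: Callable[[Any, Any], Any]          # (state, action) -> state
    observation: Callable[[Any, int], Any]         # (state, player) -> obs
    reward: Callable[[Any, int], float]            # (state, player) -> payoff
    is_terminal: Callable[[Any], bool]

    def simulate(self, state, policy, player: int) -> float:
        # Roll the model forward under a policy; this is the primitive
        # an MCTS playout would call repeatedly.
        while not self.is_terminal(state):
            action = policy(self.observation(state, player))
            state = self.transition(state, action)
        return self.reward(state, player)
```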

From Text to Tactic: Evaluating LLMs Playing the Game of Avalon
Jonathan Light*, Min Cai, Sheng Shen, Ziniu Hu
NeurIPS Foundation Models for Decision Making Workshop, 2023
In this paper, we explore the potential of LLM agents in playing the strategic social deduction game The Resistance: Avalon. We introduce AVALONBENCH, a comprehensive game environment for evaluating multi-agent LLMs. Our evaluations highlight a capability gap between current LLM agents and well-engineered baseline bots, revealing clear opportunities for improvement.

Dataset Distillation for Offline Reinforcement Learning
Jonathan Light*, Yuanzhe Liu, Ziniu Hu
ICML Data-centric Machine Learning Research Workshop, 2024
Offline reinforcement learning often requires a high-quality dataset for training. We propose dataset distillation to synthesize a smaller, higher-quality dataset and use it to train a better policy. Our experiments show that models trained on the distilled dataset achieve performance comparable to models trained on the full dataset.
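As a rough illustration of the mechanics (my own single-model, behavior-cloning simplification with continuous actions, not necessarily the paper's exact objective), here is a gradient-matching distillation loop in PyTorch: a small synthetic set of (state, action) pairs is optimized so that training gradients on it match gradients on the real data.

```python
import torch
import torch.nn as nn

def distill(real_states, real_actions, policy, n_syn=128, steps=1000, lr=0.01):
    # Learnable synthetic dataset, initialized at random.
    syn_s = torch.randn(n_syn, real_states.shape[1], requires_grad=True)
    syn_a = torch.randn(n_syn, real_actions.shape[1], requires_grad=True)
    opt = torch.optim.Adam([syn_s, syn_a], lr=lr)
    loss_fn = nn.MSELoss()
    params = [p for p in policy.parameters() if p.requires_grad]

    for _ in range(steps):
        # Gradient of the training loss on a real minibatch...
        idx = torch.randint(0, real_states.shape[0], (256,))
        g_real = torch.autograd.grad(
            loss_fn(policy(real_states[idx]), real_actions[idx]), params)
        # ...and on the synthetic set, kept differentiable w.r.t. the data.
        g_syn = torch.autograd.grad(
            loss_fn(policy(syn_s), syn_a), params, create_graph=True)
        # Push the synthetic-data gradients toward the real-data gradients.
        match = sum(((gs - gr.detach()) ** 2).sum()
                    for gs, gr in zip(g_syn, g_real))
        opt.zero_grad()
        match.backward()
        opt.step()
    return syn_s.detach(), syn_a.detach()
```

Any differentiable policy works here, e.g. a small MLP mapping states to actions; the distilled pairs can then serve as a compact training set.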
Academic Services
- Journal/Conference Reviewer: ICLR 2025
Teaching
- Teaching Assistant, AI and Blockchain, Dacheng Xiu. Booth EMBA, 2023 Summer
- Teaching Assistant, Options Pricing, Roger Lee. UChicago PSD, 2023 Spring
- Teaching Assistant, Bayesian Statistical Inference and ML, Gordan Ritter. UChicago PSD, 2023 Spring
- Teaching Assistant, Decoding Fintech, Dacheng Xiu. Booth, 2023 Winter
- Teaching Assistant, Mathematical Statistics, Jonathan Wells. Reed College, 2021 Spring
- Teaching Assistant, Probability Theory, Jonathan Wells. Reed College, 2020 Fall
- Teaching Assistant, Macroeconomics, Zhe (Jasmine) Jiang. Reed College, 2020 Fall
- Teaching Assistant, Econometrics, Fellipe Carrera. Reed College, 2020 Fall
- Teaching Assistant, Introduction to Analysis, David Krumm. Reed College, 2019 Fall
Honors and Awards
- Phi Beta Kappa, 2021
- Reed Commendation for Excellence in Scholarship, 2018, 2019, 2020, 2021
- Reed Science Research Fellow, 2020
- Reed Financial Services Fellow, 2019
Other Notes
I go by either Jonathan Li or Jonathan Light. I usually use Light in publications because (1) Li is a very common last name: without exception, every institution I've been to has had at least one other Jonathan Li; (2) Light is the semantic translation of my Chinese given name; and (3) Light nearly preserves the lexicographic ordering of Li.
I've also considered using 'Plum' (the semantic translation of my last name), but it doesn't have the same ring to it, nor does it preserve the lexicographic ordering of Li. Generally, I find semantic translations more faithful to the original meaning, convenient as pinyin is for romanization.
Other Quotes and Historical Tidbits
I find quotes and historical tidbits fascinating and a great source of inspiration. Here are some of my favorites, collected over the years.
- “Ask yourself whether you are happy, and you cease to be so.” -- John Stuart Mill. It says something about opportunity cost and the paradox of choice.
- “The best is the enemy of the good.” -- Voltaire. This principle shows up again and again in optimization, approximation, and machine learning.
- “Stay hungry. Stay foolish.” -- Steve Jobs. It's good to be foolish; then you can ask any question you want.