Welcome!

I am currently a Ph.D. student at Rensselaer Polytechnic Institute (RPI) in Troy, NY, and a visiting student at Caltech. Broadly speaking, I have an interdisciplinary background and am interested in the interplay between incentives/rewards (economics), algorithms (computer science), and learning (statistics).

My current work investigates how foundation models, such as large language models, can be leveraged for sequential decision making, integrating ideas from reinforcement learning, test-time compute, and adaptive search techniques. I am particularly interested in how foundation models can enhance autonomous agents’ ability to plan, reason, learn, and generalize in complex environments through self-improvement and post-training adaptations.

I like collaborations! Reach out if you've got a cool problem you'd like to chat about.

"Know what you know and know what you do not know. That is true wisdom."
-- Confucius

In modern terms: know the known knowns, known unknowns, and unknown unknowns. I see this as a guiding principle for research and a crucial challenge in building truly intelligent machines.

In my spare time, I enjoy playing and designing board games, reading science fiction, composing electronic music, playing grand strategy games, fencing, and squash. I find well-designed games not only elegant but also a deep source of inspiration for research in planning and reasoning.

Education

Ph.D. (Sep 2023 - Present)
Rensselaer Polytechnic Institute (RPI), Troy, NY, U.S.
Ph.D. student in Computer Science
M.S. (Aug 2021 - Mar 2023)
University of Chicago, Chicago, IL, U.S.
M.S. in Financial Mathematics
B.S. (Aug 2017 - May 2021)
Reed College, Portland, OR, U.S.
B.S. in Mathematics and Economics

Internship Experience

Publications


Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

Jonathan Light*, Min Cai, Weiqin Chen, Guanzhi Wang, Xiusi Chen, Wei Cheng, Yisong Yue, Ziniu Hu
International Conference on Learning Representations (ICLR), 2025
Covered in the State of AI Report 2024, published by Air Street Capital

In this paper, we propose STRATEGIST, a new method that uses LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers high-quality feedback through self-play simulations with Monte Carlo tree search (MCTS) and LLM-based reflection, leading to more robust decision-making and better performance in games including the Game of Pure Strategy (GOPS) and The Resistance: Avalon.
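For intuition, here is a minimal sketch of the bi-level improvement loop described above. The function names and stubbed LLM/simulator calls are illustrative placeholders under my simplifying assumptions, not the paper's actual implementation:

```python
import random

# Hypothetical stand-ins for the LLM and simulator calls; the real system
# queries a language model and runs MCTS self-play games.
def llm_reflect(strategy: str, score: float) -> str:
    """Stub for LLM reflection on simulation outcomes."""
    return f"win rate {score:.2f}; strengthen the weakest openings"

def llm_improve(strategy: str, feedback: str) -> str:
    """Stub for the LLM proposing a revised textual strategy."""
    return strategy + f" [revised: {feedback}]"

def self_play_score(strategy: str, n_games: int = 20) -> float:
    """Stub for MCTS self-play evaluation; returns a noisy win rate."""
    return random.random()

def improve_strategy(seed: str, rounds: int = 5) -> str:
    """Bi-level loop: the outer level refines the strategy text, while
    the inner level scores each candidate with self-play simulations."""
    best, best_score = seed, self_play_score(seed)
    for _ in range(rounds):
        feedback = llm_reflect(best, best_score)
        candidate = llm_improve(best, feedback)
        score = self_play_score(candidate)
        if score > best_score:  # keep a candidate only if it improves
            best, best_score = candidate, score
    return best

print(improve_strategy("in GOPS, bid one above the revealed card's value"))
```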


Scattered Forest Search: Smarter Code Space Optimization Improves LLM Inference Scaling

Jonathan Light*, Yue Wu, Yiyou Sun, Wenchao Yu, Yanchi Liu, Xujiang Zhao, Ziniu Hu, Haifeng Chen, Wei Cheng
International Conference on Learning Representations (ICLR), 2025

We propose a novel approach to scaling LLM inference for code generation. By framing code generation as black-box optimization over the code space, we introduce Scattered Forest Search to enhance solution diversity during search. Experiments show significant performance gains on HumanEval, MBPP, APPS, CodeContests, and LeetCode.
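A rough sketch of the search pattern, under my own simplifying assumptions: diverse initial drafts seed a "forest," and each draft is then refined locally using test feedback. The stubs below stand in for real LLM calls and unit tests and are not the paper's code:

```python
import random

# Illustrative stubs; a real system would prompt an LLM for drafts and
# refinements, and would run the problem's actual unit tests.
def draft_solution(problem: str, direction: str) -> str:
    return f"# {problem} solved via {direction}"

def refine(code: str, error: str) -> str:
    return code + f"\n# patched after: {error}"

def run_tests(code: str) -> tuple[float, str]:
    score = random.random()  # stub: fraction of unit tests passed
    return score, "" if score > 0.9 else "AssertionError on hidden case"

def scattered_forest_search(problem: str, directions: list[str], steps: int = 3) -> str:
    # "Scattering" seeds the forest with semantically diverse drafts;
    # each tree then grows by refining its code on test feedback.
    best, best_score = "", -1.0
    for direction in directions:
        code = draft_solution(problem, direction)
        for _ in range(steps):
            score, error = run_tests(code)
            if score > best_score:
                best, best_score = code, score
            if not error:  # all tests pass; stop refining this tree
                break
            code = refine(code, error)
    return best

print(scattered_forest_search("two-sum", ["hash map", "two pointers", "sorting"]))
```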


PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making

Jonathan Light, Sixue Xing, Yuanzhe Liu, Weiqin Chen, Min Cai, Xiusi Chen, Guanzhi Wang, Wei Cheng, Yisong Yue, Ziniu Hu
Language Gamification Workshop @ NeurIPS, 2024

We propose PIANIST, a framework for decomposing the world model into intuitive components for zero-shot LLM generation in complex multi-agent decision-making tasks. Given only natural language descriptions of the game and input observations, our method can generate a working world model for fast and efficient MCTS simulation.
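To make the decomposition concrete, here is a minimal sketch loosely mirroring a POMDP: legal actions, transition, observation, reward, and a termination check, each of which an LLM could generate from a game description. The field names and toy game are my own illustration, not the paper's exact interface:

```python
import random
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class WorldModel:
    legal_actions: Callable[[Any, int], list]   # A(state, player)
    transition: Callable[[Any, Any], Any]       # T(state, action)
    observe: Callable[[Any, int], Any]          # O(state, player)
    reward: Callable[[Any, int], float]         # R(state, player)
    is_terminal: Callable[[Any], bool]

def rollout_value(model: WorldModel, state: Any, player: int, depth: int = 20) -> float:
    """One random rollout through the generated model, i.e. the kind of
    simulation an MCTS leaf evaluation performs."""
    for _ in range(depth):
        if model.is_terminal(state):
            break
        action = random.choice(model.legal_actions(state, player))
        state = model.transition(state, action)
    return model.reward(state, player)

# Toy instantiation: a single-player race-to-10 counting game.
toy = WorldModel(
    legal_actions=lambda s, p: [1, 2],
    transition=lambda s, a: s + a,
    observe=lambda s, p: s,
    reward=lambda s, p: float(s >= 10),
    is_terminal=lambda s: s >= 10,
)
print(rollout_value(toy, state=0, player=0))
```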


From Text to Tactic: Evaluating LLMs Playing the Game of Avalon

Jonathan Light*, Min Cai, Sheng Shen, Ziniu Hu
NeurIPS Foundation Models for Decision Making Workshop, 2023

In this paper, we explore the potential of LLM agents in playing the strategic social deduction game The Resistance: Avalon. We introduce AVALONBENCH, a comprehensive game environment for multi-agent LLMs. Our evaluations highlight a capability gap between current LLM agents and well-engineered baseline bots, revealing opportunities for improvement.


Dataset Distillation for Offline Reinforcement Learning

Jonathan Light*, Yuanzhe Liu, Ziniu Hu
ICML Data-centric Machine Learning Research Workshop, 2024

Offline reinforcement learning often requires a high-quality dataset for training. We propose using dataset distillation to synthesize a smaller, higher-quality dataset, which can then be used to train a better policy. Our experiments show that models trained on the distilled dataset achieve performance comparable to those trained on the full dataset.
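For a flavor of what distillation can look like, here is a minimal gradient-matching sketch on a toy behavior-cloning objective: the small synthetic dataset is optimized so that training on it produces gradients similar to training on the full dataset. The objective, architecture, and hyperparameters are illustrative assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy "offline dataset": states and continuous actions from a behavior policy.
obs_dim, act_dim, n_real, n_syn = 8, 2, 4096, 64
real_s = torch.randn(n_real, obs_dim)
real_a = torch.tanh(real_s @ torch.randn(obs_dim, act_dim))

# The synthetic dataset itself is the learnable object.
idx = torch.randperm(n_real)[:n_syn]
syn_s = real_s[idx].clone().requires_grad_(True)
syn_a = real_a[idx].clone().requires_grad_(True)
opt = torch.optim.Adam([syn_s, syn_a], lr=1e-2)

def bc_grads(policy, s, a, create_graph):
    """Gradients of a behavior-cloning loss w.r.t. policy parameters."""
    loss = F.mse_loss(policy(s), a)
    return torch.autograd.grad(loss, list(policy.parameters()), create_graph=create_graph)

for step in range(200):
    # Re-sample a fresh random policy each step so the synthetic data
    # matches gradients across many initializations, not just one.
    policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, act_dim))
    g_real = bc_grads(policy, real_s, real_a, create_graph=False)
    g_syn = bc_grads(policy, syn_s, syn_a, create_graph=True)
    match_loss = sum(F.mse_loss(gs, gr) for gs, gr in zip(g_syn, g_real))
    opt.zero_grad()
    match_loss.backward()
    opt.step()
```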

Academic Services

Teaching

Honors and Awards

Other Notes

I go by either Jonathan Li or Jonathan Light. I usually use Light in publications because (1) Li is a very common last name: without exception, every institution I've been to has had at least one other Jonathan Li; (2) Light is the semantic translation of my Chinese given name; and (3) Light nearly preserves the lexicographic ordering of Li.

I've also considered using 'Plum' (the semantic translation of my last name), but it doesn't have the same ring to it, nor does it preserve the lexicographic ordering of Li. Generally, I find semantic translations more faithful to the original meaning, as convenient as pinyin is for romanization.

Other Quotes and Historical Tidbits

I find quotes and historical tidbits fascinating and a great source of inspiration. Here are some of my favorites, collected over the years.