Welcome!
I am currently a Ph.D. student at Rensselaer Polytechnic Institute (RPI) in Troy, NY, and a visiting student at Caltech. Broadly speaking, I have an interdisciplinary background and am interested in the interplay between incentives/rewards (economics), algorithms (computer science), and learning (statistics).
My current work investigates how foundation models, such as large language models, can be leveraged for sequential decision making, integrating ideas from reinforcement learning, test-time compute, and adaptive search techniques. I am particularly interested in how foundation models can enhance autonomous agents’ ability to plan, reason, learn, and generalize in complex environments through self-improvement and post-training adaptations.
I like collaborations! Reach out if you've got a cool problem you'd like to chat about.
"Know what you know and know what you do not know. That is true wisdom."
-- Confucius
In modern terms: know the known knowns, known unknowns, and unknown unknowns. I see this as a guiding principle for research and a crucial challenge in building truly intelligent machines.
In my spare time, I enjoy playing and designing board games, reading science fiction, composing electronic music, playing grand strategy games, fencing, and squash. I find well-designed games to be not only elegant but also a deep source of inspiration for research in planning and reasoning.
Education
Rensselaer Polytechnic Institute (RPI), Troy, NY, U.S.
Ph.D. student in Computer Science

University of Chicago, Chicago, IL, U.S.
M.S. in Financial Mathematics

Reed College, Portland, OR, U.S.
B.S. in Mathematics and Economics

Internship Experience
- Research Intern, NEC Laboratories America, Princeton, NJ, May 2024 -- Jan 2025
Publications

Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search
Jonathan Light*, Min Cai, Weiqin Chen, Guanzhi Wang, Xiusi Chen, Wei Cheng, Yisong Yue, Ziniu Hu
International Conference on Learning Representations (ICLR), 2025
Covered by the State of AI Report 2024, published by Air Street Capital
In this paper, we propose STRATEGIST, a method that uses LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers high-quality feedback through self-play simulations with Monte Carlo tree search (MCTS) and LLM-based reflection, leading to more robust decision-making and better performance in games including the Game of Pure Strategy (GOPS) and The Resistance: Avalon.
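For readers who want a feel for the search backbone, here is a minimal, generic UCT-style MCTS loop. This is my own illustrative sketch, not the STRATEGIST implementation; the `game` interface (`legal_moves`, `step`, `is_terminal`, `reward`) is an assumption.

```python
import math
import random

# Minimal UCT-style Monte Carlo tree search over a generic turn-based
# game. Illustrative toy only; the `game` interface is assumed.

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.value = 0.0

def uct_child(node, c=1.4):
    # UCB1: trade off mean value (exploitation) against uncertainty.
    return max(
        node.children.values(),
        key=lambda ch: ch.value / (ch.visits + 1e-9)
        + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
    )

def mcts(game, root_state, n_sims=1000):
    root = Node(root_state)
    for _ in range(n_sims):
        node = root
        # Selection/expansion: descend, adding one new child per simulation.
        while not game.is_terminal(node.state):
            untried = [m for m in game.legal_moves(node.state)
                       if m not in node.children]
            if untried:
                move = random.choice(untried)
                node.children[move] = Node(game.step(node.state, move), node)
                node = node.children[move]
                break
            node = uct_child(node)
        # Simulation: random playout to a terminal state.
        state = node.state
        while not game.is_terminal(state):
            state = game.step(state, random.choice(game.legal_moves(state)))
        # Backpropagation.
        reward = game.reward(state)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Act with the most-visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)
```

In the paper, the uniform random playout is roughly where self-play and LLM-based reflection come in, providing the feedback that drives skill improvement.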

Scattered Forest Search: Smarter Code Space Optimization Improves LLM Inference Scaling
Jonathan Light*, Yue Wu, Yiyou Sun, Wenchao Yu, Yanchi Liu, Xujiang Zhao, Ziniu Hu, Haifeng Chen, Wei Cheng
International Conference on Learning Representations (ICLR), 2025
We propose a novel approach to scaling LLM inference for code generation. By framing code generation as a black-box optimization problem within code space, we introduce Scattered Forest Search to enhance diversity. Experiments show significant performance gains on HumanEval, MBPP, APPS, CodeContests, and Leetcode.
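As a hedged sketch of the idea (not the paper's algorithm or API), code generation can be framed as scatter-then-refine search over candidate programs; the `generate` and `score` callbacks below are hypothetical stand-ins for an LLM call and a unit-test harness.

```python
from typing import Callable

def scattered_search(
    problem: str,
    generate: Callable[[str, float], str],  # (prompt, temperature) -> code
    score: Callable[[str], float],          # code -> test pass rate in [0, 1]
    n_seeds: int = 8,
    n_rounds: int = 3,
    beam: int = 3,
) -> str:
    # Scatter: high-temperature sampling gives diverse starting points.
    pool = [generate(f"Solve:\n{problem}", 1.0) for _ in range(n_seeds)]
    for _ in range(n_rounds):
        ranked = sorted(pool, key=score, reverse=True)
        if score(ranked[0]) == 1.0:
            return ranked[0]  # all tests pass
        # Refine: branch off the top candidates with revision prompts,
        # keeping several branches alive so the search stays diverse.
        pool = ranked[:beam] + [
            generate(f"Solve:\n{problem}\nImprove this draft:\n{c}", 0.8)
            for c in ranked[:beam] for _ in range(2)
        ]
    return max(pool, key=score)
```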

PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making
Jonathan Light, Sixue Xing, Yuanzhe Liu, Weiqin Chen, Min Cai, Xiusi Chen, Guanzhi Wang, Wei Cheng, Yisong Yue, Ziniu Hu
Language Gamification Workshop @ NeurIPS, 2024
We propose PIANIST, a framework for decomposing the world model into intuitive components for zero-shot LLM generation in complex multi-agent decision-making tasks. Given only natural language descriptions of the game and input observations, our method can generate a working world model for fast and efficient MCTS simulation.
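To make the decomposition concrete, here is one way the components of a partially observable world model could be laid out in code. The field names are my own labels for the standard POMDP pieces, not PIANIST's exact interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass
class WorldModel:
    # Each component could be generated as code by an LLM from a
    # natural-language description of the game (illustrative sketch).
    initial_state: Callable[[], Any]
    legal_actions: Callable[[Any], Sequence[Any]]  # state -> actions
    transition: Callable[[Any, Any], Any]          # (state, action) -> state
    observation: Callable[[Any, int], Any]         # (state, player) -> obs
    reward: Callable[[Any, int], float]            # (state, player) -> payoff
    is_terminal: Callable[[Any], bool]

    def simulate(self, state, policy, player: int) -> float:
        # Roll the model forward under a policy; this is the primitive
        # an MCTS playout would call repeatedly.
        while not self.is_terminal(state):
            action = policy(self.observation(state, player))
            state = self.transition(state, action)
        return self.reward(state, player)
```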

From Text to Tactic: Evaluating LLMs Playing the Game of Avalon
Jonathan Light*, Min Cai, Sheng Shen, Ziniu Hu
NeurIPS Foundation Models for Decision Making Workshop, 2023
In this paper, we explore the potential of LLM agents in playing the strategic social deduction game The Resistance: Avalon. We introduce AVALONBENCH, a comprehensive game environment for evaluating multi-agent LLMs. Our evaluations highlight a capability gap between current LLM agents and well-engineered baseline bots, revealing clear opportunities for improvement.

Dataset Distillation for Offline Reinforcement Learning
Jonathan Light*, Yuanzhe Liu, Ziniu Hu
ICML Data-centric Machine Learning Research Workshop, 2024
Offline reinforcement learning often requires a high-quality dataset for training. We propose dataset distillation to synthesize a smaller, higher-quality dataset and use it to train a better policy. Our experiments show that models trained on the distilled dataset achieve performance comparable to models trained on the full dataset.
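As a rough illustration of the mechanics (my own single-model, behavior-cloning simplification with continuous actions, not necessarily the paper's exact objective), here is a gradient-matching distillation loop in PyTorch: a small synthetic set of (state, action) pairs is optimized so that training gradients on it match gradients on the real data.

```python
import torch
import torch.nn as nn

def distill(real_states, real_actions, policy, n_syn=128, steps=1000, lr=0.01):
    # Learnable synthetic dataset, initialized at random.
    syn_s = torch.randn(n_syn, real_states.shape[1], requires_grad=True)
    syn_a = torch.randn(n_syn, real_actions.shape[1], requires_grad=True)
    opt = torch.optim.Adam([syn_s, syn_a], lr=lr)
    loss_fn = nn.MSELoss()
    params = [p for p in policy.parameters() if p.requires_grad]

    for _ in range(steps):
        # Gradient of the training loss on a real minibatch...
        idx = torch.randint(0, real_states.shape[0], (256,))
        g_real = torch.autograd.grad(
            loss_fn(policy(real_states[idx]), real_actions[idx]), params)
        # ...and on the synthetic set, kept differentiable w.r.t. the data.
        g_syn = torch.autograd.grad(
            loss_fn(policy(syn_s), syn_a), params, create_graph=True)
        # Push the synthetic-data gradients toward the real-data gradients.
        match = sum(((gs - gr.detach()) ** 2).sum()
                    for gs, gr in zip(g_syn, g_real))
        opt.zero_grad()
        match.backward()
        opt.step()
    return syn_s.detach(), syn_a.detach()
```

Any differentiable policy works here, e.g. a small MLP mapping states to actions; the distilled pairs can then serve as a compact training set.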
Academic Services
- Journal/Conference Reviewer: ICLR 2025
Teaching
- Teaching Assistant, AI and Blockchain, Dacheng Xiu. Booth EMBA, 2023 Summer
- Teaching Assistant, Options Pricing, Roger Lee. UChicago PSD, 2023 Spring
- Teaching Assistant, Bayesian Statistical Inference and ML, Gordan Ritter. UChicago PSD, 2023 Spring
- Teaching Assistant, Decoding Fintech, Dacheng Xiu. Booth, 2023 Winter
- Teaching Assistant, Mathematical Statistics, Jonathan Wells. Reed College, 2021 Spring
- Teaching Assistant, Probability Theory, Jonathan Wells. Reed College, 2020 Fall
- Teaching Assistant, Macroeconomics, Zhe (Jasmine) Jiang. Reed College, 2020 Fall
- Teaching Assistant, Econometrics, Fellipe Carrera. Reed College, 2020 Fall
- Teaching Assistant, Introduction to Analysis, David Krumm. Reed College, 2019 Fall
Honors and Awards
- Phi Beta Kappa, 2021
- Reed Commendation for Excellence in Scholarship, 2018, 2019, 2020, 2021
- Reed Science Research Fellow, 2020
- Reed Financial Services Fellow, 2019
Other Notes
I go by either Jonathan Li or Jonathan Light. I usually use Light in publications because (1) Li is a very common last name: without exception, every institution I've been to has had at least one other Jonathan Li; (2) Light is the semantic translation of my Chinese given name; and (3) Light nearly preserves the lexicographic ordering of Li.
I've also considered using 'Plum' (the semantic translation of my last name), but it doesn't have the same ring to it, nor does it preserve the lexicographic ordering of Li. Generally, I find semantic translations more faithful to the original meaning, convenient as pinyin is for romanization.
Other Quotes and Historical Tidbits
I find quotes and historical tidbits fascinating and a great source of inspiration. Here are some of my favorites, collected over the years.
- “Ask yourself whether you are happy, and you cease to be so.” -- John Stuart Mill. It says something about opportunity cost and the paradox of choice.
- “The best is the enemy of the good.” -- Voltaire. This principle shows up again and again in optimization, approximation, and machine learning.
- “Stay hungry. Stay foolish.” -- Steve Jobs. It's good to be foolish; then you can ask any question you want.