
Zhehui Huang

I am a fourth-year Computer Science Ph.D. student at the University of Southern California (USC), advised by Prof. Gaurav Sukhatme.

Previously, I received a master's degree in Computer Science from USC and completed my bachelor's degree in Computer Science at Harbin Institute of Technology.

Email  |  πŸ“„ CV  |  πŸ“ Bio  |  πŸŽ“ Google Scholar
Twitter  |  Github  |  Linkedin  |  Blog


News

[July 2025] DSBench was selected as an evaluation benchmark for OpenAI's most advanced model, o3, and their first agent, ChatGPT Agent, to evaluate their reasoning and coding abilities.
[June 2025] Co-organized the Resource Constrained Robotics Workshop at RSS 2025.
[Jan 2025] DSBench was accepted to ICLR 2025.
[Dec 2024] HRT-ML was accepted to WMAC @ AAAI 2025.
[Dec 2024] MonTA was accepted to LM4Plan @ AAAI 2025.
[Sept 2024] LLMs for Robot Routing was accepted to ISRR 2024.
[May 2024] Started internship at Tencent America.
[Mar 2024] Received AI Research Grant from Cohere.
[Jan 2024] Two papers were accepted to ICRA 2024: [Paper #1] and [Paper #2]
[Nov 2023] Gave a talk at the USC Robotics Seminar (URoS).
[Apr 2023] QuadSwarm was accepted to the ICRA 2023 Workshop: The Role of Robotics Simulators for Unmanned Aerial Vehicles.
[Mar 2023] Passed my qualifying exam.
[Dec 2022] Received $70,000 in AWS cloud credits for research.
[May 2022] Started internship at NVIDIA.
[Sept 2021] Decentralized Control of Quadrotor Swarms was accepted to CoRL 2021.
[Sept 2021] Received $43,000 in AWS cloud credits for research.
[Aug 2021] Started my Ph.D. at USC.

Research

My research aims to develop intelligent agents that can robustly and safely perform complex tasks in unstructured environments by autonomously adapting to new situations through unsupervised and continual learning. To achieve this, my research spans three interconnected areas: (1) reinforcement learning, (2) robot learning, and (3) foundation models.

Reinforcement Learning (RL):

Robot Learning:

Foundation Models:

Publications

Compositional Coordination for Multi-Robot Teams with Large Language Models

In submission

TL;DR: A novel framework for multi-robot coordination using large language models to enable compositional coordination strategies for complex multi-robot tasks.

Guaranteed Trust Region Optimization via Two-Phase KL Penalization

TL;DR: FixPO combines the guarantees of trust region methods with the computational efficiency of proximal methods, enforcing trust regions via flexible KL penalization.

Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning

TL;DR: The fastest open-source single-machine RL implementation, reaching 100,000 FPS to significantly accelerate experimentation and improve performance through massive sample collection.

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

TL;DR: Comprehensive benchmark for evaluating data science agents with realistic tasks, bridging the gap between simplified settings and real-world data science applications.

πŸ† Selected as evaluation benchmark for OpenAI's o3 model and ChatGPT Agent. Source: OpenAI Blog

Effect of Adaptive Communication Support on Human-AI Collaboration

TL;DR: Investigates how agent feedback frequency affects team performance in human-AI collaborative scenarios, with a focus on optimizing communication support.

Benchmarking Real-time Adaptation and Communication Capabilities of Embodied Agents in Collaborative Scenarios

TL;DR: Focuses on leveraging LLMs to enable agents with real-time adaptation capabilities in collaborative scenarios, establishing new benchmarks for embodied AI.



Modified version of template from this and this.