Zhehui Huang

I am a fourth-year Computer Science Ph.D. student at the University of Southern California (USC), advised by Prof. Gaurav Sukhatme.

Previously, I received a master's degree in Computer Science from USC. I completed my bachelor's degree in Computer Science at Harbin Institute of Technology.

News

[Dec 2024]	One paper got accepted to WMAC @ AAAI 2025.
[Dec 2024]	One paper got accepted to LM4Plan @ AAAI 2025.
[Sept 2024]	One paper got accepted to ISRR 2024.
[May 2024]	Started an internship at Tencent America.
[Mar 2024]	Received AI Research Grant from Cohere.
[Jan 2024]	Two papers got accepted to ICRA 2024. [Paper #1] and [Paper #2]
[Nov 2023]	Gave a talk at USC Robotics Seminar (URoS).
[Apr 2023]	One paper (QuadSwarm) got accepted to ICRA 2023 Workshop: The Role of Robotics Simulators for Unmanned Aerial Vehicles.
[Mar 2023]	Passed the qualifying exam.
[Dec 2022]	Received $70,000 AWS cloud credit for research.
[Sept 2022]	One paper presented at the Southern California Robotics Symposium (SCR).
[May 2022]	Interned at NVIDIA.
[Sept 2021]	One paper got accepted to CoRL 2021.
[Sept 2021]	Received $43,000 AWS cloud credit for research.
[Aug 2021]	Admitted to USC as a CS Ph.D. student.

Research

My goal is to build intelligent agents that can perform complex tasks robustly and safely in unstructured environments through autonomous, unsupervised adaptation to the environment. I mainly focus on reinforcement learning, robot learning, and foundation models.

Robotic Learning

At the moment, my main research problem is controlling quadrotor swarms to navigate complex environments with limited compute resources.

Conference

HyperPPO: A scalable method for finding small policies for robotic control
Shashank Hegde, Zhehui Huang, Gaurav Sukhatme
International Conference on Robotics and Automation (ICRA), 2024
webpage | pdf

Investigates the relationship between policy performance and model size.

Collision Avoidance and Navigation for a Quadrotor Swarm Using End-to-end Deep Reinforcement Learning
Zhehui Huang*, Zhaojing Yang*, Rahul Krupani, Baskın Şenbaşlar, Sumeet Batra, Gaurav Sukhatme
International Conference on Robotics and Automation (ICRA), 2024
* means equal contribution
webpage | pdf | code

Learns end-to-end deep reinforcement learning policies that control quadrotor swarms navigating obstacle-rich environments.

Decentralized Control of Quadrotor Swarms with End-to-end Deep Reinforcement Learning
Sumeet Batra*, Zhehui Huang*, Aleksei Petrenko*, Tushar Kumar, Artem Molchanov, Gaurav Sukhatme
Conference on Robot Learning (CoRL), 2021
Southern California Robotics Symposium (SCR), 2022
* means equal contribution
webpage | pdf | code

Trained policies with deep reinforcement learning in our self-built simulator, quad-swarm-rl, and deployed them on real-world quadrotors via sim-to-real transfer.

Workshop

QuadSwarm: A Modular Multi-Quadrotor Simulator for Deep Reinforcement Learning with Direct Thrust Control
Zhehui Huang, Sumeet Batra, Tao Chen, Rahul Krupani, Tushar Kumar, Artem Molchanov, Aleksei Petrenko, James Alan Preiss, Zhaojing Yang, Gaurav Sukhatme

ICRA Workshop: The Role of Robotics Simulators for Unmanned Aerial Vehicles, 2023
pdf | code

Deep Reinforcement Learning

At the moment, my main research problem is exploring on-policy algorithms that improve on PPO.

Conference

Guaranteed Trust Region Optimization via Two-Phase KL Penalization
K.R. Zentner*, Ujjwal Puri*, Zhehui Huang, Gaurav Sukhatme

In Submission
* means equal contribution
pdf

In this work, we show how FixPO combines the guarantees of trust region methods with the computational efficiency and rewards of proximal methods. FixPO enforces its trust region via KL penalization, which is flexible and well understood in the machine learning community. In future work, we would like to extend this approach to a multi-task setting.

Sample Factory: Egocentric 3D Control from Pixels at 100000 FPS with Asynchronous Reinforcement Learning
Aleksei Petrenko, Zhehui Huang, Tushar Kumar, Gaurav Sukhatme, Vladlen Koltun
International Conference on Machine Learning (ICML), 2020
webpage | pdf | code

Sample Factory is one of the fastest open-source single-machine RL implementations (see paper for details). If you plan to train RL agents on large amounts of experience, consider using it. Sample Factory can significantly speed up experimentation, or allow you to collect more samples in the same amount of time and achieve better performance.

Foundation Models

At the moment, my main research problem is investigating LLMs for multiple robots and LLMs for complex reasoning.

Conference

Can Large Language Models Solve Robot Routing?
Zhehui Huang, Guangyao Shi, Gaurav Sukhatme

International Symposium of Robotics Research (ISRR), 2024
Southern California Robotics Symposium (SCR), 2024

webpage | pdf | code | appendix

In this work, we investigate LLMs for vehicle routing problems. We construct a dataset that can be used as a benchmark for LLMs in vehicle routing problems. We propose a framework with self-debugging and self-verification. We show that GPT-4 outperforms Claude 3 Opus and Gemini 1.0 Pro.

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, Dong Yu

In Submission

pdf | code

Recently, many data science benchmarks have been proposed to evaluate how data science agents perform. However, existing benchmarks still fall short of real-world data science applications because of their simplified settings. To bridge this gap, we introduce DSBench, a comprehensive benchmark designed to evaluate data science agents with realistic tasks.

Effect of Adaptive Communication Support on Human-AI Collaboration
Shipeng Liu, Fnu Shrutika, Boshen Zhang, Zhehui Huang, Feifei Qian

Accepted to WMAC @ AAAI 2025

pdf

In this work, we investigated the effect of agent feedback frequency on team performance. As such, each agent was set to a constant activity level and could not dynamically adjust its level of support and language feedback throughout the task.

Benchmark Real-time Adaptation and Communication Capabilities of Embodied Agent in Collaborative Scenarios
Shipeng Liu, Boshen Zhang, Zhehui Huang

Accepted to LM4Plan @ AAAI 2025

pdf

In this paper, we focus on leveraging LLMs to give agents real-time adaptation capabilities.

In the acknowledgments list

Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots

pdf | code

Cognitive Kernel is an open-source agent system designed for building general-purpose autopilot systems. It has access to real-time and private information to finish real-world tasks. In this repo, we release both the system and the backbone model to encourage further research on LLM-driven autopilot systems. Following the guide, everyone should be able to deploy a private 'autopilot' system on their own machines.

Modified version of template from this and this.