I am a third-year Computer Science Ph.D. student at the University of Southern California (USC), advised by Prof. Gaurav Sukhatme.
Previously, I received a master's degree in Computer Science from USC and a bachelor's degree in Computer Science from Harbin Institute of Technology.
My goal is to build intelligent agents that can perform complex tasks robustly and safely in
unstructured environments by adapting to them autonomously and without supervision. I mainly focus on
reinforcement learning, robot learning, and foundation models.
Robotic Learning
At the moment, my main research problem is controlling quadrotor swarms so that they can navigate complex environments with limited compute resources.
We train policies with deep reinforcement learning in our self-built simulator, quad-swarm-rl, and use sim-to-real transfer to deploy them on real-world quadrotors.
In this work, we show that FixPO combines the guarantees of trust region methods with the computational
efficiency and rewards of proximal methods. FixPO enforces its trust region via KL penalization, which is flexible and
well understood in the machine learning community. In future work, we would like to extend FixPO to a multi-task setting.
Sample Factory is one of the fastest open-source single-machine RL implementations (see paper for details). If you plan to train RL agents on large amounts of experience, consider using it. Sample Factory can significantly speed up experimentation, or let you collect more samples in the same amount of time and achieve better performance.
Foundation Models
In this area, I am currently investigating LLMs for multi-robot systems and LLMs for complex reasoning.
In this work, we investigate LLMs for vehicle routing problems. We construct a dataset that can be used as a benchmark
for LLMs on vehicle routing problems, and we propose a framework with self-debugging and self-verification. We show that GPT-4 outperforms Claude 3 Opus and Gemini 1.0 Pro.
Recently, many data science benchmarks have been proposed to investigate how LLMs perform in the data science domain. However, existing benchmarks still fall short of real-world data science applications because of their simplified settings. To bridge this gap, we introduce DSBench, a comprehensive benchmark designed to evaluate data science agents on realistic tasks.
Cognitive Kernel is an open-source agent system designed as a step toward general-purpose autopilot systems. It has access to real-time and private information to complete real-world tasks.
In this repo, we release both the system and the backbone model to encourage further research on LLM-driven autopilot systems. Following the guide, everyone should be able to deploy a private 'autopilot' system on their own machines.