Research

2026

LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion
Jiangran Lyu*
Kai Liu*
Xuheng Zhang*
Haoran Liao
Yusen Feng
Wenxuan Zhu
Tingrui Shen
Jiayi Chen
Jiazhao Zhang
Yifei Dong
Wenbo Cui
Senmao Qi
Shuo Wang
Yixin Zheng
Mi Yan
Xuesong Shi
Haoran Li
Dongbin Zhao
Ming-Yu Liu
Zhizheng Zhang
Li Yi
Yizhou Wang
He Wang
Arxiv Github RSS (under review)
Recent robot foundation models largely rely on large-scale behavior cloning, which imitates expert …
NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions
Haolin Yang*
Yuxing Long*
Zhuoyuan Yu
Zihan Yang
Minghan Wang
Jiapeng Xu
Yihan Wang
Ziyan Yu
Wenzhe Cai
Lei Kang
Hao Dong‡
Arxiv ICRA
Instruction-following navigation is a key step toward embodied intelligence.
Neural Force Field: Few shot learning of generalized physical reasoning
Shiqian Li
Ruihong Shen
Yaoyu Tao
Chi Zhang
Yixin Zhu
Arxiv Github ICLR
We present NFF, a modeling framework built on NODE that learns interpretable force field …
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
Zijian Wu
Xiangyan Liu
Xinyuan Zhang
Lingjun Chen
Fanqing Meng
Lingxiao Du
Yiran Zhao
Fanshi Zhang
Yaoqi Ye
Jiawei Wang
Zirui Wang
Jinjie Ni
Yufan Yang
Arvin Xu
Michael Qizhe Shieh
Arxiv Github ICLR 2026
The MCP standardizes how LLMs interact with external systems, forming the foundation for general …
Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields
Shiqian Li
Ruihong Shen
Junfeng Ni
Chang Pan
Chi Zhang
Yixin Zhu
Arxiv Github ICLR
Predicting physical dynamics from visual data remains a fundamental challenge in AI, as it requires …
Generalized Threshold Optimization with Harmony Multi-Threshold Neurons for Accurate ANN-to-SNN Conversion
Wenhan Zhang
Zihan Huang
Tong Bu
Tiejun Huang
Zhaofei Yu
Github AAAI-26 CCF A
Spiking Neural Networks (SNNs) are a promising paradigm designed to emulate the brain’s energy …
Luminark: Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models
Jiayi Xu
Zhang Zhang
Yuanrui Zhang
Ruitao Chen
Yixian Xu
Tianyu He
Di He
Arxiv
In this paper, we introduce Luminark, a training-free and probabilistically-certified …

2025

CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model
Zhuoyuan Yu*
Yuxing Long*
Zihan Yang
Chengyan Zeng
Hongwei Fan
Jiyao Zhang
Hao Dong†
Arxiv AAAI CCF A
Existing vision-and-language navigation models often deviate from the correct trajectory when …
ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Shaofeng Yin
Ting Lei
Yang Liu†
ICCV 2025
SimLauncher: Launching Sample-Efficient Real-world Robotic Reinforcement Learning via Simulation Pre-training
Mingdong Wu
Lehong Wu
Yizhuo Wu
Weiyao Huang
Hongwei Fan
Zheyuan Hu
Haoran Geng
Jinzhou Li
Jiahe Ying
Long Yang
Yuanpei Chen
Hao Dong
Arxiv IROS 2025
Autonomous learning of dexterous, long-horizon robotic skills has been a longstanding pursuit of …
Playing with Transformer at 30+ FPS via Next-Frame Diffusion
Xinle Cheng
Tianyu He†
Jiayi Xu
Junliang Guo
Di He
Jiang Bian
Arxiv NeurIPS 2025
In this work, we present Next-Frame Diffusion (NFD), an autoregressive diffusion transformer that …
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation
Yiming Qin
Zhu Xu
Yang Liu†
Arxiv CVPR 2025
OmniPhysGS: 3D Constitutive Gaussians for General Physics-based Dynamics Generation
Yuchen Lin
Chenguo Lin†
Jianjin Xu
Yadong Mu‡
Arxiv ICLR 2025
ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning
Xiangru Tang*
Tianyu Hu*
Muyang Ye*
Yanjun Shao*
Xunjian Yin
Siru Ouyang
Wangchunshu Zhou
Pan Lu
Zhuosheng Zhang
Yilun Zhao
Arman Cohan
Mark Gerstein
Arxiv ICLR 2025
We present ChemAgent, a novel framework designed to improve the performance of LLMs through a …

2024

ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu*†
Yang Zhang*
Xuchuan Huang
Jasmine Xinze Li
Jiaming Ji
Yaodong Yang
Github NeurIPS 2024
Autonomous Character-Scene Interaction Synthesis from Text Instruction
Nan Jiang
Zimo He
Zi Wang
Hongjie Li
Yixin Chen
Siyuan Huang†
Yixin Zhu†
Arxiv SIGGRAPH Asia 2024
DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes
Jialiang Zhang*
Haoran Liu*
Danshi Li*
Xinqiang Yu*
Haoran Geng
Yufei Ding
Jiayi Chen
He Wang†
Arxiv CoRL 2024
Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach
Yufei Ding*
Haoran Geng*
Chaoyi Xu
Xiaomeng Fang
Jiazhao Zhang
Songlin Wei
Qiyu Dai
Zhizheng Zhang
He Wang†
Github IROS 2024
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
Lehong Wu
Lilang Lin
Jiahang Zhang
Yiyang Ma
Jiaying Liu†
Github ECCV 2024
Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
Ting Lei
Shaofeng Yin
Yuxin Peng
Yang Liu†
Github ECCV 2024
Language Models Represent Beliefs of Self and Others
Wentao Zhu
Zhining Zhang
Yizhou Wang
Github ICML 2024
Scaling up dynamic human-scene interaction modeling
Nan Jiang
Zhiyuan Zhang
Hongjie Li
Xiaoxuan Ma
Zan Wang
Yixin Chen
Tengyu Liu
Yixin Zhu†
Siyuan Huang†
Arxiv Github CVPR 2024
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
Ting Lei
Shaofeng Yin
Yang Liu†
Github CVPR 2024
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li
Yiming Qin
Minghang Zheng
Xin Jin
Yang Liu†
Github CVPR 2024