
Research

2026

LDA-1B: Scaling Latent Dynamics Action Model via Universal Embodied Data Ingestion

12 February 2026

Arxiv Github RSS (under submission)

Recent robot foundation models largely rely on large-scale behavior cloning, which imitates expert …

NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions

31 January 2026

Hao Dong‡

Instruction-following navigation is a key step toward embodied intelligence.

Neural Force Field: Few shot learning of generalized physical reasoning

26 January 2026

Ruihong Shen

Arxiv Github ICLR

We present NFF, a modeling framework built on NODE that learns interpretable force field …

MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

26 January 2026

Lingjun Chen

Michael Qizhe Shieh

Arxiv Github ICLR 2026

The MCP standardizes how LLMs interact with external systems, forming the foundation for general …

Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields

26 January 2026

Ruihong Shen

Arxiv Github ICLR

Predicting physical dynamics from visual data remains a fundamental challenge in AI, as it requires …

Generalized Threshold Optimization with Harmony Multi-Threshold Neurons for Accurate ANN-to-SNN Conversion

20 January 2026

Wenhan Zhang

Github AAAI-26 CCF A

Spiking Neural Networks (SNNs) are a promising paradigm designed to emulate the brain’s energy …

Luminark: Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models

In this paper, we introduce Luminark, a training-free and probabilistically-certified …

2025

CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model

8 November 2025

Arxiv AAAI CCF A

Existing vision-and-language navigation models often deviate from the correct trajectory when …

ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Shaofeng Yin

SimLauncher: Launching Sample-Efficient Real-world Robotic Reinforcement Learning via Simulation Pre-training

Arxiv IROS 2025

Autonomous learning of dexterous, long-horizon robotic skills has been a longstanding pursuit of …

Playing with Transformer at 30+ FPS via Next-Frame Diffusion

Arxiv NeurIPS 2025

In this work, we present Next-Frame Diffusion (NFD), an autoregressive diffusion transformer that …

Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation

Arxiv CVPR 2025

OmniPhysGS: 3D Constitutive Gaussians for General Physics-based Dynamics Generation

Arxiv ICLR 2025

ChemAgent: Self-updating Memories in Large Language Models Improves Chemical Reasoning

Wangchunshu Zhou

Zhuosheng Zhang

Arxiv ICLR 2025

We present ChemAgent, a novel framework designed to improve the performance of LLMs through a …

2024

ProgressGym: Alignment with a Millennium of Moral Progress

10 December 2024

Jasmine Xinze Li

Github NeurIPS 2024

Autonomous Character-Scene Interaction Synthesis from Text Instruction

3 December 2024

Siyuan Huang†

Arxiv SIGGRAPH Asia 2024

DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes

6 November 2024

Jialiang Zhang*

Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach

14 October 2024

Github IROS2024

MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion

29 September 2024

Github ECCV2024

Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection

29 September 2024

Shaofeng Yin

Github ECCV2024

Language Models Represent Beliefs of Self and Others

Zhining Zhang

Github ICML2024

Scaling up dynamic human-scene interaction modeling

Siyuan Huang†

Arxiv Github CVPR2024

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

Shaofeng Yin

Github CVPR2024

Diff-BGM: A Diffusion Model for Video Background Music Generation

Github CVPR2024