Latest Papers

By default, results with the most recent publication date in the database are shown.

vision-language-action VLA

Chain of World: World Model Thinking in Latent Motion

Vision-Language-Action (VLA) models are a promising path toward embodied intelligence, yet they often overlook the predictive and temporal-causal structure underlying visual dynamics. World-model VLAs address this by predicting future fram…

vision-language-action vision-language model

Utonia: Toward One Encoder for All Point Clouds

We dream of a future where point clouds from all domains can come together to shape a single model that benefits them all. Toward this goal, we present Utonia, a first step toward training a single self-supervised point transformer encoder…