Startup World

Navigation is a fundamental skill for any organism with vision, serving as a critical tool for survival.
It enables agents to locate resources, find shelter, and avoid threats.
In humans, navigation often involves mentally simulating possible future paths while accounting for constraints and alternative possibilities.
However, modern robotic navigation systems are far less flexible.
Current state-of-the-art navigation policies are typically hard-coded, meaning once training is complete, introducing new constraints is difficult.
Furthermore, existing supervised visual navigation models struggle to allocate additional computational resources when facing more complex navigation tasks.

To address these issues, in a new paper, Navigation World Models, a research team from Meta, New York University and Berkeley AI Research proposes the Navigation World Model (NWM), a controllable video generation model designed to predict future visual observations based on past observations and navigation actions.
This model enables agents to simulate potential navigation plans and assess their feasibility before taking action.

NWM is trained using a large dataset of video footage and navigation actions collected from various robotic agents.
The model learns to predict the future representations of video frames, given the representations of past frames and corresponding navigation actions.
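The predict-then-assess idea can be illustrated with a minimal sketch. It is not NWM's implementation: a 2D position stands in for a frame representation, `toy_world_model` is a hypothetical hand-written stand-in for the learned predictor, and exhaustive enumeration of action sequences is assumed purely for simplicity (the article does not specify the search procedure).

```python
import itertools
import math

# Hypothetical action set: each action is a (dx, dy) displacement.
ACTIONS = {"forward": (0.0, 1.0), "left": (-1.0, 0.0), "right": (1.0, 0.0)}


def toy_world_model(context, action):
    """Stand-in for a learned predictor: next state from past states and an action.

    A real model would condition on several past frames; this toy only
    uses the most recent state in `context`.
    """
    x, y = context[-1]
    dx, dy = ACTIONS[action]
    return (x + dx, y + dy)


def rollout(model, start, actions):
    """Autoregressive rollout: each prediction is appended to the context."""
    states = [start]
    for action in actions:
        states.append(model(states, action))
    return states


def plan(model, start, goal, horizon):
    """Simulate every action sequence of length `horizon` and keep the one
    whose simulated final state is closest to the goal."""
    best_actions, best_cost = None, math.inf
    for candidate in itertools.product(ACTIONS, repeat=horizon):
        final_state = rollout(model, start, candidate)[-1]
        cost = math.dist(final_state, goal)
        if cost < best_cost:
            best_actions, best_cost = candidate, cost
    return best_actions, best_cost


if __name__ == "__main__":
    actions, cost = plan(toy_world_model, (0.0, 0.0), (1.0, 2.0), horizon=3)
    print(actions, cost)
```

The key point is that the planner never acts in the environment while searching: every candidate is evaluated only inside the model, and the scoring function is where extra constraints could be added.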
After training, NWM can plan navigation trajectories in new environments by simulating potential paths and verifying if they lead to the target destination.

Conceptually, NWM draws inspiration from recent diffusion-based world models, such as DIAMOND and GameNGen, which are used for offline model-based reinforcement learning.
However, unlike these models, NWM is trained on a wide range of environments and agent embodiments.
By leveraging this diverse dataset, the researchers successfully trained a large diffusion transformer model that can generalize across multiple environments.
This generalization capability is a significant departure from previous models that are often constrained to specific environments or tasks.

NWM also shares conceptual similarities with Novel View Synthesis (NVS) methods like NeRF and GDC.
However, while NVS methods aim to reconstruct 3D scenes from 2D images, NWM's objective is more ambitious: it seeks to train a single model capable of navigating across diverse environments.
Unlike NVS approaches, NWM does not rely on 3D priors but instead models temporal dynamics directly from natural video data.

A key technical component of NWM is the Conditional Diffusion Transformer (CDiT), which predicts the next visual state given past image states and actions as input.
Unlike a standard Diffusion Transformer (DiT), CDiT offers significantly better computational efficiency.
Its complexity scales linearly with the number of context frames, allowing it to handle larger models with up to 1 billion parameters across diverse environments and agent embodiments.
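To see why the scaling matters, here is a rough back-of-the-envelope sketch. It assumes, for illustration only, that a standard DiT runs self-attention over the tokens of all frames jointly, while a conditional design runs self-attention over the target frame's tokens and reads the context frames through cross-attention. The function names are hypothetical, and the counts are attention score pairs, not measured FLOPs.

```python
def full_attention_pairs(tokens_per_frame, context_frames):
    """Score pairs when every token attends to every token across
    the target frame plus all context frames (assumed DiT-style)."""
    total_tokens = tokens_per_frame * (context_frames + 1)
    return total_tokens * total_tokens


def conditional_attention_pairs(tokens_per_frame, context_frames):
    """Score pairs when only target-frame tokens self-attend and the
    context is read via cross-attention (assumed CDiT-style)."""
    self_pairs = tokens_per_frame * tokens_per_frame
    cross_pairs = tokens_per_frame * (tokens_per_frame * context_frames)
    return self_pairs + cross_pairs


if __name__ == "__main__":
    tokens = 256  # hypothetical number of tokens per frame
    for m in (1, 2, 4, 8):
        full = full_attention_pairs(tokens, m)
        cond = conditional_attention_pairs(tokens, m)
        print(f"context={m}: full={full}, conditional={cond}, ratio={full / cond:.1f}")
```

Under these assumptions the joint-attention count grows with the square of the number of frames, while the conditional count grows in proportion to it, so the gap widens as more context frames are added.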
This efficiency allows CDiT to require four times fewer FLOPs than a standard DiT, all while delivering superior future prediction results.

The research team conducted extensive experiments to validate NWM's capabilities.
One notable experiment involved using NWM in unfamiliar environments, where it benefited from training on unlabeled, action-free, and reward-free video data from the Ego4D dataset.
Qualitatively, NWM demonstrated improved video prediction and generation on individual images.
Quantitatively, it achieved more accurate future predictions on the Stanford Go dataset when trained with additional unlabeled video data.
These results highlight NWM's ability to generalize effectively across unseen environments, a key advantage for real-world navigation tasks.

In summary, the Navigation World Model (NWM) represents a powerful leap forward for robotic navigation.
Its ability to simulate, plan, and adapt to new constraints makes it a promising approach for building more autonomous and flexible robotic systems.

The project page is available here.
The paper Navigation World Models is on arXiv.


Author: Hecate He | Editor: Chain Zhang