ATLAS: Agentic or Latent Visual Reasoning?

One Word is Enough for Both

Ziyu Guo1,2, Rain Liu1, Xinyan Chen2, Pheng-Ann Heng2
1Meta AI    2CUHK
ATLAS visual reasoning paradigm teaser.
Functional tokens internalize visual operations inside the standard autoregressive loop.

ATLAS

ATLAS pipeline.
Visual operations are represented as functional tokens and generated like ordinary words.
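The page does not spell out the exact token set or the execution hook, so the following is only a minimal Python sketch of the stated idea: functional tokens are emitted in the ordinary autoregressive stream, and emitting one triggers a visual operation whose result re-enters the same context. The token names (`<crop>`, `<mark>`), the placeholder feature strings, and the scripted token stream standing in for a real VLM decoder are all hypothetical.

```python
# Hypothetical sketch: functional tokens decoded like ordinary words.
# Token names, ops, and the toy token stream are illustrative, not ATLAS's API.

FUNCTIONAL_OPS = {
    # Each functional token maps to a visual operation; here the "operation"
    # just appends a placeholder for the visual features it would produce.
    "<crop>": lambda ctx: ctx + ["[cropped-region-features]"],
    "<mark>": lambda ctx: ctx + ["[annotated-image-features]"],
}

def decode(token_stream):
    """Consume an autoregressive token stream; when a functional token
    appears, run its visual operation and feed the result back into the
    same context, so agentic and latent steps share one decoding loop."""
    context = []
    for tok in token_stream:
        if tok == "<eos>":
            break
        context.append(tok)
        if tok in FUNCTIONAL_OPS:
            context = FUNCTIONAL_OPS[tok](context)
    return context
```

For example, `decode(["look", "<crop>", "so", "B", "<eos>"])` yields a context in which the crop features appear immediately after the `<crop>` token, before generation continues with ordinary words.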

LA-GRPO

LA-GRPO overview.
LA-GRPO strengthens sparse functional-token updates and reduces gradient dilution.
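The LA-GRPO objective is not reproduced on this page, so the sketch below is one plausible reading of "strengthens sparse functional-token updates and reduces gradient dilution": upweight the per-token loss at functional-token positions by a factor `lam`, then renormalize so the overall loss scale is unchanged. The vocabulary ids and the `lam` knob are assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical functional-token vocabulary ids (illustrative only).
FUNCTIONAL_IDS = [50001, 50002]

def token_weights(token_ids, lam=4.0):
    """Per-token loss weights: `lam` at functional-token positions, 1.0
    elsewhere, renormalized so the mean weight is 1 and the sequence-level
    loss scale is preserved."""
    ids = np.asarray(token_ids)
    w = np.where(np.isin(ids, FUNCTIONAL_IDS), lam, 1.0)
    return w / w.mean()

def weighted_grpo_loss(logprobs, advantage, token_ids, lam=4.0):
    """GRPO-style surrogate for one sequence: advantage-scaled token
    log-likelihoods, averaged with functional-token positions emphasized
    so the few functional tokens are not diluted by many text tokens."""
    w = token_weights(token_ids, lam)
    return -(advantage * w * np.asarray(logprobs)).mean()
```

With `lam > 1`, a sequence containing one functional token among many text tokens still routes a disproportionate share of its gradient through that token's position, which is the dilution-countering behavior the caption describes.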

Better reasoning, lower overhead

| Method | V* | WeMath | BLINK Avg. | Art. | Count. | Forensic. | IQ | Jigsaw | M-view | Spatial |
|---|---|---|---|---|---|---|---|---|---|---|
| *Closed-source Models* | | | | | | | | | | |
| GPT-4o | 62.8 | 50.6 | 61.0 | 82.9 | 49.2 | 79.5 | 31.3 | 55.3 | 59.4 | 69.2 |
| Claude-4-Sonnet | 15.2 | 63.0 | 49.9 | 61.5 | 59.2 | 35.6 | 30.0 | 53.3 | 47.4 | 62.2 |
| Gemini-2.0-Flash | 73.3 | 47.4 | 45.3 | 56.4 | 55.0 | 30.3 | 25.3 | 48.7 | 43.6 | 58.0 |
| Gemini-2.5-Pro | 79.1 | 71.3 | 74.6 | 85.5 | 78.3 | 89.4 | 43.3 | 85.3 | 50.4 | 90.2 |
| *Standard VLMs* | | | | | | | | | | |
| Qwen2.5-VL | 70.2 | 36.2 | 22.8 | 29.9 | 58.3 | 0.8 | 18.7 | 31.3 | 0.0 | 20.3 |
| LLaVA-OneVision-7B | 75.4 | 23.1 | 36.6 | 47.0 | 43.3 | 25.0 | 20.7 | 38.7 | 33.8 | 47.6 |
| MiniGPT-v2 | 35.6 | 11.0 | 32.8 | 43.6 | 13.3 | 24.2 | 20.3 | 34.7 | 48.9 | 44.8 |
| Gemma-3-27B | 62.3 | 31.7 | 32.1 | 42.7 | 37.5 | 21.2 | 16.0 | 33.3 | 31.6 | 42.7 |
| *Unified Models* | | | | | | | | | | |
| Anole | 25.4 | 24.7 | 16.4 | 31.6 | 25.0 | 11.7 | 14.3 | 2.0 | 3.0 | 27.3 |
| Bagel | 55.5 | 39.4 | 51.1 | 63.2 | 60.8 | 37.1 | 30.0 | 57.3 | 39.8 | 69.2 |
| *Agentic Visual Models* | | | | | | | | | | |
| Visual CoT | 44.5 | 28.6 | 44.4 | 47.0 | 57.5 | 25.0 | 20.7 | 52.7 | 44.4 | 63.6 |
| V-Thinker | 41.4 | 32.5 | 35.0 | 26.9 | 43.3 | 19.7 | 18.7 | 42.0 | 51.1 | 43.4 |
| VTS-V | 74.9 | 42.8 | 51.2 | 62.4 | 61.7 | 32.9 | 28.7 | 56.1 | 49.4 | 67.2 |
| *Latent Visual Models* | | | | | | | | | | |
| LVR | 77.5 | 41.2 | 49.4 | 59.0 | 60.0 | 35.6 | 25.3 | 52.7 | 48.1 | 65.0 |
| MCOT | 76.4 | 39.6 | 47.4 | 55.6 | 57.5 | 33.3 | 26.7 | 50.7 | 45.9 | 62.2 |
| CoVT | 72.8 | 38.1 | 47.9 | 59.0 | 60.0 | 36.4 | 24.0 | 41.3 | 49.6 | 65.0 |
| Monet | 77.8 | 36.9 | 41.8 | 41.0 | 56.7 | 22.7 | 28.0 | 45.3 | 40.5 | 58.7 |
| *Ours* | | | | | | | | | | |
| ATLAS (SFT) | 77.5 | 28.9 | 46.0 | 50.4 | 59.2 | 26.5 | 26.0 | 54.7 | 48.1 | 57.3 |
| ATLAS (GRPO) | 77.9 | 40.3 | 50.5 | 57.3 | 61.7 | 34.1 | 26.0 | 57.7 | 43.6 | 70.6 |
| ATLAS (LA-GRPO) | 75.4 | 45.0 | 51.3 | 65.0 | 62.5 | 37.9 | 26.3 | 51.3 | 53.4 | 62.9 |

Visualization

Qualitative ATLAS examples.
Qualitative examples of localization, annotation, and visual reasoning with functional tokens.
Attention visualization around ATLAS functional tokens.

BibTeX

@article{guo2026atlas,
  title   = {ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both},
  author  = {Guo, Ziyu and Liu, Rain and Chen, Xinyan and Heng, Pheng Ann},
  journal = {arXiv preprint},
  year    = {2026}
}