Contrastive Representations for Temporal Reasoning



Key Question: Can we improve temporal reasoning by learning better representations?

Method: A novel negative sampling scheme provably enables learning temporal representations.

Result: CRTR learns representations that enable solving the Rubik's Cube without hand-crafted heuristics and improve success rates overall.

Teaser Image

CRTR makes representations reflect the structure of combinatorial tasks. t-SNE visualization of representations learned by CRTR (right) and CRL (left) for Sokoban. Colors correspond to trajectories; three frames from two trajectories are shown in the center and linked to their representations. CRL embeddings cluster tightly within trajectories, failing to capture global structure and limiting their usefulness for planning. In contrast, CRTR organizes representations meaningfully across trajectories and time (vertical axis).
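For readers who want to reproduce this style of plot, below is a minimal sketch using scikit-learn's t-SNE. The embeddings and trajectory labels here are random placeholders standing in for the trained encoder's outputs; this is not the authors' plotting code.

```python
# Minimal sketch of the teaser-style visualization: project learned state
# embeddings to 2D with t-SNE and color each point by its trajectory.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder data: in the real setup, `embeddings` (num_states, dim) would
# come from the trained encoder and `traj_ids` from the replay trajectories.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 64))
traj_ids = np.repeat(np.arange(10), 50)   # 10 trajectories of 50 states each

points = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=traj_ids, cmap="tab10", s=8)
plt.title("t-SNE of learned representations, colored by trajectory")
plt.show()
```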


Abstract

In classical AI, perception relies on learning spatial representations, while planning—temporal reasoning over action sequences—is typically achieved through search. We study whether such reasoning can instead emerge from representations that capture both spatial and temporal structure. We show that standard temporal contrastive learning, despite its popularity, often fails to capture temporal structure, due to reliance on spurious features. To address this, we introduce Contrastive Representations for Temporal Reasoning (CRTR), a method that uses a negative sampling scheme to provably remove these spurious features and facilitate temporal reasoning. CRTR achieves strong results on domains with complex temporal structure, such as Sokoban and Rubik’s Cube. In particular, for the Rubik’s Cube, CRTR learns representations that generalize across all initial states and allow solving the puzzle much faster than BestFS—though with longer solutions. To our knowledge, this is the first demonstration of efficiently solving arbitrary Cube states using only learned representations, without hand-crafted search heuristics.
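To make the setup concrete, here is a hedged sketch of a temporal contrastive (InfoNCE-style) objective in PyTorch. The in-trajectory negative sampling shown is one illustrative way to remove trajectory-identity features; it is an assumption for exposition, not necessarily CRTR's exact scheme.

```python
# Hedged sketch of a temporal contrastive (InfoNCE-style) loss. The key design
# question is where negatives come from; as one illustrative choice (an
# assumption, not necessarily CRTR's exact scheme), all pairs in the batch are
# drawn from the SAME trajectory, so trajectory-identity features carry no
# signal for discriminating positives from negatives.
import torch
import torch.nn.functional as F

def temporal_infonce(encoder, states, future_states, temperature=0.1):
    """states, future_states: (B, ...) anchor s_t and positive s_{t+k} pairs,
    all sampled from a single trajectory (the assumed negative scheme)."""
    z = F.normalize(encoder(states), dim=-1)          # (B, D) anchor embeddings
    z_pos = F.normalize(encoder(future_states), dim=-1)
    logits = z @ z_pos.T / temperature                # (B, B) similarity matrix
    # Diagonal entries are the true (s_t, s_{t+k}) pairs; off-diagonal entries
    # serve as negatives from the same trajectory.
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(logits, labels)
```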

Experiments

Main Results

CRTR performs well in all evaluated domains. Success rate as a function of search budget across five domains; CRTR is compared to the baselines CRL, Supervised, and DeepCubeA. Results are averaged over 5 seeds; shaded regions indicate standard error. For the Rubik's Cube, both the supervised and random baselines achieve a success rate of zero for all search budgets.

Panels: Rubik's Cube, Lights Out, Sokoban, Digit Jumper, 15-Puzzle; legend.
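For context, here is a minimal sketch of how a learned representation can drive best-first search under a node budget. The function names (embed, successors, is_goal) are illustrative placeholders, not the paper's API.

```python
# Best-first search guided by a learned representation: states are expanded in
# order of embedding distance to the goal, and the "search budget" is the cap
# on the number of created nodes.
import heapq

def best_first_search(start, goal_emb, embed, successors, is_goal, budget=6000):
    """embed(state) -> vector; successors(state) -> iterable of (action, state)."""
    def h(state):
        v = embed(state)
        return sum((a - b) ** 2 for a, b in zip(v, goal_emb))  # squared L2 distance

    frontier = [(h(start), 0, start, [])]   # (priority, tiebreak, state, plan)
    visited, created, tie = {start}, 0, 0
    while frontier and created < budget:
        _, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                created += 1
                tie += 1
                heapq.heappush(frontier, (h(nxt), tie, nxt, plan + [action]))
    return None  # budget exhausted
```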

Correlation Analysis

Distances given by CRTR representations reflect the temporal structure well. Correlation (Spearman's ρ) between the distance induced by the learned embeddings and the actual temporal distance, measured over the course of training; CRTR compared with CRL.

Sokoban Correlation
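A small sketch of how such a correlation can be computed, assuming each state comes with its step index within a trajectory (spearmanr is from SciPy):

```python
# Compare the distance induced by the learned embeddings against the true
# temporal distance between state pairs, using Spearman's rank correlation.
import numpy as np
from scipy.stats import spearmanr

def embedding_temporal_correlation(embeddings, times):
    """embeddings: (N, D) representations of states from one trajectory;
    times: (N,) step index of each state within the trajectory."""
    idx_a, idx_b = np.triu_indices(len(times), k=1)   # all unordered state pairs
    emb_dist = np.linalg.norm(embeddings[idx_a] - embeddings[idx_b], axis=-1)
    temp_dist = np.abs(times[idx_a] - times[idx_b])   # actual temporal distance
    rho, _ = spearmanr(emb_dist, temp_dist)
    return rho
```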

No Search Results

CRTR solves most tasks without requiring any search, and adding BestFS yields shorter solutions. We plot the fraction of configurations solved with a solution length of at most x, with the number of created nodes capped at 6000. Surprisingly, on the Rubik's Cube CRTR achieves a higher success rate without search, solving all configurations within the budget.

Panels: Rubik's Cube, Lights Out, Sokoban, Digit Jumper, 15-Puzzle (no search); legend.
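A hedged sketch of one way a search-free policy can use the learned representation: greedily take the action whose successor embedding is closest to the goal embedding. This is an assumed reading of "without search", not the authors' exact procedure.

```python
# Greedy, search-free policy over a learned representation: at each step, move
# to the successor whose embedding is nearest to the goal, up to a step limit.
import numpy as np

def greedy_solve(start, goal_emb, embed, successors, is_goal, max_steps=1000):
    """successors(state) -> iterable of (action, next_state); embed -> np.ndarray."""
    state, plan = start, []
    for _ in range(max_steps):
        if is_goal(state):
            return plan
        # Pick the successor whose representation is closest to the goal.
        action, state = min(
            successors(state),
            key=lambda pair: np.linalg.norm(embed(pair[1]) - goal_emb),
        )
        plan.append(action)
    return None  # step limit reached without solving
```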

Block Building Behavior

CRTR without search exhibits block-building behavior. Intermediate states from solving a randomly scrambled cube, illustrating how the algorithm gradually builds partial structure. The average solve takes about 400 moves, and we observe similar block-building behavior across solves.

Cube Visualization