Contrastive Representations for Temporal Reasoning



Key Question: Can we improve temporal reasoning by learning better representations?

Method: A novel negative sampling scheme provably enables learning temporal representations.

Result: CRTR learns representations that enable solving the Rubik's Cube without hand-crafted heuristics and improve success rates overall.

Teaser Image

CRTR makes representations reflect the structure of combinatorial tasks. t-SNE visualization of representations learned by CRTR (right) and CRL (left) for Sokoban. Colors correspond to trajectories; three frames from two trajectories are shown in the center and linked to their representations. CRL embeddings cluster tightly within trajectories, failing to capture global structure and limiting their usefulness for planning. In contrast, CRTR organizes representations meaningfully across trajectories and time (vertical axis).
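For readers who want to reproduce this style of plot, below is a minimal sketch using scikit-learn's t-SNE. The embeddings and trajectory labels here are random placeholders standing in for the trained encoder's outputs; this is not the authors' plotting code.

```python
# Minimal sketch of the teaser-style visualization: project learned state
# embeddings to 2D with t-SNE and color each point by its trajectory.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholder data: in the real setup, `embeddings` (num_states, dim) would
# come from the trained encoder and `traj_ids` from the replay trajectories.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 64))
traj_ids = np.repeat(np.arange(10), 50)   # 10 trajectories of 50 states each

points = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=traj_ids, cmap="tab10", s=8)
plt.title("t-SNE of learned representations, colored by trajectory")
plt.show()
```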


Abstract

In classical AI, perception relies on learning spatial representations, while planning—temporal reasoning over action sequences—is typically achieved through search. We study whether such reasoning can instead emerge from representations that capture both spatial and temporal structure. We show that standard temporal contrastive learning, despite its popularity, often fails to capture temporal structure, due to reliance on spurious features. To address this, we introduce Contrastive Representations for Temporal Reasoning (CRTR), a method that uses a negative sampling scheme to provably remove these spurious features and facilitate temporal reasoning. CRTR achieves strong results on domains with complex temporal structure, such as Sokoban and Rubik’s Cube. In particular, for the Rubik’s Cube, CRTR learns representations that generalize across all initial states and allow solving the puzzle much faster than BestFS—though with longer solutions. To our knowledge, this is the first demonstration of efficiently solving arbitrary Cube states using only learned representations, without hand-crafted search heuristics.
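To make the setup concrete, here is a hedged sketch of a temporal contrastive (InfoNCE-style) objective in PyTorch. The in-trajectory negative sampling shown is one illustrative way to remove trajectory-identity features; it is an assumption for exposition, not necessarily CRTR's exact scheme.

```python
# Hedged sketch of a temporal contrastive (InfoNCE-style) loss. The key design
# question is where negatives come from; as one illustrative choice (an
# assumption, not necessarily CRTR's exact scheme), all pairs in the batch are
# drawn from the SAME trajectory, so trajectory-identity features carry no
# signal for discriminating positives from negatives.
import torch
import torch.nn.functional as F

def temporal_infonce(encoder, states, future_states, temperature=0.1):
    """states, future_states: (B, ...) anchor s_t and positive s_{t+k} pairs,
    all sampled from a single trajectory (the assumed negative scheme)."""
    z = F.normalize(encoder(states), dim=-1)          # (B, D) anchor embeddings
    z_pos = F.normalize(encoder(future_states), dim=-1)
    logits = z @ z_pos.T / temperature                # (B, B) similarity matrix
    # Diagonal entries are the true (s_t, s_{t+k}) pairs; off-diagonal entries
    # serve as negatives from the same trajectory.
    labels = torch.arange(z.size(0), device=z.device)
    return F.cross_entropy(logits, labels)
```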

Experiments

Main Results

CRTR performs well in all evaluated domains. Success rate as a function of search budget across five domains; CRTR is compared to the baselines CRL, Supervised, and DeepCubeA. Results are averaged over 5 seeds; shaded regions indicate standard error. For the Rubik's Cube, both the supervised and random baselines achieve a success rate of zero for all search budgets.

Panels: Rubik's Cube, Lights Out, Sokoban, Digit Jumper, 15-Puzzle; legend.
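For context, here is a minimal sketch of how a learned representation can drive best-first search under a node budget. The function names (embed, successors, is_goal) are illustrative placeholders, not the paper's API.

```python
# Best-first search guided by a learned representation: states are expanded in
# order of embedding distance to the goal, and the "search budget" is the cap
# on the number of created nodes.
import heapq

def best_first_search(start, goal_emb, embed, successors, is_goal, budget=6000):
    """embed(state) -> vector; successors(state) -> iterable of (action, state)."""
    def h(state):
        v = embed(state)
        return sum((a - b) ** 2 for a, b in zip(v, goal_emb))  # squared L2 distance

    frontier = [(h(start), 0, start, [])]   # (priority, tiebreak, state, plan)
    visited, created, tie = {start}, 0, 0
    while frontier and created < budget:
        _, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                created += 1
                tie += 1
                heapq.heappush(frontier, (h(nxt), tie, nxt, plan + [action]))
    return None  # budget exhausted
```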

Correlation Analysis

Distances given by CRTR representations reflect the temporal structure well. Correlation (Spearman's ρ) between the distance induced by the learned embeddings and the actual temporal distance, measured over the course of training; CRTR compared with CRL.

Sokoban Correlation
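A small sketch of how such a correlation can be computed, assuming each state comes with its step index within a trajectory (spearmanr is from SciPy):

```python
# Compare the distance induced by the learned embeddings against the true
# temporal distance between state pairs, using Spearman's rank correlation.
import numpy as np
from scipy.stats import spearmanr

def embedding_temporal_correlation(embeddings, times):
    """embeddings: (N, D) representations of states from one trajectory;
    times: (N,) step index of each state within the trajectory."""
    idx_a, idx_b = np.triu_indices(len(times), k=1)   # all unordered state pairs
    emb_dist = np.linalg.norm(embeddings[idx_a] - embeddings[idx_b], axis=-1)
    temp_dist = np.abs(times[idx_a] - times[idx_b])   # actual temporal distance
    rho, _ = spearmanr(emb_dist, temp_dist)
    return rho
```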

No Search Results

CRTR solves most tasks without requiring any search, and adding BestFS yields shorter solutions. We plot the fraction of configurations solved with a solution length of at most x, with the number of created nodes capped at 6000. Surprisingly, on the Rubik's Cube CRTR achieves a higher success rate without search, solving all configurations within the budget.

Panels: Rubik's Cube, Lights Out, Sokoban, Digit Jumper, 15-Puzzle (no search); legend.
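A hedged sketch of one way a search-free policy can use the learned representation: greedily take the action whose successor embedding is closest to the goal embedding. This is an assumed reading of "without search", not the authors' exact procedure.

```python
# Greedy, search-free policy over a learned representation: at each step, move
# to the successor whose embedding is nearest to the goal, up to a step limit.
import numpy as np

def greedy_solve(start, goal_emb, embed, successors, is_goal, max_steps=1000):
    """successors(state) -> iterable of (action, next_state); embed -> np.ndarray."""
    state, plan = start, []
    for _ in range(max_steps):
        if is_goal(state):
            return plan
        # Pick the successor whose representation is closest to the goal.
        action, state = min(
            successors(state),
            key=lambda pair: np.linalg.norm(embed(pair[1]) - goal_emb),
        )
        plan.append(action)
    return None  # step limit reached without solving
```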

Block Building Behavior

CRTR without search exhibits block-building behavior. Intermediate states from solving a randomly scrambled cube, illustrating how the algorithm gradually builds partial structure. The average solve takes about 400 moves, and we observe similar block-building behavior across solves.

Cube Visualization