Google Research has announced Titans and MIRAS, a new architecture-and-framework duo designed to help AI models learn, update, and retain information while they are running, enabling them to handle massive context windows far beyond what todayβs transformers can process.Β
In a year where AI models continue to hit limits around memory, speed, and context length, Googleβs latest research signals a major shift in how machine learning systems can adapt in real time.
This news lands at a time when transformer bottlenecks are becoming painfully obvious.Β
The world is generating longer documents, more complex sequences, larger genomes, and richer multimodal data streams.Β
And todayβs architecture struggles to keep up. Titans and MIRAS aim to fix that, not with incremental tweaks, but with a fundamental rethinking of how AI memory should work.

Why Were Titans and MIRAS Needed in the First Place?
The success of the Transformer architecture came from its attention mechanism. It is a way for models to βlook backβ at previous inputs. Yet attention becomes extremely expensive as sequences grow longer.Β
The computational cost scales quadratically. At hundreds of thousands or millions of tokens, traditional Transformers simply break down.
Researchers have tried alternative solutions, efficient RNNs, linear recurrent architectures, and state space models (SSMs) like Mamba-2.Β
These models scale linearly and allow fast processing, but at the cost of compressing all context into a fixed-size memory. That compression means crucial details often get lost.
This is where Titans and MIRAS step in. Instead of compressing everything into a static memory vector, they allow a model to update its memory on the fly, preserving the richness of long sequences without slowing down.
What Exactly Are Titans and MIRAS?
Google describes the two systems as complementary but distinct:
- Titans: The actual architecture is a fast, expressive, deep-memory model that merges RNN-like efficiency with Transformer-like precision.

- MIRAS: The theoretical foundation that generalizes how memory, attention, forgetting and online optimization should work in any sequence model.
Together, they bring a new ability to the forefront of AI research: test-time memorization, meaning an AI system can rewrite its own long-term memory while actively processing data, without requiring retraining.
This shifts AI away from the static paradigm, where the model only βknows what it was trained onβ towards a dynamic paradigm where the model learns continuously as it reads.
How Titans Reinvents Long-Term Memory in AI
Unlike traditional RNNs or SSMs, which rely on small vectors or matrices to store memory, Titans introduces a deep neural network (a multi-layer perceptron) as its long-term memory module.Β
This allows it to express relationships, concepts, and structures far more richly than previous approaches.

The architecture does not just store information, it interprets it. As new tokens stream in, Titans evaluate whether each piece of data is relevant, surprising, or redundant.
The βSurprise Metricβ: What Should Be Remembered?
Borrowing inspiration from human psychology, Titans uses an internal βsurprise metric,β which compares what the memory expects with what the new input provides.Β
A large difference signals something important.
- Low surprise means the input matches expectations and can be ignored.
- High surprise means the input breaks expectations and must be stored.
In everyday terms, if you are reading financial data and suddenly encounter a βbanana peel,β that anomaly will stick in your memory. Titans behaves similarly.
Momentum and Forgetting
Two refinements for Titans:
- Momentum, which ensures that information surrounding surprising tokens is also captured, useful when meaning spans several words or sentences.
- Adaptive forgetting (weight decay), which allows Titans to erase outdated or irrelevant information to make room for new memories during extremely long tasks.
This combination produces a model that learns continuously without drowning in its own accumulated history.
How MIRAS Rewrites the Theory of Sequence Modeling
MIRAS provides a unified lens through which different architecture like Transformers, RNNs, SSMs, can be viewed as variations of the same concept: associative memory systems.
It breaks down every sequence model into four components:
- Memory architecture: how information is stored
- Attentional bias: what the model prioritizes
- Retention gate: how forgetting or regularization works
- Memory algorithm: how updates occur over time
This allows researchers to explore new memory rules, attention rules, and retention mechanisms beyond the traditional mean squared error (MSE) paradigm.
Moving Beyond MSE
Most sequence models today use MSE or dot-product similarity as the basis for attention and memory updates. MIRAS argues that this limits expressiveness and creates sensitivity to outliers.
Applying this idea, Google introduced three MIRAS-based models like YAAD, MONETA, and MEMORA, each using non-Euclidean objectives to create more robust or more stable memory updates.
- YAAD uses Huber loss to reduce outlier sensitivity.
- MONETA uses generalized norms for strict, stable memory handling.
- MEMORA enforces probabilistic consistency for clean, controlled updates.
These variants prove that stepping outside the standard paradigm can provide significant accuracy and robustness improvements.
What Did Experiments Reveal About Titansβ Performance?
Google tested Titans extensively across multiple categories. The results consistently showed an advantage over leading models such as Transformer++, Mamba-2, and Gated DeltaNet.
Language Modeling and Reasoning
Across C4, WikiText, HellaSwag, and PIQA, Titans achieved:
- Lower perplexity
- Higher accuracy
- Faster inference
The MIRAS-based models also outperformed comparable baselines, proving that better memory rules lead to better results.
Genomic and Time-Series Applications
Because these tasks often involve extremely long sequences measured in millions of elements, traditional architectures struggle. Titans, however, generalized effectively, demonstrating its potential far beyond text.
Long-Context Performance
Perhaps the most remarkable results came from BABILong, a benchmark requiring models to reason over facts spread across extremely long documents. Titans:
- Outperformed all baselines
- Surpassed even massive models like GPT-4
- Scaled reliably to context windows beyond 2 million tokens
This showcases the true strength of Titans: Reliable memory over extraordinary lengths.
Why Is Deep Memory So Important?
Ablation studies showed that the depth of the memory network had a bigger impact than its size. Deeper memory modules consistently delivered lower perplexity and maintained performance even as sequences grew dramatically longer.
This suggests that future models may not need massive attention heads or huge parameter counts, they need deeper, smarter memory.Β
Key Takeaways
- Titans enables real-time long-term memory updates without retraining.
- MIRAS provides a unified framework for all sequence models as associative memories.
- Deep neural memory modules outperform traditional fixed-size RNN states.
- Surprise-based learning lets the model store only novel or important information.
- Titans achieves state-of-the-art long-context performance, surpassing GPT-4 in some tests.
- Models scale to 2M+ tokens while maintaining accuracy and efficiency.
Dipti Arora
AuthorDipti Arora is a Senior Content Writer with over seven years of experience creating impactful content across Digital Marketing, SEO, technology, and business domains. She has a strong background in managing news verticals and delivering editorial excellence. Dipti has contributed to leading publications such as The Times of India and CEO News, where her research-driven storytelling and ability to simplify complex subjects have consistently stood out. She is passionate about crafting content that informs, engages, and drives meaningful results.