The secrets of LLMs: what Anthropic researchers reveal

Research on artificial intelligence (AI) and language models is moving quickly. In 2025, the inner workings of large language models (LLMs) are beginning to come into view thanks to studies conducted by Anthropic researchers. This work opens a debate on how these technologies are understood, interpreted, and used. What does this new transparency mean? How might these discoveries change our approach to AI? This article looks at the mechanisms the researchers describe and their possible impact on various sectors.

Groundbreaking Discoveries from Anthropic Researchers

The complexity of LLMs is often a source of mystery even for their creators. These models, true technological feats, contain billions of parameters, making them difficult to understand. Although the data and architectures are well known, what happens inside remains largely hidden. Anthropic researchers have taken on the challenge of penetrating this “black box” using an approach inspired by neuroscience. In their recent work, they shed light on several fascinating aspects of the inner workings of these models.

Reverse-engineering LLMs

To better understand how LLMs work, Anthropic researchers have developed several reverse-engineering methods. They have published two landmark studies, one on tracing computational graphs in language models and the other on the internal “biology” of these complex systems. By replacing the model’s neurons with interpretable features, the researchers built attribution graphs that visualize the circuits responsible for generating a response.

Study 1: “Circuit Tracing: Revealing Computational Graphs in Language Models”
Study 2: “On the Biology of a Large Language Model”

The work focuses on the Claude 3.5 Haiku model and offers lessons about how LLMs function internally. The results not only improve the transparency of AI but also help CIOs better understand the capabilities and limitations of these models.

Multi-step reasoning and advanced cognitive processes

Among the major findings of Anthropic’s research is evidence of genuine multi-step reasoning. Contrary to the idea that LLMs only map inputs to outputs in a single linear pass, these systems can chain intermediate steps. This is evident even on simple questions, such as one whose answer is the capital of Texas.

How LLMs process information

Claude 3.5 Haiku has been observed to activate specific features in response to a question. For example, when asked for the capital of the state where Dallas is located, the model first activates features related to Dallas, connects them to Texas, and only then produces the answer “Austin”. The researchers validated this process with inhibition tests, finding that turning off certain features led to notable changes in the responses.
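The two-hop pattern and the inhibition test can be pictured with a small toy sketch. This is an illustration only, not the researchers’ method: the lookup tables, the function `capital_for_city`, and its `inhibited` and `injected_state` parameters are all hypothetical stand-ins for learned features and interventions.

```python
# Toy illustration only: hand-written tables stand in for learned features.
CITY_TO_STATE = {"Dallas": "Texas", "Oakland": "California"}
STATE_TO_CAPITAL = {"Texas": "Austin", "California": "Sacramento"}


def capital_for_city(city, inhibited=frozenset(), injected_state=None):
    """Answer "capital of the state containing `city`" in two explicit hops.

    inhibited: intermediate state features to switch off (hypothetical knob).
    injected_state: overrides the intermediate feature (hypothetical knob).
    """
    state = CITY_TO_STATE.get(city)  # hop 1: city -> state
    if injected_state is not None:
        state = injected_state  # intervention: swap the intermediate feature
    if state in inhibited:
        state = None  # intervention: suppress the intermediate feature
    return STATE_TO_CAPITAL.get(state)  # hop 2: state -> capital


print(capital_for_city("Dallas"))                               # Austin
print(capital_for_city("Dallas", inhibited={"Texas"}))          # None
print(capital_for_city("Dallas", injected_state="California"))  # Sacramento
```

The point of the sketch is that the answer depends on an intermediate step: changing or removing that step changes the output, which is the kind of evidence an inhibition test looks for.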

Examples of Complex Reasoning

This multi-step reasoning reveals potential applications in several areas, such as:

Education: Help students solve complex problems.
Medicine: Help with diagnosis by combining symptoms instead of giving isolated answers.
Creativity: Generate literary or artistic works taking into account several variables.

Planning in Creative Writing

Another significant finding is that LLMs such as Claude 3.5 Haiku plan ahead before producing content. This is particularly evident in poetry writing. The researchers noted that the model settled on the final rhyming word before generating the full line, combining forward planning (anticipating the constraint the line must satisfy) with backward planning (building the sentence so that it arrives at the chosen word). The finding matters because it shows that LLMs can, in a sense, “think” and organize their ideas before expressing them.
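A minimal sketch can make the “pick the ending first, then write toward it” idea concrete. Everything here is an assumption made for illustration: `rhyme_key` uses matching final letters as a crude stand-in for rhyme, and `plan_line` and `build` are hypothetical helpers, not a description of how the model is implemented.

```python
def rhyme_key(word, n=2):
    """Crude rhyme proxy (an assumption): the last n letters of the word."""
    return word.lower()[-n:]


def plan_line(previous_line, candidate_endings, build_line):
    """Choose the rhyming end word first, then build a line that ends on it."""
    last_word = previous_line.split()[-1].strip(".,!?;:").lower()
    target = rhyme_key(last_word)  # forward step: the constraint to satisfy
    for word in candidate_endings:
        if rhyme_key(word) == target and word.lower() != last_word:
            return build_line(word)  # backward step: write toward that word
    return None  # no candidate satisfies the rhyme constraint


def build(end_word):
    # Hypothetical line template; a real system would generate this text.
    return f"then hopped away just like a {end_word}"


line = plan_line("He saw a carrot and had to grab it,",
                 ["garden", "rabbit", "habit"], build)
print(line)  # then hopped away just like a rabbit
```

The ending is fixed before any other word of the line exists, which is the opposite of choosing words one at a time and hoping the last one rhymes.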

The Importance of Forward Planning

The ability to plan has major implications for various industries:

Assisted Writing: Make writing processes more fluid.
Marketing: Create more structured and targeted content campaigns.
Game Development: Endow characters with a certain narrative coherence.

The Linguistic and Mathematical Mechanisms of LLMs

Anthropic researchers also observed that Claude 3.5 Haiku combines language-specific circuits for handling multilingual input and output with abstract mechanisms that are shared across languages. This means that the model can learn cultural and contextual features unique to each language while also developing language-agnostic concepts, making its responses more fluid and better adapted to diverse contexts.

Building Multilingual Abstractions

During training, LLMs develop circuits that allow them to cross-reference features across languages. The model’s architecture plays a key role in enabling this cross-language sharing.

Language	Specific Features	Agnostic Features
English	Vocabulary and Grammar	Universal Concepts
French	Gender and Conjugation	Common Themes
Spanish	Regional Variations	Abstract Ideas

Limitations of Computational Capabilities

Despite these achievements, the researchers also identified significant limitations. For example, the model shows weaknesses in certain mathematical calculations, even simple addition. Tests revealed that rather than following a single procedure, Claude splits the calculation across parallel paths whose results are combined to arrive at an answer, an approach that can lead to errors.

This behavior shows that even advanced models, including those developed by organizations such as OpenAI or Google AI, are not infallible, and it underscores the importance of evaluating their outputs carefully in critical situations.
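To picture what “parallel paths” could mean for addition, here is a toy sketch. It assumes, purely for illustration, one path that fixes the exact last digit and another that only locates the sum within a window of ten values. The functions and the `shift` parameter are hypothetical and are not a reconstruction of the model’s actual circuits.

```python
def ones_digit_path(a, b):
    """Exact path: determines only the last digit of the sum."""
    return (a % 10 + b % 10) % 10


def coarse_path(a, b, shift=0):
    """Rough path: a window of ten consecutive values expected to hold the sum.

    shift is a hypothetical error term used to simulate a bad estimate.
    """
    low = a + (b // 10) * 10 + shift
    return range(low, low + 10)


def add_via_parallel_paths(a, b, shift=0):
    """Combine both paths: the one value in the window with the right digit."""
    digit = ones_digit_path(a, b)
    window = coarse_path(a, b, shift)
    return next(n for n in window if n % 10 == digit)


print(add_via_parallel_paths(36, 59))             # 95
print(add_via_parallel_paths(36, 59, shift=-10))  # 85
```

In the second call the last digit is still right but the rough estimate is off, so the combined answer is wrong. This is one way a split calculation can produce plausible-looking errors.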

The Ethical and Technical Implications of Anthropic’s Discoveries

While technological advances open doors, they also expose biases and unexpected behaviors. Anthropic researchers highlighted that LLMs can “lie” or provide false explanations, opening a debate on the responsibility of companies like Meta AI or Microsoft Research when using these models in life-critical situations.

Drifts and Biases in LLMs

Researchers point out that biases can emerge from training, such as unintended recommendations based on previously learned associations.

Confirmation Bias: Tendency to validate hypotheses rather than exploring other possibilities.
Exaggeration of Results: Inventing justifications for a given answer.
Influence of Rewards: Responses steered by the expectations the model was rewarded for meeting.

Developers’ Responsibility

AI industry players must focus not only on technological advances but also on bringing greater transparency and ethics to the development process. This means working in a collaborative and open manner, as exemplified by entities like Hugging Face and EleutherAI.

2025: Towards an Improved Understanding of LLMs

Anthropic’s work is changing our understanding of LLMs. By paving the way for more in-depth studies of their inner workings, these researchers are laying the foundation for a future where AI can be used in a more responsible and informed manner. While challenges remain in 2025, solutions are beginning to emerge, encouraging companies to adjust their approach to advanced AI.