Does AI Learn from Data It Has Created Itself?

If AI learns from data it has created itself, there is a risk of self-reinforcing misinformation spirals. This makes it highly relevant to consider how we can safeguard democratic discourse in the age of AI. More and more people now receive their news through social media feeds, which shapes how they interpret reality. At the same time, a growing proportion of social media users are synthetic AI bots. Their purpose is often commercial: to promote specific products or to keep users engaged on the platform. But AI bots also hold enormous political potential.

This is the second post in the trilogy on how AI can influence our perception of reality. It explores how AI often learns from data it has created itself. The first post discussed whether AI, by getting to know us well enough to hyper-personalize its communication, can also predict and manipulate us. The final post will address the fundamental dilemma of what is even real: for example, whether humans have free will, or whether reality is fundamentally deterministic.

Economic Growth Is Built on the Democratization of Knowledge

Since the Enlightenment in the 18th century, human innovation has accelerated through the democratization and distribution of knowledge. This enabled the specialization and collaboration now seen as the key drivers of economic growth and prosperity.

In many ways, AI will reverse that decentralization of knowledge in a very short time. While a Google search returns thousands of relevant links for a question, LLMs like ChatGPT condense the content of all those links into a single answer. That is convenient for the user, but the very simplicity invites users to read the answer as THE truth. Over time, users may begin to suspend parts of their critical thinking. Yet critical thinking is essential for development, including cognitive development.

AI May Undermine That Foundation—Partly Because It Becomes a Truth Filter …

LLMs generate their answers from statistical patterns learned across vast data sources, increasingly supplemented by chain-of-thought reasoning. Yet the reasoning patterns themselves are learned from similarly vast data sources. This makes the choice of training data, the "reality" the model is exposed to, crucial. AI will only be as reliable as the reality it is allowed to learn from. Just like humans.
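
To see what this means in the simplest possible terms, here is a toy bigram model in Python. It is a purely illustrative sketch, orders of magnitude simpler than a real LLM: it can only recombine word transitions observed in its training corpus, so whatever "reality" dominates the corpus dominates its answers.

```python
import random
from collections import defaultdict

# Toy bigram "language model" (illustrative only, far simpler than an LLM):
# it memorizes which word follows which in the training text and can only
# ever recombine those observed transitions.

def train(corpus):
    """Count the observed word-to-word transitions in the corpus."""
    transitions = defaultdict(list)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        transitions[prev].append(nxt)
    return transitions

def generate(model, start, length=8):
    """Sample a continuation by following observed transitions."""
    out = [start]
    for _ in range(length):
        options = model.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)

# Whatever "facts" dominate the training data dominate the output:
corpus = "the moon is made of rock . the moon is made of cheese ."
model = train(corpus)
random.seed(3)
print(generate(model, "the"))  # e.g. "the moon is made of cheese . the moon"
```

The toy model has no notion of truth, only of frequency. Scale the idea up by many orders of magnitude and the point stands: the output is a statistical reflection of the training data, not a verdict on reality.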

Until 2024, AI providers were obligated to validate sources, prioritize verified data, and implement audit mechanisms for their models. They also had to ensure some degree of transparency in how AI adjusted for bias and made decisions.

… And Partly Because AI Is Now Trained on Data of Diminishing Quality

However, the performance of AI models generally improves with the amount of data they are trained on. According to AI providers, we have now reached the limit of how much verified, complete, and balanced data is available. As of early 2025, the major hyperscalers and LLM providers are training their models on all available user data.

In practice, this means that Meta is training its AI on data from Instagram, Facebook, and WhatsApp, while Amazon is training on data from Amazon Web Services. At best, these datasets reflect society’s average interpretations of facts. They do not represent objective truths, nor what might be worth aspiring to.

It is therefore worth considering the quality of this data, and who created it. Because much of it is either created or amplified by AI itself.
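
To make the spiral concrete, here is a minimal, purely illustrative Python sketch. The Gaussian "model" and the engagement-style amplify step are hypothetical stand-ins, not any provider's actual pipeline: a model is fitted to data, publishes synthetic samples, the most typical samples get amplified, and the next generation trains on what survives. Diversity collapses within a few generations, a phenomenon researchers call model collapse.

```python
import random
import statistics

# Illustrative toy only: a "model" that fits a Gaussian to its training
# data, publishes synthetic samples, and is then retrained on the samples
# that engagement-style curation amplified. Watch the diversity (stdev)
# collapse generation by generation.

random.seed(1)

def fit(data):
    """Fit the toy model: estimate mean and standard deviation."""
    return statistics.mean(data), statistics.stdev(data)

def generate(mu, sigma, n):
    """The model 'publishes' n synthetic data points."""
    return [random.gauss(mu, sigma) for _ in range(n)]

def amplify(samples, mu):
    """Hypothetical engagement-driven curation: the most typical half survives."""
    return sorted(samples, key=lambda x: abs(x - mu))[: len(samples) // 2]

# Generation 0: human-created data with genuine diversity.
data = generate(mu=0.0, sigma=1.0, n=2000)

for gen in range(6):
    mu, sigma = fit(data)
    print(f"generation {gen}: stdev = {sigma:.3f}")
    # The next generation learns only from its own amplified output.
    data = amplify(generate(mu, sigma, n=2000), mu)
```

Each generation, the spread of the data shrinks toward the previous generation's average. That is exactly the self-reinforcing spiral described above: the model ends up learning an ever narrower version of the reality it once modelled.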

And Beyond That, AI May Be Feeding Itself with Data

One of the researchers most deeply engaged in how online data originates is Professor Filippo Menczer, an informatics expert at Indiana University. His work has focused especially on how to protect the foundation of democracy, namely democratic discourse. His research centers on two core areas: how misinformation spreads through social networks, and how to detect the automated accounts (social bots) that amplify it.

So There Is a Real Risk That AI Will Learn from Politically Motivated AI Bots

Some years ago, the so-called “dead internet” hypothesis emerged. It suggested that most online conversations are actually generated by chatbots. The theory remains unproven and should be approached with caution. But according to Menczer, the underlying concern is serious: there is a real risk it could become reality unless Big Tech companies actively work to prevent it. Menczer says he is “very worried.”

The Risk Grows Because AI Is Central to U.S. Strategy

Yet this is not an easy task for AI providers. For instance, U.S. law grants legal entities (e.g., corporations) many of the same rights as individuals, including freedom of speech. In principle, speech generated by an AI agent operating on behalf of such an entity can therefore enjoy the same protection as that of a voting citizen. Legally, this makes it very complicated to prevent a "dead internet."

More importantly, AI plays a central role in the United States’ ambition to (re)establish its global leadership position, making it a geopolitical keystone. At this year’s AI Action Summit in February, U.S. Vice President JD Vance stated: “The AI future is not going to be won by hand-wringing about safety.” That statement was followed in July by the release of the U.S. AI Action Plan (AAP).

The U.S. Acknowledges the Risks, But Prioritizes the AI Race

The AAP aims to accelerate AI development by eliminating (nearly) all regulations that could slow AI progress, including those related to safety. The White House is nevertheless aware of the risks and states in the plan’s introduction: "Our AI systems must be free from ideological bias and be designed to pursue objective truth rather than social engineering agendas when users seek factual information or analysis. AI systems are becoming essential tools, profoundly shaping how Americans consume information, but these tools must also be trustworthy.”

The White House thus acknowledges that it faces a double-edged sword, and that maintaining a balanced approach may, in practice, prove self-contradictory. Which priority will ultimately prevail?

That’s Why It’s More Important Than Ever to Approach All Information with Skepticism and Critical Thinking

AI will increasingly define our descriptions of both the present and the past. As George Orwell wrote in 1984 (published in 1949): “Who controls the past controls the future: who controls the present controls the past.”

From now on, it is more crucial than ever to approach the outputs of LLMs with critical scrutiny. AI is not a knowledge base. It is a statistical model of knowledge bases.
