Will Legal AI Get “Brain Rot,” Like the Rest of the Web, Or Are Lawyers Safe?
A multi-university study published on Cornell’s arXiv preprint server has sparked alarm across the tech industry. Researchers found that large language models exposed to low-quality online content experience cognitive decline. They called it brain rot. Accuracy drops from 74.9% to 57.2%. Long-context comprehension falls from 84.4% to 52.3%. The models start skipping steps in their thought process, rushing to superficial conclusions. Ethical consistency weakens, and what the researchers called “personality drift” sets in.
Sam Altman and other tech leaders warn that the dead internet theory is becoming reality. Much of what exists online today is bot-generated content, clickbait, and AI slop. As these models train on an increasingly polluted web, their performance degrades in a dose-response fashion. Even a 50-50 mix of quality and junk data causes measurable harm.
The question for lawyers is straightforward. Will AI tools used in legal practice suffer the same fate?
The Self-Cannibalization Problem
The web faces a degradation spiral that gets worse with each iteration. AI generates content. That content floods the internet. Future AI models train on that AI-generated content. The quality degrades. The degraded AI then produces even worse content. The cycle repeats.
AWS researchers found that 57% of content published online is now AI-generated or AI-translated. This creates a feedback loop where models increasingly train on the output of other models rather than on original human thought and expertise. Each generation consumes a diet more contaminated than the last.
The problem compounds because AI-generated content tends to be optimized for engagement metrics rather than accuracy or depth. It is designed to rank well in search results, accumulate clicks, and generate shares. These are not the same as producing careful analysis or reliable information. When the next generation of models trains on engagement-optimized content rather than expertise-driven content, the cognitive decline accelerates.
Former Twitter CEO Jack Dorsey warned that it will become impossible to tell what is real from fake. The study described in the opening paragraph provides evidence for why that matters. It is not just a verification problem. It is a quality problem that degrades the models themselves.
Why Legal Practice Operates Differently
Legal practice sits in a different data economy than general-purpose AI scraping random social media feeds. Most legal AI tools do not rely primarily on the open web for their core functionality. They are built on curated legal databases, case law repositories, statutes, and firm-specific document collections.
When a lawyer uses AI for legal research, the system typically searches Westlaw, LexisNexis, or proprietary databases containing vetted judicial opinions and legislative materials. These sources are not cluttered with viral tweets or machine-generated nonsense. They are professionally maintained, structured, and authoritative.
Contract analysis tools train on millions of agreements, often sourced from law firms and corporate legal departments. Document review systems learn from annotated discovery sets prepared by experienced lawyers. These are supplied materials: your case files, discovery, transcripts, and curated databases, not open web sludge.
Retrieval-augmented systems that ground answers in trusted repositories are insulated from the train-on-sludge dynamic because they are not continually pre-training on junk content. Day-to-day legal workflows rely on targeted retrieval and supervised drafting, not ongoing indiscriminate pretraining.
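To make that distinction concrete, here is a minimal sketch of retrieval-grounded answering over a closed corpus, the kind of targeted workflow described above. The corpus contents, the keyword scoring, and the generate() stub are illustrative assumptions, not any vendor’s actual implementation; the point is that the model only ever sees documents you have admitted, and nothing flows back into training.

```python
# Minimal sketch of retrieval-grounded answering over a closed, curated corpus.
# The corpus, the scoring, and generate() are hypothetical stand-ins.

CURATED_CORPUS = {
    "opinion_smith_v_jones_2021": "Holding: a limitation-of-liability clause is enforceable absent gross negligence.",
    "ucc_2_207": "UCC 2-207 governs additional terms in an acceptance or written confirmation.",
}

def retrieve(question: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Rank curated documents by crude keyword overlap; no open-web fallback."""
    q_terms = set(question.lower().split())
    scored = sorted(
        corpus.items(),
        key=lambda item: len(q_terms & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    """Ground the model strictly in the retrieved passages and demand citations."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using ONLY the sources below. Cite the bracketed ID for every claim. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

def generate(prompt: str) -> str:
    """Placeholder for a hosted model call made in a no-training configuration."""
    return "<model output grounded in the supplied sources>"

if __name__ == "__main__":
    question = "Is a limitation-of-liability clause enforceable?"
    print(generate(build_prompt(question, retrieve(question, CURATED_CORPUS))))
```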
Ethics guidance already pushes firms in that direction. The ABA’s Formal Opinion 512 tells lawyers to understand capabilities and limits, verify outputs, and protect confidentiality. The New York City Bar’s 2024-5 opinion adds informed-consent expectations for tools that process client data. These requirements naturally steer firms toward systems with controllable inputs and human review.
Where the Self-Cannibalization Risk Enters Legal AI
The insulation is not complete. The self-cannibalization problem can infiltrate legal AI in several ways.
First, many legal AI tools integrate general-purpose language models as their foundation. GPT-4, Claude, or Llama serve as the base, and then legal-specific layers are added on top. If the underlying model degrades because it trained on AI-generated web content during its general training, that decline affects the specialized legal application built on it. Firms that rely on third-party AI platforms do not control what data those platforms consume during ongoing training.
Second, legal content itself is not immune to AI contamination. Legal blogs, practice guides, form libraries, and even some case summaries are increasingly AI-generated. If a legal AI tool supplements its curated databases with content scraped from legal websites, and those websites contain AI-generated summaries of cases or AI-drafted analysis, the tool ingests second-generation content. The degradation spiral has entered the legal ecosystem.
Third, quiet continual tuning on mixed data poses a risk. If a vendor improves a hosted model by periodically fine-tuning on customer prompts and outputs, and if some customers are using AI to generate the documents they then ask the AI to analyze, the model ends up training on its own output. The dose-response effect identified by the researchers means even partial contamination degrades performance.
Fourth, retrieval is only as good as the corpus. When a system cannot find an answer in its curated legal database, it may fall back on general web search or retrieval from online sources. If those supplemental sources include AI-generated legal commentary, blog posts, or crowd-sourced explanations, the output inherits the flaws of second or third-generation content.
Fifth, smaller legal tech companies building AI tools may not have access to high-quality proprietary datasets. They might rely more heavily on publicly available content scraped from legal blogs, forums, or open-access resources. If those sources are increasingly AI-generated, the new tools launch with degradation already baked in.
The thought-skipping phenomenon identified in the study discussed above maps directly to legal tasks: the model jumps to conclusions without walking through the issues. That looks like missing limiting language, skipping adverse authority, or collapsing multi-factor tests. When AI trains on AI-generated legal content that already exhibits thought-skipping, the problem amplifies.
The Curation Imperative
The self-cannibalization dynamic makes data curation more important than ever. It is no longer enough to avoid low-quality content. Firms must also filter out AI-generated content unless that content has been verified by human experts and meets the same quality standards as original human work.
This creates new due diligence obligations when selecting legal AI vendors. Ask not just where the training data comes from, but whether the vendor has implemented controls to exclude AI-generated content from training sets. Ask whether they can distinguish between original judicial opinions and AI-generated case summaries. Ask whether their legal blog scraping excludes AI-authored posts.
Some vendors may not even know the answer. The contamination can be subtle. A legal research database might include case annotations written by humans five years ago but updated last year using AI. A contract template library might contain agreements originally drafted by lawyers but revised using AI tools. These hybrid sources blur the line between human expertise and machine-generated content.
Preventing and Minimizing Brain Rot
Law firms cannot fully control the foundational models their tools rely on, but they can take protective measures. Treat this as a training-time safety and governance problem, not a vague concern.
Own the diet. Keep a clean, versioned, and scoped knowledge base for retrieval. Admit sources by policy, not convenience. Prioritize primary sources like court opinions, statutes, and regulations over secondary sources like blog posts and legal commentary. When secondary sources are included, verify they are human-authored or human-verified. Require citations with every answer that can be clicked back to a canonical document.
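“Admit sources by policy, not convenience” becomes enforceable when every document passes a gate before it enters the retrieval corpus. A minimal sketch follows; the tiers and field names are assumptions for illustration, not a standard taxonomy.

```python
from dataclasses import dataclass

# Illustrative source-admission gate: a document enters the retrieval corpus
# only if its provenance tier is on the allowlist and a human has verified it.
ALLOWED_TIERS = {"primary", "human_secondary"}  # e.g. opinions/statutes, vetted human commentary

@dataclass
class SourceRecord:
    doc_id: str
    tier: str             # "primary", "human_secondary", "ai_assisted", or "unknown"
    human_verified: bool  # has a person confirmed authorship and accuracy?
    canonical_url: str    # where a citation in an answer should click back to

def admit(record: SourceRecord) -> bool:
    """Apply the firm's data-diet policy: allowlisted tier plus human verification."""
    return record.tier in ALLOWED_TIERS and record.human_verified

if __name__ == "__main__":
    candidates = [
        SourceRecord("opinion_123", "primary", True, "https://example.court/opinion_123"),
        SourceRecord("blog_post_9", "ai_assisted", False, "https://example.blog/post-9"),
    ]
    corpus = [r for r in candidates if admit(r)]
    print([r.doc_id for r in corpus])  # only the vetted primary source survives
```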
Ban silent continual tuning. Contractually prohibit vendors from training on your prompts or outputs without explicit permission. If a vendor offers fine-tuning, demand a separate, auditable dataset with provenance documentation showing the content is original human work, not AI-generated material. Include a rollback plan when metrics slip. ABA 512 and NYC Bar guidance both support this level of supervision and documentation.
Add cognitive health checks. Borrow the researchers’ approach and run regular evaluation suites on your most common tasks. Test citation accuracy, adverse-authority recall, long-context summarization, and privilege spotting. Track scores over time. Investigate drops before they reach clients. Degradation from self-cannibalization happens gradually, making longitudinal monitoring essential.
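A cognitive health check can be as simple as a fixed task set scored the same way on a schedule, with results appended to a log so drift becomes visible over time. In the sketch below, the tasks, the scoring rule, and the alert threshold are hypothetical placeholders for whatever your firm actually tests.

```python
import json
from datetime import date
from pathlib import Path

# Hypothetical fixed evaluation set: (prompt, phrase or authority the answer must contain).
EVAL_SET = [
    ("Summarize the holding of Smith v. Jones (2021).", "limitation of liability"),
    ("List authority adverse to enforcing a non-compete in this jurisdiction.", "adverse"),
]

ALERT_THRESHOLD = 0.9  # investigate before results reach clients if the score dips below this

def run_model(prompt: str) -> str:
    """Placeholder for the production legal AI tool under test."""
    return "<model answer>"

def score(answer: str, required: str) -> bool:
    """Crude check that the answer contains the required authority or phrase."""
    return required.lower() in answer.lower()

def health_check(log_path: Path = Path("eval_log.jsonl")) -> float:
    results = [score(run_model(prompt), required) for prompt, required in EVAL_SET]
    accuracy = sum(results) / len(results)
    with log_path.open("a") as f:
        f.write(json.dumps({"date": date.today().isoformat(), "accuracy": accuracy}) + "\n")
    if accuracy < ALERT_THRESHOLD:
        print(f"WARNING: accuracy {accuracy:.0%} below threshold; investigate before client use")
    return accuracy

if __name__ == "__main__":
    health_check()
```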
Prefer retrieval over re-training. If you need domain adaptation, start with retrieval prompts and structured instructions. Reach for fine-tuning only when you have a clean, labeled, rights-cleared dataset with documented provenance showing it is free of AI-generated contamination. Include regression tests to prove you did not trade depth for speed.
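The regression tests mentioned here can be ordinary unit tests that pin down behavior you refuse to lose after a model swap or fine-tune. The legal_ai_answer() wrapper and the specific assertions below are assumptions, not a real API.

```python
# Regression checks run before and after any model change or fine-tune.
# legal_ai_answer() is a hypothetical wrapper around your production tool.

def legal_ai_answer(prompt: str) -> str:
    return "<answer with citations such as [opinion_123]>"

def test_answers_cite_sources():
    answer = legal_ai_answer("Is the indemnity clause in document A enforceable?")
    assert "[" in answer and "]" in answer, "every answer must cite a canonical source"

def test_no_overclaiming():
    answer = legal_ai_answer("Will we win this motion?")
    assert "guaranteed" not in answer.lower(), "answers must not overstate certainty"

if __name__ == "__main__":
    test_answers_cite_sources()
    test_no_overclaiming()
    print("regression checks passed")
```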
Penalize thought-skipping. Enforce structured outputs that surface issue lists, elements, and authorities, not just bottom lines. Internally, require human attorneys to review those structures before any filing leaves the building. This aligns with competence and supervision duties in ABA 512.
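Structured output is easiest to enforce with an explicit schema that makes an empty adverse-authority field impossible to miss. The fields below are one assumed example of what a firm might require, not an industry format.

```python
from dataclasses import dataclass

# Assumed output structure that makes thought-skipping visible: a reviewer can
# see at a glance whether issues, elements, and adverse authority were addressed.
@dataclass
class IssueAnalysis:
    issue: str
    elements: list[str]              # each element of the test, addressed one by one
    supporting_authority: list[str]  # citations for the client's position
    adverse_authority: list[str]     # must be populated or expressly marked "none found"
    conclusion: str

def ready_for_review(analysis: IssueAnalysis) -> bool:
    """Reject bottom-line-only output before it reaches a filing."""
    return bool(analysis.elements) and bool(analysis.adverse_authority) and bool(analysis.conclusion)

if __name__ == "__main__":
    draft = IssueAnalysis(
        issue="Enforceability of the non-compete",
        elements=["reasonable scope", "legitimate business interest", "consideration"],
        supporting_authority=["Smith v. Jones (2021)"],
        adverse_authority=[],  # empty: the structure itself flags the gap
        conclusion="Likely enforceable",
    )
    print("ready for attorney review:", ready_for_review(draft))  # False
```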
Separate advocacy modes. Maintain two controlled profiles: pro-client drafting and neutral verification. The second pass should explicitly search for adverse law and exceptions. Make that second pass mandatory.
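The two modes can live as two fixed system instructions, with the verification pass wired to run every time rather than left to a drafter’s discretion. The wording of the profiles and the generate() stub are assumptions.

```python
# Two controlled profiles, kept as fixed configuration rather than ad hoc prompts.
PROFILES = {
    "pro_client_drafting": "Draft persuasively for the client, citing only supplied sources.",
    "neutral_verification": (
        "Ignore advocacy. Search the supplied sources for adverse authority, "
        "exceptions, and limiting language, and list each one found."
    ),
}

def generate(system_instruction: str, task: str) -> str:
    """Placeholder for a model call made in a no-training configuration."""
    return f"<output under profile: {system_instruction[:40]}...>"

def run_matter_task(task: str) -> dict[str, str]:
    """The neutral verification pass runs on every task; it is mandatory, not optional."""
    draft = generate(PROFILES["pro_client_drafting"], task)
    check = generate(PROFILES["neutral_verification"], task)
    return {"draft": draft, "adverse_check": check}

if __name__ == "__main__":
    print(run_matter_task("Oppose the motion to compel arbitration."))
```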
Track provenance and recency. Stamp each answer with the sources consulted, their dates, and ideally their authorship provenance. Distinguish between primary legal sources, human-authored secondary sources, and AI-assisted or AI-generated content. Stale authority causes more damage than most machine learning quirks, and provenance transparency disciplines everyone.
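A provenance stamp can be a small record attached to each answer listing every source consulted, its date, and its authorship class. The classes below mirror the distinctions drawn in this section and are illustrative only.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date

# Illustrative provenance stamp attached to every AI-assisted answer.
@dataclass
class ConsultedSource:
    citation: str
    source_date: str  # ISO date of the authority itself, to catch staleness
    provenance: str   # "primary", "human_secondary", or "ai_assisted"

@dataclass
class AnswerStamp:
    matter_id: str
    generated_on: str
    model_version: str
    sources: list[ConsultedSource]

if __name__ == "__main__":
    stamp = AnswerStamp(
        matter_id="2025-0142",
        generated_on=date.today().isoformat(),
        model_version="vendor-model-2025-06",  # assumed identifier
        sources=[ConsultedSource("Smith v. Jones, 14 F.4th 100 (2021)", "2021-09-01", "primary")],
    )
    print(json.dumps(asdict(stamp), indent=2))
```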
Log everything, disclose smartly. Keep prompt, source, and version logs for every matter. They are invaluable when a result is questioned. NYC Bar’s opinion contemplates robust supervision and client communication. Good logs make both easier.
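Matter-level logging needs no special tooling; an append-only record of prompt, sources, model version, and reviewer is enough to reconstruct how an output was produced when someone asks. The field names in this sketch are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Append-only, per-matter log of every AI-assisted step: what was asked,
# which sources were consulted, which model version answered, who reviewed it.
def log_ai_use(matter_id: str, prompt: str, sources: list[str],
               model_version: str, reviewer: str,
               log_dir: Path = Path("ai_logs")) -> None:
    log_dir.mkdir(exist_ok=True)
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "sources": sources,
        "model_version": model_version,
        "reviewed_by": reviewer,
    }
    with (log_dir / f"{matter_id}.jsonl").open("a") as f:
        f.write(json.dumps(entry) + "\n")

if __name__ == "__main__":
    log_ai_use("2025-0142", "Summarize deposition of J. Doe", ["depo_doe_vol1.pdf"],
               "vendor-model-2025-06", "A. Attorney")
```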
Align with high-risk expectations early. If you serve EU clients or courts, baseline your stack against the AI Act’s human oversight and transparency requirements now. It is easier to meet those standards in design than to retrofit later.
Put it in the engagement letter. Explain which tasks may be AI-assisted, how you supervise, and how efficiency gains flow to the client. ABA 512 and state guidance emphasize informed consent and realistic billing tied to actual time and review.
When It Really Is a Non-Issue
If your workflow looks like this, brain rot and self-cannibalization are mostly irrelevant. You use a well-understood base model hosted in a non-training configuration. All answers are grounded in a curated internal corpus or licensed databases containing verified primary sources. No continual fine-tuning happens without your explicit, audited datasets free of AI-generated content. Every deliverable passes human review guided by structured checklists aligned to professional rules.
You are not feeding your tools junk or second-generation AI content, so you are not going to get degradation drift. Your risks are the classic ones already flagged by bars and courts. Those are better handled by verification and supervision than by worrying about the state of Twitter.
Where the Law and Policy Are Heading
Ethics bodies are converging on core duties: competence with the technology, confidentiality, informed client consent, supervision of outputs, and candor to tribunals. Expect more courts and clients to ask for disclosure of AI use, and more regulators to treat some legal AI as high-risk, with explainability and logging obligations attached.
The EU AI Act represents the most comprehensive framework to date. High-risk AI systems must comply with strict requirements before they can be put on the market, including detailed documentation, risk management systems running throughout the entire lifecycle, data governance ensuring training datasets are relevant and representative, and design for human oversight. Providers and deployers face registration obligations, quality management requirements, and incident reporting duties.
The Bottom Line
Brain rot is real in the context the researchers studied. The self-cannibalization dynamic makes it worse. AI training on AI-generated content creates a degradation spiral that compounds with each iteration. This is a useful warning about what happens when data curation fails.
Legal practice sits in a different context. When firms control the data diet, actively filter out AI-generated content from training sources, avoid silent re-training, and verify outputs, they are not at the mercy of a decaying open web or a self-consuming AI ecosystem.
If you treat model quality and data provenance like you treat privilege, conflicts, and work product, your AI will do just fine. If you do not, the rot is not in the model. It is in the workflow.
My Take
Ever since I started using AI a few years ago, I suspected brain rot was inevitable even if I didn’t have the term for it yet. It’s not rocket science. AI trains, in part, on the internet. AI then creates content. That content gets published online because, let’s face it, it’s faster and cheaper than doing it the old-fashioned way. When the smoke clears, AI is essentially feeding on itself. It’s a race to the bottom.
Fortunately, lawyers can avoid that fate by controlling their AI’s diet. That’s the difference between systems that stay sharp and those that slowly dull with every iteration. It takes more effort and more budget, but it’s an act of self-preservation and client protection. The firms that treat AI hygiene as seriously as privilege or conflicts will come out ahead. The ones that don’t will eventually discover that the rot wasn’t in the model. It was in the workflow.
Sources:
Xing, S., Hong, J., Wang, Y., Chen, R., Zhang, Z., Grama, A., Tu, Z., & Wang, Z. (2025). LLMs Can Get “Brain Rot”: Continual Pretraining on Junk Web Text Induces Cognitive Decay in Large Language Models. arXiv preprint.
Okemwa, K. (2025, October 22). Reddit co-founder says much of the internet is dead — and Sam Altman agrees. Windows Central.
Dubey, A., & Garg, S. (2024). Model Autophagy Disorder: Synthetic Data Can Make Large Language Models Forget. arXiv.
Mint Staff. (2024, May 9). Don’t trust, verify, advises Jack Dorsey: In 5–10 years, we won’t know what information is real or not. Mint.
