Why Generative AI Hallucinates and What Lawyers Can Do About It
Courts are losing patience with fabricated citations. Understanding why AI invents legal facts, and how to stop it, has become essential to modern practice.
The Mathematics Behind AI Hallucinations
Hallucination in artificial intelligence refers to the confident generation of false or fabricated information. In the legal context, that can mean citing a case that does not exist or misquoting a statute that never said what the AI claims it did. The cause is not mystery or malice. It is mathematics. Large language models predict the next word based on probability rather than verified fact. They are trained on patterns in vast data sets, not on a database of truth.
When a model faces a gap in knowledge, it fills it with something statistically plausible. The fundamental problem is that AI systems cannot distinguish between “this pattern appeared in my training data” and “this is legally accurate.” If the training data contained errors, conflicting information, or insufficient legal material, the model may fabricate a new precedent that sounds legitimate. This happens most often when the AI operates without an external reference system such as retrieval-augmented generation (RAG), which grounds answers in verified sources. Even then, the quality of the retrieved data matters. A retrieval system tied to outdated or misclassified law can still generate fiction in formal language.
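To make the mechanic concrete, here is a deliberately tiny sketch in Python. The handful of citation-like strings below are invented for the example, and real models learn from billions of tokens rather than a bigram table, but the core behavior is the same: the system extends a prompt with whatever continuation is statistically most likely, and the result can be a case name that never appeared anywhere in its training data.

```python
from collections import Counter, defaultdict

# Invented miniature "training corpus" of citation-like phrases.
# A real model sees billions of tokens, but the principle is identical:
# it only learns which words tend to follow which.
corpus = [
    "mata v avianca 2nd cir 2019",
    "mata v avianca 2nd cir 2023",
    "smith v jones 9th cir 1998",
    "smith v jones 9th cir 2004",
    "doe v avianca 9th cir 2005",
]

# Count which word follows each word (a simple bigram model).
next_words = defaultdict(Counter)
for line in corpus:
    tokens = line.split()
    for current, nxt in zip(tokens, tokens[1:]):
        next_words[current][nxt] += 1

def most_probable_continuation(prompt: str, steps: int = 4) -> str:
    """Greedily extend the prompt with the statistically likeliest next word."""
    tokens = prompt.split()
    for _ in range(steps):
        candidates = next_words.get(tokens[-1])
        if not candidates:
            break
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

# Probability, not truth, drives the output: the continuation looks like a
# plausible citation whether or not any such case exists.
print(most_probable_continuation("smith"))
```

Run on the prompt "smith", the sketch prints "smith v avianca 2nd cir", a plausible-looking mashup that appears nowhere in the toy corpus. That, in miniature, is a hallucinated citation.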
From Software Bug to Professional Crisis
In law, a hallucinated citation is not just a software bug. It is a potential breach of duty. A lawyer who files a document containing fake authorities can face sanctions, malpractice exposure, and reputational harm. Courts have already begun responding. In 2023, a New York attorney was sanctioned for citing six nonexistent cases in Mata v. Avianca. A federal judge fined the lawyer and ordered mandatory training on AI use. Similar episodes have since surfaced in Florida, Texas, and the United Kingdom.
The problem isn’t just that fake citations exist—it’s that they’re presented with unwavering confidence. Unlike human researchers who signal uncertainty (“I believe…” or “It appears…”), AI systems generate fabricated cases with the same authoritative tone as genuine precedents, complete with realistic citation formats, plausible judge names, and legally coherent reasoning. This linguistic fluency conceals factual uncertainty, making fabricated content difficult for even experienced lawyers to detect.
This week, The Guardian reported that a British barrister relied on an AI tool to prepare for a hearing and cited multiple fictitious cases. The judge condemned the lack of verification and noted that “plausibility is not authenticity.” These incidents reveal a pattern documented across multiple jurisdictions. Similar cases have emerged in Canada, where courts sanctioned lawyers for citing non-existent authorities, and in Australia and New Zealand, where legal regulators have issued urgent guidance in response to growing concerns.
The consequences are escalating. In July 2025, a federal court in Alabama took the unprecedented step of disqualifying attorneys from representing their client for the remainder of a case after discovering AI-generated hallucinations in a motion. The court declared that monetary sanctions alone were proving ineffective as a deterrent and directed its opinion to be published and sent to state bar regulators where the responsible attorneys were licensed. This marked a significant shift from fines to career-threatening sanctions.
What the Data Shows
Academic studies confirm that hallucination persists even in domain-specific systems. A 2024 Stanford paper titled “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools” found that specialized legal research tools still hallucinated in 17 to 33 percent of benchmark queries. The study evaluated LexisNexis’s Lexis+ AI and Thomson Reuters’s Westlaw AI-Assisted Research, both of which use retrieval-augmented generation to reduce errors.
The findings sparked debate. Thomson Reuters contested the methodology, asserting internal testing showed 90 percent accuracy and noting that AI should be used “as an accelerant, rather than a replacement” for attorney research. LexisNexis acknowledged that “no Gen AI tool today can deliver 100% accuracy” but emphasized that its platform is designed “to enhance the work of an attorney, not replace it.” The vendors’ responses highlight a key tension: marketing claims about AI reliability versus empirical evidence of persistent errors.
Another study, “Profiling Legal Hallucinations in Large Language Models”, mapped hundreds of fabricated authorities generated by advanced systems when asked to summarize appellate decisions. The authors concluded that hallucination “remains a structural feature of generative language models, not a solvable glitch.”
These findings align with computer science research showing that models produce errors when prompted beyond their training scope. Because they generate text based on patterns rather than reasoning, they are prone to creating legally plausible fiction whenever the context is ambiguous.
Reducing Risk: Mitigation Strategies
While hallucinations cannot be eliminated entirely, they can be reduced through layered safeguards. The most effective include retrieval-augmented generation, post-generation validation, and human-in-the-loop oversight. RAG systems improve reliability by grounding responses in verified legal texts. Post-generation validation uses a secondary module or database query to confirm that cited authorities exist and match the quoted language. Human review remains essential: no AI output should reach a client or court without verification by a licensed attorney.
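As a rough illustration of the post-generation validation step, the Python sketch below pulls citation-like strings out of a draft and flags any that fail a lookup. The regex, the case names, and the hard-coded “verified” set are all simplifications invented for this example; a production pipeline would query a citator or the firm’s research platform and use a far more robust citation parser.

```python
import re

# Stand-in for an authoritative lookup. In practice this would be a query
# against a citator or the firm's research platform, not a hard-coded set.
# The entry below is invented for illustration.
VERIFIED_CITATIONS = {
    "Smith v. Jones, 123 F.3d 456 (2d Cir. 1998)",
}

# Loose pattern for "Party v. Party, <volume> <reporter> <page> (<court> <year>)".
# Real citation formats vary far more than this; the point is the workflow.
CITATION_RE = re.compile(
    r"(?:[A-Z][A-Za-z.&'\-]+ ){1,3}v\. (?:[A-Z][A-Za-z.&'\-]+,? ){1,3}"
    r"\d+ [A-Z][A-Za-z0-9. ]*\d+ \(.*?\d{4}\)"
)

def flag_unverified_citations(draft: str) -> list[str]:
    """Return citation-like strings in the draft that fail the lookup."""
    return [c for c in CITATION_RE.findall(draft)
            if c not in VERIFIED_CITATIONS]

# Both citations below are invented; the second stands in for a fabricated one.
draft = (
    "Plaintiff relies on Smith v. Jones, 123 F.3d 456 (2d Cir. 1998), "
    "and on Baker v. Acme Airways, 999 F.3d 1234 (11th Cir. 2019)."
)

for citation in flag_unverified_citations(draft):
    print("NEEDS HUMAN VERIFICATION:", citation)
```

Even this crude version captures the workflow that matters: nothing the model cites is trusted until it has been checked against an independent source, and anything that fails the check is routed to a human.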
Law firms and corporate legal departments can also mitigate risk through structured prompts and restricted domains. Limiting AI systems to specific, well-curated corpora minimizes speculative output. Clear instructions such as “cite only from these documents” or “verify each citation against the official database” help anchor the model. Even simple red-flag checks—like automatically verifying unfamiliar citations—can prevent embarrassment before filing.
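A minimal sketch of the “cite only from these documents” pattern follows. The `ask_model` function and the excerpt labels are placeholders invented for illustration, since every vendor exposes a different interface; the substance is the constrained instruction and the curated excerpts that travel with every question.

```python
# Placeholder for whichever generation API the firm's tool exposes.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("wire this to the vendor's API")

# A small, curated set of excerpts the model is allowed to rely on.
# Labels and contents are invented for illustration.
excerpts = {
    "Doc A (engagement letter, sec. 4)": "Fees are payable within 30 days of invoice.",
    "Doc B (retention policy, sec. 2)": "Client files are retained for seven years after the matter closes.",
}

def build_restricted_prompt(question: str) -> str:
    """Anchor the model to the curated excerpts and forbid outside citations."""
    sources = "\n\n".join(f"[{label}]\n{text}" for label, text in excerpts.items())
    return (
        "Answer using ONLY the excerpts below and cite each statement to its "
        "bracketed label. If the excerpts do not answer the question, say so "
        "rather than guessing.\n\n"
        f"{sources}\n\nQuestion: {question}"
    )

print(build_restricted_prompt("When are fees due under the engagement letter?"))
# answer = ask_model(...)  # any answer still requires review by a licensed attorney
```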
When properly supervised and verified, AI tools offer genuine efficiency gains in legal research, document review, and drafting. The technology can accelerate routine tasks and allow attorneys to focus on higher-level analysis and strategy. Economic pressure to capture these efficiency gains is driving rapid adoption, sometimes before firms fully understand the risks. The key is maintaining appropriate oversight: AI should enhance human judgment, not replace it.
Ethical and Regulatory Dimensions
Legal ethics rules make the issue unavoidable. The American Bar Association’s Formal Opinion 512, issued in July 2024, requires lawyers to maintain technological competence and verify AI-generated work. The opinion states that lawyers have a duty to understand AI tools’ “benefits and risks,” including their “propensity to hallucinate.” Submitting hallucinated authorities violates the duty of candor to the tribunal. The Model Rules of Professional Conduct impose personal responsibility: lawyers cannot delegate factual accuracy to a machine.
The California State Bar’s Practical Guidance, issued in November 2023, provides the most comprehensive state-level framework. It emphasizes that AI-generated outputs must be “critically analyzed for accuracy and bias” and warns that “a lawyer’s professional judgment cannot be delegated to generative AI.” The guidance explicitly addresses billing practices, stating lawyers may charge for time spent reviewing AI output but “must not charge” for time saved by using AI. Multiple state bars, including Florida and the District of Columbia, have since issued similar guidance reinforcing that lawyers cannot delegate judgment to AI systems and must verify all outputs.
Policy debates now focus on regulation. A 2025 working paper, “The Accuracy Paradox in Large Language Models: Regulating Hallucination Risks in Generative AI,” argues that accuracy alone is not enough. It proposes governance frameworks emphasizing transparency, auditability, and context-aware risk management. The authors suggest that providers of legal AI tools may need to meet higher disclosure standards when outputs could influence judicial proceedings.
Best Practices for Lawyers
- Use AI only as an assistant, not an authority.
- Verify every citation and quotation against the original source.
- Adopt firm-wide policies requiring human review of all AI-generated text before submission.
- Log AI interactions for accountability and quality assurance (a minimal logging sketch follows this list).
- Disclose AI use to clients when relevant to representation.
- Train staff to recognize red-flag patterns such as overly confident or oddly formatted citations.
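On the logging point above, here is a minimal sketch of what an audit record might capture, assuming a simple JSON-lines file for illustration. Most firms would route this through their document or matter management system instead; the essential fields are who asked what, which tool answered, and whether a licensed attorney has verified the output.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_FILE = Path("ai_interactions.jsonl")  # illustrative location

def log_ai_interaction(user: str, tool: str, prompt: str,
                       output: str, verified_by: str | None = None) -> None:
    """Append one AI interaction to a JSON-lines audit log."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "tool": tool,
        "prompt": prompt,
        "output": output,
        "verified_by": verified_by,  # stays None until an attorney signs off
    }
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example entry (all values invented):
log_ai_interaction(
    user="associate@example.com",
    tool="research-assistant-v1",
    prompt="Summarize the limitations period for breach of contract in New York.",
    output="(model output here)",
    verified_by=None,
)
```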
Many firms now require attorneys to append a certification that “no portion of this filing was generated by AI without human verification.” This is rapidly becoming more than an industry norm—it’s becoming a judicial requirement. Federal courts are taking the lead: Judge Brantley Starr in the Northern District of Texas issued one of the first standing orders in May 2023 requiring attorneys to certify that any AI-generated content has been verified for accuracy.
Similar orders have followed in the Southern District of New York, the District of Delaware, and other jurisdictions nationwide. These orders typically require lawyers to either certify that no AI was used or, if AI was used, confirm that all citations and legal authorities have been verified by a human attorney. The trend underscores that judicial oversight of AI use is no longer optional—it’s mandatory.
Malpractice insurers are paying attention: while most carriers haven’t yet made AI competence a formal underwriting requirement, several now include questions about AI use in applications and offer risk management guidance. Some provide premium discounts for firms with documented AI training protocols and verification procedures.
No Easy Fix: Why Verification Remains Essential
AI hallucinations are not a temporary inconvenience but a fundamental limitation of probabilistic language models. Eliminating them entirely would require a new architecture grounded in verified reasoning rather than predictive text. While researchers are exploring solutions such as chain-of-thought reasoning, constitutional AI, and enhanced retrieval systems, none have proven capable of eliminating the problem completely. Some evidence suggests advanced reasoning models may actually hallucinate more frequently than their predecessors, though the causes remain unclear.
Until then, responsible use depends on human judgment, verified data, and transparent disclosure. The lesson for the legal profession is straightforward: artificial intelligence can accelerate research and drafting, but trust without verification is malpractice waiting to happen.
My Take
The real bottleneck in legal AI is not capability but oversight. The models already draft, summarize, and analyze with astonishing fluency. What lags behind is the human system that governs them. Firms that treat oversight as a design problem rather than a bureaucratic chore will lead the next phase of adoption. The winners will be those that build hybrid pipelines with AI generating, AI verifying, and humans supervising both. In law, efficiency no longer means replacing people with machines; it means building better systems around them.
Sources
- American Bar Association – Formal Opinion 512: Generative Artificial Intelligence Tools (July 2024)
- Artificial Lawyer – Thomson Reuters Contradicts Stanford GenAI Study: ‘We Are 90% Accurate’
- Baker Botts – Trust but Verify: Avoiding the Perils of AI Hallucinations in Court
- Bryan Cave Leighton Paisner – Fake Legal Authorities: AI Hallucination or Professional Negligence? (Canadian Case Ko v. Li)
- California State Bar – Practical Guidance for the Use of Generative AI in the Practice of Law (November 2023)
- Cornell University – The Accuracy Paradox in Large Language Models: Regulating Hallucination Risks in Generative AI
- Cornell University – Why Language Models Hallucinate
- D.C. Bar – Ethics Opinion 388
- Esquire Deposition Solutions – Johnson v. Dunn: Federal Court Escalates Sanctions for AI Misuse (N.D. Ala. July 2025)
- Florida Bar – Ethics Opinion 24-1
- IBM Think – What Are AI Hallucinations?
- IBM Think – Smarter Memory Could Help AI Stop Hallucinating
- IJLET Journal – Hallucinations in Legal Practice: A Comparative Case Law Analysis
- Journal of Legal Analysis – Profiling Legal Hallucinations in Large Language Models
- Justia – Mata v. Avianca, Inc., U.S. District Court Southern District of New York (2023)
- Reuters – AI Hallucinations in Court Papers Spell Trouble for Lawyers
- Ropes & Gray – Judges Guide Attorneys on AI Pitfalls With Standing Orders
- Stanford University – Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools
- The Guardian – Barrister Found to Have Used AI to Prepare for Hearing After Citing Fictitious Cases
- University of New South Wales – AI is Creating Fake Legal Cases With Disastrous Results (Australia & International)
- University of Oxford – Research on Hallucinating Generative Models and AI Reliability
Disclosure: This article was prepared for educational and informational purposes only. It does not constitute legal advice and should not be relied upon as such. All cases, sanctions, and sources cited are publicly available through court filings and reputable media outlets. Readers should consult professional counsel for specific legal or compliance questions related to AI use.