Researchers just published something that every small business owner using AI to edit documents should read carefully.
A team of scientists built a test called DELEGATE-52. They gave 19 AI models - including the top models from OpenAI, Google, and Anthropic - long, complex document editing tasks across 52 professional fields: contracts, code, music notation, scientific writing, and more. They wanted to see what happened when you asked an AI to handle a document through a long back-and-forth workflow, the same way you might if you were using it as a writing assistant for a client proposal.
The result: even the best AI models silently corrupted an average of 25% of document content by the end of a long workflow.
Not missing a step. Not giving a wrong answer. Quietly changing the document itself in ways that were hard to catch on a quick skim.
What "Corruption" Actually Means Here
This isn't the AI hallucinating a fake court case or inventing a statistic. The corruption the researchers identified is different - and in some ways more dangerous, because it's subtler.
Think of it like a contractor who renovates your kitchen and, while they're installing the new cabinets, accidentally disconnects a gas line that wasn't part of the job. They finish the work, everything looks fine, and you don't notice until something goes wrong.
In this study, the AI would complete the requested edit - and while doing so, change something else in the document that it wasn't supposed to touch. A number in a table. A clause in a contract. A footnote reference. Errors that compound over a long editing session.
The more back-and-forth with the document, the worse it got. Larger documents degraded faster. And here's the part that might surprise you: adding agentic AI tools - where the AI can take actions, not just respond - made things worse, not better.
What Kind of Business Gets Hit by This?
If you're using AI only for drafting - asking it to write something from scratch and then cleaning it up yourself - this is less of a concern. You're reviewing everything anyway.
The risk shows up when you're using AI in a more autonomous way:
- Asking an AI assistant to "update this contract with the new payment terms"
- Having AI revise and reformat a long proposal while keeping the original numbers
- Using an AI to clean up client-facing documents over multiple editing sessions
- Letting AI summarize and rewrite earlier sections while you add new ones
If any of that sounds like how you work, the DELEGATE-52 findings are directly relevant to you.
The Practical Response
This doesn't mean stop using AI for documents. It means build a small amount of verification into the workflow.
Read it through before it goes out. Think of AI like a brilliant intern who sometimes fixes one thing and accidentally breaks another. You wouldn't send an intern's work to a client without a read-through. Same principle.
Spot-check your numbers. If a document has dollar amounts, dates, or specific metrics, those are the highest-risk elements. Run a quick check on those specifically after any AI editing session.
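If you're comfortable with a little scripting, this kind of spot-check can even be automated. Here's a minimal sketch using only Python's standard library - the patterns and function names are illustrative, not from the study, and the regexes only cover simple dollar amounts and slash-style dates:

```python
import re

# Patterns for the highest-risk values: dollar amounts and simple dates.
PATTERNS = [
    r"\$[\d,]+(?:\.\d{2})?",          # e.g. $1,250.00
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",   # e.g. 3/15/2026
]

def extract_values(text):
    """Collect every dollar amount and date found in the text."""
    found = []
    for pattern in PATTERNS:
        found.extend(re.findall(pattern, text))
    return found

def check_numbers(before, after):
    """Return values that appear in one version but not the other."""
    old, new = extract_values(before), extract_values(after)
    return {
        "dropped": [v for v in old if v not in new],
        "added": [v for v in new if v not in old],
    }
```

Run `check_numbers(original_text, edited_text)` after an AI editing session: anything in "dropped" or "added" is a number the AI changed, which is exactly the kind of silent edit the study flagged.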
Use version control. Before handing a document to AI for editing, save a copy. That way you can compare before and after with a simple diff tool - many word processors and cloud platforms do this automatically.
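If your word processor doesn't offer a compare feature, the before-and-after check can be scripted too. A minimal sketch using Python's standard `difflib` module, assuming you've saved both versions as plain text (the function name is just illustrative):

```python
import difflib

def show_changes(before, after):
    """Return a unified diff: lines the edit removed (-) and added (+)."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
    )
    return "\n".join(diff)
```

Every line prefixed with "-" or "+" in the output changed during the editing session - including any lines the AI wasn't asked to touch.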
Keep sessions shorter. The research shows degradation gets worse over longer workflows. If you have a big document to revise, break the AI session into smaller chunks rather than one long conversation.
Treat long documents with extra caution. The bigger the file, the higher the risk, according to the data. A one-page proposal is lower risk than a 40-page contract.
The Bigger Picture
The researchers behind DELEGATE-52 put it plainly: current AI models are unreliable delegates. They're good assistants for short tasks and drafts. They struggle to maintain fidelity across long, complex document workflows.
That's not a knock on AI - it's useful information about where the technology actually is right now. Every tool has limits. Knowing this one means you can work around it instead of getting surprised by it.
The study tested 19 models, including the flagships from all three major AI providers. None of them got it right consistently. This is a systemic limitation, not a bug in one product.
Terry Blake covers AI tools and business technology for beginners and skeptics at The Useful Daily. Source: DELEGATE-52: LLMs Corrupt Your Documents When You Delegate, published April 2026 on arXiv by Philippe Laban and colleagues. The study tested 19 LLMs across 52 professional domains in long delegated workflows.