Cheating on college essays and papers is nothing new. Plagiarizing from articles or books dates back hundreds of years. As Wikipedia grew in popularity and quality, cheaters turned to it. However, Large Language Models (LLMs) like ChatGPT, Grok, Gemini, and others have quickly become the favored tool for cheating. LLM-generated content eludes plagiarism detectors because the output is not identical to the material it has ingested. With a bit of “prompt engineering”, you can make an endless stream of “new” content on the same subject.
Many educators have prematurely given up, believing that AI-generated content can't be detected. This fatalistic view is often accompanied by the statement, "We need to teach our kids to be AI proficient or they will be left behind." Those of us familiar with AI understand that it doesn't take long to be "proficient." I recently spoke to a university professor who shares the above view, and he teaches writing courses!
More troubling is research from MIT indicating that the use of AI can harm our brains. One doesn't "strain" their brain when the LLM does all of the work. It's like trying to become stronger by watching others in the gym. Like muscles, our brains need to be exercised. We learn to read by reading. We learn math by solving math problems. And, we learn to write by writing.
This AI controversy reminds me of the advent of calculators (yes, I was around back then). Why bother learning math when a calculator can do it all? Schools eventually settled on the idea that once a student masters a mathematical concept, they can use calculators to solve more complex problems more quickly. I suspect writing will evolve in a similar way. We still need to teach students how to write, but we will allow AI for ideation, editing, or content generation where it makes sense and increases productivity, freeing students to tackle more complex problems.
So, can AI-generated writing be detected? Absolutely, but this will be an “arms race” between the LLMs, “humanizers”, and AI detectors. AI detectors for written content (often called AI text classifiers or AI-generated content detectors) work by analyzing patterns that are statistically more likely to appear in AI-generated language than in human-written text. Humanizers take AI-generated content and replace statistically overused phrases with semantically similar but less frequently used ones, then run the rewritten text through AI detectors until it “passes” detection. The problem with this rewriting is that it tends to produce less intelligible, less human-like output, which the best AI detectors, like Pangram Labs, can still detect.
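To make that concrete, here is a minimal sketch of one classic detection signal: perplexity, a measure of how predictable a passage is to a language model. AI-generated text tends to be more predictable (lower perplexity) than human writing. This illustrates the general idea only, not Pangram's method; the model choice (gpt2) and the threshold value are assumptions for demonstration.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load a small causal language model to score how predictable text is.
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under the model.
    Lower perplexity means the text is more predictable to the model,
    a weak statistical signal often associated with AI-generated prose."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # cross-entropy loss over all predicted tokens.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

THRESHOLD = 25.0  # illustrative cutoff; a real detector tunes this on labeled data

sample = "The quick brown fox jumps over the lazy dog."
score = perplexity(sample)
print(f"perplexity={score:.1f} -> {'AI-like' if score < THRESHOLD else 'human-like'}")
```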
Many AI detectors simply use LLMs to detect LLM-generated text, an approach that worked well enough in the early days of LLMs. It can produce many false negatives, where AI-generated text goes undetected, but also too many false positives, where you falsely accuse a student of cheating when they did not. Ideally, you want to catch as many violations as possible while making as few mistakes as possible.
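That trade-off comes down to where the detection threshold is set. The sketch below, using hypothetical perplexity scores and ground-truth labels, shows how a strict threshold misses AI text while a loose one accuses innocent writers.

```python
def error_rates(scores, labels, threshold):
    """False-positive and false-negative rates for a detector that
    flags text as AI-generated when score < threshold.
    labels: 1 = actually AI-generated, 0 = human-written."""
    flagged = [s < threshold for s in scores]
    fp = sum(f and l == 0 for f, l in zip(flagged, labels))      # humans wrongly accused
    fn = sum(not f and l == 1 for f, l in zip(flagged, labels))  # AI text missed
    return fp / labels.count(0), fn / labels.count(1)

# Hypothetical perplexity scores and ground-truth labels for eight essays.
scores = [12.0, 14.5, 18.0, 22.0, 31.0, 35.5, 41.0, 48.0]
labels = [1,    1,    1,    0,    1,    0,    0,    0]

for t in (15.0, 25.0, 40.0):
    fpr, fnr = error_rates(scores, labels, t)
    print(f"threshold={t:>5}: false-positive rate={fpr:.2f}, false-negative rate={fnr:.2f}")
```

With these made-up numbers, the strictest threshold never accuses a human but misses half the AI essays, while the loosest catches every AI essay but flags half the human ones; real detectors must pick an operating point between those extremes.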
Many of Pangram’s competitors produce AI detectors alongside humanizer products. They sell tools to help students cheat while promising teachers they can detect cheating! This “business model” is like selling weapons to both sides in an escalating war. Unfortunately, there is likely more money on the humanizer side than on the AI-detection side, so the humanizers appear to have the advantage. Pangram Labs’ unique approach was to build its own specialized AI models to detect AI-generated text. These models are constantly updated as new LLMs and humanizer products are released.
Pangram has recently released plagiarism detection in beta as part of its platform. Not only is cheating bad for your brain, but it’s going to be bad for your reputation as well.