Featured blog: Academic Guides
31st Mar 2026 · Read time: 13 mins

Key Takeaways from Our Survey 

  • Over 37.8 million submissions were analyzed across academic formats 
  • Average submission length: ~672 words (mini-essay range dominates) 
  • Highly original content (0–5% similarity) accounts for the majority 
  • However, ~28% of submissions show moderate to very high similarity 
  • Long-form academic writing (4,000+ words) contributes disproportionately to total word volume 
  • Short answers dominate in count, but standard essays (600–1,200 words) drive the most content volume 
  • A significant number of submissions still fall into high-risk plagiarism categories (30%+ similarity) 
  • 63% of students consider fully AI-generated work cheating (Turnitin), showing strong awareness of academic integrity 

The AI Adoption Curve Has Been Staggering 

Before our data: a brief look at how fast things have moved. When ChatGPT launched in late 2022, AI writing was a curiosity. Three years later, it’s a fixture in classrooms across every continent. 

  • 2023: 13% of U.S. teens reported using ChatGPT for schoolwork. Turnitin flagged just 3% of submissions as heavily AI-generated when it launched its detector in April. 
  • 2024: ChatGPT schoolwork use among U.S. teens doubled to 26%. Detection-tool adoption in higher education jumped from 38% to 68% of institutions. Student AI use overall hit 66%. 
  • 2025: 92% of students in HEPI’s survey reported using AI in some form, up from 66% in just one year. 88% acknowledged using generative AI tools for assessments. Turnitin began detecting “AI bypasser” tools. 
  • 2026: Turnitin reports ~15% of all essays submitted have 80%+ AI content, a 5× increase from 2023. Pew Research finds 54% of teens now use AI for schoolwork. Our Quetext data enters here.

AI Adoption Curve

About This Study 

At Quetext, we surveyed users across a dataset of 37,851,702 submissions, comprising over 25.4 billion words, to understand how plagiarism trends are evolving across academic and professional writing. Rather than relying on assumptions or theoretical models, we wanted to understand real-world writing behavior as it’s happening today. 

The users we surveyed span a wide range of writing contexts, including students at various academic levels, professors and teachers managing classroom integrity, content creators producing professional work, and academic researchers working on complex, citation-heavy documents. This diversity makes the dataset one of the most comprehensive looks at originality and similarity in modern writing. 

The goal was simple: let the data speak. What we found both reassures and challenges the academic community, and underscores why originality tools are more critical than ever.


What does plagiarism data reveal in 2026?

A large-scale survey based on 37.8 million submissions shows that while most content remains highly original, nearly 1 in 3 submissions still contains noticeable plagiarism. The highest volume of writing comes from mid-length academic work (600–1,200 words), and longer documents show increased plagiarism risk due to research complexity. 

With millions of submissions analyzed, the data doesn’t just highlight isolated trends; it reveals consistent patterns in how modern writing is created and structured, and where, in some cases, it breaks down. 

Instead of seeing plagiarism as just original or copied, the data shows it exists on a spectrum influenced by factors like assignment length, research difficulty, and the use of AI tools. 

To better understand what’s really happening, we can break the findings down into three key questions:


What percentage of content is plagiarism-free?  

More than 50% of submissions fall into the highly original range (0–5% similarity). 

This suggests a growing awareness around originality, supported by increased use of plagiarism checkers, citation tools, and AI-assisted editing workflows. 

Where does plagiarism occur most?  

Plagiarism is most commonly found in: 

  • Research-heavy documents (2,000+ words)  
  • Repetitive academic assignments  
  • Poorly paraphrased or AI-assisted content without verification  

These patterns highlight that plagiarism risk is less about intent and more about process gaps in writing and research workflows. 

How Do Our Findings Compare to Industry Data? 

Despite growing concerns around AI-generated content, students themselves are not blindly embracing it as a shortcut. Research from Turnitin reveals a more nuanced reality: many students are actively questioning the ethical boundaries of AI in education. 

In fact, 63% of students believe using AI to write an entire assignment counts as cheating, a higher percentage than faculty (55%) and academic administrators (45%). This suggests that awareness around academic integrity is not only present among students, but in some cases stronger than within institutions themselves. Our own survey reflects this: more than 50% of submissions were highly original.  

Similarly, data from the International Center for Academic Integrity shows that a significant percentage of students admit to some form of academic dishonesty during their studies. 

What makes our dataset unique is its scale and real-world application. While industry reports often rely on surveys or institutional samples, this analysis reflects actual submission behavior across millions of documents, offering a more grounded view of how plagiarism manifests today. 

Submission Trends: Where the Volume Lies 

Short vs. Long Content Distribution

What the data says: 

| Format | Submission Count |
| --- | --- |
| Short answers (0–150 words) | 11M+ |
| Mini-essays (150–600 words) | 15.6M+ (highest volume) |
| Standard essays (600–1,200 words) | 7.6M+ |
| Long-form (4,000+ words) | Under 700K combined |

The data shows a clear relationship between submission type and similarity score. Long-form writing (e.g., full-length research papers) is a far more serious and difficult writing task, yet it accounts for far fewer submissions than short-form writing (e.g., short answers and mini-essays). As a result, high-volume short-form writing tends not to carry the same plagiarism risk as longer-form writing. 

The main reason for this discrepancy is likely that the source management required by long-form writing can be quite challenging. Although these longer formats are submitted by fewer individuals, they create larger academic challenges and thus require greater scrutiny to ensure they are properly cited. 

Submission count alone, then, is not a reliable predictor of similarity. The key difference between casual writing and serious academic writing is that casual writing happens regularly and generally draws on fewer sources, whereas serious academic writing is produced infrequently and, when it is, demands thorough research and careful source management. 

Word Volume Tells a Different Story

| Format | Total Words |
| --- | --- |
| Standard essays (600–1,200 words) | 6.6B words |
| Research papers (2,000–4,000 words) | 3.7B words |
| Long-form academic (4,000–10,000 words) | 3.2B words |

When looking at raw word volume rather than submission count, the picture shifts dramatically. Standard essays, despite being fewer in number than mini-essays, account for the largest share of total words written, around 6.6 billion. Research papers and long-form academic documents together contribute nearly 7 billion additional words, reinforcing the dominance of in-depth academic writing in overall content volume. 
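As a rough sanity check, the shares implied by the table can be recomputed directly. This is a minimal sketch using the rounded figures above and the dataset's 25.4-billion-word total; the variable names are illustrative, not any official API.

```python
# Back-of-envelope check of word-volume shares, using the rounded
# figures from the table and the dataset's 25.4B-word total.
volumes_b = {  # total words per format, in billions
    "Standard essays (600-1,200 words)": 6.6,
    "Research papers (2,000-4,000 words)": 3.7,
    "Long-form academic (4,000-10,000 words)": 3.2,
}
total_b = 25.4  # all words in the dataset, in billions

for fmt, words in volumes_b.items():
    print(f"{fmt}: {words / total_b:.0%} of all words")

# Research papers plus long-form academic writing together.
research_plus_longform = (volumes_b["Research papers (2,000-4,000 words)"]
                          + volumes_b["Long-form academic (4,000-10,000 words)"])
print(f"Research + long-form combined: {research_plus_longform:.1f}B words")
```

Standard essays alone account for roughly a quarter of all words in the dataset, and the two research-heavy formats combine to the "nearly 7 billion" figure cited above.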

Why this matters: 

  • Word volume directly correlates with plagiarism risk  
  • Longer documents rely more heavily on external sources  
  • Increased research depth raises the chances of:  
    • Unintentional overlap  
    • Improper paraphrasing  
    • Missing or incorrect citations  
  • High-value academic submissions (research papers, long-form writing) carry the greatest integrity risk 

Plagiarism Score Breakdown: The Reality Check 

Original vs. Plagiarized Content

| Similarity Range | Submission Count |
| --- | --- |
| 0–5% (Highly Original) | 19.1M |
| 5–15% (Mostly Original) | 8.2M |
| 15–30% (Moderate Similarity) | 4.6M |
| 30–50% (High Similarity) | 2.4M |
| 50%+ (Very High Similarity) | 3.7M |

What Does This Mean?

The Good News:

Roughly 72% of submissions surveyed were classified as mostly or highly original, a genuinely encouraging finding. It reflects growing awareness among writers about the importance of originality, and suggests that tools like Quetext, AI detection platforms, and citation generators are making a measurable difference. Writers today are more equipped than ever to check their work before submitting it. 

The Concern: 

Despite the positive trend, approximately 28% of submissions showed moderate to severe similarity. That translates to over 6 million submissions exceeding 30% similarity, a threshold that most academic institutions consider a serious red flag. This is not a marginal problem; it represents a substantial and ongoing challenge to academic integrity. 

The concern isn’t just the frequency of high-similarity submissions. It’s the scale. When you’re dealing with tens of millions of users and billions of words, even a fraction of problematic content represents an enormous volume of potentially plagiarized material in circulation. 
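The headline 72% and 6-million-plus figures follow directly from the band counts in the table above. A minimal sketch, using the rounded counts (the band labels here are shorthand, not an API):

```python
# Recomputing the headline figures from the similarity-band counts above.
bands = {
    "0-5% (Highly Original)": 19.1e6,
    "5-15% (Mostly Original)": 8.2e6,
    "15-30% (Moderate)": 4.6e6,
    "30-50% (High)": 2.4e6,
    "50%+ (Very High)": 3.7e6,
}
total = sum(bands.values())  # ~38.0M, close to the 37.85M dataset size

original = bands["0-5% (Highly Original)"] + bands["5-15% (Mostly Original)"]
flagged = bands["30-50% (High)"] + bands["50%+ (Very High)"]

print(f"Mostly or highly original: {original / total:.0%}")
print(f"Submissions at 30%+ similarity: {flagged / 1e6:.1f}M")
```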

Word-Level Analysis: Where Risk Increases 

| Similarity Range | Total Words |
| --- | --- |
| Highly Original (0–5%) | 11.9B |
| Mostly Original (5–15%) | 7.5B |
| Moderate Similarity (15–30%) | 3.2B |
| High Similarity (30–50%) | 1.25B |
| Very High Similarity (50%+) | 1.32B |

Even though high-similarity submissions represent a smaller portion of the total, they still account for over 2.5 billion words of potentially plagiarized content. That’s not a rounding error; it’s a systemic issue that demands attention. 

What’s particularly striking is that the word volume in the “Very High Similarity” band exceeds that of the “High Similarity” band. This suggests that the most severe cases tend to involve longer documents, reinforcing the connection between research-heavy writing and elevated plagiarism risk. Plagiarism isn’t just frequent in certain contexts; it’s substantial in volume across the board. 

Where Does Plagiarism Happen Most? 

Mid-to-Long Academic Writing

Research papers (2,000–4,000 words) and long-form academic content (4,000+ words) carried the highest plagiarism risk among the submissions analyzed. This pattern wasn’t unexpected, as longer writing projects usually require more reference sources and are therefore increasingly complex to manage.  

When writers try to synthesize information while also stating their own ideas, they are often tempted to take shortcuts. Longer research papers and long-form academic content therefore tend to contain more missing in-text citations, weak paraphrasing, or ideas taken from other sources without attribution. 

Repetitive Academic Tasks

Another strong trend in the survey was the impact repetitive assignments have on similarity scores, especially in essay-based assessments built on commonly known topics and standard prompts.  

Students often reuse the structure and language of existing responses, especially when working on similar or repetitive assignments. Over time, this leads to copying or lightly paraphrasing previous work instead of creating something fully original. 

AI-Assisted Writing Without Verification

Now that AI-based writing tools such as ChatGPT, Perplexity, and Claude are used regularly for assignments and in the workplace, a new type of plagiarism risk has developed. When students generate content with AI and submit it without performing a similarity check, scores across the dataset become significantly inflated: AI-generated text often shares structure and language with already-published writing, even though it is presented as “new.” 

What does this mean for Students and Educators? 

For Students 

The most important takeaway from our survey data is that originality is genuinely achievable: 72% of submissions prove it. Most writers are getting it right. But the data also makes clear that shortcuts carry real consequences.  

Whether it’s skipping a plagiarism check, leaning too heavily on a source, or submitting AI-generated content unedited, these behaviors push submissions into the moderate-to-high similarity range quickly. Proper paraphrasing and accurate citation aren’t optional; they’re the foundation of credible academic work. 

For Educators 

The implications for educators are equally significant. Plagiarism detection must evolve alongside writing behavior. As AI-assisted writing becomes the norm rather than the exception, AI detection is now an essential part of any integrity workflow, not a supplementary tool.  

Manual checking of student work is no longer scalable at the volume modern institutions handle. Institutions that haven’t yet integrated detection tools into their submission pipelines are operating with a significant blind spot. 

How to Stay in the “Highly Original” Zone 

Using a tool like Quetext, writers can take concrete steps to keep their similarity scores low and their work genuinely original: 

Run Deep Similarity Checks 

Surface-level checks will reveal obvious copying, but deep checks identify hidden matches and patchwriting, the practice of slightly rewording source material without meaningfully transforming it. Running a thorough check before submission is the single most impactful habit any writer can develop. 
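To illustrate the idea behind catching patchwriting (not Quetext's actual algorithm, which is far more sophisticated), here is a toy word n-gram overlap check: even a lightly reworded sentence still shares most of its trigrams with its source.

```python
# Toy illustration of how similarity checkers catch patchwriting:
# count how many word trigrams of a draft also appear in a source.
def ngrams(text: str, n: int = 3) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(draft: str, source: str, n: int = 3) -> float:
    """Fraction of the draft's n-grams that also occur in the source."""
    d, s = ngrams(draft, n), ngrams(source, n)
    return len(d & s) / len(d) if d else 0.0

source = "the mitochondria is the powerhouse of the cell"
patchwritten = "the mitochondria is basically the powerhouse of the cell"

# Inserting a single word still leaves 4 of the draft's 7 trigrams matching.
print(f"{similarity(patchwritten, source):.0%} of draft trigrams match")
```

Swapping a word here and there barely moves the score, which is why cosmetic rewording fails against even a naive matcher, let alone a deep check.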

Improve Paraphrasing

Good paraphrasing isn’t about swapping words for synonyms. It’s about genuinely understanding a source and re-expressing its meaning in your own voice and structure. Rewrite with intent, not just cosmetic changes, and always preserve the original meaning without preserving the original phrasing. 

Use Citation Assistance

Accidental plagiarism is far more common than deliberate theft. Writers frequently omit citations not out of dishonesty, but out of poor organizational habits. Using citation assistance tools to track and format sources throughout the writing process eliminates this risk almost entirely. 

Combine AI with Human Editing

AI writing tools are powerful, but they work best as a starting point, not a final product. Always review, rewrite, and verify AI-generated content before submitting. A human editorial pass, followed by a similarity check, keeps AI-assisted work in the safe zone. 

The Bigger Picture: Plagiarism in 2026 

This dataset highlights the evolution of how we see plagiarism and the increasing access to strategies that help us avoid or reduce plagiarism. That 72% of submissions are mostly or entirely original is certainly a step in the right direction and reflects years of investment in education and tool development, which have come together to yield this result.   

The 28% of submissions that are not mostly or entirely original, however, remain a significant ongoing challenge that has not been resolved with sufficient speed. This challenge is exacerbated by the complexity of different types of academic writing, the increasing availability of AI-based writing tools, and the prevalence of assignment types and formats that promote repetition and inflate similarity scores. There is positive movement, but not quite fast enough.  

Addressing the issue of plagiarism cannot be done by detection alone. Detection catches the problem after it happens. Ending the ongoing challenge of plagiarism also requires broader education around writing ethics, more effective tools that help writers create original content in real time, and institutional policies that keep pace with technological advances in writing. According to the data, tools like Quetext address all three areas and are more critical than ever for supporting academic integrity. 

FAQs 

What is considered a good similarity score? 

  • 0–5% → Excellent. The submission is highly original with minimal or incidental overlap. 
  • 5–15% → Acceptable. Some similarity is present but within normal ranges for properly cited academic work. 
  • 15%+ → Needs review. This range warrants a closer look at sources, paraphrasing, and citation practices. 
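These thresholds can be expressed as a simple lookup. The function below is a hypothetical helper that mirrors the bands described in this FAQ; it is not an official Quetext API.

```python
# Hypothetical helper mapping a similarity score (in percent) onto the
# bands described in this FAQ; thresholds mirror the text, not an API.
def interpret_similarity(score: float) -> str:
    if score < 5:
        return "Excellent: highly original"
    if score < 15:
        return "Acceptable: within normal range for cited work"
    if score < 30:
        return "Needs review: check sources and paraphrasing"
    return "Serious concern: likely integrity-policy red flag"

print(interpret_similarity(3))   # falls in the 0-5% band
print(interpret_similarity(42))  # falls in the 30%+ band
```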

Is 30% similarity bad? 

Yes, a 30%+ score is a serious concern in most academic contexts. It typically indicates poor paraphrasing, missing or incomplete citations, or direct copying from source material. Most institutions treat submissions in this range as potential violations of academic integrity policy. 

Do longer essays have more plagiarism? 

Generally, yes. Longer documents draw on more sources, which increases the chances of overlap, intentional or otherwise. The data from our survey strongly supports this: high-similarity submissions are disproportionately concentrated in the longer word-count categories. 

Can AI-generated content be plagiarized? 

Yes, and this is an increasingly important issue. AI-generated content can mirror the structure and phrasing of existing published work in ways that trigger high similarity scores. It can also reproduce content that closely resembles copyrighted material from training data. This is why running AI-generated content through a plagiarism checker before submission is essential, not optional. 

Data sourced from Quetext’s survey dataset. Survey insights reflect user behavior across 37,851,702 submissions comprising over 25.4 billion words.