DeepSearch Technology

We’ve completely reimagined what plagiarism prevention technology can do, with fine-tuned algorithms that dive deep into the internet and millions of other sources. DeepSearch gives you the confidence and peace of mind that no other technology can match. It’s never been easier to save time and preserve academic integrity.

Compare your text with billions of sources

20 million books
We've amassed one of the largest searchable databases of books in the world.
35 billion webpages
Crawling the web and searching it in milliseconds is no trivial task. Our search engine is built from scratch specifically for plagiarism detection.
1 million journals
Our growing collection of academic journals is updated regularly.

Technical Overview

Contextual analysis & conditional scoring

The DeepSearch algorithm goes beyond simple word matching to detect what we call “contextual” plagiarism. Contextual plagiarism is a sentence or phrase that is not likely to be plagiarized on its own but, in the context of other text, is very likely to be plagiarized; in other words, its score is a conditional probability. This may sound confusing, so let's look at a simple example. Consider the following text:

"Stegosaurus and related genera were herbivores. However, their teeth and jaws are very different from those of other herbivorous ornithischian dinosaurs, suggesting a different feeding strategy that is not yet well understood." source:

If DeepSearch analyzed the first sentence by itself, without the sentence that follows it, the probability that it is plagiarized would be lower than it is in this context. Intuitively this makes sense: the sentence is short at only 6 words, and half of them are very common (“and”, “related”, and “were”). It is also factual and contains no personal opinions or other highly distinctive components, so two people could plausibly write that exact phrase independently.

The second sentence is very different. Even by itself, it gives DeepSearch enough information to mark it as a match, because the probability that two people would write that exact sentence by chance is so low that you should expect it never to happen. DeepSearch then uses this knowledge to update the score of the first sentence: given that the second sentence is almost certainly plagiarized, and that both sentences were found on the same source, the conditional probability that the first sentence is plagiarized is much higher. In other words, a very common phrase can be marked as a match depending on its context. This gives your text a more thoughtful analysis and helps avoid erroneous matches, since a common phrase is flagged only when its context supports it.
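The conditional update described above can be illustrated with a small sketch. This is not DeepSearch's actual scoring method, which is not published here; it is a minimal Bayesian-style example in which the function names, the `context_ratio` parameter, and all numeric values are assumptions chosen for illustration.

```python
def to_odds(p: float) -> float:
    """Convert a probability to odds, clamping away from 0 and 1."""
    p = min(max(p, 1e-9), 1 - 1e-9)
    return p / (1 - p)


def to_prob(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def contextual_score(standalone_prob: float,
                     neighbor_prob: float,
                     same_source: bool,
                     context_ratio: float = 20.0) -> float:
    """Raise a sentence's score when its neighbor matched the same source.

    standalone_prob: score of the sentence judged on its own (assumed input).
    neighbor_prob:   score of the adjacent sentence (assumed input).
    context_ratio:   hypothetical likelihood ratio applied when the neighbor
                     is certainly plagiarized from the same source.
    """
    if not same_source:
        # A neighbor matched elsewhere tells us nothing in this sketch.
        return standalone_prob
    # Scale the likelihood ratio by how confident we are in the neighbor.
    ratio = 1.0 + (context_ratio - 1.0) * neighbor_prob
    return to_prob(to_odds(standalone_prob) * ratio)


# Short, common first sentence vs. long, distinctive second sentence.
first_alone = 0.20
second_alone = 0.99
first_in_context = contextual_score(first_alone, second_alone, same_source=True)
print(round(first_in_context, 2))
```

With these assumed inputs, the first sentence's score rises from 0.20 to roughly 0.83, while the same sentence next to a match from a different source keeps its standalone score.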

Near-exact matching

Our technology goes beyond simple string matching: the DeepSearch algorithm can even find text that has been slightly rewritten. This feature addresses the rising problem of near-duplicate content, where only 1 or 2 words have been changed but the text is clearly still plagiarized. This presents a challenge to other plagiarism detection technologies, but DeepSearch is built to handle it. At some point, a piece of text has been changed so much that it is no longer a close enough match to the original, and it won't be labeled as a match. Our technology decides which phrases to mark as a match based on how similar they are to the source. We calculate similarity using many different technical factors and cutting-edge machine learning techniques.
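As a rough illustration of thresholded near-exact matching, the sketch below compares two passages word by word using Python's standard `difflib.SequenceMatcher`. This is a stand-in technique, not the DeepSearch similarity model; the function names and the 0.8 threshold are assumptions made for the example.

```python
import re
from difflib import SequenceMatcher


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(candidate: str, source: str) -> float:
    """Return a 0..1 similarity ratio over word sequences."""
    matcher = SequenceMatcher(None, tokenize(candidate), tokenize(source),
                              autojunk=False)
    return matcher.ratio()


def is_near_match(candidate: str, source: str, threshold: float = 0.8) -> bool:
    """Label the candidate a match if it is similar enough to the source.

    The threshold is a hypothetical cutoff; below it, the text is treated
    as changed too much to count as a match.
    """
    return similarity(candidate, source) >= threshold


source = "Stegosaurus and related genera were herbivores."
one_word_changed = "Stegosaurus and similar genera were herbivores."
fully_rewritten = "These plated dinosaurs ate only plants."

print(is_near_match(one_word_changed, source))  # one word swapped
print(is_near_match(fully_rewritten, source))   # no shared wording
```

Here the passage with a single swapped word stays above the assumed threshold and is labeled a match, while the fully rewritten passage shares no words with the source and is not.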