Chapter 5: Similarity Checking Software
It should be as clear as the nose on your face that software used to check your manuscript for ‘plagiarism’ is your greatest potential nemesis on the road to academic publishing success. The reason is the ‘similarity score’ it generates. As this process becomes more widely accepted, the maximum score a paper can have and still be sent for review keeps dropping. Today, it is common to see a score of 10% to 15% used as the cutoff by journal editors and institutions. A couple of years ago, a maximum acceptable score of 30% was common.
PJSSH - Pertanika Journal of Social Sciences & Humanities
“All submitted manuscripts must be in the Journal’s acceptable similarity index range:
< 30% – PASS;
30-40% – RESUBMIT MS;
> 40% – REJECT.”
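The quoted policy is really a three-band rule, and writing it out as code makes one gap obvious: it does not say which band a score of exactly 30% or 40% falls into. The sketch below assumes both boundary values count as RESUBMIT. The function name `pjssh_decision` is hypothetical and is not anything the journal publishes.

```python
def pjssh_decision(similarity_percent: float) -> str:
    """Map a similarity index (0-100) to the PJSSH outcome quoted above.

    Assumption: scores of exactly 30% and 40% fall in the RESUBMIT band,
    because the quoted policy does not say how boundaries are handled.
    """
    if not 0 <= similarity_percent <= 100:
        raise ValueError("similarity index must be between 0 and 100")
    if similarity_percent < 30:
        return "PASS"
    if similarity_percent <= 40:
        return "RESUBMIT MS"
    return "REJECT"


for score in (12, 25, 35, 45):
    print(score, pjssh_decision(score))
```

Note that a score of 25% still passes this older band, but it would fail the 10% to 15% cutoffs that are now common.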
Trust me when I say it does not take many software errors to push a manuscript to 15%, and I will show you why. I therefore strongly suggest you read this critical chapter carefully and try to understand the nuances of what is happening and how to work around the countless problems. I also suggest you do not try to trick the software by placing hidden characters in the document. In reality, this trick is easily detected.
Plagiarism Checking or Similarity Checking?
First, let me be very clear in stating that software programs such as Turnitin are NOT plagiarism checking programs. Instead, they calculate ‘similarity’ scores from matches on a single word, an abbreviation, a string of words, a commonly used phrase, or even your university email address. For all those readers out there who doubt me, please go to Turnitin’s web site and read its ‘Top 15 Misconceptions’ about Turnitin.
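Turnitin does not publish its matching algorithm, so the sketch below is not Turnitin’s method. It illustrates the general idea with a deliberately simple, assumed measure, word-trigram ‘containment’: the share of your text’s three-word sequences that also appear in some other source. Even this crude measure shows how a stock phrase shared by two unrelated hypothesis statements produces a ‘match’ although nothing was copied. The names `word_ngrams` and `containment` are hypothetical.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text (punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(submission: str, source: str, n: int = 3) -> float:
    """Fraction of the submission's n-grams that also occur in the source."""
    sub = word_ngrams(submission, n)
    if not sub:
        return 0.0
    return len(sub & word_ngrams(source, n)) / len(sub)


mine = "H1: Perceived usefulness has a positive direct effect on intention to use."
other = "H3: Trust has a positive direct effect on purchase intention."
print(containment(mine, other))  # 0.4
```

Four of the ten trigrams in the first sentence come from the phrase ‘has a positive direct effect on’, so 40% of that sentence ‘matches’ a sentence about a completely different construct.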
However, because Turnitin has become the 800-pound gorilla in the room that decides whether you succeed or fail in publishing your paper, we need to spend some time discussing and understanding what this monster does and how it reports what it finds (and how to work around its many flaws). If you are a researcher, student, or advisor who needs a published paper for graduation or academic rank, programs such as Turnitin are not your friends and should be viewed as your worst nightmare.
Moreover, I rarely encounter a paper from a student who has truly plagiarized, that is, cut and pasted large sections from another paper without citing them, implying the words and ideas are their own. Instead, what I have encountered countless times in Turnitin reports are sections of colored text in which a perfectly cited, paraphrased sentence is ‘flagged’ by the software. In fact, from my own analysis, the better a paper is cited and referenced (including DOIs), the more likely the software is to increase the score, as you are telling it where to look!
What is Self-Plagiarism, and is it Legal?
The short answer is that self-plagiarism is not illegal, but many journals that sell your research for a profit consider it unethical. The unethical part comes into play when you assert that your new manuscript contains new material when it actually contains recycled material (nominally used under fair use terms). Even then, you are not breaking any laws.
However, it might become ‘illegal’ when copyright or trademark is involved, as authors cede their rights to a publication’s content to the previous publisher, which now owns the copyrighted material. If the author reuses the material in question, there might be grounds for legal action, but it would be related to copyright, not plagiarism. Although I recognize that what is contained in Wikipedia is not always the best information on a subject, I find Wikipedia’s overview of self-plagiarism very informative.
It is also worth reviewing Turnitin’s own perspective on the subject in its white paper on self-plagiarism. I extracted the following passage from that paper:
The American Psychological Association
(2010) explains how plagiarism differs from self-plagiarism: “Whereas
plagiarism refers to the practice of claiming credit for the words, ideas, and
concepts of others, self-plagiarism refers to the practice of presenting one’s
own previously published work as though it were new” (p. 170). As Roig (2006)
suggests, self-plagiarism occurs “when authors reuse their own previously
written work or data in a ‘new’ written product without letting the reader know
that this material has appeared elsewhere” (p. 16).
It therefore seems to me that if your new paper cites and references the material being reused from your previously published research, you are not committing self-plagiarism. By the definitions in Turnitin’s own white paper, the matter looks black and white: if previous research is cited and referenced, there is no self-plagiarism. But once again, what does the software report?
Overcoming Turnitin Reporting Problems
First, always run a Turnitin report on your final manuscript, even if the journal does not state that it will run its own report on your paper (most will). Experience has shown me that your journal will run one at some point, and you do not want to be blindsided by an editor’s rejection because your paper has a score of 16%, 25%, or 35%. Once you know what has been flagged, reducing your score by 50% is reasonably easy if you can use synonyms and paraphrase reasonably well.
However, you need
to be very crafty when Turnitin flags ultra-common abbreviations and phrases
such as the following:
- Your university email address.
- Your advisors’ names and their bios.
- Your hypothesis statements (e.g., ‘directly affects’, ‘has a direct effect’, ‘has a positive direct effect’).
- Your goodness-of-fit criteria (e.g., AGFI, GFI, RMSEA, RMR, SRMR) and their supporting theory citations. The following is a screen capture of a Turnitin similarity ‘flag’ of this required information (another 1%-2%).
- Other common terms, such as total effect, indirect effect, and direct effect, are always flagged, along with their abbreviations TE, IE, and DE (another 1% or more).
- The required legal statements added to the end of a paper by a journal. The following is a screen capture of a Turnitin similarity ‘flag’ of this required information (another <1%).
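Before paying for a report, you can at least locate these usual suspects yourself. The sketch below is a hypothetical pre-submission scan built from the categories in the list above; the watch-list patterns are my assumptions, not Turnitin’s rules, and finding a match only tells you where a flag is likely, not that Turnitin will flag it. Abbreviations are matched case-sensitively so that ordinary words are not caught, while phrases are matched case-insensitively using Python’s scoped `(?i:...)` flag.

```python
import re

# Hypothetical watch-list based on the categories listed above; extend as needed.
PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "fit index": r"\b(?:AGFI|GFI|RMSEA|SRMR|RMR)\b",  # case-sensitive abbreviations
    "effect term": r"(?i:\b(?:total|indirect|direct) effects?\b)|\b(?:TE|IE|DE)\b",
    "hypothesis phrase": r"(?i:\bdirectly affects\b|\bhas a (?:positive )?direct effect\b)",
}


def likely_flags(text: str) -> dict[str, list[str]]:
    """Return every match for each watch-list category, in order of appearance."""
    return {label: re.findall(pattern, text) for label, pattern in PATTERNS.items()}


sample = ("Contact: jane.doe@uni.edu.ph. Model fit was good (RMSEA = 0.05, AGFI = 0.91). "
          "H1: Trust has a positive direct effect on loyalty; the total effect (TE) was significant.")
for label, hits in likely_flags(sample).items():
    print(label, hits)
```

Running it on a draft gives you a checklist of places to rephrase, anonymize, or move into a figure before the first real report.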
The first thing you need to ask yourself is this: does the journal require me to send my own Turnitin report with my submission, or will it run one itself? Even if the journal’s web site does not say whether it will run a report, you should always run your own report so you can edit the problem areas, as running a report gives you the opportunity to reduce the score by as much as 50%.
Be warned, however, that some publishers run Turnitin reports both before the review process and before publication (two times!). If your paper or a similar paper has already been published in another journal, or you submitted it to another journal that ran a Turnitin report and saved it, you are going to be getting some nasty emails from someone. I also know of publishers ‘black-listing’ authors for two years for ‘self-plagiarism’. The really sad part of all this madness is that papers must now be written in weird and unusual ways to avoid a deeply flawed piece of software. As a result, academic excellence suffers, and papers become confusing and, at times, nonsensical.
TIP - Remember that Turnitin cannot see words in an image file (a figure), so if you can convert data into an image, it will not be flagged. However, you cannot convert tables into images in your final paper, as journals always want tables in text format.
It is also extremely critical that you do not save the report to the central Turnitin database!!! Also, do not run the report on the reference section. If you or your ‘assistant’ make these mistakes, it can be catastrophic for your chances of ever getting that paper published, because once the paper has been saved to the main Turnitin repository, any future report will ‘see’ it as already published.
Unfortunately, editors do not look at the detail of the report; they look only at the final ‘plagiarism’ score (‘similarity’ is more technically accurate). With many journals using a Turnitin score of 15% or less as their initial criterion for further review, eliminating the small problems can save your paper. In my work, it is very common to run a report three times before I am satisfied that I have done the best I can to reduce the Turnitin score before submission.
You must really understand that journal support staff often have zero training in the use of Turnitin, and because of the thousands of papers journals receive yearly, editors look only at the composite score when deciding whether to accept your paper for review or reject it. I am also convinced that the better a paper is cited and referenced, the higher its Turnitin score becomes. The obvious result is that mediocre research is encouraged.
As the press release says, Turnitin has become the ‘go-to’ gorilla when a journal decides to start running ‘plagiarism’ checking software against your manuscript. With many editors now stating that your paper must have a total Turnitin score of 15% or less, it does not take many software flagging errors to get your paper thrown into the trash bin. There are, however, ways to prevent this with a little preparation before submission.
Step 1 is always to make sure you run a Turnitin report before submission. Make very sure you do not save the file to the central database and that you do not run the report on your reference section.
Step 2 is to edit your paper, noting what Turnitin has flagged. Even if your academic writing is properly cited and referenced, the score can be as high as 30%-35% on the first run. Cutting this in half is not too hard once you know where to change your sentence structure and where you can paraphrase.
Step 3 is to run a second report. If the score is below 15%, you have a reasonable chance of getting your paper looked at in greater detail and maybe even sent to a reviewer. I also suggest you send or upload your final Turnitin report with your submission and state the report’s score in your cover letter.
However, what is nearly impossible to deal with is Turnitin flagging your university email address or the required abbreviations, phrases, or acronyms used in your research. Quite honestly, with most SJR/Scopus-indexed journals receiving 800 to 2,000 or more papers a year (Nature receives 10,000 submissions a year), a ‘proper technical review’ is getting harder and harder to obtain, as editors rely on the technology to save them time in their selection and review process. Editors are no longer looking for reasons to publish your paper but are instead trying to find any reason to reject it. I know this is hard to accept, but it is very true.
Another horrific problem can occur when a journal runs its Turnitin report, saves your paper to the central repository/database, and then rejects your paper! Your hell on earth has now just begun: when you resubmit the same paper to another journal, the new journal’s Turnitin report will see your first submission, and the new score will have a significant percentage added to it, because Turnitin thinks you have self-plagiarized, as your research is at another journal! Turnitin can be a very dangerous weapon in the wrong hands. Insanity? Yes. I also wonder how preprints are going to get around this problem.
I want to make this problem perfectly clear to anyone reading this. I have no problem with journals or conferences running Turnitin reports. The problem is when a journal runs a pre-screening report, saves the paper to the central repository, and then rejects the paper! In the two cases in which this has happened (the DLSU B&E Review in the Philippines and ERSJ in Cyprus), when the papers were resubmitted to another journal because of the rejections, Turnitin treated them as already ‘published’ and produced cumulative scores that far exceeded the maximum allowable Turnitin scores at the second journals. Thus, the papers had to be significantly rewritten for a third submission. By the time we got this traumatized doctoral student to her dissertation defense, her paper was showing a Turnitin score of 97%!
It seems I am not the only one having problems with whether, or where, a report gets saved:
It is also interesting to note that the Turnitin platform was sold in March 2019 for $1.7 billion to Advance, a privately held media, communications, and technology company. In The Chronicle of Higher Education article about the deal, Turnitin was referred to as “the 800-pound gorilla of plagiarism-detection services”. Please note that in this headline article the authors chose to use ‘plagiarism’ and not the technically correct word ‘similarity’. In that choice of words lies the massive underlying problem with the use of Turnitin and the interpretation of its reports. As I always say, “Academic publishing is a business,” and a very big business at that!
When this happens, I always review what Turnitin has flagged, and once again, it is flagging common phrases and abbreviations, perfectly cited sentences that are not word for word, actual citations within the body of the paper, numbers that appear in another paper somewhere (0.90, 0.93), and so on.
The results shown above are from a single paper. Even though I tried to keep names and email domains from being flagged by replacing them with a string of ‘xxxxx’s, Turnitin still flagged them! This is totally nuts! The word ‘Internet’ was also flagged because someone else used it! Imagine that: someone else used the word ‘Internet’. The numbers used for the table’s average mean range were also flagged (these numbers appear in thousands of papers and books). Also, somewhere in a Springer book or paper, someone else has used the word ‘Variables’ and the numbers ‘1, 2, 3, 4, 5, 6 and 7’. Obviously, with Turnitin adding nearly 5% to this author’s report, the author must be committing some serious plagiarism, judging from the above real-world examples.
Please remember that far too many editors clearly state that it is the total score they use to decide whether your paper is worthy of review or acceptance. The report’s detail is seldom, if ever, reviewed (screen capture from an Inderscience journal editor’s email).
With many editors now using 10% as the entrance threshold, you have no margin for error in what you submit. You are therefore forced to use Turnitin, which requires someone somewhere to purchase a license, or you can pay Sage $110 to run a Turnitin report for you a single time, with a two-day turnaround.
In the following image compilation from another Turnitin report, we see that the hypothesis phrase ‘positively and directly’ and the hypothesis numbers H2-H8 are also flagged. The phrase ‘positively and directly’ was chosen during the editing of this paper only because I already knew that Turnitin would flag more commonly used phrases such as ‘has a direct effect on’, ‘directly affects’, and ‘has a direct and significant effect on’. Basically, there is no way I know of to write a hypothesis statement that Turnitin will not flag.
You can also see from the following image compilation that although good scholarship requires citations for the theory and the criteria used, Turnitin flags both the criteria and your citations! Notice, however, that under ‘Results’ we were able to find words that Turnitin has yet to flag in this context (validated and accepted). The entire concept of ethical scholarship is to cite the sources for your theory and data so you are not accused of ‘plagiarism’. As anyone can see in the two examples below, the Turnitin software penalizes authors for their scholarly excellence. What is the solution? Don’t cite anything, and lower your score! And this is the software that editors, journals, and universities are using to judge academic excellence and to decide who gets a Ph.D., a job, a promotion, and tenure? This is total and complete madness!
In the next Turnitin report sample, we see a list of ‘conflicts’ between my student’s paper and seven papers submitted to various universities around the world. The only logical way I can interpret this is that, since these are not published papers, some poor doctoral students with similar content ran Turnitin analyses on their own papers at these schools and then saved them to Turnitin’s central repository. Now, what would be fascinating to know is whether these students later submitted the same papers to a journal, and what the outcome of those submissions was, given their papers’ new and much higher Turnitin scores.
Another odd problem with another Turnitin report was that even though analysis of the references was turned off, it still looked at this section, presumably because it was titled ‘Bibliographic references’. Turnitin then flagged each reference and created a crazy score that no journal would accept. Of course, we re-ran the report with the ‘Bibliographic references’ section removed from the paper and got a score a journal might consider.
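Because the ‘exclude bibliography’ setting apparently depends on recognizing the section heading, the safest habit is to cut the reference section yourself before uploading to Turnitin or Grammarly. Here is a minimal sketch, assuming the manuscript is plain text, the references are the last section, and the heading sits on its own line. The heading variants in `REFERENCE_HEADINGS` are my assumptions, and `strip_references` is a hypothetical helper, not part of any checker’s API.

```python
import re

# Assumed heading variants; add whatever your target journal uses.
REFERENCE_HEADINGS = re.compile(
    r"^\s*(?:references|bibliography|bibliographic references|works cited|literature cited)\s*:?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_references(manuscript: str) -> str:
    """Return the text before the last line that is exactly a reference-section heading.

    Assumes the reference list is the final section of the manuscript; any
    appendix placed after it would be cut as well.
    """
    matches = list(REFERENCE_HEADINGS.finditer(manuscript))
    if not matches:
        return manuscript  # no recognizable heading: leave the text untouched
    return manuscript[: matches[-1].start()].rstrip()


doc = ("Title\n\nBody text cites Roig (2006).\n\n"
       "Bibliographic references\nRoig, M. (2006). Avoiding plagiarism.\n")
print(strip_references(doc))
```

Taking the last matching heading, rather than the first, is meant to avoid cutting the paper short if the word ‘References’ happens to appear alone on an earlier line.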
I scratch my head every day when I get these reports and wonder whether I am the only one who sees how horribly flawed this software and system are. Of course, when one man like Jeffrey Beall can be the judge and jury for an entire $28 billion-a-year industry, I guess anything is possible.
Therefore, if Turnitin’s software is truly worth what was paid for it ($1.7 billion), I would have a team of software engineers writing code to add exceptions for these commonly used numbers, abbreviations, headings, emails, university names, technical phrases, and so on to the search engine database being used. Grammarly seems to be able to do this, so why can’t Turnitin?
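The exception list suggested above does not need to be complicated. The sketch below assumes a simple word-trigram matching model, which is an illustration and not how Turnitin or Grammarly document their matching. Any trigram that belongs to a whitelisted stock phrase is simply discarded before the overlap is counted. The names `trigrams`, `shared_trigrams`, and the example exclusion phrase are hypothetical.

```python
import re


def trigrams(text: str) -> set[tuple[str, ...]]:
    """Lowercase word trigrams of text, ignoring punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def shared_trigrams(a: str, b: str, exclusions: tuple[str, ...] = ()) -> int:
    """Count trigrams common to a and b, ignoring any trigram of an excluded phrase."""
    excluded = set().union(*(trigrams(phrase) for phrase in exclusions))
    return len((trigrams(a) & trigrams(b)) - excluded)


h1 = "H1: Trust has a positive direct effect on loyalty."
h2 = "H2: Quality has a positive direct effect on satisfaction."
print(shared_trigrams(h1, h2))                                          # 4
print(shared_trigrams(h1, h2, exclusions=("has a positive direct effect on",)))  # 0
```

With one line of configuration, the standard hypothesis wording stops counting as a match, while genuinely copied text elsewhere in the two documents would still be caught.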
I think readers should also look at the following paragraph from Turnitin’s web site concerning its privacy and security policies. The following excerpt, which I extracted from that site on 21 March 2020, is buried 7,376 words down the page. You therefore have to be a very patient human (like me) to find these Turnitin use statements:
Your License to Us (Turnitin)
Unless otherwise indicated in this Site, including our
Privacy Policy or in connection with one of our services, any communications or
material of any kind that you e-mail, post, or transmit through the
Site (excluding personally identifiable information of
students and any papers submitted to the Site), including, questions,
comments, suggestions, and other data and information (your
"Communications") will be treated as non-confidential and
non-proprietary. You grant Turnitin a non-exclusive,
royalty-free, perpetual, worldwide, irrevocable license to reproduce, transmit,
display, disclose, and otherwise use your Communications on the Site or
elsewhere for our business purposes. We are free to use any ideas,
concepts, techniques, know-how in your Communications for any purpose,
including, but not limited to, the development and use of products and services
based on the Communications.
So does this mean that Turnitin has given itself a license to commit plagiarism? Does this mean it is monitoring and filtering your communications? Can you imagine if Google had the above Turnitin statement in the legalese for its email service? Maybe this is why someone paid $1.7 billion for Turnitin? There are so many questions that these statements raise in my mind; or am I simply howling at the moon again?
Grammarly
There are two versions of this online tool, one free and the other subscription-based. Unfortunately for you, its ‘plagiarism checking’ component is part of the subscription-based tool. I have used Grammarly countless times, as it helps me identify spelling mistakes and grammar ‘issues’ and can show me where there are potential similarity problems.
Grammarly also recalculates your grammar and similarity scores in real-time, which I find to be one of the best features of their online tool and light-years beyond Turnitin’s static reports. Grammarly, however, does not know how to ‘not see’ your reference section, so the best method is to upload a paper without it. If, however, you leave the references in the upload, just hit the dismiss button when going through each flagged area.
As mentioned, you will need Grammarly’s ‘premium’ service to access the similarity checking feature, which costs $11.66 per month.
If you do not have access to Turnitin, I consider Grammarly a must-have alternative before you submit your manuscript. However, some journals accept only Turnitin reports (according to Sage, 80% of the world’s ‘high-impact’ journals use Turnitin) and will not accept Grammarly image reports (I have already tried this). I think if Grammarly introduced some type of report generation and printing feature, it could go a long way toward giving editors an alternative to Turnitin.
TIP - Upload
your paper without the reference section when using Grammarly’s similarity
checking feature.
Grammarly’s software does not flag commonly used phrases and words the way Turnitin does. In my very humble opinion, Grammarly is a far superior platform to Turnitin and the mess that Turnitin’s static reports regurgitate. Turnitin’s excuse for not having this type of real-time capability is… drum roll, please…
“Once the student
receives a Similarity Report, they have to wait 24 hours to get another report
on a resubmission (if resubmissions have been enabled by their instructor); this prevents students from wordsmithing
and resubmitting repeatedly.” – Turnitin’s Top 15 Misconceptions
Really? So now, according to Turnitin, I have to wait 24 hours each time I make an ‘edit’ to a document, or else I am guilty of ‘wordsmithing’. Oh, goodness me. Who sits around and thinks up this stuff? Yes, yes, your honor, I am guilty of WORDSMITHING! Can I also plead guilty to ‘self-plagiarism’ while I am at it?
Publishing Your Academic Research Paper
Find out more at https://tinyurl.com/wm2ep5q
E-mail: csbyronbooks@gmail.com
My blog: https://academiceditor.blogspot.com/
Last Edited: 2020.March.24 (Tuesday)