Tuesday, March 24, 2020

Chapter 5: Similarity Checking Software from the book Publishing Your Academic Research Paper


Chapter 5: Similarity Checking Software



It must be as plain as the nose on your face that software used to check your manuscript for ‘plagiarism’ is your greatest potential nemesis on the road to academic publishing success. The reason for this is the ‘similarity score’ that is generated and, as this process becomes more widely accepted, the steady lowering of the maximum score a paper may have and still be accepted for review. Today, it is common to see a score of 10%-15% used as the cutoff by journal editors and institutions. A couple of years ago, a maximum acceptable score of 30% was common.

PJSSH - Pertanika’s Journal of Social Sciences & Humanities
“All submitted manuscripts must be in the Journal’s acceptable similarity index range:
< 30% – PASS; 30-40% – RESUBMIT MS; > 40% – REJECT.”
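The quoted PJSSH ranges can be read as a simple decision rule. The following is only a minimal sketch of that rule as I read it, not anything published by the journal: the function name `classify_similarity` is hypothetical, and treating scores of exactly 30% and exactly 40% as ‘RESUBMIT MS’ is an assumption, since the quote does not say how the boundaries are handled.

```python
def classify_similarity(score_percent: float) -> str:
    """Map a similarity index (in percent) to the decision quoted from PJSSH.

    Assumption: the quoted '30-40%' range is inclusive at both ends.
    """
    if score_percent < 0 or score_percent > 100:
        raise ValueError("similarity index must be between 0 and 100")
    if score_percent < 30:
        return "PASS"
    if score_percent <= 40:
        return "RESUBMIT MS"
    return "REJECT"


if __name__ == "__main__":
    # Print the decision for a few illustrative scores.
    for score in (15, 30, 40, 41):
        print(score, classify_similarity(score))
```

Note that a journal using a 10%-15% cutoff would need different boundaries, so the numbers here apply only to the quoted policy.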

 
Trust me when I say it does not take many software errors to flag a manuscript with 15%, and I will show you why. I therefore highly suggest you read this critical chapter carefully and try to understand the nuances of what is happening and how to work around the countless problems. I also suggest you do not try tricking the software by placing hidden characters into the document. In reality, this is easily detectable.

Plagiarism Checking or Similarity Checking?

First, let me be very clear in stating that software programs such as Turnitin are NOT plagiarism checking programs, but instead calculate ‘similarity’ scores based on a single word, an abbreviation, a string of words, a commonly used phrase, or your university email address. For all those readers out there who doubt me, please go to Turnitin’s web site and read the top 15 misconceptions about Turnitin.

However, as Turnitin has become the 800-pound gorilla in the room in your success or failure in publishing your paper, we need to spend some time discussing and understanding what this monster does and how it reports what it finds (and how to work around its many flaws). As a researcher, student, or advisor who needs to publish your paper for graduation or academic rank, programs such as Turnitin are not your friends and should be viewed as your worst nightmare. 

Moreover, rarely do I ever encounter a student paper that has truly been ‘plagiarized’, which is when an ‘author’ cuts and pastes large sections from another paper and fails to cite the passage, implying the words and ideas are their own. Instead, what I have encountered countless times in countless Turnitin reports are sections of colored text in which a perfectly cited, paraphrased sentence is ‘flagged’ by the software. Actually, from my own analysis, the better a paper is cited and referenced (along with DOIs), the more probable it is that the software will increase the score, as you are telling it where to look!

What is Self-Plagiarism, and is it Legal?

The short answer is no, self-plagiarism is not illegal, but many journals that sell your research for a profit consider it unethical. The unethical part comes into effect when you assert that your new manuscript contains new material, but it actually contains recycled material (nominally used under fair use terms). Even so, it is not breaking any laws.
 

However, it might become ‘illegal’ when copyright or trademark is involved, as authors cede their rights to a publication’s content to the previous publisher, which now owns the copyrighted material. If the author uses the material in question, there might be grounds for legal action, but it would be related to copyright, not plagiarism. Although I recognize what is contained in Wikipedia is not always the best information on a subject, I find Wikipedia’s overview of self-plagiarism to be very informative and suggest it for more information.

It might also be interesting to review Turnitin’s perspective on the subject in their white paper (a PDF file). I extracted the following from their paper:

The American Psychological Association (2010) explains how plagiarism differs from self-plagiarism: “Whereas plagiarism refers to the practice of claiming credit for the words, ideas, and concepts of others, self-plagiarism refers to the practice of presenting one’s own previously published work as though it were new” (pg. 170). As Roig (2006) suggests, self-plagiarism occurs “when authors reuse their own previously written work or data in a ‘new’ written product without letting the reader know that this material has appeared elsewhere” (pg. 16).

Therefore, it seems very black and white to me from Turnitin’s own white paper: if in your new paper you cite and reference that the material being used is from your previously published research, you are not committing self-plagiarism. But once again, what does the software report?

Overcoming Turnitin Reporting Problems

First, always run a Turnitin report on your final manuscript even if the journal does not state they will run their own report on your paper (most will). Experience has shown me that your journal will at some point, and you do not want to be blindsided with an editor’s rejection because your paper has a score of 16%, 25%, or 35%. Once you know what has been flagged, reducing your score by 50% is reasonably easy if you can use synonyms and can paraphrase reasonably well.

However, you need to be very crafty when Turnitin flags ultra-common abbreviations and phrases such as the following:

  • Your university email.

  • Your advisors’ names and their bios.

  • Your hypotheses statements (e.g. ‘directly affects’, ‘has a direct effect’, ‘has a positive direct effect’).

  • Your goodness-of-fit criteria (e.g. AGFI, GFI, RMSEA, RMR, SRMR, etc.) and their supporting theory citations. The following is a screen capture of a Turnitin similarity ‘flag’ of this required information (another 1%-2%).


  • Other common terms such as total effect, indirect effect, and direct effect always get flagged, along with their abbreviations TE, DE, and IE (another 1% or more).


  • The required legal statements added to the end of a paper by a journal. The following is a screen capture of a Turnitin similarity ‘flag’ of this required information (another <1%).

The first thing you need to ask yourself is this: does the journal require me to send my own Turnitin report with my submission, or will they run it themselves? Even if the journal’s web site does not say whether they will run a report, you should always run your own report to edit the problem areas, as running a report gives you the opportunity to reduce the score by as much as 50%. Be warned, however: some publishers run Turnitin reports both before the review process and before publication (2 times!). If your paper or a similar paper has already been published in another journal, or you submitted it to another journal, and they ran a Turnitin report and saved it, you are going to be getting some nasty emails from someone. I also have knowledge of publishers ‘black-listing’ authors for two years for ‘self-plagiarism’.

The really sad part of all this madness is that papers must now be written in weird and unusual ways to avoid a deeply flawed piece of software. As such, academic excellence suffers, and papers become confusing and, at times, nonsensical.

TIP - Remember that Turnitin cannot see words in an image file (Figure), so if you can convert data into an image, it will not be flagged. However, you cannot convert tables into images in your final paper, as journals always want tables in text format.

It is also extremely critical that you do not save the report to the central Turnitin database!!! Also, do not run the report on the reference section. If you or your ‘assistant’ make these mistakes, it can be catastrophic to the success of ever getting that paper published, because any future reports ‘see’ the paper as already published because it has been saved to the main Turnitin repository. 

Unfortunately, editors do not look at the detail of the report; they only look at the final ‘plagiarism’ score (similarity is more technically accurate). With many journals using a Turnitin score of 15% or less as their initial evaluation criterion for further review, eliminating the small problems can save your paper. In my work, running a report three times is very common before I am satisfied that I have done the best I can do in reducing a Turnitin score before submission.

You must really understand that journal support staff oftentimes have zero training concerning the use of Turnitin, and due to the 1,000s of papers they receive yearly, the editors only look at the composite score as the criterion to accept your paper for review or reject it. I am also convinced that the better a paper is cited and referenced, the higher the Turnitin score becomes. The obvious result is that mediocre research is encouraged.

However, as the press release says, Turnitin has become the ‘go-to’ gorilla when a journal decides to start running ‘plagiarism’ checking software against your manuscript. With many editors now stating that your paper must have a Turnitin total score of 15% or less, it does not take many software flagging errors to get your paper thrown into the trash bin. There are, however, ways to preclude this from happening with a little preparation before submission.

Step 1 is always to make sure you run a Turnitin report before submission. Make very sure you do not save the file to the central database, and you do not run the report on your reference section.
Step 2 is to edit your paper, noting what Turnitin has flagged. Even if your academic writing is properly cited and referenced, the score can be as high as 30-35% on the first run. Cutting this by half is not too hard once you know where to change your sentence structure and paraphrase where possible.
Step 3 is to run a second report. If this is below 15%, you have a reasonable chance of getting your paper looked at in greater detail and maybe even sent to a reviewer. I also suggest you send or upload your final Turnitin report with your submission and state the report’s score somewhere in your cover letter.
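The arithmetic behind these three steps is worth making explicit. The following is a minimal sketch, assuming only the figures already given above (a first-run score of 30-35% and a 15% cutoff); the function name `required_reduction` is hypothetical and is not part of any Turnitin tool.

```python
def required_reduction(first_run_percent: float, cutoff_percent: float) -> float:
    """Return the fraction of the first-run score that must be edited away
    for the next report to land at or below the cutoff."""
    if first_run_percent <= 0:
        raise ValueError("first-run score must be positive")
    if first_run_percent <= cutoff_percent:
        # Already at or under the cutoff, so nothing needs to be removed.
        return 0.0
    return 1.0 - cutoff_percent / first_run_percent


if __name__ == "__main__":
    # Scores taken from the text: 30% and 35% first runs against a 15% cutoff.
    for first_run in (30, 35):
        share = required_reduction(first_run, 15)
        print(f"{first_run}% first run: remove {share:.1%} of the flagged material")
```

A 30% first run needs exactly half of its flagged material removed to reach 15%, while a 35% first run needs somewhat more than half, which is one reason a third report is often necessary.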

However, what is nearly impossible to deal with is when Turnitin flags your university email address or required abbreviations, phrases, or acronyms used in your research. Quite honestly, with most SJR/Scopus indexed journals getting 800-2,000 or more papers a year (Nature receives 10,000 submissions a year), a ‘proper technical review’ is getting harder and harder to get, as editors are relying on the technology to save them time in their selection and review process. Editors are no longer looking for reasons to publish your paper, but are instead trying to find any reason to reject it. I know this is hard to accept, but it is very true.

Another horrific problem can occur with Turnitin when a journal runs its report, saves your paper to the central repository/database, and then rejects your paper! Your hell on earth has now just begun: when you take the same paper and re-submit it to another journal, the new journal’s Turnitin report will see your first submission, and the new report score will have a significant percentage added to it because Turnitin thinks you have self-plagiarized, as your research is at another journal! Turnitin can be a very dangerous weapon in the wrong hands. Insanity? Yes. I also wonder how pre-print papers are going to get around this problem.

I also want to make this problem perfectly clear to anyone reading this. I have no problem with journals or conferences running Turnitin reports. The problem is when a journal runs a pre-screening report, captures the paper to the central repository, and then rejects the paper! In the two cases in which this has happened, the DLSU B&E Review in the Philippines and ERSJ in Cyprus, when the papers were re-submitted to another journal (because of the rejections), Turnitin, thinking the paper was already ‘published’, created a cumulative score for the two papers which far exceeded the maximum allowable Turnitin scores for the second journals. Thus, the papers had to be significantly re-written for what was now a third submission. By the time we got this traumatized doctoral student to her dissertation defense, her paper was showing a Turnitin score of 97%!


It seems, however, I am not the only one who is having problems with if or where a report gets saved:

It is also interesting to note that the Turnitin platform was sold in March 2019 for $1.7 billion to Advance, which is a privately held media, communications, and technology company. In The Chronicle of Higher Education article about the deal, Turnitin was referred to as “the 800-pound gorilla of plagiarism-detection services”. Please note that in this headline article the authors chose to use ‘plagiarism’ and not the technically correct word ‘similarity’. In that choice of words lies the underlying massive problem in the use of Turnitin and its interpretation. As I always say, “Academic publishing is a business”, and a very big business at that!

I have also found that even though Grammarly might indicate your paper is 100% plagiarism clean, the same paper run against the Turnitin system moments later might show your paper having a score as high as 15%.

When this happens, I always review what Turnitin has flagged and, once again, it is flagging common phrases and abbreviations, perfectly cited sentences that are not word for word, actual citations within the body of the paper, numbers that are in another paper somewhere (0.90, 0.93), etc.



The results shown above are from a single paper. Even in my effort to not have names and email domains flagged by using a string of ‘xxxxx’s, Turnitin still flagged it! This is totally nuts! The word ‘Internet’ was also flagged because someone else used it! Imagine that, someone else used the word ‘Internet’. The numbers used for the table’s average mean range were also flagged (these numbers are in 1,000s of papers and books). Also, somewhere in a Springer book or paper, someone else has also used the word ‘Variables’ and the numbers ‘1, 2, 3, 4, 5, 6 and 7’. Obviously, with Turnitin adding nearly 5% to this author’s Turnitin report, the author must be committing some serious plagiarism, judging from the above real-world examples.

Please remember it is clearly stated by way too many editors that it is the total score they use to decide if your paper is worthy of review or acceptance. Review of the report’s detail is seldom, if ever, done (Screen capture from an Inderscience journal editor email).

With many editors now using 10% as the entrance threshold, you have no margin for error in what you submit. You are therefore forced to use Turnitin, which requires someone somewhere to purchase a license ... or you can pay Sage $110 to run a 2-day turnaround Turnitin report for you a single time.


In the following image compilation from another Turnitin report, we see that the hypothesis phrase ‘positively and directly’ and the hypothesis numbers H2 – H8 are also flagged. The phrase ‘positively and directly’ was chosen in the editing of this paper only because I already knew that Turnitin would flag more commonly used phrases such as ‘has a direct effect on’, ‘directly affects’, ‘has a direct and significant effect on’, etc. Basically, there is no way I know of to write a hypothesis statement that is not flagged by Turnitin.


You can also see from the following image compilation that although good scholarship requires citations for the theory and the criteria used, Turnitin flags both the criteria and your citations! Notice, however, that under ‘Results’ we were able to find words that Turnitin has yet to flag in this context (validated and accepted). The entire concept of ethical scholarship is to cite your sources for your theory and data so you are not accused of ‘plagiarism’. As anyone can see in the two examples below, the Turnitin software penalizes the author for their scholarship excellence. What is the solution? Don’t cite anything, and lower your score! And this is the software that editors, journals, and universities are using to judge academic excellence, getting a Ph.D., getting a job, promotion, and tenure? This is total and complete madness!


In the next Turnitin report sample, we see a list of ‘conflicts’ between my student’s paper being analyzed and seven papers submitted to various universities around the world. The only logical way I can interpret this is that since these are not published papers, some poor doctoral student with similar content did a Turnitin analysis on their own paper at each of these schools and then saved it to Turnitin’s central repository.



Now what would be fascinating to know is if these students later submitted the same paper to a journal and what was the outcome of that submission along with their paper’s new and much higher Turnitin score.


Another odd problem with another Turnitin report was that even though the analysis of the references was turned off, it still looked at this section because it was titled ‘Bibliographic references’ (I assume). Turnitin then went and flagged each reference and created some crazy score that no journal would accept. Of course, we re-ran the report without the ‘Bibliographic references’ section in the paper, and got to a score a journal might consider.

I scratch my head every day when I get these reports and wonder am I the only one who sees how horribly flawed this software and system is? Of course when one man like Jeffrey Beall can be the judge and jury for an entire $28 billion a year industry, I guess anything is possible.

Therefore, if Turnitin software is truly worth the money that they say it is ($1.7 billion), I would have a team of software engineers writing code to place exceptions for these commonly used numbers, abbreviations, headings, emails, university names, technical phrases, etc. into the search engine database being used. Grammarly seems to be able to do this, so why can’t Turnitin?

I think readers should also look at the following paragraph from Turnitin’s web site concerning their privacy and security policies. The following is an excerpt from that site I extracted on 21 March 2020 which is buried 7,376 words down the page. You, therefore, have to be a very patient human (like me) to find the following Turnitin use statements:

Your License to Us (Turnitin)
Unless otherwise indicated in this Site, including our Privacy Policy or in connection with one of our services, any communications or material of any kind that you e-mail, post, or transmit through the Site (excluding personally identifiable information of students and any papers submitted to the Site), including, questions, comments, suggestions, and other data and information (your "Communications") will be treated as non-confidential and non-proprietary. You grant Turnitin a non-exclusive, royalty-free, perpetual, worldwide, irrevocable license to reproduce, transmit, display, disclose, and otherwise use your Communications on the Site or elsewhere for our business purposes. We are free to use any ideas, concepts, techniques, know-how in your Communications for any purpose, including, but not limited to, the development and use of products and services based on the Communications.

So does this mean that Turnitin has given themselves a license to commit Plagiarism? Does this mean they are monitoring and filtering your communications? Can you imagine if Google had the above Turnitin statement in their legalese for the use of their email? Maybe this is why someone paid $1.7 billion for Turnitin? There are so many questions that these statements raise in my mind; or am I simply howling at the moon again?

Grammarly

There are two versions of this online tool, with one being free and the other subscription-based. Unfortunately for you, their ‘plagiarism checking’ component is part of the subscription-based tool. I have used Grammarly countless times as it does help me identify spelling mistakes, grammar ‘issues’, and can show me where there are potential similarity problems.

Grammarly also recalculates your grammar and similarity scores in real-time, which I find to be one of the best features of their online tool and light-years beyond Turnitin’s static reports. Grammarly, however, does not know how to ‘not see’ your reference section, so the best method is to upload a paper without it. If, however, you leave the references in the upload, just hit the dismiss button when going through each flagged area.

As mentioned, you will need their ‘premium’ service to access the similarity checking feature, which costs $11.66 per month.

If you do not have access to Turnitin, I consider Grammarly a must-have alternative before you submit your manuscript. However, some journals want only Turnitin reports (according to Sage, 80% of the world’s ‘high-impact’ journals use Turnitin) and will not accept Grammarly image reports (I tried this already). I think if Grammarly introduced some type of report generation and printing feature, it could go a long way toward giving editors an alternative to Turnitin.

TIP - Upload your paper without the reference section when using Grammarly’s similarity checking feature.

Grammarly’s software does not flag commonly used phrases and words like Turnitin does. In my very humble opinion, Grammarly is a far superior platform to Turnitin and the mess that Turnitin’s static reports regurgitate. Turnitin’s excuse for not having this type of real-time capability is ... drum roll please ...

“Once the student receives a Similarity Report, they have to wait 24 hours to get another report on a resubmission (if resubmissions have been enabled by their instructor); this prevents students from wordsmithing and resubmitting repeatedly.” – Turnitin’s Top 15 Misconceptions

Really? So now, according to Turnitin, I have to wait 24 hours each time I make an ‘edit’ to a document or else I am guilty of ‘wordsmithing’. Oh, goodness me. Who sits around and thinks up this stuff? Yes, yes, your honor, I am guilty of WORDSMITHING! Can I also plead guilty to ‘self-plagiarism’ while I am at it?


Publishing Your Academic Research Paper

Find out more at https://tinyurl.com/wm2ep5q

Last Edited: 2020.March.24 (Tuesday)


 
