To Catch a Plagiarist
by Joshua T. Katz

The plagiarism wars have begun. Claudine Gay is out as president of Harvard, in large part because of conduct that the Harvard Corporation and Gay herself refuse to describe with the p-word, and the coming months will probably be painful for quite a few people who write for a living.

As a result of outrage on both left and right (the former often seems intent on bringing down established universities and other institutions from the inside, the latter from the outside), we can be certain that a number of scholars, journalists, speechwriters, and pundits—men, women, black, white, young, old, Democratic, Republican—will be hit with credible charges of plagiarism. Although few cases are likely to be as remarkable in their bang-for-the-buck as Gay’s, many writers are wondering whether the inadvertent omission of a quotation mark decades ago will pop up in an AI search and destroy a career. In an article for The Atlantic, Ian Bogost describes the unnerving process of checking his own work himself, and I would not be surprised if someone, somewhere, were right now busily uploading into a plagiarism bot everything I’ve published on Homer, Old Irish, and the dismal state of higher education. (If you are doing this, I hope you’ll take the time actually to read the work.)

So, how do you catch a plagiarist? Of course there is plagiarism software, which computer science departments have been successfully using for years to detect programming assignments that students have copied from others, sometimes with light modifications. These days, both teachers and editors of scholarly journals are increasingly putting work written in a natural language—in short, essays—through plagiarism detection programs. The expansion of AI in everyday life will normalize such efforts.

But there are also old-fashioned methods of detecting plagiarism, and we should not abandon them. For one thing, software applied to work that was definitely composed by flesh-and-blood people still yields “false positives” (putative instances of plagiarism that aren’t), through which a human must laboriously comb. Furthermore, such software does not yet appear to be especially good at reliably determining what was written by man and what by machine.

Two decades ago, before text-matching software was widely available, I served on a Princeton student–faculty committee that investigated dozens of cases of suspected plagiarism by undergraduates, many of them in essays for classes in the humanities and social sciences on such topics as the War of 1812, To the Lighthouse, and the Japanese economy. I was struck by just how easy it often was to spot instances of plagiarism, even before I had seen the copied text. The giveaways fell—and, I expect, still fall—into three categories: inconsistent typography, inconsistent punctuation, and broader stylistic inconsistencies.

First, inconsistent typography. It was astonishing to me how frequently students submitted work in which a sentence or paragraph was formatted differently from the rest of the paper. In such cases, a quick search would usually reveal that just those words had been copied and pasted from some online source. An essay written in twelve-point type would suddenly have a sentence in eleven-point type. Or, out of nowhere, words would appear in dark gray Calibri rather than in Times New Roman and standard black. The spacing between lines in one section would be subtly different from the spacing everywhere else. In one memorable instance, quotation marks and apostrophes in an essay were “curly”—except in the plagiarized sections, where they were “straight.”
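For readers who want to try the quotation-mark check on plain text, here is a minimal sketch in Python. It is an illustration, not a description of any real plagiarism tool: the function names `quote_style` and `flag_odd_paragraphs` are hypothetical, and the sketch assumes the essay has already been split into paragraphs. Font, type size, and line spacing do not survive in plain text, so only the curly-versus-straight distinction is covered.

```python
from collections import Counter

# Typographic ("curly") quotation marks and apostrophes, and their plain
# ("straight") counterparts.
CURLY = "\u2018\u2019\u201c\u201d"
STRAIGHT = "'\""


def quote_style(text):
    """Return 'curly', 'straight', 'mixed', or None if the text has no quote marks."""
    has_curly = any(ch in CURLY for ch in text)
    has_straight = any(ch in STRAIGHT for ch in text)
    if has_curly and has_straight:
        return "mixed"
    if has_curly:
        return "curly"
    if has_straight:
        return "straight"
    return None


def flag_odd_paragraphs(paragraphs):
    """Return the indexes of paragraphs whose quote style differs from the most common one."""
    styles = [quote_style(p) for p in paragraphs]
    counts = Counter(s for s in styles if s is not None)
    if not counts:
        return []
    dominant = counts.most_common(1)[0][0]
    return [i for i, s in enumerate(styles) if s is not None and s != dominant]
```

A flagged paragraph is only a prompt to search for its source. A writer who switches word processors halfway through an essay would trip the same check.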

Second, inconsistent punctuation. Some students regularly use the Oxford comma; others don’t. Unfortunately, increasingly many students have no conception of consistency, which is bad news but not a matter of plagiarism. However, for those who do, when one striking sentence has an Oxford comma and no other sentence with the form “X, Y(,) and Z” does, experience shows that something will turn up when that sentence is googled. I can say similar things about the use of lowercase or capital letters after a colon and any employment at all of the semicolon.
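The Oxford-comma test can be roughed out the same way. The sketch below is a crude approximation under stated assumptions: the regular expressions recognize only short lists of the form “X, Y(,) and Z” with items of one to three words, sentences are split naively at end punctuation, and the names `serial_comma_style` and `odd_sentences` are hypothetical. A sentence such as “However, students and teachers agreed” will be misread as a list, so every hit needs a human eye.

```python
import re
from collections import Counter

# A list item is taken to be one to three words (an assumption made for brevity).
ITEM = r"\w+(?:[ -]\w+){0,2}"
WITH_OXFORD = re.compile(rf"\b{ITEM}, {ITEM}, (?:and|or) {ITEM}")
WITHOUT_OXFORD = re.compile(rf"\b{ITEM}, {ITEM} (?:and|or) {ITEM}")


def serial_comma_style(sentence):
    """Return 'oxford', 'no_oxford', or None if no short list is found."""
    if WITH_OXFORD.search(sentence):
        return "oxford"
    if WITHOUT_OXFORD.search(sentence):
        return "no_oxford"
    return None


def odd_sentences(text):
    """Return sentences whose serial-comma style differs from the essay's majority style."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    styled = [(s, serial_comma_style(s)) for s in sentences]
    counts = Counter(style for _, style in styled if style)
    if len(counts) < 2:
        return []  # the writer is consistent, or there are no lists at all
    dominant = counts.most_common(1)[0][0]
    return [s for s, style in styled if style and style != dominant]
```

The minority sentences are the ones worth googling. The check says nothing about a writer who is simply inconsistent throughout.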

Finally, there are things that just don’t make stylistic sense. In an essay on World War II written by an American, you don’t expect to find instances of the locution “the Second World War”—unless they’re between quotation marks because they’ve been taken from a properly credited source. You also don’t expect to find, as I once did, a sentence beginning with the word “Whilst” but including the word “honor”—because the American student, copying the sentence from a British publication, knew enough to change “honour” to “honor” but not enough to change the conjunction.
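The “Whilst” and “honor” giveaway also lends itself to a small sketch. The word lists here are hypothetical and deliberately tiny, and the function name `register_clash` is likewise invented for illustration. A serious check would need much longer lists and would have to skip text inside properly credited quotations.

```python
import re

# Hypothetical, deliberately tiny word lists; real use would need far larger ones.
BRITISH_MARKERS = {"whilst", "amongst", "honour", "colour", "favour", "labour", "centre"}
AMERICAN_MARKERS = {"honor", "color", "favor", "labor", "center"}


def register_clash(sentence):
    """Return (british_words, american_words) if the sentence mixes both, else None."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    british = words & BRITISH_MARKERS
    american = words & AMERICAN_MARKERS
    if british and american:
        return sorted(british), sorted(american)
    return None
```

Run on a sentence like the one I describe, `register_clash("Whilst the regiment fought with honor, the treaty stalled.")` reports both “whilst” and “honor”, which is the mismatch a careful reader would catch by ear.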

Some readers may view these last paragraphs as quaint, and at some level they are. But even in the age of automated plagiarism detectors, these old-fashioned methods have their use: On occasion, a plagiarist claims not to have deliberately copied but rather to have internalized another’s language and accidentally reproduced it. This must indeed sometimes happen. With certain typographical and stylistic inconsistencies, however, anyone can tell at once that that’s not the case.

This brings me to a larger question: What is plagiarism? Two issues deserve attention. One, to which I will return, is whether we should learn to speak differently of different kinds of plagiarism, more or less as the law distinguishes among first-, second-, third-, and fourth-degree criminal offenses.

The other concerns AI. No one should forget that in the months before the shake-up at Harvard, academic dishonesty was already on everyone’s mind because of ChatGPT. More people in the United States googled “plagiarism” in the last week of April and first week of May 2023 than in the first two weeks of December, when Gay’s plagiarism was exposed. According to a poll conducted by the online magazine Intelligent, within weeks of the debut of ChatGPT, “30% of college students ha[d already] used ChatGPT on written homework.” Reliable statistics are hard to come by, in part because there are so many other AI-powered tools (for instance, Claude and Grok), but it is hard to imagine that the percentage has not been rising in the 2023–24 academic year.

Figuring out how to sustain academic integrity in an environment more and more dominated by AI—which, of course, also powers plagiarism-detection software—needs to be a top priority for administrators and teachers at all educational levels. We must decide whether what might be called conventional plagiarism is fundamentally the same as using AI to do what is supposed to be one’s own work. I admit that I don’t yet know what exactly I think but point readers to an essay in The Atlantic by Matteo Wong (I will assume he wrote it himself) titled “What if We Held ChatGPT to the Same Standard as Claudine Gay?” Noting that AI depends on copyrighted materials, Wong concludes that “the technology [is] guilty of mind-boggling levels of plagiarism.” Maybe so. But it is not obvious that for an individual to steal another person’s work (the Latin word plagiarius means “kidnapper”) is the same sort of offense as passing off as one’s own the work of an anonymous bot.

Another important question is how plagiarism should be punished. Does intent matter? What about magnitude? How should we assess the differences between copying one or two short sentences and copying one or two long paragraphs? Between copying a paragraph for one essay and copying dozens of paragraphs for dozens of essays?

I believe it is wrong to suspend undergraduates for comparatively minor academic infractions—and I feel more strongly about this than I did when I sat on that committee at Princeton. Anyone who disagrees with me on this point should at least recognize that it is hypocritical of universities like Harvard and Princeton to punish students harshly while downplaying the plagiaristic behavior of senior faculty. As Aaron Sibarium has pointed out, when she was dean of the Faculty of Arts and Sciences, Claudine Gay “watered down [Harvard’s] policy on research misconduct” so that faculty—but not students—“could be sanctioned only if they plagiarized ‘knowingly, intentionally, or recklessly.’” I am not deaf to arguments in favor of this sort of change for everyone, faculty and students alike. But if Gay’s record of verbal theft doesn’t count as “reckless,” we are redefining that term as well as “plagiarism.”

And one more question: What kind of offense is plagiarism? In January, the philosopher Kathleen Stock wrote an article titled “Plagiarism is not a Sin,” harshly condemning plagiaristic practice but arguing that “[t]he infringement is intellectual not moral.” To my eyes, it is both.

Simply put, plagiarism is theft. Yes, there is some truth to Stock’s assertion that “[w]ords are public property anyway. It’s not like you are stealing possessions from people”: Because intellectual property is not considered a possession, the law treats it differently from personal property; additionally, although I am not terribly sympathetic to them, there are philosophical objections to the very idea that intellectual property deserves robust protection. But words do matter. Consider how assiduously the Harvard Corporation and Claudine Gay worked to avoid the p-word: They spoke instead of “inadequate citation,” “duplicative language without appropriate attribution,” and “material [that] duplicated other scholars’ language, without proper attribution.” In this case, one particular word, “plagiarism,” mattered so much that they were willing to defy their English Sprachgefühl in order to avoid it.

It is telling that Harvard’s euphemisms compound the ugliness of plagiarism with the ugliness of deliberately obscurant bureaucratese. The bigger problem here is that we—parents, teachers, journalists, administrators, the wider public—often fail to model good linguistic practice, especially when it comes to inculcating in children an appreciation for the beauty and power of words. A proper education involves reading widely, admiring good sentences and scoffing at bad ones, writing draft after draft of one’s own compositions, and generally attending to how rhetoric shapes argument and narrative.

There are very few occasions—terse emergency instructions present one—when one person’s language should be interchangeable with another’s. You may or may not like my style, but for better or for worse, it is mine. If I suddenly began to sound like someone else, or produced what the technology writer Anna Wiener has dubbed “garbage language,” I hope that those who know me would notice. To judge by the dreck that so many people churn out, in some cases even duplicate, no one has taught them about style (the word is related to “stylus,” with both going back to Latin stilus, “spiked writing instrument”) or pointed out to them that (as I put it last year in the New Criterion) “[t]he sentences I write don’t sound as good in your mouth”—or on your page—“for much the same reason that your shirt doesn’t quite fit on me.”

Language reflects reality imperfectly, but it’s by far the best medium we have to express what is, as well as what was and what might yet be. Simply put, we are logocentric creatures. Style matters because it shows our interlocutors that we take our—and their—verbal expressions seriously. And for many of us, an appreciation of words is ultimately an appreciation of logos, of the Word.

If you believe, with John, that the Word is God, then to abuse it is a sin. But certain kinds of linguistic abuse can be a moral failure even for nonbelievers, who should strive to employ words faithfully, even though they do not have faith. Now and again, all of us do violence to and, maybe, also with language. We curse, prevaricate, belittle, and engage in sophistry. And sometimes we may, intentionally or not, take someone else’s phrase or thought as our own.

What we need now are honest discussions of issues that are colliding in new and forceful ways: of how to instill a love of language in the young; of when, if ever, language (or its absence: silence) may be called violence; of the future of authorship and personal style in the age of AI; and of what plagiarism is, how it should be punished, and how those who transgress may redeem themselves.

Language is a gift, whether or not you hold it to be divine. Language deserves to be appreciated, cultivated, and delighted in. It is high time that we recommit ourselves to logos.

Joshua T. Katz is senior fellow at the American Enterprise Institute.

Image by Microbiz Mag on Wikimedia Commons, licensed via Creative Commons. Image cropped.
