An Interview with Scholar Anna Faktorovich

LibraryThing is pleased to sit down this month with author, academic and publisher Anna Faktorovich to discuss her fascinating new project, which attempts to solve some of the mysteries surrounding the authorship of many literary texts from the British Renaissance. From the works of William Shakespeare to Christopher Marlowe, these texts have been analyzed using a computational-linguistic method she invented, one that combines 27 different tests, and have also been subjected to structural, biographical and other attribution approaches. Dr. Faktorovich concludes that all 284 tested works from between 1560 and 1650 were written by six ghostwriters. The results of this massive study have been published in Re-Attribution of the British Renaissance Corpus, while a number of the texts themselves have been translated into Modern English for the first time and released as part of her British Renaissance Re-Attribution and Modernization Series, published by Anaphora Literary Press.

Your project seeks to reshape our understanding of a key period in British literature. How did you get the idea for it?

This series would not have been possible without the previous two decades of research I undertook on related topics. My PhD dissertation and my first scholarly book, Rebellion as Genre in the Novels of Scott, Dickens and Stevenson, explored the concept of formulas and the structure of literature. Then my second book, The Formulas of Popular Fiction, dissected the range, history and methodology of the formulas that modern readers are familiar with. After that, I digressed from the standard topics covered in scholarly books and used my own publishing company to explore more complex social questions, such as the difference between mega-corporate capitalism and Radical Agrarian Economics. In Gender Bias in Mystery and Romance Novel Publishing, I also explored why the publishing industry prefers lighter, more low-brow literature from female writers and denser fiction from male writers. While writing this book, I realized that romances, mysteries, and male and female voices had quantitatively different linguistic measurements.

None of these titles, “self-published” with my Anaphora Literary Press, received any recognition, so the next book I researched was The History of British and American Author-Publishers and Satirical 18th Century British Novels, which explained that the best British and American authors (Sir Walter Scott, Lord Byron, Charles Dickens, Virginia Woolf, Benjamin Franklin, Edgar Allan Poe, Mark Twain, Herman Melville and Alice Walker) self-published their best works. Then I attempted to return to traditional scholarly publishing by writing about the foundations of British satire in the eighteenth century, but as I began my research, I came across studies that questioned the authorship of texts assigned to Daniel Defoe. I designed a few linguistic measurement experiments to test these authorship questions and immediately discovered that some of “Defoe’s” novels were obviously misattributed and had actually been ghostwritten by Robert Paltock. I made several other re-attributions of 18th-century texts across fourteen essays, but before I edited these for publication as a book, editor Robert Hauptman, who had published one of these essays in his Journal of Information Ethics, asked whether I could prove the accuracy of my method by applying it to a uniquely complex period for re-attribution: the British Renaissance.

Around sixty authors had previously been proposed by scholars as the “true” Shakespeare, so I adjusted the computational-linguistic author-attribution method I had used for the 18th century to make it easier to apply to a much larger corpus of texts. The attribution process was indeed extremely difficult because the texts kept clustering into only six linguistic signatures, so it became apparent that there were only six ghostwriters working across this period. The identity of these ghostwriters remained a mystery until I expanded the study to 104 different bylines and over 200 different texts. Only one of these six ghostwriters (Ben Jonson) is familiar to modern literature researchers, while the rest remain obscure, despite their obvious significance in their own day (Josuah Sylvester was an official Court Poet, William Byrd was granted a music/poetry publishing monopoly by Elizabeth I, and Richard Verstegan controlled an exile Catholic publishing monopoly). When these six ghostwriters’ biographies were compared to all of the other biographies in their linguistic groups, it became clear that only they had lived long enough, and had the necessary access, to have written these clusters of texts.

While the evidence I was gathering, in the form of documentary records, handwriting, forensic accounting and various other forms of proof, was overwhelming, scholarly journals kept saying that I needed more proof. So I decided to supplement the 698-page scholarly study, Re-Attribution of the British Renaissance Corpus (Volumes 1-2), with a full series of translations from Early Modern English into Modern English of previously untranslated plays, poetry, non-fiction and other genres (Volumes 3-14), with around fourteen more volumes forthcoming. These translations are accompanied by annotations, introductions and primary sources that add thousands of pieces of evidence confirming the re-attributions made in the central study. This was a gradual process of digging into the research and addressing new evidence and new questions as they came up.

How does this newly invented “computational-linguistic method” of textual analysis work? Can you give us an example?

Here is a simple set of steps anybody can take to apply my method.

1. Find a group of texts (at least 20, by a few different bylines) from a given period that are connected to an authorial mystery you want to solve, and save them as plain-text documents.

2. Open free publicly-accessible websites—,, or—and download the WordSmith program.

3. Enter each of the texts separately into each of these platforms and record the data for several linguistic tests into a spreadsheet. For the Renaissance corpus I used 27 tests for punctuation, lexical density, parts of speech, passive voice, characters and syllables per word, psychological word-choice, and patterns of the top-6 words and letters. The tests for top-6 words and letters require additional steps, so you can skip these in favor of other simple single-number tests available on these platforms. Your first column should be the titles of all texts in the group, and the top row should be the names of the different tests applied to them. You will want to create duplicates of this raw data in separate tabs in the spreadsheet, with one sheet for each text in the group.

4. In the spreadsheet, sort the values for each test from smallest to largest, and mark only the texts that fall within 17-18% of the compared text on this spectrum. For example, if you are testing only 20 texts, you can select the 2 texts just above and the 2 just below your compared-against text’s value and change their numbers to 1, while changing all of the other numbers to 0; 1 means the texts are similar, and 0 means they are different.

5. When you have changed the entire sheet’s data into 0s and 1s, create a final column that automatically adds up each row (for example, with a SUM formula).

6. Evaluate your results to determine what number in the sum column indicates that two texts are by the same author, that a text was written by two or more authors, or that the texts were written by different authors. A smaller corpus can still have a few texts with extremely high numbers of matches to each other if all of the other texts were written by different authors. A large corpus, by contrast, might have fewer matches, but to a very large number of texts that all share a single underlying author. You will have to set a cut-off point for the number of matches that separates similar from divergent texts in your chosen group.
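To make steps 4 through 6 more concrete, here is a minimal Python sketch of the spreadsheet procedure; it is an illustration, not the author's own tooling. It assumes the step 3 scores have been exported as a CSV with a `title` column followed by one column per test; that column name, the function names and the toy values are all hypothetical. It reads the "2 texts just above and 2 just below" example in step 4 as a rank window of `k` neighbours on each side, which is one interpretation of the 17-18% rule, and it leaves the step 6 cut-off for the researcher to choose.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical layout mirroring step 3: each text title maps to its score on each test.
Scores = Dict[str, Dict[str, float]]


def load_scores(lines: Iterable[str], title_col: str = "title") -> Scores:
    """Read a CSV whose first column holds titles and whose header row names the tests."""
    scores: Scores = {}
    for row in csv.DictReader(lines):
        title = row.pop(title_col)
        scores[title] = {test: float(value) for test, value in row.items()}
    return scores


def binarize_for(target: str, scores: Scores, tests: List[str], k: int = 2) -> Dict[str, List[int]]:
    """Step 4 for one compared-against text (one 'sheet'): a text gets 1 on a test
    if it sits within k ranks of the target after sorting, else 0.
    Assumes a rank window; tied values keep their input order."""
    matrix = {title: [0] * len(tests) for title in scores if title != target}
    for j, test in enumerate(tests):
        # Sort every text, including the target, from smallest to largest on this test.
        ranked = sorted(scores, key=lambda title: scores[title][test])
        pos = ranked.index(target)
        # Up to k texts just below and k just above the target count as similar.
        neighbours = ranked[max(0, pos - k):pos] + ranked[pos + 1:pos + 1 + k]
        for title in neighbours:
            matrix[title][j] = 1
    return matrix


def match_sums(target: str, scores: Scores, tests: List[str], k: int = 2) -> Dict[str, int]:
    """Step 5: add up the 0/1 marks across all tests for each other text."""
    return {title: sum(row) for title, row in binarize_for(target, scores, tests, k).items()}


def above_cutoff(sums: Dict[str, int], cutoff: int) -> List[str]:
    """Step 6: texts whose match count reaches a cut-off chosen by the researcher."""
    return sorted(title for title, total in sums.items() if total >= cutoff)


if __name__ == "__main__":
    # Toy values invented only to exercise the code; they are not real measurements.
    toy_csv = io.StringIO(
        "title,lexical_density,words_per_sentence\n"
        "A,0.50,20.0\n"
        "B,0.51,21.0\n"
        "C,0.60,30.0\n"
        "D,0.49,19.5\n"
        "E,0.70,35.0\n"
    )
    scores = load_scores(toy_csv)
    tests = ["lexical_density", "words_per_sentence"]
    sums = match_sums("A", scores, tests, k=1)
    print(sums)                   # {'B': 2, 'C': 0, 'D': 2, 'E': 0}
    print(above_cutoff(sums, 2))  # ['B', 'D']
```

Calling `match_sums` once for each text reproduces the one-sheet-per-text layout described in step 3. Ties in a test's values simply keep their input order here, which a spreadsheet sort may handle differently.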

You can see the raw data and calculations I derived for the Re-Attribution series HERE. One of the tables I added to this GitHub site is “Koppel Experiment Reviewed – Data Tables.” This was a small experiment I ran for a second article I wrote for the Journal of Information Ethics, in which I discredit the findings and methodology of Moshe Koppel, Jonathan Schler and Elisheva Bonchek-Dokow’s 2007 article, “Measuring Differentiability: Unmasking Pseudonymous Authors.” As you can see from the data, my findings are tragic from my perspective, as I am a fan of all of these great writers, whom I would not have thought capable of being implicated in ghostwriting. For example, the data indicate only two linguistic signatures across the three Brontë sisters, suggesting that the initial assignment of these texts to only two brothers was likely more accurate than the current belief that three women wrote them. This conclusion did not shock me as much as it would have a couple of years ago. I had initially hoped that previous scholars who guessed “Emilia Bassano” could have been the true author behind “Shakespeare” were correct, but the data proved that “Bassano”, as well as several other ostensible female groundbreakers like “Mary Sidney” and “Lady Mary Wroth,” were not actually writers, but either hired ghostwriters or were mistakenly credited. You really have to read Volumes 1-2 to understand how overwhelming the evidence is for these conclusions, as this summary alone could not possibly convince anybody that the history with which they are familiar is entirely incorrect.

There have been challenges made in the past to the authorship of some of these works—in William Shakespeare’s case especially. What does your approach bring to the ongoing discussion that is new and convincing?

The approximately 60 previous bylines that scholars have proposed as alternative “true” authors behind the “Shakespeare” byline are consistent with my finding that only six ghostwriters wrote all of the tested texts from this century. With only six authorial styles in this mix, it has been very easy for scholars to find linguistic, structural, thematic and other similarities between any cluster of randomly selected texts by two given bylines, or between a questionable text and a text by another byline. While scholars in this field have made the current attributions seem rational, a close examination of all past re-attributions betrays nonsensical chaos. For example, A Yorkshire Tragedy was bylined as “Written by W. Shakspeare”, but it is currently attributed to “Thomas Middleton” in Roger Holdsworth’s analysis in The New Oxford Shakespeare Authorship Companion. Another absurd string of past re-attributions I found was for the short poem “Funeral Elegy by W. S.” (1612), which was first attributed to “Shakespeare” by Donald W. Foster, before it was re-assigned to “Davies” by Brian Vickers and then to “Ford” with equal certainty. My study re-attributes this “Elegy” to Gabriel Harvey, the Cambridge rhetoric professor.

William Percy has never previously been proposed as a potential underlying author behind the “William Shakespeare” pseudonym. I started my study by researching all previous articles, books and the like that suggested alternative “Shakespeares”, and this list included many obscure names such as Alexander, Armin, Aylett and Daniel, popular bylines such as Bacon and Fletcher, aristocrats such as Dyer, and even Queen Elizabeth. There are a mere 70 titles in WorldCat attributed to William Percy as an author, versus the 81,521 titles attributed to “Shakespeare”. The translations of Percy’s five self-attributed plays and his sonnet collection that I executed in Volumes 3-8 have never been attempted before, so modern scholars have not even had access to these texts to realize their similarity to “Shakespeare” in structure and storylines, as well as in linguistics. I only came across Percy’s name when I considered nearly all of the bylines used across this century that could have approximately fit the timeline of these publications. Percy’s sonnet collection (the only book he published under his own byline) happened to have been digitized in Early English Books Online, and this prompted me to dig up his plays buried in the archives.

The 284 texts I tested comprise the largest corpus of Renaissance work ever subjected to computational-linguistic analysis. My combination of 27 different tests is thousands of times more accurate than the standard method in this field, which only tests the frequency of common words. The point that swayed me beyond all doubt towards Percy was when I learned about the £2,400 loan that William and his brother Henry Percy (Earl of Northumberland) took out from Arthur Medleycote (a London merchant tailor) in 1593, just before Elizabeth I granted the theater duopoly in 1594. This documented proof, without any corresponding record of what else William could have spent this sum on, firmly establishes that William re-invested it in troupe development and theater building in London, under pseudonyms such as “Shakespeare”. The currently accepted mythological belief that “Shakespeare” was a real person who was a theater investor and manager was largely started by Nicholas Rowe in his 1709 Some Account of the Life of Mr. William Shakespear. Rowe absurdly claimed that Sir William Davenant had started the gossip that Henry Wriothesley, 3rd Earl of Southampton, gave Shakespeare £1,000 “to set him up in his career.” It is absurd to believe that any aristocrat would have gifted this astronomical sum to any actor without a record of this irrationally generous gift in his accounts. There are similar irrefutable pieces of proof in every line and paragraph of my series.

You also seek to modernize and reintroduce the works to the public. Why is this important?

As I mentioned, none of William Percy’s plays or poetry, or the plays I am re-attributing to him in this part of the series, have ever been translated into accessible Modern English before. In the middle of my computational study, it became clear that the “Shakespeare” plays and poetry translated into Modern English registered as a separate linguistic signature from these same texts in their original spelling. In other words, editors have changed the canon of “Shakespeare” texts so heavily that the result registers as a distinct linguistic signature, or author. Modern readers and scholars alike intuitively believe in “Shakespeare’s” superiority and distinction from the bulk of other British Renaissance bylines such as “Robert Armin”, or anonymous plays such as Look Around You, because they are used to reading the understandable and polished modern versions of “Shakespeare”. This is also why computational linguists who have tested plays such as the anonymous True Chronicle History of King Leir (1605) have concluded that it was written by a different author than the modernized version of King Lear (1608). When I compared the original-spelling Lear to Leir, some of their linguistic measurements were near-identical, and both were obviously written by William Percy. Leir was Percy’s first attempt, which he then very heavily rewrote into the second, polished Lear edition under the “Shakespeare” byline.

All of the texts I included in this series have unique significance to literary history. For example, Look Around You was the first part of the myth-starting Robin Hood trilogy, which previous critics have missed. And while the second quarto of “Shakespeare’s” Hamlet has been repeatedly re-translated, the first quarto that I translated in this series (Hamlet: The First Quarto) has never been translated in full before. It appears to have been intentionally censored by academia as “bad” because it (unlike the later versions) clearly points to Hamlet deflowering Ofelia and pretending to be mad to hide his homosexual relationship with Horatio (who threatens to kill himself for Hamlet). There is more literary and historical value in each of this series’ texts than in any of the canonical “Shakespeare” plays. It is impossible for even a seasoned scholar to read any of these texts in their old-spelling originals, not only because the meaning of most words has changed, but also because Percy uses multiple languages (Latin, French, Italian), makes up words (which most scholars have dismissed as nonsense, although they have clear meanings when their parts are isolated), and uses allusions and quotes from obscure sources that need to be explained in annotations to be grasped. Some of these texts were never published or staged, and most of those that were printed appeared in as few as one or two copies. Thus, these Renaissance plays have never been introduced to the public before.

Tell us about your library. What’s on your own shelves?

My shelves are full of around 500 physical books, most of which I received for free from academic publishers in exchange for reviewing them in my Pennsylvania Literary Journal. You can see the latest of these reviews HERE. Some of them I received as free exam copies from publishers when I taught these textbooks in my college classes. I used to buy books back in college and graduate school, but I have moved so frequently over the years that I have donated most of them. The only paid-for books I now have are Anaphora titles by myself and other writers.

What have you been reading lately?

In addition to the hundred or so books I read annually to review in PLJ, I read thousands of other books for my research projects. When I am teaching at universities or live near an academic library, I check out the maximum allowed pile of books every couple of weeks. But across the last four years I have conducted my research remotely by accessing free books on Google, Project Gutenberg and various other platforms. I also use LookInside features to find evidence in newer books, or request relevant new titles for review before reusing them in my research. I also have access to research articles on TexShare. Most of the books I needed for the translation series were published during the Renaissance and have been digitized to be freely available. On an average day of translation research, I probably check 100 different sources to write a single page of annotations, and the series has 2,500 pages so far. It would have been impossible for me to check out a quarter of a million books from even the biggest library, and most of the Renaissance-era books are rare single copies held in closed collections. The specific texts I have been reading are thus cited in the annotations; I will not attempt to insert a bibliography here to name them.

