Voynich Reconsidered: alternative transliterations
During the late 1990s, Timothy Rayhel developed a new transliteration of the complete text of the Voynich manuscript. He made it available online under the pseudonym Glen Claston. He called the transliteration v101. Rayhel passed away in 2014, aged only fifty-six; but the v101 transliteration lived on. Of the several widely used (and very different) transliterations of the Voynich manuscript, it is my personal favourite.
The v101 transliteration identifies far more glyphs than there are letters in any modern or medieval phonetic alphabet. Rayhel’s basic character set had seventy-one glyphs: for which he had to assign the following keys:
In many cases, Rayhel assigned different keys to glyphs that seemed to differ in minor respects. For example:
Nevertheless, like any transliteration of an unknown set of symbols, v101 incorporates certain assumptions, as to what constitutes a glyph. The most consequential of these assumptions relate to the following groups of glyphs, which other transliterations such as EVA treat as glyph strings:
I like to think that Rayhel anticipated these concerns; and that he did not intend his v101 to be definitive. Indeed, the appellation v101 invites us to imagine further transliterations, numbered v102, v103 and so on.
A final reason why I like v101 is that, thanks to Rebecca Bettencourt of KreativeKorp, the glyphs are available as a Unicode font. Thereby, in Microsoft Word or Excel, we can write a line of v101 characters, change the font to Voynich v101, and see the glyphs in a form close to that which appears in the original document MS408 at the Beinecke Rare Book and Manuscript Library.
Considerations such as these prompted me, when writing Voynich Reconsidered, to work from the v101 transliteration and to depart from it whenever it was necessary for the statistical testing of hypotheses. Indeed, in the book I presented three alternative transliterations, which I numbered v102, v103 and v104.
In subsequent research, I started by developing twenty additional transliterations, currently numbered from v120 to v202. Their main characteristics are summarised below.

A summary of transliterations v101④ (i.e. v101 with 4o replaced by ④) to v202. Included but not shown in the F group: v101 glyphs {j}, {J}, {l}, {L}, [r}, {R}, {u}, {U}. Author’s analysis.
There were two common characteristics of these transliterations, as follows:
Examples of variant transliterations
An example may illustrate the differences between the transliterations. I selected page f1r, line 13, as a line of Voynich text which could be transliterated in many different ways. It includes some “gallows” glyphs, some “pedestal” glyphs, and some occurrences of the {2} group, the {8} group, {m} and {n}. It does not have any occurrences of {4o}, {A}, {C} or {cc}.
Below is a summary of my transliterations of this line in v101④ through v104.

Prioritising the transliterations
The issue then arose: was it possible to rank or prioritise these transliterations, in terms of their probability of a sensible mapping from a precursor language?
For any given transliteration, a natural and objective test would be to compare the glyph frequencies with the letter frequencies in selected precursor languages.
However, we do not know the precursor languages; we can only assign probabilities to various plausible candidates. In other posts on this platform, I have argued that, given that Voynich found the manuscript in Italy, the languages spoken and written in or near medieval Italy are more probable precursors than those more geographically distant. Therefore, I thought it reasonable to begin with medieval Italian, represented broadly by the OVI corpus, or more narrowly by Dante’s La Divina Commedia; with other Romance languages such as French, Galician-Portuguese and Latin; and with other European languages such as Bohemian and German.
If one or another transliteration produced a good correlation, we could then plan on some trial mappings.
In calculating the letter frequencies, I made no distinction between vowels and consonants. In other posts on this platform, I have advanced the hypothesis that the Voynich scribes, after mapping letters to glyphs, re-ordered the glyphs in each "word". If they did, the typical alternation of consonants and vowels, which is familiar to us in European languages, would disappear; and we would have no way of knowing which glyphs represented consonants and which vowels.
The correlations are simply the correlation coefficients calculated by the R-squared function (RSQ in Microsoft Excel). The coefficient is only calculated on the data for which matches can be made in both datasets. For example, Glen Claston’s basic v101 glyph set has seventy-one glyphs, while medieval Italian (in the OVI corpus) has only thirty-one letters, including accented letters. If we are matching v101 with medieval Italian, the correlation is calculated only for the thirty-one most frequent glyphs; the forty least frequent glyphs are ignored. Fortunately, those last forty glyphs have frequencies of less than 0.1 percent; and most of them can be combined with other glyphs which they resemble.
In each case, in order to have some confidence of using Voynich text containing (or derived from) a single language, I used the glyph frequencies from the “herbal” section of the manuscript.
Without further ado, below is a summary of the correlations between the glyph frequencies in my transliterations v101④ to v202, and the letter frequencies in ten candidate medieval languages (eight European, plus Arabic and Persian).

Correlations between glyph frequencies (v101④ through v202) and letter frequencies in potential precursor languages. I made no distinction between vowels and consonants (i.e., no implementation of the Sukhotin algorithm. Author’s analysis.
For each potential precursor language, we find that there is one transliteration (or sometimes two) that is a best fit for the letter frequencies in the language in question.
The logical next step is to select the best-fitting pairings of transliteration and language, and to attempt some trial mappings. My provisional procedure is as follows:
The v101 transliteration identifies far more glyphs than there are letters in any modern or medieval phonetic alphabet. Rayhel’s basic character set had seventy-one glyphs: for which he had to assign the following keys:
• the twenty-six lower-case letters of the Latin alphabet;In addition, Rayhel defined an extended character set with eighty-one additional glyphs, each of which occurs only once or a few times in the whole manuscript.
• plus twenty-five upper-case letters (all except O);
• plus the numbers from 1 to 9;
• plus eleven special characters such as $.
In many cases, Rayhel assigned different keys to glyphs that seemed to differ in minor respects. For example:
• His glyphs {2}, {3}, {5} and four others differ only in the shape and position of a small accent or diacritic.However, none of this matters. The researcher is free to treat any group of similar glyphs as one glyph; v101 places no constraint on doing so.
• His glyphs {6}, {7}, {8} and {&} are sufficiently similar that we might ascribe their differences to styles of handwriting.
• The differences between {f} and {u}, and between {g} and {j}, are tiny hooks or flourishes.
Nevertheless, like any transliteration of an unknown set of symbols, v101 incorporates certain assumptions, as to what constitutes a glyph. The most consequential of these assumptions relate to the following groups of glyphs, which other transliterations such as EVA treat as glyph strings:
• the v101 glyphs {m} and {n}; for example, the v101 {m} is {iin} in EVA;In cases such as these, the researcher who suspects the single glyphs in v101 to be strings must be prepared to break them up, as EVA does.
• the v101 “pedestal” glyphs {F}, {G}, {H} and {K}, and their relatives; for example, v101 {K} is {cth} in EVA;
• the v101 “multiple” glyphs {C} and its relatives, and {I}; the v101 glyph {C} is {ee} in EVA, and the v101 {I} is {ii} in EVA.
I like to think that Rayhel anticipated these concerns; and that he did not intend his v101 to be definitive. Indeed, the appellation v101 invites us to imagine further transliterations, numbered v102, v103 and so on.
A final reason why I like v101 is that, thanks to Rebecca Bettencourt of KreativeKorp, the glyphs are available as a Unicode font. Thereby, in Microsoft Word or Excel, we can write a line of v101 characters, change the font to Voynich v101, and see the glyphs in a form close to that which appears in the original document MS408 at the Beinecke Rare Book and Manuscript Library.
Considerations such as these prompted me, when writing Voynich Reconsidered, to work from the v101 transliteration and to depart from it whenever it was necessary for the statistical testing of hypotheses. Indeed, in the book I presented three alternative transliterations, which I numbered v102, v103 and v104.
In subsequent research, I started by developing twenty additional transliterations, currently numbered from v120 to v202. Their main characteristics are summarised below.

A summary of transliterations v101④ (i.e. v101 with 4o replaced by ④) to v202. Included but not shown in the F group: v101 glyphs {j}, {J}, {l}, {L}, [r}, {R}, {u}, {U}. Author’s analysis.
There were two common characteristics of these transliterations, as follows:
• In all cases, I redefined the v101 glyph string {4o} as a single glyph, to which I assigned the Unicode symbol ④. The v101 glyph {4}, in 96 percent of its occurrences, is followed by, and indeed joined to, what appears to be a v101 {o}; I felt that this pairing could not be two glyphs.I grouped the alternative transliterations from v120 to v191 in series, each characterised by a specific divergence from v101. For example, the v160 series shared a disaggregation of the “pedestal” glyphs, which I called the “F group”. In the v200 series, I experimented with combining several divergences from v101.
• In all cases, I assumed that a glyph retained its meaning (that is, its presumed precursor letter) independently of its position within the Voynich “word”.
Examples of variant transliterations
An example may illustrate the differences between the transliterations. I selected page f1r, line 13, as a line of Voynich text which could be transliterated in many different ways. It includes some “gallows” glyphs, some “pedestal” glyphs, and some occurrences of the {2} group, the {8} group, {m} and {n}. It does not have any occurrences of {4o}, {A}, {C} or {cc}.
Below is a summary of my transliterations of this line in v101④ through v104.

Prioritising the transliterations
The issue then arose: was it possible to rank or prioritise these transliterations, in terms of their probability of a sensible mapping from a precursor language?
For any given transliteration, a natural and objective test would be to compare the glyph frequencies with the letter frequencies in selected precursor languages.
However, we do not know the precursor languages; we can only assign probabilities to various plausible candidates. In other posts on this platform, I have argued that, given that Voynich found the manuscript in Italy, the languages spoken and written in or near medieval Italy are more probable precursors than those more geographically distant. Therefore, I thought it reasonable to begin with medieval Italian, represented broadly by the OVI corpus, or more narrowly by Dante’s La Divina Commedia; with other Romance languages such as French, Galician-Portuguese and Latin; and with other European languages such as Bohemian and German.
If one or another transliteration produced a good correlation, we could then plan on some trial mappings.
In calculating the letter frequencies, I made no distinction between vowels and consonants. In other posts on this platform, I have advanced the hypothesis that the Voynich scribes, after mapping letters to glyphs, re-ordered the glyphs in each "word". If they did, the typical alternation of consonants and vowels, which is familiar to us in European languages, would disappear; and we would have no way of knowing which glyphs represented consonants and which vowels.
The correlations are simply the correlation coefficients calculated by the R-squared function (RSQ in Microsoft Excel). The coefficient is only calculated on the data for which matches can be made in both datasets. For example, Glen Claston’s basic v101 glyph set has seventy-one glyphs, while medieval Italian (in the OVI corpus) has only thirty-one letters, including accented letters. If we are matching v101 with medieval Italian, the correlation is calculated only for the thirty-one most frequent glyphs; the forty least frequent glyphs are ignored. Fortunately, those last forty glyphs have frequencies of less than 0.1 percent; and most of them can be combined with other glyphs which they resemble.
In each case, in order to have some confidence of using Voynich text containing (or derived from) a single language, I used the glyph frequencies from the “herbal” section of the manuscript.
Without further ado, below is a summary of the correlations between the glyph frequencies in my transliterations v101④ to v202, and the letter frequencies in ten candidate medieval languages (eight European, plus Arabic and Persian).

Correlations between glyph frequencies (v101④ through v202) and letter frequencies in potential precursor languages. I made no distinction between vowels and consonants (i.e., no implementation of the Sukhotin algorithm. Author’s analysis.
For each potential precursor language, we find that there is one transliteration (or sometimes two) that is a best fit for the letter frequencies in the language in question.
The logical next step is to select the best-fitting pairings of transliteration and language, and to attempt some trial mappings. My provisional procedure is as follows:
• select (preferably randomly) a chunk of Voynich text, consisting of at least a whole line
• check the selected text for Zattera’s “separable words”; for this purpose, each “word” needs to be tested against Zattera’s “slot alphabet”, to see whether it conforms; and if not, separate the word with a space or spaces at appropriate points;
• map the text to the target precursor language, with a straight frequency mapping, without distinguishing vowels and consonants;
• check the resultant text strings against a suitable corpus or text of the target language, as to whether they are real words in the language;
• and in cases where a text string is not a real word, or is a rare word, examine the possibilities of re-arranging vowels and consonants in the string, with a view to finding a common real word (again by reference to the corpus or text)
• and finally, if a whole line is thus mapped to real words, examine whether the result makes any sense.
No comments have been added yet.
Great 20th century mysteries
In this platform on GoodReads/Amazon, I am assembling some of the backstories to my research for D. B. Cooper and Flight 305 (Schiffer Books, 2021), Mallory, Irvine, Everest: The Last Step But One (Pe
In this platform on GoodReads/Amazon, I am assembling some of the backstories to my research for D. B. Cooper and Flight 305 (Schiffer Books, 2021), Mallory, Irvine, Everest: The Last Step But One (Pen And Sword Books, April 2024), Voynich Reconsidered (Schiffer Books, August 2024), and D. B. Cooper and Flight 305 Revisited (Schiffer Books, coming in 2026),
These articles are also an expression of my gratitude to Schiffer and to Pen And Sword, for their investment in the design and production of these books.
Every word on this blog is written by me. Nothing is generated by so-called "artificial intelligence": which is certainly artificial but is not intelligence. ...more
These articles are also an expression of my gratitude to Schiffer and to Pen And Sword, for their investment in the design and production of these books.
Every word on this blog is written by me. Nothing is generated by so-called "artificial intelligence": which is certainly artificial but is not intelligence. ...more
- Robert H. Edwards's profile
- 67 followers
