Voynich Reconsidered: “leaf words” revisited
In the context of the Voynich manuscript, I use the term “leaf words” to denote “words” that occur immediately before (to the left of), or immediately after (to the right of) an intervening illustration. These illustrations are in most cases of plants, hence my expression “leaf word”.

Examples of "leaf words" on line 1, page f26r of the Voynich manuscript. Image credit: Beinecke Rare Book and Manuscript Library.
To my mind, there is no doubt that in the Voynich manuscript, someone (or more than one person) drew the illustrations first, and later the scribes wrote the text around and within the illustrations, taking care not to overwrite an illustration. Therefore, the “leaf words” may tell us something about how the scribes managed the text in relation to the available space; and possibly, about the instructions that the producer gave them.
Dr Steckley’s database contains 625 lines. Each line contains at least two “leaf words” (a left “word” and a right “word”); a few contain as many as nine. I counted a total of 1,508 occurrences of “leaf words”.
As I had expected, most of the “leaf words” occur in the “herbal” section, which accounts for 531 of the 625 lines. Other thematic sections represented are the “balneological” section (56 lines), the “pharmaceutical” section (25 lines) and the “cosmic” section (13 lines). The database represents all five of the scribes identified by Dr Lisa Fagin Davis. Of the lines in the database, 436 were written by Scribe 1, 127 by Scribe 2, 22 by Scribe 3, 13 by Scribe 4 and 27 by Scribe 5.

Examples of "leaf words" in the Voynich manuscript, page f75r, line 17. Image credit: Beinecke Rare Book and Manuscript Library.
It seemed to me that this database would permit testing of the following hypotheses:
• that the “leaf words” are the equivalent of hyphenated words: that is, fragments of longer “words” that were broken up by the illustrations (as if, when writing Shelley’s Ozymandias around an existing drawing, we had to write: “I met a traveller fro<->m an antique land”)
• that the “leaf words” are complete “words”, inserted to match the available space and remaining grammatically correct (as in: “I met a traveller far <-> from an antique land”)
• that the “leaf words” are complete “words” but are not grammatically correct (as in: “I met a traveller sky <-> from an antique land”)
• that the “leaf words” are not “words” but meaningless strings, inserted to fill the available space (as in “I met a traveller xkz <-> from an antique land”).
Accordingly, I parsed Dr Steckley’s 625 lines into their component parts, breaking the lines where an illustration intervened; and built a database of the “leaf words” to left and to right of the “plant breaks”. Working with just the first two “leaf words” on each line, my first test was to calculate the average lengths of the “words”. The results were as follows.
Short “words” to the left
The left “leaf words” had an average length of 3.49 glyphs; the right “leaf words” had an average length of 3.92 glyphs.
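For readers who wish to repeat this kind of calculation, the parsing and averaging can be sketched in a few lines of Python. This is only an illustration, not the procedure I actually followed: the input format is an assumption (one string per manuscript line, with a hypothetical "." between “words” and a hypothetical "<->" marking a plant break), the function names are invented, and the sample lines are made up. It also assumes that one transliteration character corresponds to one glyph.

```python
from statistics import mean

BREAK = "<->"  # assumed marker for an intervening illustration
SEP = "."      # assumed separator between "words"

def first_leaf_pair(line):
    """Return (left, right) "leaf words" around the first break, or None."""
    segments = line.split(BREAK)
    if len(segments) < 2:
        return None  # no illustration interrupts this line
    left_words = [w for w in segments[0].split(SEP) if w]
    right_words = [w for w in segments[1].split(SEP) if w]
    if not left_words or not right_words:
        return None
    # Last "word" before the break, first "word" after it.
    return left_words[-1], right_words[0]

def average_lengths(lines):
    """Average length, in characters, of left and right "leaf words"."""
    pairs = [p for p in map(first_leaf_pair, lines) if p is not None]
    left_avg = mean(len(left) for left, _ in pairs)
    right_avg = mean(len(right) for _, right in pairs)
    return left_avg, right_avg

# Toy, made-up lines; not taken from Dr Steckley's database.
sample = ["1oe.8am<->2c9.s", "s<->8an.1oy"]
left_avg, right_avg = average_lengths(sample)
print(left_avg, right_avg)
```

Only the first break on each line is used here, which corresponds to working with just the first two “leaf words” per line.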
Dr Steckley, in his recent paper with Noah Steckley, “Subtle Signs of Scribal Intent in the Voynich Manuscript”, found a similar result. Working with the pages of the “herbal” section written by Scribe 1, the Steckleys found that left “leaf words” (in their terminology, “before” tokens) had an average length of 4.24 glyphs; and right “leaf words” (“after” tokens) had an average length of 4.66 glyphs.
The Steckleys applied the chi-squared test of statistical significance, and concluded that there was some underlying causal mechanism that distinguished left “leaf words” from right “leaf words”. I have not performed an equivalent test; but the difference in the lengths of “words” seems to me significant.
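One simple way to check such a difference would be a permutation test on the two sets of lengths. The sketch below is not the Steckleys’ chi-squared procedure, and I have not run it on the real data: it is a substitute technique, shown on made-up numbers, with an invented function name. It treats the left and right lengths as two independent samples, which ignores the fact that they come in pairs from the same line.

```python
import random
from statistics import mean

def permutation_p_value(left_lengths, right_lengths, trials=10_000, seed=0):
    """Two-sided permutation test on the difference of mean lengths."""
    rng = random.Random(seed)
    observed = abs(mean(right_lengths) - mean(left_lengths))
    pooled = list(left_lengths) + list(right_lengths)
    n_left = len(left_lengths)
    hits = 0
    for _ in range(trials):
        # Reassign the labels "left" and "right" at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_left:]) - mean(pooled[:n_left]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

# Toy, made-up lengths; not the real "leaf word" data.
left = [1, 2, 3, 3, 4]
right = [3, 4, 4, 5, 6]
print(permutation_p_value(left, right, trials=2000))
```

A small p-value would suggest that a difference as large as the observed one rarely arises when the labels are shuffled at random.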
From these results, we are encouraged to think that the left “leaf words” were inserted simply to fill the space that was available as the text approached the illustration.
For me, this is an unexpected and somewhat counter-intuitive inference. I have imagined the scribes as copyists. That is, they were professional writers, paid by the page, and working from documents that the producer or client had provided. In this scenario, the producer instructed the scribes to transliterate from letters in a (presumed) natural script to what we now call the Voynich glyphs; but did not give them authority or instructions to add or insert anything that did not represent the original documents.
Yet, in the left “leaf words” at least, we see some evidence of the scribe as author: that is, the scribe appearing to insert “words” or strings that the producer did not provide, in order to achieve a certain visual or artistic effect.
This result, in itself, did not imply that the “leaf words” were either meaningful, or what we might call junk. For that, I needed another test.
The top “leaf words”
Using the word counter at https://www.browserling.com/tools/wor..., I assembled rankings of the “leaf words” in descending order of frequency. These rankings could then be compared with the frequencies of the same “words” in the Voynich manuscript as a whole. A summary of the results is below.

The top ten left “leaf words”; the top ten right “leaf words”; and their frequencies in the Voynich manuscript as a whole. Highlighted frequencies denote cases where the “word” is much more common as a “leaf word” than in the manuscript as a whole. Author’s analysis, based on data kindly provided by Dr Andrew Steckley. Higher resolution at https://flic.kr/p/2q3TgRu
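The same ranking and comparison could be done without an online word counter. The following is a minimal sketch, assuming the “leaf words” and the whole transliteration are each available as a plain list of strings; the function names and the toy lists are hypothetical and are not my actual data.

```python
from collections import Counter

def ranked_shares(words):
    """Return [(word, count, share)] in descending order of frequency."""
    counts = Counter(words)
    total = sum(counts.values())
    return [(w, c, c / total) for w, c in counts.most_common()]

def top_share(words, n=5):
    """Share of all occurrences accounted for by the n most frequent words."""
    return sum(share for _, _, share in ranked_shares(words)[:n])

def enrichment(leaf_words, all_words, top=10):
    """Ratio of share among leaf words to share in the whole text."""
    whole = Counter(all_words)
    whole_total = sum(whole.values())
    result = {}
    for word, _, leaf_share in ranked_shares(leaf_words)[:top]:
        whole_share = whole[word] / whole_total
        # A word absent from the whole text gets an infinite ratio.
        result[word] = leaf_share / whole_share if whole_share else float("inf")
    return result

# Toy, made-up lists for illustration only.
leaf = ["s", "8am", "s", "89"]
whole_text = ["s", "8am", "s", "89", "1oe", "1oe", "8am", "2c9"]
print(ranked_shares(leaf))
print(top_share(leaf, n=2))
print(enrichment(leaf, whole_text))
```

A ratio well above 1 marks a “word” that is more common next to illustrations than in the text as a whole.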
I am inclined to read these frequency comparisons as follows.
• Both the left “leaf words” and the right “leaf words” seem to have a relatively compact vocabulary, in which certain “words” are used again and again; for example, among the left “leaf words”, the top five account for 14.5 percent of all occurrences of such “words”. This is consistent with natural language. For example, in the Brown corpus of modern American English, the top five words (“the”, “of”, “and”, “to” and “a”) account for 15.7 percent of all words in the corpus.
• There is some overlap in the vocabularies of the left and right “leaf words”. For example, the v101 “words” {s}, {8am} and {8an} occur frequently both as left and as right “leaf words”. This encourages us to think that they are not fragments of “words”: they are not evidence of a form of hyphenation.
• Nearly all of the common “leaf words” are common in the Voynich manuscript as a whole. Again, they do not look like fragments or parts of hyphenated “words”.
• Of the top ten “leaf words”, most are vastly more frequent as “leaf words” than in the manuscript as a whole. For example, the v101 “word” {89} accounts for 4.6 percent of all the left “leaf words”. To my mind, this encourages the inference that the “leaf words”, or some of them, are arbitrary fillers, or junk. We might expect a left “leaf word” to be a filler, as the text approaches the illustration and the available space contracts. But it is more surprising that the right “leaf words”, or some of them, although less constrained by space, should also seem to be junk.
Finally, if we are starting to suspect that the “leaf words” are junk, we have to ask the question: are they junk only next to illustrations, or are they junk wherever they occur? For example, is the v101 “word” {8am}, which is the most common “word” in the manuscript, just a filler, and not a meaningful “word”?
One way to test this would be to remove all occurrences of, for example, {8am} from the transliteration; to recalculate the frequencies of the glyphs that remain; and to test whether the remaining common “words” can be mapped to real words in natural languages.
For the moment, I am not inclined to attempt such tests: if only because, as I have reported in other articles on this platform, the “word” {8am} seems capable of mapping to real words in several medieval natural languages that I have examined. For example, {8am} can be mapped to منع (“prevention”) in Arabic; or “and” in English; or “con” or “dio” in Italian.
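Should anyone wish to try the removal test described above, the first two steps are straightforward to sketch. The code below assumes the transliteration is available as a list of “words” and that each character stands for one glyph; the function name and the toy list are hypothetical. The third step, the mapping to natural languages, is not attempted here.

```python
from collections import Counter

def glyph_frequencies_without(words, excluded):
    """Relative glyph frequencies after removing every occurrence of one "word"."""
    remaining = [w for w in words if w != excluded]
    # Count each character of each remaining "word" as one glyph.
    glyphs = Counter(g for w in remaining for g in w)
    total = sum(glyphs.values())
    return {g: n / total for g, n in glyphs.most_common()}

# Toy, made-up list; not a real transliteration.
words = ["8am", "1oe", "8am", "1oy", "s"]
print(glyph_frequencies_without(words, "8am"))
```

Only whole “words” equal to the excluded string are removed; the same glyphs inside other “words” are kept and counted.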
Next steps
I think that the true test of the function of “leaf words” will come when, or if, it becomes possible to map whole lines of Voynich text. As noted above, I have had some encouraging results in mapping {8am} to real words in some natural languages. I envisage a subsequent step of attempting mappings of other common “words”: probably “words” of at least three glyphs, such as the v101 “words” {1oe}, {1oy} or {2c9}. If these mappings yield real words in any language, we may have enough correspondence of Voynich glyphs to letters to attempt a whole line. At that point, it may become clearer whether the “leaf words” represent words or junk.
Published on July 13, 2024 02:20
Great 20th century mysteries
On this platform on GoodReads/Amazon, I am assembling some of the backstories to my research for D. B. Cooper and Flight 305 (Schiffer Books, 2021), Mallory, Irvine, Everest: The Last Step But One (Pen And Sword Books, April 2024), Voynich Reconsidered (Schiffer Books, August 2024), and D. B. Cooper and Flight 305 Revisited (Schiffer Books, coming in 2026).
These articles are also an expression of my gratitude to Schiffer and to Pen And Sword, for their investment in the design and production of these books.
Every word on this blog is written by me. Nothing is generated by so-called "artificial intelligence": which is certainly artificial but is not intelligence.
- Robert H. Edwards's profile