Brain Science Podcast discussion > Off Topic > Theory of Reading and a great example


http://www.trainingzone.co.uk/anyansw...
I will now try to explain why we can read the above.
In his 1996 book, David Caplan explained how experiments in Cognitive Psychology suggest that there are 8 dictionaries in the brain, which can be classified into a taxonomy that branches first between speech and text, then between input and output, and finally between whole-word and part-word units (phonemes for speech, graphemes for text). This work was the basis of the distinction between the shallow and deep variants of dyslexia, and it found its way into Eysenck's undergraduate texts on Cognitive Psychology.
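Just to make that taxonomy concrete, here is a small enumeration of the branching just described (the loop and the labels are my own, not Caplan's notation); the three binary choices generate exactly eight dictionaries:

from itertools import product

# The three binary branches described above give 2 x 2 x 2 = 8 dictionaries.
modalities = ["speech", "text"]
directions = ["input", "output"]
granularities = ["whole-word", "part-word"]   # phonemes for speech, graphemes for text

for modality, direction, granularity in product(modalities, directions, granularities):
    unit = "phoneme" if (modality == "speech" and granularity == "part-word") else \
           "grapheme" if (modality == "text" and granularity == "part-word") else "whole word"
    print(f"{granularity}-{modality}-{direction} dictionary (unit: {unit})")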
In the case of text input, the data originally comes from the hypercolumns in the striate cortex at the back of the brain, where short lines or “edges” at different angles are extracted from the image coming from the eye.
The whole-word-text-input dictionary matches all the edges in a single word against the vocabulary it contains and finds the closest matching known word. It does this with a lot of neurons working in parallel, so word recognition is very fast. It seems likely that the white spaces between letters also form edges that are recognized. (Think of the optical illusion with the vase and the two faces in profile, where foreground and background can easily be interchanged.)
Since we have to recognise different fonts, or handwriting styles in which some letters may be pushed down below the main line of the word, or even be suppressed completely, recognition is always a statistical process that can be characterised by a weighting indicating how good the match is.
The part-word-text-input dictionary works more slowly, matching at any one time only a single letter, or possibly two or three letters that our brains have learnt frequently occur together. So treating the whole of a word is a serial process. (Artificial Intelligence programmers can mimic this sort of word look-up with a trie data structure; a sketch follows below.) This serial matching seems to go on in parallel with the operation of the whole-word-text-input dictionary, but if the latter returns a high weighting, the part-word results are discarded before they are complete. Otherwise, reading becomes slower as the results from the two dictionaries have to be compared and a consensus reached by some kind of voting strategy.
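Here is a minimal Python sketch of that trie idea, with a toy vocabulary of my own invention (nothing here comes from Caplan): it stores a handful of words and reports which letters are admissible after a given prefix, which is the serial, letter-by-letter constraint described above.

# A trie over a toy vocabulary, walked letter by letter the way the
# part-word route is described above.

class TrieNode:
    def __init__(self):
        self.children = {}    # letter -> TrieNode
        self.is_word = False

def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True
    return root

def admissible_next_letters(root, prefix):
    """Letters that can legally follow `prefix` in the stored vocabulary."""
    node = root
    for letter in prefix:
        node = node.children.get(letter)
        if node is None:
            return set()      # prefix not in the vocabulary at all
    return set(node.children)

vocab = ["be", "bed", "bet", "best", "better", "beyond", "being"]
trie = build_trie(vocab)
print(admissible_next_letters(trie, "be"))   # {'d', 't', 's', 'y', 'i'}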
The funny text message above is a bit of a cheat. Most of the letter substitutions maintain a similar set of edges although these have been shuffled around a bit.
5 --> S
1 --> I
4 --> A
In this last case, on both sides of the substitution there is a horizontal bar and a diagonal at the left; on the right there is a vertical edge in ‘4’, but a similarly-sized and positioned diagonal in ‘A’. Screw up your eyes a bit, and both sides look very similar.
The hardest substitution to make is
3 --> E
but even here the curves at the top and bottom of ‘3’ are fairly similar to the bars at the top and bottom of ‘E’, and ‘3’ even contains what is close to a vertical edge, although in ‘E’ the matching vertical edge is shifted to the left. And since ‘E’ is the commonest letter in English, that match already has a high prior probability of being selected. We can fancify this up with Bayesian statistics, but another way of looking at it is that a trained Neural Net will automatically tolerate a wider variation in the formation of ‘E’ than for other letters, since its training set will present ‘E’ more frequently. For a subject who reads a lot of handwriting, the shape of that ‘E’ will vary a lot depending on its surrounding letters. So later processing levels will tend to prefer ‘E’ to less frequent false matches (and no words in our dictionary contain digits).
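As a hedged illustration of that Bayesian point, here is a toy calculation; the letter priors are rough English frequencies, but the shape-match likelihoods are numbers I have invented purely for the example.

# Toy Bayes: P(letter | shape) is proportional to P(shape | letter) * P(letter).
# The likelihoods are made-up scores for how well the glyph '3' matches each
# candidate letter; the priors are approximate English letter frequencies.
# The point is only that a frequent letter like 'E' can win even with an
# imperfect shape match.

priors = {"E": 0.127, "B": 0.015, "S": 0.063}        # rough English frequencies
likelihood_of_3 = {"E": 0.4, "B": 0.5, "S": 0.2}     # invented shape-match scores

unnormalised = {c: likelihood_of_3[c] * priors[c] for c in priors}
total = sum(unnormalised.values())
posterior = {c: p / total for c, p in unnormalised.items()}
print(posterior)   # 'E' comes out on top despite a weaker raw shape match than 'B'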
Experiments in AI with artificial Neural Nets show that they will quite automatically learn to extract differing metrics from a presented instance. In this case, the number of edges is the same (if the striate cortex presents alternative edge interpretations of what are slight curves), giving a high weighting, but their relative positioning forms a second metric where the match is not so good. That suggests that the whole-word dictionary will produce a reasonably good match even where ‘3’ is substituted for ‘E’.
The part-word-text-input dictionary is more accurate and less tolerant, since as you descend through the trie that encodes the whole vocabulary, only a small number of potential letters is admissible at any point. For example, in a person with a normal English vocabulary, “be” cannot be followed by any of {k, o, p, u, x} (unless we admit proper nouns such as Beowulf). If a letter is incorrectly identified, the chance that the next letter just happens to be admissible becomes very small indeed.
So part-word recognition is more reliable than whole-word recognition over the whole word. Another mechanism that comes into play is sub-vocalisation, where we move our lips and tongues to match the letters as we work through a word. Since we have a part-word-speech-input dictionary, this gives a second method of checking. Although we might confuse a “h” with a “b” based on its edges, the way we sound the two letters is completely different. It therefore probably jars with you that I did not write “an ‘h’”. I bet you didn’t notice yourself sub-vocalising, in which case it was a subconscious process.
Finally, Cognitive Psychology teaches us that there is a lot of “top-down processing”, where we interpret things according to what we expect. “Priming” is an example of this, and it can be implemented here by a dictionary of frequent consecutive word pairs, or “collocations”. In this example, having seen “without”, we become more likely to recognize “even”. In fact, the only words I can think of that come frequently after “without” are {even, a, the, risk, damage, danger, loss}. At a higher linguistic level, where we allow for at least some degree of parsing, if “without even” is followed by a verb, it must end in “-ing”. Clearly we are parsing as we read, since that is an essential step in understanding the semantic meaning of a sentence. (A toy sketch of this kind of collocation look-up follows below.)
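A toy sketch of that collocation-based priming, with bigram counts invented for the example (a real system would estimate them from a corpus):

# Priming via a bigram "collocation dictionary": after seeing one word, the
# expectation of each possible next word is proportional to the bigram count.

bigram_counts = {
    "without": {"even": 120, "a": 300, "the": 250,
                "risk": 40, "damage": 15, "loss": 25},
}

def primed_expectations(previous_word):
    """Return P(next | previous) for every next word we have counts for."""
    counts = bigram_counts.get(previous_word, {})
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

print(primed_expectations("without"))
# 'a' and 'the' dominate, but 'even' still gets a healthy boost compared
# with a word that never follows 'without' at all.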
I like the multi-level model of language that Jackendoff presents in his 2003 book. I haven’t yet read his 2010 book and I really should make time for that.
Sorry if I have bored you. But the funny text seemed so rich in unexplained phenomena that I haven’t been able to stop thinking about it.
John wrote: "A friend who is a retired General Practitioner sent this to me. I have my own theories of why we can read it, which I will add later. Other people may have other theories. I think it reveals a lot ab..."
There is a pretty convincing discussion of this sort of thing in Reading in the Brain, by Stanislas Dehaene. I don't remember the details, but it was more straightforward and didn't involve the brain containing 8 dictionaries.
One of the themes of Dehaene's book is that reading makes use of the parts of the brain non-readers use to recognize objects, which is why many letters of successful alphabets resemble common objects. The other idea is that since, once we learn to read, we see whole words, we tend to ignore wrong letters/numbers and "see" what we expect. A similar phenomenon is observed when listening to spoken words.

Thanks. I have ordered Dehaene. I do buy into Caplan's model, because of personal experience. Eighteen years ago a cousin had an operation to remove a brain tumour, and he lost some of his language areas. He could not read, but where he could remember people's addresses for Christmas cards, he could write them out as long as he kept his eyes shut.
If he opened them, his writing just collapsed. (Two years with a Speech Therapist brought his reading back again.) Caplan describes these sorts of cases. The explanation is that a whole-word-text-output dictionary is used for writing, but the brain monitors the output using its two text-input dictionaries, one for whole words and one for part-words. Where these have been lesioned, the feedback is faulty and the writing breaks down.
This sort of lesioning evidence, and that derived from people reading from computer screens where programs modified the letters, were the basis of the 8-dictionary model that Caplan described.
It is surprising to me that Caplan's work was not referenced in "Proust and the Squid", though I have not done a full literature survey based on citations, so there could have been later work invalidating it. The arguments that Caplan presented seemed very convincing to me.
John

I am part-way through Dehaene, and it is very good indeed. From the dates of his references, he bases the book on the same research that Caplan used. He has slight differences in emphasis, but he does refer in passing to the grapheme-sequence versus whole-word models of reading. He indicates that a number of competing models attempt to explain the experimental data, but he emphasises the contrast between phonetic and whole-word recognition techniques.
I still have lots more to read. Really good stuff!


I suspect that the adaptation is linked to our tolerance of different handwriting styles. With type-written text we can set the threshold demanded of our matching circuitry very high and still get lots of positive word identifications from our internal vocabulary. With this sort of text, on the other hand, we have to accept a word-match with a much lower matching coefficient.
Of course the words have been carefully chosen so that we are not bothered by multiple fairly close matches.
And I think we then look up word-pairs in high-frequency collocation lists, and that is where our previously low-weighted choices have their weightings raised (a toy sketch of this re-weighting follows after the examples below).
Eg. serves to
can do
thinking about
There also seems to be some classification of words by part of speech, since the following are high-frequency collocations:
to {verb-stem}
can {verb-stem}
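Here is a hedged sketch of the re-weighting idea, with all scores invented for the example: a noisy visual match gives two candidate words similar raw weights, and the collocation with the previous word pulls the right one ahead.

# Each candidate arrives with a low-confidence weight from the noisy
# letter-level match; a bigram collocation score (invented numbers) nudges
# the expected continuation back up.

collocation_score = {("serves", "to"): 0.9,
                     ("serves", "two"): 0.1,
                     ("can", "do"): 0.8}

def reweight(previous_word, candidates):
    """candidates: {word: raw weight from the visual match}."""
    return {
        w: raw * (1.0 + collocation_score.get((previous_word, w), 0.0))
        for w, raw in candidates.items()
    }

# Leet-style glyphs give 'to' and 'two' similar raw visual weights,
# but the collocation with 'serves' pulls 'to' ahead.
print(reweight("serves", {"to": 0.4, "two": 0.45}))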
We also seem to find lists of phrases with repeated words and rhythms very compelling:
amazing things
impressive things
I am reminded of
This is the dog that worried the cat
That killed the rat that ate the malt
That lay in the house that Jack built.
People who used to read a lot of handwriting, like teachers, would be expected to find relaxing their thresholds much easier than a new generation raised on a handful of Windows fonts.
My sources are various cognitive psychology books, Caplan, Dehaene, several of Pinker's books, and for collocation lists Jurafsky and Martin.
I find it quite amazing that we can do such recognition over a vocabulary of maybe 100,000 words, and for some people over 2 or 3 languages.

Reading seems to be completely non-linear even though it appears seamless to the conscious mind. Would you say reading is generally recursive? There are tons of comparisons and calculations made against past words as each new word is seen. Plus you have a whole other system that takes in the big picture to actually internalize what is being said and think about it.

"The horse raced past the barn fell" is a popular one, which I found personally found very hard to parse using conventional recursive models of English grammar.
"The horse raced past the barn door" is on the other hand very easy, and so is
"The taxi booked by John is outside the house".
The collocation "barn fell" is very low frequency, compared to "barn door".
The collocation {horse, race}ACTIVE covers
"horce races"
"horces race"
"horse racing"
"horse raced"
but the collocation {taxi, book}PASSIVE covers
"taxi booked"
"taxi bookings"
Most sentences are active, of the form:
Agent Verb Patient in semantics or
Subject Verb Object in parts-of-speech
Once you use collocations to group words, a Subject/Object group always has a head at the right-hand end which is either a noun or an -ing verb. The thing between your two noun-phrases is always the verb group. A passive sentence always has a main -ed verb preceded by a sequence of modal and auxiliary verbs, e.g.
"could have been booked"
Illegal sequences, like "been have could booked" just never occur, and are meaningless, so we don't have to check for them.
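That verb-group pattern is simple enough to write down as a toy regular expression; the modal and auxiliary word lists below are drastically shortened and irregular participles are ignored, so this is only a sketch of the idea, not a full grammar.

import re

# Passive verb group as described above: an optional modal, any number of
# auxiliaries, then an -ed past participle. Word lists are deliberately short.
MODALS = r"(?:could|would|should|can|may|might|must|will)"
AUX = r"(?:have|has|had|be|been|being|is|was|were|are)"
PASSIVE_VERB_GROUP = re.compile(rf"\b(?:{MODALS}\s+)?(?:{AUX}\s+)*\w+ed\b")

print(bool(PASSIVE_VERB_GROUP.fullmatch("could have been booked")))   # True
print(bool(PASSIVE_VERB_GROUP.fullmatch("been have could booked")))   # False: wrong word order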
With this sort of approach, you avoid an excessive number of grammar rules, compared to recursive grammar rule systems which end up with tens of thousands of rules, and take a long time to parse.
It also makes a semantic interpretation of a sentence available at a very early stage, so we can check our statistical assumptions against internal real-world ontologies.
In "The hippo raced past the barn door",
even though we almost certainly do not assign high frequency to the collocation
{hippo race}ACTIVE,
we must be consulting an ontology that quickly tells us that "hippo" and "horse" share a common ancestor so that {horse, race}ACTIVE implies that
{hippo, race}ACTIVE is also high frequency.
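A hedged sketch of that ontology consultation; the tiny taxonomy below is hand-made for the example rather than taken from a real lexical resource, but it shows how a collocation learnt for one noun can be extended to a relative.

# Hand-made toy ontology: child -> parent.
ontology = {
    "horse": "large_quadruped",
    "hippo": "large_quadruped",
    "taxi": "vehicle",
    "large_quadruped": "animal",
    "vehicle": "artifact",
}

def ancestors(word):
    chain = []
    while word in ontology:
        word = ontology[word]
        chain.append(word)
    return chain

def share_ancestor(a, b):
    return bool(set(ancestors(a)) & set(ancestors(b)))

known_collocations = {("horse", "race"): "ACTIVE"}

def inferred_collocation(noun, verb):
    """Fall back on a related noun's collocation if the exact pair is unknown."""
    if (noun, verb) in known_collocations:
        return known_collocations[(noun, verb)]
    for (n, v), label in known_collocations.items():
        if v == verb and share_ancestor(noun, n):
            return label                 # e.g. hippo inherits horse's {race}ACTIVE
    return None

print(inferred_collocation("hippo", "race"))   # ACTIVE, via the shared ancestor
print(inferred_collocation("taxi", "race"))    # None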
Linking pronouns to their earlier referents is a fairly easy linear algorithm, which Jurafsky and Martin explain pretty clearly.
Once you have the main verb of each sentence, producing a semantic interpretation is computationally easy, e.g.
IS(taxi booked by John, outside the house)
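A hedged sketch of that last step, with the chunks supplied by hand (in the model above they would come from the collocation-guided grouping of noun phrases around the verb group): once the verb group is identified, the predicate is just the material on either side of it.

# Toy construction of a predicate like IS(taxi booked by John, outside the house).
def semantic_predicate(chunks, main_verb="is"):
    """chunks: list of word groups with the verb group already identified."""
    i = chunks.index(main_verb)
    subject = " ".join(chunks[:i])
    complement = " ".join(chunks[i + 1:])
    return f"{main_verb.upper()}({subject}, {complement})"

chunks = ["the taxi booked by John", "is", "outside the house"]
print(semantic_predicate(chunks))   # IS(the taxi booked by John, outside the house)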
I think this is the model that Jackendoff is moving towards, where word morphology guides early parsing, and there is early reference to our real-world ontologies.
TH15 M3554G3
53RV35 TO PR0V3 H0W 0UR M1ND5
C4N D0 4M4Z1NG TH1NG5!
1MPR3551V3 TH1NG5! 1N TH3
B3G1NN1NG 1T WA5 H4RD BUT NOW,
0N TH15 L1N3 YOUR M1ND 1S R34D1NG 1T
4UT0M4T1C4LLY W1TH 0UT 3V3N
TH1NK1NG 4B0UT 1T, B3 PROUD!
0NLY C3RT41N P30PL3
C4N R3AD TH15.