Yong Huang's Blog: Learning Spanish and French Words Through Etymology and Mnemonics

February 27, 2021

Dual-gender French noun

Some French nouns can take either masculine or feminine gender, with a different meaning for each. There's an obscure rule about the gender-meaning relationship for these words, which I call the "masculine-concrete, feminine-abstract" rule. Take the following headwords from my Learning French Words book as examples:

mémoire memory (fem. n.); memo, memoir (masc. n.). The first, and therefore more common, meaning is obviously the more abstract one, and so tends to be feminine.

critique criticism (fem. n.); critic (masc. or fem. n.); critical (adj.). Note that in the sense of “criticism”, the word is feminine. Obviously “criticism” is more abstract than “critic” (denoting a person).

mort dead; death (fem. n.); dead person (masc. n.); past participle of mourir (“to die”). Cognate with mortal (“subject to death”), mortify (“to humiliate”, but literally “to cause death”), mortuary. As a rule, the more abstract sense of a noun takes the feminine form and the concrete sense the masculine.

médecin (masc.) doctor, docteur, physician. médecine medicine (field of study). médicament medicine, medication. Note that the masculine word médecin can refer to a male or a female doctor; une femme médecin is used if the sex needs to be specified (just as the feminine word personne can refer to a person of either sex). The feminine word médecine does not refer to a female doctor, nor to medicine as a drug or medication, but to the field of study, i.e. medical science (see §3 of the Notes of this book for the tendency of a feminine noun to refer to an abstract concept). To refer to a female doctor, you can also just say une médecin, even though the word is grammatically masculine. Note the spelling of the two words: the second vowel is e, while it is i in médicament. Also note that médecin should not be confused with the unrelated French word physicien (“physicist”, not “physician”).

moyenne average (fem. n.); feminine singular of moyen (“average”, masc. adj.). At least as a mnemonic, remember the rule that abstract entities tend to be feminine: the noun, which denotes an abstract entity, has the same form as the feminine of the adjective.

Etymologist Auguste Brachet in his An Etymological Dictionary of the French Language remarked that “[in the case where a concrete substantive or noun took an abstract sense] the concrete substantive is often masculine, whereas the abstract was feminine”. This phenomenon was also observed by non-specialists such as Simone de Beauvoir, who in The Second Sex (trans. H.M. Parshley) said “most abstract entities are feminine”.

This "masculine-concrete and feminine-abstract" rule does have exceptions, though. For example, masculine tourment means "torment", "torture". The feminine tourmente means "storm". But we can't say a storm is more abstract than torture. There's no good explanation for this semantic separation.

This tendency can be traced back to Latin, before the words were inherited by Old French. But in general there is no good morpho-semantic justification for it. Nevertheless, the rule serves as a convenient mnemonic, which the reader of my book is reminded of wherever appropriate.
Published on February 27, 2021 19:48

January 2, 2021

Frequency of various conjugated forms: persons and numbers

According to Google Ngram, the frequencies of these two-word sentences (entered as the query "I go,you go,he goes,she goes,we go,they go"), at least from 2000 to 2019, are in this order (>> here means "much greater than" and ~ "approximately equal to"):
"you go" >> "I go" > "we go" >> "they go" > "it goes" > "he goes" > "she goes"
It's not surprising "you go" is high; it merges singular and plural second person forms. But "it goes", "he goes" and "she goes" are so low that even when they are combined the frequency is still lower than that of "we go". That surprises me.
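If you want to rerun a comparison like this yourself, the whole list of phrases can go into a single Ngram Viewer chart. The sketch below builds such a chart address. It is only a convenience, and it rests on an assumption: the query-string names (content, year_start, year_end, smoothing, corpus) are copied from the address bar of the viewer, not from any documented API. The corpus identifier, which selects English, French or Spanish, is left for the caller to fill in.

```python
from urllib.parse import urlencode

# Chart page of the Google Books Ngram Viewer.
NGRAM_BASE = "https://books.google.com/ngrams/graph"


def ngram_url(phrases, year_start, year_end, corpus=None, smoothing=0):
    """Build one Ngram Viewer URL that charts several phrases together.

    The parameter names mirror the viewer's own address bar (an
    assumption, not a documented API). The corpus identifier is left to
    the caller because its value differs by language and dataset.
    """
    params = {
        "content": ",".join(phrases),  # phrases are comma-separated
        "year_start": year_start,
        "year_end": year_end,
        "smoothing": smoothing,
    }
    if corpus is not None:
        params["corpus"] = corpus
    # urlencode turns spaces into "+" and commas into "%2C".
    return NGRAM_BASE + "?" + urlencode(params)


english = ["I go", "you go", "he goes", "she goes", "we go", "they go"]
print(ngram_url(english, 2000, 2019))
```

Putting every phrase into one URL keeps them on the same chart, so their curves are compared on a common scale. The ordering above comes from reading a chart like that.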
Let's check "do" in place of "go":
The result is about the same, except for the first-person singular "I do", which is very high presumably due to the wedding vow.

Let's check the various forms of French "aller" ('to go'), including "on va":
The result is:
"je vais" >> "tu vas" ~ "nous allons" ~ "il va" ~ "on va" ~ "vous allez" > "elle va" ~ "ils vont".
It's interesting to see "je vais" with a much higher frequency. And if we add up "il va" and "elle va", the combined frequency will be higher than "tu vas", but still much lower than that of "je vais".

Lastly, let's check Spanish "ir" ('to go'):
Probably due to a problem with data availability, I had to choose the time range 1990-2007 instead of 1800-2019 or 2000-2019. Since Spanish is a pro-drop language, I omitted the pronouns, which makes the input a little simpler. The result is:
"va" >> "van" >> "voy" ~ "vamos" > "vas" > "vais".
I don't know why the third-person forms are so much more frequent, especially the singular. In any case, this is quite different from English and French, where the second person (singular and plural combined) and the first-person singular, respectively, are by far the most frequent.

Knowing these frequencies is helpful for foreign language education, maybe not so much for English, but for most other languages. For example, we all know we should not try to memorize all tenses and conjugated forms with equal effort at the beginning of our study. Prioritizing based on frequency has practical implications. While tenses, moods and voices are intuitively ranked by frequency, persons and numbers are less so. With the above Google Ngram frequencies for different persons and numbers, textbook and dictionary writers may choose to give more example sentences in the more frequent persons and numbers to optimize learners' study.

The above work is relevant to my books. I have started to add example sentences to my Learning French Words book, but only for the first few hundred most common words, or for words that are better understood with examples. I'll certainly avoid uncommon tenses in these sentences and use only the present indicative and occasionally the imperfect. As to persons and numbers, however, so far I have given examples in the third-person singular more than in any other, because that has been my impression of the most frequent one. Not so, says Google Ngram, in spite of its known defects. So I may switch to the first-person singular more often. For Spanish, on the other hand, the third-person singular is indeed the most frequent. I haven't had time to update my Spanish book with example sentences yet, but I will. That's one of my projects for 2021.

* This message was originally posted at
Published on January 02, 2021 18:16

November 8, 2020

Example headwords in "Learning French/Spanish Words" that mention heads of government

The following are some example headword entries in my Learning French Words Through Etymology and Mnemonics that mention heads of government.

putain (vulgar) whore, bitch, prostitute; (vulgar) fuck (interj.), bloody hell. Spanish puta (“prostitute”), a possible cognate, has entered English vocabulary. Also possibly cognate with putrid (“rotten”, “stinky”) and putrefy (“to rot”), which can certainly be used as mnemonics. This word is absolutely unrelated to the name of the Russian president, commonly Latinized as Putin; the French transliteration of his name has been deliberately adjusted to Poutine to avoid confusion as well as for phonological reasons (see www.nytimes.com/2005/04/03/magazine/p...). Also unrelated to poutine (a type of Canadian food). See also pute (“whore”).

tromper to deceive, to cheat; (reflexive) to be wrong. Cognate with trumpet. According to A. Brachet, an etymologist, this word means “properly to play the horn, alluding to quacks and mountebanks, who attracted the public by blowing a horn, and then cheated them into buying; thence to cheat”. You may also create a mnemonic if the word sounds like the name of a politician you don’t trust. To balance that joke, though, also learn the less common word berner, which has the same meaning.

gilet waistcoat, vest, sleeveless jacket. Of Arabic origin, but ultimately from Turkish. This word has entered English vocabulary. Etymologist Edward Pick believes the word is related to guile (“deceit”, “deceptiveness”, “astuteness”). Alternatively, use a mnemonic such as “He put on a waistcoat because it was chilly.” Or imagine an advertising poster featuring a handsome, clean-shaven young man in a sleeveless jacket holding a Gillette razor. In 2018, the Mouvement des gilets jaunes (Yellow Vests Movement) in France, described by some as the most violent protest since 1968, called for President Macron’s resignation.

In Learning Spanish Words Through Etymology and Mnemonics:

trampa trap (cognate), snare; cheating. Mary Trump in her book Too Much and Never Enough (Spanish edition) says hace trampa como una forma de vida (“he cheats as a way of life”).
Published on November 08, 2020 08:52

September 25, 2020

Michel de Montaigne and essai, ensayo, saggio

The French thinker, philosopher, and essayist Michel de Montaigne (1533-1592) inadvertently gave the words essai in French, ensayo in Spanish, and saggio in Italian the sense of "essay". According to Wikipedia, Montaigne "was one of the most significant philosophers of the French Renaissance, known for popularizing the essay as a literary genre". His works are collected in a book titled Essais, which literally means "trials", "experiments", "attempts", "assays". Apparently he gave his book this title out of modesty, as if he did not want to advise readers in an overly confident tone, even though most of his writings have been considered gems of reflection on life, both in his time and ever since. Due to the great influence of his essays, the word essai acquired a new sense, "essay", meaning the literary genre or an article written in this genre. This sense entered Spanish ensayo and Italian saggio, both of which now mean both "trial" and "essay".[note]

When the Middle French word essai entered English in the 16th century, it had both meanings, "trial" and "essay". The latter must have been due to Montaigne's influence, considering that Francis Bacon wrote his own famous essays more or less modeled on Montaigne's. Over the centuries, however, English essay has largely lost the first sense and now mostly refers to a written prose composition; it is sometimes even used interchangeably with the word prose.
[note] Italian also has another, unrelated word spelled saggio, meaning "wise" or "sage" (adjective), which is cognate with English sage. It is irrelevant to the saggio discussed here.
Published on September 25, 2020 07:36

September 21, 2020

Spanish mozo and Moses

Spanish mozo means "boy", "young man", and, in some Latin American countries, "waiter". Its etymology is uncertain. Some connect it to muchacho (“boy”), but none of the proposed etymologies is convincing.

Well, I can propose one that is equally unconvincing, or rather, with probably about the same degree of convincing power. I propose its connection to Moses, the biblical figure. Wiktionary states that the word Moses is "[f]rom Latin Mōsēs, Mōȳsēs, from Ancient Greek Μωϋσῆς (Mōüsês), from Biblical Hebrew מֹשֶׁה‎ (mōše). Further etymology is unclear, but it may have come from Ancient Egyptian." This Egyptian origin is further explained by Wikipedia on Moses:

--- begin quote ---
An Egyptian root msy ('child of') has been considered as a possible etymology, arguably an abbreviation of a theophoric name, as for example in Egyptian names like Thutmoses ('child of Thoth') and Ramesses ('child of Ra'),[18] with the god's name omitted.
--- end quote ---

where reference [18] points to Christopher B. Hays's Hidden Riches: A Sourcebook for the Comparative Study of the Hebrew Bible and Ancient Near East, which on p. 116, after dismissing the folk etymology, reads: "'Moses' is derived from the common element in names such as Thutmosis (‘Thoth created him’)".

Semantically, a child is not far from a boy (by the way, moza means "girl"). In terms of pronunciation, the Egyptian word or word suffix is remarkably close to English Moses or Spanish Moisés.

But we don't have solid historical linguistic evidence to prove this connection. To be on the safe side, in my revised edition of Learning Spanish Words, I write "For lack of definitive proof, we can at least take Moses as a mnemonic for mozo and imagine him as a young boy or young man."
Published on September 21, 2020 21:17

August 26, 2020

My daily study

Humans are not machines. Only in a rich context can we learn a language, or even just its vocabulary. Except for some words, we can't fully learn a word unless we encounter it in a meaningful passage or article. Etymology or mnemonics (for learners who like to analyze words) and flashcards (for those who don't) help us remember words. But only the rich context in which the words occur helps us grasp their nuances and solidify our memory of them.

Every day, I spend some time reading a few news feeds in various languages, as well as readers' comments if the news sounds interesting. For some years, I have subscribed to these feeds on Facebook:

Le Figaro (in French)
El País (in Spanish)
Der Spiegel (in German)
Corriere della Sera (in Italian)

The Facebook website and cell phone app have a problem: you can't copy the text on screen, so you can't use a translator such as Google Translate to translate just the text you want. To solve this problem, I found and installed one specific version of a web browser, Opera Mini version 14 for Android. The version has to be 14, not 12 or 16 (there are no odd-numbered versions). I also have to set "Don't auto-update apps" in the Google Play Store. Then I can copy the text I'd like to check and click the Google Translate popup bubble. New words, or words I'm rusty on, are looked up on Wiktionary, and the browser tabs for these pages are left open for one or a few days, depending on whether the words have been committed to memory. When I read a Wiktionary page, I always read its Etymology section, and sometimes the Etymology sections on its other-language pages.

Reading the news feeds, readers' comments, or occasionally full articles is one way to learn new words or expressions and review what I have already learned. When I see a new word, or a word I barely remember, I always check its etymology to help me remember it. I study this way several times a day, from a few minutes up to 10, at most 20, at a time. This strategy suits my needs very well because it doesn't put pressure on my study. After all, I have a full-time job; learning languages is just my hobby.

Occasionally, I read articles on other websites, such as Nonfiction.fr (Le quotidien des livres et des idées), Pijamasurf.com (Noticias e Información alternativa), or online language-learning forums such as forum.wordreference.com (mostly discussions about vocabulary), Language-Learners.org (language in general), and the Facebook Linguistics and Polyglots groups.
Published on August 26, 2020 22:05

July 5, 2020

Google Translate shows word frequency

I don't know when Google Translate started offering this feature, but I recently noticed that the translated words are marked with 1 to 3 bars indicating "how often a translation appears in public documents". Take the French word baie as an example. It's most frequently translated into English as bay, less often as berry, and least often as bight, marked with 3, 2, and 1 bar, respectively. (The word bight means "a curve or recess in a coastline, river, or other geographical feature" according to Google.) That matches our expectation. Of course, baie is really one form shared by two distinct words (two lemmas), but that doesn't matter here.

In my Learning French Words, the entry for reconnaître is

reconnaître to recognize (both “to recognize a person or thing seen before” and “to acknowledge existence or contribution”) (cognate); to admit, to concede as true, to acknowledge; to reconnoiter, to do reconnaissance. ...

From my reading, I sense that the primary meaning is "to recognize", followed by "to admit", and lastly, "to do reconnaissance". Google Translate lists the meanings "recognize" and "acknowledge" with 3 bars, "admit" and "know" with 2 bars, while "spy" (i.e. "to do reconnaissance") is given 1 bar. I don't know how exactly Google makes use of technology to rank frequency of meanings of a translated word, but it's gratifying to find my ranking matches theirs.

In the coming weeks and months, I plan to review my French and Spanish books against this new feature of Google Translate so that the meanings of each headword are listed in order of frequency. (For instance, I thought the primary sense of distraire was "to entertain", but Google Translate lists "to distract" first, so I need to switch the two meanings.) While almost all dictionaries list the literal meaning of a word before its derived meanings (e.g. for Spanish denunciar: "to denounce"; "to report"), I think a more useful method for a foreign language learner is to give the more common meaning first (denunciar: "to report"; "to denounce").
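Reordering an entry this way amounts to sorting its glosses by bar count. The sketch below is a minimal illustration under one assumption: the bars are typed in by hand as (gloss, bars) pairs, since I know of no official way to fetch them. The input uses the bar counts for reconnaître reported above, deliberately entered out of order.

```python
def order_meanings(meanings):
    """Return glosses sorted from most to fewest Google Translate bars.

    `meanings` is a list of (gloss, bars) pairs entered by hand, with bars
    from 1 to 3. Python's sort is stable, so glosses with the same number
    of bars keep the order in which they were entered.
    """
    return [gloss for gloss, bars in sorted(meanings, key=lambda m: -m[1])]


# Bar counts for reconnaître as described in this post.
reconnaitre = [("spy", 1), ("admit", 2), ("recognize", 3),
               ("know", 2), ("acknowledge", 3)]
print(order_meanings(reconnaitre))
```

Stability matters here. When two glosses get the same number of bars, such as "recognize" and "acknowledge", the order I already chose is kept rather than shuffled.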

Published on July 05, 2020 21:34

June 23, 2020

Word frequency

Both of my books
Learning Spanish Words Through Etymology and Mnemonics
Learning French Words Through Etymology and Mnemonics
order the headwords by word usage frequency. While the frequency of word occurrences in a corpus (a collection of written texts) is a simple concept, there are interesting issues on this topic. For example, (1) should the various conjugated forms of verbs and declined forms of nouns be counted, or only the lemmas (canonical forms, i.e. infinitives of verbs, singular nominative masculine nouns, etc.)? (2) What are the advantages and disadvantages of corpora made up mostly of books vs. movie subtitles? (3) How is the frequency obtained?

Learning Spanish Words follows the frequency list of the RAE, Real Academia Española (Royal Spanish Academy). There's no doubt about the authority of this prestigious institution. Unfortunately, their frequency list includes all conjugated and plural forms, so my book suffers from ambiguity in frequency ranking after lemmatization (finding the lemma of an inflected word form). I only gradually realized this difficulty as I went on writing the book. For example, I'm pretty sure queda (a conjugated form of quedar "to remain", and also a noun meaning "curfew") should not have been included as a separate headword with the meaning "curfew" and given a high frequency rank in my book. It was a mistake, and my planned revision of the book will delete it or move it to a much later page with a much lower frequency.
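To see where the ambiguity comes from, here is a minimal sketch of folding a form-based list into lemma counts. The form-to-lemma table is hypothetical and hand-made, and the sentence is a toy, not RAE data. A real lemmatizer would be far larger, but it would face the same problem: a bare count for queda does not say whether the verb or the noun was meant.

```python
from collections import Counter

# Hypothetical, hand-made table from inflected forms to lemmas; real work
# would need a full lemmatizer or morphological dictionary.
FORM_TO_LEMMA = {
    "queda": "quedar",
    "quedan": "quedar",
    "quedó": "quedar",
    "casa": "casa",
    "casas": "casa",
}


def lemma_frequencies(form_counts):
    """Fold per-form counts into per-lemma counts.

    Forms missing from the table are kept under their own spelling.
    "queda" always goes to the verb quedar here, so its use as a noun
    ("curfew") cannot be told apart from the verb form.
    """
    totals = Counter()
    for form, count in form_counts.items():
        totals[FORM_TO_LEMMA.get(form, form)] += count
    return totals


# Toy sentence, not real corpus data.
tokens = "se queda en casa y las casas quedan vacías".split()
print(lemma_frequencies(Counter(tokens)).most_common(2))
```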

My Learning French Words suffers from a problem related to question (2). It uses the Lexique frequency list and follows its freqlemlivres order, i.e. la fréquence du lemme selon le corpus de livres (lemma frequency according to the corpus of books). A few months after I started this book project, I posted a question to their web forum, asking why some words appear in frequency positions quite different from what common sense dictates. The forum moderator, probably one of the owners of Lexique, told me freqlemlivres is not as good as freqlemfilms (lemma frequency according to movie subtitles), which he recommends. (The web forum has been decommissioned and the old messages are gone, even from archive.org, so I can't cite his words.) If I could start over, I might reorder according to freqlemfilms. After all, books mostly record the written language; to capture both written and spoken language, movie subtitles serve as a better source.
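Switching from one Lexique column to the other is mostly a matter of sorting on a different field. The sketch below assumes a tab-separated file whose header has a "lemme" column and the chosen frequency column. Those column names come from this post and should be checked against the header of your own copy, because they may differ between Lexique releases. The sample rows use made-up numbers, only to show that the two columns can disagree.

```python
import csv
import io


def rank_lemmas(tsv_text, freq_column):
    """Order lemmas from a Lexique-style tab-separated file by one column.

    Assumes a header row with a "lemme" column and a numeric column named
    freq_column (e.g. "freqlemlivres"); the names are an assumption.
    The file has one row per word form, so a lemma can appear many times;
    this sketch keeps the highest value seen for each lemma.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        lemma = row["lemme"]
        freq = float(row[freq_column])
        if lemma not in best or freq > best[lemma]:
            best[lemma] = freq
    # Highest frequency first; ties keep file order (sorting is stable).
    return sorted(best, key=best.get, reverse=True)


# Made-up numbers, only to show that the two columns can disagree.
sample = (
    "ortho\tlemme\tfreqlemfilms\tfreqlemlivres\n"
    "vais\taller\t100.0\t50.0\n"
    "va\taller\t100.0\t50.0\n"
    "maison\tmaison\t40.0\t80.0\n"
)
print(rank_lemmas(sample, "freqlemlivres"))
print(rank_lemmas(sample, "freqlemfilms"))
```

For a real file you would pass in the file's text, read with UTF-8 encoding, instead of the sample string.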

Now let's consider question (3). The traditional way to get word frequencies is to collect a large number of books, magazines, newspapers, and movie and theater scripts, record the number of occurrences of each word, and sort the words by count. Well-known lists in this category include the Wiktionary frequency lists and various frequency dictionaries sold on Amazon. But nowadays there are other ways. For example, many years ago I did something that probably nobody else in the world had attempted or will attempt: submit each word to Google or another search website such as Yahoo or Baidu, record the approximate hit count given by the website, and sort the words by their counts. (See Word Usage Frequency and Chinese Character Usage Frequency.) The frequency values are implicitly given by the search sites; my script simply collects them. The result reflects word frequency on the Internet, or rather, on the portion of the Internet indexed by the search engine.
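The two approaches differ only in where the counts come from. The sketch below puts them side by side. It is not my original script, and it assumes no real search API. `hit_count` is a hypothetical, caller-supplied function that stands in for submitting one word to a search site and reading off the approximate number of results.

```python
import re
from collections import Counter


def corpus_frequency(texts):
    """Traditional approach: count every word occurrence in a corpus."""
    counts = Counter()
    for text in texts:
        # \w+ also matches accented letters such as é or ñ.
        counts.update(re.findall(r"\w+", text.lower()))
    return [word for word, _ in counts.most_common()]


def rank_by_hits(words, hit_count):
    """Search-engine approach: sort words by the hit count a site reports.

    hit_count is a hypothetical function (word -> approximate number of
    results); no real search API is assumed here.
    """
    hits = {word: hit_count(word) for word in words}
    return sorted(words, key=hits.get, reverse=True)


print(corpus_frequency(["the cat and the dog", "The end"]))
# Toy stand-in for a search site, with made-up counts.
fake_hits = {"casa": 900, "perro": 300, "y": 5000}
print(rank_by_hits(["casa", "perro", "y"], fake_hits.get))
```

Keeping the hit-count lookup as a parameter separates the sorting, which is trivial, from the querying, which depends on whatever the search site allows at the time.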

And yet there is one more way to create a frequency list. Linguee is "an online bilingual concordance", says Wikipedia. But hidden in its web pages is a frequency list for various languages. You just have to go to a URL in the format www.linguee.com/language-english/toplanguage/start#-end#.html to see the list for language, covering the words in the start# to end# frequency range. For example, https://www.linguee.com/spanish-english/topspanish/1-200.html shows the 200 most frequent Spanish words (start# 1). Clicking a word gives you the dictionary entry for that word, unless the range spans more than 1000. What's new about this list? The webpage says "Most common Spanish queries, 1 to 200"; note the word queries. This means these words are the ones most searched for by Linguee website visitors, not the ones occurring most frequently in a large corpus. This is a great innovation, or enhancement, over the traditional occurrence-based frequency lists. Language learners do not make the same amount of effort to study a group of words that have about the same occurrence-based frequency; some of these words are definitely harder to grasp, in terms of both meaning and usage, than others. A search-frequency-based list adjusts the traditional frequency according to this varying difficulty.
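Because the pages follow a fixed pattern, the addresses for a long list can be generated rather than typed. The sketch below applies the URL pattern described above in chunks of 200, which matches the example page. The site may have changed since this was written, so treat the pattern as an assumption.

```python
def linguee_top_urls(language, total, step=200):
    """List Linguee "top queries" pages covering ranks 1..total.

    Follows the URL pattern given in this post,
    www.linguee.com/<language>-english/top<language>/<start>-<end>.html;
    the site may have changed since, so the pattern is an assumption.
    """
    urls = []
    for start in range(1, total + 1, step):
        end = min(start + step - 1, total)
        urls.append(
            f"https://www.linguee.com/{language}-english/top{language}/{start}-{end}.html"
        )
    return urls


for url in linguee_top_urls("spanish", 600):
    print(url)
```

Keeping each page at 200 entries also stays well below the 1000-word span beyond which, as noted above, the words are no longer clickable.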

I'm considering revising or re-writing my Learning Spanish Words book. I may opt for the Linguee frequency list due to its practical considerations, and as a bonus, the list contains frequently used short phrases and abbreviations, and almost all words in the list are clean lemmas.
Published on June 23, 2020 20:11

May 27, 2020

"La Covid" vs. "Le Covid"

Native speakers of languages without grammatical gender who learn a gendered language almost always complain about the superfluous nature of the gender system. Grammatical gender offers no additional information in a sentence, except in contrived cases. One French teacher says "If the French language was 'redesigned' to make it easier for the student, we would certainly get rid of the gender system." He goes on to say "But this will not happen, so you need to learn it unless you want to sound like a child." Well, we have to bear with it, no matter how much confusion it may cause, even among native speakers when a new word's gender is not yet set in stone. Here's an example. The Guardian reported: "'La Covid': coronavirus acronym is feminine, Académie Française says." When someone posted this news to the Facebook Linguistics group, a heated discussion followed. Here are some excerpts of the comments, pro and con, regarding the Académie Française's decision:

Pro (saying la Covid):
* COVID-19 refers to disease caused by the virus and not the virus itself, COVID-19 being an acronym for "coronavirus disease 2019", which translates to "maladie à coronavirus 2019". We use "la" because of the feminine "maladie", followed by the english acronym "COVID-19". La COVID-19; le coronavirus; le SARS-CoV-2.
* In Quebec, we've been calling it la COVID for 6 weeks already. L'Académie needs to join the internet age.
* the Academy is right. In French, as we know, the word "virus" is masculine (hence "coronavirus" is masculine), but "disease" is feminine. Since "Covid" stands for coronavirus disease, it should be "La Covid" (La maladie à coronavirus) in French.

Con (saying le Covid):
* This doesn't make sense, we've all been saying 'le covid' ever since it began.
* I’m French and we say “Le Covid” masculine even though the Académie Française may have decided otherwise.
* In spanish, according with the Real Academia Española, you can say "el covid" (le covid) or "la covid" (la covid). Learn from us, french people.
* Covid, no final "e" => masculine. Plus, the street says "le Covid".

And my opinion:

I have always believed that the correct usage of a technical term should be determined by the specialists in that field, and that of a common word by the common people. The latter follows from the predominance of descriptivism over prescriptivism in the 21st century. The word "COVID" (spelled "Covid" in some languages) should be considered technical, so we should check its usage among medical professionals and take the most common usage as the standard. If lexicographers think it is more like a common word instead, it's fine to follow the general public. In either case, the Académie Française should not go beyond a mere recommendation. As to this specific example, for now I would say "le Covid", because a Google search for this exact string with the search modifier site:fr (limiting results to the .fr country domain) returns 72 million hits, while "la Covid" returns only 3.6 million.
Published on May 27, 2020 21:07

April 25, 2020

politics, politician, policy

In English, the three words politics, politician, policy are all spelled differently. But in a few other European languages I can more or less read, the words for "politics" and "policy" have the same form. The following is from Google Translate (conveniently aggregated by my Multi-Language Translator):

English: politics, politician, policy
French: politique, politicien, politique
Spanish: política, político, política
German: Politik, Politiker, Politik
Italian: politica, politico, politica

They have one thing in common: all the languages other than English use a feminine noun for "politics" and "policy" and a masculine one for "politician". Any explanation? In the Introduction of my Learning French Words, I wrote

--- begin quote ---
A. Brachet in his Etym. Dict. French Lang. remarked that “[in the case a concrete substantive or noun took an abstract sense] the concrete substantive is often masculine, whereas the abstract was feminine”. This was also observed by non-specialists such as Simone de Beauvoir (“most abstract entities are feminine”, The Second Sex, trans. H.M. Parshley). Lack of morpho-semantic justification notwithstanding, that observation serves as a convenient mnemonic, which the reader will be reminded of wherever appropriate in this book.
--- end quote ---

It's obvious that the concept of politics or a policy is more abstract than that of a politician, which is some entity in space you can point to. With this general knowledge, you can easily remember, for instance, that French critique means “criticism” as a feminine noun, but “critic” as a masculine noun, not the other way around.

By the way, since a politician is a human, when you refer to a female politician, a different form of the word may have to be used, such as política in Spanish and politica in Italian (but Politikerin in German and politicienne in French). The Spanish and Italian forms are not to be confused with the abstract nouns meaning "politics" or "policy". Also note that if you need to distinguish between "politics" and "policy", you can use the context as a clue and possibly check the noun's countability: politics is uncountable, but you can say one policy, two policies.
Published on April 25, 2020 21:12

Learning Spanish and French Words Through Etymology and Mnemonics

Yong Huang
(1) Small corrections and updates to the published book, Learning Spanish Words Through Etymology and Mnemonics
(2) Miscellaneous notes about the unpublished book, Learning French Words Through Etymolo