Text and Corpus Analysis: Computer Assisted Studies of Language and Culture

This book provides detailed studies in one of the fastest growing areas of linguistics - corpus analysis - and shows how computers can be used to reveal culturally significant patterns of language use.

288 pages, Hardcover

First published May 1, 1996

About the author

Michael Stubbs

45 books, 2 followers

Community Reviews

5 stars: 2 (66%)
4 stars: 1 (33%)
3 stars: 0 (0%)
2 stars: 0 (0%)
1 star: 0 (0%)
Displaying 1 of 1 review
Khari
3,130 reviews, 77 followers
May 11, 2024
This is one of the few books where I completely disagree with the author's worldview and yet find the book fascinating, eye-opening, and relevant.

I discovered this book while mining the bibliography of an article on corpus linguistics I read who knows how long ago. I don't even remember which article it was, but the book ended up on my list of 'possibly relevant to doctoral research' books. I finally cracked it open this year and have thoroughly enjoyed the process of reading it.

Not because it was actually relevant to my research. It wasn't. Not in any concrete way. But, man, was it ever relevant in an abstract theoretical and philosophical way. This was one of the first books I've read in linguistics that actually delves into the philosophy of why it is important to study linguistics, and why it is important to study linguistics in specific ways. I don't know if my linguistics program was just bad, or if I just didn't read widely enough or what, but it's the first time that someone actually approached the problem of why and how we should study linguistics, instead of just accepting it as self-evidently valuable and moving on from there.

It's also the first book of linguistics I have read that delineated ways of studying linguistics that are of more and less value. It was remarkably refreshing. His main point is that language is produced by humans and to understand what it is doing we need to study what humans actually produce, unvarnished, untainted, unembellished. He means by this, firstly, that we need to study utterances that are produced to achieve the desires of the speakers rather than utterances that are produced to illustrate a linguistic point. Secondly, he means that we need to study utterances that are produced by a large number of speakers, rather than utterances that are produced by a small number of speakers.

This makes complete and utter sense to me. You can't study a phenomenon by creating that phenomenon to study; there's an inherent bias to that. I can't decide that I'm going to study future tense, write 100 sentences in future tense, and then study those 100 sentences to discern patterns of how future tense works. All I will come away with is how I believe future tense works, not necessarily how future tense is actually used by the majority of people. I'm a sample size of one; obviously the way I do things is not generalizable to the entire future-tense-using population.

Even though this makes complete and utter sense, he is absolutely right when he says that most linguistics is based on such created data. He is even more correct when he says that even the linguistics that is not based on created data is based on a methodology that is not replicable. The data is gathered in small amounts from single contributors and then analyzed through the lens of the researcher's intuition rather than through any objectively delineated methodology (see Judith Baxter's Positioning Gender in Discourse and Naoko Takemaru's Women in the Language and Society of Japan as excellent examples).

He thinks this is a problem. So do I. He's an empiricist. So am I. He studies large amounts of utterances that other people have said or written and examines what is there, instead of what he wishes were there. So do I. I think this is the only method of research that can yield results that are defensible and that stand the test of time. Granted, I didn't necessarily agree with all of his interpretations of the patterns he found. I think there could be alternative explanations for how reporters framed the acts of violence during apartheid other than that they were inherently racist, but I can't argue with his data. It's there. It exists. Everything he uses is in the public domain and I can pull it up at my fingertips. He provides it in its entire context, because context is king.

Ha. That line amuses me greatly, because I doubt very seriously that Stubbs would describe himself as a believing Christian, and yet he clearly owes an intellectual debt to biblical hermeneutics. He might even acknowledge it, since he acknowledged that corpus linguistics was born out of biblical concordancing, but I digress. Context is king, and we ignore it constantly. We grab sound bites and present two seconds as if they clearly encapsulate everything anyone could ever need to know about a situation, but it isn't true. "Stand up for the terrorist" means something quite different from "Stand up for the terrorist-defeating soldier."

It was not just in his philosophy of how to conduct research that I found him to be clear and refreshing; it was also in how he demarcated what his research can do and what it can't do. Half of the published books on discourse analysis are about showing how the way people talk is the way they view the world and the reason behind all of the ills that beset a society. He also believes that narrative and subscribes to that worldview, but he is absolutely clear that it is just that, a belief, not something that he can prove objectively. "There is always a category shift when one moves from discussing forms of language to forms of thought. I have assumed that if collocations and fixed phrases are repeatedly used as unanalyzed units in media discussion and elsewhere, then it is very plausible that people will come to think about things in such terms." He just comes out and calls it what it is: an assumption. I respect that kind of intellectual honesty.

I also respect how he can baldly state true things that most people nevertheless don't have the guts to say. My personal favorite was that "Bibliographic references in academic articles are an elaborate convention for encoding hearsay evidence." Every time I read it, it makes me snicker. It reminds me of the introduction to Don Quijote, where Cervantes is poking fun at the convention of adding quotes from other authors in order to make your own work sound more respectable or imposing or authoritative. That is what they do. They don't necessarily have to contribute anything to the story, or to the research; they add gravitas. But why should they?

This is a legitimate question to ask, especially today when people like to reference-drop in their online debates, seeming to believe that as long as they drop an article from someone who has letters after their name, they win. Then their opponent drops an article from a different person with letters after their name that says the exact opposite, and the argument devolves even further into a debate over whose sources have more gravitas. This seems to be measured mostly in how many articles said author has published, but why should it? If their research methodology is not robust, then why should their results be considered so?

This is a major problem in the current day and age. We are publishing research at an unprecedented rate. But we are also watching research be exposed as fraudulent, withdrawn, and discredited at an unprecedented rate because we aren't even trying to replicate it anymore, or it is designed to be unreplicable. We have forgotten that just because something is published doesn't mean it is true. We have forgotten that being published once just means you have a once-tested hypothesis; now other people need to repeat your study and see if that hypothesis holds true in other situations. That is the essence of science, and in our striving to be published we have forgotten that publication is not the ultimate goal. The ultimate goal is to discover something that is true.

This is where Stubbs and I collide solidly in worldview. He believes knowledge is produced. I believe knowledge is discovered. It exists. We may not have observed it yet, but it already exists. This is why quotes like this are possible: "In retrospect, lists of collocates can seem very obvious to the native speaker; they seem intuitively right. But this is the wisdom of hindsight: native speakers are quite unable to document such collocates from intuition alone." That's right, someone had to observe it and then everyone understands their conclusions, but there was something to observe. The fact that, as soon as someone states their conclusions, everyone is like 'Aha! That's right' shows that it is something that is discovered, pointed out, drawn attention to and then accepted and disseminated. It's not created. It's not produced. It's just...documented.

That is really the only quibble I have with this book. I might have had more quibbles if I could understand what he was talking about. The sections on modality and evidentiality, speech acts and levels of commitment made my eyes cross. I think I understood one out of every 15 words in that section, even with the examples. I'm not sure if the problem is that I'm inherently not a syntactician or if it's because I think the very idea of speech acts is laughable. Even if I were to know which of those is the problem, I don't know if it's a problem because the theories are laughable, or if it's because I don't understand them at all and thus dismiss them out of hand.

This book helped me lean a little more towards the side of thinking syntax is a laughable theory because of his argument that all language is probabilistic. I think he was before his time and has been demonstrated to be correct with the advent of large language model AIs. Words coexist with each other. As soon as you choose one word, the number of choices you have for the next word is limited. That's how the AIs work. They predict each word based on probabilities, and each successive word is more probabilistically limited than the word that went before it. Interestingly enough, this is exactly why the statistics that are normally used to strengthen research claims are "inapplicable to linguistic data, since normal statistical assumptions of random occurrence don't apply to textual data". Linguistic utterances are not random; they are predictable, and they are predictable because words control what words come after them. It's mind-blowing.
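To make the "words control what words come after them" idea concrete, here is a minimal sketch of my own in Python. It is not Stubbs's method, and it is a drastic simplification of what large language models do (they condition on far more context than the single previous word). The toy corpus and the function name `bigram_probabilities` are invented for illustration. It just counts which word follows which and turns those counts into probabilities.

```python
from collections import Counter, defaultdict


def bigram_probabilities(tokens):
    """Estimate P(next word | current word) from raw bigram counts."""
    counts = defaultdict(Counter)
    # Pair each token with the token that immediately follows it.
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    # Normalize each word's follower counts so they sum to 1.
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }


# Toy corpus, invented for illustration only (not real corpus data).
text = "the cat sat on the mat and the cat ate the fish"
model = bigram_probabilities(text.split())

print(model["the"])  # followers of "the" with their estimated probabilities
print(model["sat"])  # "sat" is only ever followed by "on" in this corpus
```

Even in this tiny corpus of eight distinct words, only three of them ever follow "the", and they are not equally likely. Once a word is chosen, the next one is anything but a random draw from the vocabulary.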