Very Short Introductions #539

Big Data: A Very Short Introduction

Since long before computers were even thought of, data has been collected and organized by diverse cultures across the world. Once access to the Internet became a reality for large swathes of the world's population, the amount of data generated each day became huge, and continues to grow exponentially. It includes all our uploaded documents, video, and photos, all our social media traffic, our online shopping, even the GPS data from our cars.

"Big Data" represents a qualitative change, not simply a quantitative one. The term refers both to the new technologies involved and to the ways in which data can be used by business and government. Dawn E. Holmes uses a variety of case studies to explain how data is stored, analyzed, and exploited by bodies ranging from big companies to organizations concerned with disease control. Big data is transforming the way businesses operate and the way medical research is carried out. At the same time, it raises important ethical issues; Holmes discusses cases such as the Snowden affair, data security, and domestic smart devices that can be hijacked by hackers.

ABOUT THE SERIES: The Very Short Introductions series from Oxford University Press contains hundreds of titles in almost every subject area. These pocket-sized books are the perfect way to get ahead in a new subject quickly. Our expert authors combine facts, analysis, perspective, new ideas, and enthusiasm to make interesting and challenging topics highly readable.

125 pages, Paperback

First published January 30, 2018

68 people are currently reading
614 people want to read

About the author

Dawn E. Holmes

21 books, 2 followers

Ratings & Reviews

Community Reviews

5 stars: 45 (18%)
4 stars: 96 (39%)
3 stars: 79 (32%)
2 stars: 26 (10%)
1 star: 0 (0%)
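The rating percentages appear to be each count divided by the 246 total ratings and truncated rather than rounded: 26/246 is about 10.6%, yet it is shown as 10%. A quick check, assuming truncation (an inference from the numbers, not documented behavior of the site):

```python
# Star-rating counts as listed in the community reviews summary.
counts = {5: 45, 4: 96, 3: 79, 2: 26, 1: 0}
total = sum(counts.values())

# Assumption: each share is truncated (floored) to a whole percent.
truncated = {stars: 100 * n // total for stars, n in counts.items()}
# For comparison: ordinary rounding to the nearest whole percent.
rounded = {stars: round(100 * n / total) for stars, n in counts.items()}

print(total)      # 246
print(truncated)  # {5: 18, 4: 39, 3: 32, 2: 10, 1: 0}
print(rounded)    # {5: 18, 4: 39, 3: 32, 2: 11, 1: 0}
```

Only the truncated version reproduces the 10% shown for two stars.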
Displaying 1 - 23 of 23 reviews
Emily B.
491 reviews, 526 followers
August 8, 2022
(Read for my level three data citizen apprenticeship)

A helpful and informative introduction.
Paula
27 reviews
March 17, 2023
Very good book, except for the part where she said that most data scientists don't know enough after finishing their degree (but I'm adding a star back for her showing where the gap lies!)
Declan Waters
551 reviews, 4 followers
September 14, 2019
Big Data is a relatively new term, and whilst I had a basic understanding of it, I didn't fully understand how it differed from traditional statistics. This book attempts to explain this. As I have a background in mathematics I found this book very easy to follow, although I'm not sure all readers would feel the same. The author does reduce the maths to an easy level so most should be okay though.

Looking at how the big tech companies (Amazon, eBay, Netflix, Google, Facebook, Twitter) approach the information they hold is a huge subject, but this book gives a good overview of 'how' they do it, although some of it is (informed) speculation because the companies don't release their algorithms. An interesting introduction to the subject.
Madara
76 reviews
April 16, 2021
Again, not what I was looking for but then my need for information in a specific area shouldn't detract from the book. It was a basic guide for anyone to read about all of big data, from history to future. If you are looking for an overview of application in certain areas, this is not the book for you. But if you want to get an introduction to all of it, yes, this is fine.
Estrella
531 reviews, 3 followers
November 3, 2022
A rather interesting book for getting started in the vast world of big data (macrodatos); the bibliography at the end offers books with which you can keep learning about the topic.
For me, the most interesting chapters were the last three.
Aron
144 reviews, 24 followers
February 7, 2021
This book suffers from several flaws:

1. It takes a fascinating topic and makes it boring
2. It’s sloppy in the use of technical terminology and ideas, e.g. after providing an inaccurate definition of “phishing”, the author states: “But the main problem facing big data [as opposed to things like phishing] is that of hacking”. Phishing IS a type of hacking, which is a general term for security attacks. Another example: RFIDs are not yet ready for tracking to the extent claimed in the book, even four years later
3. It only briefly discusses the vast and important topic of machine learning and its relationship to big data
4. It spends way too much time on IT topics not specifically related to big data (e.g. Snowden and Assange)
5. Its discussion of Facebook and Google advertising only briefly mentions some of the burning issues around privacy. The fact that this book is 4 years old is no excuse, because criticisms discussed in general media today were actively discussed for years in more technical circles. And isn’t the idea of these books to bring expert knowledge to a wider audience?

Tl;dr this book is a terrible introduction to big data and another miss in the VSI series
Freso :watermelon:
23 reviews, 10 followers
April 21, 2021
As is the whole point of the A Very Short Introduction series, this book gives a very superficial and rudimentary introduction to the topic, here “Big Data”, in layman’s language.

I feel like this book hit the nail on the head well, but given the nature of the subject, some things already felt ever so slightly dated, even with the book only being 4 years old.

I don’t feel like I learned much new stuff myself, but I have also been dabbling in surrounding topics for a while. I do appreciate that it spends time discussing the ethical implications of all of this, and also spends considerable time talking about security concerns, both of which far too quickly get swept away by the buzzword hype in presentations.

All in all, I feel like this would be an excellent short primer to someone completely new to the subject, and some parts of it may even be a good refresher to people who are not. :)
Clementine
695 reviews, 13 followers
March 10, 2024
I thought this would take more of a critical data studies approach, but it's definitely written from a more technical and less theoretical perspective. That's my bad! I thought the first ~half of the book, which focused solely on the technical aspects of big data (what does it mean for data to be "big", what uses does it have, how is it stored and processed) was fine, though the presentation was dry. It was in the analytical chapters that the shortcomings became really apparent.

I will situate myself: I am by no means an expert in this topic, which is why I wanted to read this book! However, I've read a decent amount about data and the internet, and I worked for a year and a half on a research project about data breaches, done from a critical data studies perspective. I didn't come into this book with nothing, but my knowledge has been acquired haphazardly, so I suppose I was looking for an introductory text that would consolidate the most salient information in a more organized format.

Okay, anyway. To my slightly trained eye, Holmes approaches the topic of big data with very little criticality and essentially treats the creep of big data into virtually every sector as mostly neutral, with the possibility for it to be exploited by bad actors. Some examples of things I had issue with, by no means exhaustive:

"Variable selection, the process by which the most appropriate predictors are chosen, always presents a challenging problem and so is done algorithmically to avoid bias." I mean, one of the most basic concepts in science and technology studies is that algorithms cannot avoid bias because they are created by humans. There is so much research on this!!! For example, Simone Browne's Dark Matters; Safiya Umoja Noble's Algorithms of Oppression. Okay, those books came out after this one, but they were drawing on much older case studies, and people have been writing about algorithmic bias for a long time. It's just an incorrect statement, and as soon as I read it I knew that Holmes' analytical ability on this topic was not going to be great.

"The information gleaned [from health trackers] can then be uploaded onto our PCs and records kept privately or, as is sometimes the case, shared voluntarily with employers." There's a lot of work problematizing health trackers and especially the minefield of personal medical information shared with employers. On the most pedantic level, Holmes is right that employers can't compel employees to share their data, but this is strongly incentivized and the employer/employee relationship is a coercive one by default. Just a total lack of criticality here!

On online shopping: "The disadvantages to the customer are few." I suppose when we are looking solely at the surface experience of online shopping, yes, it is a pretty frictionless experience with a lot of conveniences from the consumer perspective. The glaring disadvantage, which Holmes fails to mention, is the collection of purchasing data and browsing history used to invasively commodify everything we do. The entire section on online shopping just reads as a manual for businesses on how to implement e-commerce techniques in the most advantageous way. There's no examination of how the profit-driven internet has foreclosed so many of its early promises and led to companies surveilling everything we do online in the hopes of selling us something. (And, yes, using Goodreads is giving Jeff Bezos data...) The discussion of using big data in medical fields is similarly not problematized or critiqued at all. I think many people who know nothing about big data would be able to name some of the basic concerns here, but Holmes is very gung-ho about algorithms being used in medicine to save money!!!

The main takeaway Holmes seems to get from the Snowden case is that the NSA should have better security so that a single person can't leak this type (or volume) of data. I guess... to make it easier to expand the surveillance state without anyone knowing? For me, this case study is an interesting way to explore how big data makes state-sanctioned privacy violations possible, but Holmes avoids that topic.

And, while writing of a related topic, Holmes misgenders Chelsea Manning and uses her deadname twice in one paragraph before acknowledging that she is "now known as Chelsea Manning" and "received treatment for gender dysphoria". Not exactly best practices for discussing trans people. Obviously, this isn't a critique of how the concept of big data is treated, but it's just so unnecessary?!

In general, the discussions of privacy are incredibly surface level and mostly contained to the threat of hacking. There is not really any acknowledgment that "legitimate" (legal) uses of data also have huge ethical concerns related to privacy and digital sovereignty. More broadly, a lot of CDS scholars suggest that the vast amount of digital data generated is fundamentally ungovernable and that rather than focusing on cases of hacking, leaks, or breaches as exceptional we should understand that this data is unstable so that we can make more informed choices as digital citizens. (And then, of course, there's the fact that we generate huge volumes of data unknown to us, a topic that is avoided as well.)

Holmes is a statistician, so while she can explain some of the technical processes accurately, she really falters in providing a critical approach to some of the big questions surrounding ethics and privacy. I think this book would have been in better hands with a critical data studies scholar who could more appropriately speak to these topics.

24 reviews
January 3, 2020
Perhaps I might have enjoyed this book more if the contents were less familiar. However, one useful nugget found was a section on why Google Flu Trends failed.
Saulo
15 reviews
December 26, 2020
The book accomplishes its purposes. It is a very good introduction to the subject.
Sebastian
192 reviews, 2 followers
January 13, 2024
Holmes, Dawn E., Big Data: A Very Short Introduction (Oxford University Press, 2017). Read because anyone who, like me, deals with 'big data' professionally is expected to know the essential characteristics of the phenomenon.

The term ‘big’ refers to the enormous amounts of data produced every day by a multitude of sensors, systems and devices and shared via the ‘internet’ (the totality of interconnected computers and networks within which data can be sent to an IP address). ‘Big data’ is inextricably linked to the rapid development of digital information technology since the end of the twentieth century, and is part of the information revolution that resulted from that technological development.

The significance of this Digital Revolution is comparable to that of the Industrial Revolution in the eighteenth and nineteenth centuries. Since the introduction of the ‘World Wide Web’ in 1989, and above all of the ‘smartphone’ in 2006, the internet has rapidly transformed many aspects of global society. Although the technological end of Moore’s Law (“the performance of microchips doubles every eighteen months”) is in sight (27-28), this transformation has most likely only just begun.

In this context, ‘data’ means digitized information, that is, information recorded in and/or communicated between systems by means of electronic technology and the binary system (‘zeros and ones’). The complexity of ‘big data’ stems from its four characteristics: ‘volume, velocity, variety and veracity’ (the last of which is not guaranteed and precisely for that reason poses a challenge). An important distinction within ‘big data’ is between structured data (lists, overviews, registrations, etc.), unstructured data (all kinds of documents, photos, films, etc.) and semi-structured data. By far the most data is unstructured. Using algorithms and data analysis, it is possible to distill useful information from large amounts of unstructured data.

This does, however, require computing power and storage, and therefore electricity. The Hadoop cluster is designed to store and analyze large amounts of unstructured data in a distributed way. A major advantage of distributed storage (a ‘distributed file system’) is that it is scalable. The ‘cloud’ refers to the whole of interconnected ‘servers’ in data centers, where data is stored in a distributed way and can be analyzed and processed (‘cloud computing’). So the ‘cloud’ has nothing to do with clouds; in reality it is decidedly down-to-earth: “…although the Internet and Cloud-based computing are generally thought of as wireless, they are anything but; data is transmitted through fibre-optic cables laid under the oceans. […] Fibre-optic cables provide the fastest means of data transmission available and so are generally preferable to satellites.” (96)
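Hadoop pairs its distributed file system with a processing model, MapReduce, in which a map step runs on each chunk of data and a reduce step combines the partial results. The single-process Python sketch below only illustrates that pattern with a word count; the function names are hypothetical, and it neither uses Hadoop's actual (Java) API nor distributes any work.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (in a cluster, this moves data between nodes)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # In a real cluster each chunk would sit on a different node and be mapped in parallel;
    # here everything runs sequentially in one process.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


chunks = ["Big data is big", "data is stored in a distributed way"]
print(word_count(chunks))
```

The appeal of the pattern is that map and reduce only ever see their own slice of the data, which is what lets the work be spread across many machines.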

Working with large datasets offers plenty of opportunities to gain more insight and create social value. The author discusses, for example, the role of ‘big data’ in health care and commerce. However, the reality of (working with) ‘big data’ also comes with considerable challenges.

First, it is not easy to extract usable, reliable information from ‘big data’ at a meta level. As a result, using ‘big data’ can also lead us astray (the author cites Google Flu’s faulty analyses of the spread of the flu virus in 2011-2012 as an example (62-63)). ‘Big data’ makes it possible to find all kinds of correlations, but deciding which correlations are meaningful remains human work. Data science brings computer science and statistics together so that large amounts of unstructured data can be analyzed.

Second, everyone who uses digital technology gives up part of their privacy, an aspect the author could have paid more attention to. As soon as we access the internet via an ‘internet service provider’ (ISP), all kinds of data are generated and collected. This data, such as the ‘clickstream logs’ that record our browsing behavior, has great commercial value and has by now become a commodity. ‘Cookies’ (small text files that help a site identify the user) make personalization possible. The consequence, of course, is that privacy in the digital space appears in an entirely different light. ‘Big data’ makes ‘mass surveillance’ possible, with all that this entails, an aspect the author barely addresses (except in the passages on the Snowden affair).
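To make the cookie mechanism concrete: a server can issue a random visitor ID in a `Set-Cookie` header, the browser sends it back on later requests, and the server can then link those requests into a clickstream. This sketch uses Python's standard `http.cookies` module; the cookie name `visitor_id`, the in-memory `visits` log and the `handle_request` function are hypothetical stand-ins for a real web server.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side clickstream log: visitor ID -> pages requested, in order.
visits = {}


def handle_request(path, cookie_header=""):
    """Identify the visitor from the Cookie header (or assign a new ID) and log the page."""
    incoming = SimpleCookie()
    if cookie_header:
        incoming.load(cookie_header)  # parse e.g. "visitor_id=3f2a..."

    if "visitor_id" in incoming:
        visitor_id = incoming["visitor_id"].value  # returning visitor
    else:
        visitor_id = secrets.token_hex(8)          # first visit: random ID

    visits.setdefault(visitor_id, []).append(path)

    # Ask the browser to store the ID for an hour and send it back on every path.
    outgoing = SimpleCookie()
    outgoing["visitor_id"] = visitor_id
    outgoing["visitor_id"]["max-age"] = 3600
    outgoing["visitor_id"]["path"] = "/"
    return visitor_id, outgoing.output(header="Set-Cookie:")


vid, header = handle_request("/home")
handle_request("/books/big-data", f"visitor_id={vid}")
print(visits[vid])  # ['/home', '/books/big-data']
```

The cookie itself holds only an opaque ID; the browsing history lives on the server, which is what makes such logs valuable to whoever collects them.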

Third, applications such as navigation systems or autonomous cars require ‘real time data processing’, which demands an astronomical amount of computing power and is error-prone. Holmes: “… each autonomous car will generate on average 30 Tb of data daily, much of which will have to be processed almost instantly.” (11) Not to mention the move toward ‘smart cities’ and data-driven households (IoT). What does all this computing power and vulnerability mean for sustainability and security? In any case, ‘big data’ involves many security problems that are hard to solve. A ‘firewall’ protects a network against unauthorized access via the internet, yet persistent hackers nearly always manage to get in. Holmes gives the example of digitized cars that have been hacked and concludes that “the potential for cyberattacks on smart vehicles will need to be addressed before the technology becomes fully public”. (108) The same applies to ‘smart cities’. “If we want to make big data secure, encryption is vital.” (94) Furthermore, cable networks, which form the backbone of the internet, are vulnerable to sabotage and “careless fishermen” (97). They too must be protected. Security by design, however, is still a long way off where ‘big data’ is concerned.
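For a sense of scale, the quoted 30 Tb per car per day can be converted into a sustained data rate. This back-of-the-envelope sketch assumes ‘Tb’ means terabytes in decimal units (10^12 bytes) and that the data is produced evenly over 24 hours; if terabits were meant, divide the result by 8.

```python
def sustained_rate_mb_per_s(terabytes_per_day: float) -> float:
    """Average rate in megabytes per second (decimal units) for a given daily volume."""
    bytes_per_day = terabytes_per_day * 10**12
    seconds_per_day = 24 * 60 * 60  # 86,400 s
    return bytes_per_day / seconds_per_day / 10**6


rate = sustained_rate_mb_per_s(30)
print(f"{rate:.1f} MB/s")  # 347.2 MB/s
```

That is roughly a third of a gigabyte every second, around the clock, per vehicle.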

Fourth, by no means all of the social effects of ‘big data’ are positive, and which way the balance tips is still an open question. Most of the internet remains invisible to search engines such as Google, Bing and Yahoo: the ‘Deep Web’ is also a platform for countless illegal activities. Holmes also barely touches on how ‘big data’ amplifies polarization in society through the ‘information bubbles’ and ‘echo chambers’ created by algorithms driven by commercial interests. Even in a ‘very short introduction’, such aspects should not have gone unmentioned.

Big data is here to stay, that much is clear. The future will show whether it is mainly a blessing or a curse.
Lalit Singh Tomar
60 reviews
June 12, 2022
This is a small book (112 palm-sized pages). True to its name, it's really a very short introduction to Big Data and associated concepts. I think the author has done a wonderful job of explaining this topic to a layman like me who has some idea of computer science. In this small space many interesting topics are explained briefly, like Clustering, CAP Theorem, Compression, Bloom Filter, PageRank etc...
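Of the topics listed, the Bloom filter is perhaps the easiest to show in a few lines. It answers "have I seen this item before?" with a fixed-size bit array, and can give false positives but never false negatives. A minimal sketch, assuming SHA-256 salted with an index as the family of hash functions and arbitrary sizes (1,024 bits, 3 hashes); this is one common construction, not necessarily the one described in the book.

```python
import hashlib


class BloomFilter:
    """Set-membership sketch: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by prefixing the item with an index before hashing.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for user in ["alice", "bob", "carol"]:
    seen.add(user)

print(seen.might_contain("alice"))  # True
```

The trade-off is memory: the filter stores only bits, never the items themselves, at the cost of an occasional "probably yes" for something it has not seen.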

Don't know why some people are giving negative reviews... It's a neatly written book and seems quite up to date as of 2017. Of course, this book is not for getting a detailed understanding of the topic.

The last three chapters are like case-study summaries of past successes and failures of big data applications, which I liked the most.
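PageRank, also named in the review above, scores a page by the scores of the pages that link to it. A minimal power-iteration sketch, assuming the commonly cited damping factor of 0.85, a fixed number of iterations instead of a convergence test, and that a page with no outgoing links shares its score evenly across all pages; the three-page link graph is made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform score
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: A and B both link to C; C links back to A.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B']
```

C ranks highest because two pages point to it, and A benefits from being endorsed by the top-ranked page.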
1 review
May 13, 2020
Data trends: Google Flu Trends underestimated the human reaction to pandemic trends and therefore overestimated the spread by 50%
-High school basketball correlated with the spread of flu. Not accounted for.
-64Kb on Apollo 11
-Genome sequencing and personal health devices will render the need for general practitioners relatively moot.
-Smart watch data shared with the workplace can lead to redundancies based on health
-Netflix can predict when a customer will cancel a subscription
Tor: The Onion Router
63 reviews, 3 followers
December 21, 2021
Holmes has created an accessible and informative introduction to the evolution of data classification, processing, and storage. The voice is clear and the further reading section is rich with recommendations to further understand the intricacies of data as well as the mathematical basis behind modern computational tools. I would recommend the book for anyone interested in computer science, data science, and systems engineering.
Jenny Hong
84 reviews
July 12, 2023
Absolutely incredible. The scope and depth of the content was absolutely perfect for a person with no CS or data science background. Very concrete and understandable. It covered everything from basic concepts and events related to big data that I felt like I should understand but was too embarrassed to ask about… to basic methods of data encryption and mining and stuff… to exciting new developments on the horizon.
Truly well done. I cannot recommend this book enough.
José
455 reviews, 15 followers
March 7, 2022
I read this book because I was interested in blockchain (yes, I know it's a very different topic), and was surprised to find a very interesting book about the infinite ways of using data. It really does feel short, though, but it would be unfair to ask more of a book with "a very short introduction" in the title.
Robert Uehlin
17 reviews
April 21, 2020
Nicely written and comprehensive, this intro explains what makes data "big" and how big data is impacting various fields of study. The book also covers the fundamentals of how data is stored and how it is kept secure.
Ryan
27 reviews, 1 follower
April 13, 2022
Big data models sometimes fit the data very well (even small random cases are all taken into account) but do not predict well because of this 'over-fitting'.
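A toy illustration of the over-fitting point: a polynomial that passes exactly through every training point (Lagrange interpolation) is compared with a straight-line least-squares fit. The data are synthetic and chosen for illustration only: the true rule is y = x, and the training labels carry alternating noise of +1/-1. This is a generic sketch, not a method taken from the book.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial passing through every (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Synthetic data: true rule y = x; training labels perturbed by +1, -1, +1, ...
train_x = list(range(8))
train_y = [x + (-1) ** x for x in train_x]
test_x = [x + 0.5 for x in range(7)]  # unseen points between the training points
test_y = list(test_x)                 # noise-free truth

slope, intercept = fit_line(train_x, train_y)


def line(x):
    return slope * x + intercept


def curve(x):
    return lagrange_predict(train_x, train_y, x)


results = {}
for name, model in [("straight line", line), ("exact-fit polynomial", curve)]:
    train_err = mse([model(x) for x in train_x], train_y)
    test_err = mse([model(x) for x in test_x], test_y)
    results[name] = (train_err, test_err)
    print(f"{name}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The exact-fit polynomial has zero training error because it reproduces the noise, yet it swings wildly between the training points; the simpler line misses every training point slightly but predicts the unseen points far better.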
3 reviews
July 3, 2024
Does its job well as a short introduction. Succinct and to-the-point. However, it could've provided better references for further reading. It could also have packed in more information.
Mario A.
133 reviews
July 30, 2024
It's a good overview for understanding the big picture of how big data is generated, processed and stored, and of the risks that arise when there are not enough cybersecurity measures in place.