Jeremy Howard's Blog, page 2

May 16, 2022

AI Harms are Societal, Not Just Individual

Not just Individual, but Societal Harms

When the US government switched to the facial identification service ID.me for unemployment benefits, the software failed to recognize Bill Baine's face. The app said he could be verified in a virtual appointment instead, but he was unable to get through: the screen showed a wait time of 2 hours and 47 minutes that never updated, even over the course of weeks. He tried calling various offices, and his daughter drove in from out of town to spend a day helping him, yet he could never get a useful human answer on what he was supposed to do, and he went months without unemployment benefits. Baine's case was eventually resolved when a journalist hypothesized that the issue was a spotty internet connection, and that Baine would be better off traveling to another town to use a public library computer and internet. Even then, it still took hours for Baine to get his approval.

Journalist Andrew Kenney of Colorado Public Radio has covered the issues with ID.me

Baine was not alone. The number of people receiving unemployment benefits plummeted by 40% in the 3 weeks after ID.me was introduced. Some of these were presumed to be fraudsters, but it is unclear how many people genuinely in need of benefits were wrongly harmed. These are individual harms, but there are broader, societal harms as well: the cumulative costs of the public having to spend ever more time on hold, trying to navigate user-hostile automated bureaucracies where they can't get the answers they need. There is the societal cost of greater inequality and greater desperation, as more people are plunged into poverty through erroneous denial of benefits. And there is the undermining of trust in public services, which can be difficult to restore.

Potential for algorithmic harm takes many forms: loss of opportunity (employment or housing discrimination), economic cost (credit discrimination, narrowed choices), social detriment (stereotype confirmation, dignitary harms), and loss of liberty (increased surveillance, disproportionate incarceration). And each of these four categories manifests in both individual and societal harms.

It should come as no surprise that algorithmic systems can give rise to societal harm. These systems are sociotechnical: they are designed by humans and teams that bring their values to the design process, and algorithmic systems continually draw information from, and inevitably bear the marks of, fundamentally unequal, unjust societies. In the context of COVID-19, for example, policy experts warned that historical healthcare inequities risked making their way into the datasets and models being used to predict and respond to the pandemic. And while it's intuitively appealing to think of large-scale systems as creating the greatest risk of societal harm, algorithmic systems can create societal harm because of the dynamics set off by their interconnection with other systems and players, such as advertisers or commercially driven media, and the ways in which they touch on sectors or spaces of public importance.

Still, in the West, our ideas of harm are often anchored to an individual being harmed by a particular action at a discrete moment in time. As law scholar Nathalie Smuha has powerfully argued, legislation (both proposed and passed) in Western countries to address algorithmic risks and harms often focuses on individual rights: regarding how an individual's data is collected or stored, the right not to be discriminated against, or the right to know when AI is being used. Even the metrics used to evaluate the fairness of algorithms often aggregate across individual impacts but are unable to capture longer-term, more complex, or second- and third-order societal impacts.
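To make that last point concrete, here is a minimal sketch (the function name and data are hypothetical, invented for illustration) of one common group-fairness metric, demographic parity. Note that it is nothing more than an aggregate of individual outcomes at a single point in time: cumulative disadvantage, eroded trust, and other second-order societal effects are structurally invisible to it.

```python
# A minimal, hypothetical sketch of demographic parity: the gap between
# groups' positive-outcome rates. It aggregates individual outcomes at one
# snapshot in time, so longer-term societal effects never enter the metric.
def demographic_parity_gap(outcomes, groups):
    """Largest difference in positive-outcome rates across groups."""
    rates = {}
    for g in set(groups):
        members = [o for o, grp in zip(outcomes, groups) if grp == g]
        rates[g] = sum(members) / len(members)
    vals = sorted(rates.values())
    return vals[-1] - vals[0]

# 1 = benefit approved, 0 = benefit denied (invented data)
outcomes = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_gap(outcomes, groups))  # 0.5
```

Even a "fair" score of 0.0 on a metric like this says nothing about whether the system, over months, pushes people into poverty or erodes trust in public services.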

Case Study: Privacy and surveillance

Consider the over-reliance on individual harms in discussions of privacy: so often the focus is on whether individual users can opt in or out of sharing their data, on notions of individual consent, or on proposals that individuals be paid for their personal data. Yet widespread surveillance fundamentally changes society: people may begin to self-censor and to be less willing (or able) to advocate for justice or social change. Professor Alvaro Bedoya, director of the Center on Privacy and Technology at the Georgetown University Law Center, traces a history of how surveillance has been used by the state to try to shut down movements for progress, targeting religious minorities, poor people, people of color, immigrants, sex workers, and those considered "other". As Maciej Ceglowski writes, "Ambient privacy is not a property of people, or of their data, but of the world around us... Because our laws frame privacy as an individual right, we don't have a mechanism for deciding whether we want to live in a surveillance society."

Drawing on interviews with African data experts, Birhane et al. write that even when data is anonymized and aggregated, it "can reveal information on the community as a whole. While notions of privacy often focus on the individual, there is growing awareness that collective identity is also important within many African communities, and that sharing aggregate information about communities can also be regarded as a privacy violation." Recent US-based scholarship has also highlighted the importance of thinking about group-level privacy (whether that group is made up of individuals who identify as members of it, or whether it's a "group" that is algorithmically determined, like individuals with similar shopping habits on Amazon). Because even aggregated, anonymized data can reveal important group-level information (e.g., the location of military personnel training, via exercise tracking apps), "managing privacy", these authors argue, "is often not intrapersonal but interpersonal." And yet legal and tech-design privacy solutions are often better geared towards assuring individual-level privacy than negotiating group privacy.

Case Study: Disinformation and erosion of trust

Another example of a collective societal harm comes from how technology platforms such as Facebook have played a significant role in elections ranging from the Philippines to Brazil, yet it can be difficult (and not necessarily possible or useful) to quantify exactly how much: something as complex as a country's political system and participation involves many interlinking factors. But when "deep fakes" make it "possible to create audio and video of real people saying and doing things they never said or did", or when motivated actors successfully game search engines to amplify disinformation, the (potential) harm that is generated is societal, not just individual. Disinformation and the undermining of trust in institutions and fellow citizens have broad impacts, including on individuals who never use social media.

Reports and Events on Regulatory Approaches to Disinformation

Efforts by national governments to deal with the problem through regulation have not gone down well with everyone. "Disinformation" has repeatedly been highlighted as one of the tech-enabled "societal harms" that the UK's Online Safety Bill or the EU's Digital Services Act should address, and a range of governments are taking aim at the problem by proposing or passing a slew of (in certain cases ill-advised) "anti-misinformation" laws. But there's widespread unease around handing power to governments to set standards for what counts as "disinformation". Does reifying "disinformation" as a societal harm become a legitimizing tool for governments looking to silence political dissent or undermine their weaker opponents? It's a fair and important concern - and yet simply leaving that power in the hands of mostly US-based, unaccountable tech companies is hardly a solution. What are the legitimacy implications if a US company like Twitter were to ban democratically elected Brazilian President Jair Bolsonaro for spreading disinformation, for example? How do we ensure that tech companies are investing sufficiently in governance efforts across the globe, rather than responding in an ad hoc manner to proximal (i.e. mostly US-based) concerns about disinformation? Taking a hands-off approach to platform regulation doesn't make platforms' efforts to deal with disinformation any less politically fraught.

Individual Harms, Individual Solutions

If we consider individual solutions our only option (in terms of policy, law, or behavior), we limit the scope of the harms we can recognize and the nature of the problems we can address. To take an example not related to AI: Oxford professor Trish Greenhalgh et al. analyzed why leaders in the West were so slow to accept that covid is airborne (it can linger and float in the air, much like cigarette smoke, which makes masks and ventilation key precautions) rather than droplet-borne (in which case hand-washing would be a key precaution). One reason they highlight is the Western framing of individual responsibility as the solution to most problems. Hand-washing fits the idea of individual responsibility, whereas collective responsibility for the quality of shared indoor air does not. The allowable set of solutions helps shape what we identify as a problem. It should also give us pause that recent research suggests "the level of interpersonal trust in a society" was a strong predictor of which countries managed COVID-19 most successfully. Individualistic framings can limit our imagination about the problems we face and which solutions are likely to be most impactful.

Parallels with Environmental Harms

Before the passage of environmental laws, many existing legal frameworks were not well suited to addressing environmental harms. Perhaps a chemical plant releases waste emissions into the air once per week. Many people in surrounding areas may not be aware that they are breathing polluted air, or may not be able to directly link the pollution to a new medical condition such as asthma (which could be related to a variety of environmental and genetic factors).

There are parallels between air pollution and algorithmic harms

There are many parallels between environmental issues and AI ethics. Environmental harms include individual harms for people who develop discrete health issues from drinking contaminated water or breathing polluted air. Yet environmental harms are also societal: the costs of contaminated water and polluted air can reverberate through society in subtle, surprising, and far-reaching ways. As law professor Nathalie Smuha writes, environmental harms are often accumulative, building over time. Perhaps each individual release of waste chemicals from a refinery has little impact on its own, but the releases add up to something significant. In the EU, environmental law provides mechanisms for demonstrating societal harm, since it would be difficult to challenge many environmental harms on the basis of individual rights. Smuha argues that AI ethics faces many similar issues: for opaque AI systems whose effects accumulate over time, it can be difficult to prove a direct causal relationship to societal harm.

Directions Forward

To a large extent our message is to tech companies and policymakers. It's not enough to focus on the potential individual harms generated by tech and AI: the broader societal costs of tech and AI matter.

But those of us outside tech policy circles have a crucial role to play. One way in which we can guard against the risks of the "societal harm" discourse being co-opted by those with political power to legitimise undue interference and further entrench their power is by claiming the language of "societal harm" as the democratic and democratising tool it can be. We all lose when we pretend societal harms don't exist, or when we acknowledge they exist but throw our hands up. And those with the least power, like Bill Baine, are likely to suffer a disproportionate loss.

In his newsletter on Tech and Society, L.M. Sacasas encourages people to ask themselves 41 questions before using a particular technology. They're all worth reading and thinking about - but we're listing a few especially relevant ones to get you started. Next time you sit down to log onto social media, order food online, swipe right on a dating app or consider buying a VR headset, ask yourself:

How does this technology empower me? At whose expense? (Q16)
What feelings does the use of this technology generate in me toward others? (Q17)
What limits does my use of this technology impose upon others? (Q28)
What would the world be like if everyone used this technology exactly as I use it? (Q37)
Does my use of this technology make it easier to live as if I had no responsibilities toward my neighbor? (Q40)
Can I be held responsible for the actions which this technology empowers? Would I feel better if I couldn't? (Q41)

It's on all of us to sensitise ourselves to the societal implications of the tech we use.

Published on May 16, 2022 07:00

March 14, 2022

There's no such thing as not a math person

On the surface, I may seem into math: I have a math PhD, taught a graduate computational linear algebra course, co-founded AI research lab fast.ai, and even go by the twitter handle @math_rachel.

Yet many of my experiences of academic math culture have been toxic, sexist, and deeply alienating. At my lowest points, I felt like there was no place for me in math academia or math-heavy tech culture.

It is not just mathematicians or math majors who are impacted by this: Western culture is awash in negative feelings and experiences regarding math, which permeate from many sources and impact students of all ages. In this post, I will explore the cultural factors, misconceptions, stereotypes, and relevant studies on obstacles that turn people off math. If you (or your child) don't like math or feel anxious about your capabilities, you're not alone, and this isn't just a personal challenge. The essay below is based on part of a talk I recently gave.

me, teaching sorting algorithms, at an all-women coding academy in 2015

Myth of Innate Ability, Myth of the Lone Genius

One common myth is the idea that certain people's brains aren't "wired" the right way to do math, tech, or AI - that your brain either "works that way" or not. None of the evidence supports this viewpoint, yet when people believe it, it can become a self-fulfilling prophecy. Dr. Omoju Miller, who earned her PhD at UC Berkeley and was a senior machine learning engineer and technical advisor to the CEO at GitHub, shares some of the research debunking the myth of innate ability in this essay and in her TEDx talk. In reality, there is no such thing as "not a math person."

Dr. Cathy O'Neil, a Harvard Math PhD and author of Weapons of Math Destruction, wrote about the myth of the lone genius mathematician: "You don't have to be a genius to become a mathematician. If you find this statement at all surprising, you're an example of what's wrong with the way our society identifies, encourages and rewards talent... For each certified genius, there are at least a hundred great people who helped achieve such outstanding results."

Dr. Miller debunking the myth of innate ability, and Dr. O'Neil debunking the myth of the lone genius mathematician

Music without singing or instruments

Imagine a world where children are not allowed to sing songs or play instruments until they reach adulthood, after spending a decade or two transcribing sheet music by hand. This scenario is absurd and nightmarish, yet it is analogous to how math is often taught, with the most creative and interesting parts saved until almost everyone has dropped out. Dr. Paul Lockhart eloquently describes this metaphor in his essay, A Mathematician's Lament, on "how school cheats us out of our most fascinating and imaginative art form." Dr. Lockhart left his role as a university math professor to teach K-12 math, as he felt that so much reform was needed in how math is taught.

Dr. David Perkins uses the analogy of how children can play baseball without knowing all the technical details, without having a full team or playing a full 9 innings, yet still gain a sense of the "whole game." Math is usually taught with an overemphasis on dry, technical details, without giving students a concept of the "whole game." It can take years before enough technical details accumulate to build something interesting. There is an overemphasis on techniques rather than meaning.

What if math was taught more like how music or sports are taught?

Math curriculums are usually arranged in a vertical manner, with each year building tightly on the previous, such that one bad year can ruin everything that comes after. Many people I talk to can pinpoint the year that math went bad for them: "I used to like math until 6th grade, when I had a bad teacher/was dealing with peer pressure/my undiagnosed ADHD was out of control. After that, I was never able to succeed in future years." This is less true in other subjects, where one bad history teacher/one bad year doesn't mean that you can't succeed at history the following year.

Gender, race, and stereotypes

Female teachers' math anxiety affects girls' math achievement: in the USA, over 90% of primary school teachers are female, and research has found "the more anxious teachers were about math, the more likely girls (but not boys) were to endorse the commonly held stereotype that 'boys are good at math, and girls are good at reading' and the lower these girls' math achievement... People's fear and anxiety about doing math - over and above actual math ability - can be an impediment to their math achievement."

Research across a number of universities has found that more women go into engineering when courses focus on problems with positive social impact.

Structural racism also impacts what messages teachers impart to students. An Atlantic article, How Does Race Affect a Student's Math Education?, covered the research paper A Framework for Understanding Whiteness in Mathematics Education, noting that "Constantly reading and hearing about underperforming Black, Latino, and Indigenous students begins to embed itself into how math teachers view these students, attributing achievement differences to their innate ability to succeed in math... teachers start to expect worse performance from certain students, start to teach lower content, and start to use lower-level math instructional practices. By contrast, white and Asian students are given the benefit of the doubt and automatically afforded the opportunity to do more sophisticated and substantive mathematics."

The mathematics community is "an absolute mess which actively pushes out the sort of people who might make it better"

Dr. Harron's website, and some of the coverage of her number theory thesis, including on the Scientific American blog

Dr. Piper Harron made waves with her Princeton PhD thesis, utilizing humor, analogies, sarcasm, and genuine efforts to be accessible as she described advanced concepts in a ground-breaking way, very atypical for a mathematics PhD thesis. Dr. Harron wrote openly in the prologue of her thesis about how alienating the culture of mathematics is: "As any good grad student would do, I tried to fit in, mathematically. I absorbed the atmosphere and took attitudes to heart. I was miserable, and on the verge of failure. The problem was not individuals, but a system of self-preservation that, from the outside, feels like a long string of betrayals, some big, some small, perpetrated by your only support system." At her blog, the Liberated Mathematician, she writes, "My view of mathematics is that it is an absolute mess which actively pushes out the sort of people who might make it better."

These descriptions resonate with my own experiences obtaining a math PhD (as well as the experiences of many friends, at a variety of universities). The toxicity of academic math departments is self-perpetuating, pushing out the people who could make them better.

The full talk

This post is based on the first part of the talk I gave in the below video, which includes more detail and a Q&A. The talk also includes recommendations about math apps and resources, as well as a framework for how to consider screentime. Stay tuned for a future fast.ai blog post covering math apps and screentime.

Published on March 14, 2022 17:00


March 13, 2022

7 Great Lightning Talks Related to Data Science Ethics

I have been organizing and facilitating a series of Ethics Workshops for the Australian Data Science Network, featuring lightning talks by Australian experts on a range of topics related to data science ethics, including machine learning in medicine, explainability, Indigenous-led AI, and the role of policy. Check out the videos from these thought-provoking lightning talks (with longer discussions at the end):

The False Hope of Explainability in Medicine

Differences between understandings of explainability.

Lauren Oakden-Rayner, the Director of Research for Medical Imaging at Royal Adelaide Hospital, is both a radiologist and a machine learning expert. She spoke about mismatched expectations between technical and non-technical communities on what questions explainability answers, based on her paper "The false hope of current approaches to explainable artificial intelligence in health care". Lauren's talk is at the start of Video #1.

Critical Gaps in ML Evaluation Practice

Often unspoken assumptions underlying machine learning evaluation practices, and the gaps left by each

Ben Hutchinson is a senior engineer in Google Research, based in Sydney. He spoke about how practices for evaluating machine learning models are largely developed within academic research and rest on a number of assumptions that lead to concerning gaps when applied to real-world applications. Ben's talk starts at the 12-minute mark of Video #1.

Indigenous-Led AI

On empowering, enabling, and informing Indigenous knowledge throughout the model development process.

Cathy Robinson is a principal research scientist at CSIRO, working on a project to center Indigenous data sovereignty and Indigenous co-design in addressing complex ecological and conservation issues. Read more about CSIRO's Healthy Country AI project or about the CARE Indigenous Data Principles. Watch Cathy's talk starting at the 23-minute mark of Video #1.

Near-Termism and AI Value Alignment

The differences between definitive and normative understandings of explainability.

Aaron Snoswell is a postdoctoral research fellow at QUT, with over a decade's experience in software development, industry research, and robotics. He spoke about the issues with focusing primarily on long-termism in AI value alignment and the need to consider short-term issues. His talk starts at the 36-minute mark of Video #1.

Narrow vs Broad Understandings of Algorithmic Bias among Stakeholders in Healthcare AI The differences between narrow vs broad undersatndings of algorithmic bias. The differences between narrow vs broad undersatndings of algorithmic bias.

Yves Saint James Aquino is a philosopher and physician, currently working on the project ���The algorithm will see you now: ethical, legal and social implications of adopting machine learning systems for diagnosis and screening��� as a postdoctoral research fellow at the University in Wollongong. For his talk, he drew on interviews with 70 different stakeholders in healthcare AI, including software developers, medical doctors, and startup founders, to explore different conceptions of how algorithmic bias is understood. Watch the first talk in Video #2.

Towards Human-Centric XAI using Eye Tracking in Chest Xrays Using a multi-modal approach for machine learning on chest x-rays Using a multi-modal approach for machine learning on chest x-rays

Catarina Pinto Moreira is a Lecturer in Information Systems at Queensland University of Technology and a pioneer in non-classical probabilistic graphical models for decision making to empower human decision-making. Interviews with radiologists are crucial to her work; for example, interviews revealed that clinical notes are important for radiologists to use in diagnosis, even though this is not often mentioned in the literature. Her talk begins at 10 min mark of Video #2.

The Role of Policy in Data Ethics AI policy should span the entire AI life cycle; focus on applications rather than underlying tech; and move beyond abstract principles. AI policy should span the entire AI life cycle; focus on applications rather than underlying tech; and move beyond abstract principles.

Michael Evans crafted Australia���s National Artificial Intelligence Roadmap, contributed to the development of Australia���s national approach to governing autonomous vehicles, and represented Australia at the World Bank/IMF Annual Meetings. He gave an overview of the AI policy landscape, including policy tools, the disconnect between principles and application, and recommended ways forward. Watch Michael���s talk beginning at 20 min mark of Video #2.

Each talk is around 5 minutes long. Feel free to fast forward to those of particular interest, or watch them all!

The End The False Hope of Explainability in Medicine (Lauren Oakden-Rayner, Australian Institute for Machine Learning) Critical Gaps in ML Evaluation Practice (Ben Hutchinson, Google Sydney) Indigenous-Led AI (Cathy Robinson, CSIRO) Near-Termism and AI Value Alignment (Aaron Snoswell, Queensland Univ of Technology) Narrow vs Broad Understandings of Algorithmic Bias among Stakeholders in Healthcare AI (Yves Saint James Aquino, Univ of Wollongong) Towards Human-Centric XAI using Eye Tracking in Chest Xrays (Catarina Pinto Moreira, Queensland Univ of Technology) The Role of Policy in Data Ethics (Michael Evans, Evans AI)
 •  0 comments  •  flag
Share on Twitter
Published on March 13, 2022 17:00

7 Great Lightning Talks Related to Data Science Ethics

I have been organizing and facilitating a series of Ethics Workshops for the Australian Data Science Network, featuring lightning talks by Australian experts on a range of topics related to data science ethics, including machine learning in medicine, explainability, Indigenous-led AI, and the role of policy. Check out the videos from these thought-provoking lightning talks (with longer discussions at the end):

The False Hope of Explainability in Medicine: Differences between understandings of explainability.

Lauren Oakden-Rayner, the Director of Research for Medical Imaging at Royal Adelaide Hospital, is both a radiologist and a machine learning expert. She spoke about mismatched expectations between technical and non-technical communities on what questions explainability answers, based on her paper “The false hope of current approaches to explainable artificial intelligence in health care”. Lauren’s talk is at the start of Video #1.

Critical Gaps in ML Evaluation Practice: Often unspoken assumptions underlying machine learning evaluation practices, and the gaps left by each.

Ben Hutchinson is a senior engineer at Google Research, based in Sydney. Practices for evaluating machine learning models are largely developed within academic research and rest on a number of assumptions that lead to concerning gaps when applied to real-world applications. Ben’s talk starts at the 12 min mark of Video #1.

Indigenous-Led AI: On empowering, enabling, and informing Indigenous knowledge throughout the model development process.

Cathy Robinson is a principal research scientist at CSIRO, working on a project to center Indigenous data sovereignty and Indigenous co-design in addressing complex ecological and conservation issues. Read more about CSIRO’s Healthy Country AI project or about the CARE Indigenous Data Principles. Watch Cathy’s talk starting at the 23 min mark of Video #1.

Near-Termism and AI Value Alignment: Why AI value alignment must consider near-term issues, not only long-term risks.

Aaron Snoswell is a postdoctoral research fellow at QUT, with over a decade’s experience in software development, industry research, and robotics. He spoke about the issues with focusing primarily on long-termism in AI value alignment and the need to consider short-term issues. Starts at the 36 min mark of Video #1.

Narrow vs Broad Understandings of Algorithmic Bias among Stakeholders in Healthcare AI: The differences between narrow vs broad understandings of algorithmic bias.

Yves Saint James Aquino is a philosopher and physician, currently working on the project “The algorithm will see you now: ethical, legal and social implications of adopting machine learning systems for diagnosis and screening” as a postdoctoral research fellow at the University of Wollongong. For his talk, he drew on interviews with 70 different stakeholders in healthcare AI, including software developers, medical doctors, and startup founders, to explore different conceptions of how algorithmic bias is understood. Watch the first talk in Video #2.

Towards Human-Centric XAI using Eye Tracking in Chest Xrays: Using a multi-modal approach for machine learning on chest x-rays.

Catarina Pinto Moreira is a Lecturer in Information Systems at Queensland University of Technology and a pioneer in applying non-classical probabilistic graphical models to empower human decision-making. Interviews with radiologists are crucial to her work; for example, interviews revealed that clinical notes are important for radiologists to use in diagnosis, even though this is not often mentioned in the literature. Her talk begins at the 10 min mark of Video #2.

The Role of Policy in Data Ethics: AI policy should span the entire AI life cycle, focus on applications rather than underlying tech, and move beyond abstract principles.

Michael Evans crafted Australia’s National Artificial Intelligence Roadmap, contributed to the development of Australia’s national approach to governing autonomous vehicles, and represented Australia at the World Bank/IMF Annual Meetings. He gave an overview of the AI policy landscape, including policy tools, the disconnect between principles and application, and recommended ways forward. Watch Michael’s talk beginning at the 20 min mark of Video #2.

Each talk is around 5 minutes long. Feel free to fast forward to those of particular interest, or watch them all!

The full list of talks:

- The False Hope of Explainability in Medicine (Lauren Oakden-Rayner, Australian Institute for Machine Learning)
- Critical Gaps in ML Evaluation Practice (Ben Hutchinson, Google Sydney)
- Indigenous-Led AI (Cathy Robinson, CSIRO)
- Near-Termism and AI Value Alignment (Aaron Snoswell, Queensland Univ of Technology)
- Narrow vs Broad Understandings of Algorithmic Bias among Stakeholders in Healthcare AI (Yves Saint James Aquino, Univ of Wollongong)
- Towards Human-Centric XAI using Eye Tracking in Chest Xrays (Catarina Pinto Moreira, Queensland Univ of Technology)
- The Role of Policy in Data Ethics (Michael Evans, Evans AI)
Published on March 13, 2022 07:00

November 22, 2021

Doing Data Science for Social Good, Responsibly

The phrase “data science for social good” is a broad umbrella, ambiguously defined. As many others have pointed out, the term often fails to specify good for whom. Data science for social good can be used to refer to: nonprofits increasing their impact through more effective data use, hollow corporate PR efforts from big tech, well-intentioned projects that inadvertently result in surveillance and privacy invasion of marginalized groups, efforts steeped in colonialism, or many other types of projects. Note that none of the categories in the previous list are mutually exclusive, and one project may fit several of these descriptors.


"Data for good" is an imprecise term that says little about who we serve, the tools used, or the goals. Being more precise can help us be more accountable & have greater positive impact. @sarahookr presents at @DataInstituteSF lunch seminar pic.twitter.com/efAMJxdQB8

— Rachel Thomas (@math_rachel) August 24, 2018

Picture from a presentation given in 2018 by Sara Hooker, founder of non-profit Delta Analytics and an AI researcher at Google, on Why “data for good” lacks precision.

I have been involved with data science for social good efforts for several years: chairing the Data for Good track at the USF Data Institute Conference in 2017; coordinating and mentoring graduate students in internships with the nonprofits Human Rights Data Analysis Group (for a project on entity resolution to obtain more accurate casualty counts in the conflicts in Syria and Sri Lanka) and the American Civil Liberties Union (one student analyzed covid vaccine equity in California and another analyzed disparities in school disciplinary action against Black and disabled students) during my time as director of the Center for Applied Data Ethics at USF; and now as a co-lead of the Data Science for Social Good program at Queensland University of Technology (QUT). At QUT, grad students and recent graduates partnered with the non-profits Cancer Council Queensland (well known for their Australian Cancer Atlas) and FareShare food rescue organisation, which operates Australia’s largest charity kitchens. While data for good projects can be incredibly useful, there are also pitfalls to be mindful of when approaching data for social good.

Some Questions & Answers

I recently spoke on a panel at the QUT Data Science for Social Good showcase event. I appreciated the thoughtful, nuanced questions from the moderators, Dr. Timothy Graham and Dr. Char-lee Moyle, who brought up some of the potential risks. I want to share their questions below, along with an expanded version of my answers.

What ethical and governance considerations do you think not-for-profits should consider when starting to adopt data science?

- Be specific about the goals of the project and how different stakeholders will be impacted. A series of interviews with African data experts revealed that power imbalances, failure to acknowledge extractive practices, failure to build trust, and Western-centric policies were all prevalent. Even in “data for good” projects, the people whose data is accessed and shared may not reap the benefits that those who control the project do. Stakeholders such as government bodies and non-profits have significantly more power and leverage compared to data subjects. There are issues where data gathered for one goal ends up being repurposed or shared for other uses. While Western “notions of privacy often focus on the individual, there is growing awareness that collective identity is also important within many African communities, and that sharing aggregate information about communities can also be regarded as a privacy violation.”
- Center the problem to be solved, not a flashy solution. Sometimes machine learning practitioners have a solution searching for a problem. It is important to stay focused on the root problem and be open to “mundane” or even non-technical solutions. One data for good project used the records of 15 million mobile phone owners in Kenya to quantify the movements of workers who migrate for seasonal work to an area with malaria, and made recommendations to increase malaria surveillance in their hometowns when they return. As a journalist for Nature reported, “But it’s unclear whether the results were needed, or useful. Malaria-control officers haven’t incorporated the analyses into their efforts.” The excitement around “flashy” big data approaches contrasts with the lack of funding for proven measures like bed nets, insecticides, treatment drugs, and health workers.
- Take data privacy seriously. Be clear about how the data will be stored, who has access to it, and what will happen to it later. Ask what data is truly needed, and if there are less invasive ways to get this information. Note that the above example tracking Kenyan mobile phone owners raises risks around lack of consent, invasion of privacy, and risk of de-anonymization.
- Include the people most impacted, and recognize that their values may be different from those of both the non-profit and academic stakeholders involved. A recent article from the AI Now Institute recommended that “social good projects should be developed at a small scale for local contexts … they should be designed in consultation with the community or social environment impacted by the systems in order to identify core values and needs.” One example of differing values: Indigenous scholars highlighted that a set of open data principles developed primarily by Western scholars to improve data discovery and reuse created tension with Indigenous values. The FAIR principles, first developed at a workshop in the Netherlands in 2014 and elaborated on in a paper published in Nature, call for data to be findable, accessible, interoperable, and reusable. In response, Indigenous scholars convened to develop the CARE principles for Indigenous data governance, calling for collective benefit, authority to control, responsibility, and ethics, intended as a complement to the FAIR principles.
- Avoid answering the “wrong problem.” For instance, many European governments are currently using algorithmic approaches to justify austerity cuts. Arguments about reducing fraud often accompany these, even when fraud is minimal. Due to a biometric identity system in India, many poor and elderly people are no longer able to access their food benefits due to faded fingerprints, not being able to travel to scanners, or intermittent internet connections.
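One concrete, if partial, way to take data privacy seriously before releasing aggregate data is the small-cell suppression common in statistical-disclosure guidance: withhold any group whose count is so small it could identify individuals. A minimal sketch (the function name, example data, and the threshold of 10 are illustrative assumptions, not a standard):

```python
from collections import Counter

def suppress_small_cells(records, group_key, min_cell_size=10):
    """Aggregate records by a grouping key, suppressing any group whose
    count falls below min_cell_size, since tiny cells can re-identify
    individuals even in "anonymous" aggregate releases."""
    counts = Counter(group_key(r) for r in records)
    return {
        group: (count if count >= min_cell_size else None)  # None = suppressed
        for group, count in counts.items()
    }

# Hypothetical example: benefit recipients grouped by postcode.
records = [{"postcode": "4000"}] * 25 + [{"postcode": "4825"}] * 3
released = suppress_small_cells(records, lambda r: r["postcode"])
# The 3-person "4825" cell is withheld rather than published.
```

The point is not the mechanism (more robust approaches like differential privacy exist) but the habit: decide what is safe to release before the data leaves your hands.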

My impression is that some folks use machine learning to try to "solve" problems of artificial scarcity. Eg: we won't give everyone the healthcare they need, so let's use ML to decide who to deny.

Question: What have you read about this? What examples have you seen?

— Rachel Thomas (@math_rachel) November 18, 2020
Do you think that data science for social good can increase the surveillance and control of disadvantaged groups or certain segments of society?

Many well-meaning projects inadvertently lead to increased surveillance. Cell-phone data from millions of phone owners in over two dozen low- and middle-income countries has been anonymized and analyzed in the wake of humanitarian disasters. This data raises concerns about the lack of consent of the phone users and the risks of de-anonymization. Furthermore, it is often questionable whether the results are truly useful, and whether they could have been discovered through other, less invasive approaches. One such project analyzed the cell phone data of people in Sierra Leone during an Ebola outbreak. However, this approach didn’t address how Ebola spreads (only through direct contact with body fluids) or help with the most urgent issue (which was convincing symptomatic people to come to clinics to isolate).
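On the de-anonymization point: mobility research has repeatedly shown that a handful of coarse location/time observations uniquely pins down most individuals, even with names stripped. A toy version of that uniqueness check (the trace format and data here are hypothetical, for illustration only):

```python
from collections import Counter

def fraction_unique(traces, n_points=2):
    """Fraction of users whose trace is uniquely identified by its first
    n_points (location, hour) observations. Even a few coarse points can
    re-identify individuals in "anonymized" mobility data."""
    keys = [tuple(trace[:n_points]) for trace in traces]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(traces)

# Hypothetical traces: (cell_tower_id, hour) observations per user.
traces = [
    [("A", 8), ("B", 12), ("C", 18)],
    [("A", 8), ("B", 12), ("D", 19)],
    [("E", 9), ("F", 13), ("G", 17)],
]
# With 2 points, only one of three users is unique; with 3 points, all are.
```

Run on real mobility datasets, this fraction approaches 1 with remarkably few points, which is why "anonymized" phone records are not as anonymous as they sound.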

What do you think is the role of government and universities in supporting and incentivising the not-for-profit sector in adopting data science?

Academia and government have a big role to play. Non-profits often lack the in-house data science skills to take advantage of their data, while many data scientists are searching for meaningful, impactful real-world problems to work on. We also need government to regulate areas such as data privacy, to help protect those who may be impacted. It is important to recognize that privacy should not just be considered an individual right, but also a public good.

What are your thoughts around the development of ethical frameworks to guide data science: are they more than marketing tactics to increase trustworthiness and reputation of data science?

We need ethical frameworks AND regulation. Both are crucially important. Many people want to do the right thing, and having standardized processes to guide them can help. I recommend the Markkula Center Tech Ethics Toolkit, which includes practical processes you can implement in your organization to try to identify ethical risks BEFORE they cause harm. At the same time, we need legal protections anywhere that data science impacts human rights and civil rights. Meaningful consequences are needed for those who cause harm to others. Policy is also the appropriate tool to address negative externalities, such as when corporations push their costs and harms onto society while reaping the profits for themselves. Otherwise, there will always be a race to the bottom.

What skills and training do you think the not-for-profit sector needs to embrace data science, and what are the best strategies for upskilling?

The people who are already working for an organization are best positioned to understand that organization’s problems and challenges, and where data science can help. Upskilling in-house talent is underutilized. Don’t feel that you need to hire someone new with a fancy pedigree, if there are people at your organization who are interested and eager to learn. I would start by learning to code in Python. Have a project from your not-for-profit that you are working on as you go, and let that project motivate you to learn what you need as you need to (rather than feeling like you need to spend years studying before you can tackle the problems you care about). One of our core missions with fast.ai is to train people in different domains to use machine learning for themselves, as they best understand the problems in their domain and what is needed. There are many myths that you need a super-elite background to use techniques like deep learning, but it’s not magic. Anyone with a year of coding experience can learn to use state-of-the-art deep learning.

Further Reading/Watching

Here are some additional articles (and one video) that I recommend to learn more on this topic:

- Narratives and Counternarratives on Data Sharing in Africa
- Why “data for good” lacks precision
- Can tracking people through phone-call data improve lives?
- A New AI Lexicon: Social good
- fast.ai Practical Data Ethics Week 6: Algorithmic Colonialism
Published on November 22, 2021 16:00

November 3, 2021

Avoiding Data Disasters

Things can go disastrously wrong in data science and machine learning projects when we undervalue data work, use data in contexts that it wasn’t gathered for, or ignore the crucial role that humans play in the data science pipeline. A new multi-university centre focused on Information Resilience, funded by the Australian government’s top scientific funding body (ARC), has recently launched. Information Resilience is the capacity to detect and respond to failures and risks across the information chain in which data is sourced, shared, transformed, analysed, and consumed. I’m honored to be a member of the strategy board, and I have been thinking about what information resilience means with respect to data practices. Through a series of case studies and relevant research papers, I will highlight these risks and point towards more resilient practices.

Case study: UK covid tracking app

Data from a covid-symptom tracking app was used in a research paper to draw wildly inaccurate conclusions about the prevalence of Long Covid, the often debilitating neurological, vascular, and immune disease that can last for months or longer (some patients have been sick for 20 months and counting). The app suggested that only 1.5% of patients still experience symptoms after 3 months, an order of magnitude smaller than estimates of 10-35% being found by other studies.

How could this research project have gone so wrong? Well, the app had been designed for a completely different purpose (tracking 1-2 week long respiratory infections), didn’t include the most common Long Covid symptoms (such as neurological dysfunction), had a frustrating user-interface that led many patients to quit using it, and made the erroneous assumption that those who stopped logging must be fully recovered. The results from this faulty research paper were widely shared, including in a BBC article, offering false reassurance that Long Covid is much rarer than it is. Patients had been voicing their frustrations with the app all along, and if researchers had listened sooner, they could have collected a much higher quality and more accurate data set.

This research failure illustrates a few common issues in data projects:

- The context of the data was not taken into account. The user-interface, the categories listed, the included features: these were all designed to record data about a short-term mild respiratory infection. However, when it was used for a different purpose (long covid patients suffering for months with vascular and neurological symptoms), it did a poor job, and led to missing and incomplete data. This happens all too often, in which data gathered for one context is used for another.
- The people most impacted (long covid patients) were ignored. They had the most accurate expertise on what long covid actually entailed, yet were not listened to. Ignoring this expertise led to lower quality data and erroneous research conclusions. Patients have crucial domain expertise, which is distinct from that of doctors, and must be included in medical data science projects. From the start of the pandemic, patients who had suffered from other debilitating post-viral illnesses warned that we should be on the lookout for long-term illness, even in initially “mild” cases.

Data is Crucial

Collecting data about covid and its long-term effects directly from patients was a good idea, but poorly executed in this case. Due to privacy and surveillance risks, I frequently remind people not to record data that they don’t need. However, the pandemic has been a good reminder of how much data we really do need, and how tough it is when it’s missing.

At the start of the pandemic in the United States, we had very little data about what was happening: the government was not tabulating information on cases, testing, or hospitalization. How could we know how to react when we didn’t understand how many cases there were, what death rates were, how transmissible the disease was, and other crucial information? How could we make policy decisions in the absence of a basic understanding of the facts?

In early March 2020, two journalists and a data scientist from a medication-discovery platform began pulling covid data together into a spreadsheet to understand the situation in the USA. This launched into a 15-month long project in which 500 volunteers compiled and published data on COVID-19 testing, cases, hospitalizations, and deaths in the USA. During those 15 months, the Covid Tracking Project was the most comprehensive source of covid data in the USA, even more comprehensive than what the CDC had, and it was used by the CDC, numerous government agencies, and both the Trump and Biden Administrations. It was cited in academic studies and in thousands of news articles.

A data infrastructure engineer and contributor for the project later recounted: “It quickly became apparent that daily, close contact with the data was necessary to understand what states were reporting. States frequently changed how, what, and where they reported data. Had we set up a fully automated data capture system in March 2020, it would have failed within days.” The project used automation to support and supplement manual work, not to replace it. At numerous points, errors in state reporting mechanisms were caught by eagle-eyed data scientists who noticed discrepancies.
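That workflow, automation that surfaces anomalies for a human to judge rather than silently auto-correcting them, can be sketched in a few lines. This is an illustrative sketch, not the project's actual tooling; the function name, threshold, and data are assumptions:

```python
def flag_for_review(daily_counts, max_ratio=3.0):
    """Flag large day-over-day jumps or drops for human review rather than
    auto-correcting them: reporting pipelines change without notice, so a
    person must decide whether an anomaly is an error or a real change."""
    flags = []
    for i in range(1, len(daily_counts)):
        prev, curr = daily_counts[i - 1], daily_counts[i]
        if prev > 0 and (curr > prev * max_ratio or curr < prev / max_ratio):
            flags.append((i, prev, curr))  # day index and values for a reviewer
    return flags

# Hypothetical series: day 3 shows a sudden ~5x jump worth a human look,
# and the return to baseline on day 4 is flagged too.
counts = [100, 110, 105, 520, 130]
```

The design choice is that the function only *flags*; it never rewrites the data. A human with context on that state's reporting decides what the anomaly means.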

This vision of using automation to support human work resonates with our interest at fast.ai in “augmentedML”, not “autoML”. I have written previously, and gave an AutoML workshop keynote, on how automation too often ignores the important role of human input. Rather than trying to automate everything (which often fails), we should focus on how humans and machines can best work together to take advantage of their different strengths.

Speaking about AugmentedML vs. AutoML at ICML 2019

Data Work is Undervalued

Interviews with 53 AI practitioners across 6 countries on 3 continents found a pattern that is very familiar to many of us (including me) who work in machine learning: “Everyone wants to do the model work, not the data work.” Missing meta-data leads to faulty assumptions. Data collection practices often conflict with the workflows of on-the-ground partners, such as nurses or farmers, who are usually not compensated for this extra effort. Too often data work is arduous, invisible, and taken for granted. Undervaluing data work leads to poor practices and often results in negative downstream events, including dangerously inaccurate models and months of lost work.

Throughout the pandemic, data about covid (both initial cases and long covid) has often been lacking. Many countries have experienced testing shortages, leading to undercounts of how many people have covid. The CDC decision not to track breakthrough cases unless they resulted in hospitalization made it harder to understand the prevalence of breakthroughs (a particularly concerning decision, since breakthroughs can still lead to long covid). In September, it was revealed that British Columbia, Canada was not including covid patients in their ICU counts once the patients were no longer infectious, a secretive decision that obscured how full ICUs were. Some studies of Long Covid have failed to include common symptoms, such as neurological ones, making it harder to understand its prevalence or nature.

Data has Context

Covid is giving us a first-hand view of how data, which we may sometimes want to think of as “objective”, are shaped by countless human decisions and factors. In the example of the symptom tracking app, decisions about which symptoms were included had a significant impact on the prevalence rate calculated. Design decisions that influenced the ease of use affected how much data was gathered. Lack of understanding of how the app was being used (and why people quit using it) led to erroneous decisions about which cases should be considered “recovered”. These are all examples of the context for data. Here, the data gathered was reasonably appropriate for understanding initial covid infections (a week or two of respiratory symptoms), but not for patients experiencing months of neurological and vascular symptoms. Numbers cannot stand alone; we need to understand how they were measured, who was included and excluded, the relevant design decisions, and under what situations a dataset is appropriate to use.

As another example, consider covid testing counts: who has access to testing (this involves health inequities, due to race or urban vs. rural location); who is encouraged to get tested (at various times, people without symptoms, children, or other groups have been discouraged from doing so); varying accuracies (e.g. PCR tests are less accurate on children, missing almost half of cases that later go on to seroconvert); and decisions about what counts as a “case” (I know multiple people who had alternating test results: positive, negative, positive, or the reverse. What counts as a positive case?)

Datasheet for an electrical component. Image from 'Datasheets for Datasets'

One proposal for capturing this context is Datasheets for Datasets. Prior to doing her PhD at Stanford in computer vision and then co-leading Google’s AI ethics team, Dr. Timnit Gebru worked at Apple in circuit design and electrical engineering. In electronics, each component (such as a circuit or transistor) comes with a datasheet that lists when and where it was manufactured, under what conditions it is safe to use, and other specifications. Dr. Gebru drew on this background to propose a similar idea for datasets: listing the context of when and how it was created, what data was included/excluded, recommended uses, potential biases and ethical risks, work needed to maintain it, and so on. This is a valuable proposal towards making the context of data more explicit.
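As a rough illustration of the idea, a datasheet could travel with a dataset as structured metadata. The field names below are a pared-down paraphrase of the paper's themes, not its exact schema, and the example entries imagine the symptom-tracking app discussed above:

```python
from dataclasses import dataclass

@dataclass
class Datasheet:
    """A pared-down sketch of a dataset datasheet, loosely following the
    spirit of Gebru et al.'s 'Datasheets for Datasets'."""
    motivation: str          # why, and by whom, the dataset was created
    composition: str         # what is included and, crucially, excluded
    collection_process: str  # how, when, and under what conditions
    recommended_uses: str    # contexts the data is appropriate for
    out_of_scope_uses: str   # contexts it should NOT be used for
    maintenance: str         # who maintains it and how it is updated

# Hypothetical datasheet for the covid symptom-tracking app's data:
sheet = Datasheet(
    motivation="Track short-term respiratory symptoms during acute covid",
    collection_process="Mobile app, voluntary logging, high drop-off rate",
    composition="Self-reported daily symptom logs; neurological symptoms absent",
    recommended_uses="Acute-phase symptom prevalence",
    out_of_scope_uses="Long Covid prevalence (missing symptoms, drop-off bias)",
    maintenance="Unmaintained after study end",
)
```

Had such a sheet accompanied the app's data, the "out of scope" field alone might have flagged the Long Covid paper's central error before publication.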

The People Most Impacted

The inaccurate research and incomplete data from the covid tracking app could have been avoided by drawing on the expertise of patients. Higher quality data could have been collected sooner and more thoroughly if patients had been consulted in the app design and in the related research studies. Participatory approaches to machine learning are an exciting and growing area of research. In any domain, the people who would be most impacted by errors or mistakes need to be included as partners in the design of the project.

The Diverse Voices project from the University of Washington Tech Policy Lab involves academic papers and practical how-to guides.

Often, our approaches to addressing fairness or other ethics issues further centralize the power of system designers and operators. The organizers of an ICML workshop on the topic called for more cooperative, democratic, and participatory approaches instead. We need to think not just about explainability, but about giving people actionable recourse. As Professor Berk Ustun highlights, when someone asks why their loan was denied, usually what they want is not just an explanation but to know what they could change in order to get a loan. We need to design systems with contestability in mind, including from the start the idea that people should be able to challenge system outputs. We need to include expert panels of perspectives that are often overlooked; depending on the application, this could mean formerly or currently incarcerated people, people who don’t drive, people with very low incomes, disabled people, and many others. The Diverse Voices project from the University of Washington Tech Policy Lab provides guidance on how to do this. And it is crucial that this not be tokenistic participation-washing, but a meaningful, appropriately compensated, and ongoing role in design and operation.

Towards Greater Data Resilience

I hope that we can improve data resilience through:

Valuing data work Documenting context of data Close contact with the data Meaningful, ongoing, and compensated involvement of the people impacted

And I hope that when our data represents people we can remember the human side. As AI researcher Inioluwa Deborah Raji wrote, ���Data are not bricks to be stacked, oil to be drilled, gold to be mined, opportunities to be harvested. Data are humans to be seen, maybe loved, hopefully taken care of.���

Quote from AI researcher Inioluwa Deborah Raji Quote from AI researcher Inioluwa Deborah Raji
 •  0 comments  •  flag
Share on Twitter
Published on November 03, 2021 17:00

Avoiding Data Disasters

Things can go disastrously wrong in data science and machine learning projects when we undervalue data work, use data in contexts that it wasn't gathered for, or ignore the crucial role that humans play in the data science pipeline. A new multi-university centre focused on Information Resilience, funded by the Australian government's top scientific funding body (ARC), has recently launched. Information Resilience is the capacity to detect and respond to failures and risks across the information chain in which data is sourced, shared, transformed, analysed, and consumed. I'm honored to be a member of the strategy board, and I have been thinking about what information resilience means with respect to data practices. Through a series of case studies and relevant research papers, I will highlight these risks and point towards more resilient practices.

Case study: UK covid tracking app

Data from a covid-symptom tracking app was used in a research paper to draw wildly inaccurate conclusions about the prevalence of Long Covid, the often debilitating neurological, vascular, and immune disease that can last for months or longer (some patients have been sick for 20 months and counting). The app suggested that only 1.5% of patients still experience symptoms after 3 months, an order of magnitude smaller than the 10-35% estimates found by other studies.

How could this research project have gone so wrong? Well, the app had been designed for a completely different purpose (tracking 1-2 week long respiratory infections), didn't include the most common Long Covid symptoms (such as neurological dysfunction), had a frustrating user-interface that led many patients to quit using it, and made the erroneous assumption that those who stopped logging must be fully recovered. The results from this faulty research paper were widely shared, including in a BBC article, offering false reassurance that Long Covid prevalence is much rarer than it is. Patients had been voicing their frustrations with the app all along, and if researchers had listened sooner, they could have collected a much higher quality and more accurate data set.

This research failure illustrates a few common issues in data projects:

The context of the data was not taken into account. The user-interface, the categories listed, the included features: these were all designed to record data about a short-term mild respiratory infection. However, when it was used for a different purpose (long covid patients suffering for months with vascular and neurological symptoms), it did a poor job, and led to missing and incomplete data. All too often, data gathered for one context is used for another.

The people most impacted (long covid patients) were ignored. They had the most accurate expertise on what long covid actually entailed, yet were not listened to. Ignoring this expertise led to lower quality data and erroneous research conclusions. Patients have crucial domain expertise, which is distinct from that of doctors, and must be included in medical data science projects. From the start of the pandemic, patients who had suffered from other debilitating post-viral illnesses warned that we should be on the lookout for long-term illness, even in initially "mild" cases.

Data is Crucial

Collecting data about covid and its long-term effects directly from patients was a good idea, but poorly executed in this case. Due to privacy and surveillance risks, I frequently remind people not to record data that they don't need. However, the pandemic has been a good reminder of how much data we really do need, and how tough it is when it's missing.

At the start of the pandemic in the United States, we had very little data about what was happening: the government was not tabulating information on cases, testing, or hospitalization. How could we know how to react when we didn't understand how many cases there were, what death rates were, how transmissible the disease was, and other crucial information? How could we make policy decisions in the absence of a basic understanding of the facts?

In early March 2020, two journalists and a data scientist from a medication-discovery platform began pulling covid data together into a spreadsheet to understand the situation in the USA. This launched into a 15-month long project in which 500 volunteers compiled and published data on COVID-19 testing, cases, hospitalizations, and deaths in the USA. During those 15 months, the Covid Tracking Project was the most comprehensive source of covid data in the USA, even more comprehensive than what the CDC had, and it was used by the CDC, numerous government agencies, and both the Trump and Biden Administrations. It was cited in academic studies and in thousands of news articles.

A data infrastructure engineer and contributor for the project later recounted, "It quickly became apparent that daily, close contact with the data was necessary to understand what states were reporting. States frequently changed how, what, and where they reported data. Had we set up a fully automated data capture system in March 2020, it would have failed within days." The project used automation as a way to support and supplement manual work, not to replace it. At numerous points, errors in state reporting mechanisms were caught by eagle-eyed data scientists noticing discrepancies.

This vision of using automation to support human work resonates with our interest at fast.ai in "augmentedML", not "autoML". I have written previously and gave an AutoML workshop keynote on how too often automation ignores the important role of human input. Rather than try to automate everything (which often fails), we should focus on how humans and machines can best work together to take advantage of their different strengths.

Speaking about AugmentedML vs. AutoML at ICML 2019

Data Work is Undervalued

Interviews of 53 AI practitioners across 6 countries on 3 continents found a pattern that is very familiar to many of us (including me) who work in machine learning: "Everyone wants to do the model work, not the data work." Missing meta-data leads to faulty assumptions. Data collection practices often conflict with the workflows of on-the-ground partners, such as nurses or farmers, who are usually not compensated for this extraneous effort. Too often data work is arduous, invisible, and taken for granted. Undervaluing of data work leads to poor practices and often results in negative, downstream events, including dangerously inaccurate models and months of lost work.

Throughout the pandemic, data about covid (both initial cases and long covid) has often been lacking. Many countries have experienced testing shortages, leading to undercounts of how many people have covid. The CDC decision not to track breakthrough cases unless they resulted in hospitalization made it harder to understand the prevalence of breakthroughs (a particularly concerning decision since breakthroughs can still lead to long covid). In September, it was revealed that British Columbia, Canada was not including covid patients in their ICU counts once the patients were no longer infectious, a secretive decision that obscured how full ICUs were. Some studies of Long Covid have failed to include common symptoms, such as neurological ones, making it harder to understand their prevalence or nature.

Data has Context

Covid is giving us a first-hand view of how data, which we may sometimes want to think of as "objective", are shaped by countless human decisions and factors. In the example of the symptom tracking app, decisions about which symptoms were included had a significant impact on the prevalence rate calculated. Design decisions that influenced the ease of use impacted how much data was gathered. Lack of understanding of how the app was being used (and why people quit using it) led to erroneous decisions about which cases should be considered "recovered". These are all examples of the context for data. Here, the data gathered was reasonably appropriate for understanding initial covid infections (a week or two of respiratory symptoms), but not for patients experiencing months of neurological and vascular symptoms. Numbers cannot stand alone; we need to understand how they were measured, who was included and excluded, the relevant design decisions, and the situations in which a dataset is appropriate to use.

As another example, consider covid testing counts: who has access to testing (this involves health inequities, due to race or urban vs. rural location); who is encouraged to get tested (at various times, people without symptoms, children, or other groups have been discouraged from doing so); varying accuracies (e.g. PCR tests are less accurate on children, missing almost half of cases that later go on to seroconvert); and decisions about what counts as a "case" (I know multiple people who had alternating test results: positive, negative, positive, or the reverse. What counts as a positive case?)

Datasheet for an electrical component. Image from 'Datasheets for Datasets'

One proposal for capturing this context is Datasheets for Datasets. Prior to doing her PhD at Stanford in computer vision and then co-leading Google's AI ethics team, Dr. Timnit Gebru worked at Apple in circuit design and electrical engineering. In electronics, each component (such as a circuit or transistor) comes with a datasheet that lists when and where it was manufactured, under what conditions it is safe to use, and other specifications. Dr. Gebru drew on this background to propose a similar idea for datasets: listing the context of when and how it was created, what data was included/excluded, recommended uses, potential biases and ethical risks, work needed to maintain it, and so on. This is a valuable proposal towards making the context of data more explicit.
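As a rough sketch of how this context could travel with the data, a datasheet can be expressed as a simple structured record. The field names below are my own illustration, not the paper's official schema, and the covid-app entry is hypothetical:

```python
from dataclasses import dataclass

# A minimal sketch of a "Datasheets for Datasets"-style record.
# Field names are illustrative, not the paper's official schema.
@dataclass
class Datasheet:
    name: str
    created_when: str
    created_how: str          # collection instrument and procedure
    included: list
    excluded: list
    recommended_uses: list
    known_risks: list         # biases, ethical risks, failure modes

# Hypothetical datasheet for the symptom-tracking app discussed above.
covid_app = Datasheet(
    name="Covid symptom tracking app logs",
    created_when="2020",
    created_how="Self-reported symptom logging in a mobile app",
    included=["short-term respiratory symptoms"],
    excluded=["neurological symptoms", "users who quit logging"],
    recommended_uses=["tracking 1-2 week respiratory infections"],
    known_risks=["assumes non-loggers have recovered",
                 "unsuitable for studying Long Covid"],
)

# A downstream researcher can check fit-for-purpose before analysis:
print("tracking 1-2 week respiratory infections" in covid_app.recommended_uses)
```

Had such a record accompanied the app's data, the mismatch with a Long Covid research question would have been visible before the analysis began.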

The People Most Impacted

The inaccurate research and incomplete data from the covid tracking app could have been avoided by drawing on the expertise of patients. Higher quality data could have been collected sooner and more thoroughly, if patients were consulted in the app design and in the related research studies. Participatory approaches to machine learning is an exciting and growing area of research. In any domain, the people who would be most impacted by errors or mistakes need to be included as partners in the design of the project.

The Diverse Voices project from University of Washington Tech Policy Lab involves academic papers and practical how-to guides.

Often, our approaches to addressing fairness or other ethics issues further centralize the power of system designers and operators. The organizers of an ICML workshop on the topic called for more cooperative, democratic, and participatory approaches instead. We need to think not just about explainability, but about giving people actionable recourse. As Professor Berk Ustun highlights, when someone asks why their loan was denied, usually what they want is not just an explanation but to know what they could change in order to get a loan. We need to design systems with contestability in mind, to include from the start the idea that people should be able to challenge system outputs. We need to include expert panels of perspectives that are often overlooked; depending on the application, this could mean formerly or currently incarcerated people, people who don't drive, people with very low incomes, disabled people, and many others. The Diverse Voices project from the University of Washington Tech Policy Lab provides guidance on how to do this. And it is crucial that this not be tokenistic participation-washing, but a meaningful, appropriately compensated, and ongoing role in system design and operation.

Towards Greater Data Resilience

I hope that we can improve data resilience through:

- Valuing data work
- Documenting context of data
- Close contact with the data
- Meaningful, ongoing, and compensated involvement of the people impacted

And I hope that when our data represents people we can remember the human side. As AI researcher Inioluwa Deborah Raji wrote, "Data are not bricks to be stacked, oil to be drilled, gold to be mined, opportunities to be harvested. Data are humans to be seen, maybe loved, hopefully taken care of."

Quote from AI researcher Inioluwa Deborah Raji
Published on November 03, 2021 07:00

October 26, 2021

SARS-CoV-2 Spike Protein Impairment of Endothelial Function Does Not Impact Vaccine Safety

My colleague Dr Uri Manor was a senior author on a study in March this year which has become the most discussed paper in the history of Circulation Research and is in the top 0.005% of discussed papers across all topics. That's because it got widely picked up by anti-vaxx groups that totally misunderstood what it says. Uri and I decided to set the record straight, and we wrote a paper that explains that "SARS-CoV-2 Spike Protein Impairment of Endothelial Function Does Not Impact Vaccine Safety". Unfortunately peer review has taken months, so it's still not published. Therefore, we've decided to make the paper available prior to review below (as HTML) and here (as PDF).


Contents: Abstract, Background, Analysis of Current Data, Overall Vaccine Safety, Comparative Analysis, Mechanism of genetically-encoded spike protein vaccines, Adenovirus Vector-Based Vaccines and VITT, Conclusion, References

Abstract

Lei et al. [2021] showed the spike protein in SARS-CoV-2 alone was enough to cause damage to lung vascular endothelium. The authors noted that their results suggest that "vaccination-generated antibody and/or exogenous antibody against S protein not only protects the host from SARS-CoV-2 infectivity but also inhibits S protein-imposed endothelial injury". We show that there is no known mechanism by which the spike protein impairment of endothelial function could reduce vaccine safety, and that the safety data clearly show that the spike proteins in vaccines do not do so. Overall, we conclude that spike proteins encoded by vaccines are not harmful and may be beneficial to vaccine recipients.

Background

COVID-19 has been widely understood to be a respiratory lung disease. However, there is now a growing consensus that SARS-CoV-2 also attacks the vascular system [Potus et al., 2020, Ackermann et al., 2020, Siddiqi et al., 2020, Teuwen et al., 2020]. Earlier studies of other coronaviruses have suggested that their spike proteins contributed to damaging vascular endothelial cells [Kuba et al., 2005].

Lei et al. [2021] created a pseudovirus surrounded by a SARS-CoV-2 crown of spike (S) proteins but containing no actual virus, and found that exposure to this pseudovirus resulted in damage to the lungs and arteries of an animal model. They concluded that "S protein alone can damage vascular endothelial cells (ECs) by downregulating ACE2 and consequently inhibiting mitochondrial function".

Lei et al. [2021] noted that their conclusions suggest that vaccine-induced antibodies "not only protects the host from SARS-CoV-2 infectivity but also inhibits S protein-imposed endothelial injury". However, they did not tackle the question of whether the findings of EC damage from S protein might also have an unintended negative side effect of reducing vaccine safety.

Vaccine safety has become an important issue due to Vaccine-induced Immune Thrombotic Thrombocytopenia (VITT), also known as Vaccine-induced Immune Thrombocytopenia and Thrombosis, which has resulted in cases in recipients of the Oxford/AstraZeneca (AZ) and Johnson & Johnson (JJ) vaccines [Makris et al., 2021]. VITT refers to a rare combination of thrombosis (usually CVST) and thrombocytopenia which have been found in some patients 4 to 30 days after they receive their first AZ or JJ vaccine dose (and occasionally after their second dose).

Regulators have found that clots are extremely rare, and that the benefits of the vaccines outweigh the risks. However, the roll-out of the AZ and JJ vaccines has been restricted in many jurisdictions [Mahase, 2021]. In the UK, for instance, the Joint Committee on Vaccination and Immunisation (JCVI) recommends avoiding the AZ vaccine for those under 40 years old, based on "reports of blood clotting cases in people who also had low levels of platelets in the UK, following the use of Oxford/AstraZeneca vaccine." [Public Health England, 2021]

With an Altmetric Attention Score of 3726 (as of May 23rd, 2021), Lei et al. [2021] has become the most discussed paper in the history of Circulation Research and is in the top 0.005% of discussed papers across all topics. By reading a random sample of social media posts that link to the paper, we found that the great majority of readers express a view that the paper shows that the vaccine is not safe, and that therefore people should not get vaccinated. This view has also been widely shared in blog posts, such as Adams [2021], which states, "Bombshell Salk Institute science paper reveals the covid spike protein is what's causing deadly blood clots and it's in all the covid vaccines (by design)" and concludes "The vaccines literally inject people with the very substance that kills them. This isn't medicine; its medical violence against humanity". Furthermore, some doctors are now publicly expressing concerns about vaccine safety, based on concerns about the impact of spike proteins. [Bruno et al., 2021]

Because Lei et al. [2021] did not explicitly discuss the relevance of its findings to vaccine safety, and because it has been widely cited as showing that vaccines are not safe, including by some doctors, we will examine whether its findings should result in pausing or stopping the vaccine rollouts.

Analysis of Current Data

To ascertain whether spike protein impairment of endothelial function reduces vaccine safety we can directly observe the results of vaccine use.

Overall Vaccine Safety

The vaccine with the most recorded cases of VITT is the AZ vaccine. The largest roll-out of the AZ vaccine is in England. The roll-out began in December 2020, and by the start of February 2021 over 10 million people had received at least one dose. By mid-April 2021, over 10 million people had received their second dose.

Public Health England publishes data on "Excess mortality in England". This data shows that from March 20, 2020, until February 19, 2021, there were 101,486 excess deaths in England. From February 20, 2021 (two months after the start of England's vaccine roll-out), until April 30, 2021 (the latest data available at writing), there have been no excess deaths in England.

As of May 5, 2021, there were 262 reported cases of VITT in the UK after the first dose of the vaccine, resulting in 51 deaths, and eight cases have been reported after a second dose [Medicines & Healthcare products Regulatory Agency, 2021]. 35 million people had received their first vaccination by this time. This is over half the population of the UK, and nearly all adults over 30 years old. Children and young adults in the UK will not be receiving the AZ vaccine, based on current guidelines.

Overall, with 51 deaths due to VITT, compared with 101,486 excess deaths probably due to COVID-19, we can see that the overall impact of the vaccine is to greatly reduce deaths. Even if the spike proteins in vaccines resulted in reduced endothelial function (which, as we shall see shortly, they do not), the impact would clearly not be significant enough to result in the need to reduce or stop vaccine rollouts.

Comparative Analysis

All currently approved SARS-CoV-2 vaccines incorporate spike proteins. If the spike proteins in vaccines resulted in significantly reduced endothelial function, causing VITT, then we would expect to see reports of VITT in recipients of all the available vaccines. However, this is not the case. There are no reports of VITT in recipients of the Moderna or Pfizer vaccines.

It is unlikely that this is due to failure to identify VITT, since the particular combination of thrombosis and thrombocytopenia is very rare, and the issues around vaccine safety were widely reported and discussed.

Furthermore, it is statistically unlikely. As of May 15, 2021, in the USA 156.2 million people had received at least one dose of SARS-CoV-2 vaccine, the vast majority of which were Pfizer and Moderna. Since each recipient's vaccine VITT response is an independent binary event, we can model it with a binomial distribution. The UK VITT death rate is 0.0001%. If the spike proteins were the cause of VITT, we would expect the same death rate in the US, which would result in 183-273 deaths (99% confidence interval). However, we have seen zero reports of VITT in the US.
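A rough re-derivation of this estimate, using the figures quoted in the text (51 UK deaths out of 35 million first doses, 156.2 million US recipients). The exact interval construction used in the paper is not stated; a mean ± 3 sigma band under the binomial model is my assumption, and it lands close to the quoted 183-273 range:

```python
import math

# Figures from the text; the interval method (mean ± 3 sigma) is an assumption.
uk_deaths = 51
uk_first_doses = 35_000_000
us_vaccinated = 156_200_000

p = uk_deaths / uk_first_doses                # per-recipient VITT death rate
mean = us_vaccinated * p                      # expected US deaths at the UK rate
sd = math.sqrt(us_vaccinated * p * (1 - p))   # binomial standard deviation

low, high = mean - 3 * sd, mean + 3 * sd      # ~99.7% band under normal approx.
print(round(mean), round(low), round(high))   # 228 182 273
```

Since zero VITT deaths were reported among US mRNA-vaccine recipients, the observation falls far outside any such interval, which is the statistical point being made.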

Mechanism of genetically-encoded spike protein vaccines

Lei et al. [2021] found that freely circulating, spike protein-decorated pseudovirus at a very high dosage (half a billion pseudovirus particles per animal) delivered directly to the trachea damages lung arterial endothelial cells in an animal model. Similarly, an extremely high concentration (4 micrograms per milliliter) of purified recombinant spike protein could damage human pulmonary arterial endothelial cells in vitro [Lei et al., 2021]. These extremely high concentrations were used to simulate what may happen during a severe case of COVID-19 infection, wherein humans may have what some have estimated to be as high as 1 to 100 billion virions in the lungs [Sender et al., 2020]. Given there are approximately 100 spike proteins per virion [Neuman et al., 2011], this means COVID-19 infections could in theory result in as many as 10 trillion spike proteins. In wild-type viruses, the spike protein is cleaved such that the S1 portion is released and can be free to circulate in the serum [Xia et al., 2020], where it could potentially interact with ACE2 receptors on the endothelium. Thus, in both the spike protein laboratory experiments described in Lei et al. [2021] and in severe COVID-19 cases, exceedingly large amounts of freely circulating spike protein are present.
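The worst-case spike protein load quoted above is simple arithmetic on the two cited estimates, which can be checked directly (both inputs are the upper-end figures from the text, not new data):

```python
# Order-of-magnitude check for the worst-case spike load in severe COVID-19.
virions_upper = 100e9     # up to ~100 billion virions in the lungs (Sender et al.)
spikes_per_virion = 100   # ~100 spike proteins per virion (Neuman et al.)

total_spikes = virions_upper * spikes_per_virion
print(f"{total_spikes:.0e}")  # 1e+13, i.e. ~10 trillion spike proteins
```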

Animal studies have been performed to measure the distribution of genetically-encoded vaccines and their protein products. In the intramuscular injection site, which is by definition where the maximum amount of payload (i.e. lipid nanoparticle-packed mRNA or adenovirus) will be present and, by extension, where the maximum amount of spike protein will be produced, the payload is undetectable within 24-72 hours in vivo and the protein is undetectable within 10 days at most, and closer to 4 days post-injection when using lower doses more similar to those given to patients. Animal studies show there is some dispersal of payload to distal regions of the body, but as expected the concentrations dramatically decrease from the maximum concentration at the injection site (5,680 ng/mL) to much lower concentrations elsewhere: for example, they found >3,000x lower concentrations (1.82 ng/mL) in the lung, and roughly 10,000x lower concentrations in the brain (0.429 ng/mL) [Feldman et al., 2019].

Given only a fraction of the payload will be expressed, and given that the measurements of mRNA do not necessarily distinguish between functional, full-length mRNA and non-functional mRNA fragments, only a small fraction of the measured mRNA will be translated into spike protein. The distribution of actual spike protein throughout the body appears to follow an even steeper gradient: in vivo luciferase measurements in animals treated with mRNA vaccines show significant protein expression almost entirely confined to the site of injection [Pardi et al., 2015]. Note that the concentration given to patients is even lower than those used in these animal studies, and that the dispersion appears to drop off faster for lower doses. Overall, these data indicate relatively low, transient amounts of spike protein are produced by the vaccine, and the vast majority of spike protein produced is confined to the site of injection. Therefore, the concentration of freely circulating spike protein from vaccines available to the public is bound to be many orders of magnitude lower than the amount used in Lei et al. [2021]. The impairment found in that study would not be expected from the relatively tiny, physiologically irrelevant amount of spike proteins found in a vaccine.

In order to be physiologically relevant to, let alone damaging to, blood vessels, freely moving, soluble spike proteins would have to enter the circulatory system at high enough concentrations to bind and disrupt a significant number of ACE2 receptors on a significant number of vascular endothelial cells. As discussed above, measurements indicate that no significant amount of vaccine enters circulation. The confinement of the expressed spike protein away from the circulatory system prevents it from causing significant damage. In addition to the confined localization of expression, there is another safeguard preventing spike protein from accessing the vascular endothelium in any significant amount: the vaccine uses an engineered form of the spike protein that is fused to a transmembrane anchor. The transmembrane anchor allows the spike protein to appear on the surface or membrane of the cell, but it is held in place by the anchor. This prevents the vast majority of spike protein from drifting away, while at the same time creating a fixed target for the immune system to recognize and develop antibodies against the spike protein [Corbett et al., 2020]. While there is a chance for the mRNA-expressing cells to release full spike protein upon destruction by immune cells, the amount released is only going to be a small fraction of that produced by the vaccine, and certainly at too low a level to be physiologically relevant.

In agreement with the mechanism-based estimates outlined above, Ogata et al. [2021] recently published empirical measurements of freely circulating spike protein produced by the vaccines using an ultra-sensitive SIMOA assay. Their measurements revealed the average spike protein levels to be less than 50 picograms per milliliter, which translates to 300 fM. In contrast, the dissociation constant for ACE2 is 15-40 nM [Wang et al., 2020, Wrapp et al., 2020, Lan et al., 2020, Shang et al., 2020]. Thus, the femtomolar levels produced by the vaccines are approximately 100,000x lower than physiologically relevant concentrations, let alone pathological ones. Importantly, peak spike protein levels are reached within days after injection, rapidly dropping to undetectable levels within 9 days of the first injection, and within 3 days after the second injection. At the same time, antibodies against spike protein are inversely correlated with circulating spike protein, supporting the hypothesis that anti-spike antibodies can quickly and effectively neutralize freely circulating spike protein.
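The mass-to-molar conversion above can be checked with a few lines of arithmetic. The spike molar mass used here (~180 kDa for the monomer) is my assumption, as the exact value used in the paper is not stated; the result lands in the same ballpark as the quoted 300 fM:

```python
# Unit-conversion check: is <50 pg/mL really ~300 fM, and how far below
# the ACE2 dissociation constant is it?
pg_per_ml = 50.0
grams_per_liter = pg_per_ml * 1e-12 * 1000   # 50 pg/mL = 5e-8 g/L
spike_molar_mass = 180_000.0                 # g/mol; assumed monomer mass

molar = grams_per_liter / spike_molar_mass   # mol/L
femtomolar = molar * 1e15
print(round(femtomolar))                     # ~278 fM, same ballpark as 300 fM

kd_low, kd_high = 15e-9, 40e-9               # ACE2 Kd range from the text
print(round(kd_low / molar), round(kd_high / molar))  # ~54,000x to ~144,000x
```

Either way, circulating spike levels sit roughly five orders of magnitude below the concentrations at which ACE2 binding becomes relevant.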

Adenovirus Vector-Based Vaccines and VITT

Importantly, the endothelial damage described by Lei et al. [2021] is not the mechanism by which VITT occurs. VITT is an extremely rare and unique form of adverse event associated only with adenovirus vector-based vaccines. It is not caused by spike proteins targeting the endothelial cells, but rather by induction of an immune response against platelet factor 4 (PF4) by adenovirus vector-based vaccines. PF4 is released by platelets, causes them to clump and form small clots, and has a physiological role in stopping bleeding (hemostasis). Antibodies against self PF4 are not normally generated, but they have been described as a rare side effect of heparin, a commonly used blood thinner. In this condition, termed heparin-induced thrombocytopenia (HIT), heparin binds to PF4, and the complex then stimulates an aberrant immune response. Antibodies to PF4 are generated, these antibodies bind to PF4, and the resulting immune complex then binds to platelets and activates them. This releases more PF4, and a cycle ensues. Activated platelets in HIT form arterial and venous clots, and as platelets get consumed in the clots, the platelet count drops, resulting in severe thrombocytopenia. This combination of clots and severely low platelets is unusual.

The reason VITT raised an alarm even with very few cases was the unique HIT-like clinical presentation in the absence of any heparin exposure. The seriousness of the condition, and the need to use a blood thinner besides heparin, also made it an important clinical and management issue. Studies now show that in VITT, the adenovirus vector-based vaccine is able to induce high levels of PF4 antibodies in 1 in 100,000 to 1 in 500,000 individuals, much the same way as heparin does in patients with HIT [Greinacher et al., 2021a].

Greinacher et al. [2021b] assessed the clinical and laboratory features of VITT patients and found that the AZ vaccine "can result in the rare development of immune thrombotic thrombocytopenia mediated by platelet-activating antibodies against PF4". The AZ vaccine contains the preservative EDTA, which can help human cell-derived proteins from the vaccine enter the bloodstream, binding to PF4 and producing antibodies [Greinacher et al., 2021a]. Lab tests showed that "High-titer anti-PF4 antibodies activate platelets and induce neutrophil activation and NETs [neutrophil extracellular traps] formation, fuelling the VITT prothrombotic response" [Greinacher et al., 2021a]. Although the JJ vaccine does not use EDTA, it is an adenovirus vector-based vaccine, which is a particularly inflammation-stimulating virus [Appledorn et al., 2008, S Ahi et al., 2011]. The lack of EDTA may result in fewer cases of VITT, but even without EDTA, proteins from the vaccine can enter the bloodstream.

The hypothesis that VITT is caused by acute inflammatory reactions to vaccine components, independent of the spike protein, in adenovirus vector-based vaccines is consistent both with the experimental results of Greinacher et al. [2021a] and with the observation that only the AZ and JJ vaccines (both of which are adenovirus vector-based vaccines) have been associated with VITT, whereas the Moderna and Pfizer vaccines (both of which are mRNA vaccines) have not.

Although the hypothesis of Greinacher et al. [2021a] has not yet been fully confirmed, it is consistent with lab testing, empirical evidence, the extreme rarity of VITT, and mechanistic constraints. It is also possible that it is remediable, since it is not due to the nature of the vaccine itself, but specific to the particulars of the formulation.

Conclusion

Given these observations, we conclude the vaccines do not produce enough freely circulating spike protein to induce vascular damage via the ACE2 receptor destabilization mechanism described in Lei et al. [2021]. On the contrary, the extremely low, femtomolar levels of circulating spike protein induced by the vaccine are unlikely to have any physiological relevance to vascular endothelial cells, while still allowing the immune system to develop a robust immune response to spike proteins. The presence of anti-spike antibodies may in fact serve to protect vaccinated individuals against not only SARS-CoV-2 infection, but also against spike protein-induced damage to the vascular endothelium. We speculate that this protection against spike protein-induced damage may in part explain why COVID-19 symptoms are much less severe in vaccinated individuals [Rossman et al., 2021].

There is now a very large amount of empirical data clearly showing that the benefits of all approved SARS-CoV-2 vaccines far outweigh the risks of extremely rare side effects. The data are also not consistent with the hypothesis that VITT is due to spike proteins, since the Pfizer and Moderna vaccines have produced no reports of VITT. The data are, however, consistent with the hypothesis that side effects are due to inflammatory reactions to vaccine components in adenovirus vector-based vaccines.

Overall, we conclude that all approved SARS-CoV-2 vaccines provide far more benefits than risks, and that the very rare risk of VITT from the AZ and JJ vaccines is not due to the spike proteins, which are a fundamental part of how the vaccines work, but is most likely due to specific details of the formulation of the vaccines.

References

Yuyang Lei, Jiao Zhang, Cara R. Schiavon, Ming He, Lili Chen, Hui Shen, Yichi Zhang, Qian Yin, Yoshitake Cho, Leonardo Andrade, Gerald S. Shadel, Mark Hepokoski, Ting Lei, Hongliang Wang, Jin Zhang, Jason X. J. Yuan, Atul Malhotra, Uri Manor, Shengpeng Wang, Zu-Yi Yuan, and John Y-J. Shyy. SARS-CoV-2 spike protein impairs endothelial function via downregulation of ACE2. Circulation Research, 128(9):1323-1326, 2021. doi:10.1161/CIRCRESAHA.121.318902.

Francois Potus, Vicky Mai, Marius Lebret, Simon Malenfant, Emilie Breton-Gagnon, Annie C Lajoie, Olivier Boucherat, Sebastien Bonnet, and Steeve Provencher. Novel insights on the pulmonary vascular consequences of COVID-19. American Journal of Physiology-Lung Cellular and Molecular Physiology, 319(2):L277-L288, 2020.

Maximilian Ackermann, Stijn E Verleden, Mark Kuehnel, Axel Haverich, Tobias Welte, Florian Laenger, Arno Vanstapel, Christopher Werlein, Helge Stark, Alexandar Tzankov, et al. Pulmonary vascular endothelialitis, thrombosis, and angiogenesis in COVID-19. New England Journal of Medicine, 383(2):120-128, 2020.

Hasan K Siddiqi, Peter Libby, and Paul M Ridker. COVID-19 - a vascular disease. Trends in Cardiovascular Medicine, 2020.

Laure-Anne Teuwen, Vincent Geldhof, Alessandra Pasut, and Peter Carmeliet. COVID-19: the vasculature unleashed. Nature Reviews Immunology, 20(7):389-391, 2020.

Keiji Kuba, Yumiko Imai, Shuan Rao, Hong Gao, Feng Guo, Bin Guan, Yi Huan, Peng Yang, Yanli Zhang, Wei Deng, et al. A crucial role of angiotensin converting enzyme 2 (ACE2) in SARS coronavirus-induced lung injury. Nature Medicine, 11(8):875-879, 2005.

M Makris, S Pavord, W Lester, M Scully, and BJ Hunt. Vaccine-induced immune thrombocytopenia and thrombosis (VITT). Research and Practice in Thrombosis and Haemostasis, page e12529, 2021.

Elisabeth Mahase. AstraZeneca vaccine: Blood clots are "extremely rare" and benefits outweigh risks, regulators conclude. BMJ, 373, 2021. doi:10.1136/bmj.n931.

Public Health England. JCVI advises on COVID-19 vaccine for people aged under 40, May 2021. URL https://tinyurl.com/a8eud9a6.

Mike Adams. Bombshell Salk Institute science paper reveals the covid spike protein is what's causing deadly blood clots, Jul 2021. URL https://tinyurl.com/52pncva7.

Roxana Bruno, Peter McCullough, Teresa Forcades i Vila, Alexandra Henrion-Caude, Teresa García-Gasca, Galina P Zaitzeva, Sally Priester, María J Martínez Albarracín, Alejandro Sousa-Escandon, Fernando López Mirones, et al. SARS-CoV-2 mass vaccination: Urgent questions on vaccine safety that demand answers from international health agencies, regulatory authorities, governments and vaccine developers. Beaufort Observer, 2021.

Medicines & Healthcare products Regulatory Agency. Coronavirus vaccine - weekly summary of yellow card reporting, May 2021. URL https://tinyurl.com/8xwydmyf.

Ron Sender, Yinon Moise Bar-On, Avi Flamholz, Shmuel Gleizer, Biana Bernsthein, Rob Phillips, and Ron Milo. The total number and mass of SARS-CoV-2 virions in an infected person. medRxiv, 2020.

Benjamin W Neuman, Gabriella Kiss, Andreas H Kunding, David Bhella, M Fazil Baksh, Stephen Connelly, Ben Droese, Joseph P Klaus, Shinji Makino, Stanley G Sawicki, et al. A structural analysis of M protein in coronavirus assembly and morphology. Journal of Structural Biology, 174(1):11-22, 2011.

Shuai Xia, Qiaoshuai Lan, Shan Su, Xinling Wang, Wei Xu, Zezhong Liu, Yun Zhu, Qian Wang, Lu Lu, and Shibo Jiang. The role of furin cleavage site in SARS-CoV-2 spike protein-mediated membrane fusion in the presence or absence of trypsin. Signal Transduction and Targeted Therapy, 5(1):1-3, 2020.

Robert A Feldman, Rainard Fuhr, Igor Smolenov, Amilcar Mick Ribeiro, Lori Panther, Mike Watson, Joseph J Senn, Mike Smith, Örn Almarsson, Hari S Pujar, et al. mRNA vaccines against H10N8 and H7N9 influenza viruses of pandemic potential are immunogenic and well tolerated in healthy adults in phase 1 randomized clinical trials. Vaccine, 37(25):3326-3334, 2019.

Norbert Pardi, Steven Tuyishime, Hiromi Muramatsu, Katalin Kariko, Barbara L Mui, Ying K Tam, Thomas D Madden, Michael J Hope, and Drew Weissman. Expression kinetics of nucleoside-modified mRNA delivered in lipid nanoparticles to mice by various routes. Journal of Controlled Release, 217:345-351, 2015.

Kizzmekia S Corbett, Darin K Edwards, Sarah R Leist, Olubukola M Abiona, Seyhan Boyoglu-Barnum, Rebecca A Gillespie, Sunny Himansu, Alexandra Schäfer, Cynthia T Ziwawo, Anthony T DiPiazza, et al. SARS-CoV-2 mRNA vaccine design enabled by prototype pathogen preparedness. Nature, 586(7830):567-571, 2020.

Alana F Ogata, Chi-An Cheng, Michaël Desjardins, Yasmeen Senussi, Amy C Sherman, Megan Powell, Lewis Novack, Salena Von, Xiaofang Li, Lindsey R Baden, and David R Walt. Circulating SARS-CoV-2 vaccine antigen detected in the plasma of mRNA-1273 vaccine recipients. Clinical Infectious Diseases, 05 2021. ISSN 1058-4838. doi:10.1093/cid/ciab465.

Qihui Wang, Yanfang Zhang, Lili Wu, Sheng Niu, Chunli Song, Zengyuan Zhang, Guangwen Lu, Chengpeng Qiao, Yu Hu, Kwok-Yung Yuen, et al. Structural and functional basis of SARS-CoV-2 entry by using human ACE2. Cell, 181(4):894-904, 2020.

Daniel Wrapp, Nianshuang Wang, Kizzmekia S Corbett, Jory A Goldsmith, Ching-Lin Hsieh, Olubukola Abiona, Barney S Graham, and Jason S McLellan. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science, 367(6483):1260-1263, 2020.

Jun Lan, Jiwan Ge, Jinfang Yu, Sisi Shan, Huan Zhou, Shilong Fan, Qi Zhang, Xuanling Shi, Qisheng Wang, Linqi Zhang, et al. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature, 581(7807):215-220, 2020.

Jian Shang, Gang Ye, Ke Shi, Yushun Wan, Chuming Luo, Hideki Aihara, Qibin Geng, Ashley Auerbach, and Fang Li. Structural basis of receptor recognition by SARS-CoV-2. Nature, 581(7807):221-224, 2020.

Andreas Greinacher, Kathleen Selleng, Jan Wesche, Stefan Handtke, Raghavendra Palankar, Konstanze Aurich, Michael Lalk, Karen Methling, Uwe Völker, Christian Hentschker, et al. Towards understanding ChAdOx1 nCoV-19 vaccine-induced immune thrombotic thrombocytopenia (VITT). Research Square, 2021a.

Andreas Greinacher, Thomas Thiele, Theodore E Warkentin, Karin Weisser, Paul A Kyrle, and Sabine Eichinger. Thrombotic thrombocytopenia after ChAdOx1 nCoV-19 vaccination. New England Journal of Medicine, 2021b.

DM Appledorn, A McBride, S Seregin, JM Scott, Nathan Schuldt, A Kiang, S Godbehere, and A Amalfitano. Complex interactions with several arms of the complement system dictate innate and humoral immunity to adenoviral vectors. Gene Therapy, 15(24):1606-1617, 2008.

Yadvinder S Ahi, Dinesh S Bangari, and Suresh K Mittal. Adenoviral vector immunity: its implications and circumvention strategies. Current Gene Therapy, 11(4):307-320, 2011.

Hagai Rossman, Smadar Shilo, Tomer Meir, Malka Gorfine, Uri Shalit, and Eran Segal. COVID-19 dynamics after a national immunization program in Israel. Nature Medicine, pages 1-7, 2021.

Published on October 26, 2021 17:00

October 16, 2021

Statistical problems found when studying Long Covid in kids

Summary: Statistical tests need to be paired with proper data and study design to yield valid results. A recent review paper on Long Covid in children provides a useful example of how researchers can get this wrong. We use causal diagrams to decompose the problem and illustrate where errors were made.


Contents: Background; Control groups and RCTs; Control groups and observational studies; Structure of the Long Covid review; The problem of p-values; Conclusion and next steps; Acknowledgements

Background

A recent review paper by Australian and Swiss doctors, How Common Is Long COVID in Children and Adolescents?, was widely discussed in the press, with 128 news stories from 103 outlets. The headlines were reassuring:

"Global studies on long COVID and children 'unnecessarily worrying', say researchers"; "Long Covid in children and adolescents is less common than previously feared"; "Kids' Covid-19 risk less than we feared, says study".

The paper in question does not actually say any of these things, but rather concludes that "the true incidence of this syndrome in children and adolescents remains uncertain." However, the challenges of accurate science journalism are not the topic of our article today. Rather, we will describe a critical flaw in the statistical analysis in this review, as an exercise in better understanding how to interpret statistical tests.

A key contribution of the review is that it separates the studies that use a "control group" from those that do not. The authors suggest we should focus our attention on the studies with a control group, because "in the absence of a control group, it is impossible to distinguish symptoms of long COVID from symptoms attributable to the pandemic." The National Academy of Sciences warns that "use of an inappropriate control group can make it impossible to draw meaningful conclusions from a study." As we will see, this is, unfortunately, what happened in this review. But first, let's do a brief recap of control groups and statistical tests.

Control groups and RCTs

When assessing the impact of an intervention, such as the use of a new drug, the gold standard is to use a Randomised Controlled Trial (RCT). In an RCT, a representative sample is selected and randomly split into two groups, one of which receives the medical intervention (e.g. the drug), and one which doesn't (normally that one gets a placebo). This can, when things go well, show clearly whether the drug made a difference. Generally, a "p value" is calculated, which is the probability that the effect seen in the data would be observed by chance if there were truly no difference between cases and controls (i.e. if the null hypothesis were true), along with a "confidence interval", which is the range of outcomes that would be expected after considering random variation. If the p value is less than some number (often 0.05) the RCT is considered to be "statistically significant". Without an RCT, it can be harder to distinguish whether two groups differ because of the intervention, or because of some other difference between the groups.

We can represent this analysis as a diagram like so:

[Figure: Causal diagram for an RCT]

This is an example of a (simplified and informal) causal diagram. The black arrows show the direct relationships we can measure or control: in this case, our selection of control group vs experimental group determines who gets the drug, and we then measure the outcome (e.g. do symptoms improve?) for each group. Because the selection was random (since this is an RCT), we can infer the dotted line: how much does taking the drug change the outcome? If the control or experimental group is small, then it is possible that the difference in outcomes between the two groups is entirely due to random chance. To handle that, we pop the effect size and sample size into statistical software such as R, and it will tell us the p value and confidence interval of the effect.
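To make that last step concrete, here is a small sketch of the calculation (in Python rather than R, using only the standard library; all patient counts are invented for illustration):

```python
import math

# Hypothetical RCT: 200 patients per arm; 120 improved on the drug,
# 95 improved on placebo. All counts are invented for illustration.
x1, n1 = 120, 200   # drug arm
x2, n2 = 95, 200    # placebo arm
p1, p2 = x1 / n1, x2 / n2

# Two-proportion z-test: pooled standard error under the null
# hypothesis that the true improvement rates are equal.
p_pool = (x1 + x2) / (n1 + n2)
se_null = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_null
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail probability

# 95% confidence interval for the difference in improvement rates.
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo, hi = (p1 - p2) - 1.96 * se, (p1 - p2) + 1.96 * se

print(f"difference = {p1 - p2:.3f}, p = {p_value:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

With these made-up numbers the difference is 12.5 percentage points, the p-value falls below 0.05, and the confidence interval excludes zero, so the trial would be reported as statistically significant.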

Because RCTs are the gold standard for assessing the impact of a medical intervention, they are used whenever possible. Nearly all drugs on the market have been through multiple RCTs, and most medical education includes some discussion of the use and interpretation of RCTs.

Control groups and observational studies

Sometimes, as discussed in The Planning of Observational Studies of Human Populations, "it is not feasible to use controlled experimentation", but we still want to investigate a causal relationship between variables, in which case we may decide to use an observational study. Examples include studying "the relationship between smoking and health", risk factors for "injuries in motor accidents", or the "effects of new social programmes". In cases like these, it isn't possible to create a true "control group" as in an RCT, since we cannot generally randomly assign people, for instance, to a group that is told to start smoking.

Instead, we have to try to find two groups that are as similar as possible, differing only in the variable under study: for instance, a group of smokers and a group of non-smokers with similar demographics, health, etc. This can be challenging. Indeed, the question "does smoking cause cancer?" remained controversial for decades, despite many attempts at observational studies.

Researchers have noted that "results from observational studies can confuse the effect of interest with other variables' effects, leading to an association that is not causal. It would be helpful for clinicians and researchers to be able to visualize the structure of biases in a clinical study". They suggest using causal diagrams for this purpose, including to help avoid confounding bias in epidemiological studies. So, let's give that a try now!

Structure of the Long Covid review

In How Common Is Long COVID in Children and Adolescents?, the authors suggest we focus on studies of Long Covid prevalence that include a control group. The idea is that we take one group that has (or had) COVID and one group that didn't, and then see whether they have Long Covid symptoms a few weeks or months later. Here's what the causal diagram would look like:

[Figure: Idealised causal diagram for Long Covid prevalence]

Here we are trying to determine if COVID infection causes Long Covid symptoms. Since COVID infection is the basis of the Control group selection, and we can compare the Long Covid symptoms for each group, that would allow us to infer the answer to our question. The statistical tests reported in the review paper only apply if this structure is correct.

However, it's not quite this simple. We don't directly know who has a COVID infection; instead, we have to infer it using a test (e.g. serology, PCR, or rapid antigen). It is so easy nowadays to run a statistical test on a computer that it can be quite tempting to just use the software and report what it says, without checking that the statistical assumptions implicitly being made are met by the data and design.

We might hope that we could modify our diagram like so:

[Figure: Idealised causal diagram including testing]

In this case, we could still directly infer the dotted line (i.e. "does COVID infection cause Long Covid symptoms?"), since there is just one unknown relationship, and all the arrows go in the same direction.

But unfortunately, this doesn't work either. The link between test results and infection is not perfect. Some researchers, for instance, have estimated that PCR tests may miss half, or even 90%, of infections. Part of the reason is that "thresholds for SARS-CoV-2 antibody assays have typically been determined using samples from symptomatic, often hospitalised, patients". Others have found that 36% of infections do not seroconvert, and that children in particular may serorevert. It appears that false negative test results may be more common in children: tests are most sensitive when used for middle-aged men.

To make things even more complicated, research shows that "Long-COVID is associated with weak anti-SARS-CoV-2 antibody response."

Putting this all together, here's what our diagram now looks like, using red arrows to indicate negative relationships:

[Figure: Causal diagram including some confounders]

This shows that test results are associated not just with COVID infection, but also with age and Long Covid symptoms, and that the association between COVID infection and test result is imperfect and not fully understood.

Because of this, we can't directly infer the relationship between COVID infection and Long Covid symptoms. We would first need to fully understand and account for the confounders and uncertainties. Simply reporting the results of a statistical test does not give meaningful information in this case.

In particular, we can see that the issues we have identified all bias the data in the same direction: they result in infected cases being incorrectly placed in the control group.
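A toy simulation makes this one-directional bias visible. All the numbers here are invented for illustration (symptom rates, false negative rate, population size), but the structure matches the diagram above: missed infections end up in the test-negative "control" group, dragging the measured difference toward zero.

```python
import random

random.seed(0)

# Invented rates for illustration only.
N = 10_000                    # children per true group
P_SYMPTOMS_INFECTED = 0.10    # Long Covid symptom rate if infected
P_SYMPTOMS_UNINFECTED = 0.02  # background symptom rate if uninfected
FALSE_NEGATIVE_RATE = 0.40    # fraction of true infections the test misses

groups = {"positive": [], "negative": []}
for infected in [True] * N + [False] * N:
    p_sym = P_SYMPTOMS_INFECTED if infected else P_SYMPTOMS_UNINFECTED
    has_symptoms = random.random() < p_sym
    # Only infected children can test positive, and 40% of them don't.
    tests_positive = infected and random.random() > FALSE_NEGATIVE_RATE
    groups["positive" if tests_positive else "negative"].append(has_symptoms)

rate = {g: sum(v) / len(v) for g, v in groups.items()}
true_gap = P_SYMPTOMS_INFECTED - P_SYMPTOMS_UNINFECTED
measured_gap = rate["positive"] - rate["negative"]

# Missed infections carry their excess symptoms into the test-negative
# group, so the measured gap understates the true gap.
print(f"true gap = {true_gap:.3f}, measured gap = {measured_gap:.3f}")
```

Note that the bias only ever shrinks the apparent difference between groups; it never inflates it, which is exactly the direction of error described above.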

For more details about this issue, see the article Long Covid: issues around control selection, by Dr Nisreen Alwan MBE.

The problem of p-values

The review claims that "all studies to date have substantial limitations or do not show a difference between children who had been infected by SARS-CoV-2 and those who were not". This claim appears to be made on the basis of p-values, which are shown for each control group study in the review. Yet all but one study did actually find a statistically significant difference between the groups being compared (at p<0.05, the usual cut-off for such analyses).

Regardless of what the results actually show, p-values are not being used in an appropriate way here. The American Statistical Association (ASA) has released a "Statement on Statistical Significance and P-Values" with six principles underlying the proper use and interpretation of the p-value. In particular, note the following principles:

- "P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone."
- "Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold."
- "A p-value, or statistical significance, does not measure the size of an effect or the importance of a result."

A p-value is lower when there is more data, or a stronger relationship in the data (and vice versa). A high p-value does not necessarily mean that there is no relationship in the data: it may simply mean that not enough data has been collected.
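A quick sketch makes the point: hold the observed effect fixed (a hypothetical 10% vs 6% symptom rate, numbers invented for illustration) and vary only the sample size, and the p-value swings from "not significant" to highly significant.

```python
import math

def two_prop_p(x1, n1, x2, n2):
    """Two-sided p-value for a two-proportion z-test (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return math.erfc(abs(p1 - p2) / se / math.sqrt(2))

# Identical observed effect (10% vs 6%) at three sample sizes: only the
# amount of data changes, yet the p-value crosses the 0.05 threshold.
for n in (100, 500, 2000):
    p = two_prop_p(round(0.10 * n), n, round(0.06 * n), n)
    print(f"n = {n:4d} per group: p = {p:.4f}")
```

A small study of this effect would be reported as "no difference found" while a larger study of the very same effect would be reported as significant, which is why a high p-value cannot be read as evidence of no effect.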

Because a p-value "does not measure the size of an effect or the importance of a result", it does not actually tell us about the prevalence of Long Covid. The use of p-values in studying drug efficacy is very common, since we do often want to answer the question "does this drug help at all?". But to assess what the range of prevalence levels may be, we instead need to look at confidence intervals, which unfortunately are not shown at all in the review.
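As an illustration of what a prevalence estimate with a confidence interval looks like, here is a sketch of the Wilson score interval for a proportion, applied to invented counts (25 of 300 previously infected children reporting persistent symptoms; these are not numbers from the review):

```python
import math

def wilson_ci(x, n, z=1.96):
    """95% Wilson score confidence interval for a proportion x/n."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical study, counts invented for illustration.
lo, hi = wilson_ci(25, 300)
print(f"prevalence = {25/300:.1%}, 95% CI = ({lo:.1%}, {hi:.1%})")
```

Reporting an interval like this conveys both the estimated prevalence and its uncertainty, which is exactly the information a bare p-value discards.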

Furthermore, we should not look at p-values out of context, but instead need to also consider the likelihood of alternative hypotheses. The alternative hypothesis provided in the review is that the symptoms may be due to "lockdown measures, including school closures".

One of the included control group studies stood out as an outlier: 10% of Swiss children with negative tests were found to have Long Covid symptoms, many times higher than in other similar studies. Was this because of the confounding effects discussed in the previous section, or was it due to lockdowns and school closures? Switzerland did not have a full lockdown, and schools were only briefly closed, reopening nearly a year before the Long Covid symptom tests in the study. On the other hand, Switzerland may have had a very high number of cases. Wikipedia notes that "the Swiss government has had an official policy of not testing people with only mild symptoms", and the country has still recorded nearly 900 thousand cases in a population of just 8 million people.

In a statistical design, an alternative hypothesis should not be considered the null hypothesis unless we are quite certain it represents the normal baseline behaviour. But assuming that the symptoms found in the control group are due to pandemic factors other than infection is itself a hypothesis that needs careful testing and does not seem to be fully supported by the data in the study. It is not an appropriate design to use this as the base case, as was done in the review.

Conclusion and next steps

The problems with control group definition, incorrect use of statistical tests, and statistical design do not change the key conclusion of the review: "the true incidence of this syndrome in children and adolescents remains uncertain." So, how do we resolve this uncertainty?

The review has a number of suggestions for future research to improve our understanding of Long Covid prevalence in children. As we've seen in this article, we also need to more carefully consider and account for confounding bias. It is often possible, mathematically, to infer an association even in more complex causal relationships such as the ones we see above. However, doing so requires a full and accurate understanding of all of the relationships in the causal structure.

Furthermore, a more complete and rigorous assessment of confounders needs to be completed. We've only scratched the surface in this article on one aspect: bias in the control group. Bias in the "Long Covid symptoms" node also needs to be considered. For instance: are all Long Covid symptoms being considered; is there under-reporting due to difficulties of child communication or understanding; is there under-reporting due to gender bias; are "on again / off again" variable symptoms being tracked correctly; and so forth.

Whatever the solution turns out to be, it seems that for a while at least, the prevalence of Long Covid in children will remain uncertain. How parents, doctors, and policy makers respond to this risk and uncertainty will be a critical issue for children around the world.

Acknowledgements

Many thanks to Hannah Davis, Dr Deepti Gurdasani, Dr Rachel Thomas, Dr Zoë Hyde, and Dr Nisreen Alwan MBE for invaluable help with research and review for this article.

Published on October 16, 2021 17:00
