The other way that data can be ‘big’ is by measuring many characteristics, or features, on each example. This quantity is often known as p, perhaps denoting parameters. Thinking again back to my statistical youth, p used to be generally less than 10 – perhaps we knew a few items of an individual’s medical history. But then we started having access to millions of that person’s genes, and genomics became a small n, large p problem, where there was a huge amount of information about a relatively small number of cases. And now we have entered the era of large n, large p problems, in which there