How the Availability of More Data and AI Capability Might Not Be Entirely Related

In this interesting WIRED column, 'AI and "enormous data" could make tech giants harder to topple', the results of a Google study are presented, showing that machine-learning algorithms can become significantly better when trained on very large amounts of data.


This is of course an interesting finding because, intuitively, learning capability must eventually become asymptotic; it now seems that the asymptote lies somewhat further away than previously thought. It does exist, though: "Crunching Google's giant dataset of 300 million images didn't produce a huge benefit—jumping from 1 million to 300 million images increased the object detection score achieved by just 3 percentage points".
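To see how modest that gain is in relative terms, a quick back-of-the-envelope calculation helps. If, purely as an illustrative assumption, the score grows roughly with the logarithm of the dataset size, then the 3-point gain reported between 1 million and 300 million images corresponds to barely more than 1 point per tenfold increase in data. The short Python sketch below works this out; the logarithmic model is my assumption for illustration, not a figure from the study.

```python
import math

# Figures quoted in the WIRED column: growing the training set from
# 1 million to 300 million images improved the object-detection score
# by about 3 percentage points.
small_dataset = 1_000_000
large_dataset = 300_000_000
reported_gain_points = 3.0

# Illustrative assumption: the score grows roughly with log10(dataset size).
# Under that assumption, the gain per tenfold increase in data is:
decades_of_growth = math.log10(large_dataset / small_dataset)  # ~2.48 orders of magnitude
gain_per_decade = reported_gain_points / decades_of_growth     # ~1.2 points per 10x

print(f"Data grew by {large_dataset / small_dataset:.0f}x "
      f"({decades_of_growth:.2f} orders of magnitude)")
print(f"Implied gain per 10x more data: {gain_per_decade:.2f} points")

# Extrapolating this hypothetical trend: another 10x of data (3 billion
# images) would buy only about one more point.
print(f"Projected gain from 300M -> 3B images: {gain_per_decade:.2f} points")
```

Under that (assumed) trend, each further order of magnitude of data buys less and less, which is exactly the asymptotic behaviour one would expect.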


The column goes on to argue that this finding gives an advantage to companies that already amass huge amounts of data, such as Google or Facebook. However, it is worth examining whether this scale of 'enormous data' really translates into a definite advantage for the AI algorithms derived from it. I personally believe the quality of the dataset must also matter – in particular, how much properly categorized deviant (unusual) data it contains, and how closely it resembles real life. It is this quality of data that is important for proper learning.


I am thus not entirely convinced by the argument that 'enormous data' is better than 'big data' when it comes to the value that can be derived from AI. Perhaps realistic, real-life data would be the more discriminating factor.


Published on September 28, 2017 04:30