or Hadoop. All of the data in these various databases and data sources will need to be integrated, cleansed, transformed, normalized, and so on. These tasks go by many names, such as extraction, transformation, and loading (ETL), “data munging,” “data wrangling,” “data fusion,” and “data crunching.” Like source data, the data generated by data science activities also need to be stored and managed. Again, a database is the typical storage location for this generated data, because the data can then be easily distributed to and shared with different parts of the organization. As a

