One aspect of a typical data infrastructure that can be challenging is that data in databases and data warehouses often reside on servers different from the servers used for data analysis. As a consequence, when large data sets are handled, a surprisingly large amount of time can be spent moving data between the servers a database or data warehouse are living on and the servers used for data analysis and machine learning.

