Data Science - Part XVIII - Big Data Fundamentals for Data Scientists

This lecture will focus on understanding the Big Data landscape from a data scientist's perspective. The presentation will start with a brief overview of the need for large-scale data processing technologies and then introduce the underlying technologies that drive the modern big data landscape. The Apache Hadoop technologies developed under the Apache Software Foundation will be discussed in some technical detail; however, the emphasis will remain on building a broad awareness of the Hadoop 2.0 technologies as they relate to data science and machine learning. We will then introduce some mechanisms for applying the MapReduce framework, accessing HDFS data, and creating analytics within the R programming language. Finally, we will bring all of the Big Data concepts into focus by working through a practical example using New York taxi cab data in R.