Data Science

  • Data Science - Part I - Building Predictive Analytics Capabilities

    This is the first video lecture in a series on data analytics topics, geared toward individuals and business professionals who have no understanding of how to build modern analytics approaches. This lecture provides an overview of the models and techniques we will address throughout the lecture series. We will discuss Business Intelligence topics, predictive analytics, and big data technologies. Finally, we will walk through a simple yet effective example which showcases the potential of predictive analytics in a business context.

  • Data Science - Part II - Working with R & R Studio

    This tutorial is a basic primer for individuals who want to get started with predictive analytics by downloading the open source (FREE) language R. I will go through some tips to get you up and running and building predictive models ASAP.

  • Data Science - Part III - EDA & Model Selection

    This lecture introduces the concept of exploratory data analysis (EDA): understanding and working with data for machine learning and predictive analysis. The lecture is designed for anyone who wants to understand how to work with data and does not get into the mathematics. We will discuss how to utilize summary statistics, diagnostic plots, data transformations, and variable selection techniques including principal component analysis, and finally get into the concept of model selection.

  • Data Science - Part IX - Support Vector Machine

    This lecture provides an overview of Support Vector Machines in a more relatable and accessible manner. We will go through some methods of calibration and diagnostics of SVM and then apply the technique to accurately detect breast cancer within a dataset.

  • Data Science - Part VI - Market Basket and Product Recommendation Engines

    This lecture provides an overview of association analysis, which includes topics such as market basket analysis and product recommendation engines. The first practical example centers on analyzing supermarket retailer product receipts, and the second example touches upon the use of association rules in the political arena.

  • Data Science - Part X - Time Series Forecasting

    This lecture provides an overview of time series forecasting techniques and the process of creating effective forecasts. We will go through some of the popular statistical methods including time series decomposition, exponential smoothing, Holt-Winters, ARIMA, and GLM models. These topics will be discussed in detail, and we will go through the calibration and diagnostics of effective time series models on a number of diverse datasets.

  • Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets

    This lecture provides an overview of some modern regression techniques, including a discussion of the bias-variance tradeoff for regression errors and the topic of shrinkage estimators. This leads into an overview of ridge regression, LASSO, and elastic nets. These topics will be discussed in detail, and we will go through the calibration and diagnostics and then conclude with a practical example highlighting the techniques.

  • Data Science - Part XIV - Genetic Algorithms

    This lecture provides an overview of biological evolution and genetic algorithms in a machine learning context. We will start off with a broad overview of the biological evolutionary process and then explore how genetic algorithms can be developed that mimic these processes. We will dive into the types of problems that can be solved with genetic algorithms and then conclude with a series of practical examples in R that highlight the techniques: the Knapsack Problem, feature selection with OLS regression, and constrained optimization.

  • Data Science - Part XVI - Fourier Analysis

    This lecture provides an overview of Fourier analysis and the Fourier transform as applied in machine learning. We will go through some methods of calibration and diagnostics and then apply the technique to a time series prediction of manufacturing order volumes utilizing Fourier analysis and neural networks.

  • Data Science - Part XVII - Deep Learning & Image Processing

    This lecture provides an overview of image processing and deep learning for applications in data science and machine learning. We will go through examples of image processing techniques using a couple of different R packages. Afterwards, we will shift our focus and dive into the topics of deep neural networks and deep learning. We will discuss topics including Deep Boltzmann Machines, Deep Belief Networks, and Convolutional Neural Networks, and finish the presentation with a practical exercise in handwriting recognition techniques on the MNIST dataset.

  • Data Science - Part XVIII - Big Data Fundamentals for Data Scientists

    This lecture focuses on understanding the Big Data landscape from a data scientist's perspective. The presentation will start off with a brief overview of the need for large-scale data processing technologies and then introduce the underlying technologies which drive the modern big data landscape. The techniques pioneered by the Apache Foundation will be discussed in some technical detail; however, the emphasis will remain on creating a broad awareness of the Hadoop 2.0 technologies as they relate to data science and machine learning. We will then introduce some mechanisms for applying the MapReduce framework, accessing HDFS data, and creating analytics within the R programming language. Finally, we will bring all of the Big Data concepts into focus by working through a practical example of New York taxi cab data within R.

  • Data Science Skills - Our MOOC and online platforms selection

    In today's Big Data-crazy world, data scientists might be the most-coveted workers in the IT industry, pretty much able to write their own ticket and work wherever they want in the midst of an ongoing skills shortage. Nevertheless, certain skills make potential candidates even more desirable to employers willing to shell out big bucks for the best talent.

    To help you gain new skills online, here is our selection of MOOCs and online platforms:

  • Data Science vs Big Data vs Data Analytics

    Do you know the difference between a Data Scientist and a Data Analyst? To be honest, before I started doing research for this post, I’m not sure I really knew either.

  • Healthcare - Saving lives with Big Data

    Big Data has changed the way we manage, analyze and leverage data in any industry. One of the most promising areas where Big Data can be applied to make a change is healthcare. Healthcare analytics has the potential to reduce costs of treatment, predict outbreaks of epidemics, avoid preventable diseases and improve the quality of life in general.

  • How to Design and Implement a Data Lake?

    Companies are continuously envisioning new and innovative ways to use data for operational reporting and advanced analytics. The Data Lake, a next-generation data storage and management solution, was developed to meet the ever-evolving needs of increasingly savvy users.

    The white paper published by Knowledgent (a data and analytics firm) explores current challenges with the enterprise data warehouse and other existing data management and analytic solutions. It describes the necessary features of the Data Lake architecture and the capabilities required to leverage a Data and Analytics as a Service (DAaaS) model. It also covers the characteristics of a successful Data Lake implementation and critical considerations for designing a Data Lake.