Data Science: Get Started

Tools, software packages, and library resources for working with big data and data science.

Open Source Data Science Courses

Data Science / Harvard Video Archive & Current Lecture Videos

  • "This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data management to be able to access big data quickly and reliably; exploratory data analysis to generate hypotheses and intuition; prediction based on statistical methods such as regression and classification; and communication of results through visualization, stories, and interpretable summaries."

Introduction to Data Science / University of Washington, Coursera

  • "This course offers the basic techniques of data science, including both SQL and NoSQL solutions for massive data management (e.g., MapReduce and contemporaries), algorithms for data mining (e.g., clustering and association rule mining), and basic statistical modeling (e.g., linear and non-linear regression)."

Statistics One / Princeton, Coursera

  • This course introduces fundamental concepts in statistics. It also provides an introduction to the R programming language in the lab session.

Machine Learning / Stanford, Coursera

  • "This course provides a broad introduction to machine learning, datamining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI)." Recommended for use with MATLAB or Octave (an open-source alternative to MATLAB).