Data Science: Data Analysis

Tools, software packages, and library resources for working with big data and data science.

Text Analysis

  • OpenRefine
    "OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; extending it with web services; and linking it to databases like Freebase." View large datasets and automatically create filters and facets, transform the structure of the data, and clean up inconsistencies by clustering and matching entries to preferred forms.

  • Voyant
    Voyant is a simple, easy-to-use web-based text analysis tool. You can paste URLs or upload text files (HTML, XML, PDF, RTF, and MS Word) to analyze word frequencies in the corpus. The picture below is an example of word frequency analysis and a word cloud for the library's homepage using Voyant.

    Voyant Example

    For more information, check the official documentation.

  • R
    R is an open-source programming language and software environment for large-scale text and data analysis and data mining. It includes various useful functions for tasks such as data extraction, genre detection, and topic modeling. It is aimed at more advanced users. With knowledge of other programming languages such as C and C++, you can create your own functions. Precompiled packages are available for UNIX, Linux, Windows, and macOS.

    Book about R
    Introduction to Data Science
    Introduction to Data Science by Jeffrey M. Stanton is a textbook for data science courses at Syracuse. It is freely available as a PDF (non-interactive version) and on iTunes (interactive version for iPad). The book uses the R language for statistical computing and graphics as a tool to demonstrate the concepts and applications of data science.
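
To give a rough idea of what the word frequency analysis mentioned under Voyant involves, here is a minimal sketch in Python. It is not how Voyant itself is implemented. The function name `word_frequencies`, the `top_n` parameter, and the sample sentence are made up for illustration, and the sketch does no stop-word filtering or stemming.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common words in text as (word, count) pairs."""
    # Lowercase the text and treat each run of ASCII letters as one word.
    words = re.findall(r"[a-z]+", text.lower())
    # Counter tallies occurrences; most_common sorts by descending count.
    return Counter(words).most_common(top_n)


# Hypothetical sample text standing in for an uploaded document.
sample = "The library offers data tools. Data science needs data and tools."
top_words = word_frequencies(sample, top_n=3)
print(top_words)
```

A word cloud is a visual rendering of the same counts, with more frequent words drawn larger.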


Software Packages

Check this link for a list of licensed software packages available on campus. 


Click the links above to locate relevant books available in the library.