Cleaning Untidy Data with OpenRefine

Hosted by the Carnegie Mellon University (CMU) Libraries

About this Workshop

Tired of spending hours and hours cleaning messy data in Excel spreadsheets? Come learn OpenRefine, an easy-to-use, open source tool for data cleaning. OpenRefine (formerly Google Refine) helps you prepare your data for analysis.

Quickly and easily transform data, split and merge columns, remove whitespace, and perform many other common data cleaning tasks. OpenRefine can also export the steps you apply as a JSON script, so you can repeat the same series of tasks across multiple datasets.

No previous experience is required. This will be a hands-on workshop, so please bring a laptop.


Presenters

Sarah Young
Principal Librarian
Office: 109G, Hunt Library
sarahy@andrew.cmu.edu

Goals of this Workshop

  • Import data and perform basic cleaning steps
  • Use sorting, filtering and facets to clean data
  • Restructure columns and cells
  • Export JSON scripts for repeating steps
  • Save and export files and projects

Schedule

Thursday, February 15, 2023

Time            Content
15:00 to 15:05  Welcome and Introduction
15:05 to 15:15  Importing data and basic cleanup
15:15 to 15:30  Sorting, filtering and faceting
15:30 to 15:40  Transposing data and advanced facets
15:40 to 15:50  Clustering
15:50 to 16:10  Splitting and concatenating
16:10 to 16:20  Restructuring data
16:20 to 16:25  Exporting JSON scripts
16:25 to 16:30  Wrap-Up

The actual schedule may vary depending on group needs. All times are in Eastern Standard Time (EST).

Slides

Click on the slides, then press Ctrl+Shift+F for full screen.

Acknowledgements

The lesson materials and slides for this workshop were largely adapted from an OpenRefine workshop delivered at IASSIST 2018 in Montreal by:

Leanne Trimble, Data & Statistics Librarian
Kelly Schultz, Data Visualization Librarian
University of Toronto Libraries

