
Cleaning Untidy Data with OpenRefine

Hosted by the Carnegie Mellon University (CMU) Libraries

About this Workshop

Tired of spending hours and hours cleaning messy data in Excel spreadsheets? Come learn OpenRefine, an easy-to-use, open source tool for data cleaning. OpenRefine (formerly Google Refine) helps you prepare your data for analysis.

Quickly transform data, split and merge columns, remove whitespace, and perform many other common data cleaning tasks. With OpenRefine, you can also export JSON scripts that repeat a series of cleaning steps across multiple datasets.

No previous experience is required. This will be a hands-on workshop, so please bring a laptop.


Presenters

Sarah Young
Principal Librarian
Office: 109G, Hunt Library
sarahy@andrew.cmu.edu

Goals of this Workshop

  • Import data and perform basic cleaning steps
  • Use sorting, filtering and facets to clean data
  • Restructure columns and cells
  • Export a JSON script for repeating steps
  • Save and export files and projects

Schedule

Tuesday, October 29, 2024

Time Content
10:00 to 10:05 Welcome and Introduction
10:05 to 10:15 Importing data and basic cleanup
10:15 to 10:30 Sorting, filtering and faceting
10:30 to 10:40 Transposing data and advanced facets
10:40 to 10:50 Clustering
10:50 to 11:10 Splitting and concatenating
11:10 to 11:20 Restructuring data
11:20 to 11:25 Exporting JSON scripts
11:25 to 11:30 Wrap-Up

Actual schedule may vary depending on group needs. All times are Eastern Time (ET).

Slides

Click on the slides, then press Ctrl+Shift+F for full screen.

Workshop surveys

Pre-workshop Survey
Post-workshop Survey

Acknowledgements

The lesson materials and slides for this workshop were largely adapted from an OpenRefine Workshop delivered at IASSIST 2018 in Montreal by:

Leanne Trimble, Data & Statistics Librarian
Kelly Schultz, Data Visualization Librarian
University of Toronto Libraries

