Cleaning Untidy Data with OpenRefine
Hosted by the Carnegie Mellon University (CMU) Libraries
About this Workshop
Tired of spending hours and hours cleaning messy data in Excel spreadsheets? Come learn OpenRefine, an easy-to-use, open-source tool for data cleaning. OpenRefine (formerly Google Refine) helps you prepare your data for analysis.
Quickly and easily transform data, split and merge columns, remove whitespace, and perform many more common data cleaning tasks. With OpenRefine, you can also export your cleaning steps as a JSON script and reapply them to other datasets.
No previous experience is required. This will be a hands-on workshop, so please bring a laptop.
Presenters
Sarah Young
Principal Librarian
Office: 109G, Hunt Library
sarahy@andrew.cmu.edu
Goals of this Workshop
- Import data and perform basic cleaning steps
- Use sorting, filtering, and faceting to clean data
- Restructure columns and cells
- Export a JSON script for repeating steps
- Save and export files and projects
Schedule
Tuesday, October 29, 2024
Time | Content |
---|---|
10:00 to 10:05 | Welcome and Introduction |
10:05 to 10:15 | Importing data and basic cleanup |
10:15 to 10:30 | Sorting, filtering and faceting |
10:30 to 10:40 | Transposing data and advanced facets |
10:40 to 10:50 | Clustering |
10:50 to 11:10 | Splitting and concatenating |
11:10 to 11:20 | Restructuring data |
11:20 to 11:25 | Exporting JSON scripts |
11:25 to 11:30 | Wrap-Up |
Actual schedule may vary depending on group needs. All times are Eastern Daylight Time (EDT).
Slides
Click on the slides, then press CTRL+Shift+F for full screen.
Workshop surveys
Pre-workshop Survey
Post-workshop Survey
Acknowledgements
The lesson materials and slides for this workshop were largely adapted from an OpenRefine Workshop delivered at IASSIST 2018 in Montreal by:
Leanne Trimble, Data & Statistics Librarian
Kelly Schultz, Data Visualization Librarian
University of Toronto Libraries
Table of contents
- Setup
- Part 1 OpenRefine Basics with Personal Consumption Expenditures Data
- Part 2 Advanced Data Cleaning with Citizen Science Data