About this Workshop

Tired of spending hours and hours cleaning messy data in Excel spreadsheets? Come learn OpenRefine, an easy-to-use, open source tool for data cleaning. OpenRefine (formerly Google Refine) helps you prepare your data for analysis.

Quickly and easily transform data, split and merge columns, remove whitespace, and perform many more common data cleaning tasks. With OpenRefine, you can also create JSON scripts that repeat a series of tasks across multiple datasets.

No previous experience is required. This will be a hands-on workshop, so please bring a laptop.

Pre-workshop set up

Please complete the following activities prior to the workshop.

1. Download the workshop files

Download the workshop files and save them in a folder on your desktop. Make sure to extract the files from the zip file.
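
If you prefer to extract the files from a script rather than through your file manager, the step above can be sketched in Python as follows. This is only an illustration: the function name `extract_workshop_files` is hypothetical, and you would pass in whatever name and location your downloaded zip file actually has.

```python
import zipfile
from pathlib import Path


def extract_workshop_files(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir and return the archived names.

    Both arguments are placeholders: use the real location of the zip file
    you downloaded and the folder where you want the workshop files.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)  # unpack the whole archive into dest
        return archive.namelist()
```

Calling `extract_workshop_files(zip_path, dest_dir)` with your own paths returns the list of file names in the archive, which is a quick way to confirm that everything was unpacked.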

2. Download and install OpenRefine

You can download OpenRefine 3.4.1 from http://openrefine.org/download.html. There are versions for Windows, Mac OS X and Linux.

Installing OpenRefine

For Windows and Linux, the address above will provide a zip file. Unzip the downloaded file wherever you want to install the program.

For Mac, you will download a ‘dmg’ file. Open it and then drag the OpenRefine application to an appropriate folder on your computer.

You need to have a ‘Java Runtime Environment’ (JRE) installed on your computer to run OpenRefine. If you don’t already have one installed, you can download and install it from http://java.com. Click on “Free Java Download”.
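
If you are not sure whether Java is already installed, one way to check is sketched below. It assumes Python is available on your computer and that a working JRE would expose a `java` command on your PATH; the helper names `find_java` and `java_version_text` are made up for this example.

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the 'java' command on PATH, or None if absent."""
    return shutil.which("java")


def java_version_text(java_path):
    """Run 'java -version' and return what it prints.

    Java writes its version banner to stderr, so stderr is checked first.
    """
    result = subprocess.run([java_path, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()
```

If `find_java()` returns `None`, no `java` command was found on your PATH and you probably need to install a JRE. Otherwise, passing the returned path to `java_version_text` shows which version is installed.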

Running OpenRefine

On Windows: Navigate to the folder where you’ve installed OpenRefine and double-click either ‘openrefine.exe’ or ‘refine.bat’.

On Linux: In a terminal window, navigate to the folder where you’ve installed OpenRefine and type ‘./refine’.

On Mac: Navigate to where you installed OpenRefine and click the OpenRefine icon.

The interface to OpenRefine is accessed via a web browser. Starting OpenRefine should normally open a window in your default web browser pointing at the address http://127.0.0.1:3333. If this doesn’t happen automatically, open a web browser and type in this address.
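
If the browser window does not open, it can help to confirm that OpenRefine is actually listening on that address before troubleshooting further. The sketch below is one possible check using only the Python standard library. The function name `openrefine_is_running` is hypothetical, and the check treats any HTTP 200 response at the address as "running", which is an assumption made to keep the example simple.

```python
import urllib.error
import urllib.request

# Default address given in the instructions above.
OPENREFINE_URL = "http://127.0.0.1:3333"


def openrefine_is_running(url=OPENREFINE_URL, timeout=3):
    """Return True if something answers with HTTP 200 at url, else False."""
    # Bypass any configured proxy, since the server runs on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

A result of `False` from `openrefine_is_running()` suggests that OpenRefine has not started yet or did not start successfully, so it is worth checking the window where you launched it for error messages.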

Workshop Slides