Building your own data set

The video explores, from a journalist's perspective, how to build a dataset and analyze it with a series of DH tools. It covers database-building skills such as mapping data structures, choosing between manual and optical character recognition (OCR)-assisted data entry, and keeping a data dictionary. It also discusses visualizing data with tools such as Tableau.
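One of the skills the video covers is keeping a data dictionary. The following is a minimal sketch, assuming a small hand-built dataset drawn from public records. It keeps the data dictionary as structured data next to the dataset so that every new record can be checked against it. The field names, descriptions, and the `FieldSpec` and `validate` names are hypothetical illustrations, not taken from the video. The `source` column records whether a field was typed in by hand or came from OCR.

```python
from dataclasses import dataclass


# One entry in the data dictionary: what a field is called, what type it
# holds, what it means, and how it was entered (manual vs. OCR-assisted).
@dataclass(frozen=True)
class FieldSpec:
    name: str
    dtype: type
    description: str
    source: str


# Hypothetical data dictionary for a dataset built from committee minutes.
DATA_DICTIONARY = [
    FieldSpec("meeting_date", str, "Date of the meeting, YYYY-MM-DD", "manual entry"),
    FieldSpec("committee", str, "Committee name as written in the minutes", "OCR, checked by hand"),
    FieldSpec("amount_usd", float, "Dollar amount approved", "OCR, checked by hand"),
]


def validate(record: dict) -> list[str]:
    """Return a list of problems found when checking a record against the dictionary."""
    problems = []
    # Every documented field must be present and have the documented type.
    for spec in DATA_DICTIONARY:
        if spec.name not in record:
            problems.append(f"missing field: {spec.name}")
        elif not isinstance(record[spec.name], spec.dtype):
            problems.append(
                f"{spec.name}: expected {spec.dtype.__name__}, "
                f"got {type(record[spec.name]).__name__}"
            )
    # Any field not in the dictionary is flagged so it gets documented.
    known = {spec.name for spec in DATA_DICTIONARY}
    for key in record:
        if key not in known:
            problems.append(f"undocumented field: {key}")
    return problems
```

For example, `validate({"meeting_date": "2019-08-29", "committee": "Finance", "amount_usd": "1,200"})` returns `["amount_usd: expected float, got str"]`. This flags a dollar amount that was copied as text, a common slip in both manual and OCR-assisted entry.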

Further Reading and Resources

The New York Times spreadsheet training kit, which includes tip sheets on editing your data and brainstorming stories from it.
About the kit: https://open.nytimes.com/how-we-helped-our-reporters-learn-to-love-spreadsheets-adc43a93b919
Training materials: https://drive.google.com/drive/u/0/folders/1ZS57_40tWuIB7tV4APVMmTZ-5PXDwX9w

The Data Journalism Handbook: https://www.aup.nl/en/book/9789462989511/the-data-journalism-handbook

Data Wrangling with Python: http://shop.oreilly.com/product/0636920032861.do

Creating a workflow for a project (especially with multiple stakeholders): https://source.opennews.org/articles/5-digital-workflow/

How to get things done (as a one-person show): https://source.opennews.org/articles/lonely-newsroom-coder/

Open Data, Text Mining and Analytics, Corpus Linguistics, Digital Forensics, Stylometry

Posted by

AmyJo Brown’s excitement for city and county budgets, committee meetings and dusty shelves of public records is matched only by her happiness in the day’s first cup of coffee. An editor with more than 15 years of experience as an investigative journalist, she is the principal and founder of War Streets Media, an information design firm that offers expertise in data and other nonfiction storytelling.

Similar Projects by Discipline

Journalism

Civic Data Intermediaries

Bob Gradeck, WPRDC

Open data and what can be done with it.

Similar Projects by Topics

Open Data

Civic Data Intermediaries

Bob Gradeck, WPRDC

Open data and what can be done with it.


Text Mining and Analytics

Marriage & Divorce of Capitalism & Democracy

Simon DeDeo

DH methods for interdisciplinary studies and results.

Structure-based Network Analysis

S.E. Hackney

Structure-based network analysis.

DocuScope

David Kaufer

Computer Support for Close Reading and Textual Analysis in DH.

Logistic Regression

Matthew J. Lavin

Machine learning for literary analysis.

Metadata Heatmaps for Distant Reading

Benjamin Miller

Distant reading of a textual corpus.

The Historical TV Guide

Kathy M. Newman, Steven Gotzler

Using digitized text to study television history.

Topic Modeling Subreddits

Chloe Perry

Computational techniques to topic model subreddits.


Corpus Linguistics

Marriage & Divorce of Capitalism & Democracy

Simon DeDeo

DH methods for interdisciplinary studies and results.

Structure-based Network Analysis

S.E. Hackney

Structure-based network analysis.

DocuScope

David Kaufer

Computer Support for Close Reading and Textual Analysis in DH.

Logistic Regression

Matthew J. Lavin

Machine learning for literary analysis.

Metadata Heatmaps for Distant Reading

Benjamin Miller

Distant reading of a textual corpus.

The Historical TV Guide

Kathy M. Newman, Steven Gotzler

Using digitized text to study television history.

Topic Modeling Subreddits

Chloe Perry

Computational techniques to topic model subreddits.

Stylometry and Authorship Analysis

Patrick Juola

Machine learning to identify authors.

Digital Forensics

Stylometry and Authorship Analysis

Patrick Juola

Machine learning to identify authors.

Stylometry

Stylometry and Authorship Analysis

Patrick Juola

Machine learning to identify authors.

Last updated: August 29, 2019