Metadata Heatmaps for Distant Reading
This video explores how metadata can be used to undertake a distant reading of a textual corpus, in this case 1,8000 dissertations, for the purposes of rhetorical analysis. It shows how heatmaps of metadata can provide visualizations that help lead to further analysis in R. The video also shows how to use the data cleaning tool OpenRefine to convert MARC codes into a more readable format such as CSV or TSV.
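As a rough illustration of what sits underneath a metadata heatmap, the sketch below cross-tabulates two metadata fields from a TSV export into a grid of counts, then maps each count to a text "shade." It is written in Python rather than R purely for illustration and is not the workflow shown in the video; the column names (`year`, `method`), the sample rows, and the helper functions are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical TSV export (for example, from OpenRefine).
# The column names and rows are invented for illustration only.
SAMPLE_TSV = (
    "year\tmethod\n"
    "2001\tArchival\n"
    "2001\tInterview\n"
    "2001\tArchival\n"
    "2002\tInterview\n"
    "2002\tSurvey\n"
)


def count_matrix(tsv_text, row_field, col_field):
    """Cross-tabulate two metadata fields into the grid of counts a heatmap colors."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter((rec[row_field], rec[col_field]) for rec in reader)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    # Counter returns 0 for combinations that never occur in the data.
    matrix = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix


def shade_rows(matrix, shades=" .:*#"):
    """Map each count to a character, darker for larger counts (a text-only heatmap)."""
    peak = max(max(row) for row in matrix) or 1
    top = len(shades) - 1
    return ["".join(shades[round(v / peak * top)] for v in row) for row in matrix]


if __name__ == "__main__":
    rows, cols, matrix = count_matrix(SAMPLE_TSV, "year", "method")
    print("columns:", cols)
    for label, shaded in zip(rows, shade_rows(matrix)):
        print(label, "|" + shaded + "|")
```

In a real project the same grid of counts would be handed to a plotting function that colors each cell, instead of being rendered as characters.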
Further Reading and Resources
I highly recommend http://openrefine.org: it’s an excellent, free tool for data inspection and transformation. OpenRefine lives in your browser, but operates locally, so you don’t have to share data with any third parties. The site includes walkthrough videos for common operations.
For inspiration and some great tutorials on data visualization, including heat maps, check out https://flowingdata.com.
The R software includes extensive built-in help pages, which I found were enough to get my bearings (especially when supplemented with web searches that mostly led to Stack Overflow). But Springer has put out several books specific to learning R for DH, including Humanities Data in R (Arnold and Tilton), Text Analysis with R for Students of Literature (Jockers), and Corpus Linguistics and Statistics with R (Desagulier). I haven’t read any of them, so your mileage may vary, but if I were starting again I’d probably start with one of these.
If you do get into R, before too long you’ll likely want to use the interoperable collection of add-on packages known as “the tidyverse,” which make it easier to rearrange and re-represent your data (without having to kick it back out to OpenRefine). There’s a series of free courses by the creators at https://www.datacamp.com/courses/introduction-to-the-tidyverse.
Distant Reading, Corpus Linguistics, Computational Linguistics, Text Mining and Analytics
Posted by
Benjamin Miller is an Assistant Professor of English at the University of Pittsburgh, focusing on digital research and pedagogy. He is the author of “Mapping the Methods of Composition/Rhetoric Dissertations: A ‘Landscape Plotted and Pieced,’” an article drawing on data visualization of several thousand documents, published in College Composition and Communication. Ben received a CCCC Emergent Research/er Grant in 2017 for work toward his multimodal book project, “Distant Readings of Disciplinarity: Knowing and Doing in Composition/Rhetoric Dissertations.”