Stylometry and Authorship Analysis

This video discusses the question of authorship attribution and shows how stylometry and machine learning have been used to identify the likely authors of disputed documents. It describes the process of using the tool JGAAP, whose analysis supported the identification of Harry Potter author J.K. Rowling as the writer of the Robert Galbraith detective novel The Cuckoo’s Calling.
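To give a rough sense of what a stylometric comparison involves, the sketch below ranks candidate authors by how closely their known writing matches a disputed text. It is an illustrative assumption, not JGAAP's code or API and not necessarily the method shown in the video: it uses character n-gram frequencies (one commonly used stylometric feature) and cosine distance, and the function names `char_ngram_profile`, `cosine_distance`, and `rank_candidates`, along with the toy texts, are hypothetical.

```python
from collections import Counter
import math


def char_ngram_profile(text, n=4):
    """Return relative frequencies of overlapping character n-grams."""
    # Lowercase and collapse whitespace so layout differences do not matter.
    normalized = " ".join(text.lower().split())
    counts = Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def cosine_distance(p, q):
    """Return 1 - cosine similarity between two sparse frequency profiles."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(value * value for value in p.values()))
    norm_q = math.sqrt(sum(value * value for value in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 1.0  # an empty profile is treated as maximally distant
    return 1.0 - dot / (norm_p * norm_q)


def rank_candidates(disputed_text, known_samples, n=4):
    """Rank candidate authors from closest to farthest from the disputed text."""
    disputed = char_ngram_profile(disputed_text, n)
    distances = {
        author: cosine_distance(disputed, char_ngram_profile(sample, n))
        for author, sample in known_samples.items()
    }
    return sorted(distances.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy data only; a real study needs long samples from each candidate.
    samples = {
        "Author A": "the cat sat on the mat and the cat sat still",
        "Author B": "zzzz qqqq xxxx yyyy",
    }
    for author, distance in rank_candidates("the cat sat on the mat", samples):
        print(author, round(distance, 3))
```

A small distance only means the two texts are similar on this one feature. A careful analysis would compare several independent features across a set of plausible candidates before drawing any conclusion about authorship.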

Further Reading and Resources

Stylometry, Machine Learning, Digital Forensics, Computational Linguistics, Corpus Linguistics

Posted by

Patrick Juola (who hates writing about himself in the third person) is a computational linguist, stylometrist, digital humanist and forensic scientist. He is currently Professor of Computer Science at Duquesne University. He is best known for his identification of J.K. Rowling as the true author of Robert Galbraith’s The Cuckoo’s Calling, but was also principal violist of the Holmdel Symphony Orchestra in Holmdel, NJ.

Last updated: August 29, 2019
https://github.com/cmu-lib/dhlg/blob/master/_projects/juola.md