A Home for Windsor's Tech Community

Newspaper Data Mining Challenge

March 7, 2023

by Lauren Hedges in News

A collage of images. On the far left is a photograph of an old, yellowed, and slightly damaged copy of "The Kingsville Reporter" newspaper. In the centre is a recent photograph of a roll of microfilm, with a handwritten note stating "Essex Free Press / 04, 11, 18, 25, 2007" underneath it. The note is written on letterhead from the Ontario Community Newspaper Association. On the right is a set of overlapping digital newspaper scans in black and white. The quality of the scans varies, and only one title can be made out: "The Essex Free Press."

The University of Windsor’s Leddy Library and Academic Data Centre have partnered with the Essex County Library System and Hackforge to promote a data mining challenge using digitized local newspapers.

Papers dating as far back as 1982 have been made available as OCR (Optical Character Recognition) data, allowing users to query 60 years of local history. A Notebook that walks users through the basic process of accessing and querying the data has also been made available.
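For readers wondering what a query against OCR text might look like, here is a minimal sketch in Python. It is not taken from the Notebook mentioned above. The record layout (`paper`, `date`, `text`), the function name `count_term_by_year`, and the sample pages are all invented for illustration, and the real collection may be organized quite differently.

```python
import re
from collections import Counter

# Invented placeholder records, NOT real newspaper content.
# Assumed layout: one dict per page with a paper name, an ISO date, and OCR text.
pages = [
    {"paper": "Example Paper", "date": "2007-04-04",
     "text": "Council approves new arena. The ARENA opens this fall."},
    {"paper": "Example Paper", "date": "2007-04-11",
     "text": "Harvest festival draws record crowds."},
    {"paper": "Example Paper", "date": "1995-06-15",
     "text": "Arena repairs delayed; arenas elsewhere unaffected."},
]


def count_term_by_year(pages, term):
    """Count whole-word, case-insensitive matches of `term`, grouped by year."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for page in pages:
        year = page["date"][:4]  # assumes dates start with a four-digit year
        counts[year] += len(pattern.findall(page["text"]))
    return dict(counts)


if __name__ == "__main__":
    print(count_term_by_year(pages, "arena"))
```

Matching on whole words keeps "arenas" from being counted as "arena". OCR output often contains misread characters, so a real query would likely need fuzzier matching than this.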

Leddy Library is inviting the community to come up with their own ways to use this important and extensive data collection. Interested parties can submit ideas or completed code that could potentially be used on the thousands of pages of newspapers that have been digitized.

If you have questions about this challenge, contact libdata@uwindsor.ca.
