Stephen Krewson
Unlocking the Medical Heritage Library with Deep Learning. This photostream contains printed illustrations extracted using a two-stage convolutional neural net pipeline.
In the first stage, a retrained CNN eliminates false-positive candidates: pages that contain no illustration regions but were flagged as having "picture" blocks by the OCR software used to scan them before they were added to the Internet Archive.
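The first-stage filtering can be pictured with a minimal sketch. It assumes a classifier that returns a score between 0 and 1 for each candidate page; the names `filter_candidates` and `illustration_score`, the 0.5 threshold, and the sample scores are all hypothetical stand-ins, not the project's actual code or data.

```python
from typing import Callable, Iterable, List


def filter_candidates(
    page_ids: Iterable[str],
    illustration_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only the candidate pages scored at or above the threshold."""
    return [pid for pid in page_ids if illustration_score(pid) >= threshold]


# Stand-in scores so the sketch runs without a trained model (made-up values).
fake_scores = {"page_0007": 0.93, "page_0012": 0.08, "page_0031": 0.61}

# Every key is a page the OCR software marked as having a "picture" block.
kept = filter_candidates(fake_scores, fake_scores.get)
print(kept)
```

In a real pipeline the score function would wrap the retrained CNN, and only the pages that survive this filter would move on to region extraction.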
In the second stage, regions of interest (i.e., illustrations and figures) are extracted from page camera data with a Mask R-CNN model trained on several hundred randomly sampled pages. Models correspond to 50-year windows: when an item from 1922 is downloaded, we use a Stage 1 classifier trained on sample data from the MHL collection from 1900-1949. The same goes for the Stage 2 region extractor.
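The window lookup described above can be sketched as follows. This assumes windows aligned to multiples of 50 years (consistent with the 1900-1949 and 1800-1849 ranges mentioned here); the function names and the model key format are hypothetical.

```python
from typing import Tuple

WINDOW_YEARS = 50  # each model covers a 50-year span


def window_for_year(year: int) -> Tuple[int, int]:
    """Return the inclusive (start, end) window that contains the given year."""
    start = (year // WINDOW_YEARS) * WINDOW_YEARS
    return start, start + WINDOW_YEARS - 1


def model_key(stage: int, year: int) -> str:
    """Build a hypothetical lookup key for the model of a stage and year."""
    start, end = window_for_year(year)
    return f"stage{stage}_{start}-{end}"


print(window_for_year(1922))
print(model_key(2, 1830))
```

The same lookup serves both stages, so an item's publication year is the only input needed to choose its classifier and its region extractor.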
Current experiment: all MHL items published 1800-1849.
- Joined June 2018