Abstract
Free and open digital libraries have been gaining steady momentum as key resources to support practice in Digital Humanities. Project Gutenberg is one of the oldest repositories of this kind. The DHTK Python library can retrieve content from Gutenberg by querying the RDF metadata that Gutenberg itself publishes regularly. However, this process is hampered because the metadata form a dataset that lacks a documented ontology, is largely unlinked, and is significantly bloated with redundant RDF triples. In this paper we detail the processes put in place to improve ontology-based data access to Gutenberg via DHTK, including (a) bottom-up extraction of the Gutenberg Ontology; (b) cleanup, linking and shrinking of the Gutenberg metadata set; (c) refactoring and alignment of this ontology with common vocabularies; and (d) incorporation of these enhancements into the DHTK access routines. Early results show that we were able to reduce the size of the Gutenberg metadata set by nearly 29% whilst linking it with Library of Congress datasets, DBpedia and others.
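Steps (a) and (b) above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the metadata have already been parsed into (subject, predicate, object) string tuples, treats only exact duplicate triples as redundant (the paper's notion of redundancy may well be broader), and uses hypothetical prefixed names such as `pg:ebook` in place of real Gutenberg IRIs. The functions `extract_vocabulary`, `deduplicate` and `size_reduction` are illustrative names, not part of DHTK.

```python
from collections import Counter

# rdf:type IRI from the W3C RDF namespace.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


def extract_vocabulary(triples: list[Triple]) -> tuple[Counter, Counter]:
    """Bottom-up vocabulary extraction: count the classes (objects of
    rdf:type) and the properties (all other predicates) actually used."""
    classes: Counter = Counter()
    properties: Counter = Counter()
    for _subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            classes[obj] += 1
        else:
            properties[predicate] += 1
    return classes, properties


def deduplicate(triples: list[Triple]) -> list[Triple]:
    """Drop exact duplicate triples while preserving first-seen order.
    Assumption: only verbatim duplicates count as redundant here."""
    return list(dict.fromkeys(triples))


def size_reduction(before: int, after: int) -> float:
    """Percentage by which the triple count shrank."""
    return (1 - after / before) * 100


# Hypothetical sample data; prefixed names stand in for full IRIs.
sample: list[Triple] = [
    ("ebook:1", RDF_TYPE, "pg:ebook"),
    ("ebook:1", "dcterms:title", '"Example Title"'),
    ("ebook:1", "dcterms:title", '"Example Title"'),  # exact duplicate
    ("agent:1", RDF_TYPE, "pg:agent"),
]

cleaned = deduplicate(sample)
classes, properties = extract_vocabulary(cleaned)
print(len(cleaned), dict(classes), dict(properties))
print(f"{size_reduction(len(sample), len(cleaned)):.1f}% smaller")
```

On this toy sample `deduplicate` keeps 3 of the 4 triples, so `size_reduction` reports 25.0. A real pipeline over the full Gutenberg dump would more likely stream triples through an RDF library such as rdflib than hold them in a Python list.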
| Original language | English (Ireland) |
|---|---|
| Title of host publication | Third Workshop on Humanities in the Semantic Web (WHiSe 2020) |
| Place of Publication | online |
| Publication status | Published - 1 Oct 2020 |
Authors
- Mattia Egloff
- Alessandro Adamou
- Davide Picca
Title: Enabling Ontology-Based Data Access to Project Gutenberg