Abstract
This paper describes the creation of a workbench tool designed to make technologies developed over the lifespan of the Cardamom project easily accessible to the researchers who could most benefit from them, but who may lack the technical expertise to apply cutting-edge technologies to their own datasets. The workbench provides an intuitive graphical user interface (GUI) and workflow that shield users from the underlying technical tasks, while giving them access to a suite of powerful NLP tools developed by the Cardamom team. These include tokenisers, POS-taggers, various annotation tools, and ML models. The performance of the workbench tools can improve as users add text and annotations. It is envisioned that the workbench will provide a simple route to digital publication for academics in the humanities, and more specifically for linguists working with under-resourced or historical languages who have collected text data but are unable to make it available online because of financial or technical constraints. This has the added benefit of increasing the availability of high-quality annotated text data to NLP researchers, thereby providing value to both research communities.
| Original language | English (Ireland) |
|---|---|
| Title of host publication | Proceedings of the 4th Conference on Language, Data and Knowledge |
| Editors | Sara Carvalho, Anas Fahad Khan, Ana Ostroški Anić, Blerina Spahiu, Jorge Gracia, John P. McCrae, Dagmar Gromann, Barbara Heinisch, Ana Salgado |
| Publisher | NOVA CLUNL, Portugal |
| Pages | 109-120 |
| Number of pages | 12 |
| DOIs | |
| Publication status | Published - Sep 2023 |
| Event | The 4th Conference on Language, Data and Knowledge, Vienna, Austria. Duration: 12 Sep 2023 → 15 Sep 2023. Conference number: 4. https://2023.ldk-conf.org/ |
Conference
| Conference | The 4th Conference on Language, Data and Knowledge |
|---|---|
| Abbreviated title | LDK |
| Country/Territory | Austria |
| City | Vienna |
| Period | 12/09/23 → 15/09/23 |
| Internet address |
Authors
- Adrian Doyle
- Theodorus Fransen
- Bernardo Stearns
- John Philip McCrae
- Oksana Dereza
- Priya Rani