CoFiF: A Corpus of Financial Reports in French Language

Sina Ahmadi

Research output: Chapter in Book or Conference Publication/Proceeding, Conference Publication, peer-review

Abstract

In an era when machine learning and artificial intelligence have huge momentum, the demand for data to train and test models is steadily growing. We introduce CoFiF, the first corpus comprising company reports in the French language. It contains over 188 million tokens in 2655 reports, covering reference documents and annual, semi-annual, and quarterly reports. Our main focus is on the 60 largest French companies listed in France's main stock indices, CAC40 and CAC Next 20. The corpus spans over 20 years, ranging from 1995 to 2018. To evaluate this novel collection of organizational writing, we use CoFiF to generate two character-level language models, a forward and a backward one, which we use to demonstrate the corpus's potential for business, economics, and management research in the French language.
Original language: English (Ireland)
Title of host publication: The First Workshop on Financial Technology and Natural Language Processing (FinNLP)
Place of Publication: Macao, China
Publication status: Published - 1 Aug 2019

Authors

  • Ahmadi, Sina; Daudert, Tobias
