CALM-Bench: A Multi-task Benchmark for Evaluating Causality Aware Language Models

    Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

    7 Citations (Scopus)

    Abstract

    Causal reasoning is a critical component of human cognition and is required across a range of question-answering (QA) tasks, such as abductive reasoning, commonsense QA, and procedural reasoning. Research on causal QA has so far been underdefined, task-specific, and limited in complexity. Recent advances in foundation language models (such as BERT, ERNIE, and T5) have shown the efficacy of pre-trained models across diverse QA tasks. However, little research has explored the causal reasoning capabilities of these language models, and there is no standard evaluation benchmark. To unify causal QA research, we propose CALM-Bench, a multi-task benchmark for evaluating causality-aware language models (CALM). We present a standardized definition of causal QA tasks and show empirically that causal reasoning can be generalized and transferred across different QA tasks. Additionally, we share a strong multi-task baseline model that outperforms single-task fine-tuned models on the CALM-Bench tasks.

    Original language: English
    Title of host publication: EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 296-311
    Number of pages: 16
    ISBN (Electronic): 9781959429470
    Publication status: Published - 2023
    Event: 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023 - Dubrovnik, Croatia
    Duration: 2 May 2023 to 6 May 2023

    Conference

    Conference: 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023
    Country/Territory: Croatia
    City: Dubrovnik
    Period: 2/05/23 to 6/05/23
