Language Models Are Poor Learners of Directional Inference

Li, Tianyi; Hosseini, Mohammad Javad; Weber, Sabine (ORCID 0000-0002-5577-3356); Steedman, Mark

Date: 2025 (record dated 2025-11-10)
Type: conference object
Language: English
Keywords: Language Models
URL: https://fis.uni-bamberg.de/handle/uniba/110866
URN: urn:nbn:de:bvb:473-irb-110866

Abstract: We examine the competence of language models (LMs) at directional predicate entailment through supervised fine-tuning with prompts. Our analysis shows that, despite their apparent success on standard natural language inference (NLI), LMs show limited ability to learn such directional inference. Moreover, existing datasets either fail to test directionality, contain artefacts that can be learnt as proxies for entailment, or both, which yields over-optimistic results. In response, we present BoOQA (Boolean Open QA), a robust multilingual evaluation benchmark for directional predicate entailment that is extrinsic to existing training sets. On BoOQA, we establish baselines and show evidence that existing LM-prompting models are poor learners of directional entailment, in contrast to entailment graphs, which are nonetheless limited by sparsity.