Multilingual Text Summarization Approaches : A Case Study on Generative and Extractive Methods
Dal Cin, Giulia (2026): Multilingual Text Summarization Approaches : A Case Study on Generative and Extractive Methods, Bamberg: Otto-Friedrich-Universität, doi: 10.20378/irb-112942.
Author:
Dal Cin, Giulia
Publisher Information:
Year of publication:
2026
Pages:
Supervisor:
Language:
English
Remark:
Master's thesis, Otto-Friedrich-Universität Bamberg, 2025
DOI:
10.20378/irb-112942
Abstract:
In recent years, research in the field of automatic text summarization (ATS) has mainly focused on improving model performance, but it has rarely considered the context and the purpose for which summaries are produced. Therefore, in this master’s thesis, five multilingual ATS scenarios are defined, each associated with a purpose and specific requirements. These scenarios are used to evaluate and compare the summaries produced by three ATS systems: the extractive algorithm LexRank, the pre-trained language model mLongT5, and the large language model Mistral NeMo. Both quantitative and qualitative evaluations are performed.
Results show that LexRank often fails to produce well-structured and coherent summaries; to a lesser extent, so does mLongT5. In some of the five scenarios, both systems also produce summaries with insufficient information coverage. Additionally, mLongT5-generated summaries often contain factually incorrect statements or hallucinations. Problems linked to factually incorrect content, hallucinations and insufficient information coverage also occur in NeMo-generated summaries, but only rarely. NeMo also often does not respect length requirements, and it sometimes switches language within its summaries. Despite these problems, NeMo achieves good results in almost every scenario, outperforming the other two systems. However, the performance differences between the systems vary by scenario.
GND Keywords:
Automatische Sprachanalyse
Textverarbeitung
Keywords:
Automatic Text Summarization
Natural Language Processing
Multilingual Text Summarization
DDC Classification:
RVK Classification:
Type:
Master's thesis
Activation date:
February 26, 2026
Permalink
https://fis.uni-bamberg.de/handle/uniba/112942