Title: Learning to Extract Protein-Protein Interactions using Distant Supervision
Authors: Thomas, Philippe; Solt, Illés; Klinger, Roman (ORCID 0000-0002-2014-6619); Leser, Ulf
Year: 2011
Record date: 2024-03-07
Type: conference object
Language: eng
Keywords: Protein-Protein Interactions
Classification: 004
Handle: https://fis.uni-bamberg.de/handle/uniba/94069
URL: http://www.aclanthology.org/W11-3904

Abstract: Most relation extraction methods, especially in the domain of biology, rely on machine learning to classify a co-occurring pair of entities in a sentence as related or not. Such an approach requires a training corpus, which involves expert annotation and is tedious, time-consuming, and expensive. We overcome this problem by using existing knowledge in structured databases to automatically generate a training corpus for protein-protein interactions. An extensive evaluation of different instance selection strategies is performed to maximize robustness on this presumably noisy resource. Successful strategies that consistently improve performance include a majority voting ensemble of classifiers trained on subsets of the training corpus and the use of knowledge bases consisting of proven non-interactions. Our best configured model, built without manually annotated data, shows very competitive results on several publicly available benchmark corpora.
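
Illustrative sketch (not the authors' implementation): the abstract names two ideas that can be shown in a few lines, namely labeling co-occurring protein pairs from knowledge bases of interactions and proven non-interactions, and combining classifiers by majority vote. All names here (distant_label, majority_vote, the placeholder proteins P1 to P4) are hypothetical. Returning None for pairs found in neither knowledge base and resolving ties as "no interaction" are assumptions made for this example; the abstract does not specify them.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple, FrozenSet

Pair = FrozenSet[str]


def distant_label(
    pair: Tuple[str, str],
    known_interactions: Set[Pair],
    known_non_interactions: Set[Pair],
) -> Optional[bool]:
    """Label a co-occurring protein pair using knowledge bases only.

    Returns True for a known interaction, False for a proven
    non-interaction, and None when neither knowledge base lists the
    pair (assumption: such pairs are left unlabeled).
    """
    key = frozenset(pair)  # order of the two proteins does not matter
    if key in known_interactions:
        return True
    if key in known_non_interactions:
        return False
    return None


def majority_vote(votes: Iterable[bool]) -> bool:
    """Combine per-classifier predictions; ties count as False (assumption)."""
    counts = Counter(votes)
    return counts[True] > counts[False]


# Toy knowledge bases with placeholder protein names.
interactions = {frozenset(("P1", "P2"))}
non_interactions = {frozenset(("P1", "P3"))}

labels = [
    distant_label(("P2", "P1"), interactions, non_interactions),
    distant_label(("P1", "P3"), interactions, non_interactions),
    distant_label(("P3", "P4"), interactions, non_interactions),
]
```

In a full pipeline, each vote passed to majority_vote would come from a classifier trained on a different subset of the automatically labeled corpus; the classifiers themselves are omitted here.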