BioTAGME: an algorithm for predicting annotations on PubMed

The main purpose of BioTAGME is to provide a tool for the algorithmic analysis of texts and the extraction of latent associations that enrich their comprehension. In particular, BioTAGME focuses on biology, using the PubMed database as its source of knowledge and as a basis for discovering new biological knowledge. Given an input set of texts, each annotated with terms that characterize the document, the methodology computes a new set of annotation terms that is as closely related as possible to the input set but contains no synonyms of the existing annotations. To this end, the approach defines a correlation measure whose score ensures both a high correlation with the source and that a randomly built set of terms of the same size would not reach a correlation greater than or similar to the computed one. BioTAGME runs in four main steps:

  • Apply the TagMe algorithm (Ferragina and Scaiella, 2010) to each input text to build the first set of related terms.
  • Execute the DT-Hybrid recommendation algorithm (Alaimo et al., 2013) to extend these annotations.
  • Compute a correlation score for each annotation through a similarity function.
  • Use this score to compute a set of highly correlated terms, together with a probability expressing the quality of that set.
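
The last two steps can be illustrated with a minimal Python sketch. This is not the BioTAGME implementation: it assumes Jaccard similarity over the documents in which each term occurs as the similarity function, and it estimates the probability empirically by drawing random term sets of the same size. The names `jaccard`, `set_score`, `empirical_p_value`, and the toy `term_docs` data are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two collections of document IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def set_score(candidates, source, term_docs):
    """Mean, over candidate terms, of the best similarity to any source term."""
    if not candidates or not source:
        return 0.0
    best = [
        max(jaccard(term_docs[c], term_docs[s]) for s in source)
        for c in candidates
    ]
    return sum(best) / len(best)


def empirical_p_value(candidates, source, term_docs, trials=1000, seed=0):
    """Estimate how often a random term set of the same size scores at
    least as high as the candidate set (assumed sampling-based test)."""
    rng = random.Random(seed)
    observed = set_score(candidates, source, term_docs)
    # Random sets are drawn from all known terms outside the source set.
    pool = sorted(t for t in term_docs if t not in source)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(pool, len(candidates))
        if set_score(sample, source, term_docs) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (trials + 1)


# Toy data (hypothetical): term -> IDs of the documents mentioning it.
term_docs = {
    "TP53": {1, 2, 3, 4},
    "apoptosis": {1, 2, 3},
    "MDM2": {2, 3, 4},
    "photosynthesis": {7, 8},
    "chlorophyll": {8, 9},
    "ribosome": {10, 11},
    "glycolysis": {12},
}
source = {"TP53"}
candidates = ["apoptosis", "MDM2"]

score, p_value = empirical_p_value(candidates, source, term_docs)
print(f"score={score:.2f}, p-value={p_value:.3f}")
```

A low estimated probability means that random term sets of the same size rarely match the candidate set's correlation with the source, which is the property the score is meant to guarantee.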

  • Job : Web Development
  • Date : Jul 2015
  • Agency : University of Catania