Evaluation of an application for automatic text categorization


Issues that may cause problems in this type of evaluation are discussed here (in Swedish).




About the Djupindexering application


The task is to select proper keywords for a text from a thesaurus specially created for Riksdagen's documents. Keywords should not only identify the main subject but also have the proper level of generality in the thesaurus hierarchy. Lexware Djupindexering is a knowledge-based system in which the external topic representation (in this case Riksdagen's thesaurus) is integrated with Lexware's own rich language representation. Thesaurus terms are identified not only through direct occurrences in the text but also indirectly, through occurrences of terms closely related to them in the thesaurus or in the lexicon.
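The difference between direct and indirect identification can be illustrated with a toy sketch. It is not Lexware's implementation: the thesaurus entries, the `RELATED` mapping, and the `find_terms` function are hypothetical, and plain substring matching stands in for the much richer language representation described above.

```python
from typing import Dict, Set

# Hypothetical toy thesaurus: each term maps to a set of closely related terms.
RELATED: Dict[str, Set[str]] = {
    "energy policy": {"nuclear power", "electricity market"},
    "taxation": {"income tax", "value added tax"},
}


def find_terms(text: str, related: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each matched thesaurus term to 'direct' or 'indirect'.

    A term is 'direct' if it occurs in the text itself, and 'indirect'
    if only one of its related terms occurs (assumed matching rule).
    """
    lowered = text.lower()
    found: Dict[str, str] = {}
    for term, neighbours in related.items():
        if term in lowered:
            found[term] = "direct"
        elif any(neighbour in lowered for neighbour in neighbours):
            found[term] = "indirect"
    return found


if __name__ == "__main__":
    sample = "The bill concerns nuclear power and taxation."
    print(find_terms(sample, RELATED))
```

In this sketch "taxation" is found directly, while "energy policy" is found only because a related term appears in the text.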




Materials available for evaluation


The evaluation essentially consists of comparing the keywords assigned automatically by Lexware with those assigned manually by indexers at Riksdagsbiblioteket for 1403 Riksdag documents. Specific information on the data and how to access it is provided here.
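One common way to carry out such a comparison is to treat the manually assigned keywords as the reference and compute precision and recall for the automatic ones. The sketch below assumes this measure; the source does not state which measures the evaluation actually uses, and the function names and data layout (document id mapped to a set of keywords) are hypothetical.

```python
from typing import Dict, Set, Tuple


def compare_keywords(automatic: Set[str], manual: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) of automatic keywords against manual ones."""
    shared = automatic & manual
    precision = len(shared) / len(automatic) if automatic else 0.0
    recall = len(shared) / len(manual) if manual else 0.0
    return precision, recall


def evaluate(auto_by_doc: Dict[str, Set[str]],
             manual_by_doc: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Micro-averaged (precision, recall) over documents present in both inputs."""
    shared_total = auto_total = manual_total = 0
    for doc_id in auto_by_doc.keys() & manual_by_doc.keys():
        automatic, manual = auto_by_doc[doc_id], manual_by_doc[doc_id]
        shared_total += len(automatic & manual)
        auto_total += len(automatic)
        manual_total += len(manual)
    precision = shared_total / auto_total if auto_total else 0.0
    recall = shared_total / manual_total if manual_total else 0.0
    return precision, recall
```

Exact set overlap is a strict measure: it gives no credit when the automatic keyword is a broader or narrower term than the one the indexer chose, which matters here because the level of generality in the thesaurus hierarchy is part of the task.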