Andrea Tagarelli, Mario Longo and Sergio Greco. Word Sense Disambiguation for XML Structure Feature Generation

ABSTRACT: A common limitation of most existing methods for managing XML structure information is that they do not handle the semantic meanings that may be associated with markup tags. In this paper, we study how to map the structure information available from XML elements into semantically related concepts, in order to support the generation of semantic features of structural type for XML data. To this end, we define an unsupervised word sense disambiguation method that selects the most appropriate meaning for each element in the context of its XML path. The proposed approach exploits the conceptual relations provided by a lexical ontology such as WordNet and employs different notions of sense relatedness. Experiments with data from various application domains show that our approach can be effectively used to generate structural semantic features.
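The core idea, choosing a sense for each tag by using the other tags on the same XML path as its context, can be illustrated with a minimal sketch. It is not the authors' method. It assumes a hypothetical toy taxonomy (`PARENT`) and sense inventory (`SENSES`) standing in for WordNet, and a single path-based relatedness measure, 1 / (1 + edge distance). The paper uses several notions of relatedness whose exact definitions are not given here. All names and sense identifiers below are illustrative.

```python
from typing import Dict, List

# Hypothetical toy hypernym hierarchy standing in for WordNet:
# each node maps to its parent; "entity" is the root.
PARENT: Dict[str, str] = {
    "artifact": "entity", "abstraction": "entity",
    "creation": "artifact", "publication": "creation",
    "publication_part": "publication",
    "book.n.publication": "publication", "title.n.heading": "publication_part",
    "facility": "artifact", "shop": "facility", "bookstore.n.shop": "shop",
    "record": "abstraction", "book.n.ledger": "record",
    "status": "abstraction", "title.n.rank": "status",
    "right": "abstraction", "title.n.ownership": "right",
}

# Hypothetical sense inventory: candidate senses for each tag name.
SENSES: Dict[str, List[str]] = {
    "bookstore": ["bookstore.n.shop"],
    "book": ["book.n.publication", "book.n.ledger"],
    "title": ["title.n.heading", "title.n.rank", "title.n.ownership"],
}


def ancestors(node: str) -> List[str]:
    """Return the chain from node up to the root, node first."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(a: str, b: str) -> int:
    """Number of edges between a and b through their lowest common ancestor."""
    depth_in_b = {n: i for i, n in enumerate(ancestors(b))}
    for i, n in enumerate(ancestors(a)):
        if n in depth_in_b:
            return i + depth_in_b[n]
    raise ValueError("nodes share no ancestor")


def relatedness(a: str, b: str) -> float:
    """Path-based relatedness in (0, 1]; identical senses score 1.0."""
    return 1.0 / (1 + path_length(a, b))


def context_score(sense: str, context: List[str]) -> float:
    """Sum, over context tags, of the best relatedness to any of their senses."""
    return sum(max(relatedness(sense, s) for s in SENSES[c]) for c in context)


def disambiguate_path(xml_path: str) -> Dict[str, str]:
    """Pick one sense per tag, using the other tags of the same path as context."""
    tags = [t for t in xml_path.split("/") if t]
    chosen: Dict[str, str] = {}
    for i, tag in enumerate(tags):
        context = [t for j, t in enumerate(tags) if j != i]
        chosen[tag] = max(SENSES[tag], key=lambda s: context_score(s, context))
    return chosen


if __name__ == "__main__":
    print(disambiguate_path("/bookstore/book/title"))
```

On the path `/bookstore/book/title`, the publication-related context should favor `book.n.publication` and `title.n.heading` over the ledger, rank, and ownership senses. With real data, the toy dictionaries would be replaced by a lexical ontology such as WordNet. NLTK's WordNet interface, for instance, offers `Synset.path_similarity`, which uses the same 1 / (1 + shortest-path distance) form.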

