semantics-metriclearning

This project is maintained by thomasniebler

Learning Semantic Relatedness from Human Feedback Using Relative Relatedness Learning

This page contains all the necessary information to reproduce the results given in the ISWC’17 poster “Learning Semantic Relatedness from Human Feedback Using Relative Relatedness Learning” by Thomas Niebler, Martin Becker, Christian Pölitz and Andreas Hotho, all members of the DMIR group at the University of Würzburg.

Thomas maintains the code, while the DMIR group offers support.

Overview

In our work, we learn a semantic relatedness measure from human feedback using a metric learning approach. Human Intuition Datasets (HIDs) contain direct human judgments about the relatedness of words, i.e. human feedback. We exploit these datasets to learn a parameterization of the cosine measure, using a metric learning approach based on relative distance comparisons. We validate our approach on several different embedding datasets, which we either make public or for which we provide a download link here.
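As a rough illustration of what a parameterized cosine measure can look like, the sketch below assumes that the parameterization is a symmetric positive semidefinite matrix M, so that the measure becomes x^T M y / (sqrt(x^T M x) * sqrt(y^T M y)). The function name `parameterized_cosine` and the toy vectors are hypothetical and not taken from the repository. With M set to the identity matrix, the measure reduces to the ordinary cosine.

```python
import numpy as np


def parameterized_cosine(x, y, M):
    """Cosine similarity under the bilinear form given by the matrix M.

    Assumes M is symmetric positive semidefinite, so both norms are real.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    numerator = x @ M @ y
    denominator = np.sqrt(x @ M @ x) * np.sqrt(y @ M @ y)
    return numerator / denominator


# Toy vectors, purely illustrative.
x = np.array([1.0, 0.0, 1.0])
y = np.array([1.0, 1.0, 0.0])

# With the identity matrix this is the standard cosine measure.
plain = parameterized_cosine(x, y, np.eye(3))

# A diagonal M reweights dimensions; here the first one counts four times.
weighted = parameterized_cosine(x, y, np.diag([4.0, 1.0, 1.0]))
```

Learning the measure then amounts to choosing M so that the resulting scores agree with the human judgments as well as possible.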

Furthermore, to the best of our knowledge, we were the first to explore the possibility of learning word embeddings from tagging data. We elaborated on this in a separate paper.

Reference Implementations

From Tag Co-Occurrences to Tag Embeddings

To calculate the tag cooccurrence graph as input for the GloVe algorithm, we applied the method presented in “Semantic Grounding of Tag Relatedness in Social Bookmarking Systems” by Cattuto et al.

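More specifically, we used the co-occurrence based on posts as described in Equation (1) in the linked paper: $w(t_1, t_2) := \mathrm{card}\{(u, r) \in U \times R \mid t_1, t_2 \in T_{ur}\}$. Here, $T_{ur} := \{t \in T \mid (u, t, r) \in Y\}$, i.e. the set of all tags $t$ that have been assigned to resource $r$ by user $u$, with $Y$ denoting the set of tag assignments.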

In src/embeddings/example_call.py, we provide an example of how to call the corresponding methods to construct the co-occurrence graph. The graph then needs to be saved to a file before the GloVe algorithm can be run on that file.
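The post-based co-occurrence count described above can be sketched as follows. This is a minimal illustration and not the code in src/embeddings/example_call.py: the function name `post_cooccurrence`, the triple layout (user, tag, resource), and the toy data are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def post_cooccurrence(assignments):
    """Count, for each tag pair, the number of posts containing both tags.

    `assignments` is an iterable of (user, tag, resource) triples. A post is
    the set of tags that one user assigned to one resource.
    """
    posts = defaultdict(set)
    for user, tag, resource in assignments:
        posts[(user, resource)].add(tag)

    weights = defaultdict(int)
    for tags in posts.values():
        # Sort so that each unordered pair gets a single canonical key.
        for t1, t2 in combinations(sorted(tags), 2):
            weights[(t1, t2)] += 1
    return dict(weights)


# Toy data, purely illustrative.
assignments = [
    ("alice", "python", "r1"),
    ("alice", "code", "r1"),
    ("bob", "python", "r1"),
    ("bob", "code", "r1"),
    ("bob", "web", "r1"),
    ("alice", "python", "r2"),
]
cooccurrence = post_cooccurrence(assignments)
```

Here "code" and "python" appear together in two posts, so their edge weight is 2, while a post with a single tag contributes no edge.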

LSML

RRL is inspired by the LSML metric learning algorithm. We built on the LSML implementation contained in the metric_learn Python package.
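LSML learns from relative comparisons over quadruples of items, where one pair should be closer than another. One plausible way to obtain such comparisons from human relatedness scores is sketched below. This is an assumption for illustration only: the function name `relative_constraints` and the toy scores are hypothetical, and the actual constraint sampling used by RRL may differ.

```python
from itertools import combinations


def relative_constraints(scored_pairs):
    """Turn human relatedness scores into relative comparisons.

    Each constraint (a, b, c, d) states that words a and b should be more
    related than words c and d. Pairs with equal scores yield no constraint.
    """
    constraints = []
    for (p, score_p), (q, score_q) in combinations(scored_pairs, 2):
        if score_p > score_q:
            constraints.append((*p, *q))
        elif score_q > score_p:
            constraints.append((*q, *p))
    return constraints


# Toy scores, purely illustrative and not taken from any dataset.
scored_pairs = [
    (("car", "automobile"), 9.0),
    (("car", "tree"), 2.0),
    (("cup", "coffee"), 6.5),
]
constraints = relative_constraints(scored_pairs)
```

Enumerating all pairs of word pairs grows quadratically with the dataset size, so a practical setup would likely sample a subset of constraints instead.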

GloVe

We used the published code of GloVe to create the tag embeddings of dimension 100. We used the predefined parameter values of alpha=0.75 and x_max=100.
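The two parameters above control the weighting function in GloVe's weighted least squares objective, which limits the influence of very frequent co-occurrences. The sketch below only illustrates that function. The name `glove_weight` is hypothetical, and the snippet is not part of the published GloVe code.

```python
def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting function f(x) for a co-occurrence count x.

    Counts below x_max are scaled by (x / x_max) ** alpha; counts at or
    above x_max all receive the maximum weight of 1.
    """
    if x < x_max:
        return (x / x_max) ** alpha
    return 1.0


# Weights for a few illustrative co-occurrence counts.
weights = {count: glove_weight(count) for count in (1, 10, 100, 1000)}
```

With these values, every tag pair that co-occurs at least 100 times receives the same full weight, while rarer pairs are down-weighted.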

Word Embedding Datasets

These are the datasets that we used for our experiments.

Human Intuition Datasets

The Human Intuition Datasets (HIDs) can be retrieved as preprocessed, pandas-friendly CSV files here or from their original locations.