RolX - Role Classification for Networks

I have been studying the paper “RolX: Structural Role Extraction & Mining in Large Graphs” for quite a while, and made a presentation. The paper describes an algorithm RolX, which can perform unsupervised soft clustering for nodes in networks based on structural features extracted. It also supports transductive transfer learning (across-network role classification), which I am mainly interested in.

Unsupervised Soft Clustering

Some analysis by others, already demonstrates the unsupervised learning part. I am not going too much into details here, but you can read the article or my slides.

Transfer Learning

RolX uses the setting transductive transfer learning. In other words, both source and target networks are totally known during training. Moreover, the source network has all known labels available, i.e., roles of nodes are known. However, the target network is completely unlabeled, i.e., roles of nodes are unknown. The goal is to predict the roles of nodes in the target dataset.

RolX performs the above described unsupervised soft clustering on both datasets. It trains a logistic regression model, which uses the soft clustering result as features. Then, it uses this model to predict the roles of nodes in the target dataset.

While the way RolX extracts features and performs soft clustering seems to be exactly the same for both datasets, the semantics of the clusters of nodes in these two networks are still arguably different. The legitimacy of this “transfer” is still a question mark.

Implementation

Implementation is also available (in Python), but has something missing obviously:

does not work for undirected networks;
again, has only the unsupervised learning part.

I extend the Python implementation here, which works for undirected networks and can perform tranfer learning.

Evaluation

The authors highlight that community detection and role classification are fundamentally different. But in one of their evaluations, they strangely use the MIT Reality Mining Dataset, which contains different people (students or faculty) from different groups (e.g. MIT Media Lab, or Sloan Business School). They perform two binary classification tasks,

to predict whether a given node is a business school student or not;
to predict whether a given node is a graduate student in the Media Lab or not.

For me it is not so clear that they are actually role classification tasks.

TBC