A PyTorch Implementation of "SINE: Scalable Incomplete Network Embedding" (ICDM 2018).

Overview

Scalable Incomplete Network Embedding




Abstract

Attributed network embedding aims to learn low-dimensional vector representations for nodes in a network, where each node contains rich attributes/features describing node content. Because network topology structure and node attributes often exhibit high correlation, incorporating node attribute proximity into network embedding is beneficial for learning good vector representations. In reality, large-scale networks often have incomplete/missing node content or linkages, yet existing attributed network embedding algorithms all operate under the assumption that networks are complete. Thus, their performance is vulnerable to missing data and suffers from poor scalability. In this paper, we propose a Scalable Incomplete Network Embedding (SINE) algorithm for learning node representations from incomplete graphs. SINE formulates a probabilistic learning framework that separately models pairs of node-context and node-attribute relationships. Different from existing attributed network embedding algorithms, SINE provides greater flexibility to make the best of useful information and mitigate negative effects of missing information on representation learning. A stochastic gradient descent based online algorithm is derived to learn node representations, allowing SINE to scale up to large-scale networks with high learning efficiency. We evaluate the effectiveness and efficiency of SINE through extensive experiments on real-world networks. Experimental results confirm that SINE outperforms state-of-the-art baselines in various tasks, including node classification, node clustering, and link prediction, under settings with missing links and node attributes. SINE is also shown to be scalable and efficient on large-scale networks with millions of nodes/edges and high-dimensional node features.

This repository provides an implementation of SINE as described in the paper:

SINE: Scalable Incomplete Network Embedding. Daokun Zhang, Jie Yin, Xingquan Zhu, Chengqi Zhang. ICDM, 2018. [Paper]

The SINE model is also available in the [Karate Club] framework.

The original C implementation is available [here].

Requirements

The codebase is implemented in Python 3.5.2. The package versions used for development are listed below.

networkx          2.4
tqdm              4.28.1
numpy             1.15.4
pandas            0.23.4
texttable         1.5.0
scipy             1.1.0
argparse          1.1.0
torch             1.1.0
torchvision       0.3.0

Datasets

The code takes an input graph in a CSV file. Every row indicates an edge between two nodes, separated by a comma. The first row is a header. Nodes should be indexed starting with 0. Sample graphs for the `Twitch Brasilians` and `Wikipedia Chameleons` are included in the `input/` directory.
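The edge list format described above can be parsed with the standard library alone. The following is a minimal sketch, not the repository's own loader: the function names `read_edge_list` and `implied_node_count` and the header names `id_1,id_2` are hypothetical, and the sample is an in-memory string rather than one of the bundled files.

```python
import csv
import io


def read_edge_list(handle):
    """Parse a header-first, comma-separated edge list into (int, int) pairs."""
    reader = csv.reader(handle)
    next(reader)  # the first row is a header, so skip it
    return [(int(row[0]), int(row[1])) for row in reader if row]


def implied_node_count(edges):
    """Return max ID + 1, rejecting negative IDs (nodes are indexed from 0)."""
    nodes = {node for edge in edges for node in edge}
    if min(nodes) < 0:
        raise ValueError("node IDs must be non-negative integers")
    return max(nodes) + 1


# Tiny in-memory example; the header names are an assumption for illustration.
sample = io.StringIO("id_1,id_2\n0,1\n0,2\n1,2\n2,3\n")
edges = read_edge_list(sample)
node_count = implied_node_count(edges)
print(len(edges), node_count)
```

To read a file on disk instead, pass an open file handle, for example `open("input/chameleon_edges.csv", newline="")`.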

The feature matrix is a **sparse binary** one. For simplicity, it is stored as a JSON file. Nodes are the keys of the JSON and features are the values: for each node, the IDs of its nonzero feature columns are stored as elements of a list. The feature matrix is structured as:

{ 0: [0, 1, 38, 1968, 2000, 52727],
  1: [10000, 20, 3],
  2: [],
  ...
  n: [2018, 10000]}
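As a rough illustration of how this structure maps to a sparse binary matrix, the sketch below turns the JSON into (row, column) coordinates of the nonzero entries. It assumes nothing about the repository's own reader; `load_features` and `to_coordinates` are hypothetical names. Note that in a valid JSON file the keys are quoted strings, so they are converted back to integers here.

```python
import json


def load_features(text):
    """Parse the node -> feature-column-ID mapping; JSON keys arrive as strings."""
    raw = json.loads(text)
    return {int(node): sorted(set(cols)) for node, cols in raw.items()}


def to_coordinates(features):
    """Return the (row, column) pairs of nonzero entries and the matrix shape."""
    pairs = [(node, col) for node, cols in sorted(features.items()) for col in cols]
    n_rows = max(features) + 1
    # Column count is inferred from the largest column ID that occurs.
    n_cols = max((col for _, col in pairs), default=-1) + 1
    return pairs, (n_rows, n_cols)


# Small in-memory sample in the format shown above (keys quoted, as JSON requires).
sample = '{"0": [0, 1, 38], "1": [20, 3], "2": []}'
features = load_features(sample)
pairs, shape = to_coordinates(features)
print(pairs, shape)
```

The coordinate pairs can be handed to any sparse matrix constructor that accepts row and column indices, with every stored value equal to 1.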

Options

Learning of the embedding is handled by the `src/main.py` script which provides the following command line arguments.

Input and output options

  --edge-path    STR     Input graph path.           Default is `input/chameleon_edges.csv`.
  --feature-path STR     Input Features path.        Default is `input/chameleon_features.json`.
  --output-path  STR     Embedding path.             Default is `output/chameleon_sine.csv`.

Model options

  --dimensions              INT       Number of embedding dimensions.        Default is 128.
  --budget                  INT       Sampling budget.                       Default is 10^5.
  --noise-samples           INT       Number of noise samples.               Default is 5.
  --batch-size              INT       Number of source nodes per batch.      Default is 32.
  --walk-length             INT       Truncated random walk length.          Default is 80.  
  --number-of-walks         INT       Number of walks per source node.       Default is 10.
  --window-size             INT       Skip-gram window size.                 Default is 5.
  --learning-rate           FLOAT     Learning rate value.                   Default is 0.001.
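The option tables above can be mirrored with `argparse`. This is only a sketch that reproduces the listed flags and defaults; it is not taken from `src/main.py`, and the function name `build_parser` is hypothetical. `argparse` exposes a flag such as `--edge-path` as the attribute `edge_path`.

```python
import argparse


def build_parser():
    """Build a parser whose flags and defaults follow the option tables."""
    parser = argparse.ArgumentParser(description="Illustrative SINE option parser.")
    # Input and output options.
    parser.add_argument("--edge-path", default="input/chameleon_edges.csv")
    parser.add_argument("--feature-path", default="input/chameleon_features.json")
    parser.add_argument("--output-path", default="output/chameleon_sine.csv")
    # Model options.
    parser.add_argument("--dimensions", type=int, default=128)
    parser.add_argument("--budget", type=int, default=10**5)
    parser.add_argument("--noise-samples", type=int, default=5)
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--walk-length", type=int, default=80)
    parser.add_argument("--number-of-walks", type=int, default=10)
    parser.add_argument("--window-size", type=int, default=5)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    return parser


# Parse an empty argument list to get the defaults, then override two options.
defaults = build_parser().parse_args([])
custom = build_parser().parse_args(["--dimensions", "256", "--budget", "1000"])
print(defaults.dimensions, defaults.budget, custom.dimensions, custom.budget)
```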

Examples

The following commands learn a graph embedding and write the embedding to disk. The node representations are ordered by node ID.

Creating a SINE embedding of the default dataset with the default hyperparameter settings. Saving the embedding at the default path.

python src/main.py

Creating a SINE embedding of the default dataset with 256 dimensions.

python src/main.py --dimensions 256

Creating a SINE embedding of the default dataset with a low sampling budget.

python src/main.py --budget 1000

Creating an embedding of another densely structured dataset, the Twitch Brasilians. Saving the output at a custom path.

python src/main.py --edge-path input/ptbr_edges.csv --feature-path input/ptbr_features.json --output-path output/ptbr_sine.csv
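Once an embedding has been written, it can be read back for downstream use. The layout of the output file is not documented here, so the sketch below rests on a loud assumption: a CSV with a header row, the node ID in the first column, and one embedding dimension per remaining column. The names `read_embedding` and `cosine`, and the column names in the sample, are hypothetical.

```python
import csv
import io
import math


def read_embedding(handle):
    """Return {node_id: [floats]} from a CSV in the ASSUMED id-plus-dimensions layout."""
    reader = csv.reader(handle)
    next(reader)  # assumed header row
    return {int(float(row[0])): [float(x) for x in row[1:]] for row in reader if row}


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# In-memory stand-in for an output file; real files would have 128 columns by default.
sample = io.StringIO("id,x_0,x_1\n0,0.5,-1.0\n1,0.25,2.0\n")
embedding = read_embedding(sample)
similarity = cosine(embedding[0], embedding[1])
print(len(embedding), similarity)
```

If the actual file has no header or no ID column, the `next(reader)` call and the `row[0]` handling would need to change accordingly.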

License


Owner
Benedek Rozemberczki