Graph Neural Network

A summary of my study of graph neural networks

Graph Study

I started studying graph neural networks by following the Stanford CS224W lectures [playlist]. This post is a summary of, and my thoughts on, the materials covered in the lectures.


🪴 Lecture 1.1 [Youtube]

This section explains several types of graphs based on data types, the necessity of graph representations, and the materials covered in the lectures. In a graph, entities (nodes/vertices) are connected by relationships (edges). Depending on the entity and relation types, several graph representations are possible.

Examples of Graphs based on Data Types

Network or Graph?
Sometimes the distinction between networks and graphs is blurred.

Difference between Graph and other representations

Complex domains have a rich relational structure, which can be represented as a graph. The features of graphs are as follows:

What can we do with graphs?

Note that the goal is mostly representation learning, that is, obtaining a good representation of graph-structured data. The input is a network, and the outputs are graph components such as node labels, new links, and generated graphs or subgraphs. With the progress of deep learning, we can automatically extract informative features: a deep neural network maps nodes to d-dimensional embeddings such that similar nodes in the network are embedded close together.


🪴 Lecture 1.2 [Youtube]

Lecture 1.2 is about the basic components (nodes and edges) and their applications. The levels of graph representation can be seen as

With such representations, possible tasks are

Node Level Example : Protein 3D structure

A protein is a sequence of amino acids. Can we predict its 3D structure based solely on the amino acid sequence? The key idea of AlphaFold is a spatial graph, where nodes are amino acids and edges represent spatial proximity between amino acids.

Figure. AlphaFold inference of the 3D structure given a sequence of amino acids.
Image from Wikipedia, AlphaFold.

Edge Level

Unlike node-level tasks, edge-level tasks aim to find missing or likely edges. For example, when users interact with items, the goal is to recommend items users might like (link prediction). The task is to learn node embeddings $z_i$ such that

\[d(z_{\text{cake}_1}, z_{\text{cake}_2} ) < d(z_{\text{cake}_1}, z_{\text{sweater}} )\]
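As a toy illustration (hypothetical 2-D embeddings, not from the lecture), we can check such a distance relation directly:

```python
import numpy as np

# Hypothetical 2-D embeddings, for illustration only
z_cake1   = np.array([1.0, 0.9])
z_cake2   = np.array([1.1, 1.0])
z_sweater = np.array([-0.8, 0.2])

def d(a, b):
    """Euclidean distance between two embeddings."""
    return np.linalg.norm(a - b)

# Similar items (the two cakes) should be closer than dissimilar ones
print(d(z_cake1, z_cake2) < d(z_cake1, z_sweater))  # True
```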

Example: Drug Side Effects

Many patients take multiple drugs to treat complex or co-existing diseases. For example, in the US, 46% of people aged 70-79 take more than 5 drugs, and many patients take more than 20 drugs to treat heart disease, depression, insomnia, etc.

👨‍⚕️ Task: Given a pair of drugs, predict adverse side effects.

Predicted edges (side effects) can even help discover new side effects.


🪴 Lecture 1.3 [Youtube]

Finally, lecture 1.3 deals with basic notions in graph representations such as nodes, edges, degree, and components. When we use graph representations, we must choose appropriate features in the data to define nodes and edges. The choice of the proper network representation of a given domain/problem determines our ability to use networks successfully.

Undirected vs Directed

Undirected

A graph with undirected edges. Examples: collaborations, friendships on Facebook.
Node degree $k_i$ and average degree: $\bar{k} = \langle k \rangle = \frac{1}{N} \sum_{i=1}^N k_i = \frac{2E}{N}$
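A minimal sketch (plain Python, made-up edge list) of computing node degrees and the average degree $\frac{2E}{N}$ for an undirected graph:

```python
from collections import defaultdict

# Hypothetical undirected edge list, for illustration
edges = [(1, 2), (2, 3), (3, 4), (2, 4)]
nodes = {u for e in edges for u in e}

degree = defaultdict(int)
for u, v in edges:
    degree[u] += 1   # each undirected edge contributes to both endpoints
    degree[v] += 1

avg_degree = sum(degree.values()) / len(nodes)   # equals 2E / N
print(dict(degree), avg_degree)                  # 2 * 4 edges / 4 nodes = 2.0
```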

Directed

A graph with directed edges. Examples: phone calls, following on Twitter.
$\bar{k}^{in} = \bar{k}^{out} = \frac{E}{N}$ : the average in-degree and average out-degree are the same.
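Similarly, for a directed graph the average in-degree and average out-degree both equal $\frac{E}{N}$, which a quick check on a made-up edge list confirms:

```python
from collections import Counter

# Hypothetical directed edge list, for illustration
edges = [(1, 2), (2, 3), (3, 1), (3, 2)]
nodes = {u for e in edges for u in e}

out_deg = Counter(u for u, v in edges)
in_deg  = Counter(v for u, v in edges)

N, E = len(nodes), len(edges)
print(sum(out_deg.values()) / N, sum(in_deg.values()) / N, E / N)  # all equal
```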

Bipartite Graph

A graph whose nodes can be divided into two disjoint sets $U$ and $V$ such that every edge connects a node in $U$ to a node in $V$.

Folded Networks:

Project the bipartite graph onto each of the two node sets; two nodes in the same set are connected if they share a neighbor in the other set. For example, author collaboration networks folded from an author-paper graph.
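A sketch of folding (projecting) a bipartite graph with networkx, assuming a small made-up author-paper graph; `bipartite.projected_graph` keeps one node set and connects nodes that share a neighbor in the other set:

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical bipartite graph: authors (set U) connected to papers (set V)
B = nx.Graph()
authors = ["alice", "bob", "carol"]
papers = ["p1", "p2"]
B.add_nodes_from(authors, bipartite=0)
B.add_nodes_from(papers, bipartite=1)
B.add_edges_from([("alice", "p1"), ("bob", "p1"), ("bob", "p2"), ("carol", "p2")])

# Folded network over authors: two authors are linked if they co-authored a paper
coauthors = bipartite.projected_graph(B, authors)
print(coauthors.edges())  # [('alice', 'bob'), ('bob', 'carol')]
```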

Adjacency Matrix

$A_{ij}=1$ if there is a link from node $i$ to node $j$ and $A_{ij}=0$ otherwise.

Represent graph as a list of edges

Most real-world networks are sparse: $E \ll E_{max}$, where $E_{max} = \frac{N(N-1)}{2}$.

G = {(1,2), (2,3), (3,4), (3,2)}
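For the edge list above, a dense adjacency matrix can be built directly (a numpy sketch; real networks would normally use a sparse format precisely because $E \ll E_{max}$):

```python
import numpy as np

edges = [(1, 2), (2, 3), (3, 4), (3, 2)]   # edge list from the example above
N = 4

A = np.zeros((N, N), dtype=int)
for i, j in edges:
    A[i - 1, j - 1] = 1        # A_ij = 1 if there is a link from node i to node j

print(A)
```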

Adjacency list

G is stored as a mapping from each node to its list of neighbors, for example:
1: []
2: [3,4,5]
3: [3,4,6]
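As a minimal sketch, the edge-list example from above (not the listing just shown) can be converted into an adjacency list with a plain dictionary:

```python
from collections import defaultdict

edges = [(1, 2), (2, 3), (3, 4), (3, 2)]

adj = defaultdict(list)
for i, j in edges:
    adj[i].append(j)           # directed: store outgoing neighbors only

print(dict(adj))  # {1: [2], 2: [3], 3: [4, 2]}
```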

Possible Options

Strongly Connected Directed

A strongly connected directed graph has a path from each node to every other node and vice versa (e.g., both an A→B path and a B→A path exist). A weakly connected directed graph is connected if we disregard the edge directions.

Strongly Connected Components (SCC)

Strongly connected components (SCCs) can be identified, but not every node is part of a nontrivial strongly connected component.
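A minimal networkx sketch (made-up directed graph) of extracting strongly and weakly connected components:

```python
import networkx as nx

# Hypothetical directed graph: nodes 1-2-3 form a cycle, node 4 only receives an edge
G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])

print(list(nx.strongly_connected_components(G)))   # e.g. [{1, 2, 3}, {4}]
print(list(nx.weakly_connected_components(G)))     # [{1, 2, 3, 4}]
print(nx.is_strongly_connected(G))                 # False (no path back from 4)
```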


🪴 Lecture 2


Eigenvector Centrality

Let $v$ be a node of a graph $\mathcal{G}$ whose nodes are indexed $v = 1, 2, \cdots, N$. The eigenvector centrality $c_v$ of node $v$ is defined by $c_v = \frac{1}{\lambda} \sum_{w\in N(v)} c_w$. Since the centrality score $c_v$ is defined in terms of the other centrality scores $c_w$, which are in turn determined by yet other centrality scores, this definition yields a system of simultaneous equations for the scores $c_v$ over all $v\in V$, where $V$ is the set of nodes. In other words, it has the following form for some function $f$:

\[\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix} = f \Big(\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix} \Big)\]

As the neighborhood $N(v)$ determines the adjacency (row) vector $A_v$ of node $v$, the sum $\sum_{w\in N(v)} c_w$ can be written as a dot product with an adjacency vector such as $[1,0,1,0,0,\cdots]$. We thus obtain the following system of equations involving the adjacency matrix $A$ of graph $\mathcal{G}$:

\[\lambda \cdot \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix} = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_N \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix} = A \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix}\]

Solving this equation is equivalent to finding an eigenvector of $A$. Note that we only need to find a single eigenvector, and each element of that vector is the centrality score of the corresponding node.

The eigenvector associated with the largest eigenvalue is the desired centrality measure (by the Perron-Frobenius theorem, its entries can be taken to be all positive). See [wiki].
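A minimal power-iteration sketch (numpy, with a made-up adjacency matrix) that finds the leading eigenvector and hence the centrality scores:

```python
import numpy as np

# Hypothetical undirected graph with edges 0-1, 1-2, 2-0, 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

c = np.ones(A.shape[0])                  # initial guess for the centrality scores
for _ in range(100):                     # power iteration: c <- A c / ||A c||
    c = A @ c
    c /= np.linalg.norm(c)

print(c)                                 # leading eigenvector = centrality scores

# Cross-check against the eigenvector of the largest eigenvalue
vals, vecs = np.linalg.eigh(A)           # eigh: symmetric case, eigenvalues ascending
print(np.abs(vecs[:, -1]))               # should match the power-iteration result
```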


Graphlet Degree Vector (GDV): a vector that counts, for a given node, the graphlets rooted at that node. For example, [2,1,0,0], where each index corresponds to a different rooted graphlet (an isomorphism class).

Importance-based Features vs Structure-based Features

Node Representation

Goal : Efficient task-independent feature learning

Task: map nodes into an embedding space

Goal : $\operatorname{similarity}(u, v) \approx z_v^\top z_u$, i.e., similarity in the original network corresponds to the dot product of the embeddings.

  1. Encoder : maps nodes to embeddings, $ENC(v) = z_v$ (just an embedding lookup; this can be very large, since the number of parameters grows with the number of nodes).
  2. Decoder : maps embeddings to a similarity score, $\operatorname{similarity}(u,v) \approx z_v^\top z_u$ (a minimal sketch of both follows below).
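A minimal PyTorch sketch of this shallow encoder-decoder: the encoder is just an embedding lookup, and the decoder is a dot product (the graph size and embedding dimension here are hypothetical, not the lecture's):

```python
import torch
import torch.nn as nn

num_nodes, d = 1000, 64                    # hypothetical graph size and embedding dim

# Encoder: ENC(v) = z_v, a plain embedding lookup (one d-dim row per node,
# so the parameter count grows linearly with the number of nodes)
encoder = nn.Embedding(num_nodes, d)

def decoder(z_u, z_v):
    """Decoder: similarity(u, v) ~ z_v^T z_u (dot product)."""
    return (z_u * z_v).sum(dim=-1)

u, v = torch.tensor([3]), torch.tensor([17])
score = decoder(encoder(u), encoder(v))    # similarity score for the node pair (u, v)
print(score)
```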

Random Walk Embedding

The objective is to learn embeddings such that nodes co-occurring on short random walks have high dot-product similarity: maximize the log probability of visiting $v$ on a random walk starting from $u$, and approximate the expensive softmax normalization over all nodes with negative sampling over $k$ random nodes $n_i$:

\[\log \Big(\frac{\exp (z_u^\top z_v)}{\sum_{n\in V} \exp (z_u^\top z_n)}\Big) \approx \log\Big( \sigma(z_u^\top z_v) \Big) - \sum_{i=1}^k \log \Big( \sigma(z_u^\top z_{n_i}) \Big)\]
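A numpy sketch of this objective for one (u, v) pair: the exact softmax term and its negative-sampling approximation with k random negatives (all embeddings here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(10, 8))               # hypothetical embeddings z_n for 10 nodes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

u, v = 0, 3

# Exact term: log softmax of z_u^T z_v over all nodes n in V
scores = Z @ Z[u]
log_softmax = scores[v] - np.log(np.exp(scores).sum())

# Negative-sampling approximation with k sampled negative nodes n_i
# (sign convention follows the formula above)
k = 5
negatives = rng.integers(0, Z.shape[0], size=k)
approx = np.log(sigmoid(Z[u] @ Z[v])) - np.sum(np.log(sigmoid(Z[u] @ Z[negatives].T)))

print(log_softmax, approx)
```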