Summary of Learning Graph Neural Networks
I started studying graph neural networks by following the Stanford CS224W lectures [playlist]. This post is a summary of, and my thoughts on, the material covered in the lectures.
This section covers several graph types based on the underlying data, the need for graph representations, and the material covered in the lectures. In a graph, entities (vertices) have relationships to each other. Depending on the entity and relation types, there are several graph representations.
Network or Graph?
Sometimes the distinction between networks and graphs is blurred.
Complex domains have a rich relational structure, which can be represented as a graph. The features of a graph are as follows:
Note that the goal with graphs is mostly representation learning, that is, obtaining a good representation of graph-structured data. As such, the input is a network, and the output consists of graph components such as node labels, new links, generated graphs, and subgraphs.
With the progress of deep learning, we can automatically extract informative features. A deep neural network maps nodes to $d$-dimensional embeddings such that similar nodes in the network are embedded close together.
Lecture 1.2 is about the basic components (nodes and edges) and their applications. Graph representations can be seen at several levels: node level, edge level, subgraph (community) level, and graph level.
With such representations, possible tasks include the following.
A protein is a sequence of amino acids. Can we predict its 3D structure based solely on its amino acid sequence?
The key idea of AlphaFold is to represent a protein as a spatial graph, with amino acids as nodes and edges between amino acids that are spatially close.
Unlike node-level tasks, edge-level tasks are about finding proper edges. For example, when users interact with items, the goal is to recommend items users might like (link prediction). The task is to learn node embeddings $z_i$ such that
\[d(z_{cake_1}, z_{cake_2}) < d(z_{cake_1}, z_{sweater})\]

Many patients take multiple drugs to treat complex or co-existing diseases: for example, 46% of people aged 70-79 in the US take more than 5 drugs, and many patients take more than 20 drugs to treat heart disease, depression, insomnia, etc.
👨‍⚕️ Task: Given a pair of drugs, predict adverse side effects.
Predicting such edges (side effects) can even help discover new side effects.
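As an illustration of the embedding-distance idea for edge-level tasks, here is a minimal sketch with made-up 2-dimensional embeddings (the vectors and node names are purely illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical 2-d embeddings for three item nodes (values are made up).
z = {
    "cake_1":  np.array([0.9, 0.1]),
    "cake_2":  np.array([0.8, 0.2]),
    "sweater": np.array([0.1, 0.9]),
}

def d(a, b):
    """Euclidean distance between two embeddings."""
    return np.linalg.norm(a - b)

# Edge-level task: recommend the item whose embedding is closest to the query.
print(d(z["cake_1"], z["cake_2"]))   # small: the two cakes are related
print(d(z["cake_1"], z["sweater"]))  # large: cake and sweater are unrelated
```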
Finally, lecture 1.3 deals with basic notions in graph representation such as nodes, edges, degree, and components. When we use graph representations, we must choose proper features in the data to define the nodes and edges. The choice of a proper network representation for a given domain/problem determines our ability to use networks successfully.
An undirected graph is a graph with no directed edges; for example, collaborations or friendships on Facebook.
Node degree $k_i$ and average degree: $\bar{k} = \langle k \rangle = \frac{1}{N} \sum_{i=1}^N k_i = \frac{2E}{N}$
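As a quick sanity check of $\bar{k} = 2E/N$, here is a small sketch that counts degrees from an arbitrary undirected edge list (the example graph is made up):

```python
from collections import Counter

# Undirected edge list (arbitrary example graph with N = 4 nodes, E = 4 edges).
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
N = 4

# Each undirected edge contributes 1 to the degree of both endpoints.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

avg_degree = sum(degree.values()) / N
print(avg_degree)            # 2.0
print(2 * len(edges) / N)    # 2.0, matching the formula 2E / N
```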
A directed graph is a graph with directed edges; for example, phone calls or following on Twitter.
$\bar{k}^{in} = \bar{k}^{out} = \frac{E}{N}$: the average in-degree and average out-degree are the same.
A bipartite graph is a graph whose nodes split into two disjoint sets $U$ and $V$, with every edge connecting a node in $U$ to a node in $V$. Folded (projected) bipartite graphs project onto the two sets respectively; for example, author collaboration networks.
Adjacency matrix $A$: $A_{ij}=1$ if there is a link from node $i$ to node $j$, and $A_{ij}=0$ otherwise.
Most real-world networks are sparse: $E \ll E_{max}$.
G = {(1,2), (2,3), (3,4), (3,2)}
G is given in edge-list form. Another representation is an adjacency list, where each node is mapped to the list of its neighbors, for example:
1: []
2: [3,4,5]
3: [3,4,6]
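A small sketch converting the edge list $G$ above (not the separate adjacency-list example) into adjacency-list and adjacency-matrix form, assuming the edges are directed since both $(2,3)$ and $(3,2)$ appear:

```python
import numpy as np

# Directed edge list G from the text (each pair (i, j) is an edge i -> j).
edges = [(1, 2), (2, 3), (3, 4), (3, 2)]
N = 4

# Adjacency list: node -> list of out-neighbors.
adj_list = {i: [] for i in range(1, N + 1)}
for i, j in edges:
    adj_list[i].append(j)
print(adj_list)   # {1: [2], 2: [3], 3: [4, 2], 4: []}

# Adjacency matrix: A[i, j] = 1 if there is a link from node i to node j.
A = np.zeros((N, N), dtype=int)
for i, j in edges:
    A[i - 1, j - 1] = 1
print(A)
```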
A strongly connected directed graph has a path from each node to every other node and vice versa (both an A-B path and a B-A path). A weakly connected directed graph is connected if we disregard the edge directions.
Strongly connected components (SCCs) can be identified, but not every node is part of a nontrivial strongly connected component.
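As an aside (not from the lecture), NetworkX can compute strongly and weakly connected components directly; the small directed graph below is made up:

```python
import networkx as nx

# Small made-up directed graph: nodes 1-3 form a directed cycle,
# and node 4 is reachable from the cycle but cannot reach back.
G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])

# SCCs: {1, 2, 3} and the trivial component {4}.
print(list(nx.strongly_connected_components(G)))

# Ignoring edge directions, the whole graph is one weakly connected component.
print(list(nx.weakly_connected_components(G)))
```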
Let $v$ be a node of a graph $\mathcal{G}$ with nodes $v = 1, 2, \cdots, N$. The eigenvector centrality $c_v$ of node $v$ is defined by $c_v = \frac{1}{\lambda} \sum_{w\in N(v)} c_w$. Since the centrality score $c_v$ is defined in terms of the other centrality scores $c_w$, which are in turn determined by yet other centrality scores, this yields a system of simultaneous equations for the centrality scores $c_v$ over all $v\in V$, where $V$ is the set of nodes.
In other words, each score can be written as $c_v = f(\{c_w : w \in N(v)\})$ for some function $f$. Since the neighborhood $N(v)$ determines the adjacency (row) vector $A_v$ of node $v$, the summation $\sum_{w\in N(v)}$ can be replaced by a product with an adjacency vector such as $[1,0,1,0,0,\cdots]$. We thus obtain the following system of equations for the centrality scores in terms of the adjacency matrix $A$ of graph $\mathcal{G}$:
\[\lambda \cdot \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix} = \begin{bmatrix} A_1 \\ A_2 \\ \vdots \\ A_N \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix} = A \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_N \end{bmatrix}\]

Solving this equation amounts to finding an eigenvector of $A$. Note that only a single eigenvector needs to be found, and each element of that vector is the centrality score of the corresponding node.
The eigenvector with the largest eigenvalue is the desired centrality measure. See [wiki]
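A minimal power-iteration sketch of this computation, with an arbitrary 4-node example graph (not from the lecture):

```python
import numpy as np

# Adjacency matrix of an arbitrary small undirected graph.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

# Power iteration: repeatedly apply A and renormalize; the vector converges
# to the eigenvector of the largest eigenvalue, i.e. the centrality scores c.
c = np.ones(A.shape[0])
for _ in range(100):
    c = A @ c
    c = c / np.linalg.norm(c)

print(np.round(c, 3))  # the second entry (the degree-3 node) is the most central
```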
Graphlet Degree Vector (GDV): a count vector of graphlets rooted at a given node; for example, $[2,1,0,0]$, where each index corresponds to one graphlet type (up to isomorphism).
Importance-based Features vs Structure-based Features
Goal : Efficient task-independent feature learning
Task: map nodes into an embedding space
Similarity of embeddings between nodes indicates their similarity in the network.
DeepWalk: Online Learning of Social Representations (KDD 2014).
Goal : similarity $(u, v) \approx z_v^\top z_u$
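A rough sketch of the random-walk side of DeepWalk under simplifying assumptions: the small graph and walk parameters below are made up, and the skip-gram training that actually produces the embeddings $z$ is omitted.

```python
import random

# Adjacency list of a small made-up graph.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}

def random_walk(start, length):
    """Generate one unbiased random walk; DeepWalk uses such walks to decide
    which node pairs should end up with similar embeddings."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(adj[walk[-1]]))
    return walk

# The walks play the role of 'sentences' fed to a skip-gram model (e.g. word2vec);
# nodes that co-occur in many walks get embeddings with a large dot product z_v^T z_u.
walks = [random_walk(node, 5) for node in adj for _ in range(10)]
print(walks[:3])
```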