Pooling layers

The following pooling layers are available in Spektral.

Additionally, sum, average, and max global pooling are implemented, as well as a simple global weighted sum pooling where weights are calculated with an attention mechanism.

See the convolutional layers page for the notation.


[source]

DiffPool

spektral.layers.DiffPool(k, channels=None, return_mask=False, activation=None, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None)

A DiffPool layer as presented by Ying et al. (2018).

Mode: batch.

This layer computes a soft clustering $\mathbf{S}$ of the input graphs using a GNN, and reduces graphs as follows:

$$
\mathbf{S} = \textrm{GNN}(\mathbf{A}, \mathbf{X}); \quad
\mathbf{A}' = \mathbf{S}^\top \mathbf{A} \mathbf{S}; \quad
\mathbf{X}' = \mathbf{S}^\top \mathbf{X}
$$

where GNN consists of one GraphConv layer with softmax activation. Two auxiliary loss terms are also added to the model: the link prediction loss $\big\| \mathbf{A} - \mathbf{S}\mathbf{S}^\top \big\|_F$ and the entropy loss $\frac{1}{N} \sum\limits_{i=1}^{N} H(\mathbf{S}_i)$, where $H(\mathbf{S}_i)$ is the entropy of the cluster assignment of node $i$.

The layer also applies a 1-layer GCN to the input features, and returns the updated graph signal (the number of output channels is controlled by the channels parameter). The layer can be used without a supervised loss, to compute node clustering simply by minimizing the two auxiliary losses.

Input

  • Node features of shape ([batch], N, F);
  • Binary adjacency matrix of shape ([batch], N, N);

Output

  • Reduced node features of shape ([batch], K, channels);
  • Reduced adjacency matrix of shape ([batch], K, K);
  • If return_mask=True, the soft clustering matrix of shape ([batch], N, K).

Arguments

  • k: number of nodes to keep;
  • channels: number of output channels (if None, the number of output channels is assumed to be the same as the input);
  • return_mask: boolean, whether to return the cluster assignment matrix;
  • kernel_initializer: initializer for the weights;
  • kernel_regularizer: regularization applied to the weights;
  • kernel_constraint: constraint applied to the weights;
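
As a usage illustration, the following is a minimal sketch of a batch-mode graph classification model built around DiffPool. The sizes, the surrounding layers, and the use of the raw adjacency as GraphConv's filter are placeholder choices, not part of this layer's API, and exact import paths may vary between Spektral versions.

```python
# Hypothetical sketch: DiffPool inside a batch-mode graph classification model.
# N, F, n_out and the surrounding layers are placeholder choices.
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from spektral.layers import GraphConv, DiffPool, GlobalSumPool

N, F, n_out = 50, 16, 3                      # nodes, input features, classes

X_in = Input(shape=(N, F))                   # node features ([batch], N, F)
A_in = Input(shape=(N, N))                   # binary adjacency ([batch], N, N)

X = GraphConv(32, activation='relu')([X_in, A_in])       # A may need preprocessing for GraphConv
X_pool, A_pool = DiffPool(k=10, channels=32)([X, A_in])  # reduce each graph to K=10 nodes
X = GraphConv(32, activation='relu')([X_pool, A_pool])
out = Dense(n_out, activation='softmax')(GlobalSumPool()(X))

model = Model(inputs=[X_in, A_in], outputs=out)
model.compile('adam', 'categorical_crossentropy')        # auxiliary losses are added by the layer
```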

[source]

MinCutPool

spektral.layers.MinCutPool(k, mlp_hidden=None, mlp_activation='relu', return_mask=False, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, kernel_constraint=None, bias_constraint=None)

A minCUT pooling layer as presented by Bianchi et al. (2019).

Mode: batch.

This layer computes a soft clustering $\mathbf{S}$ of the input graphs using an MLP, and reduces graphs as follows:

$$
\mathbf{S} = \textrm{MLP}(\mathbf{X}); \quad
\mathbf{A}' = \mathbf{S}^\top \mathbf{A} \mathbf{S}; \quad
\mathbf{X}' = \mathbf{S}^\top \mathbf{X}
$$

where MLP is a multi-layer perceptron with softmax output. Two auxiliary loss terms are also added to the model: the minCUT loss

$$
L_c = - \frac{ \mathrm{Tr}(\mathbf{S}^\top \mathbf{A} \mathbf{S}) }{ \mathrm{Tr}(\mathbf{S}^\top \mathbf{D} \mathbf{S}) }
$$

and the orthogonality loss

$$
L_o = \left\| \frac{\mathbf{S}^\top \mathbf{S}}{\| \mathbf{S}^\top \mathbf{S} \|_F} - \frac{\mathbf{I}_K}{\sqrt{K}} \right\|_F
$$

where $\mathbf{D}$ is the degree matrix of $\mathbf{A}$ and $K$ is the number of clusters.

The layer can be used without a supervised loss, to compute node clustering simply by minimizing the two auxiliary losses.

Input

  • Node features of shape ([batch], N, F);
  • Binary adjacency matrix of shape ([batch], N, N);

Output

  • Reduced node features of shape ([batch], K, F);
  • Reduced adjacency matrix of shape ([batch], K, K);
  • If return_mask=True, the soft clustering matrix of shape ([batch], N, K).

Arguments

  • k: number of nodes to keep;
  • mlp_hidden: list of integers, number of hidden units for each hidden layer in the MLP used to compute cluster assignments (if None, the MLP has only the output layer);
  • mlp_activation: activation for the MLP layers;
  • return_mask: boolean, whether to return the cluster assignment matrix;
  • kernel_initializer: initializer for the weights;
  • kernel_regularizer: regularization applied to the weights;
  • kernel_constraint: constraint applied to the weights;
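
Since the layer works without a supervised loss, a rough sketch of unsupervised node clustering with MinCutPool could look as follows. The toy data and all sizes are placeholders, and whether fit accepts inputs without targets in this way depends on the Keras version.

```python
# Hypothetical sketch: unsupervised clustering by minimizing only the auxiliary losses.
import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import MinCutPool

N, F, K = 100, 8, 5
X = np.random.rand(1, N, F).astype('float32')                # node features (1, N, F)
A = (np.random.rand(1, N, N) > 0.9)
A = np.maximum(A, A.transpose((0, 2, 1))).astype('float32')  # toy symmetric adjacency

X_in = Input(shape=(N, F))
A_in = Input(shape=(N, N))
_, _, S = MinCutPool(k=K, return_mask=True)([X_in, A_in])    # S: soft cluster assignments

model = Model([X_in, A_in], S)
model.compile('adam', loss=None)                 # train on the minCUT + orthogonality losses only
model.fit([X, A], epochs=50, batch_size=1, verbose=0)
clusters = model.predict([X, A])[0].argmax(-1)   # hard cluster index per node
```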

[source]

TopKPool

spektral.layers.TopKPool(ratio, return_mask=False, sigmoid_gating=False, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None)

A gPool/Top-K layer as presented by Gao & Ji (2019) and Cangea et al. (2018).

Mode: single, disjoint.

This layer computes the following operations:

$$
\mathbf{y} = \frac{\mathbf{X}\mathbf{p}}{\|\mathbf{p}\|}; \quad
\mathbf{i} = \textrm{rank}(\mathbf{y}, K); \quad
\mathbf{X}' = (\mathbf{X} \odot \textrm{tanh}(\mathbf{y}))_{\mathbf{i}}; \quad
\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{i}}
$$

where $\textrm{rank}(\mathbf{y}, K)$ returns the indices of the top $K$ values of $\mathbf{y}$, and $\mathbf{p}$ is a learnable parameter vector of size $F$. $K$ is defined for each graph as a fraction of the number of nodes, controlled by the ratio argument. Note that the gating operation $\textrm{tanh}(\mathbf{y})$ (Cangea et al.) can be replaced with a sigmoid (Gao & Ji).

This layer temporarily makes the adjacency matrix dense in order to compute $\mathbf{A}'$. If memory is not an issue, considerable speedups can be achieved by using dense graphs directly. Converting a graph from sparse to dense and back to sparse is an expensive operation.
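
For intuition, here is a rough NumPy sketch of the selection and gating described above for a single dense graph (p is a random stand-in for the learnable projection vector):

```python
# NumPy sketch of the TopKPool equations for one dense graph (illustrative only).
import numpy as np

N, F, ratio = 6, 4, 0.5
X = np.random.rand(N, F)                       # node features
A = (np.random.rand(N, N) > 0.5).astype(float) # toy adjacency
p = np.random.rand(F)                          # stand-in for the learnable vector p

y = X @ p / np.linalg.norm(p)                  # one score per node
K = int(np.ceil(ratio * N))                    # nodes kept, as a fraction of N
idx = np.argsort(-y)[:K]                       # rank(y, K): indices of the top K scores
X_pool = X[idx] * np.tanh(y[idx, None])        # gated node features
A_pool = A[np.ix_(idx, idx)]                   # adjacency restricted to the kept nodes
```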

Input

  • Node features of shape (N, F);
  • Binary adjacency matrix of shape (N, N);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Reduced node features of shape (ratio * N, F);
  • Reduced adjacency matrix of shape (ratio * N, ratio * N);
  • Reduced graph IDs of shape (ratio * N, ) (only in disjoint mode);
  • If return_mask=True, the binary pooling mask of shape (ratio * N, ).

Arguments

  • ratio: float between 0 and 1, ratio of nodes to keep in each graph;
  • return_mask: boolean, whether to return the binary mask used for pooling;
  • sigmoid_gating: boolean, use a sigmoid gating activation instead of a tanh;
  • kernel_initializer: initializer for the weights;
  • kernel_regularizer: regularization applied to the weights;
  • kernel_constraint: constraint applied to the weights;

[source]

SAGPool

spektral.layers.SAGPool(ratio, return_mask=False, sigmoid_gating=False, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None)

A self-attention graph pooling layer as presented by Lee et al. (2019).

Mode: single, disjoint.

This layer computes the following operations:

$$
\mathbf{y} = \textrm{GNN}(\mathbf{A}, \mathbf{X}); \quad
\mathbf{i} = \textrm{rank}(\mathbf{y}, K); \quad
\mathbf{X}' = (\mathbf{X} \odot \textrm{tanh}(\mathbf{y}))_{\mathbf{i}}; \quad
\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{i}}
$$

where $\textrm{rank}(\mathbf{y}, K)$ returns the indices of the top $K$ values of $\mathbf{y}$, and GNN consists of one GraphConv layer with no activation. $K$ is defined for each graph as a fraction of the number of nodes, controlled by the ratio argument.

This layer temporarily makes the adjacency matrix dense in order to compute $\mathbf{A}'$. If memory is not an issue, considerable speedups can be achieved by using dense graphs directly. Converting a graph from sparse to dense and back to sparse is an expensive operation.

Input

  • Node features of shape (N, F);
  • Binary adjacency matrix of shape (N, N);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Reduced node features of shape (ratio * N, F);
  • Reduced adjacency matrix of shape (ratio * N, ratio * N);
  • Reduced graph IDs of shape (ratio * N, ) (only in disjoint mode);
  • If return_mask=True, the binary pooling mask of shape (ratio * N, ).

Arguments

  • ratio: float between 0 and 1, ratio of nodes to keep in each graph;
  • return_mask: boolean, whether to return the binary mask used for pooling;
  • sigmoid_gating: boolean, use a sigmoid gating activation instead of a tanh;
  • kernel_initializer: initializer for the weights;
  • kernel_regularizer: regularization applied to the weights;
  • kernel_constraint: constraint applied to the weights;
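
As a usage illustration, a minimal disjoint-mode sketch of SAGPool could look as follows. The shapes are placeholders, and the sparse-input plumbing and data loading follow the usual Spektral disjoint-mode examples, which may differ between versions.

```python
# Hypothetical sketch: SAGPool in disjoint mode with sparse adjacency and graph IDs.
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import GraphConv, SAGPool

F = 16
X_in = Input(shape=(F,))                    # node features (N, F)
A_in = Input(shape=(None,), sparse=True)    # adjacency (N, N), as a SparseTensor
I_in = Input(shape=(), dtype='int32')       # graph IDs (N,)

X = GraphConv(32, activation='relu')([X_in, A_in])
X_pool, A_pool, I_pool = SAGPool(ratio=0.5)([X, A_in, I_in])

model = Model([X_in, A_in, I_in], [X_pool, A_pool, I_pool])
```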

[source]

GlobalSumPool

spektral.layers.GlobalSumPool()

A global sum pooling layer. Pools a graph by computing the sum of its node features.

Mode: single, disjoint, mixed, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, F) (if single mode, shape will be (1, F)).

Arguments

None.
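
In disjoint mode, the pooling amounts to a segment sum over the graph IDs; a rough equivalent with plain TensorFlow and made-up data:

```python
# Rough equivalent of disjoint-mode global sum pooling with plain TensorFlow.
import tensorflow as tf

X = tf.random.uniform((7, 4))            # 7 nodes with 4 features, from 2 graphs
I = tf.constant([0, 0, 0, 1, 1, 1, 1])   # graph ID of each node
pooled = tf.math.segment_sum(X, I)       # shape (2, 4): one row per graph
```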


[source]

GlobalAvgPool

spektral.layers.GlobalAvgPool()

An average pooling layer. Pools a graph by computing the average of its node features.

Mode: single, disjoint, mixed, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, F) (if single mode, shape will be (1, F)).

Arguments

None.


[source]

GlobalMaxPool

spektral.layers.GlobalMaxPool()

A max pooling layer. Pools a graph by computing the maximum of its node features.

Mode: single, disjoint, mixed, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, F) (if single mode, shape will be (1, F)).

Arguments

None.


[source]

GlobalAttentionPool

spektral.layers.GlobalAttentionPool(channels, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, kernel_constraint=None, bias_constraint=None)

A gated attention global pooling layer as presented by Li et al. (2017).

This layer computes:

$$
\mathbf{X}' = \sum\limits_{i=1}^{N} \big( \sigma(\mathbf{X} \mathbf{W}_1 + \mathbf{b}_1) \odot (\mathbf{X} \mathbf{W}_2 + \mathbf{b}_2) \big)_i
$$

where $\sigma$ is the sigmoid activation function.

Mode: single, disjoint, mixed, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, channels) (if single mode, shape will be (1, channels)).

Arguments

  • channels: integer, number of output channels;
  • kernel_initializer: initializer for the kernel matrices;
  • bias_initializer: initializer for the bias vectors;
  • kernel_regularizer: regularization applied to the kernel matrices;
  • bias_regularizer: regularization applied to the bias vectors;
  • kernel_constraint: constraint applied to the kernel matrices;
  • bias_constraint: constraint applied to the bias vectors.
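
For intuition, a NumPy sketch of the gated attention sum above (W1, b1, W2, b2 are random stand-ins for the layer's trainable weights):

```python
# NumPy sketch of gated attention pooling for a single graph (illustrative only).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

N, F, channels = 5, 8, 16
X = np.random.rand(N, F)
W1, b1 = np.random.rand(F, channels), np.zeros(channels)
W2, b2 = np.random.rand(F, channels), np.zeros(channels)

gate = sigmoid(X @ W1 + b1)              # per-node, per-channel gate in (0, 1)
values = X @ W2 + b2                     # per-node linear transform
X_pool = (gate * values).sum(axis=0)     # sum over nodes -> shape (channels,)
```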

[source]

GlobalAttnSumPool

spektral.layers.GlobalAttnSumPool(attn_kernel_initializer='glorot_uniform', attn_kernel_regularizer=None, attn_kernel_constraint=None)

A node-attention global pooling layer. Pools a graph by learning attention coefficients to sum node features.

This layer computes:

$$
\alpha = \textrm{softmax}(\mathbf{X}\mathbf{a}); \quad
\mathbf{X}' = \sum\limits_{i=1}^{N} \alpha_i \cdot \mathbf{X}_i
$$

where $\mathbf{a} \in \mathbb{R}^{F}$ is a trainable vector. Note that the softmax is applied across nodes, and not across features.

Mode: single, disjoint, mixed, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, F) (if single mode, shape will be (1, F)).

Arguments

  • attn_kernel_initializer: initializer for the attention weights;
  • attn_kernel_regularizer: regularization applied to the attention kernel matrix;
  • attn_kernel_constraint: constraint applied to the attention kernel matrix;
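
For intuition, a NumPy sketch of the attention-weighted sum above (a is a random stand-in for the trainable attention vector):

```python
# NumPy sketch of attention-weighted sum pooling for a single graph.
import numpy as np

N, F = 5, 8
X = np.random.rand(N, F)
a = np.random.rand(F)                          # stand-in for the trainable vector

scores = X @ a                                 # one score per node
alpha = np.exp(scores) / np.exp(scores).sum()  # softmax across nodes
X_pool = alpha @ X                             # shape (F,)
```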

[source]

SortPool

spektral.layers.SortPool(k)

A SortPool layer as described by Zhang et al. (2018). This layer takes a graph signal $\mathbf{X}$ and returns the topmost $k$ rows, sorted according to the values of the last column. If $\mathbf{X}$ has fewer than $k$ rows, the result is zero-padded to $k$.

Mode: single, disjoint, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, k, F) (if single mode, shape will be (1, k, F)).

Arguments

  • k: integer, number of nodes to keep;
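
For intuition, a NumPy sketch of the sort-and-truncate behaviour for a single graph (the function and variable names are made up for the example):

```python
# NumPy sketch of SortPool for one graph: sort by the last feature column,
# keep the top k rows, and zero-pad if the graph has fewer than k nodes.
import numpy as np

def sort_pool(X, k):
    order = np.argsort(-X[:, -1])            # sort rows by last column, descending
    X_sorted = X[order][:k]
    if X_sorted.shape[0] < k:                # zero-pad small graphs up to k rows
        pad = np.zeros((k - X_sorted.shape[0], X.shape[1]))
        X_sorted = np.vstack([X_sorted, pad])
    return X_sorted

X = np.random.rand(6, 4)
print(sort_pool(X, k=8).shape)               # (8, 4)
```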