Pooling layers

The following pooling layers are available in Spektral.

Additionally, sum, average, and max global pooling are implemented, as well as a simple global weighted sum pooling where weights are calculated with an attention mechanism.

See the convolutional layers page for the notation.


[source]

DiffPool

spektral.layers.DiffPool(k, channels=None, return_mask=False, activation=None, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None)

A DiffPool layer as presented by Ying et al. (2018).

Mode: batch.

This layer computes a soft clustering $\mathbf{S}$ of the input graphs using a GNN, and reduces graphs as follows:

$$
\mathbf{S} = \textrm{GNN}(\mathbf{A}, \mathbf{X}); \quad
\mathbf{A}' = \mathbf{S}^\top \mathbf{A} \mathbf{S}; \quad
\mathbf{X}' = \mathbf{S}^\top \mathbf{X}
$$

where GNN consists of one GraphConv layer with softmax activation. Two auxiliary loss terms are also added to the model: the link prediction loss $\big\| \mathbf{A} - \mathbf{S}\mathbf{S}^\top \big\|_F$ and the entropy loss $\frac{1}{N} \sum\limits_{i=1}^{N} H(\mathbf{S}_i)$, where $H(\mathbf{S}_i)$ is the entropy of the cluster assignment of node $i$.

The layer also applies a 1-layer GCN to the input features, and returns the updated graph signal (the number of output channels is controlled by the channels parameter). The layer can be used without a supervised loss, to compute node clustering simply by minimizing the two auxiliary losses.

Input

  • Node features of shape ([batch], N, F);
  • Binary adjacency matrix of shape ([batch], N, N);

Output

  • Reduced node features of shape ([batch], K, channels);
  • Reduced adjacency matrix of shape ([batch], K, K);
  • If return_mask=True, the soft clustering matrix of shape ([batch], N, K).

Arguments

  • k: number of nodes to keep;
  • channels: number of output channels (if None, the number of output channels is assumed to be the same as the input);
  • return_mask: boolean, whether to return the cluster assignment matrix;
  • kernel_initializer: initializer for the weights;
  • kernel_regularizer: regularization applied to the weights;
  • kernel_constraint: constraint applied to the weights;
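
As a usage illustration, the following is a minimal sketch of a batch-mode graph classification model built around DiffPool. The sizes, the surrounding layers, and the use of the raw adjacency as GraphConv's filter are placeholder choices, not part of this layer's API, and exact import paths may vary between Spektral versions.

```python
# Hypothetical sketch: DiffPool inside a batch-mode graph classification model.
# N, F, n_out and the surrounding layers are placeholder choices.
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from spektral.layers import GraphConv, DiffPool, GlobalSumPool

N, F, n_out = 50, 16, 3                      # nodes, input features, classes

X_in = Input(shape=(N, F))                   # node features ([batch], N, F)
A_in = Input(shape=(N, N))                   # binary adjacency ([batch], N, N)

X = GraphConv(32, activation='relu')([X_in, A_in])       # A may need preprocessing for GraphConv
X_pool, A_pool = DiffPool(k=10, channels=32)([X, A_in])  # reduce each graph to K=10 nodes
X = GraphConv(32, activation='relu')([X_pool, A_pool])
out = Dense(n_out, activation='softmax')(GlobalSumPool()(X))

model = Model(inputs=[X_in, A_in], outputs=out)
model.compile('adam', 'categorical_crossentropy')        # auxiliary losses are added by the layer
```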

[source]

MinCutPool

spektral.layers.MinCutPool(k, mlp_hidden=None, mlp_activation='relu', return_mask=False, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, kernel_constraint=None, bias_constraint=None)

A minCUT pooling layer as presented by Bianchi et al. (2019).

Mode: batch.

This layer computes a soft clustering $\mathbf{S}$ of the input graphs using an MLP, and reduces graphs as follows:

$$
\mathbf{S} = \textrm{MLP}(\mathbf{X}); \quad
\mathbf{A}' = \mathbf{S}^\top \mathbf{A} \mathbf{S}; \quad
\mathbf{X}' = \mathbf{S}^\top \mathbf{X}
$$

where MLP is a multi-layer perceptron with softmax output. Two auxiliary loss terms are also added to the model: the minCUT loss

$$
L_c = - \frac{ \mathrm{Tr}(\mathbf{S}^\top \mathbf{A} \mathbf{S}) }{ \mathrm{Tr}(\mathbf{S}^\top \mathbf{D} \mathbf{S}) }
$$

and the orthogonality loss

$$
L_o = \left\| \frac{\mathbf{S}^\top \mathbf{S}}{\| \mathbf{S}^\top \mathbf{S} \|_F} - \frac{\mathbf{I}_K}{\sqrt{K}} \right\|_F
$$

where $\mathbf{D}$ is the degree matrix of $\mathbf{A}$ and $K$ is the number of clusters.

The layer can be used without a supervised loss, to compute node clustering simply by minimizing the two auxiliary losses.

Input

  • Node features of shape ([batch], N, F);
  • Binary adjacency matrix of shape ([batch], N, N);

Output

  • Reduced node features of shape ([batch], K, F);
  • Reduced adjacency matrix of shape ([batch], K, K);
  • If return_mask=True, the soft clustering matrix of shape ([batch], N, K).

Arguments

  • k: number of nodes to keep;
  • mlp_hidden: list of integers, number of hidden units for each hidden layer in the MLP used to compute cluster assignments (if None, the MLP has only the output layer);
  • mlp_activation: activation for the MLP layers;
  • return_mask: boolean, whether to return the cluster assignment matrix;
  • kernel_initializer: initializer for the weights;
  • kernel_regularizer: regularization applied to the weights;
  • kernel_constraint: constraint applied to the weights;
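
Since the layer works without a supervised loss, a rough sketch of unsupervised node clustering with MinCutPool could look as follows. The toy data and all sizes are placeholders, and whether fit accepts inputs without targets in this way depends on the Keras version.

```python
# Hypothetical sketch: unsupervised clustering by minimizing only the auxiliary losses.
import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import MinCutPool

N, F, K = 100, 8, 5
X = np.random.rand(1, N, F).astype('float32')                # node features (1, N, F)
A = (np.random.rand(1, N, N) > 0.9)
A = np.maximum(A, A.transpose((0, 2, 1))).astype('float32')  # toy symmetric adjacency

X_in = Input(shape=(N, F))
A_in = Input(shape=(N, N))
_, _, S = MinCutPool(k=K, return_mask=True)([X_in, A_in])    # S: soft cluster assignments

model = Model([X_in, A_in], S)
model.compile('adam', loss=None)                 # train on the minCUT + orthogonality losses only
model.fit([X, A], epochs=50, batch_size=1, verbose=0)
clusters = model.predict([X, A])[0].argmax(-1)   # hard cluster index per node
```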

[source]

TopKPool

spektral.layers.TopKPool(ratio, return_mask=False, sigmoid_gating=False, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None)

A gPool/Top-K layer as presented by Gao & Ji (2019) and Cangea et al. (2018).

Mode: single, disjoint.

This layer computes the following operations:

$$
\mathbf{y} = \frac{\mathbf{X}\mathbf{p}}{\|\mathbf{p}\|}; \quad
\mathbf{i} = \textrm{rank}(\mathbf{y}, K); \quad
\mathbf{X}' = (\mathbf{X} \odot \textrm{tanh}(\mathbf{y}))_{\mathbf{i}}; \quad
\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{i}}
$$

where $\textrm{rank}(\mathbf{y}, K)$ returns the indices of the top $K$ values of $\mathbf{y}$, and $\mathbf{p}$ is a learnable parameter vector of size $F$. $K$ is defined for each graph as a fraction of the number of nodes, controlled by the ratio argument. Note that the gating operation $\textrm{tanh}(\mathbf{y})$ (Cangea et al.) can be replaced with a sigmoid (Gao & Ji).

This layer temporarily makes the adjacency matrix dense in order to compute $\mathbf{A}'$. If memory is not an issue, considerable speedups can be achieved by using dense graphs directly. Converting a graph from sparse to dense and back to sparse is an expensive operation.
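
For intuition, here is a rough NumPy sketch of the selection and gating described above for a single dense graph (p is a random stand-in for the learnable projection vector):

```python
# NumPy sketch of the TopKPool equations for one dense graph (illustrative only).
import numpy as np

N, F, ratio = 6, 4, 0.5
X = np.random.rand(N, F)                       # node features
A = (np.random.rand(N, N) > 0.5).astype(float) # toy adjacency
p = np.random.rand(F)                          # stand-in for the learnable vector p

y = X @ p / np.linalg.norm(p)                  # one score per node
K = int(np.ceil(ratio * N))                    # nodes kept, as a fraction of N
idx = np.argsort(-y)[:K]                       # rank(y, K): indices of the top K scores
X_pool = X[idx] * np.tanh(y[idx, None])        # gated node features
A_pool = A[np.ix_(idx, idx)]                   # adjacency restricted to the kept nodes
```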

Input

  • Node features of shape (N, F);
  • Binary adjacency matrix of shape (N, N);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Reduced node features of shape (ratio * N, F);
  • Reduced adjacency matrix of shape (ratio * N, ratio * N);
  • Reduced graph IDs of shape (ratio * N, ) (only in disjoint mode);
  • If return_mask=True, the binary pooling mask of shape (ratio * N, ).

Arguments

  • ratio: float between 0 and 1, ratio of nodes to keep in each graph;
  • return_mask: boolean, whether to return the binary mask used for pooling;
  • sigmoid_gating: boolean, use a sigmoid gating activation instead of a tanh;
  • kernel_initializer: initializer for the weights;
  • kernel_regularizer: regularization applied to the weights;
  • kernel_constraint: constraint applied to the weights;

[source]

SAGPool

spektral.layers.SAGPool(ratio, return_mask=False, sigmoid_gating=False, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None)

A self-attention graph pooling layer as presented by Lee et al. (2019).

Mode: single, disjoint.

This layer computes the following operations:

$$
\mathbf{y} = \textrm{GNN}(\mathbf{A}, \mathbf{X}); \quad
\mathbf{i} = \textrm{rank}(\mathbf{y}, K); \quad
\mathbf{X}' = (\mathbf{X} \odot \textrm{tanh}(\mathbf{y}))_{\mathbf{i}}; \quad
\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{i}}
$$

where $\textrm{rank}(\mathbf{y}, K)$ returns the indices of the top $K$ values of $\mathbf{y}$, and GNN consists of one GraphConv layer with no activation. $K$ is defined for each graph as a fraction of the number of nodes, controlled by the ratio argument.

This layer temporarily makes the adjacency matrix dense in order to compute $\mathbf{A}'$. If memory is not an issue, considerable speedups can be achieved by using dense graphs directly. Converting a graph from sparse to dense and back to sparse is an expensive operation.

Input

  • Node features of shape (N, F);
  • Binary adjacency matrix of shape (N, N);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Reduced node features of shape (ratio * N, F);
  • Reduced adjacency matrix of shape (ratio * N, ratio * N);
  • Reduced graph IDs of shape (ratio * N, ) (only in disjoint mode);
  • If return_mask=True, the binary pooling mask of shape (ratio * N, ).

Arguments

  • ratio: float between 0 and 1, ratio of nodes to keep in each graph;
  • return_mask: boolean, whether to return the binary mask used for pooling;
  • sigmoid_gating: boolean, use a sigmoid gating activation instead of a tanh;
  • kernel_initializer: initializer for the weights;
  • kernel_regularizer: regularization applied to the weights;
  • kernel_constraint: constraint applied to the weights;
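
As a usage illustration, a minimal disjoint-mode sketch of SAGPool could look as follows. The shapes are placeholders, and the sparse-input plumbing and data loading follow the usual Spektral disjoint-mode examples, which may differ between versions.

```python
# Hypothetical sketch: SAGPool in disjoint mode with sparse adjacency and graph IDs.
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import GraphConv, SAGPool

F = 16
X_in = Input(shape=(F,))                    # node features (N, F)
A_in = Input(shape=(None,), sparse=True)    # adjacency (N, N), as a SparseTensor
I_in = Input(shape=(), dtype='int32')       # graph IDs (N,)

X = GraphConv(32, activation='relu')([X_in, A_in])
X_pool, A_pool, I_pool = SAGPool(ratio=0.5)([X, A_in, I_in])

model = Model([X_in, A_in, I_in], [X_pool, A_pool, I_pool])
```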

[source]

GlobalSumPool

spektral.layers.GlobalSumPool()

A global sum pooling layer. Pools a graph by computing the sum of its node features.

Mode: single, disjoint, mixed, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, F) (if single mode, shape will be (1, F)).

Arguments

None.
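
In disjoint mode, the pooling amounts to a segment sum over the graph IDs; a rough equivalent with plain TensorFlow and made-up data:

```python
# Rough equivalent of disjoint-mode global sum pooling with plain TensorFlow.
import tensorflow as tf

X = tf.random.uniform((7, 4))            # 7 nodes with 4 features, from 2 graphs
I = tf.constant([0, 0, 0, 1, 1, 1, 1])   # graph ID of each node
pooled = tf.math.segment_sum(X, I)       # shape (2, 4): one row per graph
```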


[source]

GlobalAvgPool

spektral.layers.GlobalAvgPool()

An average pooling layer. Pools a graph by computing the average of its node features.

Mode: single, disjoint, mixed, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, F) (if single mode, shape will be (1, F)).

Arguments

None.


[source]

GlobalMaxPool

spektral.layers.GlobalMaxPool()

A max pooling layer. Pools a graph by computing the maximum of its node features.

Mode: single, disjoint, mixed, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, F) (if single mode, shape will be (1, F)).

Arguments

None.


[source]

GlobalAttentionPool

spektral.layers.GlobalAttentionPool(channels, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, kernel_constraint=None, bias_constraint=None)

A gated attention global pooling layer as presented by Li et al. (2017).

This layer computes:

$$
\mathbf{X}' = \sum\limits_{i=1}^{N} \big( \sigma(\mathbf{X} \mathbf{W}_1 + \mathbf{b}_1) \odot (\mathbf{X} \mathbf{W}_2 + \mathbf{b}_2) \big)_i
$$

where $\sigma$ is the sigmoid activation function.

Mode: single, disjoint, mixed, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, channels) (if single mode, shape will be (1, channels)).

Arguments

  • channels: integer, number of output channels;
  • kernel_initializer: initializer for the kernel matrices;
  • bias_initializer: initializer for the bias vectors;
  • kernel_regularizer: regularization applied to the kernel matrices;
  • bias_regularizer: regularization applied to the bias vectors;
  • kernel_constraint: constraint applied to the kernel matrices;
  • bias_constraint: constraint applied to the bias vectors.
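
For intuition, a NumPy sketch of the gated attention sum above (W1, b1, W2, b2 are random stand-ins for the layer's trainable weights):

```python
# NumPy sketch of gated attention pooling for a single graph (illustrative only).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

N, F, channels = 5, 8, 16
X = np.random.rand(N, F)
W1, b1 = np.random.rand(F, channels), np.zeros(channels)
W2, b2 = np.random.rand(F, channels), np.zeros(channels)

gate = sigmoid(X @ W1 + b1)              # per-node, per-channel gate in (0, 1)
values = X @ W2 + b2                     # per-node linear transform
X_pool = (gate * values).sum(axis=0)     # sum over nodes -> shape (channels,)
```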

[source]

GlobalAttnSumPool

spektral.layers.GlobalAttnSumPool(attn_kernel_initializer='glorot_uniform', attn_kernel_regularizer=None, attn_kernel_constraint=None)

A node-attention global pooling layer. Pools a graph by learning attention coefficients to sum node features.

This layer computes:

$$
\alpha = \textrm{softmax}(\mathbf{X}\mathbf{a}); \quad
\mathbf{X}' = \sum\limits_{i=1}^{N} \alpha_i \cdot \mathbf{X}_i
$$

where $\mathbf{a} \in \mathbb{R}^{F}$ is a trainable vector. Note that the softmax is applied across nodes, and not across features.

Mode: single, disjoint, mixed, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, F) (if single mode, shape will be (1, F)).

Arguments

  • attn_kernel_initializer: initializer for the attention weights;
  • attn_kernel_regularizer: regularization applied to the attention kernel matrix;
  • attn_kernel_constraint: constraint applied to the attention kernel matrix;
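
For intuition, a NumPy sketch of the attention-weighted sum above (a is a random stand-in for the trainable attention vector):

```python
# NumPy sketch of attention-weighted sum pooling for a single graph.
import numpy as np

N, F = 5, 8
X = np.random.rand(N, F)
a = np.random.rand(F)                          # stand-in for the trainable vector

scores = X @ a                                 # one score per node
alpha = np.exp(scores) / np.exp(scores).sum()  # softmax across nodes
X_pool = alpha @ X                             # shape (F,)
```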

[source]

SortPool

spektral.layers.SortPool(k)

A SortPool layer as described by Zhang et al. (2018). This layer takes a graph signal $\mathbf{X}$ and returns the topmost $k$ rows, sorted according to the values of the last column. If $\mathbf{X}$ has fewer than $k$ rows, the result is zero-padded to $k$.

Mode: single, disjoint, batch.

Input

  • Node features of shape ([batch], N, F);
  • Graph IDs of shape (N, ) (only in disjoint mode);

Output

  • Pooled node features of shape (batch, k, F) (if single mode, shape will be (1, k, F)).

Arguments

  • k: integer, number of nodes to keep;
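
For intuition, a NumPy sketch of the sort-and-truncate behaviour for a single graph (the function and variable names are made up for the example):

```python
# NumPy sketch of SortPool for one graph: sort by the last feature column,
# keep the top k rows, and zero-pad if the graph has fewer than k nodes.
import numpy as np

def sort_pool(X, k):
    order = np.argsort(-X[:, -1])            # sort rows by last column, descending
    X_sorted = X[order][:k]
    if X_sorted.shape[0] < k:                # zero-pad small graphs up to k rows
        pad = np.zeros((k - X_sorted.shape[0], X.shape[1]))
        X_sorted = np.vstack([X_sorted, pad])
    return X_sorted

X = np.random.rand(6, 4)
print(sort_pool(X, k=8).shape)               # (8, 4)
```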