## Convolutional layers

The following message-passing layers are available in Spektral:

Notation:

• $N$: number of nodes in the graph;
• $F$: dimension of the node attributes (i.e., each node has an attribute in $\mathbb{R}^F$);
• $S$: dimension of the edge attributes (i.e., each edge has an attribute in $\mathbb{R}^S$);
• $\A \in \{0, 1\}^{N \times N}$: binary adjacency matrix;
• $\X \in \mathbb{R}^{ N \times F }$: node attributes matrix;
• $\E \in \mathbb{R}^{ N \times N \times S }$: edge attributes matrix;
• $\D = \textrm{diag}\big( \sum_{j} \A_{ij} \big)$: degree matrix;
• $\W, \V$: trainable kernels;
• $\b$: trainable bias vector;
• $\mathcal{N}(i)$: the one-hop neighbourhood of node $i$;
• $F'$: dimension of the node attributes after a message-passing layer;

[source]

#### GraphConv

spektral.layers.GraphConv(channels, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


A graph convolutional layer (GCN) as presented by Kipf & Welling (2016).

Mode: single, disjoint, mixed, batch.

This layer computes:
$$\Z = \hat \D^{-1/2} \hat \A \hat \D^{-1/2} \X \W + \b,$$
where $\hat \A = \A + \I$ is the adjacency matrix with added self-loops and $\hat\D$ is its degree matrix.

Input

• Node features of shape ([batch], N, F);
• Modified Laplacian of shape ([batch], N, N); can be computed with spektral.utils.convolution.localpooling_filter.

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: number of output channels;
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.
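
The propagation rule above can be sketched in plain NumPy (a minimal reference sketch of the equation, not Spektral's actual implementation; the helper name `gcn_propagate` is made up for illustration):

```python
import numpy as np

def gcn_propagate(A, X, W):
    # Z = D_hat^-1/2 (A + I) D_hat^-1/2 X W
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # degrees of A_hat
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W

# Two connected nodes with one-hot attributes
A = np.array([[0., 1.], [1., 0.]])
X = np.eye(2)
Z = gcn_propagate(A, X, np.eye(2))          # each entry is 0.5
```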

[source]

#### ChebConv

spektral.layers.ChebConv(channels, K=1, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


A Chebyshev convolutional layer as presented by Defferrard et al. (2016).

Mode: single, disjoint, mixed, batch.

This layer computes:
$$\Z = \sum\limits_{k=0}^{K - 1} \T^{(k)} \W^{(k)} + \b^{(k)},$$
where $\T^{(0)}, ..., \T^{(K - 1)}$ are Chebyshev polynomials of $\tilde \L$ defined as
$$\T^{(0)} = \X, \quad \T^{(1)} = \tilde \L \X, \quad \T^{(k \ge 2)} = 2 \tilde \L \T^{(k - 1)} - \T^{(k - 2)},$$
where
$$\tilde \L = \frac{2}{\lambda_{max}} \cdot \left( \I - \D^{-1/2} \A \D^{-1/2} \right) - \I$$
is the normalized Laplacian with a rescaled spectrum.

Input

• Node features of shape ([batch], N, F);
• A list of K Chebyshev polynomials of shape [([batch], N, N), ..., ([batch], N, N)]; can be computed with spektral.utils.convolution.chebyshev_filter.

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: number of output channels;
• K: order of the Chebyshev polynomials;
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.
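
The Chebyshev recursion behind the polynomial filters can be sketched in NumPy (a sketch of the matrix recursion $\T_0 = \I$, $\T_1 = \tilde\L$, $\T_k = 2\tilde\L\T_{k-1} - \T_{k-2}$ that a utility like spektral.utils.convolution.chebyshev_filter computes; the helper name is hypothetical):

```python
import numpy as np

def cheb_matrices(L_rescaled, K):
    # T_0 = I, T_1 = L~, T_k = 2 L~ T_{k-1} - T_{k-2}
    N = L_rescaled.shape[0]
    T = [np.eye(N), L_rescaled]
    while len(T) < K:
        T.append(2.0 * L_rescaled @ T[-1] - T[-2])
    return T[:K]

# A diagonal L~ makes the recursion easy to check by hand
L = np.diag([0.5, -0.5])
T = cheb_matrices(L, 3)                     # T[2] = 2 L^2 - I = diag(-0.5, -0.5)
```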

[source]

#### GraphSageConv

spektral.layers.GraphSageConv(channels, aggregate_op='mean', activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


A GraphSAGE layer as presented by Hamilton et al. (2017).

Mode: single, disjoint.

This layer computes:
$$\Z_i = \big[ \textrm{AGGREGATE}(\{ \X_j : j \in \mathcal{N}(i) \}) \, \| \, \X_i \big] \W + \b,$$
followed by $L_2$ normalization of each row of $\Z$, where $\textrm{AGGREGATE}$ is a function to aggregate a node's neighbourhood. The supported aggregation methods are: sum, mean, max, min, and product.

Input

• Node features of shape (N, F);
• Binary adjacency matrix of shape (N, N).

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: number of output channels;
• aggregate_op: str, aggregation method to use ('sum', 'mean', 'max', 'min', 'prod');
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.
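
The aggregate-concatenate-project-normalize pattern can be sketched in NumPy (a minimal sketch with mean aggregation, assuming every node has at least one neighbour; the helper name `graphsage_step` is made up, and this is not Spektral's implementation):

```python
import numpy as np

def graphsage_step(A, X, W, b=0.0):
    # For each node: [mean of neighbour features || own features] W + b,
    # followed by L2 normalization of each row.
    agg = np.stack([X[A[i] > 0].mean(axis=0) for i in range(A.shape[0])])
    Z = np.concatenate([agg, X], axis=-1) @ W + b
    return Z / np.linalg.norm(Z, axis=-1, keepdims=True)

A = np.array([[0., 1.], [1., 0.]])          # two mutually connected nodes
X = np.array([[1., 0.], [0., 1.]])
Z = graphsage_step(A, X, np.eye(4))         # rows have unit L2 norm
```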

[source]

#### ARMAConv

spektral.layers.ARMAConv(channels, order=1, iterations=1, share_weights=False, gcn_activation='relu', dropout_rate=0.0, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


A graph convolutional layer with ARMA$_K$ filters, as presented by Bianchi et al. (2019).

Mode: single, disjoint, mixed, batch.

This layer computes:
$$\Z = \frac{1}{K} \sum\limits_{k=1}^{K} \bar \X_k^{(T)},$$
where $K$ is the order of the ARMA$_K$ filter, and where:
$$\bar \X_k^{(t + 1)} = \sigma \left( \tilde \L \bar \X^{(t)} \W^{(t)} + \X \V^{(t)} \right)$$
is a recursive approximation of an ARMA$_1$ filter, where $\bar \X^{(0)} = \X$ and
$$\tilde \L = \frac{2}{\lambda_{max}} \cdot \left( \I - \D^{-1/2} \A \D^{-1/2} \right) - \I$$
is the normalized Laplacian with a rescaled spectrum.

Input

• Node features of shape ([batch], N, F);
• Normalized and rescaled Laplacian of shape ([batch], N, N); can be computed with spektral.utils.convolution.normalized_laplacian and spektral.utils.convolution.rescale_laplacian.

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: number of output channels;
• order: order of the full ARMA$_K$ filter, i.e., the number of parallel stacks in the layer;
• iterations: number of iterations to compute each ARMA$_1$ approximation;
• share_weights: share the weights in each ARMA$_1$ stack.
• gcn_activation: activation function to use to compute each ARMA$_1$ stack;
• dropout_rate: dropout rate for skip connection;
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.

[source]

#### EdgeConditionedConv

spektral.layers.EdgeConditionedConv(channels, kernel_network=None, root=True, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


An edge-conditioned convolutional layer (ECC) as presented by Simonovsky & Komodakis (2017).

Mode: single, disjoint, batch.

Notes:

• In single mode, if the adjacency matrix is dense it will be converted to a SparseTensor automatically (which is an expensive operation).

For each node $i$, this layer computes:
$$\Z_i = \X_i \W_{\textrm{root}} + \sum\limits_{j \in \mathcal{N}(i)} \X_j \, \textrm{MLP}(\E_{ji}) + \b,$$
where $\textrm{MLP}$ is a multi-layer perceptron that outputs an edge-specific weight as a function of edge attributes.

Input

• Node features of shape ([batch], N, F);
• Binary adjacency matrices of shape ([batch], N, N);
• Edge features. In single mode, shape (num_edges, S); in batch mode, shape (batch, N, N, S).

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: integer, number of output channels;
• kernel_network: a list of integers representing the hidden neurons of the kernel-generating network;
• root: if False, the layer will not consider the root node in the message passing (the first term in the equation above), but only its neighbours;
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.

[source]

#### GraphAttention

spektral.layers.GraphAttention(channels, attn_heads=1, concat_heads=True, dropout_rate=0.5, return_attn_coef=False, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', attn_kernel_initializer='glorot_uniform', kernel_regularizer=None, bias_regularizer=None, attn_kernel_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, attn_kernel_constraint=None)


A graph attention layer (GAT) as presented by Velickovic et al. (2017).

Mode: single, disjoint, mixed, batch.

This layer expects dense inputs when working in batch mode.

This layer computes a convolution similar to layers.GraphConv, but uses the attention mechanism to weight the adjacency matrix instead of using the normalized Laplacian:
$$\Z = \alpha \X \W + \b,$$
where
$$\alpha_{ij} = \frac{ \exp \left( \textrm{LeakyReLU} \left( \a^{\top} [ (\X \W)_i \, \| \, (\X \W)_j ] \right) \right) }{ \sum\limits_{k \in \mathcal{N}(i) \cup \{ i \}} \exp \left( \textrm{LeakyReLU} \left( \a^{\top} [ (\X \W)_i \, \| \, (\X \W)_k ] \right) \right) }$$
and $\a \in \mathbb{R}^{2F'}$ is a trainable attention kernel. Dropout is also applied to $\alpha$ before computing $\Z$. Multiple attention heads are computed in parallel and their results are aggregated by concatenation or averaging.

Input

• Node features of shape ([batch], N, F);
• Binary adjacency matrix of shape ([batch], N, N);

Output

• Node features with the same shape as the input, but with the last dimension changed to channels;
• If return_attn_coef=True, a list with the attention coefficients for each attention head. Each attention coefficient matrix has shape ([batch], N, N).

Arguments

• channels: number of output channels;
• attn_heads: number of attention heads to use;
• concat_heads: bool, whether to concatenate the output of the attention heads instead of averaging;
• dropout_rate: internal dropout rate for attention coefficients;
• return_attn_coef: if True, return the attention coefficients for the given input (one N x N matrix for each head).
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• attn_kernel_initializer: initializer for the attention weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• attn_kernel_regularizer: regularization applied to the attention kernels;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• attn_kernel_constraint: constraint applied to the attention kernels;
• bias_constraint: constraint applied to the bias vector.
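
The attention coefficients $\alpha_{ij}$ can be sketched in NumPy for a single head (a dense, loop-based sketch of the equation above, not Spektral's implementation; the helper name `gat_attention` is made up):

```python
import numpy as np

def gat_attention(A, X, W, a):
    # alpha_ij = softmax over j in N(i) and i itself of LeakyReLU(a^T [h_i || h_j])
    H = X @ W
    N = A.shape[0]
    A_hat = A + np.eye(N)                   # attend over neighbours and self
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([H[i], H[j]])
            e[i, j] = s if s > 0 else 0.2 * s   # LeakyReLU
    e = np.where(A_hat > 0, e, -np.inf)     # mask non-edges
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    return alpha / alpha.sum(axis=1, keepdims=True)

A = np.array([[0., 1.], [1., 0.]])
alpha = gat_attention(A, np.eye(2), np.eye(2), np.zeros(4))
# a = 0 gives uniform attention: every allowed entry is 0.5
```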

[source]

#### GraphConvSkip

spektral.layers.GraphConvSkip(channels, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


A simple convolutional layer with a skip connection.

Mode: single, disjoint, mixed, batch.

This layer computes:
$$\Z = \D^{-1/2} \A \D^{-1/2} \X \W + \X \V + \b,$$
where $\A$ does not have self-loops (unlike in GraphConv) and the skip connection $\X \V$ uses a separate trainable kernel $\V$.

Input

• Node features of shape ([batch], N, F);
• Normalized adjacency matrix of shape ([batch], N, N); can be computed with spektral.utils.convolution.normalized_adjacency.

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: number of output channels;
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.

[source]

#### APPNP

spektral.layers.APPNP(channels, alpha=0.2, propagations=1, mlp_hidden=None, mlp_activation='relu', dropout_rate=0.0, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


A graph convolutional layer implementing the APPNP operator, as presented by Klicpera et al. (2019).

This layer computes:
$$\Z^{(0)} = \textrm{MLP}(\X); \quad \Z^{(t + 1)} = (1 - \alpha) \hat \D^{-1/2} \hat \A \hat \D^{-1/2} \Z^{(t)} + \alpha \Z^{(0)}; \quad \Z = \Z^{(T)},$$
where $\alpha$ is the teleport probability, $T$ is the number of propagation steps, and $\textrm{MLP}$ is a multi-layer perceptron.

Mode: single, disjoint, mixed, batch.

Input

• Node features of shape ([batch], N, F);
• Modified Laplacian of shape ([batch], N, N); can be computed with spektral.utils.convolution.localpooling_filter.

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: number of output channels;
• alpha: teleport probability during propagation;
• propagations: number of propagation steps;
• mlp_hidden: list of integers, number of hidden units for each hidden layer in the MLP (if None, the MLP has only the output layer);
• mlp_activation: activation for the MLP layers;
• dropout_rate: dropout rate for Laplacian and MLP layers;
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.
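
The propagation loop can be sketched in NumPy (a minimal sketch of the iteration above, assuming the normalized adjacency has already been computed and the MLP already applied; the helper name `appnp_propagate` is made up):

```python
import numpy as np

def appnp_propagate(A_hat, H, alpha=0.2, K=10):
    # Z^(0) = H;  Z^(t+1) = (1 - alpha) * A_hat Z^(t) + alpha * Z^(0)
    Z = H.copy()
    for _ in range(K):
        Z = (1.0 - alpha) * A_hat @ Z + alpha * H
    return Z

# With A_hat = I the propagation is a fixed point: Z stays equal to H
H = np.array([[1., 2.], [3., 4.]])
Z = appnp_propagate(np.eye(2), H)
```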

[source]

#### GINConv

spektral.layers.GINConv(channels, epsilon=None, mlp_hidden=None, mlp_activation='relu', activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


A Graph Isomorphism Network (GIN) as presented by Xu et al. (2018).

Mode: single, disjoint.

This layer expects a sparse adjacency matrix.

This layer computes for each node $i$:
$$\Z_i = \textrm{MLP} \left( (1 + \epsilon) \X_i + \sum\limits_{j \in \mathcal{N}(i)} \X_j \right),$$
where $\textrm{MLP}$ is a multi-layer perceptron.

Input

• Node features of shape (N, F);
• Binary adjacency matrix of shape (N, N).

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: integer, number of output channels;
• epsilon: the $\epsilon$ parameter in the equation above (see Xu et al. (2018)). If None, epsilon is learned as a trainable parameter (default behaviour); if a value is given, epsilon stays fixed.
• mlp_hidden: list of integers, number of hidden units for each hidden layer in the MLP (if None, the MLP has only the output layer);
• mlp_activation: activation for the MLP layers;
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.
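
The aggregation inside the MLP can be sketched in NumPy (a sketch of the sum $(1 + \epsilon)\X_i + \sum_{j} \X_j$ only, with the MLP omitted; the helper name `gin_aggregate` is made up):

```python
import numpy as np

def gin_aggregate(A, X, epsilon=0.0):
    # The MLP input: (1 + eps) * X_i + sum of neighbour features
    return (1.0 + epsilon) * X + A @ X

A = np.array([[0., 1.], [1., 0.]])
X = np.array([[1., 0.], [0., 1.]])
Z = gin_aggregate(A, X)                     # [[1, 1], [1, 1]]
```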

[source]

#### DiffusionConv

spektral.layers.DiffusionConv(channels, num_diffusion_steps=6, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None, activation='tanh')


Applies Graph Diffusion Convolution as described by Li et al. (2016).

Mode: single, disjoint, mixed, batch.

This layer expects a dense adjacency matrix.

Given a number of diffusion steps $K$ and a row-normalized adjacency matrix $\hat \A$, this layer computes the $q$-th output channel as:
$$\Z_{:, q} = \sigma \left( \sum\limits_{f = 1}^{F} \left( \sum\limits_{k = 0}^{K - 1} \theta_k^{(q, f)} \hat \A^k \right) \X_{:, f} \right)$$

Input

• Node features of shape ([batch], N, F);
• Normalized adjacency or attention coefficient matrix $\hat \A$ of shape ([batch], N, N); use DiffusionConvolution.preprocess to normalize.

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: number of output channels;
• num_diffusion_steps: number of diffusion steps to consider ($K$ in the paper);
• activation: activation function $\sigma$ ($\tanh$ by default);
• kernel_initializer: initializer for the weights;
• kernel_regularizer: regularization applied to the weights;
• kernel_constraint: constraint applied to the weights;

[source]

#### GatedGraphConv

spektral.layers.GatedGraphConv(channels, n_layers, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


A gated graph convolutional layer as presented by Li et al. (2018).

Mode: single, disjoint.

This layer expects a sparse adjacency matrix.

This layer repeatedly applies a GRU cell $L$ times to the node attributes:
$$\X^{(0)} = \X \, \| \, \mathbf{0}; \quad \mathbf{M}^{(t)} = \A \X^{(t - 1)} \W; \quad \X^{(t)} = \textrm{GRU} \left( \mathbf{M}^{(t)}, \X^{(t - 1)} \right); \quad \Z = \X^{(L)},$$
where the initial state $\X^{(0)}$ is the node attributes padded with zeros up to channels dimensions.

Input

• Node features of shape (N, F); note that F must be less than or equal to channels.
• Binary adjacency matrix of shape (N, N).

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: integer, number of output channels;
• n_layers: integer, number of iterations with the GRU cell;
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.

[source]

#### AGNNConv

spektral.layers.AGNNConv(trainable=True, activation=None)


An Attention-based Graph Neural Network (AGNN) as presented by Thekumparampil et al. (2018).

Mode: single, disjoint.

This layer expects a sparse adjacency matrix.

This layer computes:
$$\Z = \mathbf{P} \X,$$
where
$$\mathbf{P}_{ij} = \frac{ \exp \left( \beta \cos(\X_i, \X_j) \right) }{ \sum\limits_{k \in \mathcal{N}(i) \cup \{ i \}} \exp \left( \beta \cos(\X_i, \X_k) \right) }$$
and $\beta$ is a trainable parameter.

Input

• Node features of shape (N, F);
• Binary adjacency matrix of shape (N, N).

Output

• Node features with the same shape as the input.

Arguments

• trainable: boolean, if True, then beta is a trainable parameter. Otherwise, beta is fixed to 1;
• activation: activation function to use;
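
The attention-based propagation can be sketched in NumPy (a dense sketch of the softmax over cosine similarities, attending over each node's neighbourhood plus a self-loop; the helper name `agnn_propagate` is made up, and this is not Spektral's implementation):

```python
import numpy as np

def agnn_propagate(A, X, beta=1.0):
    # P_ij = softmax over j in N(i) and i itself of beta * cos(x_i, x_j); output P X
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    cos = Xn @ Xn.T
    A_hat = A + np.eye(A.shape[0])
    e = np.where(A_hat > 0, beta * cos, -np.inf)
    P = np.exp(e - e.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return P @ X

A = np.array([[0., 1.], [1., 0.]])
X = np.array([[2., 0.], [2., 0.]])          # identical node features
Z = agnn_propagate(A, X)                    # P is uniform, so Z equals X
```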

[source]

#### TAGConv

spektral.layers.TAGConv(channels, K=3, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


A Topology Adaptive Graph Convolutional layer (TAG) as presented by Du et al. (2017).

Mode: single, disjoint.

This layer expects a sparse adjacency matrix.

This layer computes:
$$\Z = \sum\limits_{k = 0}^{K} \left( \D^{-1/2} \A \D^{-1/2} \right)^k \X \W^{(k)} + \b$$

Input

• Node features of shape (N, F);
• Binary adjacency matrix of shape (N, N).

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: integer, number of output channels;
• K: the order of the layer (i.e., the layer will consider a K-hop neighbourhood for each node);
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.

[source]

#### CrystalConv

spektral.layers.CrystalConv(channels, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


A Crystal Graph Convolutional layer as presented by Xie & Grossman (2018).

Mode: single, disjoint.

This layer expects a sparse adjacency matrix.

This layer computes for each node $i$:
$$\Z_i = \X_i + \sum\limits_{j \in \mathcal{N}(i)} \sigma \left( \z_{ij} \W^{(1)} + \b^{(1)} \right) \odot g \left( \z_{ij} \W^{(2)} + \b^{(2)} \right),$$
where $\z_{ij} = \X_i \| \X_j \| \E_{ij}$, $\sigma$ is a sigmoid activation, and $g$ is the activation function (defined by the activation argument).

Input

• Node features of shape (N, F);
• Binary adjacency matrix of shape (N, N).
• Edge features of shape (num_edges, S).

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: integer, number of output channels;
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.

[source]

#### EdgeConv

spektral.layers.EdgeConv(channels, mlp_hidden=None, mlp_activation='relu', activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)


An Edge Convolutional layer as presented by Wang et al. (2018).

Mode: single, disjoint.

This layer expects a sparse adjacency matrix.

This layer computes for each node $i$:
$$\Z_i = \sum\limits_{j \in \mathcal{N}(i)} \textrm{MLP} \left( \X_i \, \| \, \X_j - \X_i \right),$$
where $\textrm{MLP}$ is a multi-layer perceptron.

Input

• Node features of shape (N, F);
• Binary adjacency matrix of shape (N, N).

Output

• Node features with the same shape as the input, but with the last dimension changed to channels.

Arguments

• channels: integer, number of output channels;
• mlp_hidden: list of integers, number of hidden units for each hidden layer in the MLP (if None, the MLP has only the output layer);
• mlp_activation: activation for the MLP layers;
• activation: activation function to use;
• use_bias: bool, add a bias vector to the output;
• kernel_initializer: initializer for the weights;
• bias_initializer: initializer for the bias vector;
• kernel_regularizer: regularization applied to the weights;
• bias_regularizer: regularization applied to the bias vector;
• activity_regularizer: regularization applied to the output;
• kernel_constraint: constraint applied to the weights;
• bias_constraint: constraint applied to the bias vector.
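
The per-edge MLP inputs can be sketched in NumPy (a sketch that only builds the $\X_i \, \| \, \X_j - \X_i$ terms, with the MLP and aggregation omitted; the helper name `edgeconv_inputs` is made up):

```python
import numpy as np

def edgeconv_inputs(A, X):
    # One MLP input per directed edge (i, j): [x_i || x_j - x_i]
    edges = np.argwhere(A > 0)
    msgs = np.stack([np.concatenate([X[i], X[j] - X[i]]) for i, j in edges])
    return edges, msgs

A = np.array([[0., 1.], [0., 0.]])          # single edge 0 -> 1
X = np.array([[1., 2.], [3., 5.]])
edges, msgs = edgeconv_inputs(A, X)         # msgs[0] = [1, 2, 2, 3]
```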

[source]

#### MessagePassing

spektral.layers.MessagePassing(aggregate='sum')


A general class for message passing as presented by Gilmer et al. (2017).

Mode: single, disjoint.

This layer and all of its extensions expect a sparse adjacency matrix.

This layer computes:
$$\Z_i = \gamma \left( \X_i, \, \square_{j \in \mathcal{N}(i)} \, \phi \left( \X_i, \X_j, \E_{ij} \right) \right),$$

where $\gamma$ is a differentiable update function, $\phi$ is a differentiable message function, $\square$ is a permutation-invariant function to aggregate the messages (like the sum or the average), and $\E_{ij}$ is the edge attribute of edge i-j.

By extending this class, it is possible to create any message-passing layer in single/disjoint mode.

API:

• propagate(X, A, E=None, **kwargs): propagates the messages and computes embeddings for each node in the graph. kwargs will be propagated as keyword arguments to message(), aggregate() and update().
• message(X, **kwargs): computes messages, equivalent to $\phi$ in the definition. Any extra keyword argument of this function will be populated by propagate() if a matching keyword is found. Use self.get_i() and self.get_j() to gather the elements using the indices i or j of the adjacency matrix (e.g., self.get_j(X) will get the features of the neighbours).
• aggregate(messages, **kwargs): aggregates the messages, equivalent to $\square$ in the definition. The behaviour of this function can also be controlled using the aggregate keyword in the constructor of the layer (supported aggregations: sum, mean, max, min, prod). Any extra keyword argument of this function will be populated by propagate() if a matching keyword is found.
• update(embeddings, **kwargs): updates the aggregated messages to obtain the final node embeddings, equivalent to $\gamma$ in the definition. Any extra keyword argument of this function will be populated by propagate() if a matching keyword is found.

Arguments:

• aggregate: string or callable, an aggregate function. This flag can be used to control the behaviour of aggregate() without re-implementing it. Supported aggregations: 'sum', 'mean', 'max', 'min', 'prod'. If callable, the function must have the signature foo(updates, indices, N) and return a rank 2 tensor with shape (N, ...).
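
The propagate/message/aggregate/update flow can be illustrated with a NumPy sketch (the function names mirror the API above, but this is not Spektral code; only sum aggregation is shown, and the edge-index layout is an assumption for this sketch):

```python
import numpy as np

def propagate(X, edge_index, message, update):
    # Gather endpoints per edge, compute messages (phi),
    # scatter-sum them onto the targets, then update (gamma).
    i, j = edge_index                       # targets i, sources j
    msgs = message(X[i], X[j])              # one message per edge
    agg = np.zeros_like(X)
    np.add.at(agg, i, msgs)                 # sum aggregation
    return update(X, agg)

# Sum of neighbour features on a directed 3-cycle, identity update
X = np.array([[1., 0.], [0., 1.], [1., 1.]])
edge_index = np.array([[0, 1, 2],           # i: message targets
                       [1, 2, 0]])          # j: message sources
Z = propagate(X, edge_index,
              message=lambda xi, xj: xj,    # phi: pass the neighbour's features
              update=lambda x, m: m)        # gamma: keep the aggregated messages
```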