## Convolutional layers

The message-passing layers from these papers are available in Spektral:

- Semi-Supervised Classification with Graph Convolutional Networks
- Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
- Inductive Representation Learning on Large Graphs
- Graph Neural Networks with convolutional ARMA filters
- Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs
- Graph Attention Networks
- Predict then Propagate: Graph Neural Networks meet Personalized PageRank
- How Powerful are Graph Neural Networks?
- Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting
- Gated Graph Sequence Neural Networks
- Attention-based Graph Neural Network for Semi-supervised Learning
- Topology Adaptive Graph Convolutional Networks
- Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties
- Dynamic Graph CNN for Learning on Point Clouds

Notation:

- $N$: number of nodes in the graph;
- $F$: dimension of the node attributes (i.e., each node has an attribute in $\mathbb{R}^F$);
- $S$: dimension of the edge attributes (i.e., each edge has an attribute in $\mathbb{R}^S$);
- $A \in \{0, 1\}^{N \times N}$: binary adjacency matrix;
- $X \in \mathbb{R}^{N \times F}$: node attributes matrix;
- $E \in \mathbb{R}^{N \times N \times S}$: edge attributes matrix;
- $D$: degree matrix, with $D_{ii} = \sum_j A_{ij}$;
- $W$: trainable kernels;
- $b$: trainable bias vector;
- $\mathcal{N}(i)$: the one-hop neighbourhood of node $i$;
- $F'$: dimension of the node attributes after a message-passing layer.

#### GraphConv

```
spektral.layers.GraphConv(channels, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

A graph convolutional layer (GCN) as presented by Kipf & Welling (2016).

**Mode**: single, disjoint, mixed, batch.

This layer computes: $Z = \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} X W + b$, where $\hat{A} = A + I$ is the adjacency matrix with added self-loops and $\hat{D}$ is its degree matrix.

**Input**

- Node features of shape `([batch], N, F)`;
- Modified Laplacian of shape `([batch], N, N)`; can be computed with `spektral.utils.convolution.localpooling_filter`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: number of output channels;
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.
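As a usage sketch, here is a minimal single-mode GCN built with this layer (data and shapes are illustrative toy values; assumes TensorFlow 2.x as the Keras backend):

```
import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import GraphConv
from spektral.utils.convolution import localpooling_filter

N, F = 5, 3                       # toy sizes, for illustration only
A = np.ones((N, N))               # binary adjacency matrix
X = np.random.randn(N, F)         # node attributes
fltr = localpooling_filter(A)     # modified Laplacian D^-1/2 (A+I) D^-1/2

# In single mode, the node dimension N acts as the Keras batch dimension.
X_in = Input(shape=(F,))
fltr_in = Input(shape=(N,))
out = GraphConv(channels=16, activation='relu')([X_in, fltr_in])
model = Model(inputs=[X_in, fltr_in], outputs=out)

print(model.predict([X, fltr], batch_size=N).shape)  # (5, 16)
```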

#### ChebConv

```
spektral.layers.ChebConv(channels, K=1, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

A Chebyshev convolutional layer as presented by Defferrard et al. (2016).

**Mode**: single, disjoint, mixed, batch.

This layer computes: $Z = \sum_{k=0}^{K-1} T^{(k)} W^{(k)} + b$, where $T^{(0)}, ..., T^{(K-1)}$ are Chebyshev polynomials of $\tilde{L}$ defined as $T^{(0)} = X$, $T^{(1)} = \tilde{L} X$, and $T^{(k \ge 2)} = 2 \tilde{L} T^{(k-1)} - T^{(k-2)}$, where $\tilde{L} = \frac{2}{\lambda_{max}} L - I$ is the normalized Laplacian with a rescaled spectrum.

**Input**

- Node features of shape `([batch], N, F)`;
- A list of K Chebyshev polynomials of shape `[([batch], N, N), ..., ([batch], N, N)]`; can be computed with `spektral.utils.convolution.chebyshev_filter`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: number of output channels;
- `K`: order of the Chebyshev polynomials;
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.
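The preprocessing mirrors GraphConv, except that `chebyshev_filter` returns a list of polynomial matrices that are fed to the layer alongside the node features. A hedged single-mode sketch (shapes and wiring are illustrative):

```
import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import ChebConv
from spektral.utils.convolution import chebyshev_filter

N, F, K = 5, 3, 2
A = np.ones((N, N))                    # toy adjacency matrix
T = chebyshev_filter(A, K)             # list of Chebyshev polynomial matrices

X_in = Input(shape=(F,))
T_in = [Input(shape=(N,)) for _ in T]  # one Keras input per polynomial
out = ChebConv(channels=16, K=K, activation='relu')([X_in] + T_in)
model = Model(inputs=[X_in] + T_in, outputs=out)
```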

#### GraphSageConv

```
spektral.layers.GraphSageConv(channels, aggregate_op='mean', activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

A GraphSAGE layer as presented by Hamilton et al. (2017).

**Mode**: single, disjoint.

This layer computes: $Z_i = \big[ \textrm{AGGREGATE}_{j \in \mathcal{N}(i)}(X_j) \,\|\, X_i \big] W + b; \quad Z_i \leftarrow \frac{Z_i}{\|Z_i\|_2},$ where $\textrm{AGGREGATE}$ is a function to aggregate a node's neighbourhood. The supported aggregation methods are: sum, mean, max, min, and product.

**Input**

- Node features of shape `(N, F)`;
- Binary adjacency matrix of shape `(N, N)`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: number of output channels;
- `aggregate_op`: str, aggregation method to use (`'sum'`, `'mean'`, `'max'`, `'min'`, `'prod'`);
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.

#### ARMAConv

```
spektral.layers.ARMAConv(channels, order=1, iterations=1, share_weights=False, gcn_activation='relu', dropout_rate=0.0, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

A graph convolutional layer with ARMA filters, as presented by Bianchi et al. (2019).

**Mode**: single, disjoint, mixed, batch.

This layer computes: $Z = \frac{1}{K} \sum_{k=1}^{K} \bar{X}_k^{(T)},$ where $K$ is the order of the ARMA filter, and where $\bar{X}_k^{(t+1)} = \sigma\big( \tilde{L} \bar{X}_k^{(t)} W^{(t)} + X V^{(t)} \big)$ is a recursive approximation of an ARMA$_1$ filter, where $\bar{X}_k^{(0)} = X$ and $\tilde{L} = \frac{2}{\lambda_{max}} L - I$ is the normalized Laplacian with a rescaled spectrum.

**Input**

- Node features of shape `([batch], N, F)`;
- Normalized and rescaled Laplacian of shape `([batch], N, N)`; can be computed with `spektral.utils.convolution.normalized_laplacian` and `spektral.utils.convolution.rescale_laplacian`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: number of output channels;
- `order`: order of the full ARMA filter, i.e., the number of parallel stacks in the layer;
- `iterations`: number of iterations to compute each ARMA approximation;
- `share_weights`: share the weights in each ARMA stack;
- `gcn_activation`: activation function to use to compute each ARMA stack;
- `dropout_rate`: dropout rate for the skip connection;
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.
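The Laplacian preprocessing for this layer chains the two utilities mentioned above; a minimal sketch with a toy matrix:

```
import numpy as np
from spektral.utils.convolution import normalized_laplacian, rescale_laplacian

A = np.ones((5, 5))              # toy binary adjacency matrix
L = normalized_laplacian(A)      # L = I - D^-1/2 A D^-1/2
fltr = rescale_laplacian(L)      # rescale the spectrum of L

# `fltr` is then fed to ARMAConv together with the node features,
# exactly like the filter input of GraphConv.
```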

#### EdgeConditionedConv

```
spektral.layers.EdgeConditionedConv(channels, kernel_network=None, root=True, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

An edge-conditioned convolutional layer (ECC) as presented by Simonovsky & Komodakis (2017).

**Mode**: single, disjoint, batch.

**Notes**:

- In single mode, if the adjacency matrix is dense it will be converted to a SparseTensor automatically (which is an expensive operation).

For each node $i$, this layer computes: $Z_i = X_i W_{root} + \sum_{j \in \mathcal{N}(i)} X_j \, \textrm{MLP}(E_{ji}) + b,$ where $\textrm{MLP}$ is a multi-layer perceptron that outputs an edge-specific weight as a function of edge attributes.

**Input**

- Node features of shape `([batch], N, F)`;
- Binary adjacency matrices of shape `([batch], N, N)`;
- Edge features. In single mode, shape `(num_edges, S)`; in batch mode, shape `(batch, N, N, S)`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: integer, number of output channels;
- `kernel_network`: a list of integers representing the hidden neurons of the kernel-generating network;
- `root`: if False, the layer will not consider the root node in the message passing (the first term in the equation above), but only the neighbours;
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.
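The edge features make the input signature different from the other layers; a hedged batch-mode sketch (all sizes illustrative):

```
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import EdgeConditionedConv

N, F, S = 5, 3, 2
X_in = Input(shape=(N, F))       # node features
A_in = Input(shape=(N, N))       # binary adjacency matrices
E_in = Input(shape=(N, N, S))    # edge features (batch mode)

# The kernel-generating network has one hidden layer of 8 neurons.
out = EdgeConditionedConv(channels=16, kernel_network=[8])([X_in, A_in, E_in])
model = Model(inputs=[X_in, A_in, E_in], outputs=out)  # output: (batch, N, 16)
```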

#### GraphAttention

```
spektral.layers.GraphAttention(channels, attn_heads=1, concat_heads=True, dropout_rate=0.5, return_attn_coef=False, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', attn_kernel_initializer='glorot_uniform', kernel_regularizer=None, bias_regularizer=None, attn_kernel_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, attn_kernel_constraint=None)
```

A graph attention layer (GAT) as presented by Velickovic et al. (2017).

**Mode**: single, disjoint, mixed, batch.

**This layer expects dense inputs when working in batch mode.**

This layer computes a convolution similar to `layers.GraphConv`, but uses the attention mechanism to weight the adjacency matrix instead of using the normalized Laplacian: $Z = \alpha X W + b,$ where $\alpha_{ij} = \frac{\exp\big(\textrm{LeakyReLU}\big(a^{\top} [(XW)_i \,\|\, (XW)_j]\big)\big)}{\sum_{k \in \mathcal{N}(i) \cup \{i\}} \exp\big(\textrm{LeakyReLU}\big(a^{\top} [(XW)_i \,\|\, (XW)_k]\big)\big)}$ and $a \in \mathbb{R}^{2F'}$ is a trainable attention kernel.
Dropout is also applied to $\alpha$ before computing $Z$.
Multiple attention heads are computed in parallel and their results are aggregated by concatenation or averaging.

**Input**

- Node features of shape `([batch], N, F)`;
- Binary adjacency matrix of shape `([batch], N, N)`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`;
- if `return_attn_coef=True`, a list with the attention coefficients for each attention head. Each attention coefficient matrix has shape `([batch], N, N)`.

**Arguments**

- `channels`: number of output channels;
- `attn_heads`: number of attention heads to use;
- `concat_heads`: bool, whether to concatenate the output of the attention heads instead of averaging;
- `dropout_rate`: internal dropout rate for attention coefficients;
- `return_attn_coef`: if True, return the attention coefficients for the given input (one N x N matrix for each head);
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `attn_kernel_initializer`: initializer for the attention weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `attn_kernel_regularizer`: regularization applied to the attention kernels;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `attn_kernel_constraint`: constraint applied to the attention kernels;
- `bias_constraint`: constraint applied to the bias vector.
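A single-mode sketch showing how to retrieve the attention coefficients (shapes illustrative; the output size with concatenated heads follows the GAT paper's convention of `channels` features per head):

```
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import GraphAttention

N, F = 5, 3
X_in = Input(shape=(F,))
A_in = Input(shape=(N,))

gat = GraphAttention(channels=8, attn_heads=4, concat_heads=True,
                     dropout_rate=0.5, return_attn_coef=True)
out, attn = gat([X_in, A_in])   # attn: one (N, N) coefficient matrix per head
model = Model(inputs=[X_in, A_in], outputs=out)
# With concat_heads=True, the 4 heads are concatenated: 8 * 4 = 32 features.
```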

#### GraphConvSkip

```
spektral.layers.GraphConvSkip(channels, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

A simple convolutional layer with a skip connection.

**Mode**: single, disjoint, mixed, batch.

This layer computes: $Z = \tilde{A} X W_1 + X W_2 + b,$ where $\tilde{A} = D^{-1/2} A D^{-1/2}$ does not have self-loops (unlike $\hat{A}$ in GraphConv).

**Input**

- Node features of shape `([batch], N, F)`;
- Normalized adjacency matrix of shape `([batch], N, N)`; can be computed with `spektral.utils.convolution.normalized_adjacency`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: number of output channels;
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.

#### APPNP

```
spektral.layers.APPNP(channels, alpha=0.2, propagations=1, mlp_hidden=None, mlp_activation='relu', dropout_rate=0.0, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

A graph convolutional layer implementing the APPNP operator, as presented by Klicpera et al. (2019).

This layer computes: $Z^{(0)} = \textrm{MLP}(X); \quad Z^{(t+1)} = (1 - \alpha) \hat{A} Z^{(t)} + \alpha Z^{(0)}; \quad Z = Z^{(K)},$ where $\alpha$ is the *teleport* probability, $K$ is the number of propagation steps, $\hat{A} = \hat{D}^{-1/2} (A + I) \hat{D}^{-1/2}$ is the modified Laplacian, and $\textrm{MLP}$ is a multi-layer perceptron.

**Mode**: single, disjoint, mixed, batch.

**Input**

- Node features of shape `([batch], N, F)`;
- Modified Laplacian of shape `([batch], N, N)`; can be computed with `spektral.utils.convolution.localpooling_filter`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: number of output channels;
- `alpha`: teleport probability during propagation;
- `propagations`: number of propagation steps;
- `mlp_hidden`: list of integers, number of hidden units for each hidden layer in the MLP (if None, the MLP has only the output layer);
- `mlp_activation`: activation for the MLP layers;
- `dropout_rate`: dropout rate for Laplacian and MLP layers;
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.
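A hedged single-mode sketch of the MLP configuration (all values illustrative; the filter is the same modified Laplacian used by GraphConv):

```
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import APPNP

N, F = 5, 3
X_in = Input(shape=(F,))
fltr_in = Input(shape=(N,))   # output of localpooling_filter

# MLP with two hidden layers of 32 units, followed by 10 propagation
# steps with teleport probability 0.1.
out = APPNP(channels=7, alpha=0.1, propagations=10,
            mlp_hidden=[32, 32], mlp_activation='relu')([X_in, fltr_in])
model = Model(inputs=[X_in, fltr_in], outputs=out)
```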

#### GINConv

```
spektral.layers.GINConv(channels, epsilon=None, mlp_hidden=None, mlp_activation='relu', activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

A Graph Isomorphism Network (GIN) as presented by Xu et al. (2018).

**Mode**: single, disjoint.

**This layer expects a sparse adjacency matrix.**

This layer computes for each node $i$: $Z_i = \textrm{MLP}\Big( (1 + \epsilon) \, X_i + \sum_{j \in \mathcal{N}(i)} X_j \Big),$ where $\textrm{MLP}$ is a multi-layer perceptron.

**Input**

- Node features of shape `(N, F)`;
- Binary adjacency matrix of shape `(N, N)`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: integer, number of output channels;
- `epsilon`: the $\epsilon$ parameter in the equation above (see Xu et al. (2018)). With `epsilon=None`, the parameter is learned (default behaviour); if given as a value, the parameter stays fixed;
- `mlp_hidden`: list of integers, number of hidden units for each hidden layer in the MLP (if None, the MLP has only the output layer);
- `mlp_activation`: activation for the MLP layers;
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.
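Because this layer expects a sparse adjacency matrix, the corresponding Keras input must be declared sparse; a minimal single-mode sketch (shapes illustrative):

```
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import GINConv

N, F = 5, 3
X_in = Input(shape=(F,))
A_in = Input(shape=(N,), sparse=True)   # sparse adjacency matrix
out = GINConv(channels=16, mlp_hidden=[16, 16])([X_in, A_in])
model = Model(inputs=[X_in, A_in], outputs=out)
```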

#### DiffusionConv

```
spektral.layers.DiffusionConv(channels, num_diffusion_steps=6, kernel_initializer='glorot_uniform', kernel_regularizer=None, kernel_constraint=None, activation='tanh')
```

Applies graph diffusion convolution as described by Li et al. (2016).

**Mode**: single, disjoint, mixed, batch.

**This layer expects a dense adjacency matrix.**

Given a number of diffusion steps $K$ and a row-normalized adjacency matrix $\tilde{A}$, this layer calculates the $q$-th channel as: $Z_{:,q} = \sigma\Big( \sum_{f=1}^{F} \Big( \sum_{k=0}^{K-1} \theta_k^{(q,f)} \tilde{A}^k \Big) X_{:,f} \Big).$

**Input**

- Node features of shape `([batch], N, F)`;
- Normalized adjacency or attention coefficient matrix of shape `([batch], N, N)`; use `DiffusionConv.preprocess` to normalize.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: number of output channels;
- `num_diffusion_steps`: number of diffusion steps to consider ($K$ in the paper);
- `activation`: activation function to use ($\tanh$ by default);
- `kernel_initializer`: initializer for the weights;
- `kernel_regularizer`: regularization applied to the weights;
- `kernel_constraint`: constraint applied to the weights.
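A hedged batch-mode sketch (shapes illustrative; the `preprocess` call follows the normalization note in the Input section above):

```
import numpy as np
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import DiffusionConv

N, F = 5, 3
A = np.ones((N, N))                   # toy adjacency matrix
fltr = DiffusionConv.preprocess(A)    # row-normalized adjacency matrix

X_in = Input(shape=(N, F))            # batch mode
fltr_in = Input(shape=(N, N))
out = DiffusionConv(channels=16, num_diffusion_steps=6)([X_in, fltr_in])
model = Model(inputs=[X_in, fltr_in], outputs=out)
```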

#### GatedGraphConv

```
spektral.layers.GatedGraphConv(channels, n_layers, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

A gated graph convolutional layer as presented by Li et al. (2018).

**Mode**: single, disjoint.

**This layer expects a sparse adjacency matrix.**

This layer repeatedly applies a GRU cell $L$ times to the node attributes: $h^{(0)}_i = X_i \,\|\, \mathbf{0}; \quad m^{(l)}_i = \sum_{j \in \mathcal{N}(i)} h^{(l-1)}_j W; \quad h^{(l)}_i = \textrm{GRU}\big(m^{(l)}_i, h^{(l-1)}_i\big),$ where $\textrm{GRU}$ is the GRU cell and the initial node attributes are zero-padded to `channels` dimensions.

**Input**

- Node features of shape `(N, F)`; note that `F` must be smaller than or equal to `channels`;
- Binary adjacency matrix of shape `(N, N)`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: integer, number of output channels;
- `n_layers`: integer, number of iterations with the GRU cell;
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.
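A single-mode sketch; `channels` must be at least `F` because the node attributes are zero-padded before the first GRU iteration (shapes illustrative):

```
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import GatedGraphConv

N, F = 5, 3
X_in = Input(shape=(F,))
A_in = Input(shape=(N,), sparse=True)    # sparse adjacency matrix
out = GatedGraphConv(channels=16, n_layers=4)([X_in, A_in])  # 16 >= F
model = Model(inputs=[X_in, A_in], outputs=out)
```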

#### AGNNConv

```
spektral.layers.AGNNConv(trainable=True, activation=None)
```

An Attention-based Graph Neural Network (AGNN) as presented by Thekumparampil et al. (2018).

**Mode**: single, disjoint.

**This layer expects a sparse adjacency matrix.**

This layer computes: $Z = PX,$ where $P_{ij} = \frac{\exp\big(\beta \cos(X_i, X_j)\big)}{\sum_{k \in \mathcal{N}(i) \cup \{i\}} \exp\big(\beta \cos(X_i, X_k)\big)}$ and $\beta$ is a trainable parameter.

**Input**

- Node features of shape `(N, F)`;
- Binary adjacency matrix of shape `(N, N)`.

**Output**

- Node features with the same shape as the input.

**Arguments**

- `trainable`: boolean, if True, then beta is a trainable parameter; otherwise, beta is fixed to 1;
- `activation`: activation function to use.

#### TAGConv

```
spektral.layers.TAGConv(channels, K=3, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

A Topology Adaptive Graph Convolutional layer (TAG) as presented by Du et al. (2017).

**Mode**: single, disjoint.

**This layer expects a sparse adjacency matrix.**

This layer computes: $Z = \sum_{k=0}^{K} \tilde{A}^k X W^{(k)} + b,$ where $\tilde{A} = D^{-1/2} A D^{-1/2}$ is the normalized adjacency matrix.

**Input**

- Node features of shape `(N, F)`;
- Binary adjacency matrix of shape `(N, N)`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: integer, number of output channels;
- `K`: the order of the layer (i.e., the layer will consider a K-hop neighbourhood for each node);
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.

#### CrystalConv

```
spektral.layers.CrystalConv(channels, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

A Crystal Graph Convolutional layer as presented by Xie & Grossman (2018).

**Mode**: single, disjoint.

**This layer expects a sparse adjacency matrix.**

This layer computes for each node $i$: $X'_i = X_i + \sum_{j \in \mathcal{N}(i)} \sigma\big( z_{ij} W_f + b_f \big) \odot g\big( z_{ij} W_s + b_s \big),$ where $z_{ij} = X_i \,\|\, X_j \,\|\, E_{ij}$, $\sigma$ is a sigmoid activation, and $g$ is the activation function (defined by the `activation` argument).

**Input**

- Node features of shape `(N, F)`;
- Binary adjacency matrix of shape `(N, N)`;
- Edge features of shape `(num_edges, S)`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: integer, number of output channels;
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.

#### EdgeConv

```
spektral.layers.EdgeConv(channels, mlp_hidden=None, mlp_activation='relu', activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
```

An Edge Convolutional layer as presented by Wang et al. (2018).

**Mode**: single, disjoint.

**This layer expects a sparse adjacency matrix.**

This layer computes for each node $i$: $X'_i = \sum_{j \in \mathcal{N}(i)} \textrm{MLP}\big( X_i \,\|\, X_j - X_i \big),$ where $\textrm{MLP}$ is a multi-layer perceptron.

**Input**

- Node features of shape `(N, F)`;
- Binary adjacency matrix of shape `(N, N)`.

**Output**

- Node features with the same shape as the input, but with the last dimension changed to `channels`.

**Arguments**

- `channels`: integer, number of output channels;
- `mlp_hidden`: list of integers, number of hidden units for each hidden layer in the MLP (if None, the MLP has only the output layer);
- `mlp_activation`: activation for the MLP layers;
- `activation`: activation function to use;
- `use_bias`: bool, add a bias vector to the output;
- `kernel_initializer`: initializer for the weights;
- `bias_initializer`: initializer for the bias vector;
- `kernel_regularizer`: regularization applied to the weights;
- `bias_regularizer`: regularization applied to the bias vector;
- `activity_regularizer`: regularization applied to the output;
- `kernel_constraint`: constraint applied to the weights;
- `bias_constraint`: constraint applied to the bias vector.
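Since this layer comes from a point-cloud method, a common pattern is to build a k-nearest-neighbour graph over the points and use it as the sparse adjacency matrix. A hedged sketch; the k-NN construction via scikit-learn is an illustrative choice, not part of Spektral:

```
import numpy as np
from sklearn.neighbors import kneighbors_graph
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from spektral.layers import EdgeConv

N, F = 100, 3
points = np.random.rand(N, F)                  # toy point cloud
A = kneighbors_graph(points, n_neighbors=10)   # sparse binary k-NN adjacency

X_in = Input(shape=(F,))
A_in = Input(shape=(N,), sparse=True)          # sparse adjacency matrix
out = EdgeConv(channels=32, mlp_hidden=[32])([X_in, A_in])
model = Model(inputs=[X_in, A_in], outputs=out)
```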

#### MessagePassing

```
spektral.layers.MessagePassing(aggregate='sum')
```

A general class for message passing as presented by Gilmer et al. (2017).

**Mode**: single, disjoint.

**This layer and all of its extensions expect a sparse adjacency matrix.**

This layer computes: $Z_i = \gamma\Big( X_i, \, \square_{j \in \mathcal{N}(i)} \, \phi\big(X_i, X_j, E_{ji}\big) \Big),$

where $\gamma$ is a differentiable update function, $\phi$ is a differentiable message function, $\square$ is a permutation-invariant function to aggregate the messages (like the sum or the average), and $E_{ji}$ is the attribute of edge $j \to i$.

By extending this class, it is possible to create any message-passing layer in single/disjoint mode.

**API:**

- `propagate(X, A, E=None, **kwargs)`: propagates the messages and computes embeddings for each node in the graph. `kwargs` will be propagated as keyword arguments to `message()`, `aggregate()` and `update()`.
- `message(X, **kwargs)`: computes messages, equivalent to $\phi$ in the definition. Any extra keyword argument of this function will be populated by `propagate()` if a matching keyword is found. Use `self.get_i()` and `self.get_j()` to gather the elements using the indices `i` or `j` of the adjacency matrix (e.g., `self.get_j(X)` will get the features of the neighbours).
- `aggregate(messages, **kwargs)`: aggregates the messages, equivalent to $\square$ in the definition. The behaviour of this function can also be controlled using the `aggregate` keyword in the constructor of the layer (supported aggregations: sum, mean, max, min, prod). Any extra keyword argument of this function will be populated by `propagate()` if a matching keyword is found.
- `update(embeddings, **kwargs)`: updates the aggregated messages to obtain the final node embeddings, equivalent to $\gamma$ in the definition. Any extra keyword argument of this function will be populated by `propagate()` if a matching keyword is found.

**Arguments**:

- `aggregate`: string or callable, an aggregation function. This flag can be used to control the behaviour of `aggregate()` without re-implementing it. Supported aggregations: `'sum'`, `'mean'`, `'max'`, `'min'`, `'prod'`. If callable, the function must have the signature `foo(updates, indices, N)` and return a rank-2 tensor with shape `(N, ...)`.
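As an illustration of the API above, here is a hedged sketch of a custom layer built by extending this class (the `SumNeighbours` layer itself is hypothetical; only `propagate`, `message`, `get_j`, and the `aggregate` constructor keyword come from the documented API):

```
from spektral.layers import MessagePassing

class SumNeighbours(MessagePassing):
    """Hypothetical example layer: projects the neighbours' features with a
    dense kernel, then sums the resulting messages over each neighbourhood."""

    def __init__(self, channels, **kwargs):
        super().__init__(aggregate='sum', **kwargs)  # controls aggregate()
        self.channels = channels

    def build(self, input_shape):
        F = input_shape[0][-1]  # input_shape is [X_shape, A_shape]
        self.kernel = self.add_weight(shape=(F, self.channels), name='kernel')

    def message(self, X, **kwargs):
        # get_j() gathers the features of the neighbours (index j of A)
        return self.get_j(X) @ self.kernel

    def call(self, inputs):
        X, A = inputs  # A must be a SparseTensor (single/disjoint mode)
        return self.propagate(X, A)
```

The default `update()` is left untouched, so the layer's output is simply the aggregated messages; overriding `update()` would correspond to choosing a non-trivial $\gamma$.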