Loaders

Loader

spektral.data.loaders.Loader(dataset, batch_size=1, epochs=None, shuffle=True)

Parent class for data loaders. The role of a Loader is to iterate over a Dataset and yield batches of graphs to feed your Keras Models.

This is achieved by having a generator object that produces lists of Graphs, which are then collated together and returned as Tensor-like objects.
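The generator step described above can be pictured with a minimal sketch. The function name batch_generator, its seed argument, and the string placeholders used in place of Graph objects are assumptions made for illustration; this is not necessarily how the library implements it.

```python
import random


def batch_generator(items, batch_size=1, epochs=None, shuffle=True, seed=None):
    """Yield lists of at most `batch_size` items, epoch after epoch.

    Hypothetical stand-in for the generator inside a Loader: `items` plays
    the role of the Dataset, and each yielded list is what collate() receives.
    """
    rng = random.Random(seed)
    order = list(range(len(items)))
    epoch = 0
    # epochs=None means "iterate indefinitely".
    while epochs is None or epoch < epochs:
        if shuffle:
            rng.shuffle(order)  # reshuffle at the start of each epoch
        for start in range(0, len(order), batch_size):
            yield [items[i] for i in order[start:start + batch_size]]
        epoch += 1


graphs = ["g0", "g1", "g2", "g3", "g4"]  # placeholders for Graph objects
batches = list(batch_generator(graphs, batch_size=2, epochs=1, shuffle=False))
print(batches)  # [['g0', 'g1'], ['g2', 'g3'], ['g4']]
```

Each yielded list would then be passed to collate(batch) to obtain the Tensor-like objects.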

The core of a Loader is the collate(batch) method. This takes as input a list of Graphs and returns a list of Tensors or SparseTensors.

For instance, if all graphs have the same number of nodes and size of the attributes, a simple collation function can be:

import numpy as np

def collate(self, batch):
    # Stack the per-graph arrays along a new leading (batch) dimension.
    x = np.array([g.x for g in batch])
    a = np.array([g.a for g in batch])
    return x, a

The load() method of a Loader returns an object that can be given as input to Model.fit(). You can use it as follows:

model.fit(loader.load(), steps_per_epoch=loader.steps_per_epoch)

The steps_per_epoch property represents the number of batches that are in an epoch, and is a required keyword when calling model.fit() with a Loader.
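As a rough illustration of what this number means, the count of batches in an epoch can be computed as below. This is a sketch under the assumption that the last, possibly smaller, batch is kept rather than dropped; the standalone function is hypothetical and only mirrors the property's name.

```python
import math


def steps_per_epoch(n_graphs, batch_size):
    # Assumption: a final partial batch still counts as one step.
    return math.ceil(n_graphs / batch_size)


print(steps_per_epoch(1000, 32))  # 32
print(steps_per_epoch(64, 32))    # 2
```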

If you want to write your own training function, you can use the tf_signature() method to specify the signature of your batches using the tf.TypeSpec system, in order to avoid unnecessary re-tracings.

For example, a simple training function can be written as:

@tf.function(input_signature=loader.tf_signature())
def train_step(inputs, target):
    with tf.GradientTape() as tape:
        predictions = model(inputs, training=True)
        loss = loss_fn(target, predictions) + sum(model.losses)
    gradients = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(gradients, model.trainable_variables))

We can then train our model in a for loop as follows:

for batch in loader:
    train_step(*batch)

Arguments

  • dataset: a graph Dataset;
  • batch_size: size of the mini-batches;
  • epochs: number of epochs to iterate over the dataset. By default (None) iterates indefinitely;
  • shuffle: whether to shuffle the data at the start of each epoch.

SingleLoader

spektral.data.loaders.SingleLoader(dataset, epochs=None, sample_weights=None)

A Loader for single mode.

This loader produces Tensors representing a single graph. As such, it can only be used with Datasets of length 1 and the batch_size cannot be set.

The loader supports sample weights through the sample_weights argument. If given, then each batch will be a tuple (inputs, labels, sample_weights).

Arguments

  • dataset: a graph Dataset;
  • epochs: number of epochs to iterate over the dataset. By default (None) iterates indefinitely;
  • sample_weights: if given, these will be appended to the output automatically.

Output

Returns a tuple (inputs, labels) or (inputs, labels, sample_weights).

inputs is a tuple containing the data matrices of the graph, only if they are not None:

  • x: same as dataset[0].x;
  • a: same as dataset[0].a (scipy sparse matrices are converted to SparseTensors);
  • e: same as dataset[0].e;

labels is the same as dataset[0].y. sample_weights is the same object passed to the constructor.
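The layout of the output can be mimicked with a small helper. This is only a sketch: single_mode_output is a hypothetical function, plain lists stand in for arrays, and the conversion of sparse matrices to SparseTensors is left out.

```python
def single_mode_output(x, a, e, y, sample_weights=None):
    """Hypothetical helper mirroring the tuple layout described above."""
    # Keep only the data matrices that are actually present.
    inputs = tuple(m for m in (x, a, e) if m is not None)
    if sample_weights is None:
        return inputs, y
    # Sample weights, when given, are appended as a third element.
    return inputs, y, sample_weights


x = [[1.0, 0.0], [0.0, 1.0]]  # node attributes of a 2-node graph
a = [[0, 1], [1, 0]]          # adjacency matrix
y = [0, 1]                    # node labels
weights = [1.0, 0.0]          # e.g. a training mask used as sample weights

inputs, labels = single_mode_output(x, a, None, y)
print(len(inputs))  # 2, because e is None and is therefore skipped

out = single_mode_output(x, a, None, y, sample_weights=weights)
print(len(out))     # 3
```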


DisjointLoader

spektral.data.loaders.DisjointLoader(dataset, node_level=False, batch_size=1, epochs=None, shuffle=True)

A Loader for disjoint mode.

This loader represents a batch of graphs via their disjoint union.

The loader automatically computes a batch index tensor, containing integer indices that map each node to its corresponding graph in the batch.

The adjacency matrix is returned as a SparseTensor, regardless of the input.

If node_level=False, the labels are interpreted as graph-level labels and are stacked along an additional dimension. If node_level=True, then the labels are stacked vertically.
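To make the disjoint union concrete, here is a minimal NumPy sketch. The helper to_disjoint_dense is hypothetical and builds a dense block-diagonal adjacency matrix for readability, whereas the loader itself returns a SparseTensor.

```python
import numpy as np


def to_disjoint_dense(x_list, a_list):
    """Hypothetical dense sketch of the disjoint union of a list of graphs."""
    n_nodes = [x.shape[0] for x in x_list]
    total = sum(n_nodes)
    # Node attributes are stacked vertically.
    x = np.vstack(x_list)
    # Adjacency matrices are placed on the diagonal of one larger matrix.
    a = np.zeros((total, total), dtype=a_list[0].dtype)
    offset = 0
    for a_g, n in zip(a_list, n_nodes):
        a[offset:offset + n, offset:offset + n] = a_g
        offset += n
    # Batch index: entry k is the position in the batch of the graph owning node k.
    i = np.repeat(np.arange(len(n_nodes)), n_nodes)
    return x, a, i


x1, a1 = np.ones((2, 3)), np.array([[0, 1], [1, 0]])
x2, a2 = np.ones((3, 3)), np.ones((3, 3), dtype=int) - np.eye(3, dtype=int)
x, a, i = to_disjoint_dense([x1, x2], [a1, a2])
print(x.shape, a.shape, i)  # (5, 3) (5, 5) [0 0 1 1 1]

# Graph-level labels gain a batch dimension; node-level labels are stacked vertically.
y_graph = np.stack([np.array([1, 0]), np.array([0, 1])])  # shape (2, 2)
y_node = np.vstack([np.zeros((2, 1)), np.ones((3, 1))])   # shape (5, 1)
```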

Note: TensorFlow 2.4 or above is required to use this Loader's load() method in a Keras training loop.

Arguments

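  • dataset: a graph Dataset;
  • node_level: whether the labels are node-level (True, stacked vertically) or graph-level (False, stacked along an additional dimension);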
  • batch_size: size of the mini-batches;
  • epochs: number of epochs to iterate over the dataset. By default (None) iterates indefinitely;
  • shuffle: whether to shuffle the data at the start of each epoch.

Output

For each batch, returns a tuple (inputs, labels).

inputs is a tuple containing:

  • x: node attributes of shape [n_nodes, n_node_features];
  • a: adjacency matrices of shape [n_nodes, n_nodes];
  • e: edge attributes of shape [n_edges, n_edge_features];
  • i: batch index of shape [n_nodes].

labels have shape [batch, n_labels] if node_level=False or [n_nodes, n_labels] otherwise.


BatchLoader

spektral.data.loaders.BatchLoader(dataset, batch_size=1, epochs=None, shuffle=True)

A Loader for batch mode.

This loader returns batches of graphs stacked along an extra dimension, with all "node" dimensions padded to be equal among all graphs.

If n_max is the number of nodes of the biggest graph in the batch, then the padding consists of adding zeros to the node features, adjacency matrix, and edge attributes of each graph so that they have shapes (n_max, n_node_features), (n_max, n_max), and (n_max, n_max, n_edge_features) respectively.
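The padding rule can be sketched in NumPy as follows. The helper pad_batch is hypothetical and handles only node features and adjacency matrices; edge attributes would follow the same pattern with one more trailing dimension.

```python
import numpy as np


def pad_batch(x_list, a_list):
    """Hypothetical sketch: zero-pad each graph to n_max and stack the results."""
    n_max = max(x.shape[0] for x in x_list)
    n_features = x_list[0].shape[1]
    x_out = np.zeros((len(x_list), n_max, n_features))
    a_out = np.zeros((len(a_list), n_max, n_max))
    for k, (x, a) in enumerate(zip(x_list, a_list)):
        n = x.shape[0]
        x_out[k, :n] = x      # rows beyond n stay zero
        a_out[k, :n, :n] = a  # rows and columns beyond n stay zero
    return x_out, a_out


x_list = [np.ones((2, 4)), np.ones((5, 4))]
a_list = [np.ones((2, 2)), np.ones((5, 5))]
x, a = pad_batch(x_list, a_list)
print(x.shape, a.shape)  # (2, 5, 4) (2, 5, 5)
```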

The zero-padding is done batch-wise, which saves memory at the cost of more computation. If latency is an issue but memory isn't, or if the dataset has graphs with a similar number of nodes, you can use the PackedBatchLoader, which first zero-pads the whole dataset and then iterates over it.

Note that the adjacency matrix and edge attributes are returned as dense arrays (mostly due to the lack of support for sparse tensor operations for rank >2).

Only graph-level labels are supported with this loader (i.e., labels are not zero-padded because they are assumed to have no "node" dimensions).

Arguments

  • dataset: a graph Dataset;
  • batch_size: size of the mini-batches;
  • epochs: number of epochs to iterate over the dataset. By default (None) iterates indefinitely;
  • shuffle: whether to shuffle the data at the start of each epoch.

Output

For each batch, returns a tuple (inputs, labels).

inputs is a tuple containing:

  • x: node attributes of shape [batch, n_max, n_node_features];
  • a: adjacency matrices of shape [batch, n_max, n_max];
  • e: edge attributes of shape [batch, n_max, n_max, n_edge_features].

labels have shape [batch, n_labels].


PackedBatchLoader

spektral.data.loaders.PackedBatchLoader(dataset, batch_size=1, epochs=None, shuffle=True)

A BatchLoader that zero-pads the graphs before iterating over the dataset. This means that n_max is computed over the whole dataset and not just a single batch.

While using more memory than BatchLoader, this loader should reduce the computational overhead of padding each batch independently.

Use this loader if:

  • memory usage isn't an issue and you want to produce the batches as fast as possible;
  • the graphs in the dataset have similar sizes and there are no outliers in the dataset (i.e., anomalous graphs with many more nodes than the dataset average).
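The effect of outliers on memory can be estimated with a short calculation. The helper padded_node_slots is hypothetical; it counts node rows only (adjacency matrices grow quadratically, so the real gap is larger) and assumes batches are formed in dataset order without shuffling.

```python
def padded_node_slots(sizes, batch_size, per_batch=True):
    """Count the node rows materialized after zero-padding.

    per_batch=True pads each batch to its own n_max (batch-wise padding);
    per_batch=False pads every graph to the dataset-wide n_max (pre-packing).
    """
    batches = [sizes[k:k + batch_size] for k in range(0, len(sizes), batch_size)]
    global_max = max(sizes)
    total = 0
    for batch in batches:
        n_max = max(batch) if per_batch else global_max
        total += n_max * len(batch)
    return total


similar = [10, 11, 9, 10]   # number of nodes per graph, similar sizes
outlier = [10, 11, 9, 100]  # same dataset with one anomalous graph

print(padded_node_slots(similar, 2, True), padded_node_slots(similar, 2, False))  # 42 44
print(padded_node_slots(outlier, 2, True), padded_node_slots(outlier, 2, False))  # 222 400
```

With similar sizes the two strategies cost almost the same, while a single outlier nearly doubles the footprint of pre-packing in this toy case.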

Arguments

  • dataset: a graph Dataset;
  • batch_size: size of the mini-batches;
  • epochs: number of epochs to iterate over the dataset. By default (None) iterates indefinitely;
  • shuffle: whether to shuffle the data at the start of each epoch.

Output

For each batch, returns a tuple (inputs, labels).

inputs is a tuple containing:

  • x: node attributes of shape [batch, n_max, n_node_features];
  • a: adjacency matrices of shape [batch, n_max, n_max];
  • e: edge attributes of shape [batch, n_max, n_max, n_edge_features].

labels have shape [batch, ..., n_labels].


MixedLoader

spektral.data.loaders.MixedLoader(dataset, batch_size=1, epochs=None, shuffle=True)

A Loader for mixed mode.

This loader returns batches where the node and edge attributes are stacked along an extra dimension, but the adjacency matrix is shared by all graphs.

The loader expects all graphs to have the same number of nodes and edges, so that their node and edge features can be stacked. The dataset is pre-packed like in a PackedBatchLoader.
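A minimal NumPy sketch of this layout is given below. The helper collate_mixed is hypothetical, uses a dense adjacency matrix, and ignores edge attributes; it only illustrates that node features are stacked while a single adjacency matrix is shared.

```python
import numpy as np


def collate_mixed(x_list, a):
    """Hypothetical sketch: stack node features, keep one shared adjacency matrix."""
    n_nodes = a.shape[0]
    for x in x_list:
        if x.shape[0] != n_nodes:
            raise ValueError("all graphs must have the same number of nodes")
    # Node features gain a leading batch dimension; `a` is returned unchanged.
    return np.stack(x_list), a


a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])  # one 3-node path graph shared by every sample
x_list = [np.full((3, 2), float(k)) for k in range(4)]  # 4 graph signals
x, a_shared = collate_mixed(x_list, a)
print(x.shape, a_shared.shape)  # (4, 3, 2) (3, 3)
```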

Arguments

  • dataset: a graph Dataset;
  • batch_size: size of the mini-batches;
  • epochs: number of epochs to iterate over the dataset. By default (None) iterates indefinitely;
  • shuffle: whether to shuffle the data at the start of each epoch.

Output

For each batch, returns a tuple (inputs, labels).

inputs is a tuple containing:

  • x: node attributes of shape [batch, n_nodes, n_node_features];
  • a: adjacency matrix of shape [n_nodes, n_nodes];
  • e: edge attributes of shape [batch, n_edges, n_edge_features].

labels have shape [batch, ..., n_labels].