Data

Graph

spektral.data.graph.Graph(x=None, a=None, e=None, y=None)

A container to represent a graph. The data associated with the Graph is stored in its attributes:

  • x, for the node features;
  • a, for the adjacency matrix;
  • e, for the edge attributes;
  • y, for the node or graph labels;

All of these default to None if you don't specify them in the constructor. If you want to read all non-None attributes at once, you can call the numpy() method, which will return all data in a tuple (with the order defined above).

Graphs also have the following attributes that are computed automatically from the data:

  • n_nodes: number of nodes;
  • n_edges: number of edges;
  • n_node_features: size of the node features, if available;
  • n_edge_features: size of the edge features, if available;
  • n_labels: size of the labels, if available;

Any additional kwargs passed to the constructor will be automatically assigned as instance attributes of the graph.
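
The container behaviour described above can be sketched in a few lines of plain Python. SimpleGraph below is a hypothetical stand-in, not Spektral's Graph class. Nested lists replace Numpy arrays so that the sketch has no dependencies, and counting every non-zero adjacency entry as an edge is an assumption of this sketch.

```python
class SimpleGraph:
    """Hypothetical stand-in for a graph container (not spektral.data.Graph)."""

    def __init__(self, x=None, a=None, e=None, y=None, **kwargs):
        self.x, self.a, self.e, self.y = x, a, e, y
        # Extra keyword arguments become instance attributes.
        for name, value in kwargs.items():
            setattr(self, name, value)

    def numpy(self):
        # Return the non-None attributes as a tuple, in the order x, a, e, y.
        return tuple(v for v in (self.x, self.a, self.e, self.y) if v is not None)

    @property
    def n_nodes(self):
        if self.x is not None:
            return len(self.x)
        return len(self.a) if self.a is not None else None

    @property
    def n_edges(self):
        # Assumption: an edge is any non-zero entry of the adjacency matrix.
        if self.a is None:
            return None
        return sum(1 for row in self.a for value in row if value != 0)

    @property
    def n_node_features(self):
        return len(self.x[0]) if self.x is not None else None


g = SimpleGraph(
    x=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # 3 nodes, 2 features each
    a=[[0, 1, 0], [1, 0, 1], [0, 1, 0]],     # path graph 0-1-2, stored symmetrically
    name="toy",                               # extra kwarg, stored as g.name
)
print(g.n_nodes, g.n_edges, g.n_node_features, g.name)
print(len(g.numpy()))  # only x and a are set, so the tuple has two entries
```

Because e and y were not given, numpy() returns only (x, a) here.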

Data can be stored in Numpy arrays or Scipy sparse matrices, and labels can also be scalars.

Spektral usually assumes that the different data matrices have specific shapes, but does not strictly enforce them, to allow more flexibility. In general, node attributes should have shape (n_nodes, n_node_features) and the adjacency matrix should have shape (n_nodes, n_nodes).

Edge attributes can be stored in a dense format as arrays of shape (n_nodes, n_nodes, n_edge_features) or in a sparse format as arrays of shape (n_edges, n_edge_features) (so that you don't have to store all the zeros for missing edges). Most components of Spektral will know how to deal with both situations automatically.
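
To illustrate how the two edge-attribute layouts relate, here is a plain-Python sketch that converts the dense layout into an edge list. The helper dense_to_edge_list is hypothetical, and the row-by-row ordering of the edges is an assumption of this sketch rather than a documented guarantee.

```python
def dense_to_edge_list(a, e_dense):
    """Keep one feature vector per existing edge, scanning `a` row by row."""
    edges, e_list = [], []
    for i, row in enumerate(a):
        for j, value in enumerate(row):
            if value != 0:
                edges.append((i, j))
                e_list.append(e_dense[i][j])
    return edges, e_list


# Two nodes connected in both directions, one feature per edge.
a = [[0, 1],
     [1, 0]]
# Dense layout: shape (n_nodes, n_nodes, n_edge_features) = (2, 2, 1).
e_dense = [[[0.0], [0.5]],
           [[0.7], [0.0]]]

edges, e_list = dense_to_edge_list(a, e_dense)
print(edges)   # index pairs of the existing edges
print(e_list)  # sparse layout: shape (n_edges, n_edge_features) = (2, 1)
```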

Labels can refer to the entire graph (shape (n_labels, )) or to each individual node (shape (n_nodes, n_labels)).

Arguments

  • x: np.array, the node features (shape (n_nodes, n_node_features));
  • a: np.array or scipy.sparse matrix, the adjacency matrix (shape (n_nodes, n_nodes));
  • e: np.array, the edge features (shape (n_nodes, n_nodes, n_edge_features) or (n_edges, n_edge_features));
  • y: np.array, the node or graph labels (shape (n_nodes, n_labels) or (n_labels, ));

Dataset

spektral.data.dataset.Dataset(transforms=None)

A container for Graph objects. This class can be extended to represent a graph dataset.

To create a Dataset, you must implement the Dataset.read() method, which must return a list of spektral.data.Graph objects:

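from spektral.data import Dataset, Graph

class MyDataset(Dataset):
    def read(self):
        # some_magic_list is a placeholder for your own (x, a, y) triples
        return [Graph(x=x, a=a, y=y) for x, a, y in some_magic_list]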

The download() method is called automatically if the path returned by Dataset.path does not exist (default ~/spektral/datasets/ClassName/).

In this case, download() will be called before read().
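
The order of the two hooks can be sketched as follows. DownloadingDataset is a hypothetical class, not Spektral's Dataset: passing the path to the constructor and recording the calls in a list are simplifications made for this illustration.

```python
import os
import tempfile


class DownloadingDataset:
    """Hypothetical sketch of the download-then-read flow (not Spektral's Dataset)."""

    def __init__(self, path):
        self.path = path
        self.calls = []  # records the order of the hooks, for illustration only
        if not os.path.exists(self.path):
            self.download()
        self.graphs = self.read()

    def download(self):
        # A real dataset would fetch and store its raw files here.
        self.calls.append("download")
        os.makedirs(self.path)

    def read(self):
        # A real dataset would load the files and return a list of graphs.
        self.calls.append("read")
        return []


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "DownloadingDataset")
    first = DownloadingDataset(path)   # path is missing: download(), then read()
    second = DownloadingDataset(path)  # path now exists: read() only

print(first.calls)
print(second.calls)
```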

Datasets should generally behave like Numpy arrays for any operation that uses simple 1D indexing:

>>> dataset[0]
Graph(...)

>>> dataset[[1, 2, 3]]
Dataset(n_graphs=3)

>>> dataset[1:10]
Dataset(n_graphs=9)

>>> np.random.shuffle(dataset)  # shuffle in-place

>>> for graph in dataset[:3]:
...     print(graph)
Graph(...)
Graph(...)
Graph(...)
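
The indexing rules shown above (an integer returns one graph, a list or slice returns a new dataset) can be sketched with a small container. GraphList is a hypothetical class, not Spektral's Dataset, and strings stand in for Graph objects.

```python
class GraphList:
    """Hypothetical container mimicking the 1D indexing rules (not Spektral's Dataset)."""

    def __init__(self, graphs):
        self.graphs = list(graphs)

    def __len__(self):
        return len(self.graphs)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.graphs[key]            # single graph
        if isinstance(key, slice):
            return GraphList(self.graphs[key])  # sub-dataset
        # Any other key is treated as a sequence of integer indices.
        return GraphList([self.graphs[i] for i in key])

    def __repr__(self):
        return f"GraphList(n_graphs={len(self)})"


dataset = GraphList(f"graph_{i}" for i in range(12))

print(dataset[0])
print(dataset[[1, 2, 3]])
print(dataset[1:10])
for graph in dataset[:3]:
    print(graph)
```

Iteration works here without an explicit __iter__ because Python falls back to calling __getitem__ with 0, 1, 2, ... until an IndexError is raised.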

Datasets have the following properties that are automatically computed:

  • n_nodes: the number of nodes in the dataset (always None, except in single and mixed mode datasets);
  • n_node_features: the size of the node features (assumed to be equal for all graphs);
  • n_edge_features: the size of the edge features (assumed to be equal for all graphs);
  • n_labels: the size of the labels (assumed to be equal for all graphs); this is computed as y.shape[-1].

Any additional kwargs passed to the constructor will be automatically assigned as instance attributes of the dataset.

Datasets also offer three main manipulation functions to apply callables to their graphs:

  • apply(transform): replaces each graph with the output of transform(graph). See spektral.transforms for some ready-to-use transforms.
    Example: apply(spektral.transforms.NormalizeAdj()) normalizes the adjacency matrix of each graph in the dataset.
  • map(transform, reduce=None): returns a list containing the output of transform(graph) for each graph. If reduce is a callable, then returns reduce(output_list).
    Example: map(lambda g: g.n_nodes, reduce=np.mean) returns the average number of nodes in the dataset.
  • filter(function): removes from the dataset any graph for which function(graph) is False.
    Example: filter(lambda g: g.n_nodes < 100) removes from the dataset all graphs with 100 or more nodes.
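
The semantics of the three functions can be sketched in plain Python. TransformableDataset and add_size_flag are hypothetical names, SimpleNamespace objects stand in for graphs, and statistics.mean replaces np.mean so that the sketch has no dependencies.

```python
from statistics import mean
from types import SimpleNamespace


class TransformableDataset:
    """Hypothetical sketch of apply/map/filter (not Spektral's Dataset)."""

    def __init__(self, graphs):
        self.graphs = list(graphs)

    def apply(self, transform):
        # Replace every graph with the output of transform(graph).
        self.graphs = [transform(g) for g in self.graphs]

    def map(self, transform, reduce=None):
        # Collect the outputs; optionally reduce them to a single value.
        out = [transform(g) for g in self.graphs]
        return reduce(out) if callable(reduce) else out

    def filter(self, function):
        # Keep only the graphs for which function(graph) is truthy.
        self.graphs = [g for g in self.graphs if function(g)]


def add_size_flag(g):
    # Toy transform: annotate the graph and return it.
    g.is_small = g.n_nodes < 100
    return g


dataset = TransformableDataset(SimpleNamespace(n_nodes=n) for n in (10, 50, 100, 240))

dataset.apply(add_size_flag)
avg_nodes = dataset.map(lambda g: g.n_nodes, reduce=mean)
dataset.filter(lambda g: g.n_nodes < 100)
remaining = dataset.map(lambda g: g.n_nodes)

print(avg_nodes)
print(remaining)
```

Note that apply and filter modify the dataset, while map only returns a value.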

Datasets in mixed mode (one adjacency matrix, many instances of node features) are expected to have a particular structure. The graphs returned by read() should not have an adjacency matrix; the adjacency matrix should instead be stored as a singleton in the dataset's a attribute. For example:

class MyMixedModeDataset(Dataset):
    def read(self):
        self.a = compute_adjacency_matrix()
        return [Graph(x=x, y=y) for x, y in some_magic_list]

Have a look at the spektral.datasets module for examples of popular datasets already implemented.

Arguments

  • transforms: a callable or list of callables that are automatically applied to the graphs after loading the dataset.

Data utils

to_disjoint

spektral.data.utils.to_disjoint(x_list=None, a_list=None, e_list=None)

Converts lists of node features, adjacency matrices and edge features to disjoint mode.

Either the node features or the adjacency matrices must be provided as input.

The i-th element of each list must be associated with the i-th graph.

The method also computes the batch index to retrieve individual graphs from the disjoint union.

Edge attributes can be represented as:

  • a dense array of shape (n_nodes, n_nodes, n_edge_features);
  • a sparse edge list of shape (n_edges, n_edge_features);

and they will always be returned as a stacked edge list.

Arguments

  • x_list: a list of np.arrays of shape (n_nodes, n_node_features) -- note that n_nodes can change between graphs;

  • a_list: a list of np.arrays or scipy.sparse matrices of shape (n_nodes, n_nodes);

  • e_list: a list of np.arrays of shape (n_nodes, n_nodes, n_edge_features) or (n_edges, n_edge_features);

Return
Only if the corresponding list is given as input:

  • x: np.array of shape (n_nodes, n_node_features);
  • a: scipy.sparse matrix of shape (n_nodes, n_nodes);
  • e: np.array of shape (n_edges, n_edge_features);
  • i: np.array of shape (n_nodes, );
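
As a rough illustration of what disjoint mode means, the sketch below stacks the node features, places the adjacency matrices on the diagonal of one larger matrix, and builds the batch index. disjoint_union is a hypothetical helper that works on nested lists and returns a dense adjacency matrix; it is not Spektral's implementation, which returns a scipy.sparse matrix.

```python
def disjoint_union(x_list, a_list):
    """Stack node features, build a block-diagonal adjacency and a batch index."""
    n_total = sum(len(x) for x in x_list)
    # Node features of all graphs, one after the other.
    x = [row for x_graph in x_list for row in x_graph]
    a = [[0] * n_total for _ in range(n_total)]
    i = []
    offset = 0
    for graph_id, a_graph in enumerate(a_list):
        n = len(a_graph)
        # Copy this graph's adjacency into its own diagonal block.
        for r in range(n):
            for c in range(n):
                a[offset + r][offset + c] = a_graph[r][c]
        # Every node remembers which graph it came from.
        i.extend([graph_id] * n)
        offset += n
    return x, a, i


x_list = [[[1.0], [2.0]], [[3.0], [4.0], [5.0]]]   # graphs with 2 and 3 nodes
a_list = [[[0, 1], [1, 0]],
          [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]

x, a, i = disjoint_union(x_list, a_list)
print(len(x), i)
for row in a:
    print(row)
```

The batch index i is what lets a model pool or split the stacked nodes back into their original graphs.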

to_batch

spektral.data.utils.to_batch(x_list=None, a_list=None, e_list=None, mask=False)

Converts lists of node features, adjacency matrices and edge features to batch mode, by zero-padding all tensors to have the same node dimension n_max.

Either the node features or the adjacency matrices must be provided as input.

The i-th element of each list must be associated with the i-th graph.

If a_list contains sparse matrices, they will be converted to dense np.arrays.

The edge attributes of a graph can be represented as:

  • a dense array of shape (n_nodes, n_nodes, n_edge_features);
  • a sparse edge list of shape (n_edges, n_edge_features);

and they will always be returned as dense arrays.

Arguments

  • x_list: a list of np.arrays of shape (n_nodes, n_node_features) -- note that n_nodes can change between graphs;

  • a_list: a list of np.arrays or scipy.sparse matrices of shape (n_nodes, n_nodes);

  • e_list: a list of np.arrays of shape (n_nodes, n_nodes, n_edge_features) or (n_edges, n_edge_features);

  • mask: bool, if True, node attributes will be extended with a binary mask that indicates valid nodes (the last feature of each node will be 1 if the node is valid and 0 otherwise). Use this flag in conjunction with layers.base.GraphMasking to start the propagation of masks in a model.

Return
Only if the corresponding list is given as input:

  • x: np.array of shape (batch, n_max, n_node_features);
  • a: np.array of shape (batch, n_max, n_max);
  • e: np.array of shape (batch, n_max, n_max, n_edge_features);
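
The zero-padding and the optional mask can be sketched as follows. pad_batch is a hypothetical helper that works on nested lists and ignores edge attributes; it only illustrates the idea and is not Spektral's implementation.

```python
def pad_batch(x_list, a_list, mask=False):
    """Zero-pad every graph to n_max nodes; optionally append a validity flag."""
    n_max = max(len(x) for x in x_list)
    n_features = len(x_list[0][0])
    width = n_features + (1 if mask else 0)
    x_out, a_out = [], []
    for x, a in zip(x_list, a_list):
        n = len(x)
        # Real nodes get a trailing 1.0 when mask=True.
        rows = [list(r) + ([1.0] if mask else []) for r in x]
        # Padding nodes are all zeros, so their mask entry is 0.0.
        rows += [[0.0] * width for _ in range(n_max - n)]
        x_out.append(rows)
        # Pad the adjacency matrix with zero columns, then zero rows.
        padded_a = [list(r) + [0] * (n_max - n) for r in a]
        padded_a += [[0] * n_max for _ in range(n_max - n)]
        a_out.append(padded_a)
    return x_out, a_out


x_list = [[[1.0], [2.0]], [[3.0], [4.0], [5.0]]]   # graphs with 2 and 3 nodes
a_list = [[[0, 1], [1, 0]],
          [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]

x, a = pad_batch(x_list, a_list, mask=True)
print(x[0])  # the third row is padding
print(a[0])
```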

to_mixed

spektral.data.utils.to_mixed(x_list=None, a=None, e_list=None)

Converts lists of node features and edge features to mixed mode.

The adjacency matrix must be passed as a singleton, i.e., a single np.array or scipy.sparse matrix shared by all graphs.

Edge attributes can be represented as:

  • a dense array of shape (n_nodes, n_nodes, n_edge_features);
  • a sparse edge list of shape (n_edges, n_edge_features);

and they will always be returned as a batch of edge lists.

Arguments

  • x_list: a list of np.arrays of shape (n_nodes, n_node_features) -- note that n_nodes must be the same between graphs;

  • a: a np.array or scipy.sparse matrix of shape (n_nodes, n_nodes);

  • e_list: a list of np.arrays of shape (n_nodes, n_nodes, n_edge_features) or (n_edges, n_edge_features);

Return
Only if the corresponding element is given as input:

  • x: np.array of shape (batch, n_nodes, n_node_features);
  • a: scipy.sparse matrix of shape (n_nodes, n_nodes);
  • e: np.array of shape (batch, n_edges, n_edge_features);

batch_generator

spektral.data.utils.batch_generator(data, batch_size=32, epochs=None, shuffle=True)

Iterates over the data for the given number of epochs, yielding batches of size batch_size.

Arguments

  • data: np.array or list of np.arrays with the same first dimension;

  • batch_size: number of samples in a batch;

  • epochs: number of times to iterate over the data (default None, iterates indefinitely);

  • shuffle: whether to shuffle the data at the beginning of each epoch;

Return
Batches of size batch_size.
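
A generator with this behaviour can be sketched in plain Python. batch_generator_sketch is a hypothetical function that handles a single sequence only, and its seed parameter is an addition made for reproducibility in this example. In this sketch the last batch of an epoch may be shorter than batch_size when the data does not divide evenly.

```python
import random


def batch_generator_sketch(data, batch_size=32, epochs=None, shuffle=True, seed=None):
    """Yield lists of at most batch_size items; loop forever if epochs is None."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    epoch = 0
    while epochs is None or epoch < epochs:
        if shuffle:
            rng.shuffle(indices)  # new order at the start of every epoch
        for start in range(0, len(indices), batch_size):
            yield [data[k] for k in indices[start:start + batch_size]]
        epoch += 1


data = list(range(10))
batches = list(batch_generator_sketch(data, batch_size=4, epochs=2, shuffle=False))
for batch in batches:
    print(batch)
```

With epochs=None the generator never stops, so it should be consumed with next() or itertools.islice rather than list().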


to_tf_signature

spektral.data.utils.to_tf_signature(signature)

Converts a Dataset signature to a TensorFlow signature.

Arguments

  • signature: a Dataset signature.

Return
A TensorFlow signature.