Datasets
Citation networks
load_data
spektral.datasets.citation.load_data(dataset_name='cora', normalize_features=True, random_split=False)
Loads a citation dataset (Cora, Citeseer, or Pubmed) using the "Planetoid" splits initially defined in Yang et al. (2016). The train, validation, and test splits are given as binary masks.
Node attributes are bag-of-words vectors representing the most common words in the text document associated with each node. Two papers are connected if either one cites the other. Labels represent the class of the paper.
Arguments

- `dataset_name`: name of the dataset to load (`'cora'`, `'citeseer'`, or `'pubmed'`);
- `normalize_features`: if True, the node features are normalized;
- `random_split`: if True, return a randomized split (20 nodes per class for training, 30 nodes per class for validation, and the remaining nodes for testing; Shchur et al. (2018)).
Return
- Adjacency matrix;
- Node features;
- Labels;
- Three binary masks for train, validation, and test splits.
GraphSAGE datasets
load_data
spektral.datasets.graphsage.load_data(dataset_name, max_degree=1, normalize_features=True)
Loads one of the datasets (PPI or Reddit) used in Hamilton et al. (2017).
The PPI dataset (originally Stark et al. (2006)) for inductive node classification uses positional gene sets, motif gene sets and immunological signatures as features and gene ontology sets as labels.
The Reddit dataset consists of a graph of Reddit posts from September 2014. The label of each node is the community that the post belongs to. The graph is built by sampling 50 large communities; two nodes are connected if the same user commented on both posts. Node features are obtained by concatenating the average GloVe CommonCrawl vectors of the title and comments, the post's score, and the number of comments.
The train, test, and validation splits are returned as binary masks.
Arguments

- `dataset_name`: name of the dataset to load (`'ppi'` or `'reddit'`);
- `max_degree`: int, if positive, subsample edges so that each node has the specified maximum degree;
- `normalize_features`: if True, the node features are normalized.
Return
- Adjacency matrix;
- Node features;
- Labels;
- Three binary masks for train, validation, and test splits.
TU Dortmund Benchmark Datasets for Graph Kernels
load_data
spektral.datasets.tud.load_data(dataset_name, clean=False)
Loads one of the Benchmark Data Sets for Graph Kernels from TU Dortmund (link). The node features are computed by concatenating the following features for each node:

- node attributes, if available, normalized as specified in `normalize_features`;
- clustering coefficient, normalized with z-score;
- node degrees, normalized as specified in `normalize_features`;
- node labels, if available, one-hot encoded.
Arguments

- `dataset_name`: name of the dataset to load (see `spektral.datasets.tud.AVAILABLE_DATASETS`);
- `clean`: if True, return a version of the dataset with no isomorphic graphs.
Return
- a list of adjacency matrices;
- a list of node feature matrices;
- a numpy array containing the one-hot encoded targets.
Open Graph Benchmark (OGB)
graph_to_numpy
spektral.datasets.ogb.graph_to_numpy(graph, dtype=None)
Converts a graph in OGB's library-agnostic format to a representation in Numpy/Scipy. See the Open Graph Benchmark's website for more information.
Arguments

- `graph`: OGB library-agnostic graph;
- `dtype`: if set, all output arrays will be cast to this dtype.
Return
- X: np.array of shape (N, F) with the node features;
- A: scipy.sparse adjacency matrix of shape (N, N) in COOrdinate format;
- E: if edge features are available, np.array of shape (n_edges, S), `None` otherwise.
dataset_to_numpy
spektral.datasets.ogb.dataset_to_numpy(dataset, indices=None, dtype=None)
Converts a dataset in OGB's library-agnostic version to lists of Numpy/Scipy arrays. See the Open Graph Benchmark's website for more information.
Arguments

- `dataset`: OGB library-agnostic dataset (e.g., GraphPropPredDataset);
- `indices`: optional, a list of integer indices; if provided, only these graphs will be converted;
- `dtype`: if set, the arrays in the returned lists will have this dtype.
Return
- X_list: list of np.arrays of (variable) shape (N, F) with node features;
- A_list: list of scipy.sparse adjacency matrices of (variable) shape (N, N);
- E_list: list of np.arrays of (variable) shape (n_edges, S) with edge attributes; if edge attributes are not available, a list of None;
- y_list: np.array of shape (n_graphs, n_tasks) with the task labels.
QM9 Small Molecules
load_data
spektral.datasets.qm9.load_data(nf_keys=None, ef_keys=None, auto_pad=True, self_loops=False, amount=None, return_type='numpy')
Loads the QM9 chemical data set of small molecules.

Nodes represent heavy atoms (hydrogens are discarded), edges represent chemical bonds.

The node features represent the chemical properties of each atom, and are loaded according to the `nf_keys` argument. See `spektral.datasets.qm9.NODE_FEATURES` for possible node features, and see this link for the meaning of each property. Usually, it is sufficient to load the atomic number.

The edge features represent the type and stereochemistry of each chemical bond between two atoms. See `spektral.datasets.qm9.EDGE_FEATURES` for possible edge features, and see this link for the meaning of each property. Usually, it is sufficient to load the type of bond.
Arguments

- `nf_keys`: list or str, node features to return (see `qm9.NODE_FEATURES` for available features);
- `ef_keys`: list or str, edge features to return (see `qm9.EDGE_FEATURES` for available features);
- `auto_pad`: if `return_type='numpy'`, zero pad graph matrices to have the same number of nodes;
- `self_loops`: if `return_type='numpy'`, add self loops to adjacency matrices;
- `amount`: the amount of molecules to return (in ascending order by number of atoms);
- `return_type`: `'numpy'`, `'networkx'`, or `'sdf'`, data format to return.
Return
- if `return_type='numpy'`, the adjacency matrix, node features, edge features, and a Pandas dataframe containing labels;
- if `return_type='networkx'`, a list of graphs in Networkx format, and a dataframe containing labels;
- if `return_type='sdf'`, a list of molecules in the internal SDF format and a dataframe containing labels.
MNIST KNN Grid
load_data
spektral.datasets.mnist.load_data(k=8, noise_level=0.0)
Loads the MNIST dataset and a KNN graph to perform graph signal classification, as described by Defferrard et al. (2016). The KNN graph is statically determined from a regular grid of pixels using the 2D coordinates.

The node features of each graph are the MNIST digits vectorized and rescaled to [0, 1]. Two nodes are connected if they are neighbours according to the KNN graph. Labels are the MNIST class associated with each sample.
Arguments

- `k`: int, number of neighbours for each node;
- `noise_level`: fraction of edges to flip (from 0 to 1 and vice versa).
Return
- X_train, y_train: training node features and labels;
- X_val, y_val: validation node features and labels;
- X_test, y_test: test node features and labels;
- A: adjacency matrix of the grid.
Delaunay Triangulations
generate_data
spektral.datasets.delaunay.generate_data(classes=0, n_samples_in_class=1000, n_nodes=7, support_low=0.0, support_high=10.0, drift_amount=1.0, one_hot_labels=True, support=None, seed=None, return_type='numpy')
Generates a dataset of Delaunay triangulations as described by Zambon et al. (2017).
Node attributes are the 2D coordinates of the points. Two nodes are connected if they share an edge in the Delaunay triangulation. Labels represent the class of the graph (0 to 20): each class index i represents the "difficulty" of the classification problem 0 vs. i. In other words, the higher the class index, the more similar the class is to class 0.
Arguments

- `classes`: indices of the classes to load (integer, or list of integers between 0 and 20);
- `n_samples_in_class`: number of generated samples per class;
- `n_nodes`: number of nodes in a graph;
- `support_low`: lower bound of the uniform distribution from which the support is generated;
- `support_high`: upper bound of the uniform distribution from which the support is generated;
- `drift_amount`: coefficient to control the amount of change between classes;
- `one_hot_labels`: one-hot encode dataset labels;
- `support`: custom support to use instead of generating it randomly;
- `seed`: random numpy seed;
- `return_type`: `'numpy'` or `'networkx'`, data format to return.
Return
- if `return_type='numpy'`, the adjacency matrix, node features, and an array containing labels;
- if `return_type='networkx'`, a list of graphs in Networkx format, and an array containing labels.