All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Removed the MOOC and citation datasets (CoraFull and CiteSeer are still available).
- Make DeepGNN snark server temporal loading skippable
- Make DeepGNN snark server feature loading skippable
- Fixes a temporal sampling bug: in graphs with multiple edge types that were not deleted, uniform sampling returned repeated neighbors.
- Fixes edge feature fetching when edges are not sorted by destination id.
- Add `return_edge_created_ts` argument to neighbor sampling methods to return timestamps when edges connecting nodes were created.
- MOOC temporal dataset.
- TGN example.
- GCN example.
- Add PyG remote backend example.
- Uniform sampling works in temporal graphs.
- ADL path parsing to download graph data.
- Changed pytorch examples to be self contained and use Ray for distributed training.
- link prediction and knowledgegraph examples
- deepgnn-torch/tf are no longer published
- Breaking. Temporal graph support. Custom decoders must add 2 optional integers in the tuple returned by the `decode` method, representing the `created_at` and `removed_at` fields. Metadata file must have a `watermark` field.
- Last N created neighbors sampling method for temporal graphs.
- Change generated file meta.txt to meta.json in JSON format.
- All `DistributedGraph` config options (e.g. `grpc_options`, `num_threads`, ...) are exposed to `DistributedClient` and `BackendOptions`.
- Add usage example for Ray Train, see docs/torch/ray_usage.rst.
- Add documentation for Ray Data usage, see tutorial and example.
- Add Reddit dataset download tool at deepgnn.graph_engine.data.reddit.
- Added `grpc_options` to distributed client to control service config.
- Added `ppr-go` neighbor sampling strategy.
- Implement `__del__` method to release C++ client and server. Important for Ray actors, because they create numerous clients during training.
- If sparse feature values are present on multiple servers, only one set is returned, with the source picked randomly.
- Remove ALL_NODE_TYPE, ALL_EDGE_TYPE, `__len__` and `__iter__` from Graph API.
- Breaking. Rename get_feature_type -> get_python_type.
- Add new converter input format "EdgeList" with EdgeListDecoder. Format has nodes and edges on separate lines, is smaller and faster to convert.
- Breaking. Added version checks for binary data. Requires reconverting graph data or adding v1 at the top of meta files.
- Add migrate script to upgrade to the new version of deepgnn.
- Add debug mode to MultiWorkersConverter, using debug=True will now disable multiprocessing and show error messages.
- Load graph partitions from separate folders.
- Breaking. Remove FeatureType enum, replace with np.dtype. FeatureType.BINARY -> np.uint8, FeatureType.FLOAT -> np.float32, FeatureType.INT64 -> np.int64.
- Rename function deepgnn.graph_engine.data.to_json_node -> deepgnn.graph_engine.data.to_edge_list_node and update functionality accordingly.
- Support nodes and their outgoing edges on different partitions.
- Adds neighbor count method to graph.
- Return empty indices and values for missing sparse features.
- Don't record empty sparse features and log warning if sparse features were requested, but dense features are stored.
- JSON/TSV converter didn't sort edges by type, which resulted in incorrect sampling.
- Rename and move convert.output to converter.process.converter_process. Dispatchers now default the 'process' argument to converter.process.converter_process and take it after decoder_type.
- Replace converter and dispatcher's argument "decoder_type" -> "decoder" that accepts a Decoder object directly instead of a DecoderType enum. Replace DecoderType enum with type hint.
- Make Decoder.decode a generator that yields a node, then its outgoing edges in order. The yield format for nodes/edges is (node_id/src, -1/dst, type, weight, features), with features being a list of dense features as ndarrays and sparse features as 2-tuples of coordinates and values.
- Add BinaryWriter as new entry point for NodeWriter, EdgeWriter and alias writers.
- Meta.json files are no longer needed by the converter. Remove meta path argument from MultiWorkerConverter and Dispatchers.
- Fill dimensions with 0 for missing features.
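
To illustrate the generator-style `Decoder.decode` entry above, here is a minimal sketch of the yield order and record layout. It does not use the DeepGNN API: the function name `decode_line` and the pipe-separated input format are hypothetical, and plain Python lists stand in for the ndarrays of dense features.

```python
from typing import Iterator, List, Tuple

# Record layout taken from the changelog entry:
# (node_id/src, -1/dst, type, weight, features)
Record = Tuple[int, int, int, float, list]


def _features(tokens: List[str]) -> list:
    # Assumption: at most one dense feature per record; a plain list of
    # floats stands in for an ndarray. No sparse features in this sketch.
    return [[float(t) for t in tokens]] if tokens else []


def decode_line(line: str) -> Iterator[Record]:
    """Hypothetical decoder for a made-up format (not a DeepGNN format):

    "node_id node_type weight [f ...] | dst edge_type weight [f ...] | ..."
    """
    node_part, *edge_parts = [p.strip() for p in line.split("|")]
    node_id_s, node_type, weight, *feat = node_part.split()
    node_id = int(node_id_s)

    # The node comes first; its destination slot is -1.
    yield node_id, -1, int(node_type), float(weight), _features(feat)

    # Then its outgoing edges, in the order they appear in the input.
    for part in edge_parts:
        if not part:
            continue  # tolerate a trailing separator
        dst, edge_type, weight, *feat = part.split()
        yield node_id, int(dst), int(edge_type), float(weight), _features(feat)


records = list(decode_line("7 0 1.0 0.5 0.25 | 8 1 2.0 | 9 0 0.5 3.0"))
for record in records:
    print(record)
```

A consumer can tell node records from edge records by checking whether the second field is -1.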