[Example] Add Pytorch Geometric Example #4568

Merged: 34 commits, merged into master on Nov 18, 2020

Commits
* `115cc62` add example for Pytorch Geometric (tchaton, Nov 7, 2020)
* `cd595f4` remove hydra (tchaton, Nov 7, 2020)
* `d2500ce` add docstring (tchaton, Nov 7, 2020)
* `3f68072` remove description (tchaton, Nov 7, 2020)
* `2c054c0` rename folder (tchaton, Nov 7, 2020)
* `dd1093a` update script to not break test (tchaton, Nov 7, 2020)
* `ccbc95e` Merge branch 'master' into ecosystem_examples (tchaton, Nov 7, 2020)
* `6145bd9` remove .lock (tchaton, Nov 7, 2020)
* `c0ca670` Merge branch 'master' into ecosystem_examples (tchaton, Nov 7, 2020)
* `ed8b9e0` Merge branch 'master' into ecosystem_examples (tchaton, Nov 9, 2020)
* `c5b8658` Merge branch 'ecosystem_examples' of https://github.com/PyTorchLightn… (tchaton, Nov 9, 2020)
* `e7747da` add Pytorch Geometric to doc (tchaton, Nov 9, 2020)
* `3f0df39` add docstring at the begining (tchaton, Nov 9, 2020)
* `291764c` add comments (tchaton, Nov 9, 2020)
* `2e75321` Merge branch 'master' into ecosystem_examples (tchaton, Nov 9, 2020)
* `366232d` Merge branch 'master' into ecosystem_examples (tchaton, Nov 9, 2020)
* `8a5623e` Merge branch 'master' into ecosystem_examples (tchaton, Nov 9, 2020)
* `83c9deb` Merge branch 'master' into ecosystem_examples (tchaton, Nov 9, 2020)
* `9e58b36` Merge branch 'master' into ecosystem_examples (tchaton, Nov 9, 2020)
* `8fd078e` Merge branch 'master' into ecosystem_examples (tchaton, Nov 10, 2020)
* `5947123` Merge branch 'master' into ecosystem_examples (tchaton, Nov 10, 2020)
* `ea06b28` Merge branch 'master' into ecosystem_examples (tchaton, Nov 10, 2020)
* `75cd4c4` Merge branch 'master' into ecosystem_examples (tchaton, Nov 10, 2020)
* `e653101` Merge branch 'master' into ecosystem_examples (tchaton, Nov 11, 2020)
* `4a0a502` Merge branch 'master' into ecosystem_examples (tchaton, Nov 12, 2020)
* `104e096` Merge branch 'master' into ecosystem_examples (tchaton, Nov 12, 2020)
* `3de0cef` Merge branch 'master' into ecosystem_examples (tchaton, Nov 14, 2020)
* `5739c3d` Update pl_examples/pytorch_ecosystem/pytorch_geometric/README.md (tchaton, Nov 14, 2020)
* `a0cded4` Update pl_examples/pytorch_ecosystem/pytorch_geometric/README.md (tchaton, Nov 14, 2020)
* `e990667` Update pl_examples/pytorch_ecosystem/pytorch_geometric/cora_dna.py (tchaton, Nov 14, 2020)
* `860367d` Merge branch 'master' into ecosystem_examples (tchaton, Nov 16, 2020)
* `e0679e7` Merge branch 'master' into ecosystem_examples (tchaton, Nov 16, 2020)
* `3ab225b` add toml (Borda, Nov 16, 2020)
* `1477eb0` Merge branch 'master' into ecosystem_examples (tchaton, Nov 18, 2020)
32 changes: 32 additions & 0 deletions pl_examples/pytorch_ecosystem/pytorch_geometric/README.md
@@ -0,0 +1,32 @@
# Examples of [Pytorch Geometric](https://github.com/rusty1s/pytorch_geometric) with Lightning

### Introduction

PyTorch Geometric (PyG) is a geometric deep learning extension library for PyTorch. It relies on lower-level libraries such as:

* PyTorch Cluster: a small extension library of highly optimized graph clustering algorithms for PyTorch
* PyTorch Sparse: a small extension library of optimized sparse matrix operations with autograd support for PyTorch
* PyTorch Scatter: a small extension library of highly optimized sparse update (scatter and segment) operations for PyTorch
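
For a first feel of PyG's core abstraction, here is a minimal sketch (assuming `torch` and `torch_geometric` are installed) that wraps a tiny two-node graph in a `torch_geometric.data.Data` object:

```
import torch
from torch_geometric.data import Data

# Two nodes with one feature each, connected by a bidirectional edge.
edge_index = torch.tensor([[0, 1],
                           [1, 0]], dtype=torch.long)
x = torch.tensor([[1.0], [2.0]])

data = Data(x=x, edge_index=edge_index)
print(data)  # Data(x=[2, 1], edge_index=[2, 2])
```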

## Setup

```
pyenv install 3.7.8
pyenv local 3.7.8
python -m venv .venv
source .venv/bin/activate
poetry install
```

## Current example lists

| `DATASET` | `MODEL` | `TASK` | DATASET DESCRIPTION | MODEL DESCRIPTION |
| --------- | ------- | ------ | ------------------- | ----------------- |
| Cora | DNAConv | Node Classification | The citation network dataset "Cora" from ["Revisiting Semi-Supervised Learning with Graph Embeddings"](https://arxiv.org/abs/1603.08861) | The dynamic neighborhood aggregation operator from ["Just Jump: Towards Dynamic Neighborhood Aggregation in Graph Neural Networks"](https://arxiv.org/abs/1904.04849) |


## DATASET SIZES

```
16M ./cora
```
Empty file.
327 changes: 327 additions & 0 deletions pl_examples/pytorch_ecosystem/pytorch_geometric/cora_dna.py
@@ -0,0 +1,327 @@

# python imports
import os
import os.path as osp
import sys
from functools import partial
from collections import namedtuple
from argparse import ArgumentParser
from typing import List, Optional, NamedTuple

# third-party libraries
import numpy as np
from torch import nn
import torch
from torch import Tensor
from torch.optim import Adam
import torch.nn.functional as F

# Lightning imports
from pytorch_lightning import (
    Trainer,
    LightningDataModule,
    LightningModule,
)
from pytorch_lightning.metrics import Accuracy

try:
    # Pytorch Geometric imports
    from torch_geometric.nn import DNAConv, MessagePassing
    from torch_geometric.data import DataLoader
    from torch_geometric.datasets import Planetoid
    import torch_geometric.transforms as T
    from torch_geometric.data import NeighborSampler
    from lightning import lightning_logo, nice_print
    HAS_PYTORCH_GEOMETRIC = True
except ImportError:
    HAS_PYTORCH_GEOMETRIC = False

# Type aliases used to make the model jittable: TorchScript needs
# explicit Optional/List annotations.
OptTensor = Optional[Tensor]
ListTensor = List[Tensor]


class TensorBatch(NamedTuple):
    x: Tensor
    edge_index: ListTensor
    edge_attr: OptTensor
    batch: OptTensor

###################################
# LightningDataModule #
###################################

class CoraDataset(LightningDataModule):

    r"""The citation network datasets "Cora", "CiteSeer" and "PubMed" from the
    `"Revisiting Semi-Supervised Learning with Graph Embeddings"
    <https://arxiv.org/abs/1603.08861>`_ paper.
    Nodes represent documents and edges represent citation links.
    Training, validation and test splits are given by binary masks.
    c.f. https://github.com/rusty1s/pytorch_geometric/blob/master/torch_geometric/datasets/planetoid.py
    """

    NAME = "cora"

    def __init__(self,
                 num_workers: int = 1,
                 batch_size: int = 8,
                 drop_last: bool = True,
                 pin_memory: bool = True,
                 num_layers: int = None):
        super().__init__()

        assert num_layers is not None

        self._num_workers = num_workers
        self._batch_size = batch_size
        self._drop_last = drop_last
        self._pin_memory = pin_memory
        self._num_layers = num_layers

        self._transform = T.NormalizeFeatures()

    @property
    def num_features(self):
        return 1433

    @property
    def num_classes(self):
        return 7

    @property
    def hyper_parameters(self):
        return {"num_features": self.num_features, "num_classes": self.num_classes}

    def prepare_data(self):
        path = osp.join(
            osp.dirname(osp.realpath(__file__)), "..", "..", "data", self.NAME
Review comment (Member): Let's move this path to the module top as a constant, because if you move this file/package one level up it does not work... Or see and use something similar to the `PACKAGE_ROOT` var in `tests.__init__`.
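
A rough sketch of the suggested constant (illustrative only; `DATA_ROOT` is a hypothetical name, not part of the PR):

```
# Hypothetical module-level constants, per the review suggestion:
PACKAGE_ROOT = osp.dirname(osp.realpath(__file__))
DATA_ROOT = osp.join(PACKAGE_ROOT, "..", "..", "data")

# prepare_data() could then build the path as:
#     path = osp.join(DATA_ROOT, self.NAME)
```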

        )
        self.dataset = Planetoid(path, self.NAME, transform=self._transform)
        self.data = self.dataset[0]

    def create_neighbor_sampler(self, batch_size=2, stage=None):
Review comment (Member): Shall we add types, or just create an issue for adding them later? It would also be useful to add docs here, as this seems to be a very specific step.
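
A possible typed signature, sketching the reviewer's suggestion (the docstring wording is mine, not from the PR):

```
    def create_neighbor_sampler(self, batch_size: int = 2, stage: Optional[str] = None) -> NeighborSampler:
        """Build a NeighborSampler over the nodes selected by the `{stage}_mask` attribute."""
```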

        return NeighborSampler(
            self.data.edge_index,
            node_idx=getattr(self.data, f"{stage}_mask"),
            sizes=[self._num_layers, -1],
            num_workers=self._num_workers,
            drop_last=self._drop_last,
            pin_memory=self._pin_memory,
        )

    def train_dataloader(self):
        return self.create_neighbor_sampler(stage="train")

    # Note: Lightning's hook is named `val_dataloader`; a method called
    # `validation_dataloader` would never be picked up by the Trainer.
    def val_dataloader(self):
        return self.create_neighbor_sampler(stage="val")

    def test_dataloader(self):
        return self.create_neighbor_sampler(stage="test")

    def gather_data_and_convert_to_namedtuple(self, batch, batch_nb):
        usual_keys = ["x", "edge_index", "edge_attr", "batch"]
Review comment (Member): Can we rather get the names from the `TensorBatch` definition?
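
A sketch of that suggestion; `_fields` is the standard attribute every `NamedTuple` class carries:

```
        # Reuse the field names declared on TensorBatch instead of a string list:
        Batch = namedtuple("Batch", TensorBatch._fields)
```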

        Batch: TensorBatch = namedtuple("Batch", usual_keys)
        return (
            Batch(
                self.data.x[batch[1]],
                [e.edge_index for e in batch[2]],
                None,
                None,
            ),
            self.data.y[batch[1]],
        )
Review comment (Member) on lines +141 to +150: I am sorry, I am just not sure about this notation; not sure what it really does...
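
For what it is worth: `namedtuple("Batch", usual_keys)` creates a new class at runtime whose fields mirror `TensorBatch`, and the `: TensorBatch` annotation is only a (slightly misleading) type hint, since `Batch` is bound to a class rather than an instance.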


    @staticmethod
    def add_argparse_args(parser):
        parser.add_argument("--num_workers", type=int, default=1)
        parser.add_argument("--batch_size", type=int, default=2)
        parser.add_argument("--drop_last", default=True)
        parser.add_argument("--pin_memory", default=True)
Review comment (Member) on lines +156 to +157: can we define these as bool?

Suggested change:

```
- parser.add_argument("--drop_last", default=True)
- parser.add_argument("--pin_memory", default=True)
+ parser.add_argument("--drop_last", type=bool, default=True)
+ parser.add_argument("--pin_memory", type=bool, default=True)
```
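
An editorial aside on that suggestion (not part of the PR): `type=bool` behaves surprisingly with argparse, because `bool("False")` is `True`. A common workaround is a small converter such as this hypothetical `str2bool`:

```
def str2bool(value: str) -> bool:
    # argparse passes the raw string; map the common spellings explicitly.
    if value.lower() in ("yes", "true", "t", "1"):
        return True
    if value.lower() in ("no", "false", "f", "0"):
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")

# usage: parser.add_argument("--drop_last", type=str2bool, default=True)
```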

        return parser


###############################
# LightningModule #
###############################

class DNAConvNet(LightningModule):

    r"""The dynamic neighborhood aggregation operator from the `"Just Jump:
    Towards Dynamic Neighborhood Aggregation in Graph Neural Networks"
    <https://arxiv.org/abs/1904.04849>`_ paper.
    c.f. https://github.com/rusty1s/pytorch_geometric/blob/master/torch_geometric/nn/conv/dna_conv.py#L172
    """

    def __init__(self,
                 num_layers: int = 2,
                 hidden_channels: int = 128,
                 heads: int = 8,
                 groups: int = 16,
                 dropout: float = 0.8,
                 cached: bool = False,
                 num_features: int = None,
                 num_classes: int = None,
                 ):
        super().__init__()

        assert num_features is not None
Review comment (Member): Why set the default to None when you do not allow it?

        assert num_classes is not None

        self.save_hyperparameters()
        hparams = self.hparams
Review comment (Member): Any special reason? :]


        # Instantiate metrics
        self.val_acc = Accuracy(hparams["num_classes"])
        self.test_acc = Accuracy(hparams["num_classes"])

        # Define DNA graph convolution model
        self.hidden_channels = hparams["hidden_channels"]
        self.lin1 = nn.Linear(hparams["num_features"], hparams["hidden_channels"])
        self.convs = nn.ModuleList()
        for _ in range(hparams["num_layers"]):
            self.convs.append(
                DNAConv(
                    hparams["hidden_channels"],
                    hparams["heads"],
                    hparams["groups"],
                    dropout=hparams["dropout"],
                    cached=False,
                )
            )
        self.lin2 = nn.Linear(hparams["hidden_channels"], hparams["num_classes"], bias=False)

    def forward(self, batch: TensorBatch):
        x = batch.x
        x = F.relu(self.lin1(x))
        x = F.dropout(x, p=0.5, training=self.training)
        # DNAConv consumes the stack of all previous layer representations,
        # shaped [num_nodes, num_layers_so_far, hidden_channels].
        x_all = x.view(-1, 1, self.hidden_channels)
        for idx, conv in enumerate(self.convs):
            x = F.relu(conv(x_all, batch.edge_index[idx]))
            x = x.view(-1, 1, self.hidden_channels)
            x_all = torch.cat([x_all, x], dim=1)
        # Keep only the representation produced by the last layer.
        x = x_all[:, -1]
        x = F.dropout(x, p=0.5, training=self.training)
        return F.log_softmax(self.lin2(x), -1)

    def step(self, batch, batch_nb):
        typed_batch, targets = self.gather_data_and_convert_to_namedtuple(batch, batch_nb)
Review comment (Member):

Suggested change:

```
- typed_batch, targets = self.gather_data_and_convert_to_namedtuple(batch, batch_nb)
+ typed_batch, targets = self._gather_data_and_convert_to_namedtuple(batch, batch_nb)
```

Is this good as a public API?

Update: I see later that it is assigned from outside; then it is a bit confusing. Can we make a placeholder in `__init__`, such as `self.gather_data_and_convert_to_namedtuple = ...`?
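
A sketch of that placeholder idea (illustrative only; the attribute is later overwritten in `instantiate_model`):

```
    # In DNAConvNet.__init__, declare the hook that instantiate_model()
    # overrides with the datamodule's converter:
    self.gather_data_and_convert_to_namedtuple = None  # set in instantiate_model()
```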

        logits = self(typed_batch)
        return logits, targets

    def training_step(self, batch, batch_nb):
        logits, targets = self.step(batch, batch_nb)
        train_loss = F.nll_loss(logits, targets)
        self.log("train_loss", train_loss, on_step=True, on_epoch=True, prog_bar=True)
        return train_loss

    def validation_step(self, batch, batch_nb):
        logits, targets = self.step(batch, batch_nb)
        val_loss = F.nll_loss(logits, targets)
        self.log("val_loss", val_loss, on_step=False, on_epoch=True, prog_bar=True)
        self.log("val_acc", self.val_acc(logits, targets), on_step=False, on_epoch=True, prog_bar=True)

    def test_step(self, batch, batch_nb):
        logits, targets = self.step(batch, batch_nb)
        test_loss = F.nll_loss(logits, targets)
        self.log("test_loss", test_loss, on_step=False, on_epoch=True, prog_bar=True)
        self.log("test_acc", self.test_acc(logits, targets), on_step=False, on_epoch=True, prog_bar=True)

    # Used for the TorchScript (jittable) demonstration.

    def _convert_to_jittable(self, module):
        for key, m in module._modules.items():
            if isinstance(m, MessagePassing) and m.jittable is not None:
                # Pytorch Geometric's MessagePassing implements a `.jittable` function
                # which converts the current module into its jittable version.
                module._modules[key] = m.jittable()
            else:
                self._convert_to_jittable(m)
        return module

    def jittable(self):
        for key, m in self._modules.items():
            self._modules[key] = self._convert_to_jittable(m)

    def configure_optimizers(self):
        return Adam(self.parameters(), lr=1e-3)

    @staticmethod
    def add_argparse_args(parser):
        parser.add_argument("--num_layers", type=int, default=2)
        parser.add_argument("--hidden_channels", type=int, default=128)
        parser.add_argument("--heads", type=int, default=8)
        parser.add_argument("--groups", type=int, default=16)
        parser.add_argument("--dropout", type=float, default=0.8)
        parser.add_argument("--cached", type=int, default=0)
        parser.add_argument("--jit", default=True)
Review comment (Member):

Suggested change:

```
- parser.add_argument("--jit", default=True)
+ parser.add_argument("--jit", type=bool, default=True)
```
        return parser

#################################
# Instantiate Functions #
#################################

def instantiate_datamodule(args):
    datamodule = CoraDataset(
        num_workers=args.num_workers,
        batch_size=args.batch_size,
        drop_last=args.drop_last,
        pin_memory=args.pin_memory,
        num_layers=args.num_layers,
    )
    return datamodule

def instantiate_model(args, datamodule):
    model = DNAConvNet(
        num_layers=args.num_layers,
        hidden_channels=args.hidden_channels,
        heads=args.heads,
        groups=args.groups,
        dropout=args.dropout,
        **datamodule.hyper_parameters,
    )
    if args.jit:
        model.jittable()

    # Attach the datamodule's conversion function to the model
    model.gather_data_and_convert_to_namedtuple = datamodule.gather_data_and_convert_to_namedtuple
    return model


def get_single_batch(datamodule):
    for batch in datamodule.test_dataloader():
        return datamodule.gather_data_and_convert_to_namedtuple(batch, 0)

#######################
# Trainer Run #
#######################

def run(args):
    nice_print("You are about to train a TorchScripted Pytorch Geometric Lightning model!")
    nice_print(lightning_logo)

    datamodule: LightningDataModule = instantiate_datamodule(args)
    model: LightningModule = instantiate_model(args, datamodule)
    trainer = Trainer.from_argparse_args(args)
    trainer.fit(model, datamodule)
    trainer.test()

    # Export the trained model to TorchScript, using a real batch as example inputs.
    batch = get_single_batch(datamodule)
    model.to_torchscript(file_path="model_trace.pt",
                         method='script',
                         example_inputs=batch)

    nice_print("Congratulations!")
    nice_print("You trained your first TorchScripted Pytorch Geometric Lightning model!", last=True)

if __name__ == "__main__":
    if not HAS_PYTORCH_GEOMETRIC:
        print("Skipping training: Pytorch Geometric isn't installed. Please check the README.md!")
    else:
        parser = ArgumentParser(description="Pytorch Geometric Example")
        parser = Trainer.add_argparse_args(parser)
        parser = CoraDataset.add_argparse_args(parser)
        parser = DNAConvNet.add_argparse_args(parser)

        # The arguments are hard-coded here (real CLI arguments are ignored),
        # per the "update script to not break test" commit.
        cmd_line = '--max_epochs 1'.split(' ')

        run(parser.parse_args(cmd_line))
30 changes: 30 additions & 0 deletions pl_examples/pytorch_ecosystem/pytorch_geometric/lightning.py
@@ -0,0 +1,30 @@
def nice_print(msg, last=False):
Review comment (Member): Let's move this to the PL_examples root?

    print()
    print("\033[0;35m" + msg + "\033[0m")
    if last:
        print()

lightning_logo = """
####
###########
####################
############################
#####################################
##############################################
######################### ###################
####################### ###################
#################### ####################
################## #####################
################ ######################
##################### #################
###################### ###################
##################### #####################
#################### #######################
################### #########################
##############################################
#####################################
############################
####################
##########
####
"""