comments JOSS 241018 #58
@tcoroller, I think for action 3 we can just add the functions from helpers_introduction as blocks in the notebook?
@melodiemonod I kinda disagree. Let me reply to the reviewers with those details, maybe highlighting the link to the folder in the text (and listing deps if needed); if this is not suitable we can find another solution. What do you think?
Or... I can do a hidden cell: https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/hide_code.html?context=cpdaas
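For reference, going by the linked IBM docs (an assumption on my side that this is the mechanism we'd use; plain Jupyter would use cell tags instead), a cell is hidden by starting it with a marker comment:

```python
# @hidden_cell
# Cells starting with the marker above are hidden when the notebook is
# shared (IBM Watson Studio convention, per the linked docs).
from helpers_introduction import Custom_dataset, plot_losses
```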
Maybe the simplest solution:

```python
# PyTorch boilerplate - see https://github.com/Novartis/torchsurv/blob/main/docs/notebooks/helpers_introduction.py
from helpers_introduction import Custom_dataset, plot_losses
```
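For context, here is a minimal sketch of what such a helpers file could contain. The actual helpers_introduction.py in the repo is authoritative; the bodies below are illustrative assumptions only:

```python
# Hypothetical sketch of helpers_introduction.py; see the repo for the real file.
import matplotlib.pyplot as plt
import torch
from torch.utils.data import Dataset


class Custom_dataset(Dataset):
    """Illustrative dataset wrapping feature, event and time tensors."""

    def __init__(self, x: torch.Tensor, event: torch.Tensor, time: torch.Tensor):
        self.x, self.event, self.time = x, event, time

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        return self.x[index], self.event[index], self.time[index]


def plot_losses(losses: list, title: str = "Training loss"):
    """Illustrative helper to plot the loss trajectory over epochs."""
    plt.plot(losses)
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title(title)
    plt.show()
```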
Sure, sounds good! Please do reply to the reviewer with these proposed amendments and where to find the file, and let's see if it's enough for them. Many thanks
@tcoroller please review the example in paper.md |
Two big comments: (…)
Minor questions here: (…)
I tinkered a lot with the example but could not get results as good as yours, mostly because you are testing on the training data. Here's the same example, but now using a generator (sampling from the distribution as needed):

```python
import torch
from torch.utils.data import DataLoader, Dataset

from torchsurv.loss import cox
from torchsurv.metrics.cindex import ConcordanceIndex
from torchsurv.metrics.auc import Auc

torch.manual_seed(42)

# 1. Simulate batch data. Ensure that your data are formatted in the same way
n_samples = 1000  # number of observations
batch_size = 32  # batch size


# Define the generator function
def tte_generator(batch_size: int, n_features: int = 10):
    while True:
        x = torch.randn(batch_size, n_features)  # 10 features for each observation
        weights = torch.randn(n_features)  # weights associated with the features
        event = torch.randint(
            low=0, high=2, size=(batch_size,), dtype=torch.bool
        )  # event indicator
        time = torch.abs(800 + x @ weights + torch.randn(batch_size) * 25)  # event time
        yield x, event, time


# 2. Define the PyTorch dataset class
class TTE_dataset(Dataset):
    def __init__(self, generator: callable, batch_size: int):
        self.generator = generator(batch_size=batch_size)

    def __len__(self):
        return n_samples // batch_size  # Number of batches

    def __getitem__(self, index):
        return next(self.generator)


# 3. Define the backbone model on the log hazards.
class MyPyTorchCoxModel(torch.nn.Module):
    def __init__(self):
        super(MyPyTorchCoxModel, self).__init__()
        self.fc = torch.nn.Linear(10, 1, bias=False)  # Simple linear model

    def forward(self, x):
        return self.fc(x)


# 4. Instantiate the model, optimizer, dataset and dataloader
cox_model = MyPyTorchCoxModel()
optimizer = torch.optim.Adam(cox_model.parameters(), lr=0.01)
dataset = TTE_dataset(tte_generator, batch_size=batch_size)
dataloader = DataLoader(
    dataset, batch_size=1, shuffle=True
)  # Batch size of 1 because dataset yields whole batches

# 5. Training loop
for epoch in range(100):
    for i, batch in enumerate(dataloader):
        x, event, time = [t.squeeze() for t in batch]  # Squeeze extra dimension
        optimizer.zero_grad()
        log_hzs = cox_model(x)  # torch.Size([32, 1])
        loss = cox.neg_partial_log_likelihood(
            log_hzs, event, time, reduction="mean"
        )  # use mean log-likelihood
        loss.backward()
        optimizer.step()

# 6. Evaluate the model
# Generate a new batch of data to evaluate
new_batch = next(tte_generator(batch_size=n_samples))
x, event, time = [t.squeeze() for t in new_batch]
log_hzs = cox_model(x)  # Trained log hazards

# AUC at time point 800
auc = Auc()
print(
    "AUC:", auc(log_hzs, event, time, new_time=torch.tensor(800.0))
)  # tensor([0.5531])
print("AUC Confidence Interval:", auc.confidence_interval())  # tensor([0.5220, 0.5841])
print("AUC p-value:", auc.p_value(alternative="greater"))  # tensor([0.0004])

# C-index
cindex = ConcordanceIndex()
print("C-Index:", cindex(log_hzs, event, time))  # tensor(0.5465)
print(
    "C-Index Confidence Interval:", cindex.confidence_interval()
)  # tensor([0.4897, 0.6033])
print("C-Index p-value:", cindex.p_value(alternative="greater"))  # tensor(0.0544)
```
Hello @tcoroller! I just pushed a new update of the paper with the example (here: https://github.com/Novartis/torchsurv/blob/58-comments-joss-241018/paper/paper.md) and also added the example below for reference.
I used the JOSS branch as the source for the review branch, so after we are happy with the updates we can merge 58 -> 45. If you want, we can then update 45 from main (45 <- main)?
You removed from the updated example the true underlying generative mechanism for time-to-event data, i.e., each individual is associated with a time-to-event (…). The time-to-event (…). Updated example:

```python
import torch
from torch.utils.data import DataLoader, Dataset

from torchsurv.loss import cox
from torchsurv.metrics.cindex import ConcordanceIndex
from torchsurv.metrics.auc import Auc

torch.manual_seed(42)

# 1. Simulate batch data.
# Ensure that your x, event and time are formatted in the same way
n_features = 10  # int, number of features per observation
time_end = torch.tensor(
    2000.0
)  # float, end of observational period after which all observations are censored
weights = (
    torch.randn(n_features) * 5
)  # float, weights associated with the features ~ normal(0, 5^2)


# Define the generator function
def tte_generator(batch_size: int):
    while True:
        x = torch.randn(batch_size, n_features)  # features
        mean_event_time, mean_censoring_time = 1000.0 + x @ weights, 1000.0
        event_time = (
            mean_event_time + torch.randn(batch_size) * 50
        )  # event time ~ normal(mean_event_time, 50^2)
        censoring_time = torch.distributions.Exponential(
            1 / mean_censoring_time
        ).sample(
            (batch_size,)
        )  # censoring time ~ Exponential(mean = mean_censoring_time)
        censoring_time = torch.minimum(
            censoring_time, time_end
        )  # truncate censoring time to time_end
        event = (event_time <= censoring_time).bool()  # event indicator
        time = torch.minimum(event_time, censoring_time)  # observed time
        yield x, event, time


# 2. Define the PyTorch dataset class
class TTE_dataset(Dataset):
    def __init__(self, generator: callable, batch_size: int):
        self.batch_size = batch_size
        self.generated_data = generator(batch_size=batch_size)

    def __len__(self):
        return self.batch_size  # Number of batches per epoch

    def __getitem__(self, index):
        return next(self.generated_data)


# 3. Define the backbone model on the log hazards.
class MyPyTorchCoxModel(torch.nn.Module):
    def __init__(self):
        super(MyPyTorchCoxModel, self).__init__()
        self.fc = torch.nn.Linear(10, 1, bias=False)  # Simple linear model

    def forward(self, x):
        return self.fc(x)


# 4. Instantiate the model, optimizer, dataset and dataloader
cox_model = MyPyTorchCoxModel()
optimizer = torch.optim.Adam(cox_model.parameters(), lr=0.01)
batch_size = 64  # int, batch size
dataset = TTE_dataset(tte_generator, batch_size=batch_size)
dataloader = DataLoader(
    dataset, batch_size=1, shuffle=True
)  # Batch size of 1 because dataset yields whole batches

# 5. Training loop
for epoch in range(100):
    for i, batch in enumerate(dataloader):
        x, event, time = [t.squeeze() for t in batch]  # Squeeze extra dimension
        optimizer.zero_grad()
        log_hzs = cox_model(x)  # torch.Size([batch_size, 1])
        loss = cox.neg_partial_log_likelihood(log_hzs, event, time)
        loss.backward()
        optimizer.step()

# 6. Evaluate the model
n_samples_test = 1000  # int, number of observations in test set
data_test = next(tte_generator(batch_size=n_samples_test))
x, event, time = [t.squeeze() for t in data_test]  # test set
log_hzs = cox_model(x)  # log hazards evaluated on test set

# AUC at time point 1000
auc = Auc()
print(
    "AUC:", auc(log_hzs, event, time, new_time=torch.tensor(1000.0))
)  # tensor([0.5902])
print("AUC Confidence Interval:", auc.confidence_interval())  # tensor([0.5623, 0.6180])
print("AUC p-value:", auc.p_value(alternative="greater"))  # tensor([0.])

# C-index
cindex = ConcordanceIndex()
print("C-Index:", cindex(log_hzs, event, time))  # tensor(0.5774)
print(
    "C-Index Confidence Interval:", cindex.confidence_interval()
)  # tensor([0.5086, 0.6463])
print("C-Index p-value:", cindex.p_value(alternative="greater"))  # tensor(0.0138)
```
Nice, it ran for me as well. Updating the output and linting. The code is identical to the example above; only the printed outputs differ on my machine:

```python
# AUC at time point 1000
auc = Auc()
print(
    "AUC:", auc(log_hzs, event, time, new_time=torch.tensor(1000.0))
)  # tensor([0.6278])
print("AUC Confidence Interval:", auc.confidence_interval())  # tensor([0.5998, 0.6558])
print("AUC p-value:", auc.p_value(alternative="greater"))  # tensor([0.])

# C-index
cindex = ConcordanceIndex()
print("C-Index:", cindex(log_hzs, event, time))  # tensor(0.5913)
print(
    "C-Index Confidence Interval:", cindex.confidence_interval()
)  # tensor([0.5222, 0.6604])
print("C-Index p-value:", cindex.p_value(alternative="greater"))  # tensor(0.0048)
```
I wonder if we are better off just making a new PR from main and carrying over the changes. I can do this piece
@melodiemonod please check #59 to make sure I didn't forget anything.
@tcoroller looks all good to me! I just left one comment about a typo (f6d624c#r1811249877); do you think you could fix it before merging? Also, for the JOSS submission we had to specify the branch to review, which for now was 45-XX. The bot looks at this branch to generate the paper proof. We need to ask the editor how we can change this "branch to review" to main.
Let me close this comment and merge to main. I will also tell the reviewer that they should be looking at main.
Address WeakCha's comments from 241018
openjournals/joss-reviews#7341 (comment)