Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: HIV transmission for short term partners #70

Merged
merged 19 commits into from
Oct 13, 2022

Conversation

mmcleod89
Copy link
Collaborator

Still needs tests to check that things are correct, which might benefit from some refactoring. Not all factors for HIV transmission are accounted for yet, as they don't exist in the model, but I did add a dummy initialisation for viral load groups since that was the most complicated effect and central to the calculation rather than just being a post-factor.

We will need a data file for transmission properties too but I think this should be a separate PR

Copy link
Contributor

@ageorgou ageorgou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not read through the whole thing yet, only making some quick comments.

I need to look at the transmission probabilities more closely because it seems we get the ratios the wrong way around, although that's probably me reading it wrong!

Does applying stp_HIV_transmission to each person individually give okay performance? (from what we can tell, anyway)

We may need to rethink the age groups and if they should be in common, like the sex type. Otherwise pass a reference to the population into the module?

Comment on lines 47 to 48
sub_pop_indices = population.data.index[(population.data[col.SEX]==sex) & (population.data[col.SEX_MIX_AGE_GROUP] == age_group)]
sub_pop = population.data.loc[sub_pop_indices]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get this in one step?

Suggested change
sub_pop_indices = population.data.index[(population.data[col.SEX]==sex) & (population.data[col.SEX_MIX_AGE_GROUP] == age_group)]
sub_pop = population.data.loc[sub_pop_indices]
sub_pop = population.data.loc[(population.data[col.SEX]==sex) & (population.data[col.SEX_MIX_AGE_GROUP] == age_group)]

sub_pop_indices = population.data.index[(population.data[col.SEX]==sex) & (population.data[col.SEX_MIX_AGE_GROUP] == age_group)]
sub_pop = population.data.loc[sub_pop_indices]
n_stp_total = sum(sub_pop[col.NUM_PARTNERS]) # total number of people partnered to people in this group
n_infected = sum(sub_pop.loc[sub_pop[col.HIV_STATUS]][col.NUM_PARTNERS]) # num people parters to HIV+ people in this group
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly more idiomatic and (for me) a bit easier to read?

Suggested change
n_infected = sum(sub_pop.loc[sub_pop[col.HIV_STATUS]][col.NUM_PARTNERS]) # num people parters to HIV+ people in this group
n_infected = sum(sub_pop.loc[sub_pop[col.HIV_STATUS], col.NUM_PARTNERS]) # num people parters to HIV+ people in this group

def update_HIV_status(self, population: pd.DataFrame):
def update_partner_risk_vectors(self, population):
"""calculate the risk factor associated with each sex and age group"""
# Should we be using for loops here or can we do better?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to do this with a groupby, which would give us the subpopulations directly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I thought about this using a groupby but I don't know if it can be done because we need to access HIV status and number of partners, but we don't want to group by these, only by sex and age group. I'm not sure how we can access these additional fields if we use a group by

sub_pop_indices = population.data.index[(population.data[col.SEX]==sex) & (population.data[col.SEX_MIX_AGE_GROUP] == age_group)]
sub_pop = population.data.loc[sub_pop_indices]
n_stp_total = sum(sub_pop[col.NUM_PARTNERS]) # total number of people partnered to people in this group
n_infected = sum(sub_pop.loc[sub_pop[col.HIV_STATUS]][col.NUM_PARTNERS]) # num people parters to HIV+ people in this group
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm misunderstanding, would this be better called n_partners_infected or n_stp_of_infected or similar?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're correct, that name should have been changed when I corrected this calculation (I mistakenly used the number of people with HIV at first instead of the number of people coupled to someone with HIV) so will change this


def set_dummy_viral_load(self, population):
"""Dummy function to set viral load until this part of the code has been implemented properly"""
population.data[col.VIRAL_LOAD_GROUP] = rng.choice(7, population.size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add this variable to #48? Or make an issue somewhere for properly implementing it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much forgot #48 existed, so yes I will add this to it.

def stp_HIV_transmission(self, person):
# TODO: Add circumcision, STIs etc.
"""Returns True if HIV transmission occurs, and False otherwise"""
stp_viral_groups = np.array([rng.choice(7, p=self.stp_viral_group_rate[1 - person[col.SEX]][age_group]) for age_group in person[col.STP_AGE_GROUPS]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the 1 - person[col.SEX], we could add a method to the SexType enum (or a convenience function) that gives the "opposite" sex.

# TODO: Add circumcision, STIs etc.
"""Returns True if HIV transmission occurs, and False otherwise"""
stp_viral_groups = np.array([rng.choice(7, p=self.stp_viral_group_rate[1 - person[col.SEX]][age_group]) for age_group in person[col.STP_AGE_GROUPS]])
HIV_probabilities = np.array([self.stp_HIV_rate[1 - person[col.SEX]][age_group] for age_group in person[col.STP_AGE_GROUPS]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SAS code uses "rate" and "probability" almost interchangeably, resulting in some confusion. It'd be worth being more consistent from the start in the new model - if we can understand what each value is...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point; I suppose the most correct way to use it would be to use rate for things which need to be time-step adjusted, so I which variables you call rate or probability will depend on at which point in the calculation we take the length of the timestep into account.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment this doesn't take timestep length into account at all, so there isn't really a difference between the notion of a rate or a probability yet

stp_viral_groups = np.array([rng.choice(7, p=self.stp_viral_group_rate[1 - person[col.SEX]][age_group]) for age_group in person[col.STP_AGE_GROUPS]])
HIV_probabilities = np.array([self.stp_HIV_rate[1 - person[col.SEX]][age_group] for age_group in person[col.STP_AGE_GROUPS]])
viral_transmission_probabilities = np.array([max(0, rng.normal(self.transmission_means[group], self.transmission_sigmas[group])) for group in stp_viral_groups])
if person[col.SEX] == SexType.Female:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferred per the docs on enums:

Suggested change
if person[col.SEX] == SexType.Female:
if person[col.SEX] is SexType.Female:

self.update_partner_risk_vectors(population)
HIV_neg_idx = population.data.index[(population.data[col.HIV_STATUS]==False) & (population.data[col.NUM_PARTNERS]>0)]
sub_pop = population.data.loc[HIV_neg_idx]
population.data.loc[HIV_neg_idx, col.HIV_STATUS] = sub_pop.apply(self.stp_HIV_transmission, axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this seems like a prime candidate for vectorization if we can. But I guess the value of STP_AGE_GROUPS will be different for everyone? Perhaps we could combine the two steps (choice of partner age groups and probability of transmission) somehow.

Copy link
Contributor

@ageorgou ageorgou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a bug in the infection probabilities, unless I'm misunderstanding - otherwise looks fine though!

Comment on lines +76 to +77
# transform group fails when only grouped by one field
# appears to change the type of the object passed to the function!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that this should be fixed as part of #71, so we can update the following lines when merged.

Comment on lines +78 to +80
male_HIV_status = pop.transform_group([col.SEX_MIX_AGE_GROUP, col.SEX], lambda x, y: np.array(
[True] * (2 * N_group // HIV_ratio) +
[False] * (N_group - 2*N_group // HIV_ratio)), False, males)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I'm confused by what the (lambda) function does here: neither argument is used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just because the function passed to transform group has to match the number of arguments to the variables grouped by, which is something we can potentially change by making transform_group more flexible

print(n_new_male, exp_new_male, sig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove when done 😁

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole test was already removed in a62a146

@@ -35,6 +35,8 @@ class SexType(IntEnum):
Male = 0
Female = 1

def opposite_sex(sex: SexType):
return (1 - sex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would prefer to keep the abstraction here, so something like

return SexType.Male if sex is SexType.Female else SexType.Female

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but that doesn't work if called on a whole column, does it? Hm. There's probably a way to vectorise this while preserving the abstraction but I guess it's not crucial.

infection_prob[i] *= self.fold_change_yw
else:
infection_prob[i] *= self.fold_change_w
return infection_prob
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be out of the loop? (currently returning after computing probability for the first partner only?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be out of the loop !

Comment on lines 65 to 67
def get_infection_prob(self, sex, age, n_partners, stp_age_groups):
# Slow example that avoid repeating the iterations over partners three time by putting them as part of
# one for loop, but for loops in python will be slow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it'd be nice to be able to vectorise this, but having to repeat it for multiple partners makes it tricky... Will try to think of something for a future version.

population.data[col.SEX_MIX_AGE_GROUP] == age_group)]
# total number of people partnered to people in this group
n_stp_total = sum(sub_pop[col.NUM_PARTNERS])
# num people partered to HIV+ people in this group
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# num people partered to HIV+ people in this group
# num people partnered to HIV+ people in this group

Comment on lines 96 to 97
stp_viral_groups = np.array([rng.choice(7, p=self.stp_viral_group_rate[opposite_sex(
person[col.SEX])][age_group]) for age_group in person[col.STP_AGE_GROUPS]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find something like the following a more readable way to break it up:

Suggested change
stp_viral_groups = np.array([rng.choice(7, p=self.stp_viral_group_rate[opposite_sex(
person[col.SEX])][age_group]) for age_group in person[col.STP_AGE_GROUPS]])
stp_viral_groups = np.array([
rng.choice(7, p=self.stp_viral_group_rate[opposite_sex(person[col.SEX])][age_group])
for age_group in person[col.STP_AGE_GROUPS]
])

but that's personal preference! (and the second line is actually longer this way... though less than 100)

@mmcleod89
Copy link
Collaborator Author

test_rred keeps failing on this and another branch but not on my machine; is this because of the pandas issue we saw before?

@ageorgou
Copy link
Contributor

No, it's a pandas bug in 1.5.0, see also this comment. We're pinning it to 1.3.x in #71 but haven't added that here.

@mmcleod89 mmcleod89 merged commit faf0c3d into development Oct 13, 2022
@mmcleod89 mmcleod89 deleted the stp-HIV-transmission branch July 13, 2023 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants