-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: HIV transmission for short term partners #70
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not read through the whole thing yet, only making some quick comments.
I need to look at the transmission probabilities more closely because it seems we get the ratios the wrong way around, although that's probably me reading it wrong!
Does applying stp_HIV_transmission
to each person individually give okay performance? (from what we can tell, anyway)
We may need to rethink the age groups and if they should be in common
, like the sex type. Otherwise pass a reference to the population into the module?
src/hivpy/hiv_status.py
Outdated
sub_pop_indices = population.data.index[(population.data[col.SEX]==sex) & (population.data[col.SEX_MIX_AGE_GROUP] == age_group)] | ||
sub_pop = population.data.loc[sub_pop_indices] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can get this in one step?
sub_pop_indices = population.data.index[(population.data[col.SEX]==sex) & (population.data[col.SEX_MIX_AGE_GROUP] == age_group)] | |
sub_pop = population.data.loc[sub_pop_indices] | |
sub_pop = population.data.loc[(population.data[col.SEX]==sex) & (population.data[col.SEX_MIX_AGE_GROUP] == age_group)] |
src/hivpy/hiv_status.py
Outdated
sub_pop_indices = population.data.index[(population.data[col.SEX]==sex) & (population.data[col.SEX_MIX_AGE_GROUP] == age_group)] | ||
sub_pop = population.data.loc[sub_pop_indices] | ||
n_stp_total = sum(sub_pop[col.NUM_PARTNERS]) # total number of people partnered to people in this group | ||
n_infected = sum(sub_pop.loc[sub_pop[col.HIV_STATUS]][col.NUM_PARTNERS]) # num people parters to HIV+ people in this group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly more idiomatic and (for me) a bit easier to read?
n_infected = sum(sub_pop.loc[sub_pop[col.HIV_STATUS]][col.NUM_PARTNERS]) # num people parters to HIV+ people in this group | |
n_infected = sum(sub_pop.loc[sub_pop[col.HIV_STATUS], col.NUM_PARTNERS]) # num people parters to HIV+ people in this group |
src/hivpy/hiv_status.py
Outdated
def update_HIV_status(self, population: pd.DataFrame): | ||
def update_partner_risk_vectors(self, population): | ||
"""calculate the risk factor associated with each sex and age group""" | ||
# Should we be using for loops here or can we do better? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to do this with a groupby, which would give us the subpopulations directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I thought about this using a groupby
but I don't know if it can be done because we need to access HIV status and number of partners, but we don't want to group by these, only by sex and age group. I'm not sure how we can access these additional fields if we use a group by
src/hivpy/hiv_status.py
Outdated
sub_pop_indices = population.data.index[(population.data[col.SEX]==sex) & (population.data[col.SEX_MIX_AGE_GROUP] == age_group)] | ||
sub_pop = population.data.loc[sub_pop_indices] | ||
n_stp_total = sum(sub_pop[col.NUM_PARTNERS]) # total number of people partnered to people in this group | ||
n_infected = sum(sub_pop.loc[sub_pop[col.HIV_STATUS]][col.NUM_PARTNERS]) # num people parters to HIV+ people in this group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm misunderstanding, would this be better called n_partners_infected
or n_stp_of_infected
or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you're correct, that name should have been changed when I corrected this calculation (I mistakenly used the number of people with HIV at first instead of the number of people coupled to someone with HIV) so will change this
|
||
def set_dummy_viral_load(self, population): | ||
"""Dummy function to set viral load until this part of the code has been implemented properly""" | ||
population.data[col.VIRAL_LOAD_GROUP] = rng.choice(7, population.size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add this variable to #48? Or make an issue somewhere for properly implementing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much forgot #48 existed, so yes I will add this to it.
src/hivpy/hiv_status.py
Outdated
def stp_HIV_transmission(self, person): | ||
# TODO: Add circumcision, STIs etc. | ||
"""Returns True if HIV transmission occurs, and False otherwise""" | ||
stp_viral_groups = np.array([rng.choice(7, p=self.stp_viral_group_rate[1 - person[col.SEX]][age_group]) for age_group in person[col.STP_AGE_GROUPS]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of the 1 - person[col.SEX]
, we could add a method to the SexType
enum (or a convenience function) that gives the "opposite" sex.
src/hivpy/hiv_status.py
Outdated
# TODO: Add circumcision, STIs etc. | ||
"""Returns True if HIV transmission occurs, and False otherwise""" | ||
stp_viral_groups = np.array([rng.choice(7, p=self.stp_viral_group_rate[1 - person[col.SEX]][age_group]) for age_group in person[col.STP_AGE_GROUPS]]) | ||
HIV_probabilities = np.array([self.stp_HIV_rate[1 - person[col.SEX]][age_group] for age_group in person[col.STP_AGE_GROUPS]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SAS code uses "rate" and "probability" almost interchangeably, resulting in some confusion. It'd be worth being more consistent from the start in the new model - if we can understand what each value is...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point; I suppose the most correct way to use it would be to use rate for things which need to be time-step adjusted, so I which variables you call rate or probability will depend on at which point in the calculation we take the length of the timestep into account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment this doesn't take timestep length into account at all, so there isn't really a difference between the notion of a rate or a probability yet
src/hivpy/hiv_status.py
Outdated
stp_viral_groups = np.array([rng.choice(7, p=self.stp_viral_group_rate[1 - person[col.SEX]][age_group]) for age_group in person[col.STP_AGE_GROUPS]]) | ||
HIV_probabilities = np.array([self.stp_HIV_rate[1 - person[col.SEX]][age_group] for age_group in person[col.STP_AGE_GROUPS]]) | ||
viral_transmission_probabilities = np.array([max(0, rng.normal(self.transmission_means[group], self.transmission_sigmas[group])) for group in stp_viral_groups]) | ||
if person[col.SEX] == SexType.Female: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preferred per the docs on enums:
if person[col.SEX] == SexType.Female: | |
if person[col.SEX] is SexType.Female: |
src/hivpy/hiv_status.py
Outdated
self.update_partner_risk_vectors(population) | ||
HIV_neg_idx = population.data.index[(population.data[col.HIV_STATUS]==False) & (population.data[col.NUM_PARTNERS]>0)] | ||
sub_pop = population.data.loc[HIV_neg_idx] | ||
population.data.loc[HIV_neg_idx, col.HIV_STATUS] = sub_pop.apply(self.stp_HIV_transmission, axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this seems like a prime candidate for vectorization if we can. But I guess the value of STP_AGE_GROUPS
will be different for everyone? Perhaps we could combine the two steps (choice of partner age groups and probability of transmission) somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a bug in the infection probabilities, unless I'm misunderstanding - otherwise looks fine though!
# transform group fails when only grouped by one field | ||
# appears to change the type of the object passed to the function! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that this should be fixed as part of #71, so we can update the following lines when merged.
male_HIV_status = pop.transform_group([col.SEX_MIX_AGE_GROUP, col.SEX], lambda x, y: np.array( | ||
[True] * (2 * N_group // HIV_ratio) + | ||
[False] * (N_group - 2*N_group // HIV_ratio)), False, males) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I'm confused by what the (lambda) function does here: neither argument is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just because the function passed to transform group has to match the number of arguments to the variables grouped by, which is something we can potentially change by making transform_group more flexible
src/tests/test_hiv_status.py
Outdated
print(n_new_male, exp_new_male, sig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove when done 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole test was already removed in a62a146
@@ -35,6 +35,8 @@ class SexType(IntEnum): | |||
Male = 0 | |||
Female = 1 | |||
|
|||
def opposite_sex(sex: SexType): | |||
return (1 - sex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would prefer to keep the abstraction here, so something like
return SexType.Male if sex is SexType.Female else SexType.Female
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but that doesn't work if called on a whole column, does it? Hm. There's probably a way to vectorise this while preserving the abstraction but I guess it's not crucial.
src/hivpy/hiv_status.py
Outdated
infection_prob[i] *= self.fold_change_yw | ||
else: | ||
infection_prob[i] *= self.fold_change_w | ||
return infection_prob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be out of the loop? (currently returning after computing probability for the first partner only?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be out of the loop !
src/hivpy/hiv_status.py
Outdated
def get_infection_prob(self, sex, age, n_partners, stp_age_groups): | ||
# Slow example that avoid repeating the iterations over partners three time by putting them as part of | ||
# one for loop, but for loops in python will be slow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it'd be nice to be able to vectorise this, but having to repeat it for multiple partners makes it tricky... Will try to think of something for a future version.
population.data[col.SEX_MIX_AGE_GROUP] == age_group)] | ||
# total number of people partnered to people in this group | ||
n_stp_total = sum(sub_pop[col.NUM_PARTNERS]) | ||
# num people partered to HIV+ people in this group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# num people partered to HIV+ people in this group | |
# num people partnered to HIV+ people in this group |
src/hivpy/hiv_status.py
Outdated
stp_viral_groups = np.array([rng.choice(7, p=self.stp_viral_group_rate[opposite_sex( | ||
person[col.SEX])][age_group]) for age_group in person[col.STP_AGE_GROUPS]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find something like the following a more readable way to break it up:
stp_viral_groups = np.array([rng.choice(7, p=self.stp_viral_group_rate[opposite_sex( | |
person[col.SEX])][age_group]) for age_group in person[col.STP_AGE_GROUPS]]) | |
stp_viral_groups = np.array([ | |
rng.choice(7, p=self.stp_viral_group_rate[opposite_sex(person[col.SEX])][age_group]) | |
for age_group in person[col.STP_AGE_GROUPS] | |
]) |
but that's personal preference! (and the second line is actually longer this way... though less than 100)
test_rred keeps failing on this and another branch but not on my machine; is this because of the pandas issue we saw before? |
No, it's a pandas bug in 1.5.0, see also this comment. We're pinning it to 1.3.x in #71 but haven't added that here. |
Still needs tests to check that things are correct, which might benefit from some refactoring. Not all factors for HIV transmission are accounted for yet, as they don't exist in the model, but I did add a dummy initialisation for viral load groups since that was the most complicated effect and central to the calculation rather than just being a post-factor.
We will need a data file for transmission properties too but I think this should be a separate PR