clustering function for features #246

Closed
hspitzer opened this issue Jan 28, 2021 · 10 comments
Labels
enhancement ✨ New feature or request question ❓ Further information is requested

Comments

@hspitzer
Collaborator

When writing tutorials, I find myself defining the same clustering function in several notebooks.

import anndata as ad
import pandas as pd
import scanpy as sc


def cluster_features(features: pd.DataFrame, like=None) -> pd.Series:
    """Calculate leiden clustering of features.

    Specify filter of features using `like`.
    """
    # filter features: keep only columns whose name contains `like`
    if like is not None:
        features = features.filter(like=like)
    # create temporary adata to calculate the clustering
    adata = ad.AnnData(features)
    # adata.var_names_make_unique()
    # important - feature values are not scaled, so need to scale them before PCA
    sc.pp.scale(adata)
    # calculate leiden clustering
    sc.pp.pca(adata, n_comps=min(10, features.shape[1] - 1))
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata)

    return adata.obs["leiden"]

This essentially does scaling+PCA+neighbors+leiden on a set of features.
I was wondering if we should include this in squidpy as a convenience function (maybe made a bit more general)? Or should we rather leave this sort of function outside of squidpy? Is there a way to avoid defining the same function in several notebooks? @giovp

@hspitzer hspitzer added enhancement ✨ New feature or request question ❓ Further information is requested labels Jan 28, 2021
@giovp
Member

giovp commented Jan 28, 2021

Good point. The problem is that there are many parameters that should be exposed: while it's true that this is a simple function, it wraps quite complex processing steps whose parameters the user may well have to change.

I think a better option would be a function that takes an adata_parent and a key in obsm, and returns an adata_child with the same obs, var as adata_parent.

This is, by the way, closely related to the biggest problem of having multimodal data in anndata 😅 and we would not be the only ones facing this...

@hspitzer
Collaborator Author

Yes, I agree that this wraps quite complicated processing steps. Maybe they should be explicitly visible for the user.
It's just that moving the obsm back and forth is a bit ugly.

Ok, sure, so you are proposing a function moving obsm to X, right?
So this would translate to:

adata_features = move_obsm(adata, key="features")
sc.pp.scale(adata_features)
sc.pp.pca(adata_features)
sc.pp.neighbors(adata_features)
sc.tl.leiden(adata_features)

and then you can use adata_features directly for sc.pl.spatial because it already contains the gene clusters. Yeah, that could work.

To deal with features efficiently though, I need some sort of mechanism to select which columns of obsm to move (I do that with the like parameter in the function above).

@giovp
Member

giovp commented Jan 28, 2021

> Ok, sure, so you are proposing a function moving obsm to X, right?
> So this would translate to:

yes, something like that.

> and then you can use adata_features directly for sc.pl.spatial because it already contains the gene clusters. Yeah, that could work.

Yes indeed, in that case you'd also have to copy over adata.uns for images and related metadata.

> To deal with features efficiently though, I need some sort of mechanism to select which columns of obsm to move (I do that with the like parameter in the function above).

These features are what is moved into adata.X, right? Wouldn't it work to just move everything?

@hspitzer
Collaborator Author

> > To deal with features efficiently though, I need some sort of mechanism to select which columns of obsm to move (I do that with the like parameter in the function above).
>
> These features are what is moved into adata.X, right? Wouldn't it work to just move everything?

I usually extract all features at once because this is more efficient. In some of the tutorials, though, I show the clustering for only a subset of the features (e.g. only segmentation features or only texture features). For this we need a way to filter the pandas table. I can also do that manually, but then there is no need for me to use such an extraction function at all.
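For reference, the manual filtering mentioned here is a one-liner with `DataFrame.filter`. The sketch below uses made-up column names; real feature names depend on the extractor that produced them.

```python
import numpy as np
import pandas as pd

# All features extracted once into a single table (illustrative names only).
features = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4),
    index=["spot_0", "spot_1", "spot_2"],
    columns=[
        "segmentation_area",
        "segmentation_count",
        "texture_contrast",
        "texture_energy",
    ],
)

# `like` keeps every column whose label contains the given substring.
segmentation = features.filter(like="segmentation")
texture = features.filter(like="texture")

print(list(segmentation.columns))
print(list(texture.columns))
```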

My point is that I'd like to keep the example notebooks as short as possible, and was wondering if we could make some utility functions that do these steps for us.

@giovp
Member

giovp commented Jan 28, 2021

Ok yes, then making an extractor similar to what we already have might make sense, I think. Maybe the extractor we have can be modified? I also understand now about selecting specific features.

@hspitzer
Collaborator Author

Yeah, it would be nice to use the extractor for this, but currently sq.pl.extract does obsm -> obs. We are talking about obsm -> X. I'm not sure if it's best practice to put these two different functionalities in one function? We could have a "destination" argument that can be either obs or X?

@giovp
Member

giovp commented Jan 28, 2021

> I'm not sure if it's best practice to put these two different functionalities in one function? We could have a "destination" argument that can be either obs or X?

I like this idea!

@giovp
Member

giovp commented Feb 11, 2021

I think this is now done with extract and several tutorials, will close this.

@giovp giovp closed this as completed Feb 11, 2021
@hspitzer
Collaborator Author

Is it? Does extract now also extract obsm -> X? Would still be great to have. Not super urgent though.

@hspitzer hspitzer reopened this Feb 11, 2021
@giovp
Member

giovp commented Jul 14, 2021

It would be cool to have a multiplex partition based on layers/obsm; see scverse/scanpy#1818.
