clustering function for features #246

hspitzer · 2021-01-28T09:04:42Z

When writing tutorials, I find myself defining the same clustering function in several notebooks.

import anndata as ad
import pandas as pd
import scanpy as sc


def cluster_features(features: pd.DataFrame, like=None):
    """Calculate leiden clustering of features.

    Specify filter of features using `like`.
    """
    # filter features
    if like is not None:
        features = features.filter(like=like)
    # create temporary adata to calculate the clustering
    adata = ad.AnnData(features)
    # adata.var_names_make_unique()
    # important - feature values are not scaled, so need to scale them before PCA
    sc.pp.scale(adata)
    # calculate leiden clustering
    sc.pp.pca(adata, n_comps=min(10, features.shape[1] - 1))
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata)

    return adata.obs["leiden"]

This essentially does scaling+PCA+neighbors+leiden on a set of features.
I was wondering whether we should include this in squidpy as a convenience function (maybe made a bit more general), or whether we should rather leave these sorts of functions outside of squidpy. Is there a way for me to avoid defining the same function in several notebooks? @giovp


giovp · 2021-01-28T09:20:47Z

Good point. The problem is that there are many parameters that should be exposed: although it's true that it's a simple function, it wraps quite complex processing steps, and it's key that the user might have to change those parameters.

I think a better option would be a function that takes an adata_parent and a key in obsm, and returns an adata_child with the same obs, var as adata_parent.

This is btw very related to the biggest problem of having multi modal data in anndata 😅 and we would not be the only ones facing this...

hspitzer · 2021-01-28T09:29:28Z

Yes, I agree that this wraps quite complicated processing steps. Maybe they should be explicitly visible for the user.
It's just that moving the obsm back and forth is a bit ugly.

Ok, sure so you are proposing a function moving obsm to X, right?
So this would translate to:

adata_features = move_obsm(adata, key="features")
sc.pp.scale(adata_features)
sc.pp.pca(adata_features)
sc.pp.neighbors(adata_features)
sc.tl.leiden(adata_features)

and then you can use adata_features directly for sc.pl.spatial because it already contains the gene clusters. Yeah, that could work.

To deal with features efficiently though, I need some sort of mechanism to select which columns of obsm to move (I do that with the like parameter in the function above).

giovp · 2021-01-28T09:54:38Z

> Ok, sure so you are proposing a function moving obsm to X, right?
> So this would translate to:

Yes, something like that.

> and then you can use adata_features directly for sc.pl.spatial because it already contains the gene clusters. Yeah, that could work.

Yes indeed, in that case you'd also have to copy over adata.uns for images and related metadata.

> To deal with features efficiently though, I need some sort of mechanism to select which columns of obsm to move (I do that with the like parameter in the function above).

These features are what is moved into adata.X, right? Wouldn't it work to just move everything?

hspitzer · 2021-01-28T09:59:45Z

> > To deal with features efficiently though, I need some sort of mechanism to select which columns of obsm to move (I do that with the like parameter in the function above).
>
> These features are what is moved into adata.X, right? Wouldn't it work to just move everything?

I usually extract all features at once because this is more efficient. In some of the tutorials, though, I show the clustering for only a subset of the features (e.g. only segmentation features or only texture features). For this we need a way to filter the pandas table. I can also do that manually, but at that point there is no need for me to use such an extraction function at all.
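As a small illustration of the filtering mentioned here, `DataFrame.filter(like=...)` keeps the columns whose label contains the given substring. The column names below are made up for the example and are not necessarily the names squidpy produces.

```python
import pandas as pd

# toy feature table: one row per spot, one column per feature (names are hypothetical)
features = pd.DataFrame(
    {
        "segmentation_label_count": [3, 5],
        "segmentation_area_mean": [10.0, 12.5],
        "texture_contrast": [0.2, 0.4],
        "texture_homogeneity": [0.9, 0.8],
        "summary_mean": [100.0, 110.0],
    },
    index=["spot_0", "spot_1"],
)

# for a DataFrame, filter() matches against column labels by default
segmentation = features.filter(like="segmentation")
texture = features.filter(like="texture")
```

All features are still extracted once; only the table passed on to clustering is narrowed down.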

My point is that I'd like to keep the example notebooks as short as possible, and was wondering if we could make some utility functions that do these steps for us.

giovp · 2021-01-28T10:38:06Z

OK yes, then making an extractor similar to what we already have might make sense, I think. Maybe the extractor we have can be modified? I also understand now about selecting specific features.

hspitzer · 2021-01-28T10:46:13Z

Yeah, it would be nice to use the extractor for this, but currently sq.pl.extract does obsm -> obs. We are talking about obsm -> X. I'm not sure if it's best practice to put these two different functionalities in one function? We could have a "destination" argument that can be either obs or X?

giovp · 2021-01-28T11:25:00Z

> I'm not sure if it's best practice to put these two different functionalities in one function? We could have a "destination" argument that can be either obs or X?

I like this idea!

giovp · 2021-02-11T14:01:27Z

I think this is now done with extract and several tutorials, will close this.

hspitzer · 2021-02-11T17:23:16Z

Is it? Does extract now also extract obsm -> X? Would still be great to have. Not super urgent though.

giovp · 2021-07-14T11:26:00Z

It would be cool to have a multiplex partition based on layers/obsm.
See scverse/scanpy#1818.

hspitzer added the enhancement ✨ (New feature or request) and question ❓ (Further information is requested) labels Jan 28, 2021

giovp closed this as completed Feb 11, 2021

hspitzer reopened this Feb 11, 2021

giovp mentioned this issue Jul 14, 2021

clustering accounting for spatial coordinates #13

Open

giovp closed this as completed Oct 18, 2022

scverse-bot mentioned this issue Nov 28, 2023

Update template to v0.3.0 #773

Merged
