clustering accounting for spatial coordinates #13
Are you working on a method that creates adjacency matrices for the seqFISH data, in particular split by 'Field of View'? That would give something like 6 or 7 adjacency matrices.
@Koncopd is working on them!
Still mixed feelings about this, let's keep it open.
Related to #246 and scverse/scanpy#1818.
I'm still quite tempted to add this, although the only use case I see is when the spatial graph is not a grid (but has some interesting topology). Also, this should probably be in scanpy (or muon?).
The idea was to include node feature information in the clustering, right? Then it could also be interesting for grid graphs, no? The only question is whether people are interested in spatial pieces/clusters of homogeneous cell type patterns.
Mmh, that could also be a way to do it, but in scverse/scanpy#1818 the idea is to do multiplex partitioning with the kNN graph from gene expression and the spatial graph jointly (without considering the node features). With features (could be image features?), yes, it would be interesting nonetheless (and even doable by jointly partitioning the kNN graphs from gene expression and image features).
What do you want to achieve by including spatial information in the clustering? I can think of two reasons to do this:
I see an obvious use case for 1, but I'm not sure you need a clustering for this. You should just be able to break up your non-spatial clustering results by finding connected components in the spatial graph. This would be like:

Example setup

Just getting to an AnnData I can do stuff with:

import scanpy as sc
import squidpy as sq
import numpy as np, pandas as pd
from scipy import sparse
import seaborn as sns
from matplotlib import pyplot as plt
plt.rcParams["figure.figsize"] = (12, 8)
adata = sc.datasets.visium_sge("V1_Breast_Cancer_Block_A_Section_1")
adata.var_names_make_unique()
adata.var["mito"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mito"], inplace=True)
sc.pp.filter_genes(adata, min_counts=1)
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, flavor="seurat_v3", layer="counts", n_top_genes=1000)
sc.pp.pca(adata)
sc.pp.neighbors(adata)

Subsetting clusters by spatial neighbor

sq.gr.spatial_neighbors(adata)
sc.tl.leiden(adata, resolution=0.5)
def find_per_cluster_components(adata, obs_key, graph_key):
    # Cluster labels to split, e.g. the Leiden clusters from the expression graph
    clusters = adata.obs[obs_key].astype("category")
    # -1 marks observations that have not been assigned a component yet
    components = -np.ones(adata.n_obs, dtype=int)
    for indices in adata.obs.groupby(obs_key).indices.values():
        # Restrict the graph to one cluster and label its connected components
        subgraph = adata[indices].obsp[graph_key]
        components[indices] = sparse.csgraph.connected_components(subgraph)[1]
    # Component ids restart at 0 within every cluster
    return pd.DataFrame({"cluster": clusters, "components": components})
df = find_per_cluster_components(adata, "leiden", "spatial_connectivities")
# Kinda gross
subgroups = pd.Series(-np.ones(adata.n_obs, dtype=int), index=adata.obs_names)
subgroups.loc[adata.obs.query("leiden == '6'").index] = df["components"].loc[adata.obs.query("leiden == '6'").index]
adata.obs["to_plot"] = pd.Categorical.from_codes(codes=subgroups, categories=[str(x) for x in range(subgroups.max() + 1)])

One selected cluster, split by connected components on the spatial graph.

I'm not so sure how useful 2 is, but I could definitely be missing something.
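For anyone who wants to try the connected-components idea without downloading the Visium dataset, here is a minimal sketch on a toy graph. The six-spot line graph, the labels, and the helper name `split_by_spatial_components` are made up for illustration; only `scipy.sparse.csgraph.connected_components` is the real call used in the comment above.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import connected_components


def split_by_spatial_components(clusters, spatial):
    """Label spatially connected pieces within each cluster (hypothetical helper)."""
    components = -np.ones(len(clusters), dtype=int)
    for label in np.unique(clusters):
        idx = np.flatnonzero(clusters == label)
        # Spatial graph restricted to the spots of this cluster
        sub = spatial[idx][:, idx]
        _, comp = connected_components(sub, directed=False)
        components[idx] = comp
    return components


# Toy data (assumption): 6 spots on a line, edges between adjacent spots
n = 6
rows = np.arange(n - 1)
upper = sparse.coo_matrix((np.ones(n - 1), (rows, rows + 1)), shape=(n, n))
spatial = (upper + upper.T).tocsr()

# Cluster "a" occurs in two spatially separate stretches
clusters = np.array(["a", "a", "b", "b", "a", "a"])

components = split_by_spatial_components(clusters, spatial)
print(list(zip(clusters.tolist(), components.tolist())))
```

Component ids restart at 0 inside each cluster, so the pair (cluster, component) is what identifies a spatial piece, not the component id alone.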
That's really cool @ivirshup! It'd be a very handy function. Re 2., I think it'd still be useful and would be a purely "data driven" (not necessarily better) way to achieve 1. That'd be done with multi-graph partitioning (native in leidenalg), where the kNN graph from gene expression and the spatial graph are both given as input. This is particularly useful for non-Visium data, where the graph actually has an interesting topology.
Could this problem also be thought of as "expression driven segmentation"? I'm just a little unsure of the case where you want an output like the second plot, but without knowing those were the same cell types. Unless there's a case where you'd find something that looks different?
Not very clear idea, but something along these lines: https://www.biorxiv.org/content/10.1101/2020.09.04.283812v1
Maybe a way to achieve similar results without explicit modelling and inference. It's essentially a smoothing of cluster assignments on spatial coordinates.
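To make "smoothing of cluster assignments on spatial coordinates" concrete, here is a minimal sketch using a majority vote over each spot and its spatial neighbours. This is only one simple way to smooth labels and is not the method of the linked preprint; the helper name `smooth_labels` and the toy data are assumptions.

```python
import numpy as np
from scipy import sparse


def smooth_labels(labels, spatial, n_iter=1):
    """Majority vote over each spot and its spatial neighbours (hypothetical helper)."""
    categories, codes = np.unique(labels, return_inverse=True)
    n = len(codes)
    # Add self-loops so a spot's own label counts as one vote
    graph = (spatial + sparse.identity(n, format="csr")).tocsr()
    for _ in range(n_iter):
        onehot = sparse.csr_matrix(
            (np.ones(n), (np.arange(n), codes)), shape=(n, len(categories))
        )
        # votes[i, k] = number of spots in i's neighbourhood with label k
        votes = (graph @ onehot).toarray()
        # Ties go to the first category in sorted order
        codes = votes.argmax(axis=1)
    return categories[codes]


# Toy data (assumption): 7 spots on a line, one isolated "b" in a run of "a"
n = 7
rows = np.arange(n - 1)
upper = sparse.coo_matrix((np.ones(n - 1), (rows, rows + 1)), shape=(n, n))
spatial = (upper + upper.T).tocsr()
labels = np.array(["a", "a", "a", "b", "a", "a", "a"])

smoothed = smooth_labels(labels, spatial)
print(smoothed.tolist())
```

On a grid-like graph this mostly removes isolated spots; whether that is desirable depends on whether single-spot differences are noise or biology.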