Polish the slides
nimarafati committed Mar 13, 2024
1 parent b3a5f14 commit cc1349a
Showing 2 changed files with 10 additions and 9 deletions.
12 changes: 6 additions & 6 deletions slide_clustering.rmd
@@ -428,7 +428,7 @@ name: hclust
- Well suited for hierarchical data (e.g. taxonomies).
- Final output is a dendrogram representing the order decisions at each merge/division of clusters.
- Two approaches:
- Agglomerative (Bottom-up): All data points are treated as clusters and the joins similar ones.
- Agglomerative (Bottom-up): All data points are treated as clusters and then joins similar ones.
- Divisive (Top-down): All data points are in one large clusters and recursively splits the most heterogeneous clusters.
- Number of clusters are decided after generating the tree.
---
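
Reviewer's sketch for the `hclust` slide above: a minimal agglomerative example in base R showing that the number of clusters is chosen only after the tree has been built. The toy matrix `x`, the average linkage, and `k = 2` are assumptions for illustration and are not taken from the slides.

```r
set.seed(1)

# Simulated data (assumption): 20 samples x 5 features in two shifted groups
x <- rbind(matrix(rnorm(50, mean = 0), nrow = 10),
           matrix(rnorm(50, mean = 4), nrow = 10))

# Bottom-up clustering: each sample starts as its own cluster,
# and the closest clusters are merged step by step
d    <- dist(x, method = "euclidean")
tree <- hclust(d, method = "average")

# The tree is built first; the number of clusters is decided afterwards
clusters <- cutree(tree, k = 2)
table(clusters)

# plot(tree) draws the dendrogram with the order of merges
```

Calling `cutree()` again with a different `k` reuses the same tree, so several cuts can be compared without reclustering.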
@@ -488,24 +488,24 @@ knitr::include_graphics('data/Linkages.png')
name: linear-clustering-summary
## Summary

- For bulk RNASeq you can perform clusteirng on raw or Z-Score scaled data.
- For bulk RNASeq you can perform clustering on raw, Z-Score scaled data or on top PC coordinates.

- For the sample size is large (>10,000) you can perform clustering on PC. For instance in scRNASeq data.
- For the sample large size (>10,000) you can perform clustering on PC. For instance in scRNASeq data.

- You always need to tune some parameters.

- K-means performs poorly on unbalanced data.

- On hierarchical clustering, some distance metrics need to be used with a certain
- In hierarchical clustering, some distance metrics need to be used with a certain
linkage method.

- Checking clustering Robustness (a.k.a Ensemble perturbations):
- Most clustering techniques will cluster random noise.
- One way of testing this is by clustering on parts of the data (clustering bootstrapping)
- Read more in [Ronan et al (2016) Science Signaling](https://www.science.org/doi/10.1126/scisignal.aad1932?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)).
- Read more in [Ronan et al (2016) Science Signaling](https://www.science.org/doi/10.1126/scisignal.aad1932?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed).
---
name: Know more
## Do you want to know more
## Do you want to know more?
Please check the following links:
- [Avoiding common pitfalls when clustering biological data](https://www.science.org/doi/10.1126/scisignal.aad1932?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%20%200pubmed)
- [Clustering with Scikit with GIFs](https://dashee87.github.io/data%20science/general/Clustering-with-Scikit-with-GIFs/) (Note, this is based on python but provide nice illustration).
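
Reviewer's sketch for the robustness bullet in the summary slide above ("clustering on parts of the data"): one simple way to do this is to recluster random subsamples and record how often each pair of samples ends up in the same cluster. This is a hedged illustration, not the procedure from Ronan et al. The toy matrix `x`, the 80% subsampling fraction, `n_rep`, and `k = 2` are all assumptions.

```r
set.seed(42)

# Simulated data (assumption): 40 samples x 5 features in two shifted groups
x <- rbind(matrix(rnorm(100, mean = 0), nrow = 20),
           matrix(rnorm(100, mean = 4), nrow = 20))
n     <- nrow(x)
n_rep <- 50

together <- matrix(0, n, n)  # times a pair fell in the same cluster
sampled  <- matrix(0, n, n)  # times a pair was drawn in the same subsample

for (i in seq_len(n_rep)) {
  idx  <- sort(sample(n, size = floor(0.8 * n)))          # random 80% of samples
  cl   <- cutree(hclust(dist(x[idx, ]), method = "average"), k = 2)
  same <- outer(cl, cl, "==")                             # pairwise co-membership
  together[idx, idx] <- together[idx, idx] + same
  sampled[idx, idx]  <- sampled[idx, idx] + 1
}

# Fraction of subsamples in which each pair co-clustered (0 = never, 1 = always)
consensus <- together / pmax(sampled, 1)
```

Values of `consensus` near 0 or 1 suggest stable assignments. Many intermediate values suggest the clusters depend on which samples happened to be included.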
7 changes: 4 additions & 3 deletions slide_preprocessing.Rmd
@@ -72,8 +72,8 @@ name: pp
- Remove genes and samples with low counts

```{r,echo=TRUE}
cf1 <- cr[rowSums(cr>0) >= 3, ] # Keep rows/genes that have at least one read
cf2 <- cr[rowSums(cr>2) >= 3, ] # Keep rows/genes that have at least three reads
cf1 <- cr[rowSums(cr>0) >= 3, ] # Keep rows/genes that have at least one read in +3 samples
cf2 <- cr[rowSums(cr>3) >= 3, ] # Keep rows/genes that have at least three reads in +3 samples
cf3 <- cr[rowSums(edgeR::cpm(cr)>5) >= 3, ] # need at least three samples to have cpm > 5.
```
_count/read per million (cpm/rpm): a normalized value for sequencing depth._
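
Reviewer's sketch for the filtering chunk above, run on simulated counts so it is self-contained. The matrix `cr` here is a hypothetical stand-in for the course data. Note that `cr > 3` keeps values strictly greater than three, whereas "at least three reads" corresponds to `cr >= 3` (equivalently `cr > 2`). The manual CPM below assumes plain library-size scaling with no normalisation factors, which should correspond to `edgeR::cpm()` on a bare count matrix.

```r
set.seed(7)

# Simulated counts (assumption): 200 genes x 6 samples
cr <- matrix(rnbinom(1200, mu = 5, size = 0.5), nrow = 200,
             dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

# At least one read in three or more samples
keep_detected <- rowSums(cr > 0) >= 3

# At least three reads in three or more samples (note >=, not >)
keep_three <- rowSums(cr >= 3) >= 3

# Counts per million: divide each column by its library size, scale by 1e6
cpm_manual <- t(t(cr) / colSums(cr)) * 1e6
keep_cpm   <- rowSums(cpm_manual > 5) >= 3

# Number of genes retained under each rule
sapply(list(detected = keep_detected, three = keep_three, cpm = keep_cpm), sum)
```

Comparing the three counts shows how strongly each threshold trims the gene set before committing to one.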
@@ -146,6 +146,7 @@ name: norm-1

.pull-left-50[
- Removing technical biases in sequencing data (e.g. sequencing depth and gene length)
- Make counts comparable across features (genes).
- Make counts comparable across samples

<!-- Control for sequencing depth -->
@@ -202,7 +203,7 @@ name: norm-2

## Normalisation

- Make counts comparable across features (genes). It can be useful for gene to gene comparisons.
- Controlling for gene length: It can be useful for gene to gene comparisons.
.size-60[![](data/normalization_methods_length.png)]

```{r,echo=FALSE}