Deploying to gh-pages from @ 6840145 🚀
pablo-gar committed Apr 4, 2024
1 parent 76812a0 commit 026f1bb
Showing 15 changed files with 460 additions and 18 deletions.
66 changes: 66 additions & 0 deletions _sources/articles/2024/20240404-categoricals.md.txt
@@ -0,0 +1,66 @@
# Census supports categoricals for cell metadata

*Published:* *April 4th, 2024*

*By:* *[Emanuele Bezzi](mailto:ebezzi@chanzuckerberg.com)* & *[Pablo Garcia-Nieto](mailto:pgarcia-nieto@chanzuckerberg.com)*

Starting with the `2024-04-01` Census build, a subset of the columns in the `obs` dataframe are now categorical instead of strings.

Overall, users will see a smaller memory footprint when loading Census data into memory. 🚀

However, this change may break some existing pipelines, as explained below.

## Potential breaking changes

For **Python users**, note that Pandas will encode these columns as `pandas.Categorical` for which some downstream operations may need to be adapted. See [this link](https://pandas.pydata.org/docs/user_guide/categorical.html#operations) for more details. In particular:

> Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data

and

> DataFrame methods like sum, groupby, pivot, value_counts also show “unused” categories when observed=False, which is the default.
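The two behaviors quoted above can be reproduced on a small, made-up data frame. This is a minimal sketch, not Census code: the `obs` frame, its values, and the `n_counts` column are hypothetical stand-ins, and only standard pandas calls are used.

```python
import pandas as pd

# Hypothetical stand-in for a slice of `obs`; values are made up for illustration.
obs = pd.DataFrame(
    {
        "cell_type": pd.Categorical(
            ["T cell", "B cell", "T cell"],
            # "neuron" is a declared category that does not occur in this slice.
            categories=["B cell", "T cell", "neuron"],
        ),
        "n_counts": [10, 20, 30],
    }
)

# value_counts() reports every category, including the unused "neuron" (count 0).
counts_all = obs["cell_type"].value_counts()

# Dropping unused categories first gives the same result a string column would.
counts_used = obs["cell_type"].cat.remove_unused_categories().value_counts()

# observed=True restricts groupby output to categories present in the data.
sums_observed = obs.groupby("cell_type", observed=True)["n_counts"].sum()

# Converting back to plain strings is an option when memory is not a concern.
cell_type_as_str = obs["cell_type"].astype(str)
```

Passing `observed=True` explicitly or calling `.cat.remove_unused_categories()` is likely the least invasive way to adapt a pipeline that previously assumed string columns, since both keep the memory savings of the categorical encoding.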

For **R users**, note that these columns will be encoded as `factor` and similarly downstream operations may need to be adapted. See [this link](https://r4ds.had.co.nz/factors.html) for more details.

For **Python and R users** interfacing with `arrow`, these columns will be encoded as `dictionary`. See the Arrow documentation for [R](https://arrow.apache.org/docs/r/reference/dictionary.html) and for [Python](https://arrow.apache.org/docs/python/generated/pyarrow.dictionary.html) for more details.

## Identifying the `obs` columns encoded as categorical

Users can always check the type of each cell metadata variable by inspecting the schema of `obs`. Categoricals are shown as `dictionary`.

In Python:

```python
import cellxgene_census
census = cellxgene_census.open_soma(census_version="latest")
census["census_data"]["homo_sapiens"].obs.schema

# soma_joinid: int64 not null
# dataset_id: dictionary<values=string, indices=int16, ordered=0> not null
# assay: dictionary<values=string, indices=int8, ordered=0> not null
# assay_ontology_term_id: dictionary<values=string, indices=int8, ordered=0> not null
# cell_type: dictionary<values=string, indices=int16, ordered=0> not null
# cell_type_ontology_term_id: dictionary<values=string, indices=int16, ordered=0> not null
# development_stage: dictionary<values=string, indices=int16, ordered=0> not null
# development_stage_ontology_term_id: dictionary<values=string, indices=int16,
# [OUTPUT TRUNCATED]
```

In R:

```r
library("cellxgene.census")
census = open_soma(census_version="latest")
census$get("census_data")$get("homo_sapiens")$obs$schema()

# Schema
# soma_joinid: int64 not null
# dataset_id: dictionary<values=string, indices=int16> not null
# assay: dictionary<values=string, indices=int8> not null
# assay_ontology_term_id: dictionary<values=string, indices=int8> not null
# cell_type: dictionary<values=string, indices=int16> not null
# cell_type_ontology_term_id: dictionary<values=string, indices=int16> not null
# development_stage: dictionary<values=string, indices=int16> not null
# development_stage_ontology_term_id: dictionary<values=string, indices=int16> not null
# [OUTPUT TRUNCATED]
```
2 changes: 2 additions & 0 deletions _sources/cellxgene_census_docsite_landing.md.txt
@@ -1,3 +1,5 @@
<span style="background-color: #f3bfcb; color; font-size: 18px"> 🚨 Census now supports categoricals, using less memory but potentially breaking existing pipelines. [Find out more](https://chanzuckerberg.github.io/cellxgene-census/articles/2024/20240404-categoricals.html)!

<span style="background-color: #f3bfcb; color; font-size: 18px"> 🚀 New to the Census: we’ve created a **centralized hub of models and embeddings** using Census data. [Check it out](https://cellxgene.cziscience.com/census-models)!

</span>
372 changes: 372 additions & 0 deletions articles/2024/20240404-categoricals.html

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion cellxgene_census_docsite_landing.html
@@ -185,7 +185,8 @@
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">

-<p><span style="background-color: #f3bfcb; color; font-size: 18px"> 🚀 New to the Census: we’ve created a <strong>centralized hub of models and embeddings</strong> using Census data. <a class="reference external" href="https://cellxgene.cziscience.com/census-models">Check it out</a>!</p>
+<p><span style="background-color: #f3bfcb; color; font-size: 18px"> 🚨 Census now supports categoricals, using less memory but potentially breaking existing pipelines. <a class="reference external" href="https://chanzuckerberg.github.io/cellxgene-census/articles/2024/20240404-categoricals.html">Find out more</a>!</p>
+<p><span style="background-color: #f3bfcb; color; font-size: 18px"> 🚀 New to the Census: we’ve created a <strong>centralized hub of models and embeddings</strong> using Census data. <a class="reference external" href="https://cellxgene.cziscience.com/census-models">Check it out</a>!</p>
</span>
<section id="cz-cellxgene-discover-census">
<h1>CZ CELLxGENE Discover Census<a class="headerlink" href="#cz-cellxgene-discover-census" title="Permalink to this heading"></a></h1>
3 changes: 2 additions & 1 deletion index.html
@@ -186,7 +186,8 @@
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">

-<p><span style="background-color: #f3bfcb; color; font-size: 18px"> 🚀 New to the Census: we’ve created a <strong>centralized hub of models and embeddings</strong> using Census data. <a class="reference external" href="https://cellxgene.cziscience.com/census-models">Check it out</a>!</p>
+<p><span style="background-color: #f3bfcb; color; font-size: 18px"> 🚨 Census now supports categoricals, using less memory but potentially breaking existing pipelines. <a class="reference external" href="https://chanzuckerberg.github.io/cellxgene-census/articles/2024/20240404-categoricals.html">Find out more</a>!</p>
+<p><span style="background-color: #f3bfcb; color; font-size: 18px"> 🚀 New to the Census: we’ve created a <strong>centralized hub of models and embeddings</strong> using Census data. <a class="reference external" href="https://cellxgene.cziscience.com/census-models">Check it out</a>!</p>
</span>
<section id="cz-cellxgene-discover-census">
<h1>CZ CELLxGENE Discover Census<a class="headerlink" href="#cz-cellxgene-discover-census" title="Permalink to this heading"></a></h1>
4 changes: 2 additions & 2 deletions notebooks/experimental/pytorch.ipynb
@@ -118,7 +118,7 @@
},
{
"cell_type": "markdown",
-"id": "ddbc5269",
+"id": "c5d2fbd5",
"metadata": {
"collapsed": false
},
@@ -130,7 +130,7 @@
},
{
"cell_type": "markdown",
-"id": "0248c3a5",
+"id": "89936afc",
"metadata": {
"collapsed": false
},
Binary file modified objects.inv
Binary file not shown.
22 changes: 11 additions & 11 deletions r/articles/comp_bio_data_integration.html

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion r/pkgdown.yml
@@ -12,5 +12,5 @@ articles:
comp_bio_data_integration: comp_bio_data_integration.html
comp_bio_normalizing_full_gene_sequencing: comp_bio_normalizing_full_gene_sequencing.html
comp_bio_summarize_axis_query: comp_bio_summarize_axis_query.html
-last_built: 2024-04-04T01:01Z
+last_built: 2024-04-04T02:54Z

2 changes: 1 addition & 1 deletion r/search.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.
