From 6ff9b4ec94ea2f7753cbb8e58e8099e4294990b9 Mon Sep 17 00:00:00 2001
From: Alessandro Molina
Date: Tue, 30 Jul 2024 17:24:43 +0200
Subject: [PATCH 1/5] docs: explain translation from narwhals api to native api

---
 docs/how_it_works.md | 75 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/docs/how_it_works.md b/docs/how_it_works.md
index 45812eccd..c265b9508 100644
--- a/docs/how_it_works.md
+++ b/docs/how_it_works.md
@@ -150,6 +150,81 @@ Each implementation defines its own objects in subfolders such as `narwhals._pan
 `narwhals._arrow`, `narwhals._polars`, whereas the top-level modules such as `narwhals.dataframe`
 and `narwhals.series` coordinate how to dispatch the Narwhals API to each backend.

+## Mapping from API to implementations
+
+End user works with narwhals apis, it won't use directly the native dataframe APIS,
+so narwhals will take care itself of translating
+the calls done to the narwhals api to calls to the dataframe wrapper
+( `PandasLikeNamespace`, `PandasLikeDataFrame` or `PandasLikeExpr`) which then
+fowards the calls to the native implementation. Translation of native narwhals
+apis to the dataframe wrapper will usually happen via the namespace.
+Usually you will find the namespace referred as
+as `plx` which stands for *Polars compliant namespace for Library X*
+
+That usually happens in a few different ways:
+
+- `narwhals.DataFrame` -> `PandasLikeDataFrame` is available via the
+  `narwhals.DataFrame._compliant_frame` attribute of the Dataframe.
+  For example in case of a `pyarrow` we would have:
+  ```python
+  >>> nwdf = nw.from_native(table)
+  >>> type(nwdf)
+
+  >>> nwdf._compliant_frame
+
+  ```
+- `narwhals.Expr` -> `PandasLikeExpr` happens via calling `Expr._call` on the
+  target namespace. For example in case of `pyarrow` we would have:
+  ```python
+  >>> type(nwdf)
+
+  >>> nw.col("b").len()._call(nwdf.__narwhals_namespace__())
+  ArrowExpr(depth=1, function_name=col->len, root_names=['b'], output_names=['b']
+  ```
+- `narwhals.Series` -> `PandasLikeSeries` happens via the
+  `narwhals.Series._compliant_series` attribute of the Series.
+  For example, in case of `pyarrow` we would have:
+  ```python
+  >>> nwdf = nw.from_native(table)
+  >>> type(nwdf)
+
+  >>> nwseries = nwdf["b"]
+  >>> type(colseries)
+
+  >>> colseries._compliant_series
+
+  ```
+
+After a compliant object is returned, the user will keep using and perform
+additional operations on the compliant object itself, as the compliant
+object implements the same API as the narwhals objects and is accepted
+as an argument by all narwhals functions.
+
+For example `nw.col("b")._call(nwdf.__narwhals_namespace__())` on a
+`pyarrow.Table` will return an `ArrowExpr`:
+
+```python
+ ArrowExpr(depth=0, function_name=col, root_names=['b'], output_names=['b']
+```
+
+but invoking `narwhals.Dataframe.select` on it will work and return
+a new `narwhals.Dataframe` as if we passed a `narwhals.Expr` itself:
+
+```python
+>>> nwexpr = nw.col("b")
+>>> type(nwexpr)
+
+>>> nwr = nwdf.select(nw.col("b"))
+>>> type(nwr)
+
+>>> arrowexpr = nw.col("b")._call(nwdf.__narwhals_namespace__())
+>>> type(arrowexpr)
+
+>>> nwr = nwdf.select(arrowexpr)
+>>> type(nwr)
+
+```
+
 ## Group-by

 Group-by is probably one of Polars' most significant innovations (on the syntax side) with respect

From 4f8f04b01a455357d257c62b6eb293b13be858fc Mon Sep 17 00:00:00 2001
From: Alessandro Molina
Date: Tue, 30 Jul 2024 17:59:51 +0200
Subject: [PATCH 2/5] fix typo

---
 docs/how_it_works.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/how_it_works.md b/docs/how_it_works.md
index c265b9508..8320774b1 100644
--- a/docs/how_it_works.md
+++ b/docs/how_it_works.md
@@ -156,7 +156,7 @@ End user works with narwhals apis, it won't use directly the native dataframe AP
 so narwhals will take care itself of translating
 the calls done to the narwhals api to calls to the dataframe wrapper
 ( `PandasLikeNamespace`, `PandasLikeDataFrame` or `PandasLikeExpr`) which then
-fowards the calls to the native implementation. Translation of native narwhals
+forwards the calls to the native implementation. Translation of native narwhals
 apis to the dataframe wrapper will usually happen via the namespace.
 Usually you will find the namespace referred as
 as `plx` which stands for *Polars compliant namespace for Library X*

From f3d39b80e28bf7e59be7280ba778ccc37f67f19c Mon Sep 17 00:00:00 2001
From: Alessandro Molina
Date: Tue, 30 Jul 2024 18:07:14 +0200
Subject: [PATCH 3/5] switch to python-console lexer

---
 docs/how_it_works.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/how_it_works.md b/docs/how_it_works.md
index 8320774b1..d3dd13135 100644
--- a/docs/how_it_works.md
+++ b/docs/how_it_works.md
@@ -166,7 +166,7 @@ That usually happens in a few different ways:
 - `narwhals.DataFrame` -> `PandasLikeDataFrame` is available via the
   `narwhals.DataFrame._compliant_frame` attribute of the Dataframe.
   For example in case of a `pyarrow` we would have:
-  ```python
+  ```python-console
   >>> nwdf = nw.from_native(table)
   >>> type(nwdf)

@@ -175,7 +175,7 @@ That usually happens in a few different ways:
   ```
 - `narwhals.Expr` -> `PandasLikeExpr` happens via calling `Expr._call` on the
   target namespace. For example in case of `pyarrow` we would have:
-  ```python
+  ```python-console
   >>> type(nwdf)

   >>> nw.col("b").len()._call(nwdf.__narwhals_namespace__())
@@ -184,7 +184,7 @@ That usually happens in a few different ways:
 - `narwhals.Series` -> `PandasLikeSeries` happens via the
   `narwhals.Series._compliant_series` attribute of the Series.
   For example, in case of `pyarrow` we would have:
-  ```python
+  ```python-console
   >>> nwdf = nw.from_native(table)
   >>> type(nwdf)

@@ -203,14 +203,14 @@ as an argument by all narwhals functions.
 For example `nw.col("b")._call(nwdf.__narwhals_namespace__())` on a
 `pyarrow.Table` will return an `ArrowExpr`:

-```python
- ArrowExpr(depth=0, function_name=col, root_names=['b'], output_names=['b']
+```python-console
+ArrowExpr(depth=0, function_name=col, root_names=['b'], output_names=['b']
 ```

 but invoking `narwhals.Dataframe.select` on it will work and return
 a new `narwhals.Dataframe` as if we passed a `narwhals.Expr` itself:

-```python
+```python-console
 >>> nwexpr = nw.col("b")
 >>> type(nwexpr)


From f4f4bb3a1e21de4cebbe98e2f5e37470ecfdcc68 Mon Sep 17 00:00:00 2001
From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Date: Tue, 30 Jul 2024 21:58:09 +0100
Subject: [PATCH 4/5] reword, make snippets runnable

---
 docs/how_it_works.md | 122 ++++++++++++++++++++-----------------------
 1 file changed, 56 insertions(+), 66 deletions(-)

diff --git a/docs/how_it_works.md b/docs/how_it_works.md
index d3dd13135..d67985a69 100644
--- a/docs/how_it_works.md
+++ b/docs/how_it_works.md
@@ -152,79 +152,69 @@ and `narwhals.series` coordinate how to dispatch the Narwhals API to each backen

 ## Mapping from API to implementations

-End user works with narwhals apis, it won't use directly the native dataframe APIS,
-so narwhals will take care itself of translating
-the calls done to the narwhals api to calls to the dataframe wrapper
-( `PandasLikeNamespace`, `PandasLikeDataFrame` or `PandasLikeExpr`) which then
-forwards the calls to the native implementation. Translation of native narwhals
-apis to the dataframe wrapper will usually happen via the namespace.
-Usually you will find the namespace referred as
-as `plx` which stands for *Polars compliant namespace for Library X*
-
-That usually happens in a few different ways:
-
-- `narwhals.DataFrame` -> `PandasLikeDataFrame` is available via the
-  `narwhals.DataFrame._compliant_frame` attribute of the Dataframe.
-  For example in case of a `pyarrow` we would have:
-  ```python-console
-  >>> nwdf = nw.from_native(table)
-  >>> type(nwdf)
-
-  >>> nwdf._compliant_frame
-
-  ```
-- `narwhals.Expr` -> `PandasLikeExpr` happens via calling `Expr._call` on the
-  target namespace. For example in case of `pyarrow` we would have:
-  ```python-console
-  >>> type(nwdf)
-
-  >>> nw.col("b").len()._call(nwdf.__narwhals_namespace__())
-  ArrowExpr(depth=1, function_name=col->len, root_names=['b'], output_names=['b']
-  ```
-- `narwhals.Series` -> `PandasLikeSeries` happens via the
-  `narwhals.Series._compliant_series` attribute of the Series.
-  For example, in case of `pyarrow` we would have:
-  ```python-console
-  >>> nwdf = nw.from_native(table)
-  >>> type(nwdf)
-
-  >>> nwseries = nwdf["b"]
-  >>> type(colseries)
-
-  >>> colseries._compliant_series
-
-  ```
+When end users call Narwhals APIs, how do they get translated to the native Dataframe APIs?
+
+Things generally go through a couple of layers:
+
+- The user calls some top-level Narwhals API.
+- The Narwhals API forwards the call to a dataframe wrapper (`PandasLikeNamespace`, `PandasLikeDataFrame`, or `PandasLikeExpr`), whose
+  API is compliant with the Narwhals one.
+- The dataframe wrapper forwards the call to the underlying library.
+
+The way you access the wrapper depends on the object:
+
+- `narwhals.DataFrame` -> `._compliant_frame`: `PandasLikeDataFrame` / `ArrowDataFrame` / `PolarsDataFrame` / ...
+- `narwhals.Expr` -> `._call`: `PandasLikeExpr` / `ArrowExpr` / `PolarsExpr` / ...
+- `narwhals.Series` -> `._compliant_series`: `PandasLikeSeries` / `ArrowSeries` / `PolarsSeries` / ...

-After a compliant object is returned, the user will keep using and perform
-additional operations on the compliant object itself, as the compliant
-object implements the same API as the narwhals objects and is accepted
-as an argument by all narwhals functions.
+The way these are typically obtained is by going through _namespaces_. Each backend is expected to implement a Narwhals-compliant
+namespace: `PandasLikeNamespace`, `ArrowNamespace`, `PolarsNamespace`, ...

-For example `nw.col("b")._call(nwdf.__narwhals_namespace__())` on a
-`pyarrow.Table` will return an `ArrowExpr`:
+To understand how we go through all the layers, let's look at a concrete example: support `df_pd` is a pandas DataFrame. What happens
+if someone calls the following?
+
+```python exec="1" session="pandas_api_mapping" source="above"
+import narwhals as nw
+from narwhals._pandas_like.namespace import PandasLikeNamespace
+from narwhals._pandas_like.utils import Implementation
+from narwhals._pandas_like.dataframe import PandasLikeDataFrame
+from narwhals.utils import parse_version
+import pandas as pd

-```python-console
-ArrowExpr(depth=0, function_name=col, root_names=['b'], output_names=['b']
+pn = PandasLikeNamespace(
+    implementation=Implementation.PANDAS,
+    backend_version=parse_version(pd.__version__),
+)
+
+df_pd = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+df = nw.from_native(df_pd)
+df.select(nw.col("a") + 1)
 ```

-but invoking `narwhals.Dataframe.select` on it will work and return
-a new `narwhals.Dataframe` as if we passed a `narwhals.Expr` itself:
-
-```python-console
->>> nwexpr = nw.col("b")
->>> type(nwexpr)
-
->>> nwr = nwdf.select(nw.col("b"))
->>> type(nwr)
-
->>> arrowexpr = nw.col("b")._call(nwdf.__narwhals_namespace__())
->>> type(arrowexpr)
-
->>> nwr = nwdf.select(arrowexpr)
->>> type(nwr)
-
+The first thing `narwhals.DataFrame.select` does is to parse each input expression to end up with a compliant expression for the given
+backend, and it does so by passing a Narwhals-compliant namespace `nw.Expr._call`:
+
+```python exec="1" result="python" session="pandas_api_mapping" source="above"
+pn = PandasLikeNamespace(
+    implementation=Implementation.PANDAS,
+    backend_version=parse_version(pd.__version__),
+)
+expr = (nw.col("a") + 1)._call(pn)
+print(expr)
+```
+Right, so now, `expr` is a `PandasLikeExpr`. If we extract a Narwhals-compliant dataframe from `df` by calling `._compliant_frame`,
+we get a `PandasLikeDataFrame` - and that's an object which we can pass `expr` to! After that, we can view the native pandas object
+by calling `._native_dataframe`:
+
+```python exec="1" result="python" session="pandas_api_mapping" source="above"
+df_compliant = df._compliant_frame
+result = df_compliant.select(expr)
+print(result._native_dataframe)
 ```

+So, in effect went through two layers of abstraction: `nw.DataFrame` was backed by `PandasLikeDataFrame`, which was backed by an
+actual `pandas.DataFrame`. The same principle applies for all Narwhals backend.
+
 ## Group-by

 Group-by is probably one of Polars' most significant innovations (on the syntax side) with respect

From 9e4e517c9d190eb8ce45e8b9ea4c26f7433b4560 Mon Sep 17 00:00:00 2001
From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com>
Date: Wed, 31 Jul 2024 10:40:00 +0100
Subject: [PATCH 5/5] reorder sections a bit, extra explanation

---
 docs/how_it_works.md | 56 +++++++++++++++++++++++++++++--------------
 1 file changed, 37 insertions(+), 19 deletions(-)

diff --git a/docs/how_it_works.md b/docs/how_it_works.md
index d67985a69..758442e4d 100644
--- a/docs/how_it_works.md
+++ b/docs/how_it_works.md
@@ -152,26 +152,38 @@ and `narwhals.series` coordinate how to dispatch the Narwhals API to each backen

 ## Mapping from API to implementations

-When end users call Narwhals APIs, how do they get translated to the native Dataframe APIs?
+If an end user executes some Narwhals code, such as
+
+```python
+df.select(nw.col("a") + 1)
+```
+then how does that get mapped to the underlying dataframe's native API? Let's walk through
+this example to see.

 Things generally go through a couple of layers:

 - The user calls some top-level Narwhals API.
-- The Narwhals API forwards the call to a dataframe wrapper (`PandasLikeNamespace`, `PandasLikeDataFrame`, or `PandasLikeExpr`), whose
-  API is compliant with the Narwhals one.
-- The dataframe wrapper forwards the call to the underlying library.
+- The Narwhals API forwards the call to a Narwhals-compliant dataframe wrapper, such as
+    - `PandasLikeDataFrame` / `ArrowDataFrame` / `PolarsDataFrame` / ...
+    - `PandasLikeSeries` / `ArrowSeries` / `PolarsSeries` / ...
+    - `PandasLikeExpr` / `ArrowExpr` / `PolarsExpr` / ...
+- The dataframe wrapper forwards the call to the underlying library, e.g.:
+    - `PandasLikeDataFrame` forwards the call to the underlying pandas/Modin/cuDF dataframe.
+    - `ArrowDataFrame` forwards the call to the underlying PyArrow table.
+    - `PolarsDataFrame` forwards the call to the underlying Polars DataFrame.

-The way you access the wrapper depends on the object:
+The way you access the Narwhals-compliant wrapper depends on the object:

-- `narwhals.DataFrame` -> `._compliant_frame`: `PandasLikeDataFrame` / `ArrowDataFrame` / `PolarsDataFrame` / ...
-- `narwhals.Expr` -> `._call`: `PandasLikeExpr` / `ArrowExpr` / `PolarsExpr` / ...
-- `narwhals.Series` -> `._compliant_series`: `PandasLikeSeries` / `ArrowSeries` / `PolarsSeries` / ...
+- `narwhals.DataFrame` and `narwhals.LazyFrame`: use the `._compliant_frame` attribute.
+- `narwhals.Series`: use the `._compliant_series` attribute.
+- `narwhals.Expr`: call the `._call` method, and pass to it the Narwhals-compliant namespace associated with
+  the given backend.

-The way these are typically obtained is by going through _namespaces_. Each backend is expected to implement a Narwhals-compliant
-namespace: `PandasLikeNamespace`, `ArrowNamespace`, `PolarsNamespace`, ...
+🛑 BUT WAIT! What's a Narwhals-compliant namespace?

-To understand how we go through all the layers, let's look at a concrete example: support `df_pd` is a pandas DataFrame. What happens
-if someone calls the following?
+Each backend is expected to implement a Narwhals-compliant
+namespace (`PandasLikeNamespace`, `ArrowNamespace`, `PolarsNamespace`). These can be used to interact with the Narwhals-compliant
+Dataframe and Series objects described above - let's work through the motivating example to see how.

 ```python exec="1" session="pandas_api_mapping" source="above"
 import narwhals as nw
@@ -192,7 +204,7 @@ df.select(nw.col("a") + 1)
 ```

 The first thing `narwhals.DataFrame.select` does is to parse each input expression to end up with a compliant expression for the given
-backend, and it does so by passing a Narwhals-compliant namespace `nw.Expr._call`:
+backend, and it does so by passing a Narwhals-compliant namespace to `nw.Expr._call`:

 ```python exec="1" result="python" session="pandas_api_mapping" source="above"
 pn = PandasLikeNamespace(
@@ -202,18 +214,24 @@ pn = PandasLikeNamespace(
 expr = (nw.col("a") + 1)._call(pn)
 print(expr)
 ```
-Right, so now, `expr` is a `PandasLikeExpr`. If we extract a Narwhals-compliant dataframe from `df` by calling `._compliant_frame`,
-we get a `PandasLikeDataFrame` - and that's an object which we can pass `expr` to! After that, we can view the native pandas object
-by calling `._native_dataframe`:
+If we then extract a Narwhals-compliant dataframe from `df` by
+accessing `._compliant_frame`, we get a `PandasLikeDataFrame` - and that's an object which we can pass `expr` to!

-```python exec="1" result="python" session="pandas_api_mapping" source="above"
+```python exec="1" session="pandas_api_mapping" source="above"
 df_compliant = df._compliant_frame
 result = df_compliant.select(expr)
+```
+
+By accessing `._native_dataframe`, we can then view the underlying pandas DataFrame which was produced:
+
+```python exec="1" result="python" session="pandas_api_mapping" source="above"
 print(result._native_dataframe)
 ```
+which is the same as we'd have obtained by just using the Narwhals API directly:

-So, in effect went through two layers of abstraction: `nw.DataFrame` was backed by `PandasLikeDataFrame`, which was backed by an
-actual `pandas.DataFrame`. The same principle applies for all Narwhals backend.
+```python exec="1" result="python" session="pandas_api_mapping" source="above"
+print(nw.to_native(df.select(nw.col("a") + 1)))
+```

 ## Group-by
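
As a recap of the "Mapping from API to implementations" section that this series adds, the layering (user-facing object, then compliant wrapper, then native object) can be sketched with toy classes. This is only an illustration under stated assumptions: `ToyNamespace`, `ToyCompliantExpr`, `ToyCompliantFrame`, `ToyExpr`, `ToyFrame` and `col` are hypothetical names invented here, the "native" object is a plain dict of lists, and none of this is Narwhals' actual implementation. Only the attribute names `_call` and `_compliant_frame` are borrowed from the text above.

```python
from dataclasses import dataclass
from typing import Callable


class ToyNamespace:
    """Hypothetical stand-in for a backend namespace such as PandasLikeNamespace."""

    def col(self, name: str) -> "ToyCompliantExpr":
        # A compliant expression knows how to evaluate itself against a compliant frame.
        return ToyCompliantExpr(lambda frame: list(frame.native[name]), name)


@dataclass
class ToyCompliantExpr:
    """Hypothetical stand-in for a backend expression such as PandasLikeExpr."""

    evaluate: Callable[["ToyCompliantFrame"], list]
    output_name: str

    def __add__(self, other: int) -> "ToyCompliantExpr":
        return ToyCompliantExpr(
            lambda frame: [value + other for value in self.evaluate(frame)],
            self.output_name,
        )


@dataclass
class ToyCompliantFrame:
    """Hypothetical stand-in for PandasLikeDataFrame; wraps the 'native' dict."""

    native: dict

    def select(self, expr: ToyCompliantExpr) -> "ToyCompliantFrame":
        return ToyCompliantFrame({expr.output_name: expr.evaluate(self)})


@dataclass
class ToyExpr:
    """User-facing expression: a deferred recipe that needs a namespace to run."""

    _call: Callable[[ToyNamespace], ToyCompliantExpr]

    def __add__(self, other: int) -> "ToyExpr":
        return ToyExpr(lambda namespace: self._call(namespace) + other)


def col(name: str) -> ToyExpr:
    return ToyExpr(lambda namespace: namespace.col(name))


@dataclass
class ToyFrame:
    """User-facing frame backed by a compliant frame."""

    _compliant_frame: ToyCompliantFrame

    def select(self, expr: ToyExpr) -> "ToyFrame":
        # Step 1: turn the user-facing expression into a compliant one.
        compliant_expr = expr._call(ToyNamespace())
        # Step 2: forward the call to the compliant frame, then re-wrap the result.
        return ToyFrame(self._compliant_frame.select(compliant_expr))


df = ToyFrame(ToyCompliantFrame({"a": [1, 2, 3], "b": [4, 5, 6]}))
result = df.select(col("a") + 1)
print(result._compliant_frame.native)  # {'a': [2, 3, 4]}
```

The point of the deferred `_call` in this sketch is that `col("a") + 1` can be built before any backend is known; the backend is only chosen when `select` supplies a namespace.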