
Commit 60c6c55

Release 4.0 (#277)
Authored Mar 15, 2024
1 parent 1098a88 commit 60c6c55

8 files changed: +59 -184 lines changed


doc/source/api.rst: +16 -23
@@ -1,57 +1,50 @@
 API Reference
 =============

-.. py:function:: frame_to_hyper(df: pd.DataFrame, database: Union[str, pathlib.Path], *, table: Union[str, tableauhyperapi.Name, tableauhyperapi.TableName], table_mode: str = "w", hyper_process: Optional[HyperProcess]) -> None:
+.. py:function:: frame_to_hyper(df: pd.DataFrame, database: Union[str, pathlib.Path], *, table: Union[str, tableauhyperapi.Name, tableauhyperapi.TableName], table_mode: str = "w", not_null_columns: Optional[Iterable[str]] = None, json_columns: Optional[Iterable[str]] = None, geo_columns: Optional[Iterable[str]] = None) -> None:

    Convert a DataFrame to a .hyper extract.

    :param df: Data to be written out.
    :param database: Name / location of the Hyper file to write to.
-   :param table: Table to write to. Must be supplied as a keyword argument.
+   :param table: Table to write to.
    :param table_mode: The mode to open the table with. Default is "w" for write, which truncates the file before writing. Another option is "a", which will append data to the file if it already contains information.
-   :param hyper_process: A `HyperProcess` in case you want to spawn it by yourself. Optional. Must be supplied as a keyword argument.
-   :param use_parquet: Use a temporary parquet file to write into the Hyper database, which typically will yield better performance. Boolean, default False
+   :param not_null_columns: Columns which should be considered "NOT NULL" in the target Hyper database. By default, all columns are considered nullable
+   :param json_columns: Columns to be written as a JSON data type
+   :param geo_columns: Columns to be written as a GEOGRAPHY data type

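For illustration, a minimal usage sketch of the new write-side keywords. It is not part of this diff: the DataFrame, column names, and WKT values are invented, and it assumes the geography column is supplied as WKT text.

.. code-block:: python

   import pandas as pd
   import pantab as pt

   # hypothetical data; "details" holds JSON text, "location" holds WKT points
   df = pd.DataFrame({
       "name": ["dog", "cat"],
       "details": ['{"legs": 4}', '{"legs": 4}'],
       "location": ["point(-122.34 47.65)", "point(-73.99 40.73)"],
   })

   pt.frame_to_hyper(
       df,
       "example.hyper",
       table="animals",
       not_null_columns=["name"],   # written as NOT NULL
       json_columns=["details"],    # written as JSON
       geo_columns=["location"],    # written as GEOGRAPHY
   )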
-
-.. py:function:: frame_from_hyper(source: Union[str, pathlib.Path, tab_api.Connection], *, table: Union[str, tableauhyperapi.Name, tableauhyperapi.TableName], hyper_process: Optional[HyperProcess], use_float_na: bool = False) -> pd.DataFrame:
+.. py:function:: frame_from_hyper(source: Union[str, pathlib.Path, tab_api.Connection], *, table: Union[str, tableauhyperapi.Name, tableauhyperapi.TableName], return_type: Literal["pandas", "pyarrow", "polars"] = "pandas")

    Extracts a DataFrame from a .hyper extract.

    :param source: Name / location of the Hyper file to be read or Hyper-API connection.
-   :param table: Table to read. Must be supplied as a keyword argument.
-   :param hyper_process: A `HyperProcess` in case you want to spawn it by yourself. Optional. Must be supplied as a keyword argument.
-   :param use_float_na: Flag indicating whether to use the pandas `Float32`/`Float64` dtypes which support the new pandas missing value `pd.NA`, default False
-   :rtype: pd.DataFrame
+   :param table: Table to read.
+   :param return_type: The type of DataFrame to be returned

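A quick read-side sketch (not part of this diff), reusing the ``example.hyper`` / ``test`` names that appear in the examples elsewhere in this release:

.. code-block:: python

   import pantab as pt

   # same table, returned as a polars DataFrame instead of pandas
   df = pt.frame_from_hyper("example.hyper", table="test", return_type="polars")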

-.. py:function:: frames_to_hyper(dict_of_frames: Dict[Union[str, tableauhyperapi.Name, tableauhyperapi.TableName], pd.DataFrame], database: Union[str, pathlib.Path], table_mode: str = "w", *, hyper_process: Optional[HyperProcess]) -> None:
+.. py:function:: frames_to_hyper(dict_of_frames: Dict[Union[str, tableauhyperapi.Name, tableauhyperapi.TableName], pd.DataFrame], database: Union[str, pathlib.Path], *, table_mode: str = "w", not_null_columns: Optional[Iterable[str]] = None, json_columns: Optional[Iterable[str]] = None, geo_columns: Optional[Iterable[str]] = None,) -> None:

    Writes multiple DataFrames to a .hyper extract.

    :param dict_of_frames: A dictionary whose keys are valid table identifiers and values are dataframes
    :param database: Name / location of the Hyper file to write to.
    :param table_mode: The mode to open the table with. Default is "w" for write, which truncates the file before writing. Another option is "a", which will append data to the file if it already contains information.
-   :param hyper_process: A `HyperProcess` in case you want to spawn it by yourself. Optional. Must be supplied as a keyword argument.
-   :param use_parquet: Use a temporary parquet file to write into the Hyper database, which typically will yield better performance. Boolean, default False
+   :param not_null_columns: Columns which should be considered "NOT NULL" in the target Hyper database. By default, all columns are considered nullable
+   :param json_columns: Columns to be written as a JSON data type
+   :param geo_columns: Columns to be written as a GEOGRAPHY data type

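A minimal sketch of writing several tables at once (not part of this diff; the table names and data are invented):

.. code-block:: python

   import pandas as pd
   import pantab as pt

   # one Hyper table per dictionary key
   frames = {
       "animals": pd.DataFrame({"name": ["dog", "cat"]}),
       "plants": pd.DataFrame({"name": ["fern", "moss"]}),
   }
   pt.frames_to_hyper(frames, "example.hyper")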
-.. py:function:: frames_from_hyper(source: Union[str, pathlib.Path, tab_api.Connection], *, hyper_process: Optional[HyperProcess]) -> Dict[tableauhyperapi.TableName, pd.DataFrame, use_float_na: bool = False]:
+.. py:function:: frames_from_hyper(source: Union[str, pathlib.Path, tab_api.Connection], *, return_type: Literal["pandas", "pyarrow", "polars"] = "pandas") -> dict:

    Extracts tables from a .hyper extract.

    :param source: Name / location of the Hyper file to be read or Hyper-API connection.
-   :param hyper_process: A `HyperProcess` in case you want to spawn it by yourself. Optional. Must be supplied as a keyword argument.
-   :param use_float_na: Flag indicating whether to use the pandas `Float32`/`Float64` dtypes which support the new pandas missing value `pd.NA`, default False
-   :rtype: Dict[tableauhyperapi.TableName, pd.DataFrame]
-
+   :param return_type: The type of DataFrame to be returned

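And reading every table back in one call (a sketch, not part of this diff; the keys of the returned dict are whatever table identifiers pantab uses):

.. code-block:: python

   import pantab as pt

   # one DataFrame per table in the file, here returned as pyarrow Tables
   tables = pt.frames_from_hyper("example.hyper", return_type="pyarrow")
   for name, tbl in tables.items():
       print(name, tbl.num_rows)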
-.. py:function:: frame_from_hyper_query(source: Union[str, pathlib.Path, tab_api.Connection], query: str, *, hyper_process: Optional[HyperProcess], use_float_na: bool = False) -> pd.DataFrame:

-.. versionadded:: 2.0
+.. py:function:: frame_from_hyper_query(source: Union[str, pathlib.Path, tab_api.Connection], query: str, *, return_type: Literal["pandas", "polars", "pyarrow"] = "pandas",)

    Executes a SQL query and returns the result as a pandas dataframe

    :param source: Name / location of the Hyper file to be read or Hyper-API connection.
    :param query: SQL query to execute.
-   :param hyper_process: A `HyperProcess` in case you want to spawn it by yourself. Optional. Must be supplied as a keyword argument.
-   :param use_float_na: Flag indicating whether to use the pandas `Float32`/`Float64` dtypes which support the new pandas missing value `pd.NA`, default False
-   :rtype: Dict[tableauhyperapi.TableName, pd.DataFrame]
+   :param return_type: The type of DataFrame to be returned
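A short query sketch (not part of this diff; the SQL and table name are illustrative only):

.. code-block:: python

   import pantab as pt

   # push the aggregation down to Hyper and get the result back as pandas
   query = "SELECT col, COUNT(*) AS n FROM test GROUP BY col"
   df = pt.frame_from_hyper_query("example.hyper", query)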

doc/source/caveats.rst: -92
This file was deleted.

doc/source/changelog.rst: +1 -1
@@ -147,7 +147,7 @@ Pantab 3.0.0 (2022-09-14)
 =========================

 - Implemented a new ``use_parquet`` keyword in ``frame_to_hyper`` which uses Parquet as an intermediate storage solution instead of pantab's own internal C library. This may provide a small performance boost at the cost of additional disk usage
-- Fixed issue where pantab was not compatabile with Hyper versions 0.0.14567 and above. See the :ref:`compatability` documentation.
+- Fixed issue where pantab was not compatabile with Hyper versions 0.0.14567 and above.


 Pantab 2.1.1 (2022-04-13)

doc/source/conf.py: +1 -1
@@ -5,7 +5,7 @@
 project = "pantab"
 copyright = "2019-2024, Will Ayd, innobi, LLC"
 author = "Will Ayd, innobi, LLC"
-release = "4.0.0.rc2"
+release = "4.0.0"


 # -- General configuration ---------------------------------------------------

doc/source/examples.rst: +38 -63
@@ -113,8 +113,6 @@ Please note that ``table_mode="a"`` will create the table(s) if they do not alre
 Issuing SQL queries
 -------------------

-.. versionadded:: 2.0
-
 With ``frame_from_hyper_query``, one can execute SQL queries against a Hyper file and retrieve the resulting data as a DataFrame. This can be used, e.g. to retrieve only a part of the data (using a ``WHERE`` clause) or to offload computations to Hyper.

 .. code-block:: python
@@ -150,78 +148,55 @@ With ``frame_from_hyper_query``, one can execute SQL queries against a Hyper fil
     print(df)

-Providing your own HyperProcess
--------------------------------
-
-.. versionadded:: 2.0
-
-For convenience, pantab's functions internally spawn a `HyperProcess <https://tableau.github.io/hyper-db/docs/hyper-api/hyper_process>`_. In case you prefer to spawn your own ``HyperProcess``, you can supply it to pantab through the ``hyper_process`` keyword argument.
+Bring your own DataFrame
+------------------------

-By using your own ``HyperProcess``, you have full control over all its startup paramters.
-In the following example we use that flexibility to:
-
-- enable telemetry, thereby making sure the Hyper team at Tableau knows about our use case and potential issues we might be facing
-- `disable log files <https://tableau.github.io/hyper-db/docs/hyper-api/hyper_process#log_config>`_, as we operate in some environment with really tight disk space
-- opt-in to the `new Hyper file format <https://tableau.github.io/hyper-db/docs/hyper-api/hyper_process#default_database_version>`_
-
-By reusing the same ``HyperProcess`` for multiple operations, we also save a few milliseconds. While not noteworthy in this simple example, this might be a good optimization in case you call ``frame_to_hyper`` repeatedly in a loop.
+.. versionadded:: 4.0

+When pantab was first created, pandas was the dominant DataFrame library. In the years since then, many competing libraries have cropped up which all provide different advantages. To give users the most flexibility, pantab provides first class support for exchanging `pandas <https://pandas.pydata.org/>`_, `polars <https://pola.rs/>`_ and `pyarrow <https://arrow.apache.org/docs/python/index.html>`_ DataFrames. To wit, all of the following code samples will produce an equivalent Hyper file:

 .. code-block:: python

-    import pandas as pd
     import pantab as pt
-    from tableauhyperapi import HyperProcess, Telemetry
-
-    df = pd.DataFrame([
-        ["dog", 4],
-        ["cat", 4],
-    ], columns=["animal", "num_of_legs"])
-
-    parameters = {"log_config": "", "default_database_version": "1"}
-    with HyperProcess(Telemetry.SEND_USAGE_DATA_TO_TABLEAU, parameters=parameters) as hyper:
-        # Insert some initial data
-        pt.frame_to_hyper(df, "example.hyper", table="animals", hyper_process=hyper)
-
-        # Append additional data to the same table using `table_mode="a"`
-        new_data = pd.DataFrame([["moose", 4]], columns=["animal", "num_of_legs"])
-        pt.frame_to_hyper(df, "example.hyper", table="animals", table_mode="a", hyper_process=hyper)

+    import pandas as pd
+    df = pd.DataFrame({"col": [1, 2, 3]})
+    pt.frame_to_hyper(df, "example.hyper", table="test")

-Providing your own Hyper Connection
------------------------------------
+    import pyarrow as pa
+    tbl = pa.Table.from_arrays([pa.array([1, 2, 3])], names=["col"])
+    pt.frame_to_hyper(tbl, "example.hyper", table="test")

-.. versionadded:: 2.0
+    import polars as pl
+    df = pl.DataFrame({"col": [1, 2, 3]})
+    pt.frame_to_hyper(df, "example.hyper", table="test")

-In order to interface with Hyper, pantab functions need a HyperAPI `Connection <https://tableau.github.io/hyper-db/docs/hyper-api/connection>`_ to interface with Hyper.
-For convenience, pantab creates those connections implicitly for you.
-However, establishing a connection is not for free, and by reusing the same ``Connection`` for multiple operations, we can save time.
-Hence, pantab also allows you to pass in a HyperAPI connection instead of the name / location of your Hyper file.
+As far as reading is concerned, you can control the type of DataFrame you receive back via the ``return_type`` keyword. pandas remains the default

 .. code-block:: python

-    import pandas as pd
-    import pantab as pt
-    from tableauhyperapi import HyperProcess, Telemetry, Connection, CreateMode
+    >>> pt.frame_from_hyper("example.hyper", table="test")  # pandas by default
+       col
+    0    1
+    1    2
+    2    3
+    >>> pt.frame_from_hyper("example.hyper", table="test", return_type="pyarrow")
+    pyarrow.Table
+    col: int64
+    ----
+    col: [[1,2,3]]
+    >>> pt.frame_from_hyper("example.hyper", table="test", return_type="polars")
+    shape: (3, 1)
+    ┌─────┐
+    │ col │
+    │ --- │
+    │ i64 │
+    ╞═════╡
+    │ 1   │
+    │ 2   │
+    │ 3   │
+    └─────┘

-    df = pd.DataFrame([
-        ["dog", 4],
-        ["cat", 4],
-        ["centipede", 100],
-    ], columns=["animal", "num_of_legs"])
-    path = "example.hyper"
-
-    with HyperProcess(Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
-        pt.frames_to_hyper({"animals": df}, path, hyper_process=hyper)
-
-        with Connection(hyper.endpoint, path, CreateMode.NONE) as connection:
-            query = """
-            SELECT animal
-            FROM animals
-            WHERE num_of_legs > 4
-            """
-            many_legs_df = pt.frame_from_hyper_query(connection, query)
-            print(many_legs_df)
-
-            all_animals = pt.frame_from_hyper(connection, table="animals")
-            print(all_animals)
+.. note::
+
+   Technically pantab is able to *write* any DataFrame library that implements the `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_
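To illustrate that note, a hypothetical sketch of a non-pandas, non-polars object that should still be writable. It is not part of this diff: the ``MyFrame`` wrapper is invented here, and it assumes pyarrow >= 14 (which exposes ``__arrow_c_stream__``).

.. code-block:: python

   import pyarrow as pa
   import pantab as pt

   class MyFrame:
       """Toy object whose only job is to expose the Arrow C stream capsule."""

       def __init__(self, tbl: pa.Table):
           self._tbl = tbl

       def __arrow_c_stream__(self, requested_schema=None):
           # delegate to the wrapped pyarrow Table
           return self._tbl.__arrow_c_stream__(requested_schema)

   pt.frame_to_hyper(MyFrame(pa.table({"col": [1, 2, 3]})), "example.hyper", table="test")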

doc/source/index.rst: +1 -2
@@ -5,7 +5,6 @@ pantab
    :hidden:

    examples
-   caveats
    api
    changelog
    support
@@ -19,7 +18,7 @@ What is it?
 How do I get it?
 ----------------

-``pantab`` requires Python 3.6+ and can run on any Python-supported OS. Installation is as easy as:
+Installation is as easy as:

 .. code-block:: bash

pyproject.toml: +1 -1
@@ -8,7 +8,7 @@ build-backend = "scikit_build_core.build"

 [project]
 name = "pantab"
-version = "4.0.0rc2"
+version = "4.0.0"
 description = "Converts pandas DataFrames into Tableau Hyper Extracts and back"
 license = {file = "LICENSE.txt"}
 readme = "README.md"

src/pantab/__init__.py: +1 -1
@@ -1,4 +1,4 @@
-__version__ = "4.0.0rc2"
+__version__ = "4.0.0"


 from pantab._reader import frame_from_hyper, frame_from_hyper_query, frames_from_hyper
