Functions to easily transform Azure blobs into pandas DataFrames and vice versa.
Installing PandaBlob via pip is the preferred method, as it will always install the most recent stable release. If you do not have pip installed, this Python installation guide can guide you through the process.
To install PandaBlob, run this command in your terminal:
# Use pip to install PandaBlob
pip install pandablob
Downloading and installing PandaBlob from source is also possible, follow the code below.
# Download the package
git clone https://github.com/uijl/pandablob
# Go to the correct folder
cd pandablob
# Install package
pip install -e .
The code snip below shows how you can use PandaBlob, all you need is a BlobClient and possibly a pandas DataFrame or some keyword arguments for pandas.
# Import the Azure SDK and pandablob
import pandablob
from azure.storage.blob import ContainerClient
# Your Azure Credentials
account_url = "https://my_account_url.blob.core.windows.net/"
token = "your_key_string"
container = "your_container"
blobname = "your_blob_name.csv"
container_client = ContainerClient(account_url, container, credential=token)
blob_client = container_client.get_blob_client(blob=blobname)
# Specifiy your pandas keyword arguments
pandas_kwargs = {"index_col": 0}
# Read the blob as a pandas DataFrame
df = pandablob.blob_to_df(blob_client, pandas_kwargs)
There are three common errors that can be returned. Two are related to the blob storage and one because of the current limitations of pandablob.
- ResourceExistsError - If the specified blob is already on the blob, this error is returned. There are two options, you can add the
overwrite=True
argument to yourdf_to_blob
function or you can catch the exception. If you wish to enter it in an except statement, you can import it usingfrom azure.core.exceptions import ResourceExistsError
; - ResourceNotFoundError - If the specified blob is not found, this error is returned. If you wish to enter it in an except statement, you can import it using
from azure.core.exceptions import ResourceNotFoundError
; - TypeError - This error is returned by pandablob if you want to upload or download an extensiontype that is not yet supported. Currently only the following extensions are supported:
.csv
.json
.txt
,.xls
and.xlsx
.
Some other stuff that needs to be done:
- Include other files;
- Easier downloading a .csv file;
- Added MIT license.