Skip to content

PAMGuard/pypamguard

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

58 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PyPAMGuard User Guide

This guide is an overview and explains the important features. Details are found in the PyPAMGuard api reference.

Getting Started

What is PyPAMGuard

PyPAMGuard is a package for processing PAMGuard binary file outputs in Python. PAMGuard is a software program written in Java that has a number of applications related to passive acoustic monitoring. Most modules produce binary files names pamguard data files (.pgdf) which follow a core standard structure defined in the PAMGuard source code. The PyPAMGuard library can read these data files into python data types and objects that allow users to further process the data however they wish.

Users may prefer to use the MATLAB or R binary file readers should they be more comfortable with these languages.

What is a PGDF

A PAMGuard Data File (PGDF) is produced by a PAMGuard module. A standard PGDF must contain the following chunk types, in order:

  • File Header (1 exactly) contains metadata about the file and the module within it (structure is identical across all PGDFs)
  • Module Header (1 exactly) contains metadata about the data of the module (structure is module-dependent)
  • Data Set (0 or more) contains the data itself (structure is module-dependent)
  • Module Footer (1 exactly) contains further metadata about the data of the module (structure is module-dependent)
  • File Footer (1 exactly) contains metadata about the file (structure is identical across all PGDFs)

If you want to learn more about how these files are produced, please inspect the PAMGuard source code. You can also learn more about the specific attributes in each of these chunks by loading a PGDF using this library (see: Quick Start).

Installation

Before proceeding, ensure you have Python installed on your local Windows, Linux or MacOS system.

Python Package Index

NOT READY YET. SEE 'CLONED REPOSITORY' BELOW FOR ALTERNATIVE.

You can install PyPAMGuard from the Python Package Index using the following command.

pip install pypamguard

Cloned Repository

Clone the PyPAMGuard repository onto your local system using git clone. Note that this will download all assets of the program including the automated tests and associated binary files (for a leightweight version take what is in the src directory or download pypamguard from Pip).

Install dependencies in a Python Virtual Environment by running pip install -r requirements.txt from the root directory.

From within the src/ directory, start a Python shell. If running another program with a PyPAMGuard dependency, ensure src/pypamguard/ is in the working directory. You can now follow the Quick Start guide below for basic functionality.

Quick Start

Processing a data file using PyPAMGuard is as easy as importing the library, and calling one function, passing in the path to the data file. This will return a core.PGBFile object.

# file.py
import pypamguard
df = pypamguard.load_binary_data_file('path/to/data/file.pgdf')

You can then access the file headers and footers from the PGBFile object like so.

df.file_header # file header
df.module_header # module header
df.module_footer # module footer
df.file_footer # file footer

Access the module data itself like so.

len(df.data) # number of module data chunks
df.data[0] # the first module data chunk

All headers, footers, and module data, contain attributes that were read from the binary file. For example, each file header will always contain a version. This can intuitively be accessed like an attribute of an object. Some example are shown below.

df.file_header.version # integer
df.file_header.module_type # bytes

You can access a detailed list of all attributes of each chunk by converting it to a string and/or printing it. Exact attributes of each header, footer or module data chunk are not listed in this README as they may change in between file versions. It is recommended to run the library yourself on a particular data file and find the attributes yourself.

print(df.file_header)

# OUTPUT
#
# File Header
#         length (<class 'int'>): 103
#         identifier (<class 'int'>): -1
#         file_format (<class 'int'>): 0
#         pamguard (<class 'bytes'>): b'PAMGUARDDATA'
#         version (<class 'bytes'>): b'1.15.03'
#         branch (<class 'bytes'>): b'BETA'
#         data_date (<class 'datetime.datetime'>): 2016-09-03 00:42:14+00:00
#         analysis_date (<class 'datetime.datetime'>): 2016-10-27 19:43:26.511000+00:00
#         start_sample (<class 'int'>): 0
#         module_type (<class 'bytes'>): b'Click Detector'
#         module_name (<class 'bytes'>): b'Click Detector'
#         stream_name (<class 'bytes'>): b'Clicks'
#         extra_info_len (<class 'int'>): 0

See the extended README file below, and the API reference for more information on data types, filtering, and creating your own modules.

Customisation

Logger Verbosity

PyPAMGuard has its own logging capabilities to Python's standard output stream. You can pass in an argument verbosity to load_pamguard_data_file using a class of imported enums from pypamguard.logger (see below).

import pypamguard
debug_verbosity = pypamguard.logger.Verbosity.DEBUG
pypamguard.load_pamguard_binary_file('path/to/data/file.pgdf' verbosity=debug_verbosity)

The Verbosity enum that can be imported from pypamguard.logger can be set to the following values:

  • Verbosity.DEBUG print debug info, info, warnings and errors
  • Verbosity.INFO print info, warnings and errors
  • Verbosity.WARNING print warnings and errors
  • Verbosity.ERROR print errors

Filtering

PyPAMGuard allows you to filter data as it is being streamed into its internal data structures. This offers various speed and memory efficiencies, particularly with large data files. To do this, you must pass in a Filters object from pypamguard.core.filters as shown below.

from pypamguard.core.filters import Filters
from pypamguard import load_pamguard_binary_file

filters = Filters()
df = load_pamguard_binary_file('path/to/data/file.pgdf', filters=filters)

An empty Filters object will have no effect on filtering. The following filters were defined at the time of writing this documentation, however you can access the most up-to-date list by printing pypamguard.core.filters.Filters.INSTALLED_FILTERS.

{
    'uidlist': <class 'pypamguard.core.filters.WhitelistFilter'>,
    'uidrange': <class 'pypamguard.core.filters.RangeFilter'>,
    'daterange': <class 'pypamguard.core.filters.DateFilter'>
}

UID List Filter

Filtering by whitelist allows you to define a specific list of values that can be accepted.

from pypamguard.core.filters import Filter, WhitelistFilter
from pypamguard import load_pamguard_binary_file
filters = Filters({
    'uidlist': WhitelistFilter([1,2,3,10])
})
load_pamguard_binary_file('path/to/data/file.pgdf', filters=filters)

UID Range Filter

A range filter is a kind of filter that takes a start and end point, and includes all the date between these (inclusive). Define a range filter like so.

WARNING: passing ordered=True will cause data chunks to be skipped once the first item out of the upper bound is reached, so is dangerous to use on unordered or unknown data.

from pypamguard.core.filters import Filters, RangeFilter
from pypamguard import load_pamguard_binary_file
filters = Filters({
    'uidlist': RangeFilter(start=1, end=10, ordered=False)
})
load_pamguard_binary_file('path/to/data/file.pgdf', filters=filters)

Date Range Filter

Filtering by date range is similar to filtering by generic range, except you pass in datetime.datetime objects to the constructor of the filter. Note that the datetime objects must contain UTC timezone information, otherwise the filter will throw a ValueError exception.

from pypamguard.core.filters import Filters, DateFilter
from pypamguard import load_pamguard_binary_file
import datetime

from_timestamp: datetime.datetime # some UTC datetime object
to_timestamp: datetime.datetime # some UTC datetime object

filters = Filters({
    'daterange': DateFilter(from_timestamp, to_timestamp, ordered=True)
})
load_pamguard_binary_file('path/to/data/file.pgdf', filters=filters)

Fundamentals

Data Types

Headers, footers and module data automatically read data into their native (or close to) Python data types, to allow for easy use.

Python-core and standard third-party data types are:

  • int (whole numbers)
  • float (decimal numbers)
  • bytes (strings)
  • datetime.datetime (dates and times)
  • numpy.array (arrays of ints or floats)

PyPAMGuard has its own core.Bitmap class which is used to abstract the reading and processing of integer bitmap flags. A brief example is given below on how to interact with a Bitmap, but more information can be found in the API reference.

import pypamguard
# the 4-bit representation of 2 in binary is 0010
bm = pypamguard.utils.bitmap.Bitmap(size=4, value=2) 
bm.get_set_bits() # returns [1]
bm.set(3) # sets bit 3
bm.get_set_bits() # returns [1, 3]
bm.clear(1) # clears bit 1
bm.get_set_bits() # returns [3] 

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages