
TAB-analysis: a tool to analyze tabular and multidimensional structures

TAB-analysis analyzes and measures the relationships between Fields in any tabular Dataset.

The TAB-analysis tool is part of the Environmental Sensing Project.

For more information, see the user guide or the github repository.

What is TAB-analysis ?

Principles

Each field in a dataset has global properties (e.g. the number of distinct values). The relationship between two fields can be characterized in a similar way (e.g. the number of distinct pairs of values formed by the two fields).

Analyzing these properties together gives a measure of the structure of the entire dataset.

The TAB-analysis module carries out these measurements and analyses. It also identifies data that does not respect given relationships or a given multidimensional structure.
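As a rough illustration of these two kinds of properties, the sketch below counts the distinct values of each field and the distinct value pairs of each pair of fields, in plain Python, on the price list from the Examples section. It does not use the TAB-analysis API; the names `tabular`, `field_counts` and `pair_counts` are only illustrative.

```python
# Minimal sketch (plain Python, not the TAB-analysis API) of the two kinds
# of properties: per-field counts and per-pair counts.
from itertools import combinations

tabular = {
    'plants':   ['fruit', 'fruit', 'fruit', 'fruit',
                 'vegetable', 'vegetable', 'vegetable', 'vegetable'],
    'quantity': ['1 kg', '10 kg'] * 4,
    'product':  ['apple', 'apple', 'orange', 'orange',
                 'peppers', 'peppers', 'carrot', 'carrot'],
    'price':    [1, 10, 2, 20, 1.5, 15, 0.5, 5],
}

# Field property: number of distinct values in each field
field_counts = {name: len(set(values)) for name, values in tabular.items()}

# Relationship property: number of distinct value pairs for each pair of fields
pair_counts = {
    (f1, f2): len(set(zip(tabular[f1], tabular[f2])))
    for f1, f2 in combinations(tabular, 2)
}

print(field_counts)  # {'plants': 2, 'quantity': 2, 'product': 4, 'price': 8}
print(pair_counts[('plants', 'product')])    # 4
print(pair_counts[('quantity', 'product')])  # 8
```

Comparing a pair count with the two field counts is what distinguishes the relationship types: 4 pairs for 'plants'/'product' (no more than 'product' alone) versus 8 pairs for 'quantity'/'product' (2 x 4, every combination).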

Examples

Here is a price list of different foods by package quantity.

'plants'     'quantity'  'product'  'price'
'fruit'      '1 kg'      'apple'    1
'fruit'      '10 kg'     'apple'    10
'fruit'      '1 kg'      'orange'   2
'fruit'      '10 kg'     'orange'   20
'vegetable'  '1 kg'      'peppers'  1.5
'vegetable'  '10 kg'     'peppers'  15
'vegetable'  '1 kg'      'carrot'   0.5
'vegetable'  '10 kg'     'carrot'   5

In this example, we observe two kinds of relationships:

classification ("derived" relationship): between 'plants' and 'product' (each product belongs to exactly one plant type)
crossing ("crossed" relationship): between 'product' and 'quantity' (all the combinations of the two fields are present).
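Both relationship types can be approximated with simple counting rules. In this minimal sketch, `is_derived` and `is_crossed` are hypothetical helpers, not TAB-analysis functions, and the rules are simplified assumptions: "derived" is taken to mean that each child value maps to exactly one parent value and the parent has fewer distinct values; "crossed" that the number of distinct pairs equals the product of the two fields' distinct counts.

```python
# Hypothetical helpers (not the TAB-analysis API) illustrating the two
# relationship types with simplified counting rules.

def is_derived(parent, child):
    """True if every child value maps to exactly one parent value
    and the parent has fewer distinct values (a coarser classification)."""
    mapping = {}
    for p, c in zip(parent, child):
        if mapping.setdefault(c, p) != p:
            return False  # the same child value appears under two parents
    return len(set(parent)) < len(set(child))

def is_crossed(field1, field2):
    """True if every combination of the two fields' values is present."""
    n_pairs = len(set(zip(field1, field2)))
    return n_pairs == len(set(field1)) * len(set(field2))

plants   = ['fruit'] * 4 + ['vegetable'] * 4
quantity = ['1 kg', '10 kg'] * 4
product  = ['apple', 'apple', 'orange', 'orange',
            'peppers', 'peppers', 'carrot', 'carrot']

print(is_derived(plants, product))    # True
print(is_crossed(quantity, product))  # True
print(is_crossed(plants, product))    # False
```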

This dataset can therefore be represented as a matrix crossing 'quantity' ['1 kg', '10 kg'] and 'product' ['apple', 'orange', 'peppers', 'carrot'], whose cells hold the 'price' values.
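The matrix view can be sketched in plain Python by using the distinct 'quantity' values as rows and the distinct 'product' values as columns. This is an illustration only, not a TAB-analysis feature; it assumes each (quantity, product) pair occurs exactly once, which holds here because the two fields are crossed and there are 8 rows.

```python
# Minimal sketch (plain Python): pivot the price list into a
# 'quantity' x 'product' matrix whose cells hold the prices.
quantity = ['1 kg', '10 kg'] * 4
product  = ['apple', 'apple', 'orange', 'orange',
            'peppers', 'peppers', 'carrot', 'carrot']
price    = [1, 10, 2, 20, 1.5, 15, 0.5, 5]

rows = list(dict.fromkeys(quantity))  # distinct values, first-seen order
cols = list(dict.fromkeys(product))
cell = {(q, p): v for q, p, v in zip(quantity, product, price)}

# Each (quantity, product) pair occurs exactly once, so every cell is filled
matrix = [[cell[(q, p)] for p in cols] for q in rows]

for q, row in zip(rows, matrix):
    print(q, row)
# 1 kg [1, 2, 1.5, 0.5]
# 10 kg [10, 20, 15, 5]
```

With pandas, `pd.DataFrame(tabular).pivot(index='quantity', columns='product', values='price')` produces a similar table.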

In [1]: # creation of the `analysis` object
        from tab_dataset import Sdataset
        from tab_analysis import AnaDataset
        tabular = {'plants':   ['fruit', 'fruit','fruit',   'fruit','vegetable','vegetable','vegetable','vegetable' ],
                   'quantity': ['1 kg' , '10 kg', '1 kg',   '10 kg',  '1 kg',    '10 kg',   '1 kg',     '10 kg'     ],
                   'product':  ['apple', 'apple', 'orange', 'orange', 'peppers', 'peppers', 'carrot',   'carrot'    ],
                   'price':    [1,       10,      2,        20,       1.5,       15,        0.5,        5           ]}
        analysis = AnaDataset(Sdataset.ntv(tabular).to_analysis(True))
        # `analysis` is also available from pandas data
        import pandas as pd
        import ntv_pandas as npd
        analysis = pd.DataFrame(tabular).npd.analysis(distr=True)

In [2]: # each relationship is evaluated and measured
        analysis.get_relation('plants', 'product').typecoupl
Out[2]: 'derived'

In [3]: analysis.get_relation('quantity', 'product').typecoupl
Out[3]: 'crossed'

In [4]: # the 'distance' between two Fields is measured (number of codec links to change for them to be coupled)
        analysis.get_relation('quantity', 'product').distance
Out[4]: 6

In [5]: # the dataset can be represented as a 'derived tree'
        print(analysis.tree())
Out[5]: -1: root-derived (8)
           1 : quantity (6 - 2)
           2 : product (4 - 4)
              0 : plants (2 - 2)
           3 : price (0 - 8)

In [6]: # 'partitions' are found (partitions are multidimensional data)
        analysis.partitions()
Out[6]: [['quantity', 'product'], ['price']]

In [7]: # the `field_partition` method returns the main structure of the dataset
        analysis.field_partition()
Out[7]: {'primary': ['quantity', 'product'],
         'secondary': ['plants'],
         'mixte': [],
         'unique': [],
         'variable': ['price']}
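One way to read the partition result: a partition is a group of fields whose value combinations form a complete cross product, with each combination appearing exactly once, so the group indexes the rows like the axes of a multidimensional array. The sketch below applies that reading by brute force; it is an assumption about the concept, not the algorithm TAB-analysis uses, and `find_partitions` is a hypothetical name.

```python
# Simplified sketch (not the TAB-analysis algorithm): find minimal groups of
# fields whose value combinations form a full cross product covering each
# row exactly once.
from itertools import combinations
from math import prod

tabular = {
    'plants':   ['fruit'] * 4 + ['vegetable'] * 4,
    'quantity': ['1 kg', '10 kg'] * 4,
    'product':  ['apple', 'apple', 'orange', 'orange',
                 'peppers', 'peppers', 'carrot', 'carrot'],
    'price':    [1, 10, 2, 20, 1.5, 15, 0.5, 5],
}

def find_partitions(table):
    names = list(table)
    n_rows = len(next(iter(table.values())))
    found = []
    for size in range(1, len(names) + 1):
        for group in combinations(names, size):
            # keep only minimal groups: skip supersets of a found partition
            if any(set(p) <= set(group) for p in found):
                continue
            n_combos = len(set(zip(*(table[name] for name in group))))
            n_full = prod(len(set(table[name])) for name in group)
            if n_combos == n_full == n_rows:
                found.append(list(group))
    return found

print(find_partitions(tabular))  # [['price'], ['quantity', 'product']]
```

On the price list it finds the same two groups as `partitions()` above: 'quantity' x 'product' (2 x 4 = 8 combinations for 8 rows) and 'price' alone (8 distinct values). The search over all field subsets grows exponentially with the number of fields, so it only suits small tables.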

Uses

A TAB-analysis object is initialized by a set of properties (a dict with specific keys). It can therefore be used from any tabular data manager (e.g. pandas).

Possible uses are as follows:

control of a dataset against a data model,
quality indicators for a dataset,
analysis of datasets.

and in connection with the tabular application:

error detection and correction,
generation of optimized data formats,
conversion to multidimensional data,
interface to specific applications.
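Error detection can build on the same counts: a derived relationship is broken when one child value appears with more than one parent value. The sketch below flags such rows; `derivation_conflicts` is a hypothetical helper, not part of TAB-analysis, and the faulty row is introduced deliberately for the illustration.

```python
# Hypothetical helper (not the TAB-analysis API): report rows that break a
# derived relationship, i.e. a child value linked to several parent values.
from collections import defaultdict

def derivation_conflicts(parent, child):
    """Return {child value: row indices} for child values with several parents."""
    parents = defaultdict(set)
    rows = defaultdict(list)
    for i, (p, c) in enumerate(zip(parent, child)):
        parents[c].add(p)
        rows[c].append(i)
    return {c: rows[c] for c in parents if len(parents[c]) > 1}

# Price list with one deliberate error: row 7 labels 'carrot' as a fruit
plants  = ['fruit'] * 4 + ['vegetable'] * 3 + ['fruit']
product = ['apple', 'apple', 'orange', 'orange',
           'peppers', 'peppers', 'carrot', 'carrot']

print(derivation_conflicts(plants, product))  # {'carrot': [6, 7]}
```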

loco-philippe/tab-analysis