Update docs, Update dependencies, add methods to RuleList #35

Merged
merged 4 commits into from
Mar 14, 2022
185 changes: 182 additions & 3 deletions docs/getting_started.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,16 +6,195 @@ This section is going to show you how to use the NiaARM framework.
Installation
------------

You can install the NiaARM package using the following command:

.. code:: bash

    pip install niaarm

Usage
-----

Loading Data
~~~~~~~~~~~~

In NiaARM, data loading is done via the :class:`~niaarm.dataset.Dataset` class.
There are two options for loading data:

**Option 1: Directly from file**

.. code:: python

    from niaarm import Dataset

    dataset = Dataset('Abalone.csv')
    print(dataset)

**Option 2: From a pandas DataFrame (recommended)**

This option is recommended, as it allows you to preprocess the data before mining.

.. code:: python

    import pandas as pd
    from niaarm import Dataset


    df = pd.read_csv('Abalone.csv')
    # Preprocess the dataframe...
    dataset = Dataset(df)
    print(dataset)

**Output:**

.. code:: text

    DATASET INFO:
    Number of transactions: 4177
    Number of features: 9

    FEATURE INFO:

                Sex          Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
    dtype       categorical  float   float     float   float         float           float           float         int
    min_val     N/A          0.075   0.055     0.0     0.002         0.001           0.0005          0.0015        1
    max_val     N/A          0.815   0.65      1.13    2.8255        1.488           0.76            1.005         29
    categories  [M, F, I]    N/A     N/A       N/A     N/A           N/A             N/A             N/A           N/A
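
The preprocessing step in Option 2 is left open above. A minimal sketch of what it might look like, assuming pandas is installed: the three rows below are made-up values in the Abalone column layout, and the cleaning rules (dropping incomplete rows and rows with a zero height) are hypothetical choices rather than anything NiaARM requires.

```python
import io

import pandas as pd

# Three made-up rows in the Abalone column layout (illustrative values only).
csv_text = """Sex,Length,Diameter,Height,Whole weight,Shucked weight,Viscera weight,Shell weight,Rings
M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
F,0.53,0.42,0.0,0.677,0.2565,0.1415,0.21,9
I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7
"""

df = pd.read_csv(io.StringIO(csv_text))

# Hypothetical cleaning steps: drop incomplete rows and rows with a zero height.
df = df.dropna()
df = df[df['Height'] > 0].reset_index(drop=True)

print(df.shape)  # (2, 9)

# With NiaARM installed, the cleaned frame is then passed on as in Option 2:
# from niaarm import Dataset
# dataset = Dataset(df)
```

Any transformation that leaves a regular DataFrame behind should fit the same pattern, since the frame is only handed to ``Dataset`` at the end.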

Mining Association Rules
~~~~~~~~~~~~~~~~~~~~~~~~

Once the data has been loaded, we can run our mining algorithm.

The key component here is our :class:`~niaarm.niaarm.NiaARM` class, which inherits from NiaPy's
Problem class. It implements numerical association rule mining as a real-valued, single-objective,
unconstrained maximization problem (more details on this approach can be found
`here <https://link.springer.com/chapter/10.1007/978-3-030-68154-8_19>`__ and
`here <http://www.iztok-jr-fister.eu/static/publications/231.pdf>`__).
To summarize, for each solution vector a :class:`~niaarm.rule.Rule` is built,
and its fitness is computed as a weighted sum of selected interest measures (metrics).
The rule is then appended to a list of rules, which can be accessed through the NiaARM class.

The :class:`~niaarm.niaarm.NiaARM` class takes the dataset's
dimension (calculated dimension of the optimization problem), features, and transactions
(all attributes of the :class:`~niaarm.dataset.Dataset` class) and the metrics selected for
the fitness function. The metrics can either be passed in as a sequence of strings, in
which case the weights of the metrics will be set to 1, or you can pass in a dict containing
pairs of ``{'metric_name': weight}``. You can also enable logging of fitness improvements
by setting the ``logging`` parameter to ``True``.
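
To make the weighted fitness concrete, here is a small standalone sketch. The ``weighted_fitness`` helper is hypothetical and not part of NiaARM. It assumes the weighted sum is divided by the sum of the weights, which this text does not state, but which reproduces the first fitness value in the logged output further below.

```python
def weighted_fitness(values, metrics):
    """Weighted mean of the selected metric values (assumed formula)."""
    # A plain sequence of names is treated as equal weights of 1.
    if not isinstance(metrics, dict):
        metrics = {name: 1.0 for name in metrics}
    total_weight = sum(metrics.values())
    weighted_sum = sum(weight * values[name] for name, weight in metrics.items())
    # Assumption: the weighted sum is normalised by the sum of the weights.
    return weighted_sum / total_weight


# Metric values copied from the first fitness line of the logged output.
values = {
    'support': 0.00023940627244433804,
    'confidence': 1.0,
    'inclusion': 0.3333333333333333,
    'amplitude': 0.43485330497808217,
}

equal = weighted_fitness(values, ('support', 'confidence', 'inclusion', 'amplitude'))
print(equal)  # approximately 0.4421, in line with the logged fitness

# Hypothetical weights that emphasise confidence over the other metrics.
custom = weighted_fitness(
    values,
    {'support': 1.0, 'confidence': 2.0, 'inclusion': 1.0, 'amplitude': 1.0},
)
print(custom)
```

Passing a dict instead of a sequence only changes the weights. The metric names stay the same in both forms.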

Below is a simple example of mining association rules on the Abalone dataset that we
loaded above. For this example we picked Differential Evolution, specifically DE/rand/1/bin,
which we'll be running for 50 iterations.
All available algorithms can be found in the `NiaPy documentation <https://niapy.org/en/stable/>`__.
We've selected the metrics 'support', 'confidence', 'inclusion' and 'amplitude' for the fitness
function. We then sort the rules by fitness in descending order and export them to CSV.

.. code:: python

    from niaarm import NiaARM
    from niapy.task import OptimizationType, Task
    from niapy.algorithms.basic import DifferentialEvolution


    # DE/rand/1/bin
    algorithm = DifferentialEvolution(population_size=50,
                                      differential_weight=0.8,
                                      crossover_probability=0.9)

    metrics = ('support', 'confidence', 'inclusion', 'amplitude')

    problem = NiaARM(dataset.dimension, dataset.features, dataset.transactions, metrics, logging=True)
    task = Task(problem, max_iters=50, optimization_type=OptimizationType.MAXIMIZATION)

    algorithm.run(task)

    problem.rules.sort(by='fitness', reverse=True)
    problem.rules.to_csv('output.csv')

The mined rules are stored in ``problem.rules``, a :class:`~niaarm.rule_list.RuleList`. A
RuleList is a thin wrapper around a normal Python list, with added support for
sorting by metric, exporting rules to CSV, and properties for getting statistical data
about the rules. Printing a RuleList prints a statistical report of the rules in it.

**Output:**

.. code:: text

    Fitness: 0.4421065111459649, Support: 0.00023940627244433804, Confidence: 1.0, Inclusion: 0.3333333333333333, Amplitude: 0.43485330497808217
    Fitness: 0.5363319939110781, Support: 0.006942781900885803, Confidence: 0.9354838709677419, Inclusion: 0.5555555555555556, Amplitude: 0.6473457672201293
    Fitness: 0.5395969006117709, Support: 0.1812305482403639, Confidence: 0.9895424836601308, Inclusion: 0.4444444444444444, Amplitude: 0.5431701261021447
    Fitness: 0.5560783231641568, Support: 0.0023940627244433805, Confidence: 1.0, Inclusion: 0.6666666666666666, Amplitude: 0.5552525632655172
    Fitness: 0.5711107256845077, Support: 0.5997127124730668, Confidence: 1.0, Inclusion: 0.3333333333333333, Amplitude: 0.3513968569316307
    Fitness: 0.5970815767218225, Support: 0.8099114196791956, Confidence: 0.9955856386109476, Inclusion: 0.3333333333333333, Amplitude: 0.2494959152638132
    Fitness: 0.6479501714015481, Support: 0.7455111323916687, Confidence: 0.9860671310956302, Inclusion: 0.3333333333333333, Amplitude: 0.5268890887855602
    Fitness: 0.6497709183879634, Support: 0.9820445295666747, Confidence: 1.0, Inclusion: 0.4444444444444444, Amplitude: 0.17259469954073503
    Fitness: 0.6522418829904134, Support: 0.9176442422791478, Confidence: 0.9422320550639135, Inclusion: 0.4444444444444444, Amplitude: 0.304646790174148
    Fitness: 0.6600433108204055, Support: 0.9762987790280105, Confidence: 1.0, Inclusion: 0.5555555555555556, Amplitude: 0.1083189086980556
    Fitness: 0.6625114159138297, Support: 0.9209959300933684, Confidence: 1.0, Inclusion: 0.3333333333333333, Amplitude: 0.39571640022861654
    Fitness: 0.6748446186051374, Support: 0.9916207804644481, Confidence: 0.9916207804644481, Inclusion: 0.4444444444444444, Amplitude: 0.27169246904720923
    Fitness: 0.6868285539707781, Support: 0.949006463969356, Confidence: 0.9927372902579514, Inclusion: 0.5555555555555556, Amplitude: 0.25001490610024923
    Rules exported to output.csv
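
The "thin wrapper around a list" idea described above can be sketched in a few lines. ``ToyRule`` and ``ToyRuleList`` are hypothetical stand-ins, not the NiaARM classes. Building on ``collections.UserList`` is an assumption, chosen because it keeps the items in ``self.data``, the attribute the ``sort`` method in ``rule_list.py`` operates on.

```python
from collections import UserList
from dataclasses import dataclass
from statistics import mean


@dataclass
class ToyRule:
    """Hypothetical stand-in for a mined rule; only two metrics are modelled."""
    fitness: float
    support: float


class ToyRuleList(UserList):
    """Hypothetical list wrapper; the rules live in ``self.data``."""

    def sort(self, by='fitness', reverse=True):
        # Order the rules by the named metric attribute.
        self.data.sort(key=lambda rule: getattr(rule, by), reverse=reverse)

    def mean(self, metric):
        # Average of one metric over all rules.
        return mean(getattr(rule, metric) for rule in self.data)


rules = ToyRuleList([ToyRule(0.44, 0.10), ToyRule(0.68, 0.90), ToyRule(0.54, 0.20)])
rules.sort(by='fitness', reverse=True)
print([rule.fitness for rule in rules])  # [0.68, 0.54, 0.44]
print(rules.mean('support'))
```

Because the wrapper still behaves like a list, indexing, iteration and ``len()`` keep working alongside the metric-aware helpers.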


Mining Association Rules (Simplified)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to the above interface, we provide a simpler one: the
:func:`~niaarm.mine.get_rules` function. It accepts a dataset object, an algorithm, a
sequence or dict of metrics, a stopping condition (either ``max_evals`` or ``max_iters``) and
a ``logging`` flag. The algorithm can be either a NiaPy Algorithm instance or a string,
in which case its parameters can be passed to the function as additional keyword arguments.

The :func:`~niaarm.mine.get_rules` function returns a named tuple of ``(rules, run_time)``,
where ``rules`` is a :class:`~niaarm.rule_list.RuleList` and ``run_time`` is the run time of
the algorithm in seconds.
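
For readers unfamiliar with named tuples, the return shape can be illustrated without NiaARM installed. ``Result`` and ``toy_get_rules`` are hypothetical names used only for this sketch, and the placeholder list stands in for a real RuleList.

```python
import time
from collections import namedtuple

# Hypothetical result type mirroring the (rules, run_time) pair described above.
Result = namedtuple('Result', ('rules', 'run_time'))


def toy_get_rules():
    """Stand-in for a mining call; returns placeholder rules and elapsed time."""
    start = time.perf_counter()
    rules = ['rule A', 'rule B']  # placeholder for a RuleList
    return Result(rules, time.perf_counter() - start)


result = toy_get_rules()

# A named tuple supports both tuple unpacking and access by field name.
rules, run_time = result
print(result.rules is rules, result.run_time == run_time)  # True True
print(f'{len(rules)} rules in {run_time:.4f} seconds')
```

Either access style works, so callers that only need the rules can read the ``rules`` field and ignore the timing.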

The same example as above, using :func:`~niaarm.mine.get_rules`:

.. code:: python

    from niaarm import get_rules
    from niapy.algorithms.basic import DifferentialEvolution


    # DE/rand/1/bin
    algorithm = DifferentialEvolution(population_size=50,
                                      differential_weight=0.8,
                                      crossover_probability=0.9)

    metrics = ('support', 'confidence', 'inclusion', 'amplitude')
    rules, run_time = get_rules(dataset, algorithm, metrics, max_iters=50)
    print(rules)
    print(f'Run Time: {run_time:.4f} seconds')
    rules.to_csv('output.csv')

**Output:**

.. code:: text

    STATS:
    Total rules: 1153
    Average fitness: 0.47320577312454054
    Average support: 0.3983325861836626
    Average confidence: 0.7050696319555724
    Average lift: 1.8269022321777044
    Average coverage: 0.5791478590164908
    Average consequent support: 0.6708142990119975
    Average conviction: 80294763647830.92
    Average amplitude: 0.33832710930158877
    Average inclusion: 0.45109376505733834
    Average interestingness: 0.4107718184209992
    Average comprehensibility: 0.6225319999993354
    Average netconf: 0.08165217509315073
    Average Yule's Q: 0.2631267094311884
    Average length of antecedent: 2.248048568950564
    Average length of consequent: 1.8117953165654814
    Run Time: 6.9498 seconds
    Rules exported to output.csv


Examples
--------

You can find the full code and usage examples `here <https://github.com/firefly-cpp/NiaARM/tree/main/examples>`_.
48 changes: 48 additions & 0 deletions niaarm/rule_list.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,54 @@ def sort(self, by='fitness', reverse=True):
        """
        self.data.sort(key=lambda rule: getattr(rule, by), reverse=reverse)

    def mean(self, metric):
        """Get mean value of metric.

        Args:
            metric (str): Metric.

        Returns:
            float: Mean value of metric in rule list.

        """
        return np.mean([getattr(rule, metric) for rule in self.data])

    def min(self, metric):
        """Get min value of metric.

        Args:
            metric (str): Metric.

        Returns:
            float: Min value of metric in rule list.

        """
        # Return the metric value itself, not the rule that holds it.
        return min(getattr(rule, metric) for rule in self.data)

    def max(self, metric):
        """Get max value of metric.

        Args:
            metric (str): Metric.

        Returns:
            float: Max value of metric in rule list.

        """
        # Return the metric value itself, not the rule that holds it.
        return max(getattr(rule, metric) for rule in self.data)

    def std(self, metric):
        """Get standard deviation of metric.

        Args:
            metric (str): Metric.

        Returns:
            float: Standard deviation of metric in rule list.

        """
        return np.std([getattr(rule, metric) for rule in self.data])

    def to_csv(self, filename):
        """Export rules to csv.

Expand Down
57 changes: 29 additions & 28 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ python = "^3.7"
niapy = "^2.0.1"
numpy = [
{ version = "^1.21.5", python = ">=3.7,<3.11" },
{ version = "^1.22.0", python = "^3.11" }
{ version = "^1.22.3", python = "^3.11" }
]
pandas = [
{ version = "^1.3.5", python = ">=3.7.1,<3.8" },
Expand Down