Merge pull request #35 from zStupan/main
Update docs, Update dependencies, add methods to RuleList
firefly-cpp authored Mar 14, 2022
2 parents ad99667 + 14f367e commit 534ed0a
Showing 4 changed files with 260 additions and 32 deletions.
185 changes: 182 additions & 3 deletions docs/getting_started.rst
@@ -6,16 +6,195 @@ This section is going to show you how to use the NiaARM framework.
Installation
------------

Firstly, install NiaARM package using the following command:
You can install the NiaARM package using the following command:

.. code:: bash

    pip install niaarm

Usage
-----

Loading Data
~~~~~~~~~~~~

In NiaARM, data loading is done via the :class:`~niaarm.dataset.Dataset` class.
There are two options for loading data:

**Option 1: Directly from file**

.. code:: python

    from niaarm import Dataset

    dataset = Dataset('Abalone.csv')
    print(dataset)

**Option 2: From a pandas DataFrame (recommended)**

This option is recommended, as it allows you to preprocess the data before mining.

.. code:: python

    import pandas as pd
    from niaarm import Dataset

    df = pd.read_csv('Abalone.csv')
    # Preprocess the dataframe...
    dataset = Dataset(df)
    print(dataset)

**Output:**

.. code:: text

    DATASET INFO:
    Number of transactions: 4177
    Number of features: 9

    FEATURE INFO:

                Sex          Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
    dtype       categorical  float   float     float   float         float           float           float         int
    min_val     N/A          0.075   0.055     0.0     0.002         0.001           0.0005          0.0015        1
    max_val     N/A          0.815   0.65      1.13    2.8255        1.488           0.76            1.005         29
    categories  [M, F, I]    N/A     N/A       N/A     N/A           N/A             N/A             N/A           N/A

Mining Association Rules
~~~~~~~~~~~~~~~~~~~~~~~~

Once the data has been loaded, we can run our mining algorithm.

The key component here is our :class:`~niaarm.niaarm.NiaARM` class, which inherits from NiaPy's
Problem class. It implements numerical association rule mining as a real-valued,
single-objective, unconstrained maximization problem (more details on this approach can be found
`here <https://link.springer.com/chapter/10.1007/978-3-030-68154-8_19>`__ and
`here <http://www.iztok-jr-fister.eu/static/publications/231.pdf>`__).
To summarize, for each solution vector a :class:`~niaarm.rule.Rule` is built,
and its fitness is computed as a weighted sum of selected interest measures (metrics).
The rule is then appended to a list of rules, which can be accessed through the NiaARM class.
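
As a rough illustration of the weighted-sum fitness, the sketch below recomputes the fitness of the first rule logged in the example output later in this section from its four metric values. The helper name ``weighted_fitness`` is hypothetical, and dividing by the total weight is an assumption inferred from those logged values rather than something stated in this guide.

```python
def weighted_fitness(metric_values, weights):
    """Weighted sum of metric values, divided by the total weight (assumed)."""
    total_weight = sum(weights[name] for name in metric_values)
    weighted_sum = sum(weights[name] * value for name, value in metric_values.items())
    return weighted_sum / total_weight


# Metric values copied from the first logged rule in the example output.
rule_metrics = {
    'support': 0.00023940627244433804,
    'confidence': 1.0,
    'inclusion': 0.3333333333333333,
    'amplitude': 0.43485330497808217,
}
# Passing metrics as a sequence of names corresponds to a weight of 1 each.
unit_weights = {name: 1.0 for name in rule_metrics}

fitness = weighted_fitness(rule_metrics, unit_weights)
print(f'Fitness: {fitness:.6f}')
```

With unit weights this reduces to the plain average of the four metrics, which agrees with the logged fitness of roughly 0.4421 for that rule.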

The :class:`~niaarm.niaarm.NiaARM` class takes the dataset's
dimension (calculated dimension of the optimization problem), features, and transactions
(all attributes of the :class:`~niaarm.dataset.Dataset` class) and the metrics selected for
the fitness function. The metrics can either be passed in as a sequence of strings, in
which case the weights of the metrics will be set to 1, or you can pass in a dict containing
pairs of ``{'metric_name': weight}``. You can also enable logging of fitness improvements
by setting the ``logging`` parameter to ``True``.
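
The two accepted forms of the metrics argument can be pictured with a small, self-contained sketch. ``normalize_metrics`` is a hypothetical helper written for illustration only, not NiaARM's own code, and the weights in the dict example are arbitrary.

```python
def normalize_metrics(metrics):
    """Return a {'metric_name': weight} dict from a sequence of names or a dict."""
    if isinstance(metrics, dict):
        # Explicit weights: keep them, converted to floats.
        return {name: float(weight) for name, weight in metrics.items()}
    # A plain sequence of names: every metric gets a weight of 1.
    return {name: 1.0 for name in metrics}


from_sequence = normalize_metrics(('support', 'confidence'))
from_dict = normalize_metrics({'support': 2, 'confidence': 0.5})
print(from_sequence)
print(from_dict)
```

In this sketch a dict lets one metric count more than another (here ``support`` counts four times as much as ``confidence``), whereas a sequence treats all listed metrics equally.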

Below is a simple example of mining association rules on the Abalone dataset that we
loaded above. For this example we picked Differential Evolution, specifically DE/rand/1/bin,
which we'll be running for 50 iterations.
All available algorithms can be found in the `NiaPy documentation <https://niapy.org/en/stable/>`__.
We've selected the metrics: 'support', 'confidence', 'inclusion' and 'amplitude' for the fitness
function. We then sort the rules by fitness in descending order and export them to csv.

.. code:: python

    from niaarm import NiaARM
    from niapy.task import OptimizationType, Task
    from niapy.algorithms.basic import DifferentialEvolution

    # DE/rand/1/bin
    algorithm = DifferentialEvolution(population_size=50,
                                      differential_weight=0.8,
                                      crossover_probability=0.9)
    metrics = ('support', 'confidence', 'inclusion', 'amplitude')
    problem = NiaARM(dataset.dimension, dataset.features, dataset.transactions, metrics, logging=True)
    task = Task(problem, max_iters=50, optimization_type=OptimizationType.MAXIMIZATION)
    algorithm.run(task)

    problem.rules.sort(by='fitness', reverse=True)
    problem.rules.to_csv('output.csv')

The mined rules are stored in ``problem.rules``, a :class:`~niaarm.rule_list.RuleList`. A
RuleList is a thin wrapper around a normal Python list that adds sorting by metric,
exporting rules to CSV, and properties for getting statistical data
about the rules. Printing a RuleList prints a statistical report of the rules in it.

**Output:**

.. code:: text

    Fitness: 0.4421065111459649, Support: 0.00023940627244433804, Confidence: 1.0, Inclusion: 0.3333333333333333, Amplitude: 0.43485330497808217
    Fitness: 0.5363319939110781, Support: 0.006942781900885803, Confidence: 0.9354838709677419, Inclusion: 0.5555555555555556, Amplitude: 0.6473457672201293
    Fitness: 0.5395969006117709, Support: 0.1812305482403639, Confidence: 0.9895424836601308, Inclusion: 0.4444444444444444, Amplitude: 0.5431701261021447
    Fitness: 0.5560783231641568, Support: 0.0023940627244433805, Confidence: 1.0, Inclusion: 0.6666666666666666, Amplitude: 0.5552525632655172
    Fitness: 0.5711107256845077, Support: 0.5997127124730668, Confidence: 1.0, Inclusion: 0.3333333333333333, Amplitude: 0.3513968569316307
    Fitness: 0.5970815767218225, Support: 0.8099114196791956, Confidence: 0.9955856386109476, Inclusion: 0.3333333333333333, Amplitude: 0.2494959152638132
    Fitness: 0.6479501714015481, Support: 0.7455111323916687, Confidence: 0.9860671310956302, Inclusion: 0.3333333333333333, Amplitude: 0.5268890887855602
    Fitness: 0.6497709183879634, Support: 0.9820445295666747, Confidence: 1.0, Inclusion: 0.4444444444444444, Amplitude: 0.17259469954073503
    Fitness: 0.6522418829904134, Support: 0.9176442422791478, Confidence: 0.9422320550639135, Inclusion: 0.4444444444444444, Amplitude: 0.304646790174148
    Fitness: 0.6600433108204055, Support: 0.9762987790280105, Confidence: 1.0, Inclusion: 0.5555555555555556, Amplitude: 0.1083189086980556
    Fitness: 0.6625114159138297, Support: 0.9209959300933684, Confidence: 1.0, Inclusion: 0.3333333333333333, Amplitude: 0.39571640022861654
    Fitness: 0.6748446186051374, Support: 0.9916207804644481, Confidence: 0.9916207804644481, Inclusion: 0.4444444444444444, Amplitude: 0.27169246904720923
    Fitness: 0.6868285539707781, Support: 0.949006463969356, Confidence: 0.9927372902579514, Inclusion: 0.5555555555555556, Amplitude: 0.25001490610024923
    Rules exported to output.csv

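
The idea of a RuleList as a thin list wrapper with metric-based sorting can be sketched in a few lines of plain Python. ``ToyRule`` and ``ToyRuleList`` are hypothetical stand-ins invented for this illustration (the numbers are made up), and building on ``collections.UserList`` is an assumption suggested by the ``self.data`` attribute used in ``niaarm/rule_list.py``.

```python
from collections import UserList, namedtuple
from statistics import mean

# Hypothetical stand-in for a mined rule; real rules carry many more metrics.
ToyRule = namedtuple('ToyRule', ('fitness', 'support'))


class ToyRuleList(UserList):
    """Thin list wrapper with metric-based sorting and a simple statistic."""

    def sort(self, by='fitness', reverse=True):
        # Sort the underlying list in place by the chosen metric attribute.
        self.data.sort(key=lambda rule: getattr(rule, by), reverse=reverse)

    def mean(self, metric):
        # Average value of the chosen metric over all rules.
        return mean(getattr(rule, metric) for rule in self.data)


rules = ToyRuleList([ToyRule(0.44, 0.10), ToyRule(0.69, 0.95), ToyRule(0.54, 0.18)])
rules.sort(by='fitness', reverse=True)
print([rule.fitness for rule in rules])
print(rules.mean('support'))
```

Because the wrapper still behaves like a list, indexing, iteration and ``len()`` keep working unchanged after the extra methods are added.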
Mining Association Rules (Simplified)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to the above interface, we provide a much simpler one in the form of a single
function: :func:`~niaarm.mine.get_rules`. The function accepts a dataset object, an algorithm,
a sequence or dict of metrics, a stopping condition (either ``max_evals`` or ``max_iters``) and
a ``logging`` flag. The algorithm can either be a NiaPy Algorithm instance, or a string,
in which case its parameters can be passed in to the function as additional keyword arguments.

The :func:`~niaarm.mine.get_rules` function returns a named tuple of ``(rules, run_time)``,
where ``rules`` is a :class:`~niaarm.rule_list.RuleList` and ``run_time`` is the run time of
the algorithm in seconds.
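
The named-tuple return value can be pictured with the following minimal sketch. ``Result`` and ``timed_mine`` are hypothetical names used only for illustration, and the "miner" here is a stand-in that returns a fixed list instead of running an optimization algorithm.

```python
import time
from collections import namedtuple

# Hypothetical result type mirroring the (rules, run_time) pair described above.
Result = namedtuple('Result', ('rules', 'run_time'))


def timed_mine(mine):
    """Call `mine()` and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    rules = mine()
    return Result(rules, time.perf_counter() - start)


# A stand-in "miner" that just returns a fixed list of rule strings.
result = timed_mine(lambda: ['A => B', 'C => D'])
rules, run_time = result  # a named tuple unpacks like a regular tuple
print(len(result.rules), f'{result.run_time:.4f}')  # fields are also accessible by name
```

A named tuple supports both styles, so callers can unpack the pair in one line or refer to ``result.rules`` and ``result.run_time`` explicitly.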

The same example as above, using :func:`~niaarm.mine.get_rules`:

.. code:: python

    from niaarm import get_rules
    from niapy.algorithms.basic import DifferentialEvolution

    # DE/rand/1/bin
    algorithm = DifferentialEvolution(population_size=50,
                                      differential_weight=0.8,
                                      crossover_probability=0.9)
    metrics = ('support', 'confidence', 'inclusion', 'amplitude')
    rules, run_time = get_rules(dataset, algorithm, metrics, max_iters=50)

    print(rules)
    print(f'Run Time: {run_time:.4f} seconds')
    rules.to_csv('output.csv')

**Output:**

.. code:: text

    STATS:
    Total rules: 1153
    Average fitness: 0.47320577312454054
    Average support: 0.3983325861836626
    Average confidence: 0.7050696319555724
    Average lift: 1.8269022321777044
    Average coverage: 0.5791478590164908
    Average consequent support: 0.6708142990119975
    Average conviction: 80294763647830.92
    Average amplitude: 0.33832710930158877
    Average inclusion: 0.45109376505733834
    Average interestingness: 0.4107718184209992
    Average comprehensibility: 0.6225319999993354
    Average netconf: 0.08165217509315073
    Average Yule's Q: 0.2631267094311884
    Average length of antecedent: 2.248048568950564
    Average length of consequent: 1.8117953165654814
    Run Time: 6.9498 seconds
    Rules exported to output.csv

After the successful installation you are ready to run your first example.
Examples
--------

You can find usage examples `here <https://github.com/firefly-cpp/NiaARM/tree/main/examples>`_.
You can find the full code and usage examples `here <https://github.com/firefly-cpp/NiaARM/tree/main/examples>`_.
48 changes: 48 additions & 0 deletions niaarm/rule_list.py
@@ -36,6 +36,54 @@ def sort(self, by='fitness', reverse=True):
        """
        self.data.sort(key=lambda rule: getattr(rule, by), reverse=reverse)

    def mean(self, metric):
        """Get mean value of metric.

        Args:
            metric (str): Metric.

        Returns:
            float: Mean value of metric in rule list.

        """
        return np.mean([getattr(rule, metric) for rule in self.data])

    def min(self, metric):
        """Get min value of metric.

        Args:
            metric (str): Metric.

        Returns:
            float: Min value of metric in rule list.

        """
        # Return the metric value itself (as documented), not the rule holding it.
        return min(getattr(rule, metric) for rule in self.data)

    def max(self, metric):
        """Get max value of metric.

        Args:
            metric (str): Metric.

        Returns:
            float: Max value of metric in rule list.

        """
        # Return the metric value itself (as documented), not the rule holding it.
        return max(getattr(rule, metric) for rule in self.data)

    def std(self, metric):
        """Get standard deviation of metric.

        Args:
            metric (str): Metric.

        Returns:
            float: Standard deviation of metric in rule list.

        """
        return np.std([getattr(rule, metric) for rule in self.data])

    def to_csv(self, filename):
        """Export rules to csv.
57 changes: 29 additions & 28 deletions poetry.lock

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -14,7 +14,7 @@ python = "^3.7"
niapy = "^2.0.1"
numpy = [
{ version = "^1.21.5", python = ">=3.7,<3.11" },
{ version = "^1.22.0", python = "^3.11" }
{ version = "^1.22.3", python = "^3.11" }
]
pandas = [
{ version = "^1.3.5", python = ">=3.7.1,<3.8" },