Merge pull request #415 from vanithakattumuri/main
#2 updated the documentation of FAE.py
udayRage authored May 22, 2024
2 parents 2205a1d + f767966 commit a0371a5
Showing 2 changed files with 76 additions and 98 deletions.
161 changes: 69 additions & 92 deletions PAMI/frequentPattern/topk/FAE.py
@@ -1,10 +1,13 @@
# Top - K is an algorithm to discover top frequent patterns in a transactional database.
#
# **Importing this algorithm into a python program**
# ---------------------------------------------------------
#
# import PAMI.frequentPattern.topk.FAE as alg
#
# iFile = 'sampleDB.txt'
#
# K = 2
#
# obj = alg.FAE(iFile, K)
#
# obj.mine()
@@ -31,9 +34,6 @@
#





__copyright__ = """
Copyright (C) 2021 Rage Uday Kiran
@@ -57,49 +57,29 @@

class FAE(_ab._frequentPatterns):
"""
:Description: Top - K is an algorithm to discover top frequent patterns in a transactional database.
:Reference: Zhi-Hong Deng, Guo-Dong Fang: Mining Top-Rank-K Frequent Patterns: DOI: 10.1109/ICMLC.2007.4370261 · Source: IEEE Xplore
https://ieeexplore.ieee.org/document/4370261
:param iFile: str :
Name of the Input file to mine complete set of frequent patterns
:param oFile: str :
Name of the output file to store complete set of frequent patterns
:param k: int :
User specified count of top frequent patterns
:param minimum: int :
Minimum number of frequent patterns to consider in analysis
:param sep: str :
This variable is used to distinguish items from one another in a transaction. The default separator is a tab. However, users can override the default separator.
About this algorithm
====================
:**Description**: Top - K is an algorithm to discover top frequent patterns in a transactional database.
:**Reference**: Zhi-Hong Deng, Guo-Dong Fang: Mining Top-Rank-K Frequent Patterns: DOI: 10.1109/ICMLC.2007.4370261 · Source: IEEE Xplore https://ieeexplore.ieee.org/document/4370261
:Attributes:
:**Parameters**: - **iFile** (*str or URL or dataFrame*) -- *Name of the Input file to mine complete set of frequent patterns.*
- **oFile** (*str*) -- *Name of the output file to store complete set of frequent patterns.*
- **k** (*int*) -- *User specified count of top frequent patterns.*
**minimum** (*int*) -- *Minimum number of frequent patterns to consider in analysis.*
**sep** (*str*) -- *This variable is used to distinguish items from one another in a transaction. The default separator is a tab. However, users can override the default separator.*
startTime : float
To record the start time of the mining process
:**Attributes**: - **startTime** (*float*) -- *To record the start time of the mining process.*
- **endTime** (*float*) -- *To record the completion time of the mining process.*
- **finalPatterns** (*dict*) -- *Storing the complete set of patterns in a dictionary variable.*
- **memoryUSS** (*float*) -- *To store the total amount of USS memory consumed by the program.*
- **memoryRSS** (*float*) -- *To store the total amount of RSS memory consumed by the program.*
endTime : float
To record the completion time of the mining process
Execution methods
=================
finalPatterns : dict
Storing the complete set of patterns in a dictionary variable
memoryUSS : float
To store the total amount of USS memory consumed by the program
memoryRSS : float
To store the total amount of RSS memory consumed by the program
finalPatterns : dict
it represents to store the patterns
**Methods to execute code on terminal**
-------------------------------------------
**Terminal command**
.. code-block:: console
@@ -109,45 +89,49 @@ class FAE(_ab._frequentPatterns):
Example Usage:
(.venv) $ python3 FAE.py sampleDB.txt patterns.txt 10
.. note:: k will be considered as count of top frequent patterns to consider in analysis
(.venv) $ python3 FAE.py sampleDB.txt patterns.txt 10.0
.. note:: k will be considered as count of top frequent patterns to consider in analysis.
**Calling from a python program**
**Importing this algorithm into a python program**
---------------------------------------------------------
.. code-block:: python
import PAMI.frequentPattern.topk.FAE as alg
import PAMI.frequentPattern.topk.FAE as alg
iFile = 'sampleDB.txt'
K = 2
obj = alg.FAE(iFile, K)
obj = alg.FAE(iFile, K)
obj.mine()
obj.mine()
topKFrequentPatterns = obj.getPatterns()
topKFrequentPatterns = obj.getPatterns()
print("Total number of Frequent Patterns:", len(topKFrequentPatterns))
print("Total number of Frequent Patterns:", len(topKFrequentPatterns))
obj.save(oFile)
obj.save(oFile)
Df = obj.getPatternsAsDataFrame()
Df = obj.getPatternsAsDataFrame()
memUSS = obj.getMemoryUSS()
memUSS = obj.getMemoryUSS()
print("Total Memory in USS:", memUSS)
print("Total Memory in USS:", memUSS)
memRSS = obj.getMemoryRSS()
memRSS = obj.getMemoryRSS()
print("Total Memory in RSS", memRSS)
print("Total Memory in RSS", memRSS)
run = obj.getRuntime()
run = obj.getRuntime()
print("Total ExecutionTime in seconds:", run)
print("Total ExecutionTime in seconds:", run)
Credits:
--------
The complete program was written by P.Likhitha under the supervision of Professor Rage Uday Kiran.
Credits
=======
The complete program was written by P. Likhitha and revised by Tarun Sreepada under the supervision of Professor Rage Uday Kiran.
"""

@@ -166,8 +150,7 @@ class FAE(_ab._frequentPatterns):

def _creatingItemSets(self):
"""
Storing the complete transactions of the database/input file in a database variable
Storing the complete transactions of the database/input file in a database variable
"""

self._Database = []
@@ -227,14 +210,15 @@ def _frequentOneItem(self):
return plist

def _save(self, prefix, suffix, tidSetI):
"""Saves the patterns that satisfy the periodic frequent property.
:param prefix: the prefix of a pattern
:type prefix: list
:param suffix: the suffix of a pattern
:type suffix: list
:param tidSetI: the timestamp of a pattern
:type tidSetI: list
"""
Saves the patterns that satisfy the periodic frequent property.
:param prefix: the prefix of a pattern
:type prefix: list
:param suffix: the suffix of a pattern
:type suffix: list
:param tidSetI: the timestamp of a pattern
:type tidSetI: list
"""

if prefix is None:
Expand Down Expand Up @@ -263,18 +247,16 @@ def _save(self, prefix, suffix, tidSetI):
return

def _Generation(self, prefix, itemSets, tidSets):
"""Equivalence class is followed and checks for the patterns generated for periodic-frequent patterns.
:param prefix: main equivalence prefix
:type prefix: periodic-frequent item or pattern
:param itemSets: patterns which are items combined with prefix and satisfying the periodicity
and frequent with their timestamps
:type itemSets: list
:param tidSets: timestamps of the items in the argument itemSets
:type tidSets: list
"""
"""
Equivalence class is followed and checks for the patterns generated for periodic-frequent patterns.
:param prefix: main equivalence prefix
:type prefix: periodic-frequent item or pattern
:param itemSets: patterns which are items combined with prefix and satisfying the periodicity and frequent with their timestamps
:type itemSets: list
:param tidSets: timestamps of the items in the argument itemSets
:type tidSets: list
"""
if len(itemSets) == 1:
i = itemSets[0]
tidI = tidSets[0]
Expand Down Expand Up @@ -302,6 +284,7 @@ def _Generation(self, prefix, itemSets, tidSets):
def _convert(self, value):
"""
to convert the type of user specified minSup value
:param value: user specified minSup value
:type value: int or float or str
:return: converted type
@@ -321,13 +304,13 @@ def _convert(self, value):
@deprecated("It is recommended to use 'mine()' instead of 'startMine()' for mining process. Starting from January 2025, 'startMine()' will be completely terminated.")
def startMine(self):
"""
Main function of the program
TopK Frequent pattern mining process will start from here
"""
self.mine()

def mine(self):
"""
Main function of the program
TopK Frequent pattern mining process will start from here
"""
self._startTime = _ab._time.time()
if self._iFile is None:
@@ -364,7 +347,6 @@ def getMemoryUSS(self):
Total amount of USS memory consumed by the mining process will be retrieved from this function
:return: returning USS memory consumed by the mining process
:rtype: float
"""

@@ -375,7 +357,6 @@ def getMemoryRSS(self):
Total amount of RSS memory consumed by the mining process will be retrieved from this function
:return: returning RSS memory consumed by the mining process
:rtype: float
"""

@@ -386,7 +367,6 @@ def getRuntime(self):
Calculating the total amount of runtime taken by the mining process
:return: returning total amount of runtime taken by the mining process
:rtype: float
"""

@@ -397,7 +377,6 @@ def getPatternsAsDataFrame(self):
Storing final frequent patterns in a dataframe
:return: returning frequent patterns in a dataframe
:rtype: pd.DataFrame
"""

@@ -413,7 +392,6 @@ def save(self, outFile):
Complete set of frequent patterns will be loaded in to an output file
:param outFile: name of the output file
:type outFile: file
"""
self._oFile = outFile
@@ -427,7 +405,6 @@ def getPatterns(self):
Function to send the set of frequent patterns after completion of the mining process
:return: returning frequent patterns
:rtype: dict
"""
return self._finalPatterns
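
For orientation, here is a minimal sketch of what a top-k frequent pattern query over tidsets computes. It is not the FAE algorithm from `FAE.py`; it is a naive enumeration for tiny inputs. The function name `top_k_frequent`, the toy database, and the tie-breaking rule (higher support first, then lexicographic order) are assumptions made for illustration only.

```python
from itertools import combinations


def top_k_frequent(transactions, k):
    """Return the k itemsets with the highest support as (itemset, support) pairs.

    Illustrative only: enumerates every item combination, so it is exponential
    in the number of distinct items.
    """
    # Vertical layout: map each item to the set of transaction ids containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tidsets.setdefault(item, set()).add(tid)

    items = sorted(tidsets)
    supports = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            # Support of an itemset = size of the intersection of its items' tidsets.
            tids = set.intersection(*(tidsets[i] for i in combo))
            if tids:
                supports[combo] = len(tids)
                found = True
        if not found:
            # No itemset of this size occurs, so no larger one can occur either.
            break

    # Assumed tie-break: higher support first, then lexicographic itemset order.
    ranked = sorted(supports.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
    for pattern, support in top_k_frequent(db, 3):
        print(pattern, support)
```

The sketch cuts the ranking off at exactly `k` entries; an implementation that ranks by distinct support values could return more patterns when ties occur.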
13 changes: 7 additions & 6 deletions README.md
@@ -68,7 +68,12 @@ PAttern MIning (PAMI) is a Python library containing several algorithms to disco
***
# Recent Updates

- Version 2024.04.1.2: Introduced two new frequent subgraph mining algorithms, namely gspan and TKG. Optimized the frequent pattern mining algorithms.
- **Version 2024.05.01:**
In this latest version, the following updates have been made:
- Included two new algorithms, **Gspan and TKG**, for frequent subgraph mining.
- Updated three synthetic data generators: **transactional database, temporal database, and geo-referenced transactional database**.
- Optimized the following frequent pattern mining algorithms: **Apriori, Aprioribitset, ECLAT, ECLATbitset, FPGrowth, and CHARM**.
- The startMine() function has been deprecated in favor of mine().
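
The deprecation noted in the last bullet can be sketched as follows. This is a minimal illustration using the standard-library `warnings` module, not PAMI's own `@deprecated` decorator; the class `Miner` and its `ran` attribute are hypothetical stand-ins.

```python
import warnings


class Miner:
    """Hypothetical stand-in for a mining class; not PAMI's implementation."""

    def __init__(self):
        self.ran = False

    def mine(self):
        # The real mining work would happen here.
        self.ran = True

    def startMine(self):
        # Old entry point: warn the caller, then delegate to the new method.
        warnings.warn(
            "startMine() is deprecated; use mine() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.mine()
```

Delegating keeps existing callers working while the warning points them to `mine()`; `stacklevel=2` attributes the warning to the calling code rather than to the wrapper itself.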

Total number of algorithms: 83

@@ -142,10 +147,9 @@ from PAMI.frequentPattern.basic import FPGrowth as alg
fileURL = "https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/Transactional_T10I4D100K.csv"
minSup=300
obj = alg.FPGrowth(iFile=fileURL, minSup=minSup, sep='\t')
obj.mine()
obj.startMine()
obj.save('frequentPatternsAtMinSupCount300.txt')
frequentPatternsDF= obj.getPatternsAsDataFrame()

print('Total No of patterns: ' + str(len(frequentPatternsDF))) #print the total number of patterns
print('Runtime: ' + str(obj.getRuntime())) #measure the runtime
print('Memory (RSS): ' + str(obj.getMemoryRSS()))
@@ -225,7 +229,6 @@ We invite and encourage all community members to contribute, report bugs, fix bu

***
# Tutorials

### 0. Association Rule Mining

| Basic |
@@ -234,8 +237,6 @@ We invite and encourage all community members to contribute, report bugs, fix bu
| Lift <a target="_blank" href="https://colab.research.google.com/github/UdayLab/PAMI/blob/main/notebooks/associationRules/basic/lift.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |
| Leverage <a target="_blank" href="https://colab.research.google.com/github/UdayLab/PAMI/blob/main/notebooks/associationRules/basic/leverage.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> |



### 1. Pattern mining in binary transactional databases

#### 1.1. Frequent pattern mining: [Sample](https://udaylab.github.io/PAMI/frequentPatternMining.html)