Artifician is an event-driven library developed to simplify and speed up the preparation of datasets for Artificial Intelligence models.
- Python v3.6 or later
Binary installers for the latest released version are available on the Python Package Index (PyPI) and on Conda.
# PyPI
pip install artifician
# conda
conda install -c plato_solutions artifician
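To confirm the installation succeeded, import the package (a minimal check; the top-level package name `artifician` is taken from the imports used in the example below):
import artifician  # raises ImportError if the installation did not succeed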
Please visit the Artifician Docs
from artifician.dataset import *
from artifician.feature_definition import *
from artifician.processors.normalizer import *
def extract_domain_name(sample):
"""function for extracting the path from the given URL"""
domain_name = sample.split("//")[-1].split('/')[0]
return domain_name
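# Quick standalone check of the extractor, outside the Artifician pipeline
# (illustrative only; in the pipeline below, FeatureDefinition calls it for you):
print(extract_domain_name('https://www.google.com/'))  # -> 'www.google.com'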
input_data = ['https://www.google.com/', 'https://www.youtube.com/']
dataset = Dataset() # initializing dataset object
url_domain = FeatureDefinition(extract_domain_name, dataset)  # initialize the FeatureDefinition with the extractor function and subscribe it to the dataset
normalizer = Normalizer(PropertiesNormalizer(), url_domain, delimiter={'delimiter': ["."]})  # initialize the Normalizer (processor) with PropertiesNormalizer and subscribe it to url_domain
""" Now we are all set to go, all we have to do is call add_samples method on the dataset object and pass the input data
after calling the add_samples, url_domain will start its execution and extract the data using extract_domain_name function, as soon url_domain
feature is processed normalizer will start it execution and furthur is will process the data extracted by url_domain. The processed data is then
passed back to the dataset. Following diagram will make it more clear for you. """
prepared_data = dataset.add_samples(input_data)
print(prepared_data)
Output
|   | 0 | 1 |
|---|---|---|
| 0 | https://www.google.com/ | [(www, 0), (google, 1), (com, 2)] |
| 1 | https://www.youtube.com/ | [(www, 0), (youtube, 1), (com, 2)] |
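The (token, position) pairs in column 1 come from splitting the extracted domain name on the configured "." delimiter. A minimal plain-Python sketch of that step (illustrative only, not Artifician's PropertiesNormalizer implementation):
def split_with_positions(domain, delimiter="."):
    """Pair each delimiter-separated part of the domain with its position."""
    return [(part, i) for i, part in enumerate(domain.split(delimiter))]

print(split_with_positions("www.google.com"))
# [('www', 0), ('google', 1), ('com', 2)]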