This repository has been archived by the owner on Jun 9, 2023. It is now read-only.

Update dataXplore #77

Merged: 7 commits, Jun 11, 2020
2 changes: 1 addition & 1 deletion README.md
@@ -220,7 +220,7 @@ As before, CLAI skill will not execute without your permission unless `auto` mod

 ## :robot: Want to build your own skills?

-[`fixit`](clai/server/plugins/fix_bot)   [`nlc2cmd`](clai/server/plugins/nlc2cmd)   [`helpme`](clai/server/plugins/helpme)   [`howdoi`](clai/server/plugins/howdoi)   [`man page explorer`](clai/server/plugins/manpage_agent)   [`ibmcloud`](clai/server/plugins/ibmcloud)   [`tellina`](clai/server/plugins/tellina)
+[`fixit`](clai/server/plugins/fix_bot)   [`nlc2cmd`](clai/server/plugins/nlc2cmd)   [`helpme`](clai/server/plugins/helpme)   [`howdoi`](clai/server/plugins/howdoi)   [`man page explorer`](clai/server/plugins/manpage_agent)   [`ibmcloud`](clai/server/plugins/ibmcloud)   [`tellina`](clai/server/plugins/tellina)   [`dataXplore`](clai/server/plugins/dataxplore)

 Project CLAI is intended to rekindle the spirit of AI softbots by providing a plug-and-play framework and simple interface abstractions to the Bash and its underlying operating system. Developers can access the command line through a simple `sense-act` API for rapid prototyping of newer and more complex AI capabilities.

38 changes: 26 additions & 12 deletions clai/server/plugins/dataxplore/README.md
@@ -1,21 +1,35 @@
-# DataXplore
+# dataXplore

-`Data-Analytics` `NLP` `Support` `Plots`
+`Analytics` `NLP` `Support`

-Data Exploration is one of the well versed topics in the course of Data Analyst/ Scientist and Researcher. For a given data based modeling, one needs to know a) What are the attribute one is looking b) how the attribute can be used for top level analysis. DataXplore Plugin in Project (CLAI) Command Line AI implements few primary tasks on the command line including visualization on a certain terminal. A goto plugin like dataxplore comes handy for myriad of data analysis task for the above effort.
+Data science has become one of the most popular real-world applications of ML. This skills is targeted specifically
+toward making the CLI easier to adopt and navigate for data scientists.

 ## Implementation

-Command usage:- clai dataxplore function csvfilelocation
-e.g.
-1) `>> clai dataxplore summarize air_quality.csv`, when this command is executed one can view the summary of the give data file.
-2) `>> clai dataxplore plot air_quality.csv`, when this command is executed one can view the plot of the given data file.
-#### Execution on test dataset.
-![figure1](https://github.com/madhavanpallan/clai/blob/master/clai/server/plugins/dataxplore/figures/dx_summarize_plot_test.png)
-#### Execution on air quality dataset.
-![figure2](https://github.com/madhavanpallan/clai/blob/master/clai/server/plugins/dataxplore/figures/dx_summarize_plot_airQuality.png)
+The current version of the skill provides two functionalities: **summarize** and **plot**.
+"Summarize" utilizes the [describe function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html) of the popular
+[Pandas library](https://pandas.pydata.org/pandas-docs/stable/index.html) to
+generate a human-readable summary of a specified CSV file; this functionality is intended to allow data scientists
+to quickly examine any data file right from the command line. "Plot" builds on the plot function provided by
+[MatPlotLib](https://ieeexplore.ieee.org/document/4160265),
+and the Pillow library [[link](https://pillow.readthedocs.io/en/stable/index.html)]
+[[link](https://www.pythonware.com/products/pil/)]
+to generate a plot of a given CSV file. Such functionalities illustrate basic use cases
+of how CLAI can be used as a CLI assistant for data science.
+
+## Example Usage
+
+`>> clai "dataxplore" summarize air_quality.csv` to view the summary of the give data file.
+
+`>> clai "dataxplore" plot air_quality.csv` to view a plot of the given data file.
+
+![figure1](https://www.dropbox.com/s/lin379uw2nc0ts9/dx_summarize_plot_test.png?raw=1)
+
+![figure2](https://www.dropbox.com/s/j4xxme9eaj92mh5/dx_summarize_plot_airQuality.png?raw=1)
+
+Both dataset are courtesy of [pandas](http://pandas.pydata.org/).

-*** Both dataset are courtesy from the [pandas](http://pandas.pydata.org/) website.
 ## [xkcd](https://uni.xkcd.com/)
 The contents of any one panel are dependent on the contents of every panel including itself. The graph of panel dependencies is complete and bidirectional, and each node has a loop. The mouseover text has two hundred and forty-two characters.

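Reviewer's note: the README change describes **summarize** as a thin wrapper over pandas `describe` and **plot** as a plot of the loaded CSV. A minimal sketch of those two data paths follows, assuming pandas is installed. `CSV_TEXT`, `summarize`, and `load_for_plot` are hypothetical names used only for illustration and are not part of the skill; the `read_csv` arguments mirror the ones in `dataxplore.py`. Rendering the figure itself is left out.

```python
import io

import pandas as pd

# Hypothetical stand-in for a file such as air_quality.csv.
CSV_TEXT = """datetime,station_a,station_b
2019-05-07 01:00:00,10.0,20.0
2019-05-07 02:00:00,20.0,40.0
2019-05-07 03:00:00,30.0,60.0
"""


def summarize(csv_source) -> str:
    """Mirror the 'summarize' branch: read the CSV, then describe it."""
    frame = pd.read_csv(csv_source)
    # describe() reports count, mean, std, min, quartiles and max
    # for the numeric columns by default.
    return frame.describe().to_string()


def load_for_plot(csv_source) -> pd.DataFrame:
    """Mirror how the 'plot' branch loads data: first column as a date index."""
    return pd.read_csv(csv_source, index_col=0, parse_dates=True)


summary = summarize(io.StringIO(CSV_TEXT))
print(summary)

frame = load_for_plot(io.StringIO(CSV_TEXT))
print(frame.index)
```

Calling `frame.plot()` on the second frame is what would hand the data to Matplotlib, with the parsed dates on the x-axis.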
14 changes: 7 additions & 7 deletions clai/server/plugins/dataxplore/dataxplore.py
@@ -34,27 +34,27 @@ def get_next_action(self, state: State) -> Action:
         logger.info("Command passed in dataxplore: " + command)
         commandStr = str(command)
         commandTokenized = commandStr.split(" ")
-        if len(commandTokenized) == 3:
-            if commandTokenized[1] == 'summarize':
-                fileName = commandTokenized[2]
+        if len(commandTokenized) == 2:
+            if commandTokenized[0] == 'summarize':
+                fileName = commandTokenized[1]
                 csvFile = fileName.split(".")
                 if len(csvFile) == 2:
                     if csvFile[1] == 'csv':
-                        path = os.path.abspath(commandTokenized[2])
+                        path = os.path.abspath(fileName)
                         data = pd.read_csv(path)
                         df = pd.DataFrame(data)
                         response = df.describe().to_string()
                     else:
                         response = "We currently support only csv files. Please, Try >> clai dataxplore summarize csvFileLocation "
                 else:
                     response = "Not a supported file format. Please, Try >> clai dataxplore summarize csvFileLocation "
-            elif commandTokenized[1] == 'plot':
-                fileName = commandTokenized[2]
+            elif commandTokenized[0] == 'plot':
+                fileName = commandTokenized[1]
                 csvFile = fileName.split(".")
                 if len(csvFile) == 2:
                     if csvFile[1] == 'csv':
                         plt.close('all')
-                        path = os.path.abspath(commandTokenized[2])
+                        path = os.path.abspath(fileName)
                         data = pd.read_csv(path,index_col=0, parse_dates=True)
                         data.plot()
                         plt.savefig('/tmp/claifigure.png')
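Reviewer's note: the hunk above validates the file by checking that `fileName.split(".")` has exactly two parts, which would also reject paths such as `./data/air.csv` (three parts). A possible alternative is sketched below, using only the standard library. `parse_command` and `SUPPORTED_FUNCTIONS` are hypothetical names, not part of the skill, and the sketch assumes the command arrives as `<function> <file>` with the skill name already stripped, as the new indices `[0]` and `[1]` suggest.

```python
import os
from typing import Optional, Tuple

SUPPORTED_FUNCTIONS = ("summarize", "plot")


def parse_command(command: str) -> Tuple[Optional[str], Optional[str], str]:
    """Return (function, absolute_path, error); error is '' on success."""
    tokens = command.split()
    if len(tokens) != 2:
        return None, None, "expected: <function> <csv file>"
    function, file_name = tokens
    if function not in SUPPORTED_FUNCTIONS:
        return None, None, "unknown function: " + function
    # splitext looks only at the final extension, so './data/air.csv'
    # and 'air.quality.csv' are accepted, unlike a two-part split('.').
    extension = os.path.splitext(file_name)[1]
    if extension.lower() != ".csv":
        return None, None, "only csv files are supported"
    return function, os.path.abspath(file_name), ""


print(parse_command("summarize air_quality.csv"))
print(parse_command("plot ./data/air.quality.csv"))
print(parse_command("plot notes.txt"))
```

Returning an error string instead of raising keeps the sketch close to the hunk's style of building a `response` message. File names containing spaces are still not handled, since the tokens are split on whitespace.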
Binary file not shown.
Binary file not shown.
2 changes: 1 addition & 1 deletion clai/server/plugins/dataxplore/requirements.txt
@@ -1,5 +1,5 @@
 pandas==1.0.3
 numpy==1.17.2
 matplotlib==3.2.1
-Pillow==2.2.1
+Pillow==7.1.1
 imageloader==0.0.5