Plotting
We implemented some basic plots that help visualize the results of protocols. They are available in the mq.plots module.
For protocols, two plots are implemented, a line plot and a boxplot, and both take essentially the same inputs:
- table_protocol: the protocol table, returned when the protocol's return_type parameter is "table"
- x: name of the column to put on the x axis (string)
- y: name of the column to put on the y axis (string)
- methods: list of method abbreviations as strings (see here), or None to include all methods
- title: title of the plot
- legend: boolean that enables or disables the legend
- save_path: path where the plot is saved; can be None
- plot_params: dictionary of extra parameters for the corresponding plot (line_plot, boxplot)
If you don't know how to use a protocol, see the protocol wiki. To use the protocol plotting functions, you first need to run the chosen protocol with return_type set to "table". The example below loads a previously saved protocol table from a CSV file:
import mlquantify as mq
import pandas as pd
table = pd.read_csv("path_to_table.csv")
table = mq.utils.convert_columns_to_arrays(table)
table = mq.utils.round_protocol_df(table, 5)
To use the plots, see the usage of each one below, for protocols and more.
The protocol boxplot shows, for each method in the protocol, the distribution of a given error measure:
mq.plots.protocol_boxplot(table_protocol=table,
                          x="QUANTIFIER",
                          y="ae",  # you first need to run the app with the measure 'ae'
                          methods=["EMQ", "DyS"],
                          title="Absolute error of methods",
                          legend=True,  # Default
                          save_path="box.pdf",
                          order="rank")  # or None
The order parameter controls how the boxes are ordered: "rank" shows the methods ranked by their mean, while None keeps them in their original order.
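As a rough illustration of what ranking by mean implies, the sketch below orders methods by their mean error using plain Python. It does not call mlquantify and is not the library's implementation: the `errors` data and the `rank_by_mean` helper are hypothetical, and ascending order (lowest mean error first) is an assumption.

```python
from statistics import mean

# Hypothetical absolute errors per method (made-up numbers for illustration).
errors = {
    "EMQ": [0.02, 0.04, 0.03],
    "DyS": [0.01, 0.02, 0.03],
    "CC": [0.10, 0.08, 0.12],
}


def rank_by_mean(errors_by_method):
    """Return method names sorted by mean error, lowest first (assumed order)."""
    return sorted(errors_by_method, key=lambda name: mean(errors_by_method[name]))


ranked = rank_by_mean(errors)
print(ranked)
```

With order=None, by contrast, no such sorting would be applied and the methods would keep the order in which they appear.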
The protocol lineplot serves the same purpose as the protocol boxplot, but here we want to see how an error measure evolves for each method along the x axis. So far, the supported x values are BATCH_SIZE and ALPHA (with or without pos_alpha specified).
In the case of the line plot, we have a parameter called pos_alpha. Although the table doesn't include an 'ALPHA' column, pos_alpha is used to determine which position in the prevalence arrays should be used for plotting. For binary datasets, the default value is set to 1, indicating the positive class, but this can be adjusted as needed.
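The indexing role of pos_alpha can be sketched in plain Python. This is only an illustration under assumptions: the prevalence arrays below are made up, `select_alpha` is a hypothetical helper, and mlquantify may perform this step differently internally.

```python
# Hypothetical prevalence arrays, one per protocol row: [negative, positive].
prevalences = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.2, 0.8],
]


def select_alpha(prevs, pos_alpha=1):
    """Pick the prevalence stored at position pos_alpha in each array."""
    return [row[pos_alpha] for row in prevs]


alpha_positive = select_alpha(prevalences)               # positive class (default)
alpha_negative = select_alpha(prevalences, pos_alpha=0)  # negative class
print(alpha_positive, alpha_negative)
```

For a multiclass table, the same idea would apply with pos_alpha pointing at whichever class position is of interest.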
The last plot available in mlquantify is the class distribution plot, which shows how a given value is distributed within each class. Here we use classification scores, but it can be any value you want.
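To make the idea concrete, the sketch below counts scores per bin separately for each class, which is the kind of information such a plot displays. It is a plain-Python illustration, not mlquantify's implementation: the scores, the labels, and the `bin_counts_by_class` helper are hypothetical, and equal-width bins on [0, 1] are an assumption.

```python
from collections import Counter

# Hypothetical positive-class scores and their true labels.
scores = [0.05, 0.15, 0.65, 0.35, 0.85, 0.95]
labels = [0, 0, 0, 1, 1, 1]


def bin_counts_by_class(values, classes, bins=2):
    """Count values per equal-width bin on [0, 1], separately for each class."""
    counts = {}
    for value, cls in zip(values, classes):
        # min() keeps a value of exactly 1.0 inside the last bin.
        index = min(int(value * bins), bins - 1)
        counts.setdefault(cls, Counter())[index] += 1
    return counts


histogram = bin_counts_by_class(scores, labels)
print(histogram)
```

A well-separated classifier would put most class 0 scores in the low bins and most class 1 scores in the high bins.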
Using mq.plots.class_distribution_plot is simple, just do as follows:
import mlquantify as mq
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

features, target = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)

model = RandomForestClassifier()
model.fit(X_train, y_train)
scores = model.predict_proba(X_test)  # one column of scores per class

# Separating the scores for each class
scores_class = [scores[:, i] for i in range(scores.shape[1])]
mq.plots.class_distribution_plot(values=scores_class,
                                 labels=y_test,
                                 bins=30,  # Default
                                 title="Scores Distribution",
                                 legend=True,
                                 save_path="dist.pdf")
The resulting plot, saved to dist.pdf, shows the score distribution of each class.