
Plotting

Luiz Fernando edited this page Aug 15, 2024 · 9 revisions

We implemented some basic plots to help visualize the results of protocols. They are available in the module mq.plots.

For protocols, two plots are implemented, a lineplot and a boxplot, and their inputs are essentially the same:

  • table_protocol: the protocol table, obtained when the return_type parameter of the protocol function is "table"
  • x: name of the column to plot on the x axis (string)
  • y: name of the column to plot on the y axis (string)
  • methods: list of method abbreviations as strings (see here), or None to use all methods
  • title: title of the plot
  • legend: boolean, whether to show the legend
  • save_path: path where the plot is saved, can be None
  • plot_params: dictionary of extra parameters for the corresponding plot (line_plot, boxplot)

If you don't know how to use the protocol, see the protocol wiki. To use the protocol plotting functions, you first need to run the chosen protocol with the return_type parameter set to "table".

Basic usage:

import mlquantify as mq
import pandas as pd

table = pd.read_csv("path_to_table.csv")
table = mq.utils.convert_columns_to_arrays(table)
table = mq.utils.round_protocol_df(table, 5)
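
A note on why a conversion step is needed after reading the CSV: array-valued cells are written to CSV as plain strings such as "[0.25 0.75]", so they have to be parsed back into numbers before plotting. The sketch below shows that idea in plain Python. It is only an illustration and not the code of mq.utils.convert_columns_to_arrays; the helper parse_array_cell and the column names REAL_PREVS and PRED_PREVS are assumptions made for the example.

```python
def parse_array_cell(cell: str) -> list:
    """Turn a string such as '[0.25 0.75]' back into a list of floats."""
    inner = cell.strip().strip("[]")
    # Accept both space-separated and comma-separated values.
    return [float(token) for token in inner.replace(",", " ").split()]


# Hypothetical row as it would come out of a CSV file (everything is text).
row = {"QUANTIFIER": "DyS", "REAL_PREVS": "[0.25 0.75]", "PRED_PREVS": "[0.3 0.7]"}

# Parse only the (assumed) prevalence columns; leave the others untouched.
parsed = {
    key: parse_array_cell(value) if key.endswith("_PREVS") else value
    for key, value in row.items()
}

print(parsed["REAL_PREVS"])  # [0.25, 0.75]
```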

To use the plots, see the usage of each one below, for protocols and more.


Protocol Boxplot

The protocol boxplot is typically used to compare the methods in a protocol by a given error measure:

mq.plots.protocol_boxplot(table_protocol=table,
                          x="QUANTIFIER",
                          y="ae", # you first need to run the protocol with the measure 'ae'
                          methods=["EMQ", "DyS"],
                          title="Absolute error of methods",
                          legend=True, # Default
                          save_path="box.pdf",
                          order="rank") # or None

The order parameter controls the order of the boxes: "rank" shows the methods ranked by mean, while None keeps them in their original order.
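
To make the idea of ranking by mean concrete, here is a minimal sketch in plain Python. It assumes that the ranking is by ascending mean of the error measure (lowest error first); that direction, the helper rank_methods_by_mean and the sample values are assumptions for illustration, not the library's implementation.

```python
from statistics import mean

# Hypothetical (QUANTIFIER, ae) pairs, as they might appear in a protocol table.
rows = [
    ("EMQ", 0.10), ("EMQ", 0.14),
    ("DyS", 0.05), ("DyS", 0.07),
    ("CC", 0.20), ("CC", 0.22),
]


def rank_methods_by_mean(rows):
    """Return the method names sorted by ascending mean error."""
    errors = {}
    for method, err in rows:
        errors.setdefault(method, []).append(err)
    return sorted(errors, key=lambda method: mean(errors[method]))


print(rank_methods_by_mean(rows))  # ['DyS', 'EMQ', 'CC']
```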

[Figure: protocol boxplot]

Protocol Lineplot

The protocol lineplot usually serves the same purpose as the protocol boxplot, but here we want to see how an error measure behaves for each method along the x axis. So far, the supported x values are BATCH_SIZE and ALPHA (with or without pos_alpha specified). The line plot has an extra parameter called pos_alpha: although the table doesn't include an 'ALPHA' column, pos_alpha determines which position in the prevalence arrays is used for plotting. For binary datasets the default value is 1, indicating the positive class, but this can be adjusted as needed.
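
The role of pos_alpha can be sketched in plain Python: each row carries a prevalence array, and the value at index pos_alpha becomes the ALPHA used on the x axis. This is a sketch under stated assumptions and not the library's code; the helper add_alpha, the column name REAL_PREVS and the choice of the real (rather than predicted) prevalences are all assumptions.

```python
def add_alpha(rows, pos_alpha=1, prev_key="REAL_PREVS"):
    """Return copies of the rows with ALPHA taken from the prevalence array."""
    return [{**row, "ALPHA": row[prev_key][pos_alpha]} for row in rows]


# Hypothetical binary-class rows: index 0 is the negative class, 1 the positive.
rows = [
    {"QUANTIFIER": "DyS", "REAL_PREVS": [0.8, 0.2], "ae": 0.04},
    {"QUANTIFIER": "DyS", "REAL_PREVS": [0.5, 0.5], "ae": 0.06},
]

for row in add_alpha(rows):  # default pos_alpha=1, the positive class
    print(row["ALPHA"], row["ae"])
```

With pos_alpha=0 the same rows would be plotted against the negative-class prevalence instead.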

[Figure: protocol lineplot]


Class Distribution Plot

The last plot available in mlquantify is the class distribution plot, which shows the class distribution of a given set of values. Here we use classification scores, but it can be any values you want.

The usage is simple, just do as follows:

import mlquantify as mq
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

features, target = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)

model = RandomForestClassifier()
model.fit(X_train, y_train)

scores = model.predict_proba(X_test)

# Separating the scores for each class
scores_class = [scores[:, i] for i in range(scores.shape[1])]

mq.plots.class_distribution_plot(values=scores_class,
                                 labels=y_test,
                                 bins=30, # Default
                                 title="Scores Distribution",
                                 legend=True,
                                 save_path="dist.pdf")

The result looks like this:

[Figure: class distribution plot]
