GitHub - mayabhat/doe_v2: Design of Experiments Pipeline

Design of Experiments (DOE) Package

This GitHub repository hosts a small package for designing experiments, building models, and viewing basic statistical information. It is intended to guide experimental or computational work that explores multidimensional spaces (e.g., formulation chemistries). You can install the library with the following command (the leading ! runs it from a Jupyter notebook cell):

!pip install git+https://github.com/mayabhat/doe_v2

Once installed, you can import the doe module as follows:

from doe import doe

Creating Design Files

You can create xlsx files that contain the necessary experiments by creating a design object and specifying the experiment's desired directory, the independent variables, and the expected output variables.

design = doe.doe('test-dir', ['CompA', 'CompB'], ['Target1'])
design

test-dir
Report

Directory name with design xls: test-dir
Solutions: ['CompA', 'CompB']
CompA: 0.0 to 0.0
CompB: 0.0 to 0.0
Total Experiments: 0

Experimental Design

	CompA	CompB	Target1

A directory called test-dir should now exist; it is created if it was not already present. To make the design, call the make_design() method, which takes a design array and a range.

The design array can be generated manually or with the pyDOE2 functions exposed by this package. Designs can take the form doe.bbdesign(), doe.ccdesign(), doe.fullfact(), or more. See the pyDOE documentation for how to make a design: https://pythonhosted.org/pyDOE/.
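As an illustration of a manually generated design, the sketch below builds a two-level full factorial in coded units (-1 and 1) with added center points, using only the standard library. The helper name manual_design and its n_center parameter are hypothetical and not part of this package; whether make_design() accepts a plain list of lists or needs a NumPy array is also an assumption to check before use.

```python
from itertools import product


def manual_design(n_factors, n_center=1):
    """Two-level full factorial in coded units (-1/+1) plus center points.

    Hypothetical helper for illustration; not part of the doe package.
    """
    # Every combination of low (-1) and high (+1) levels across the factors.
    corners = [list(levels) for levels in product([-1.0, 1.0], repeat=n_factors)]
    # Center points sit at 0 in coded units for every factor.
    centers = [[0.0] * n_factors for _ in range(n_center)]
    return corners + centers


design_array = manual_design(2, n_center=2)
for row in design_array:
    print(row)
```

Repeated center points such as these are what give a design replicate runs at the same conditions.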

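pyDOE2 outputs designs in coded units from -1 to 1. The range argument supplies a center point and a deviation from that center for each variable, which are used to convert the coded values to scaled values: -1 maps to center point minus deviation, and 1 maps to center point plus deviation. The form is [[center point, deviation], [center point, deviation]], where the length of the list equals the number of independent variables.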
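The conversion can be sketched in a few lines, assuming make_design() applies the usual formula scaled = center point + coded value * deviation. That formula is an assumption about the package internals, but it is consistent with the bounds and design values reported further down. The function name scale_design is hypothetical.

```python
import math


def scale_design(coded_rows, ranges):
    """Map coded rows (values in [-1, 1]) to real units.

    Assumed formula: center + coded * deviation, applied per variable.
    """
    return [
        [center + coded * deviation for coded, (center, deviation) in zip(row, ranges)]
        for row in coded_rows
    ]


ranges = [[3, 1.5], [10, 8]]  # [center point, deviation] for CompA and CompB
s = math.sqrt(0.5)            # about 0.7071, a coded level inside the -1 to 1 box
coded = [[-1, -1], [0, 0], [1, 1], [-s, s]]

scaled = scale_design(coded, ranges)
for row in scaled:
    print(row)
```

The first three rows give the lower bounds, the center, and the upper bounds of the two variables. The last row uses coded values of about -0.7071 and 0.7071, which matches row 2 of the design table shown later (1.93934, 15.656854).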

make_design() writes an xlsx file listing the experiments that must be completed. Enter the target values in the target column as the experiments are completed.

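d = doe.ccdesign(2, face = 'cci')  # central composite design for 2 variables
ranges = [[3, 1.5], [10, 8]]  # [center point, deviation] per variable; named ranges to avoid shadowing the built-in range
design.make_design(d, ranges)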
design

test-dir
Report

Directory name with design xls: test-dir
Solutions: ['CompA', 'CompB']
CompA: 1.5 to 4.5
CompB: 2.0 to 1.8e+01
Total Experiments: 16

Experimental Design

	CompA	CompB
0	1.93934	4.343146
1	4.06066	4.343146
2	1.93934	15.656854
3	4.06066	15.656854
4	3.00000	10.000000
5	3.00000	10.000000
6	3.00000	10.000000
7	3.00000	10.000000
8	1.50000	10.000000
9	4.50000	10.000000
10	3.00000	2.000000
11	3.00000	18.000000
12	3.00000	10.000000
13	3.00000	10.000000
14	3.00000	10.000000
15	3.00000	10.000000

Building a Model with Results

To build a model from the obtained results, call the fit() method with the path to a results xlsx file. This file should have the same format as the design xlsx, with the independent and dependent variable names as column headings.

fit() prints a report to stdout containing the linear model's parameters, p values, R squared, residual plots, and more.

design.fit('results.xlsx')
design

test-dir
Report

Directory name with design xls: test-dir
Solutions: ['CompA', 'CompB']
CompA: 1.5 to 4.5
CompB: 2.0 to 1.8e+01
Total Experiments: 16

Data

	CompA	CompB	Target1
0	1.93934	4.343146	1
1	4.06066	4.343146	2
2	1.93934	15.656854	3
3	4.06066	15.656854	1
4	3.00000	10.000000	2
5	3.00000	10.000000	3
6	3.00000	10.000000	1
7	3.00000	10.000000	2
8	1.50000	10.000000	4
9	4.50000	10.000000	5
10	3.00000	2.000000	3
11	3.00000	18.000000	5
12	3.00000	10.000000	6
13	3.00000	10.000000	7
14	3.00000	10.000000	2
15	3.00000	10.000000	3

Model R Squared Value:
0.093

Optimum Prediction
{"['Target1']": 4.853553390593341, 'Compositions': array([ 1.5, 18. ])} for ['CompA', 'CompB']

P values less than 0.05 are significant

	coefficients	pvalues
features
1	-2.273667e+00	0.791929
CompA	1.298816e+00	0.775728
CompB	6.158471e-01	0.413186
CompA^2	6.661338e-16	1.000000
CompA CompB	-1.250000e-01	0.496331
CompB^2	-7.812500e-03	0.746163
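To make the report easier to read, the sketch below evaluates the second-order model from the coefficient table on the rows of the Data table and recomputes R squared as 1 - SS_res / SS_tot. The coefficients and data are copied from the output above; the function names are hypothetical, and this is only a check on the printed numbers, not the package's own fitting code.

```python
# Coefficients as printed above, in the order:
# 1 (constant), CompA, CompB, CompA^2, CompA CompB, CompB^2
COEFFS = [-2.273667, 1.298816, 0.6158471, 6.661338e-16, -0.125, -0.0078125]

# (CompA, CompB, Target1) rows copied from the Data table above.
DATA = [
    (1.93934, 4.343146, 1), (4.06066, 4.343146, 2),
    (1.93934, 15.656854, 3), (4.06066, 15.656854, 1),
    (3.0, 10.0, 2), (3.0, 10.0, 3), (3.0, 10.0, 1), (3.0, 10.0, 2),
    (1.5, 10.0, 4), (4.5, 10.0, 5), (3.0, 2.0, 3), (3.0, 18.0, 5),
    (3.0, 10.0, 6), (3.0, 10.0, 7), (3.0, 10.0, 2), (3.0, 10.0, 3),
]


def quadratic_features(a, b):
    """Feature vector matching the rows of the coefficient table."""
    return [1.0, a, b, a * a, a * b, b * b]


def predict(a, b):
    """Model prediction: dot product of coefficients and features."""
    return sum(c * x for c, x in zip(COEFFS, quadratic_features(a, b)))


def r_squared(rows):
    """R squared = 1 - (residual sum of squares / total sum of squares)."""
    targets = [y for _, _, y in rows]
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - predict(a, b)) ** 2 for a, b, y in rows)
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


r2 = r_squared(DATA)
print(f"R squared: {r2:.4f}")
```

The result should land close to the reported 0.093, up to rounding of the printed coefficients. Note that the eight replicate runs at (3, 10) have targets ranging from 1 to 7, which by itself limits how much variance any model of CompA and CompB can explain.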

Other Functionalities

You can make a parity plot to compare the model's predicted values with the experimental values. If the model is good, the points should fall on the 45 degree x = y line.

design.parity()

You can get an optimum prediction with the optimum() function by specifying whether to maximize or minimize the target: pass maximize = True to maximize it, or maximize = False to minimize it.

design.optimum(maximize = True)

{'Compositions': array([ 1.5, 18. ]), "['Target1']": 4.853553390593341}
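One way to see where this optimum comes from is a brute-force grid search over the design bounds using the coefficients from the report. This is a minimal sketch and not necessarily how optimum() is implemented; the names predict, linspace, and grid_optimum are hypothetical, and the grid resolution is arbitrary.

```python
# Coefficients copied from the report, in the order:
# 1 (constant), CompA, CompB, CompA^2, CompA CompB, CompB^2
COEFFS = [-2.273667, 1.298816, 0.6158471, 6.661338e-16, -0.125, -0.0078125]


def predict(a, b):
    """Second-order model prediction for compositions (a, b)."""
    features = [1.0, a, b, a * a, a * b, b * b]
    return sum(c * x for c, x in zip(COEFFS, features))


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi, endpoints included."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def grid_optimum(bounds, n=51, maximize=True):
    """Evaluate the model on an n x n grid and return (best value, best point)."""
    best_value, best_point = None, None
    for a in linspace(bounds[0][0], bounds[0][1], n):
        for b in linspace(bounds[1][0], bounds[1][1], n):
            value = predict(a, b)
            better = (
                best_value is None
                or (maximize and value > best_value)
                or (not maximize and value < best_value)
            )
            if better:
                best_value, best_point = value, (a, b)
    return best_value, best_point


# Bounds from the report: CompA 1.5 to 4.5, CompB 2.0 to 18.
best_value, best_point = grid_optimum([(1.5, 4.5), (2.0, 18.0)], maximize=True)
print(best_point, round(best_value, 4))
```

With these coefficients the maximum sits at a corner of the design space, (1.5, 18), which agrees with the compositions and target value reported above. An optimum on the boundary is a hint that the true optimum may lie outside the explored ranges.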

Residual plots can be displayed with residual_plots(). They offer a few ways to tell if your model is not good enough. Ideally, the histogram of your residuals should look normally distributed, and the data points on the Q-Q plot should fall on the 45 degree line. If this is not the case, the model may need to be revisited.

design.residual_plots()

P values indicate which terms in the model contribute the most to the variance in target values. Terms with p values under 0.05 are considered significant, and those above 0.05 are not. If none are under 0.05, p_values() still reports the term with the lowest p value as the most significant, but that term does not meet the 0.05 threshold and should be treated only as the strongest candidate.

print(design.p_values())

CompB has the most significant P value of 0.41
P values are as follows where 1 is constant:
1: 	0.79
CompA: 	0.78
CompB: 	0.41
CompA^2: 	1.0
CompA CompB: 	0.5
CompB^2: 	0.75
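The summary line in this output can be reproduced from the p value table with a few lines of plain Python. The values below are copied from the fit() report; the function name summarize_p_values and the alpha parameter are hypothetical and only illustrate the 0.05 rule described above.

```python
# P values copied from the fit() report, where "1" is the constant term.
P_VALUES = {
    "1": 0.791929,
    "CompA": 0.775728,
    "CompB": 0.413186,
    "CompA^2": 1.0,
    "CompA CompB": 0.496331,
    "CompB^2": 0.746163,
}


def summarize_p_values(p_values, alpha=0.05):
    """Return the term with the lowest p value and all terms below alpha."""
    lowest = min(p_values, key=p_values.get)
    significant = sorted(name for name, p in p_values.items() if p < alpha)
    return lowest, significant


lowest, significant = summarize_p_values(P_VALUES)
print(f"Lowest p value: {lowest} ({P_VALUES[lowest]:.2f})")
print(f"Terms below 0.05: {significant}")
```

For this example data the list of terms below 0.05 is empty, so CompB is only the strongest candidate rather than a significant effect.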
