This github repo hosts a simple code for designing experiments, building models, and seeing basic statistical information. The package is aimed to be used to direct experimental or computational work when it comes to exploring multidimensional spaces (ex. formulation chemistries). You may want to install the library with the following:
!pip install git+https://github.com/mayabhat/doe_v2
Once installed, you can import the doe module as follows:
from doe import doe
You can create xlsx files that contain the necessary experiments by creating a design object and specifying the experiment's desired directory, the independent variables, and the expected output variables.
design = doe.doe('test-dir', ['CompA', 'CompB'], ['Target1'])
design
test-dir
Report
Directory name with design xls: test-dir
Solutions: ['CompA', 'CompB']
CompA: 0.0 to 0.0
CompB: 0.0 to 0.0
Total Experiments: 0
Experimental Design
CompA | CompB | Target1 |
---|
There should now be a directory called test-dir if there was not already. To make the design, you will need to call the method make_design() as follows. make_design() takes in a design array and a range.
The design array can be manually generated or generated using the pyDOE2 functionalities expressed in this package. Designs can be of the form doe.bbdesign(), doe.ccdesign(), doe.fullfact, or more. See documentation to figure out how to make a design https://pythonhosted.org/pyDOE/.
pyDOE2 outputs designs from -1 to 1. Range takes center points and distance from center points to convert -1 to 1 to scaled values. The form is [[center point, deviation], [center point, deviation]] where the length of the list is the equivalent to the number of independent variables.
make_design() outputs an xlsx file with the experiments that must be completed. Target values should be inputed as the experiments are completed in the target column.
d = doe.ccdesign(2, face = 'cci')
range = [[3, 1.5], [10, 8]]
design.make_design(d, range)
design
test-dir
Report
Directory name with design xls: test-dir
Solutions: ['CompA', 'CompB']
CompA: 1.5 to 4.5
CompB: 2.0 to 1.8e+01
Total Experiments: 16
Experimental Design
CompA | CompB | Target1 | |
---|---|---|---|
0 | 1.93934 | 4.343146 | |
1 | 4.06066 | 4.343146 | |
2 | 1.93934 | 15.656854 | |
3 | 4.06066 | 15.656854 | |
4 | 3.00000 | 10.000000 | |
5 | 3.00000 | 10.000000 | |
6 | 3.00000 | 10.000000 | |
7 | 3.00000 | 10.000000 | |
8 | 1.50000 | 10.000000 | |
9 | 4.50000 | 10.000000 | |
10 | 3.00000 | 2.000000 | |
11 | 3.00000 | 18.000000 | |
12 | 3.00000 | 10.000000 | |
13 | 3.00000 | 10.000000 | |
14 | 3.00000 | 10.000000 | |
15 | 3.00000 | 10.000000 |
To build a model with the obtained results, you will need to call the method fit(). Fit takes in a path to a results xlsx file. This file should be of the same format as the design xlsx with independent and dependent variable names as column headings.
fit() should output a stdout report containing information about the linear model's parameters, p values, r^2, residual plots, and more.
design.fit('results.xlsx')
design
test-dir
Report
Directory name with design xls: test-dir
Solutions: ['CompA', 'CompB']
CompA: 1.5 to 4.5
CompB: 2.0 to 1.8e+01
Total Experiments: 16
Data
CompA | CompB | Target1 | |
---|---|---|---|
0 | 1.93934 | 4.343146 | 1 |
1 | 4.06066 | 4.343146 | 2 |
2 | 1.93934 | 15.656854 | 3 |
3 | 4.06066 | 15.656854 | 1 |
4 | 3.00000 | 10.000000 | 2 |
5 | 3.00000 | 10.000000 | 3 |
6 | 3.00000 | 10.000000 | 1 |
7 | 3.00000 | 10.000000 | 2 |
8 | 1.50000 | 10.000000 | 4 |
9 | 4.50000 | 10.000000 | 5 |
10 | 3.00000 | 2.000000 | 3 |
11 | 3.00000 | 18.000000 | 5 |
12 | 3.00000 | 10.000000 | 6 |
13 | 3.00000 | 10.000000 | 7 |
14 | 3.00000 | 10.000000 | 2 |
15 | 3.00000 | 10.000000 | 3 |
Model R Squared Value:
0.093
Optimum Prediction
{"['Target1']": 4.853553390593341, 'Compositions': array([ 1.5, 18. ])} for ['CompA', 'CompB']
P values less than 0.05 are significant
coefficients | pvalues | |
---|---|---|
features | ||
1 | -2.273667e+00 | 0.791929 |
CompA | 1.298816e+00 | 0.775728 |
CompB | 6.158471e-01 | 0.413186 |
CompA^2 | 6.661338e-16 | 1.000000 |
CompA CompB | -1.250000e-01 | 0.496331 |
CompB^2 | -7.812500e-03 | 0.746163 |
You can make a parity plot to see how the model's predicted vs experimental values are. If the model is good, these values should fall on a 45 degree x = y line.
design.parity()
You can get an optimum prediction by either specifying maximize or minimize with the optimum() function. If you want to maximize the target, indicate maximize = True, if you want to minimize, indicate maximize = False.
design.optimum(maximize = True)
{'Compositions': array([ 1.5, 18. ]), "['Target1']": 4.853553390593341}
Residual plots can be displayed with residual_plots(). There are a few ways to tell if your model is not good enough. Theoretically, the histogram of your residuals should be normally distributed, and the data points on the qq plot should fall on the 45 degree angle line. If this is not the case, the model may need to be revisted.
design.residual_plots()
P values indicate which of the components in the model contribute the greatest to the variance in target values. P values under 0.05 are most significant, and those above 0.05 are not. If none are under 0.05, the next lowest can be considered statistically significant.
print(design.p_values())
CompB has the most significant P value of 0.41
P values are as follows where 1 is constant:
1: 0.79
CompA: 0.78
CompB: 0.41
CompA^2: 1.0
CompA CompB: 0.5
CompB^2: 0.75