Chemical Compounds and The Age of Carpets Using SAS

This project tests carpet samples for chemical compounds to estimate their age using SAS. I use linear regression (PROC REG) in SAS Studio with the dataset from "Age Estimation of Old Carpets Based on Cystine and Cysteic Acid Content."

Getting Started

To begin the project, you'll need to download the following dataset: Age Estimation of Old Carpets Based on Cystine and Cysteic Acid Content.

Source: J. Csapo, Z. Csapo-Kiss, T.G. Martin, S. Folestad, O. Orwar, A. Tivesten, and S. Nemethy (1995). "Age Estimation of Old Carpets Based on Cystine and Cysteic Acid Content," Analytica Chimica Acta, Vol. 300, pp. 313-320.

Prerequisites

You will need to download SAS in order to run the code. More details on how to install SAS on a Windows machine are here.

Creating the Q-Q Plot

Our covariates are four organic compounds: cysteic acid, cystine, methionine, and tyrosine. My first step was to create a Q-Q plot to check whether the residuals are approximately normally distributed.

/* Assumes the libref dg has already been assigned with a LIBNAME statement */
proc reg DATA=dg.carpet plots(only)=QQPlot;
model age=cys_acid cys met tyr;
ods select QQPlot;
run;
quit;

Make sure to import the data file correctly before creating your Q-Q plot. The plot should look like this:

Plotting the Residuals

Although the points stray slightly from the line at both tails, the residuals appear to be approximately normally distributed, which is what we want. To further verify that a linear relationship is appropriate, we can plot the residuals, which are the differences between our observed and predicted values. Ideally, the residual plot looks random: a symmetric cloud of points around zero with no systematic pattern.

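/* Keep only the observations with a known age */
data subset;
set dis2.carpet;
if age=. then delete;
run;

/* Limit later steps to the first 1000 observations */
options obs=1000;

/* Correlations and scatter-plot matrix for the response and the covariates */
proc corr data=subset plots=matrix;
var age cys_acid cys met tyr;
run;

/* Fit the full model. The output data set and variable names below are */
/* illustrative; writing to WORK avoids overwriting the source dis2.carpet. */
proc reg data=subset;
model age=cys_acid cys met tyr;
output out=work.carpet_resid p=predicted r=residual;
run;
quit;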

Please note that there is a typo in line one: the first statement should read 'libname', which associates the chemicals' library with a libref. Sorry!
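As a rough sketch of what assigning the libref and importing the data could look like: the folder path, the file name carpet.csv, and the CSV format below are all assumptions for illustration and are not taken from the project, so substitute your own.

```sas
/* Hypothetical folder: replace with the location of your carpet data */
libname dis2 '/home/your_user/carpet';

/* Hypothetical file name and format (CSV assumed) */
proc import datafile='/home/your_user/carpet/carpet.csv'
    out=dis2.carpet
    dbms=csv
    replace;
    getnames=yes;   /* read variable names from the first row */
run;
```

After this runs, dis2.carpet can be referenced in later DATA and PROC steps for as long as the libref stays assigned in the session.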

Your model should look like the image below:

Findings

If the residuals showed a distinct pattern, outliers, or shape, we could improve the model further. Figure 2 shows the residual plots for each of our four covariates. There doesn’t seem to be a distinct pattern, so we can check off these assumptions: the error terms must have a mean of zero, and the variance of the error terms must be constant.
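One way to get per-covariate residual plots like the ones described here is to request them from PROC REG directly. This is a minimal sketch, assuming the subset data set created earlier and that ODS Graphics is available in your session. It is not necessarily how the original figure was produced.

```sas
ods graphics on;

/* RESIDUALS: panel of residuals against each regressor          */
/* RESIDUALBYPREDICTED: residuals against the predicted values   */
proc reg data=subset plots(only)=(residuals residualbypredicted);
    model age = cys_acid cys met tyr;
run;
quit;
```

A patternless band of points centered on zero in each panel is consistent with the zero-mean and constant-variance assumptions; a funnel or curve would suggest otherwise.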

Final Data Summary

Looking at the data summary, we can see that cysteic acid has the smallest p-value, which makes it the covariate with the strongest evidence of an association with the age of our carpet samples. For each of the four covariates, you would reject the null hypothesis of no linear relationship at alpha = 0.01, since all of the compounds have p-values for the F-test (Pr > F) below 1%.

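/* List the variables and attributes of the data set */
proc contents data = carpet;
run;

/* Simple regression of age on cystine */
proc reg data = carpet;
model age = cys;
run;
quit;

/* Simple regression of age on methionine */
proc reg data = carpet;
model age = met;
run;
quit;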
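The summary discusses all four covariates, while the code above fits cystine and methionine. The remaining two could be fit the same way; this sketch puts both in one PROC REG call using labeled MODEL statements. The labels cys_acid_only and tyr_only are names chosen here for illustration.

```sas
/* One PROC REG step can hold several MODEL statements;        */
/* the label before the colon identifies each model in output. */
proc reg data = carpet;
    cys_acid_only: model age = cys_acid;
    tyr_only:      model age = tyr;
run;
quit;
```

The p-value for each single-covariate model appears in the Analysis of Variance table under Pr > F.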

Your output should resemble the PROC REG results for our regression model. Make sure to accompany the PROC REG statement with a MODEL statement to specify the regression models.

Checking With A Log Transformation

Our adjusted coefficient of determination is approximately 0.9946, implying that about 99.46% of the variation in age can be explained by our linear model in cysteic acid, cystine, methionine, and tyrosine. Though it isn’t quite 1, the regression predictions fit the data almost perfectly, so we’re on the right track. I also tried a logarithmic transformation of age but didn’t really see an improvement (I expected a tighter Q-Q plot but instead got Figure 3). For this reason, I would suggest sticking with the first model, since it has the coefficient of determination closest to one and better results overall.

/* Log-transform age and two of the covariates (LOG is the natural logarithm) */
data work.transform;
set WORK.IMPORT;
log_age=log(age);
log_cys_acid=log(cys_acid);
log_cys=log(cys);
run;

The Q-Q plot for the log-transformed age:

We can assess the quality of the fit with the 'Fit Diagnostics' panel that PROC REG produces.
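As a sketch of how the log-transformed fit could be checked: the exact model behind Figure 3 is not shown, so the specification below (log_age regressed on the four untransformed covariates) is an assumption. It requests both the Q-Q plot and the diagnostics panel from the work.transform data set.

```sas
ods graphics on;

/* Assumed model: log of age on the original covariates */
proc reg data=work.transform plots(only)=(qqplot diagnostics);
    model log_age = cys_acid cys met tyr;
run;
quit;
```

Comparing this Q-Q plot and adjusted R-square with those of the untransformed model is one way to judge whether the transformation helps.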

Thank you for reading!

AneesahG/-Chemical-Compounds-Age-SAS
