
Analysis and prediction of defects in software programs using R.


Ashok314/Rnotebook


Rnotebook

Open the file "metric_notebook.Rmd" in RStudio.

The report is as follows:


title: "R Notebook- Analysis and prediction on NASA KC1"
output:
  html_notebook: default
bibliography: bibliography.bib

Resource: NASA KC1

Setup: Rtools and RStudio

Analysis

Finding 1

Load data

library("tidyverse")  # attaches readr (read_csv), dplyr (filter, %>%) and ggplot2
data <- read_csv("KC1_product_module_metrics.csv")
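Before the cleaning step it can be worth checking that the file actually contains the columns the rest of the notebook uses. A minimal sketch, assuming base R's `read.csv()` in place of `read_csv()` and a small temporary CSV with made-up values (the file and its contents are hypothetical, not the KC1 data):

```r
# Write a tiny hypothetical CSV so the sketch is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "CYCLOMATIC_COMPLEXITY,ERROR_COUNT,LOC_EXECUTABLE",
  "1,0,5",
  "4,,20"  # second module has an empty ERROR_COUNT field
), tmp)

kc1 <- read.csv(tmp)

# Columns the later analysis relies on (illustrative subset)
required <- c("CYCLOMATIC_COMPLEXITY", "ERROR_COUNT", "LOC_EXECUTABLE")
missing_cols <- setdiff(required, names(kc1))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Blank numeric fields are read as NA, which the cleaning step then handles
print(kc1)
```

With the real file, `required` would list every metric column used later, such as LOC_BLANK and LOC_COMMENT.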

View the data

data

Cleaning data

data[is.na(data)] <- 0  # replace every missing value with 0
data
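Replacing every `NA` with 0 treats a missing metric as a genuine zero, so it can help to see how many values are affected before overwriting them. A sketch on a hypothetical toy data frame (`toy` is illustrative only, not the KC1 data):

```r
# Hypothetical toy data standing in for the KC1 metrics table
toy <- data.frame(
  CYCLOMATIC_COMPLEXITY = c(1, NA, 4),
  ERROR_COUNT           = c(0, 2, NA)
)

# Number of missing values per column, before cleaning
na_counts <- colSums(is.na(toy))
print(na_counts)

# Same cleaning rule as the notebook: every NA becomes 0
toy[is.na(toy)] <- 0
print(toy)
```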

Structure and data types of the data

str(data)

Summary of the data

summary() reports descriptive statistics for each column; for numeric columns these are the minimum, first quartile, median, mean, third quartile, and maximum.

summary(data)
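For a numeric column, the six values printed by `summary()` can be reproduced with base functions, which makes clear what each entry of the output means. A sketch on a made-up vector (the values are hypothetical):

```r
# Hypothetical cyclomatic complexity values
x <- c(1, 2, 2, 3, 10)

s <- summary(x)
print(s)

# The same statistics computed one by one
manual <- c(
  Min    = min(x),
  Q1     = unname(quantile(x, 0.25)),
  Median = median(x),
  Mean   = mean(x),
  Q3     = unname(quantile(x, 0.75)),
  Max    = max(x)
)
print(manual)
```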

Plotting Cyclomatic Complexity

ggplot(data = data) + 
  geom_bar(mapping = aes(x = CYCLOMATIC_COMPLEXITY))

Relation between Cyclomatic Complexity and Error Count

boxplot(CYCLOMATIC_COMPLEXITY~ERROR_COUNT, data)
summary(data$CYCLOMATIC_COMPLEXITY)
summary(data$ERROR_COUNT)

Analysis by filtering on the CC value

for CC <= 45

cc_restricted <- data %>% 
  filter(CYCLOMATIC_COMPLEXITY <= 45)

Plot

boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted, ylab = "CC_45")
summary(cc_restricted$CYCLOMATIC_COMPLEXITY)
summary(cc_restricted$ERROR_COUNT)

for CC < 16

cc_restricted <- data %>% 
  filter(CYCLOMATIC_COMPLEXITY < 16)

Plot

boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted, ylab = "CC_lt_16")
summary(cc_restricted$CYCLOMATIC_COMPLEXITY)
summary(cc_restricted$ERROR_COUNT)

for CC < 8

cc_restricted <- data %>% 
  filter(CYCLOMATIC_COMPLEXITY < 8)

Plot

boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted, ylab = "CC_lt_8")
summary(cc_restricted$CYCLOMATIC_COMPLEXITY)
summary(cc_restricted$ERROR_COUNT)

for CC < 4

cc_restricted <- data %>% 
  filter(CYCLOMATIC_COMPLEXITY < 4)

Plot

boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted, ylab = "CC_lt_4")

summary(cc_restricted$CYCLOMATIC_COMPLEXITY)
summary(cc_restricted$ERROR_COUNT)

for CC < 2

cc_restricted <- data %>% 
  filter(CYCLOMATIC_COMPLEXITY < 2)

Plot

boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted, ylab = "CC_lt_2")
summary(cc_restricted$CYCLOMATIC_COMPLEXITY)
summary(cc_restricted$ERROR_COUNT)
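The five filtered runs above differ only in the threshold, so they can be written as one loop. A sketch in base R (bracket subsetting instead of dplyr's `filter()`) on a hypothetical toy data frame; note that it uses `<` for every threshold, whereas the first run above used `<= 45`:

```r
# Hypothetical toy data; in the notebook this would be `data`
toy <- data.frame(
  CYCLOMATIC_COMPLEXITY = c(1, 1, 2, 3, 5, 7, 12, 20, 45, 60),
  ERROR_COUNT           = c(0, 0, 0, 1, 0, 2, 1, 3, 4, 6)
)

thresholds <- c(45, 16, 8, 4, 2)

for (cc_max in thresholds) {
  # Keep only modules whose CC is strictly below the threshold
  cc_restricted <- toy[toy$CYCLOMATIC_COMPLEXITY < cc_max, ]
  boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted,
          ylab = paste0("CC_lt_", cc_max))
  print(summary(cc_restricted$CYCLOMATIC_COMPLEXITY))
  print(summary(cc_restricted$ERROR_COUNT))
}

# Number of modules kept at each threshold
n_kept <- sapply(thresholds, function(cc_max) sum(toy$CYCLOMATIC_COMPLEXITY < cc_max))
print(n_kept)
```

Printing `n_kept` alongside the plots is a quick way to see how few modules remain at the smallest thresholds, which affects how much each boxplot can be trusted.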

Result of Analysis

The analysis above shows the relation between the cyclomatic complexity of a module and the error count of the same module: the plots and their summaries show how the error count changes as the range of cyclomatic complexity is narrowed.
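The boxplots show the relation visually; a rank correlation is one way to summarise it in a single number. This is not part of the original notebook: a sketch with made-up values, using Spearman's method because error counts are small, skewed integers with many ties:

```r
# Hypothetical module metrics
cc  <- c(1, 2, 3, 4, 5)
err <- c(0, 0, 1, 2, 5)

# Spearman's rho: Pearson correlation of the ranks (ties get average ranks)
rho <- cor(cc, err, method = "spearman")
print(rho)
```

On the notebook's data the equivalent call would be `cor(data$CYCLOMATIC_COMPLEXITY, data$ERROR_COUNT, method = "spearman")`.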

Finding 2

Relation of LOC_BLANK, LOC_EXECUTABLE, LOC_CODE_AND_COMMENT, and LOC_COMMENT with ERROR_COUNT

The procedure is the same as in Finding 1.

Plot each variable against ERROR_COUNT and analyze the summaries.

Sample Run

result_1 <- boxplot(LOC_EXECUTABLE ~ ERROR_COUNT, data = data, ylab = "loc executable")
summary(data$LOC_EXECUTABLE)
summary(data$ERROR_COUNT)
# Repeat with modules that have fewer than 100 executable lines
data_1 <- data %>% 
  filter(LOC_EXECUTABLE < 100)
result_1 <- boxplot(LOC_EXECUTABLE ~ ERROR_COUNT, data = data_1, ylab = "loc executable")
summary(data_1$LOC_EXECUTABLE)
summary(data_1$ERROR_COUNT)

Repeat the same steps for the remaining variables: LOC_BLANK, LOC_CODE_AND_COMMENT, and LOC_COMMENT.
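Rather than copying the sample run once per column, the formula can be built from each column name. A sketch using base R's `reformulate()` on a hypothetical toy data frame (the values are made up; only the column names come from the notebook):

```r
# Hypothetical toy data with the four LOC columns and ERROR_COUNT
toy <- data.frame(
  LOC_BLANK            = c(0, 2, 5, 8, 3),
  LOC_EXECUTABLE       = c(4, 15, 40, 120, 22),
  LOC_CODE_AND_COMMENT = c(0, 1, 0, 3, 1),
  LOC_COMMENT          = c(0, 3, 6, 20, 2),
  ERROR_COUNT          = c(0, 0, 1, 3, 1)
)

loc_vars <- c("LOC_BLANK", "LOC_EXECUTABLE", "LOC_CODE_AND_COMMENT", "LOC_COMMENT")

for (v in loc_vars) {
  # Build the formula  <v> ~ ERROR_COUNT  from the column name
  f <- reformulate("ERROR_COUNT", response = v)
  boxplot(f, data = toy, ylab = v)
  print(summary(toy[[v]]))
}
```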

Result of Finding 2

These variables show a weaker relation with ERROR_COUNT than cyclomatic complexity and design complexity do, and they have only a small effect on the defect count of a module.

Prediction

REF: https://www.rdocumentation.org/packages/car/versions/3.0-8/topics/Predict

Linear Model


model_lm <- lm(ERROR_COUNT~CYCLOMATIC_COMPLEXITY, data)

summary(model_lm)

New predicted values and summary

data$pred <- predict(model_lm, newdata = data)  # predicted ERROR_COUNT for each module
str(data$pred)

Summary of the predicted values

summary(data$pred)

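Load the package used to evaluate the model. Its regr.eval() metrics include "mae" (mean absolute error), "mse" (mean squared error), and "rmse" (root mean squared error). The observed values passed to it must be ERROR_COUNT, the variable the model predicts.

library(DMwR)
regr.eval(data$ERROR_COUNT, data$pred)  # observed vs. predicted error counts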
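The same three metrics can be computed directly in base R, which avoids the package dependency and makes explicit which vectors are compared. A sketch; `regression_metrics()` is a hypothetical helper name, not part of DMwR, and the input values are made up:

```r
# Hypothetical helper: mean absolute, mean squared and root mean squared error
regression_metrics <- function(actual, predicted) {
  err <- actual - predicted
  mse <- mean(err^2)
  c(mae = mean(abs(err)), mse = mse, rmse = sqrt(mse))
}

# Hypothetical observed ERROR_COUNT values and model predictions
actual    <- c(0, 1, 2, 0, 3)
predicted <- c(0.5, 1, 1.5, 0.5, 2)

metrics <- regression_metrics(actual, predicted)
print(metrics)
```

In the notebook the call would be `regression_metrics(data$ERROR_COUNT, data$pred)`: the first argument is the response the model predicts, not the predictor CYCLOMATIC_COMPLEXITY.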

plot(model_lm$residuals)

Prediction for new values of CC

predict(model_lm,newdata = data.frame(CYCLOMATIC_COMPLEXITY=c(5,30,45)))
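With a single predictor, `predict()` evaluates the fitted line: the intercept plus the slope times the new CC value. A sketch on hypothetical toy data that checks this by hand (`model_toy` and the data are illustrative, not the KC1 model):

```r
# Hypothetical toy data; the notebook fits the model on `data`
toy <- data.frame(
  CYCLOMATIC_COMPLEXITY = c(1, 2, 3, 5, 8, 13),
  ERROR_COUNT           = c(0, 0, 1, 1, 2, 4)
)
model_toy <- lm(ERROR_COUNT ~ CYCLOMATIC_COMPLEXITY, data = toy)

new_cc <- data.frame(CYCLOMATIC_COMPLEXITY = c(5, 30, 45))
pred <- predict(model_toy, newdata = new_cc)

# A simple linear model predicts intercept + slope * CC
b <- coef(model_toy)
manual <- b[["(Intercept)"]] + b[["CYCLOMATIC_COMPLEXITY"]] * new_cc$CYCLOMATIC_COMPLEXITY
print(cbind(pred, manual))
```

Here CC values of 30 and 45 lie outside the toy data's range, so those predictions are extrapolations; the same caution applies to any CC value beyond the range seen when fitting.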

Results

The predictions have low accuracy. Changing the model specification passed to lm(), such as its formula, may improve it.
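One way to alter the model is to add predictors to the formula passed to `lm()` and compare fit statistics. A sketch, assuming LOC_EXECUTABLE as the extra predictor purely for illustration and using made-up data, so the numbers it prints say nothing about KC1:

```r
# Hypothetical toy data; column names follow the KC1 metrics used above
toy <- data.frame(
  CYCLOMATIC_COMPLEXITY = c(1, 2, 3, 5, 8, 13, 4, 6),
  LOC_EXECUTABLE        = c(5, 12, 20, 25, 60, 90, 15, 40),
  ERROR_COUNT           = c(0, 0, 1, 1, 2, 4, 0, 2)
)

m1 <- lm(ERROR_COUNT ~ CYCLOMATIC_COMPLEXITY, data = toy)
m2 <- lm(ERROR_COUNT ~ CYCLOMATIC_COMPLEXITY + LOC_EXECUTABLE, data = toy)

# Plain R^2 never decreases when a predictor is added,
# so adjusted R^2 is the fairer comparison
fit <- data.frame(
  model  = c("CC only", "CC + LOC_EXECUTABLE"),
  r2     = c(summary(m1)$r.squared, summary(m2)$r.squared),
  adj_r2 = c(summary(m1)$adj.r.squared, summary(m2)$adj.r.squared)
)
print(fit)
```

Both statistics are computed on the data used for fitting; comparing RMSE on modules held out from fitting would be a stronger check of whether accuracy really improved.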

References

http://promise.site.uottawa.ca/SERepository/datasets-page.html
