Rnotebook

Open the file "metric_notebook.Rmd" in RStudio.

Report as follows:

title: "R Notebook- Analysis and prediction on NASA KC1"
output:
  html_notebook: default
bibliography: bibliography.bib

Resource: NASA KC1

Setup: Rtools and RStudio

Analysis

Finding 1

Load data

library("tidyverse")
data <- read_csv("KC1_product_module_metrics.csv")

View data

data

Cleaning data

data[is.na(data)] <- 0
data
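
As a minimal sketch of what the cleaning step does, the snippet below applies the same assignment to a small made-up data frame (`toy` is hypothetical and not part of the KC1 data). Replacing NA with 0 treats a missing measurement as a real zero, which is an assumption worth checking against the dataset.

```r
# Hypothetical toy data frame standing in for the KC1 metrics
toy <- data.frame(
  CYCLOMATIC_COMPLEXITY = c(3, NA, 12),
  ERROR_COUNT           = c(0, 1, NA)
)

# Count missing values per column before cleaning
na_before <- colSums(is.na(toy))

# Same assignment as in the cleaning step: every NA becomes 0
toy[is.na(toy)] <- 0

# Count missing values per column after cleaning
na_after <- colSums(is.na(toy))

print(na_before)
print(na_after)
```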

Structure and data types of the data

str(data)

Summary of the data

summary() reports descriptive statistics for each column, such as the mean, median, minimum, and maximum.

summary(data)

Plotting Cyclomatic Complexity

ggplot(data = data) + 
  geom_bar(mapping = aes(x = CYCLOMATIC_COMPLEXITY))

Relation between Cyclomatic Complexity and Error Count

boxplot(CYCLOMATIC_COMPLEXITY~ERROR_COUNT, data)

summary(data$CYCLOMATIC_COMPLEXITY)
summary(data$ERROR_COUNT)

Analysis by filtering the CC value

For CC <= 45

cc_restricted <- data %>%
  filter(CYCLOMATIC_COMPLEXITY <= 45)

Plot

boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted, ylab = "CC_45")

summary(cc_restricted$CYCLOMATIC_COMPLEXITY)
summary(cc_restricted$ERROR_COUNT)

For CC < 16

cc_restricted <- data %>%
  filter(CYCLOMATIC_COMPLEXITY < 16)

Plot

boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted, ylab = "CC_lt_16")

summary(cc_restricted$CYCLOMATIC_COMPLEXITY)
summary(cc_restricted$ERROR_COUNT)

For CC < 8

cc_restricted <- data %>%
  filter(CYCLOMATIC_COMPLEXITY < 8)

Plot

boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted, ylab = "CC_lt_8")

summary(cc_restricted$CYCLOMATIC_COMPLEXITY)
summary(cc_restricted$ERROR_COUNT)

For CC < 4

cc_restricted <- data %>%
  filter(CYCLOMATIC_COMPLEXITY < 4)

Plot

boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted, ylab = "CC_lt_4")

summary(cc_restricted$CYCLOMATIC_COMPLEXITY)
summary(cc_restricted$ERROR_COUNT)

For CC < 2

cc_restricted <- data %>%
  filter(CYCLOMATIC_COMPLEXITY < 2)

Plot

boxplot(CYCLOMATIC_COMPLEXITY ~ ERROR_COUNT, data = cc_restricted, ylab = "CC_lt_2")

summary(cc_restricted$CYCLOMATIC_COMPLEXITY)
summary(cc_restricted$ERROR_COUNT)
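
The repeated filter-and-summarise steps above can also be sketched as a single loop over the thresholds. This is only an illustration on made-up data: `toy`, `thresholds`, and `per_threshold` are hypothetical names, and the sketch uses a strict bound throughout, whereas the first filter above uses `<= 45`.

```r
# Hypothetical toy data standing in for the KC1 metrics
toy <- data.frame(
  CYCLOMATIC_COMPLEXITY = c(1, 1, 3, 5, 7, 10, 15, 20, 44, 60),
  ERROR_COUNT           = c(0, 0, 0, 1, 0, 1, 2, 1, 3, 5)
)

# Upper bounds used in the sections above (all treated as strict here)
thresholds <- c(45, 16, 8, 4, 2)

# For each bound, keep the modules below it and record size and mean values
per_threshold <- do.call(rbind, lapply(thresholds, function(limit) {
  subset_cc <- toy[toy$CYCLOMATIC_COMPLEXITY < limit, ]
  data.frame(
    limit       = limit,
    modules     = nrow(subset_cc),
    mean_cc     = mean(subset_cc$CYCLOMATIC_COMPLEXITY),
    mean_errors = mean(subset_cc$ERROR_COUNT)
  )
}))

print(per_threshold)
```

Collecting the results in one table makes it easier to compare the subsets side by side than reading five separate summary() outputs.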

Result of Analysis

The analysis above shows the relation between the Cyclomatic Complexity of a module and the error count in the same module. The plots and their summaries show how the error count changes as the cyclomatic complexity varies.

Finding 2

Relation of LOC_BLANK, LOC_EXECUTABLE, LOC_CODE_AND_COMMENT, LOC_COMMENT with ERROR_COUNT

The procedure is the same as in Finding 1.

Plot each variable against ERROR_COUNT, then analyse the summary.

Sample Run

# The y-axis shows LOC_EXECUTABLE, so the label names that variable
result_1 <- boxplot(LOC_EXECUTABLE ~ ERROR_COUNT, data = data, ylab = "loc executable")

summary(data$LOC_EXECUTABLE)
summary(data$ERROR_COUNT)

data_1 <- data %>%
  filter(LOC_EXECUTABLE < 100)
result_1 <- boxplot(LOC_EXECUTABLE ~ ERROR_COUNT, data = data_1, ylab = "loc executable")

summary(data_1$LOC_EXECUTABLE)
summary(data_1$ERROR_COUNT)

Similarly for the rest of the variables: LOC_BLANK, LOC_CODE_AND_COMMENT, and LOC_COMMENT.
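
One way to repeat the plot for every LOC variable without copying the code is sketched below. It runs on made-up data (`toy`, `loc_vars`, and `results` are hypothetical names) and builds each formula with `reformulate()` from base R.

```r
# Hypothetical toy data; column names follow the variables listed above
toy <- data.frame(
  LOC_BLANK            = c(0, 2, 5, 1, 8, 3),
  LOC_EXECUTABLE       = c(4, 20, 55, 9, 120, 30),
  LOC_CODE_AND_COMMENT = c(0, 1, 0, 0, 3, 1),
  LOC_COMMENT          = c(1, 4, 10, 0, 25, 6),
  ERROR_COUNT          = c(0, 0, 1, 0, 2, 1)
)

loc_vars <- c("LOC_BLANK", "LOC_EXECUTABLE", "LOC_CODE_AND_COMMENT", "LOC_COMMENT")

# Draw one boxplot per variable and keep the statistics boxplot() returns
results <- lapply(loc_vars, function(v) {
  # reformulate() builds the formula, e.g. LOC_BLANK ~ ERROR_COUNT
  boxplot(reformulate("ERROR_COUNT", response = v), data = toy, ylab = v)
})
names(results) <- loc_vars

# Summaries of each variable, as in the sample run
print(lapply(toy[loc_vars], summary))
```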

Result of finding 2

These variables are less strongly related to ERROR_COUNT than the Cyclomatic Complexity and Design Complexity values. They have only a small effect on the defect count of a module.

Prediction

REF: https://www.rdocumentation.org/packages/car/versions/3.0-8/topics/Predict

Linear Model


model_lm <- lm(ERROR_COUNT~CYCLOMATIC_COMPLEXITY, data)

summary(model_lm)

Predicted values and summary

data$pred <- predict(model_lm, newdata = data)  # These are the predicted values
str(data$pred)

Summary of the predicted values

summary(data$pred)

Install package for evaluation of the model

# "mae": mean absolute error
# "mse": mean squared error
# "rmse": root mean squared error
library(DMwR)
# Compare the observed ERROR_COUNT (not CYCLOMATIC_COMPLEXITY) with the predictions
regr.eval(data$ERROR_COUNT, data$pred)
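
For reference, the three error measures named above can be computed directly in base R. The sketch below uses made-up vectors (`observed` and `predicted` are hypothetical) rather than the KC1 data, and is not a replacement for `regr.eval`.

```r
# Hypothetical observed and predicted error counts
observed  <- c(0, 1, 0, 2, 3)
predicted <- c(0.5, 0.5, 1.0, 1.5, 2.0)

# Differences between observed and predicted values
errors <- observed - predicted

mae  <- mean(abs(errors))  # mean absolute error
mse  <- mean(errors^2)     # mean squared error
rmse <- sqrt(mse)          # root mean squared error

print(c(mae = mae, mse = mse, rmse = rmse))
```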

plot(model_lm$residuals)

Prediction on new value of CC

predict(model_lm,newdata = data.frame(CYCLOMATIC_COMPLEXITY=c(5,30,45)))
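
To illustrate what `predict` returns for new CC values, the sketch below fits a linear model to made-up data that lies exactly on a line, then reproduces the predictions by hand from the fitted coefficients. `toy`, `toy_lm`, and `new_cc` are hypothetical names, and the data is not taken from KC1.

```r
# Hypothetical toy data lying exactly on the line ERROR_COUNT = 0.1 * CC + 0.5
toy <- data.frame(CYCLOMATIC_COMPLEXITY = c(1, 5, 10, 20, 40))
toy$ERROR_COUNT <- 0.1 * toy$CYCLOMATIC_COMPLEXITY + 0.5

toy_lm <- lm(ERROR_COUNT ~ CYCLOMATIC_COMPLEXITY, data = toy)

new_cc <- data.frame(CYCLOMATIC_COMPLEXITY = c(5, 30, 45))

# predict() evaluates intercept + slope * CC for each new row
by_predict <- predict(toy_lm, newdata = new_cc)

# The same values computed by hand from the fitted coefficients
coefs   <- coef(toy_lm)
by_hand <- coefs[["(Intercept)"]] +
  coefs[["CYCLOMATIC_COMPLEXITY"]] * new_cc$CYCLOMATIC_COMPLEXITY

print(by_predict)
print(by_hand)
```

Note that the last value, 45, lies outside the range of the toy data, so that prediction is an extrapolation.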

Results

The linear model predicts ERROR_COUNT with low accuracy. Changing the parameters passed to lm may improve the accuracy.
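
One possible change, shown here only as a sketch on made-up data, is to add a second predictor to the model formula. The choice of LOC_EXECUTABLE as the extra predictor and the names `toy`, `single`, `multi`, and `rmse` are assumptions for illustration, not part of the original analysis.

```r
# Hypothetical toy data; LOC_EXECUTABLE is used only as an example second predictor
toy <- data.frame(
  CYCLOMATIC_COMPLEXITY = c(1, 2, 4, 6, 9, 12, 18, 25),
  LOC_EXECUTABLE        = c(5, 12, 30, 22, 60, 45, 110, 90),
  ERROR_COUNT           = c(0, 0, 1, 0, 2, 1, 4, 3)
)

# Root mean squared error of a fitted model on its own training data
rmse <- function(model) sqrt(mean(residuals(model)^2))

single <- lm(ERROR_COUNT ~ CYCLOMATIC_COMPLEXITY, data = toy)
multi  <- lm(ERROR_COUNT ~ CYCLOMATIC_COMPLEXITY + LOC_EXECUTABLE, data = toy)

# In-sample RMSE of the larger model is never higher than that of the nested smaller one
print(c(single = rmse(single), multi = rmse(multi)))

# Adjusted R-squared penalises the extra predictor
print(c(single = summary(single)$adj.r.squared,
        multi  = summary(multi)$adj.r.squared))
```

Because the in-sample error cannot get worse when a predictor is added, a lower RMSE here is not by itself evidence of better prediction; a held-out test set would be needed to judge that.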

References

http://promise.site.uottawa.ca/SERepository/datasets-page.html

Ashok314/Rnotebook
