
[cocom] Add repository level analysis option for CoCom backend #38

Closed
wants to merge 2 commits

Conversation

@inishchith (Contributor) commented Jun 13, 2019

@valeriocos As discussed, this adds repository-level analysis as an option to the CoCom backend.

For now, this is being added to make repository-level analysis and visualization in Kibana a bit easier to carry out.
We can discuss the limitations it might cause in the future, as well as some edge cases I might have missed in the implementation.

Results of comparison can be found in #36

Note: some of the work/commits here are a continuation of #37

Edit: This is just a rough idea (implementation)

Some things that need to be worked on:

  • Tests
  • Docstrings

There can be two scenarios: a deleted file and a renamed file (or one
that has changed directory). These changes handle both cases.
Alter test data to accommodate cases for deleted-file results

Signed-off-by: inishchith <inishchith@gmail.com>
@inishchith changed the title from "Repository level cocom" to "[cocom] Add repository level analysis option for CoCom backend" on Jun 13, 2019
@valeriocos (Member) left a comment

Thank you @inishchith for sharing the idea of an analysis at repository level. Overall it looks interesting; however, I have some doubts about some of the implementation choices.

Instead of passing a flag to trigger the analysis at repo level, wouldn't it be better to introduce a new category that performs this kind of analysis? What do you think?

The variable self.history collects the cocom analysis for each file. The solution seems to work well when performing the initial fetch, but I'm not sure it would work for incremental fetches. Could you explain the logic in case I'm missing something?
A different approach could be to execute the analyzer over the full repo for each commit and then sum up the results obtained. Maybe the -t param could speed up the analysis. What do you think?

@inishchith (Contributor, Author)

@valeriocos Sorry for the delayed response.

Instead of passing a flag to trigger the analysis at repo level, wouldn't it be better to introduce a new category that performs this kind of analysis? What do you think?

  • So far we have used category as a convention for performing analysis with different analyzers, and repository-level analysis makes sense for other backends as well, so I thought it would be better to provide an option (flag) instead of a category. Let me know what you think.

The variable self.history collects the cocom analysis for each file. The solution seems to work well when performing the initial fetch, but I'm not sure it would work for incremental fetches. Could you explain the logic in case I'm missing something?

  • Yes, you're correct. I had thought about this midway through but missed addressing it in the process. This is a problem we should look to solve. I'll think about how the current implementation can be refactored to support incremental fetches. Thanks for raising it 👍

A different approach could be to execute the analyzer over the full repo for each commit and then sum up the results obtained. Maybe the -t param could speed up the analysis. What do you think?

  • Yes, this was one of the initial approaches we discussed in [cocom] Evaluating results with repository level analysis #36. I thought the current idea could do better in terms of time and redundancy in the calculation (it's like cutting out the history part and then thinking of a workable solution).
    But yes, agreed: we face a trade-off between time (full repo on every commit) and memory (maintaining self.history). As you pointed out, if we can get incremental fetches working on the current implementation, that would be great; otherwise we have to evaluate lizard's full-repo analysis with the working-threads option (see the sketch after this comment).

I'll update you once I've worked on the evaluation. Let me know what you think!
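
To make the trade-off concrete, here is a rough sketch (under assumptions, not the actual patch) of the full-repo-per-commit approach: walk a checked-out commit, run lizard.analyze_file on every source file and sum the results. The extension filter and the helper name analyze_checkout are illustrative; the real speed-up discussed here would come from lizard's -t (working threads) option.

```python
import os

import lizard  # the same analyzer the CoCom backend relies on

# Assumption: only these extensions are analyzed; the real backend decides
# this differently. `analyze_checkout` is an illustrative helper name.
CODE_EXTENSIONS = (".py", ".java", ".js", ".c", ".cpp")


def analyze_checkout(repo_path):
    """Sum lizard's per-file metrics over one checked-out commit."""
    totals = {"loc": 0, "ccn": 0, "num_funs": 0, "tokens": 0}
    for root, _, files in os.walk(repo_path):
        if ".git" in root.split(os.sep):
            continue  # skip Git internals
        for name in files:
            if not name.endswith(CODE_EXTENSIONS):
                continue
            result = lizard.analyze_file(os.path.join(root, name))
            totals["loc"] += result.nloc
            totals["tokens"] += result.token_count
            totals["num_funs"] += len(result.function_list)
            totals["ccn"] += sum(f.cyclomatic_complexity
                                 for f in result.function_list)
    return totals
```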

@valeriocos (Member)

No worries @inishchith , thank you for answering.

So far we have used category as a convention for performing analysis with different analyzers, and repository-level analysis makes sense for other backends as well, so I thought it would be better to provide an option (flag) instead of a category. Let me know what you think.

If the data has a different shape it's probably better to use a different category. However, we can proceed without adding a new category, and change the code afterwards if needed :)
I'm not sure about the definition of the variable self.repository_level in __init__. That information seems to be more related to how the fetch is performed than to how the class is initialized. Could you explain why repository_level has been defined as an instance attribute?
Thanks
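
For illustration only, a minimal sketch of the shape suggested here (not Graal's actual CoCom API): since repository_level describes how a particular fetch is performed, it is passed to fetch() rather than stored on the instance in __init__. The class body and signatures below are assumptions.

```python
# Illustrative sketch only; not Graal's actual CoCom class or signatures.
class CoCom:
    def __init__(self, uri, git_path):
        # Only state describing the backend itself lives on the instance.
        self.uri = uri
        self.git_path = git_path

    def fetch(self, repository_level=False):
        """Fetch analysis items; repository_level only affects this run."""
        for commit in self._commits():
            yield self._analyze(commit, repository_level)

    def _commits(self):
        raise NotImplementedError  # placeholder for the real Git traversal

    def _analyze(self, commit, repository_level):
        raise NotImplementedError  # placeholder for the lizard-based analysis
```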

@inishchith (Contributor, Author) commented Jun 15, 2019

If the data has a different shape it's probably better to use a different category. However, we can proceed without adding a new category, and change the code afterwards if needed :)

  • Yes, agreed. Sure!

I'm not sure about the definition of the variable self.repository_level in __init__. That information seems to be more related to how the fetch is performed than to how the class is initialized. Could you explain why repository_level has been defined as an instance attribute?

  • It shouldn't be an instance attribute. I was working on a different approach (for adding an option) and I'm not sure when I added this line.
    Sorry about that, and thanks for pointing it out :)

Edit:

  1. Made the correction.
  2. @valeriocos Could you review [analyzer] Fix results for deleted files for CoCom backend #37 before this one, as it's of higher priority? This PR still requires some work :)

The idea is to produce repository-level analysis for the CoCom backend
using a history of files and picking the latest values of the available
results

Signed-off-by: inishchith <inishchith@gmail.com>
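
A minimal sketch (not the actual implementation) of the history-of-files idea described in this commit message: keep the latest result for every file seen so far, drop entries for deleted files (under the representation assumed here, a rename appears as a delete of the old path plus an add of the new path), and sum the surviving entries into repository-level totals. The names file_history, update_history and repository_metrics, and the (path, action, analysis) commit representation, are illustrative.

```python
file_history = {}  # file path -> latest per-file analysis (a dict of metrics)


def update_history(commit_files):
    """Apply one commit's file-level results to the history."""
    for path, action, analysis in commit_files:
        if action == "D":
            # Deleted file: its metrics must no longer count towards totals.
            file_history.pop(path, None)
        else:
            # Added/modified file: keep only the latest available values.
            file_history[path] = analysis


def repository_metrics():
    """Sum the latest per-file values into repository-level totals."""
    totals = {"loc": 0, "ccn": 0, "num_funs": 0, "tokens": 0}
    for analysis in file_history.values():
        for key in totals:
            totals[key] += analysis.get(key, 0)
    return totals
```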
@inishchith force-pushed the repository_level_cocom branch from 1d04905 to 159cf41 on June 15, 2019, 14:43
@inishchith (Contributor, Author)

@valeriocos I thought over the incremental-fetches issue with the current implementation. I figured out that we would have to execute an initial run over the entire repository every time, which would again increase the execution time for large repositories, as addressed in #36.

As you pointed out above (about lizard's worker threads for repository-level analysis):

A different approach could be to execute the analyzer over the full repo for each commit and then sum up the results obtained. Maybe the -t param could speed up the analysis ...

Here, incremental fetches wouldn't be affected and would work just as before. I have implemented a version locally for evaluation purposes (results below).

| Repository | Number of commits | File level* | Repository level |
| --- | --- | --- | --- |
| chaoss/grimoirelab-perceval | 1387 | 23.65 min | 27.97 min |
| chaoss/grimoirelab-sirmordred | 869 | 9.69 min | 4.27 min |
| chaoss/grimoirelab-graal | 169 | 1.73 min | 0.90 min |

(There's a divergence because Perceval has many more files than the other repositories under consideration.)
With the help of lizard's repository-level analysis, I was able to create two of the metric visualizations (overall LOC and CCN, among other attributes). Ref. chaoss/metrics#139.
I'm now working on visualizing the most complex files in a repository.

Let me know what you think. Thanks :)

@inishchith (Contributor, Author)

Closing in reference to #39

@inishchith inishchith closed this Jun 20, 2019