
[cocom] Add repository level analysis option for CoCom backend #38

Closed
wants to merge 2 commits

Conversation

@inishchith (Contributor) commented Jun 13, 2019

@valeriocos As discussed, this adds repository-level analysis as an option to the CoCom backend.

For now, this is being added to make repository-level analysis and visualization in Kibana a bit easier to carry out.
We can discuss the limitations it might cause in the future, as well as some edge cases I might have missed in the implementation.

Results of comparison can be found in #36

Note: some of the work/commits here are a continuation of #37

Edit: This is just a rough idea (implementation)

Some things that need to be worked on:

  • Tests
  • Docstrings

There can be two scenarios: a deleted file and a renamed file (or one
that has changed directory). These changes handle both cases.
Alter test data to accommodate cases for deleted-file results

Signed-off-by: inishchith <inishchith@gmail.com>
@inishchith changed the title from "Repository level cocom" to "[cocom] Add repository level analysis option for CoCom backend" on Jun 13, 2019
@valeriocos (Member) left a comment

Thank you @inishchith for sharing the idea of an analysis at repository level. Overall it looks interesting; however, I have some doubts about some of the implementation choices.

Instead of passing a flag to trigger the analysis at repo level, wouldn't it be better to introduce a new category that performs this kind of analysis? What do you think?

The variable self.history collects the cocom analysis for each file. The solution seems to work well when performing the initial fetch, but I'm not sure it would work for incremental fetches. Could you explain the logic in case I'm missing something?
A different approach could be to execute the analyzer over the full repo for each commit and then sum up the results obtained. Maybe the -t param could speed up the analysis. What do you think?

@inishchith (Contributor, Author)

@valeriocos Sorry for the delayed response.

Instead of passing a flag to trigger the analysis at repo level, wouldn't it be better to introduce a new category that performs this kind of analysis? What do you think?

  • So far we have used category as a convention for performing analysis with different analyzers, and repository-level analysis makes sense for other backends as well, so I thought it would be better to provide an option (flag) instead of a category. Let me know what you think.

The variable self.history collects the cocom analysis for each file. The solution seems to work well when performing the initial fetch, but I'm not sure it would work for incremental fetches. Could you explain the logic in case I'm missing something?

  • Yes, you're correct. I had thought about this midway through but missed addressing it in the process. This is a problem we should look to solve. I'll think about how the current implementation can be refactored to support incremental fetches. Thanks for raising it 👍

A different approach could be to execute the analyzer over the full repo for each commit and then sum up the results obtained. Maybe the -t param could speed up the analysis. What do you think?

  • Yes, this was one of the initial approaches we discussed in [cocom] Evaluating results with repository level analysis #36. I thought the current idea could do better in terms of time and redundancy in the calculation (it's like cutting out the history part and then thinking of a workable solution).
    But yes, agreed: we face a trade-off between time (full repo on every commit) and memory (maintaining self.history). As you pointed out, if we can get incremental fetches working on the current implementation, that would be great; otherwise we have to evaluate lizard's full-repo analysis with the working-threads option (see the sketch after this comment).

I'll update you once I've worked on the evaluation. Let me know what you think!
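
To make the trade-off concrete, here is a rough sketch (under assumptions, not the actual patch) of the full-repo-per-commit approach: walk a checked-out commit, run lizard.analyze_file on every source file and sum the results. The extension filter and the helper name analyze_checkout are illustrative; the real speed-up discussed here would come from lizard's -t (working threads) option.

```python
import os

import lizard  # the same analyzer the CoCom backend relies on

# Assumption: only these extensions are analyzed; the real backend decides
# this differently. `analyze_checkout` is an illustrative helper name.
CODE_EXTENSIONS = (".py", ".java", ".js", ".c", ".cpp")


def analyze_checkout(repo_path):
    """Sum lizard's per-file metrics over one checked-out commit."""
    totals = {"loc": 0, "ccn": 0, "num_funs": 0, "tokens": 0}
    for root, _, files in os.walk(repo_path):
        if ".git" in root.split(os.sep):
            continue  # skip Git internals
        for name in files:
            if not name.endswith(CODE_EXTENSIONS):
                continue
            result = lizard.analyze_file(os.path.join(root, name))
            totals["loc"] += result.nloc
            totals["tokens"] += result.token_count
            totals["num_funs"] += len(result.function_list)
            totals["ccn"] += sum(f.cyclomatic_complexity
                                 for f in result.function_list)
    return totals
```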

@valeriocos (Member)

No worries @inishchith , thank you for answering.

So far we have used category as a convention for performing analysis with different analyzers, and repository-level analysis makes sense for other backends as well, so I thought it would be better to provide an option (flag) instead of a category. Let me know what you think.

If the data has a different shape it's probably better to use a different category. However, we can proceed without adding a new category, and change the code afterwards if needed :)
I'm not sure about the definition of the variable self.repository_level in __init__. That information seems to be more related to how the fetch is performed than to how the class is initialized. Could you explain why repository_level has been defined as an instance attribute?
Thanks
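
For illustration only, a minimal sketch of the shape suggested here (not Graal's actual CoCom API): since repository_level describes how a particular fetch is performed, it is passed to fetch() rather than stored on the instance in __init__. The class body and signatures below are assumptions.

```python
# Illustrative sketch only; not Graal's actual CoCom class or signatures.
class CoCom:
    def __init__(self, uri, git_path):
        # Only state describing the backend itself lives on the instance.
        self.uri = uri
        self.git_path = git_path

    def fetch(self, repository_level=False):
        """Fetch analysis items; repository_level only affects this run."""
        for commit in self._commits():
            yield self._analyze(commit, repository_level)

    def _commits(self):
        raise NotImplementedError  # placeholder for the real Git traversal

    def _analyze(self, commit, repository_level):
        raise NotImplementedError  # placeholder for the lizard-based analysis
```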

@inishchith (Contributor, Author) commented Jun 15, 2019

If the data has a different shape it's probably better to use a different category. However, we can proceed without adding a new category, and change the code afterwards if needed :)

  • Yes, agreed. Sure!

I'm not sure about the definition of the variable self.repository_level in __init__. That information seems to be more related to how the fetch is performed than to how the class is initialized. Could you explain why repository_level has been defined as an instance attribute?

  • It shouldn't be an instance attribute. I was working on a different approach (for adding an option) and I'm not sure when I added this line.
    Sorry about that, and thanks for pointing it out :)

Edit:

  1. Made the correction.
  2. @valeriocos Could you review [analyzer] Fix results for deleted files for CoCom backend #37 before this one, as it's of higher priority? This PR still requires some work :)

The idea is to produce repository-level analysis for the CoCom backend
using a history of files and picking the latest values of the available
results

Signed-off-by: inishchith <inishchith@gmail.com>
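
A minimal sketch (not the actual implementation) of the history-of-files idea described in this commit message: keep the latest result for every file seen so far, drop entries for deleted files (under the representation assumed here, a rename appears as a delete of the old path plus an add of the new path), and sum the surviving entries into repository-level totals. The names file_history, update_history and repository_metrics, and the (path, action, analysis) commit representation, are illustrative.

```python
file_history = {}  # file path -> latest per-file analysis (a dict of metrics)


def update_history(commit_files):
    """Apply one commit's file-level results to the history."""
    for path, action, analysis in commit_files:
        if action == "D":
            # Deleted file: its metrics must no longer count towards totals.
            file_history.pop(path, None)
        else:
            # Added/modified file: keep only the latest available values.
            file_history[path] = analysis


def repository_metrics():
    """Sum the latest per-file values into repository-level totals."""
    totals = {"loc": 0, "ccn": 0, "num_funs": 0, "tokens": 0}
    for analysis in file_history.values():
        for key in totals:
            totals[key] += analysis.get(key, 0)
    return totals
```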
@inishchith force-pushed the repository_level_cocom branch from 1d04905 to 159cf41 on June 15, 2019, 14:43
@inishchith (Contributor, Author)

@valeriocos I thought over the incremental-fetches issue with the current implementation. I figured out that we would have to execute an initial run over the entire repository every time, which would again increase the execution time for large repositories, as addressed in #36.

As you pointed out above (about lizard's worker threads for repository-level analysis):

A different approach could be to execute the analyzer over the full repo for each commit and then sum up the results obtained. Maybe the -t param could speed up the analysis ...

Here, incremental fetches wouldn't be affected and would work just as before. I have implemented a version locally for evaluation purposes (results below).

| Repository | Number of commits | File level* | Repository level |
| --- | --- | --- | --- |
| chaoss/grimoirelab-perceval | 1387 | 23.65 min | 27.97 min |
| chaoss/grimoirelab-sirmordred | 869 | 9.69 min | 4.27 min |
| chaoss/grimoirelab-graal | 169 | 1.73 min | 0.90 min |

(There's a divergence because Perceval has many more files than the other repositories under consideration.)
With the help of lizard's repository-level analysis, I was able to create two of the metric visualizations (overall LOC and CCN, among other attributes). Ref. chaoss/metrics#139.
I'm now working on visualizing the most complex files in a repository.

Let me know what you think. Thanks :)

@inishchith (Contributor, Author)

Closing in reference to #39

@inishchith inishchith closed this Jun 20, 2019