
[Doc] opensource obtains star-count and fork-count #1623

Open
Tenth-crew opened this issue Sep 16, 2024 · 13 comments
Labels
waiting for author need issue author's feedback

Comments

@Tenth-crew

Description

At Dr. Zhao's invitation, here is a method for obtaining a repository's fork count and star count from opensource.
Regarding repo_forks_count and repo_stargazers_count: these repo-related fields are not populated for every event type. Only PullRequest-related events carry repository statistics.
[screenshot]
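A minimal Python sketch of the approach above, assuming rows from opensource.events have already been fetched into dictionaries. The column names repo_stargazers_count and repo_forks_count, and the assumption that every PullRequest-related event type name starts with "PullRequest", are taken from this discussion and GitHub's event naming rather than from a verified schema; latest_repo_counts is a hypothetical name.

```python
from datetime import datetime


# Hypothetical helper: rows are dicts shaped like (a subset of) opensource.events.
# Assumption: repo_* statistics are only meaningful on PullRequest-related events
# (PullRequestEvent, PullRequestReviewEvent, PullRequestReviewCommentEvent, ...).
def latest_repo_counts(events):
    """Return (stars, forks) from the most recent PR-related event, or None."""
    pr_events = [e for e in events if e["type"].startswith("PullRequest")]
    if not pr_events:
        return None  # no PR activity, so no repo statistics are available
    latest = max(pr_events, key=lambda e: e["created_at"])
    return latest["repo_stargazers_count"], latest["repo_forks_count"]


if __name__ == "__main__":
    rows = [
        # Made-up sample rows for illustration only.
        {"type": "IssuesEvent", "created_at": datetime(2024, 9, 1),
         "repo_stargazers_count": 0, "repo_forks_count": 0},
        {"type": "PullRequestEvent", "created_at": datetime(2024, 9, 2),
         "repo_stargazers_count": 100, "repo_forks_count": 10},
    ]
    print(latest_repo_counts(rows))  # (100, 10)
```

Taking the latest PR-related event gives the most recent snapshot available, not necessarily today's value. Note also that, per a later comment in this thread, repo-related fields are removed from the original log data after 2024.09.24, so newer rows would not carry these counts.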
In addition, opensource has no PR counter, and I have not found any other way to get the PR count. For reference, a merged PR is identified by the condition: action='closed' AND pull_merged=1
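The merge condition can be applied in the same way. This Python sketch counts distinct merged PRs for one repository; the repo_name column and the assumption that the PR number is stored in issue_number for PullRequestEvent rows are assumptions about the table, and count_merged_prs is a hypothetical name. The SQL in the comment is an untested approximation, not a query taken from the project.

```python
# Hypothetical helper over rows shaped like opensource.events.
# Roughly equivalent (untested) ClickHouse SQL:
#   SELECT count(DISTINCT issue_number) FROM opensource.events
#   WHERE repo_name = 'owner/repo' AND type = 'PullRequestEvent'
#     AND action = 'closed' AND pull_merged = 1
def count_merged_prs(events, repo_name):
    """Count distinct merged PRs: action='closed' AND pull_merged=1."""
    merged = {
        e["issue_number"]  # PR number (assumed to live in issue_number)
        for e in events
        if e["repo_name"] == repo_name
        and e["type"] == "PullRequestEvent"
        and e["action"] == "closed"
        and e["pull_merged"] == 1
    }
    return len(merged)
```

Counting distinct numbers rather than rows is a defensive choice: should the same PR ever appear in more than one closed-and-merged row, it is still counted once.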

@github-actions github-actions bot added the waiting for repliers need other's feedback label Sep 16, 2024
@Tenth-crew
Author

Tenth-crew commented Sep 16, 2024

I also found that the description of issue_number in the data description file may be wrong. The table says "the id of the issue in this repo", but I think it means the number of issues in the repository at that time. I'm not sure whether this is correct.
[screenshot]

@zhingoll


Hello! The description of issue_number in the table is correct. issue_number represents the unique ID number of each issue within the current repo, not the total number of issues in the repository. The example is as follows:
[screenshots]

@github-actions github-actions bot added waiting for author need issue author's feedback and removed waiting for repliers need other's feedback labels Sep 18, 2024
@Tenth-crew
Author

What is the difference between issue_id and issue_number? The description of issue_id says it is the "unique identity of this issue on GitHub", which sounds very similar to the unique ID number you mentioned. Also, the two values are sometimes the same and sometimes different.
[screenshot]

@github-actions github-actions bot added waiting for repliers need other's feedback and removed waiting for author need issue author's feedback labels Sep 18, 2024
@zhingoll


The 'issue_number' is the identifier users see within a specific repository, and it is unique only within that repository. The 'issue_id' you mentioned, on the other hand, is a globally unique identifier that GitHub uses to identify an issue across the whole platform. The example is as follows:
[screenshot]
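To make the distinction concrete, here is a small Python sketch over made-up rows: counting by issue_number alone merges issues from different repositories, while (repo_name, issue_number) or issue_id keeps them apart. The field names follow this thread; the repo_name column, count_distinct_issues, and all values are hypothetical.

```python
def count_distinct_issues(events):
    """Compare three ways of counting distinct issues in a set of event rows."""
    # issue_number is only unique inside one repo, so this undercounts
    # as soon as rows from several repositories are mixed.
    by_number_only = len({e["issue_number"] for e in events})
    # Repo-scoped key: the number users see, qualified by its repository.
    by_repo_and_number = len({(e["repo_name"], e["issue_number"]) for e in events})
    # Global key: unique across the whole platform.
    by_issue_id = len({e["issue_id"] for e in events})
    return by_number_only, by_repo_and_number, by_issue_id


if __name__ == "__main__":
    rows = [  # made-up sample data
        {"repo_name": "a/b", "issue_number": 1, "issue_id": 1001},
        {"repo_name": "a/b", "issue_number": 1, "issue_id": 1001},  # 2nd event, same issue
        {"repo_name": "c/d", "issue_number": 1, "issue_id": 2002},
        {"repo_name": "a/b", "issue_number": 2, "issue_id": 1005},
    ]
    print(count_distinct_issues(rows))  # (2, 3, 3)
```

For plain issues the last two counts agree; the comments further down in this thread describe cases (PullRequestEvent records on GitHub, and Gitee data) where they can diverge.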

@github-actions github-actions bot added waiting for author need issue author's feedback and removed waiting for repliers need other's feedback labels Sep 18, 2024
@Tenth-crew
Author

Got it, thanks a lot!

@github-actions github-actions bot added waiting for repliers need other's feedback and removed waiting for author need issue author's feedback labels Sep 18, 2024
@birdflyi
Contributor

birdflyi commented Sep 18, 2024

The contradiction between your viewpoints may come from the different definitions used by each platform:
Gitee API:

  • the Response Class of 仓库的某个Issue ("a specific issue of a repository")
    • "root"->"id" with no description;
    • "root"->"number" with the description "唯一标识" ("unique identifier"). (This is a string type!)

vs.

GitHub API:

@github-actions github-actions bot added waiting for author need issue author's feedback and removed waiting for repliers need other's feedback labels Sep 18, 2024
@Tenth-crew
Copy link
Author

Thank you both for the detailed explanation!

@github-actions github-actions bot added waiting for repliers need other's feedback and removed waiting for author need issue author's feedback labels Sep 18, 2024
@birdflyi
Contributor

birdflyi commented Sep 18, 2024

The relationship between issue 'id' and 'issue_number' in the Gitee API may differ from their relationship in the GitHub API.

  1. Records from the Gitee platform:
    [screenshot]
    Note: In records from the Gitee platform, different issue_number values can share the same issue_id.

  2. Records from the GitHub platform:
    [screenshot]
    Note: In records from the GitHub platform, different issue_id values can share the same issue_number.
    Extra note: issue_id is finer-grained than the combination ({owner}, {repo}, {issue_number})! For a type=PullRequestEvent record, two different issue_id values can share the same "Get an issue" parameter combination ({owner}, {repo}, {issue_number}). Try https://api.github.com/repos/X-lab2017/open-digger/issues/1611 and https://api.github.com/repos/X-lab2017/open-digger/pulls/1611.
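Both patterns can be checked directly in the data. The Python sketch below groups records in both directions and reports keys that map to more than one value. It keys numbers by (repo_name, issue_number) because issue_number is only unique within a repository; the function names, the repo_name column, and the sample rows are hypothetical.

```python
from collections import defaultdict


def ids_with_several_numbers(events):
    """issue_id values seen with more than one (repo_name, issue_number) key.

    This matches the Gitee pattern in note 1 (e.g. an issue and a PR in
    different repos that happen to share the same id)."""
    keys_by_id = defaultdict(set)
    for e in events:
        keys_by_id[e["issue_id"]].add((e["repo_name"], e["issue_number"]))
    return {i: keys for i, keys in keys_by_id.items() if len(keys) > 1}


def numbers_with_several_ids(events):
    """(repo_name, issue_number) keys seen with more than one issue_id.

    This matches the GitHub pattern in note 2 (e.g. the issue object and the
    PR object behind the same number)."""
    ids_by_key = defaultdict(set)
    for e in events:
        ids_by_key[(e["repo_name"], e["issue_number"])].add(e["issue_id"])
    return {k: ids for k, ids in ids_by_key.items() if len(ids) > 1}
```

Running both checks on a sample of each platform's records is a cheap way to confirm which key is safe to join on before aggregating.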

@github-actions github-actions bot added waiting for author need issue author's feedback and removed waiting for repliers need other's feedback labels Sep 18, 2024
@zhingoll

My understanding is that on the Gitee platform, issues and PRs are managed separately, and each has its own independent global ID sequence. As a result, different records can share the same id, which is what gets stored in the issue_id field in the database. In the example you provided, the PR in the 'ljt365fir/ZhaoCaiZhu' repository and the issue in the 'oschina/Android app' repository on Gitee both have records with id 63, as shown below:
[screenshots]
Also, I'm not sure how the issue_number field in the database is derived for Gitee issues. On Gitee, an issue's number is a string, as shown in the figure above. In this case, the issue's number should be 'I1R', but it is stored in the database as 23391. Did you apply some kind of mapping?

@birdflyi
Contributor

We need to invite more people who know the records from the Gitee platform well to join this discussion :-)
Could you help us understand the properties (issue id, issue_number) of repositories on the Gitee platform and the fields (issue_id, issue_number) of records in the ClickHouse table opensource.events? @frank-zsy

@frank-zsy
Contributor

I simply take the id from the API data and store it in the issue_id field. I will look into what it means and get back to you later.

As for issue_number: in Gitee, issues and pull requests have independent numbers. Since the column was defined as UInt32 for the original GitHub records, I convert the Gitee number to UInt32 by parsing it as a base-36 number, so I1R becomes parseInt('I1R', 36) = 23391.
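The base-36 mapping can be reproduced in Python, where int(text, 36) plays the role of JavaScript's parseInt(text, 36). The inverse function is a hypothetical helper for turning a stored issue_number back into the string shown on Gitee; it assumes Gitee numbers use uppercase letters and have no leading zeros, based only on the 'I1R' example in this thread.

```python
import string

# Digits used for base 36: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ".
# Assumption: Gitee numbers use uppercase letters, as in the 'I1R' example.
BASE36_DIGITS = string.digits + string.ascii_uppercase
UINT32_MAX = 2**32 - 1


def gitee_number_to_uint32(number: str) -> int:
    """Parse a Gitee issue/PR number as base 36, like parseInt(number, 36)."""
    value = int(number, 36)  # Python's int() accepts either letter case here
    if not 0 <= value <= UINT32_MAX:
        raise OverflowError(f"{number!r} does not fit in UInt32")
    return value


def uint32_to_gitee_number(value: int) -> str:
    """Hypothetical inverse: turn a stored issue_number back into base-36 text."""
    if value < 0:
        raise ValueError("issue_number must be non-negative")
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, 36)
        digits.append(BASE36_DIGITS[remainder])
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(gitee_number_to_uint32("I1R"))  # 23391
    print(uint32_to_gitee_number(23391))  # I1R
```

Any number of up to 6 base-36 characters fits in UInt32 (36^6 - 1 = 2,176,782,335), while 7 characters can overflow. Unlike JavaScript's parseInt, Python's int() raises ValueError on an invalid character instead of silently stopping at it.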

@zhingoll @birdflyi

@birdflyi
Contributor

I found similar parsing functions here: https://blog.csdn.net/qq_52669357/article/details/122883439 and https://blog.csdn.net/z956281507/article/details/72796699

@frank-zsy
Contributor

BTW, all repo-related fields are removed from the original log data after 2024.09.24.
