Update final nodes output #1247
base: incremental_indexing/main
graphrag/index/update/dataframes.py
Outdated
The merged relationships.
old_nodes : pd.DataFrame
The old nodes.
community_count_threshold : int, optional
Should we have a parameter that defines the percentage of existing relationships vs. new relationships? For example, if a new node has 2 old neighbors and 10 new neighbors, we would consider putting it in a new community rather than an old community.
At the moment I'm using a simple majority, and not even fine-tuning it, haha. I totally agree that it should be a percentage and not an int, so that it is independent of the dataset size.
My plan is to refine this value further once we have community reports generated.
But I will address the change of converting it to a percentage right now.
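As a rough illustration of the percentage-based rule discussed in this thread, here is a minimal sketch. It is not the code from this PR: the function name `belongs_to_old_community` and the parameter `old_ratio_threshold` are hypothetical, and the default of 0.5 simply mirrors the "simple majority" mentioned above.

```python
def belongs_to_old_community(
    old_neighbors: int,
    new_neighbors: int,
    old_ratio_threshold: float = 0.5,  # hypothetical parameter, not from the PR
) -> bool:
    """Decide whether a new node should join an existing (old) community.

    The decision uses the share of the node's neighbors that already
    existed before the update, so it does not depend on dataset size.
    """
    total = old_neighbors + new_neighbors
    if total == 0:
        # An isolated node has no old neighbors to anchor it.
        return False
    # Strictly greater than the threshold: a tie does not count as a majority.
    return old_neighbors / total > old_ratio_threshold


# The example from the review comment: 2 old neighbors vs. 10 new neighbors.
print(belongs_to_old_community(2, 10))
```

With the default threshold, the reviewer's example (2 old, 10 new) would not be attached to an old community, since only about 17% of its neighbors are old. Whether a tie should count, and what the default should be, are open choices that would need tuning once community reports are available.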
new_delta_nodes_df[["level", "title"]], on=["level", "title"], how="outer"
)

# Count the communities for each (level, title) pair
Maybe we should remove all of this logic for calculating community_counts too?
Description
Update create_final_nodes
Related Issues
[Reference any related issues or tasks that this pull request addresses.]
Proposed Changes
[List the specific changes made in this pull request.]
Checklist
Additional Notes
[Add any additional notes or context that may be helpful for the reviewer(s).]