Update final nodes output #1247

Open
wants to merge 12 commits into
base: incremental_indexing/main
from

Conversation

AlonsoGuevara
Contributor

Description

Update create_final_nodes

Related Issues

[Reference any related issues or tasks that this pull request addresses.]

Proposed Changes

[List the specific changes made in this pull request.]

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

[Add any additional notes or context that may be helpful for the reviewer(s).]

@AlonsoGuevara AlonsoGuevara requested review from a team as code owners October 3, 2024 18:55
The merged relationships.
old_nodes : pd.DataFrame
The old nodes.
community_count_threshold : int, optional
Contributor

Should we have a parameter that defines the ratio of existing relationships to new relationships? For example, if a new node has 2 old neighbors and 10 new neighbors, we would consider putting it in a new community rather than an old one.

Contributor Author

At the moment I'm using a simple majority, and not even fine-tuning it, haha. I totally agree that it should be a percentage and not an int, so that it is independent of the dataset size.
My plan is to refine this value further once we have community reports generated.
But I will address the change to a percentage right now.
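
For illustration, the ratio-based rule discussed in this thread could look roughly like the sketch below. It is not the code in this PR: the function name `assign_community`, the parameter `old_neighbor_ratio_threshold`, and the convention that `None` marks a new neighbor are all assumptions made for the example. It uses plain Python rather than pandas to stay self-contained.

```python
from collections import Counter


def assign_community(neighbor_communities, old_neighbor_ratio_threshold=0.5):
    """Pick an existing community for a new node, or None to signal a new one.

    neighbor_communities: one entry per neighbor of the new node. An old
    neighbor contributes its existing community id; a new neighbor
    contributes None. (Hypothetical input layout, assumed for this sketch.)
    """
    total = len(neighbor_communities)
    old = [c for c in neighbor_communities if c is not None]
    if total == 0 or not old:
        # No neighbors, or no old neighbors: nothing to inherit from.
        return None
    if len(old) / total < old_neighbor_ratio_threshold:
        # Too few old neighbors relative to new ones: start a new community.
        return None
    # Simple majority among the old neighbors' communities.
    community, _count = Counter(old).most_common(1)[0]
    return community


# The reviewer's example: 2 old neighbors and 10 new neighbors.
mostly_new = [3, 3] + [None] * 10
# A node whose neighbors are mostly old.
mostly_old = [1, 1, 2, None]
```

With the assumed default threshold of 0.5, `assign_community(mostly_new)` returns `None` (2 of 12 neighbors are old), while `assign_community(mostly_old)` returns community `1`. Because the threshold is a fraction rather than an absolute count, the rule does not depend on the dataset size.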

new_delta_nodes_df[["level", "title"]], on=["level", "title"], how="outer"
)

# Count the communities for each (level, title) pair
Contributor

Maybe we should remove all of this logic for calculating community_counts too?

3 participants