Add document summary to extraction process #1682
Merged
An analysis of the Nobel Prizes in science shows that giving the extractor access to document summaries during knowledge graph extraction improves both quality and relevance. Summary-assisted extraction delivered better focus and coherence across three key dimensions: entity selection, relationship mapping, and community structure.
Key Statistics:
While the baseline extraction captured more relationships, the summary-assisted version produced more meaningful connections focused on core Nobel Prize concepts, particularly in representing the intersection of neural networks and statistical physics. These findings validate the inclusion of document summaries in the extraction pipeline for improved knowledge representation.
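As a rough illustration of what access to a document summary during extraction can look like, the sketch below fills a prompt template with a whole-document summary alongside the chunk being processed. It is a minimal sketch assuming Python's `string.Template`. Only the `document_summary` name comes from this PR; the template wording and the `max_entities` and `input` placeholders are hypothetical and do not reproduce the project's YAML prompts.

```python
from string import Template

# Hypothetical extraction prompt; only the $document_summary slot reflects this PR.
EXTRACTION_TEMPLATE = Template(
    "Document summary:\n$document_summary\n\n"
    "Extract up to $max_entities entities and their relationships "
    "from the text below, focusing on concepts relevant to the summary.\n\n"
    "Text:\n$input"
)


def build_extraction_prompt(chunk: str, document_summary: str, max_entities: int = 10) -> str:
    """Fill the template so the model sees the document summary next to the chunk."""
    # substitute() raises KeyError if a placeholder is left unfilled,
    # which surfaces a template that expects a value the caller forgot to pass.
    return EXTRACTION_TEMPLATE.substitute(
        document_summary=document_summary.strip() or "(no summary available)",
        max_entities=max_entities,
        input=chunk,
    )


if __name__ == "__main__":
    summary = "An overview of Nobel Prizes in science."
    chunk = "Hopfield networks store patterns as minima of an energy function."
    print(build_extraction_prompt(chunk, summary))
```

Putting the summary before the chunk gives the model document-level context first, which is one plausible way to bias entity selection toward the document's core topics.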
Important
Add document summaries to the knowledge graph extraction process for improved entity and relationship extraction, and update related prompt templates.

- Wires document summaries into `augment_document_info()` in `ingestion_service.py` and `_extract_kg()` in `kg_service.py`.
- Updates `graphrag_entity_description.yaml` to include `document_summary` in entity description generation.
- Updates `graphrag_relationships_extraction_few_shot.yaml` to use `document_summary` for relationship extraction.
- Adds `user_count` and `document_count` columns to `collections` in `c45a9cf6a8a4_add_user_and_document_count_to_.py`.

This description was generated automatically for 97c5ab8. It will update as commits are pushed.
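The `user_count` and `document_count` columns on `collections` come from the migration listed above, whose contents are not shown here. The following is a minimal sketch of that kind of schema change using the standard-library `sqlite3` module rather than the project's own database and migration tooling. The `user_collections` and `documents` tables it backfills from are hypothetical, and whether the real migration backfills existing rows at all is an assumption.

```python
import sqlite3


def add_count_columns(conn: sqlite3.Connection) -> None:
    """Add user_count/document_count to collections and backfill them.

    Hypothetical schema: collections(id, ...), user_collections(user_id, collection_id),
    documents(id, collection_id).
    """
    cur = conn.cursor()
    # A non-null default keeps existing rows valid under NOT NULL.
    cur.execute("ALTER TABLE collections ADD COLUMN user_count INTEGER NOT NULL DEFAULT 0")
    cur.execute("ALTER TABLE collections ADD COLUMN document_count INTEGER NOT NULL DEFAULT 0")
    # Backfill each count with a correlated subquery over the hypothetical join tables.
    cur.execute(
        "UPDATE collections SET user_count = "
        "(SELECT COUNT(*) FROM user_collections uc WHERE uc.collection_id = collections.id)"
    )
    cur.execute(
        "UPDATE collections SET document_count = "
        "(SELECT COUNT(*) FROM documents d WHERE d.collection_id = collections.id)"
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE collections (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE user_collections (user_id INTEGER, collection_id INTEGER);
        CREATE TABLE documents (id INTEGER PRIMARY KEY, collection_id INTEGER);
        INSERT INTO collections VALUES (1, 'nobel'), (2, 'empty');
        INSERT INTO user_collections VALUES (10, 1), (11, 1);
        INSERT INTO documents VALUES (100, 1), (101, 1), (102, 1);
    """)
    add_count_columns(conn)
    print(conn.execute("SELECT id, user_count, document_count FROM collections ORDER BY id").fetchall())
    # → [(1, 2, 3), (2, 0, 0)]
```

Storing the counts as columns trades a little write-time bookkeeping for cheap reads when listing collections; the application would then need to keep them in sync as users and documents are added or removed.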