Merge official graphrag 0.3.4 (#25)
* Bump micromatch from 4.0.5 to 4.0.8 in /docsite (microsoft#1013)

Bumps [micromatch](https://github.com/micromatch/micromatch) from 4.0.5 to 4.0.8.
- [Release notes](https://github.com/micromatch/micromatch/releases)
- [Changelog](https://github.com/micromatch/micromatch/blob/4.0.8/CHANGELOG.md)
- [Commits](micromatch/micromatch@4.0.5...4.0.8)

---
updated-dependencies:
- dependency-name: micromatch
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump ruff from 0.5.7 to 0.6.2 (microsoft#1014)

Bumps [ruff](https://github.com/astral-sh/ruff) from 0.5.7 to 0.6.2.
- [Release notes](https://github.com/astral-sh/ruff/releases)
- [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md)
- [Commits](astral-sh/ruff@0.5.7...0.6.2)

---
updated-dependencies:
- dependency-name: ruff
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Ensure entity types to be str in prompt tune (microsoft#1015)

* Fix weight casting during graph extraction (microsoft#1016)

* Fix weight casting during graph extraction

* Format

* Format

* Update developer guide (microsoft#1029)

* Add missing config parameter for prompt tuning docs (microsoft#1017)

* Improve search type hint (microsoft#1031)

* update get_local_search_engine and get_global_search_engine return annotation

* add semversioner file

* reorder imports

* fix pyright errors

* revert change and ignore previous pyright error

---------

Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai>
Co-authored-by: longyunfeigu <2514553187@qq.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Patch "past" dependency issues (microsoft#1033)

* Patch "past" dependency issues

* Semver

* Release v0.3.2 (microsoft#1034)

* Update VectorStoreSearchResult score value range (microsoft#937)

update VectorStoreSearchResult score comment

Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Add source URL to the package (microsoft#927)

Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Fix/text unit code cleanup (microsoft#1040)

* Optimized _build_text_unit_context function for improved time and space complexity

Refactored the _build_text_unit_context function to enhance its performance and efficiency. Key optimizations include:

1. Set for Text Unit IDs: Replaced list-based membership checks with a set (text_unit_ids_set) to achieve constant-time complexity for membership checks, reducing overall time complexity.
2. Direct Attribute Removal: Utilized pop with a default value (None) to directly remove attributes entity_order and num_relationships from text units, minimizing overhead and avoiding potential KeyError.
3. Default Dictionary for Entity Orders: Implemented defaultdict for managing entity orders, simplifying the ranking process and improving readability.

These changes make the function more efficient in both time and space, especially with large datasets or many selected entities, while leaving its core behavior unchanged.
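
The three optimizations listed above can be illustrated with a minimal sketch. This is not the actual `_build_text_unit_context` implementation: the data shapes (plain dicts with `id`, `entity_order`, and `num_relationships` keys) and the helper name `rank_text_units` are assumptions made purely for illustration.

```python
from collections import defaultdict


def rank_text_units(selected_entities, text_units):
    """Hypothetical sketch: order text units by the best-ranked entity citing them.

    selected_entities: list of dicts like {"name": str, "text_unit_ids": list[str]}
    text_units: list of dicts like {"id": str, "text": str, ...}
    """
    # 1. A set gives constant-time membership checks instead of scanning a list.
    text_unit_ids_set = {unit["id"] for unit in text_units}

    # 3. defaultdict removes the "is the key there yet?" branching.
    entity_orders = defaultdict(list)
    for order, entity in enumerate(selected_entities):
        for unit_id in entity["text_unit_ids"]:
            if unit_id in text_unit_ids_set:
                entity_orders[unit_id].append(order)

    ranked = []
    for unit in text_units:
        orders = entity_orders.get(unit["id"])
        if not orders:
            continue  # not referenced by any selected entity
        # Shallow copy so the caller's dict is left unchanged.
        item = dict(unit)
        # 2. pop with a default removes helper attributes without risking KeyError.
        item.pop("entity_order", None)
        item.pop("num_relationships", None)
        ranked.append((min(orders), item))

    ranked.sort(key=lambda pair: pair[0])
    return [item for _, item in ranked]
```

Text units that no selected entity references are dropped, and the remaining ones come back ordered by the earliest entity that mentions them.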

* Format

* Ruff fixes

* semver

---------

Co-authored-by: arjun-234 <arjun.darji@yudiz.com>
Co-authored-by: Arjun D. <103405661+arjun-234@users.noreply.github.com>

* Fix INIT_YAML embeddings default settings (microsoft#1039)

Co-authored-by: Thanh Long Phan <long.phan@dida.do>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Bump pytest-asyncio from 0.23.8 to 0.24.0 (microsoft#1022)

Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.23.8 to 0.24.0.
- [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases)
- [Commits](pytest-dev/pytest-asyncio@v0.23.8...v0.24.0)

---
updated-dependencies:
- dependency-name: pytest-asyncio
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Bump json-repair from 0.26.0 to 0.28.4 (microsoft#1044)

Bumps [json-repair](https://github.com/mangiucugna/json_repair) from 0.26.0 to 0.28.4.
- [Release notes](https://github.com/mangiucugna/json_repair/releases)
- [Commits](mangiucugna/json_repair@0.26.0...v0.28.4)

---
updated-dependencies:
- dependency-name: json-repair
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump lancedb from 0.11.0 to 0.12.0 (microsoft#1024)

Bumps [lancedb](https://github.com/lancedb/lancedb) from 0.11.0 to 0.12.0.
- [Release notes](https://github.com/lancedb/lancedb/releases)
- [Changelog](https://github.com/lancedb/lancedb/blob/main/release_process.md)
- [Commits](lancedb/lancedb@python-v0.11.0...python-v0.12.0)

---
updated-dependencies:
- dependency-name: lancedb
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump textual from 0.76.0 to 0.78.0 (microsoft#1038)

Bumps [textual](https://github.com/Textualize/textual) from 0.76.0 to 0.78.0.
- [Release notes](https://github.com/Textualize/textual/releases)
- [Changelog](https://github.com/Textualize/textual/blob/main/CHANGELOG.md)
- [Commits](Textualize/textual@v0.76.0...v0.78.0)

---
updated-dependencies:
- dependency-name: textual
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix/entity extraction strategy (microsoft#1046)

* fix strategy config in entity_extraction

* update init content

---------

Co-authored-by: KylinMountain <kose2livs@gmail.com>

* fix for issue 515 (microsoft#925)

* fix for issue 515

* semver impact document

---------

Co-authored-by: Kanishk Tyagi <kanishktyagi@Kanishks-MacBook-Pro.local>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* docs: update manual_prompt_tuning.md (microsoft#963)

paramater -> parameter

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Update indexer_adapters.py (microsoft#895)

Update the lines 71 and 72
before:
entity_df["community"] = entity_df["community"].fillna(-1)
entity_df["community"] = entity_df["community"].astype(int)
after:
entity_df.loc[:, "community"] = entity_df["community"].fillna(-1)
entity_df.loc[:, "community"] = entity_df["community"].astype(int)

Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Fix circular dependency on prompt tune api (microsoft#1054)

* Bump notebook from 7.2.1 to 7.2.2 (microsoft#1055)

Bumps [notebook](https://github.com/jupyter/notebook) from 7.2.1 to 7.2.2.
- [Release notes](https://github.com/jupyter/notebook/releases)
- [Changelog](https://github.com/jupyter/notebook/blob/@jupyter-notebook/tree@7.2.2/CHANGELOG.md)
- [Commits](https://github.com/jupyter/notebook/compare/@jupyter-notebook/tree@7.2.1...@jupyter-notebook/tree@7.2.2)

---
updated-dependencies:
- dependency-name: notebook
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>

* Bump jupyterlab from 4.2.4 to 4.2.5 (microsoft#1056)

Bumps [jupyterlab](https://github.com/jupyterlab/jupyterlab) from 4.2.4 to 4.2.5.
- [Release notes](https://github.com/jupyterlab/jupyterlab/releases)
- [Changelog](https://github.com/jupyterlab/jupyterlab/blob/@jupyterlab/lsp@4.2.5/CHANGELOG.md)
- [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/lsp@4.2.4...@jupyterlab/lsp@4.2.5)

---
updated-dependencies:
- dependency-name: jupyterlab
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update Prompt Tuning docs (microsoft#1057)

* Update Prompt Tuning docs

* Semver

* Update bash example in docs for prompt tune (microsoft#1059)

* Semver

* Update bash command

* Fix img for autotune (microsoft#1060)

* Fix img for autotune

* Add line breaks to tune docs

* More line breaks

* Fix img width (microsoft#1061)

* Consistent config load_config (microsoft#1065)

* Consistent config load_config

- Provide a consistent way to load configuration
- Resolve potential timestamp directories upfront
    upon config object creation
- Add unit tests for resolving timestamp directories
- Resolves microsoft#599
- Resolves microsoft#1049

* fix formatting issues

* remove unnecessary path resolution

* fix smoke tests

* update prompts to use load_config

* Update none checks

* Update none checks

* Update searching for config method signature

* Update unit tests

* fix formatting issues

* fix setting base_dir to full paths when not using file system. (microsoft#1096)

* fix setting base_dir to full paths when not using file system.

* add general resolve_path

* Clean and organize run index code (microsoft#1090)

* Create entrypoint for cli and api (microsoft#1067)

* Add cli and api entrypoints for update index

* Semver

* Update docs

* Run tests on feature branch main

* Better /main handling in tests

* Clean and organize run index code

* Ruff fix

* Pyright fix

* Format fixes

* Pyright fix

* Format

* Fix integ tests

* Fix ruff

* Reorganize and clean up

* Load query from blob (microsoft#1095)

* Moved query loading from file to helper function

* added loading parquets from blob to function

* resolved adlfs async error

* debugging cleanup and small fixes

* added connection string support

* semversioner and ruff fixes

* completed testing for merge with main

* more ruff changes

* fixed unbound vars warning

* rewrote function to use storage utils

* removed unused vars

---------

Co-authored-by: Kenny Zhang <zhangken@microsoft.com>

* Update create_pipeline_config.py (microsoft#1108)

* Update create_pipeline_config.py

Order switched to ensure that user settings at runtime take precedence.

* Updated semversioner.
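
Why the ordering in `create_pipeline_config.py` matters can be shown with a small sketch. The dictionaries and the `merge_settings` helper below are hypothetical and not taken from that file; the only point is that when mappings are merged, the later one wins.

```python
def merge_settings(defaults, user_settings):
    """Hypothetical helper: later mappings override earlier ones."""
    # Unpacking user_settings last means its values replace the defaults
    # for any key that appears in both.
    return {**defaults, **user_settings}


# Illustrative values only; the keys are assumptions, not the real config schema.
defaults = {"strategy": "default", "max_gleanings": 1}
user_settings = {"max_gleanings": 0}

merged = merge_settings(defaults, user_settings)
wrong_order = {**user_settings, **defaults}  # defaults silently override the user
```

With the user mapping applied last, `merged` keeps the user's `max_gleanings`; in `wrong_order` the default overwrites it.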

* release v0.3.3 (microsoft#1116)

* Deep copy txt units on local search to avoid race conditions (microsoft#1118)

* Deep copy txt units on local search to avoid race conditions

* Format
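
A minimal sketch of the deep-copy idea, assuming text units are mutable objects shared between concurrent local searches. The `TextUnit` class and `build_context` function are hypothetical stand-ins, not the library's actual types.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TextUnit:
    """Hypothetical stand-in for a text unit record."""

    id: str
    text: str
    attributes: dict = field(default_factory=dict)


def build_context(shared_units, entity_order):
    """Annotate private copies so concurrent callers never see each other's edits."""
    # deepcopy also duplicates the nested attributes dict; a shallow copy would
    # still share that dict with every other caller.
    units = [copy.deepcopy(unit) for unit in shared_units]
    for unit in units:
        unit.attributes["entity_order"] = entity_order
    return units
```

Each call works on its own copies, so two searches that annotate the same text units cannot overwrite each other's values, at the cost of the extra copying.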

* Fix summarization including empty descriptions (microsoft#1124)

* Fix summarization including empty descriptions

* Update
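
One way to keep empty descriptions out of a summarization request is to filter them before the prompt is built. The function below is a hypothetical sketch of that idea, not the fix applied in this change.

```python
def clean_descriptions(descriptions):
    """Drop None, empty, and whitespace-only descriptions, keeping the original order."""
    cleaned = []
    for description in descriptions:
        if description is None:
            continue
        text = description.strip()
        if text:
            cleaned.append(text)
    return cleaned
```

A caller could then skip the summarization step entirely when the cleaned list is empty.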

* Release v0.3.4 (microsoft#1125)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alonso Guevara <alonsog@microsoft.com>
Co-authored-by: Josh Bradley <joshbradley@microsoft.com>
Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai>
Co-authored-by: longyunfeigu <2514553187@qq.com>
Co-authored-by: Konstantin Gukov <gukkos@gmail.com>
Co-authored-by: arjun-234 <arjun.darji@yudiz.com>
Co-authored-by: Arjun D. <103405661+arjun-234@users.noreply.github.com>
Co-authored-by: TLP <104315397+TLongP@users.noreply.github.com>
Co-authored-by: Thanh Long Phan <long.phan@dida.do>
Co-authored-by: fantom845 <77169323+fantom845@users.noreply.github.com>
Co-authored-by: Kanishk Tyagi <kanishktyagi@Kanishks-MacBook-Pro.local>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: guangxiangdebizi <154864206+guangxiangdebizi@users.noreply.github.com>
Co-authored-by: Derek Worthen <worthend.derek@gmail.com>
Co-authored-by: KennyZhang1 <90438893+KennyZhang1@users.noreply.github.com>
Co-authored-by: Kenny Zhang <zhangken@microsoft.com>
Co-authored-by: Doug Orbaker <107270698+dorbaker@users.noreply.github.com>
19 people authored Sep 12, 2024
1 parent 767f063 commit 258c9e9
Showing 59 changed files with 1,347 additions and 917 deletions.
10 changes: 7 additions & 3 deletions .github/workflows/python-ci.yml
@@ -1,9 +1,13 @@
 name: Python CI
 on:
   push:
-    branches: [main]
+    branches:
+      - "**/main" # Matches branches like feature/main
+      - "main" # Matches the main branch
   pull_request:
-    branches: [main]
+    branches:
+      - "**/main"
+      - "main"

 permissions:
   contents: read
@@ -72,4 +76,4 @@ jobs:
       - name: Unit Test
         run: |
-          poetry run poe test_unit
+          poetry run poe test_unit
8 changes: 6 additions & 2 deletions .github/workflows/python-integration-tests.yml
@@ -1,9 +1,13 @@
 name: Python Integration Tests
 on:
   push:
-    branches: [main]
+    branches:
+      - "**/main" # Matches branches like feature/main
+      - "main" # Matches the main branch
   pull_request:
-    branches: [main]
+    branches:
+      - "**/main"
+      - "main"

 permissions:
   contents: read
9 changes: 6 additions & 3 deletions .github/workflows/python-notebook-tests.yml
@@ -1,9 +1,13 @@
 name: Python Notebook Tests
 on:
   push:
-    branches: [main]
+    branches:
+      - "**/main" # Matches branches like feature/main
+      - "main" # Matches the main branch
   pull_request:
-    branches: [main]
+    branches:
+      - "**/main"
+      - "main"

 permissions:
   contents: read
@@ -64,7 +68,6 @@ jobs:
           poetry run python -m pip install gensim
           poetry install
       - name: Notebook Test
         run: |
           poetry run poe test_notebook
8 changes: 6 additions & 2 deletions .github/workflows/python-smoke-tests.yml
@@ -1,9 +1,13 @@
 name: Python Smoke Tests
 on:
   push:
-    branches: [main]
+    branches:
+      - "**/main" # Matches branches like feature/main
+      - "main" # Matches the main branch
   pull_request:
-    branches: [main]
+    branches:
+      - "**/main"
+      - "main"

 permissions:
   contents: read
2 changes: 1 addition & 1 deletion .github/workflows/spellcheck.yml
@@ -4,7 +4,7 @@ on:
     branches: [main]
   pull_request:
     paths:
-      - '**/*'
+      - "**/*"
 jobs:
   spellcheck:
     runs-on: ubuntu-latest
42 changes: 42 additions & 0 deletions .semversioner/0.3.2.json
@@ -0,0 +1,42 @@
{
"changes": [
{
"description": "Add context data to query API responses.",
"type": "patch"
},
{
"description": "Add missing config parameter documentation for prompt tuning",
"type": "patch"
},
{
"description": "Add neo4j community notebook",
"type": "patch"
},
{
"description": "Ensure entity types to be str when running prompt tuning",
"type": "patch"
},
{
"description": "Fix weight casting during graph extraction",
"type": "patch"
},
{
"description": "Patch \"past\" dependency issues",
"type": "patch"
},
{
"description": "Update developer guide.",
"type": "patch"
},
{
"description": "Update query type hints.",
"type": "patch"
},
{
"description": "change-lancedb-placement",
"type": "patch"
}
],
"created_at": "2024-08-26T23:43:01+00:00",
"version": "0.3.2"
}
66 changes: 66 additions & 0 deletions .semversioner/0.3.3.json
@@ -0,0 +1,66 @@
{
"changes": [
{
"description": "Add entrypoints for incremental indexing",
"type": "patch"
},
{
"description": "Clean up and organize run index code",
"type": "patch"
},
{
"description": "Consistent config loading. Resolves #99 and Resolves #1049",
"type": "patch"
},
{
"description": "Fix circular dependency when running prompt tune api directly",
"type": "patch"
},
{
"description": "Fix default settings for embedding",
"type": "patch"
},
{
"description": "Fix img for auto tune",
"type": "patch"
},
{
"description": "Fix img width",
"type": "patch"
},
{
"description": "Fixed a bug in prompt tuning process",
"type": "patch"
},
{
"description": "Refactor text unit build at local search",
"type": "patch"
},
{
"description": "Update Prompt Tuning docs",
"type": "patch"
},
{
"description": "Update create_pipeline_config.py",
"type": "patch"
},
{
"description": "Update prompt tune command in docs",
"type": "patch"
},
{
"description": "add querying from azure blob storage",
"type": "patch"
},
{
"description": "fix setting base_dir to full paths when not using file system.",
"type": "patch"
},
{
"description": "fix strategy config in entity_extraction",
"type": "patch"
}
],
"created_at": "2024-09-10T19:51:24+00:00",
"version": "0.3.3"
}
14 changes: 14 additions & 0 deletions .semversioner/0.3.4.json
@@ -0,0 +1,14 @@
{
"changes": [
{
"description": "Deep copy txt units on local search to avoid race conditions",
"type": "patch"
},
{
"description": "Fix summarization including empty descriptions",
"type": "patch"
}
],
"created_at": "2024-09-11T22:31:58+00:00",
"version": "0.3.4"
}
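
The release files above map directly onto the `CHANGELOG.md` entries that appear later in this commit. As a rough illustration of that mapping (not semversioner's own implementation), the sketch below renders one release document in the changelog's format; `render_release` is a hypothetical helper.

```python
import json


def render_release(release_json):
    """Render a semversioner-style release document as a Markdown changelog section."""
    release = json.loads(release_json)
    lines = [f"## {release['version']}", ""]
    for change in release["changes"]:
        lines.append(f"- {change['type']}: {change['description']}")
    return "\n".join(lines)


# Contents of .semversioner/0.3.4.json as shown in this commit.
example = """
{
  "changes": [
    {"description": "Deep copy txt units on local search to avoid race conditions", "type": "patch"},
    {"description": "Fix summarization including empty descriptions", "type": "patch"}
  ],
  "created_at": "2024-09-11T22:31:58+00:00",
  "version": "0.3.4"
}
"""
print(render_release(example))
```

The output has the same shape as the `## 0.3.4` section of the changelog: a version heading followed by one `- <type>: <description>` line per change.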
4 changes: 0 additions & 4 deletions .semversioner/next-release/patch-20240821135138469990.json

This file was deleted.

4 changes: 0 additions & 4 deletions .semversioner/next-release/patch-20240821235154401001.json

This file was deleted.

35 changes: 35 additions & 0 deletions CHANGELOG.md
@@ -1,6 +1,41 @@
# Changelog
Note: version releases in the 0.x.y range may introduce breaking changes.

## 0.3.4

- patch: Deep copy txt units on local search to avoid race conditions
- patch: Fix summarization including empty descriptions

## 0.3.3

- patch: Add entrypoints for incremental indexing
- patch: Clean up and organize run index code
- patch: Consistent config loading. Resolves #99 and Resolves #1049
- patch: Fix circular dependency when running prompt tune api directly
- patch: Fix default settings for embedding
- patch: Fix img for auto tune
- patch: Fix img width
- patch: Fixed a bug in prompt tuning process
- patch: Refactor text unit build at local search
- patch: Update Prompt Tuning docs
- patch: Update create_pipeline_config.py
- patch: Update prompt tune command in docs
- patch: add querying from azure blob storage
- patch: fix setting base_dir to full paths when not using file system.
- patch: fix strategy config in entity_extraction

## 0.3.2

- patch: Add context data to query API responses.
- patch: Add missing config parameter documentation for prompt tuning
- patch: Add neo4j community notebook
- patch: Ensure entity types to be str when running prompt tuning
- patch: Fix weight casting during graph extraction
- patch: Patch "past" dependency issues
- patch: Update developer guide.
- patch: Update query type hints.
- patch: change-lancedb-placement

## 0.3.1

- patch: Add preflight check to check LLM connectivity.
12 changes: 9 additions & 3 deletions DEVELOPING.md
@@ -16,7 +16,7 @@
 poetry install
 ```

-## Execute the Indexing Engine
+## Executing the Indexing Engine

 ```sh
 poetry run poe index <...args>
@@ -28,6 +28,13 @@ poetry run poe index <...args>
 poetry run poe query <...args>
 ```

+## Versioning
+
+We use [semversioner](https://github.com/raulgomis/semversioner) to automate and enforce semantic versioning in the release process. Our CI/CD pipeline checks that all PR's include a json file generated by semversioner. When submitting a PR, please run:
+```shell
+poetry run semversioner add-change -t patch -d "<a small sentence describing changes made>."
+```
+
 # Azurite

 Some unit and smoke tests use Azurite to emulate Azure resources. This can be started by running:
@@ -40,10 +47,9 @@ or by simply running `azurite` in the terminal if already installed globally. Se

 # Lifecycle Scripts

-Our Python package utilizes Poetry to manage dependencies and [poethepoet](https://pypi.org/project/poethepoet/) to manage build scripts.
+Our Python package utilizes Poetry to manage dependencies and [poethepoet](https://pypi.org/project/poethepoet/) to manage custom build scripts.

 Available scripts are:

 - `poetry run poe index` - Run the Indexing CLI
 - `poetry run poe query` - Run the Query CLI
 - `poetry build` - This invokes `poetry build`, which will build a wheel file and other distributable artifacts.
4 changes: 2 additions & 2 deletions docsite/_includes/page.njk
@@ -107,10 +107,10 @@ title: GraphRAG
 {{link_to("/posts/prompt_tuning/overview/", "Prompt Tuning")}}
 <ul>
 <li>
-{{link_to("/posts/prompt_tuning/auto_prompt_tuning/", "Automatic Templating")}}
+{{link_to("/posts/prompt_tuning/auto_prompt_tuning/", "Auto Tuning")}}
 </li>
 <li>
-{{link_to("/posts/prompt_tuning/manual_prompt_tuning/", "Manual Prompt Tuning")}}
+{{link_to("/posts/prompt_tuning/manual_prompt_tuning/", "Manual Tuning")}}
 </li>
 </ul>
 </li>
Binary file added docsite/img/auto-tune-diagram.png