-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Skip resolve_timestamp_path
when using --data in local search
#1049
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Improved local search logic by avoiding redundant execution of `resolve_timestamp_path` when `--data` argument is specified * Prevented potential errors caused by repeated calls ---------
AlonsoGuevara
assigned AlonsoGuevara, dworthen and KennyZhang1 and unassigned AlonsoGuevara
Aug 29, 2024
4 tasks
KylinMountain
pushed a commit
to KylinMountain/graphrag-server
that referenced
this pull request
Sep 12, 2024
* Consistent config load_config - Provide a consistent way to load configuration - Resolve potential timestamp directories upfront upon config object creation - Add unit tests for resolving timestamp directories - Resolves microsoft#599 - Resolves microsoft#1049 * fix formatting issues * remove unnecessary path resolution * fix smoke tests * update prompts to use load_config * Update none checks * Update none checks * Update searching for config method signature * Update unit tests * fix formatting issues
KylinMountain
added a commit
to KylinMountain/graphrag-server
that referenced
this pull request
Sep 12, 2024
* Bump micromatch from 4.0.5 to 4.0.8 in /docsite (microsoft#1013) Bumps [micromatch](https://github.com/micromatch/micromatch) from 4.0.5 to 4.0.8. - [Release notes](https://github.com/micromatch/micromatch/releases) - [Changelog](https://github.com/micromatch/micromatch/blob/4.0.8/CHANGELOG.md) - [Commits](micromatch/micromatch@4.0.5...4.0.8) --- updated-dependencies: - dependency-name: micromatch dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump ruff from 0.5.7 to 0.6.2 (microsoft#1014) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.5.7 to 0.6.2. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.5.7...0.6.2) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com> * Ensure entity types to be str in prompt tune (microsoft#1015) * Fix weight casting during graph extraction (microsoft#1016) * Fix weight casting during graph extraction * Format * Format * Update developer guide (microsoft#1029) * Add missing config parameter for prompt tuning docs (microsoft#1017) * Improve search type hint (microsoft#1031) * update get_local_search_engine and get_global_search_engine return annotation * add semversioner file * reorder imports * fix pyright errors * revert change and ignore previous pyright error --------- Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai> Co-authored-by: longyunfeigu <2514553187@qq.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com> * Patch "past" dependency issues (microsoft#1033) * Patch "past" dependency issues * Semver * Release v0.3.2 (microsoft#1034) * Update VectorStoreSearchResult score value range (microsoft#937) update VectorStoreSearchResult score comment Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai> Co-authored-by: Alonso Guevara <alonsog@microsoft.com> * Add source URL to the package (microsoft#927) Co-authored-by: Josh Bradley <joshbradley@microsoft.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com> * Fix/text unit code cleanup (microsoft#1040) * Optimized _build_text_unit_context function for improved time and space complexity Refactored the _build_text_unit_context function to enhance its performance and efficiency. Key optimizations include: 1. Set for Text Unit IDs: Replaced list-based membership checks with a set (text_unit_ids_set) to achieve constant-time complexity for membership checks, reducing overall time complexity. 2. Direct Attribute Removal: Utilized pop with a default value (None) to directly remove attributes entity_order and num_relationships from text units, minimizing overhead and avoiding potential KeyError. 3. Default Dictionary for Entity Orders: Implemented defaultdict for managing entity orders, simplifying the ranking process and improving readability. These improvements result in a more efficient function with better performance, especially when handling large datasets or numerous selected entities. The refactoring ensures that the core functionality remains unchanged while enhancing both time and space complexity. * Format * Ruff fixes * semver --------- Co-authored-by: arjun-234 <arjun.darji@yudiz.com> Co-authored-by: Arjun D. <103405661+arjun-234@users.noreply.github.com> * Fix INIT_YAML embeddings default settings (microsoft#1039) Co-authored-by: Thanh Long Phan <long.phan@dida.do> Co-authored-by: Alonso Guevara <alonsog@microsoft.com> * Bump pytest-asyncio from 0.23.8 to 0.24.0 (microsoft#1022) Bumps [pytest-asyncio](https://github.com/pytest-dev/pytest-asyncio) from 0.23.8 to 0.24.0. - [Release notes](https://github.com/pytest-dev/pytest-asyncio/releases) - [Commits](pytest-dev/pytest-asyncio@v0.23.8...v0.24.0) --- updated-dependencies: - dependency-name: pytest-asyncio dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com> * Bump json-repair from 0.26.0 to 0.28.4 (microsoft#1044) Bumps [json-repair](https://github.com/mangiucugna/json_repair) from 0.26.0 to 0.28.4. - [Release notes](https://github.com/mangiucugna/json_repair/releases) - [Commits](mangiucugna/json_repair@0.26.0...v0.28.4) --- updated-dependencies: - dependency-name: json-repair dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump lancedb from 0.11.0 to 0.12.0 (microsoft#1024) Bumps [lancedb](https://github.com/lancedb/lancedb) from 0.11.0 to 0.12.0. - [Release notes](https://github.com/lancedb/lancedb/releases) - [Changelog](https://github.com/lancedb/lancedb/blob/main/release_process.md) - [Commits](lancedb/lancedb@python-v0.11.0...python-v0.12.0) --- updated-dependencies: - dependency-name: lancedb dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump textual from 0.76.0 to 0.78.0 (microsoft#1038) Bumps [textual](https://github.com/Textualize/textual) from 0.76.0 to 0.78.0. - [Release notes](https://github.com/Textualize/textual/releases) - [Changelog](https://github.com/Textualize/textual/blob/main/CHANGELOG.md) - [Commits](Textualize/textual@v0.76.0...v0.78.0) --- updated-dependencies: - dependency-name: textual dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix/entity extraction strategy (microsoft#1046) * fix strategy config in entity_extraction * update init content --------- Co-authored-by: KylinMountain <kose2livs@gmail.com> * fix for issue 515 (microsoft#925) * fix for issue 515 * semver impact document --------- Co-authored-by: Kanishk Tyagi <kanishktyagi@Kanishks-MacBook-Pro.local> Co-authored-by: Alonso Guevara <alonsog@microsoft.com> * docs: update manual_prompt_tuning.md (microsoft#963) paramater -> parameter Co-authored-by: Alonso Guevara <alonsog@microsoft.com> * Update indexer_adapters.py (microsoft#895) Update the lines 71 and 72 before: entity_df["community"] = entity_df["community"].fillna(-1) entity_df["community"] = entity_df["community"].astype(int) after: entity_df.loc[:, "community"] = entity_df["community"].fillna(-1) entity_df.loc[:, "community"] = entity_df["community"].astype(int) Co-authored-by: Alonso Guevara <alonsog@microsoft.com> * Fix circular dependency on prompt tune api (microsoft#1054) * Bump notebook from 7.2.1 to 7.2.2 (microsoft#1055) Bumps [notebook](https://github.com/jupyter/notebook) from 7.2.1 to 7.2.2. - [Release notes](https://github.com/jupyter/notebook/releases) - [Changelog](https://github.com/jupyter/notebook/blob/@jupyter-notebook/tree@7.2.2/CHANGELOG.md) - [Commits](https://github.com/jupyter/notebook/compare/@jupyter-notebook/tree@7.2.1...@jupyter-notebook/tree@7.2.2) --- updated-dependencies: - dependency-name: notebook dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com> * Bump jupyterlab from 4.2.4 to 4.2.5 (microsoft#1056) Bumps [jupyterlab](https://github.com/jupyterlab/jupyterlab) from 4.2.4 to 4.2.5. - [Release notes](https://github.com/jupyterlab/jupyterlab/releases) - [Changelog](https://github.com/jupyterlab/jupyterlab/blob/@jupyterlab/lsp@4.2.5/CHANGELOG.md) - [Commits](https://github.com/jupyterlab/jupyterlab/compare/@jupyterlab/lsp@4.2.4...@jupyterlab/lsp@4.2.5) --- updated-dependencies: - dependency-name: jupyterlab dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update Prompt Tuning docs (microsoft#1057) * Update Prompt Tuning docs * Semver * Update bash example in docs for prompt tune (microsoft#1059) * Semver * Update bash command * Fix img for autotune (microsoft#1060) * Fix img for autotune * Add line breaks to tune docs * More line breaks * Fix img width (microsoft#1061) * Consistent config load_config (microsoft#1065) * Consistent config load_config - Provide a consistent way to load configuration - Resolve potential timestamp directories upfront upon config object creation - Add unit tests for resolving timestamp directories - Resolves microsoft#599 - Resolves microsoft#1049 * fix formatting issues * remove unnecessary path resolution * fix smoke tests * update prompts to use load_config * Update none checks * Update none checks * Update searching for config method signature * Update unit tests * fix formatting issues * fix setting base_dir to full paths when not using file system. (microsoft#1096) * fix setting base_dir to full paths when not using file system. * add general resolve_path * Clean and organize run index code (microsoft#1090) * Create entypoint for cli and api (microsoft#1067) * Add cli and api entrypoints for update index * Semver * Update docs * Run tests on feature branch main * Better /main handling in tests * Clean and organize run index code * Ruff fix * Pyright fix * Format fixes * Pyright fix * Format * Fix integ tests * Fix ruff * Reorganize and clean up * Load query from blob (microsoft#1095) * Moved query loading from file to helper function * added loading parquets from blob to function * resolved adlfs async error * debugging cleanup and small fixes * added connection string support * semversioner and ruff fixes * completed testing for merge with main * more ruff changes * fixed unbound vars warning * rewrote function to use storage utils * removed unused vars --------- Co-authored-by: Kenny Zhang <zhangken@microsoft.com> * Update create_pipeline_config.py (microsoft#1108) * Update create_pipeline_config.py Order switched to ensure that user settings at runtime take precedence. * Updated semversioner. * release v0.3.3 (microsoft#1116) * Deep copy txt units on local search to avoid race conditions (microsoft#1118) * Deep copy txt units on local search to avoid race conditions * Format * Fix summarization including empty descriptions (microsoft#1124) * Fix summarization including empty descriptions * Update * Release v0.3.4 (microsoft#1125) --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alonso Guevara <alonsog@microsoft.com> Co-authored-by: Josh Bradley <joshbradley@microsoft.com> Co-authored-by: wanhua.gu <wanhua.gu@wiz.ai> Co-authored-by: longyunfeigu <2514553187@qq.com> Co-authored-by: Konstantin Gukov <gukkos@gmail.com> Co-authored-by: arjun-234 <arjun.darji@yudiz.com> Co-authored-by: Arjun D. <103405661+arjun-234@users.noreply.github.com> Co-authored-by: TLP <104315397+TLongP@users.noreply.github.com> Co-authored-by: Thanh Long Phan <long.phan@dida.do> Co-authored-by: fantom845 <77169323+fantom845@users.noreply.github.com> Co-authored-by: Kanishk Tyagi <kanishktyagi@Kanishks-MacBook-Pro.local> Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com> Co-authored-by: guangxiangdebizi <154864206+guangxiangdebizi@users.noreply.github.com> Co-authored-by: Derek Worthen <worthend.derek@gmail.com> Co-authored-by: KennyZhang1 <90438893+KennyZhang1@users.noreply.github.com> Co-authored-by: Kenny Zhang <zhangken@microsoft.com> Co-authored-by: Doug Orbaker <107270698+dorbaker@users.noreply.github.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
The errors during the indexing process that led to rebuilding once again have been quite frustrating, so I specified the
--resume
parameter in the CLI to avoid repeatedly consuming tokens to build the index. However, instead of using the timestamp in the example, I customized a name, like acc (since there are too many previously generated timestamp directories under this directory, a custom name makes it more noticeable). After successfully completing the index build, I attempted to use global search with the--data
parameter, and it worked well. However, when I used local search with the same--data
parameter, I received a ValueError.For example, the CLI command is as follows:
The ValueError originates from graphrag/config/resolve_timestamp_path.py at lines 76-78:
This is very confusing since I didn’t use a timestamp as the value for the
--data
parameter, so why is it prompting that none were found?Upon inspecting the code, I discovered that in the method
_configure_paths_and_settings
within the graphrag/query/cli.py file, a conditional check is already performed based on the--data
parameter:Further on, the local search methods in both graphrag/query/cli.py and graphrag/query/api.py repeatedly call
resolve_timestamp_path
, which led to the ValueError mentioned earlier.As a result, I made modifications to the code and submitted this PR.
I modified the local_search function’s parameters to pass in the data_path parameter, which was previously handled in
_configure_paths_and_settings
in cli.py, asresolved_base_dir
. This is only used for thelancedb_dir
variable.Related Issues
I did not find any related issues.
Checklist
Thank you for taking the time to review this PR.
I appreciate your consideration of these changes.