[Bug]: TypeError: ColbertIndex._build_index_from_nodes() got an unexpected keyword argument 'index_name' #14398

gavinHuang · 2024-06-26T06:53:16Z

Bug Description

I am trying to create a ColbertIndex in two ways:

colBertIndex = ColbertIndex(nodes=nodes) and
ColbertIndex.from_documents(documents=docs)

Both raise the same error:

ColbertIndex._build_index_from_nodes() got an unexpected keyword argument 'index_name'

I checked BaseIndex in llama-index-core/llama_index/core/indices/base.py.
Neither its __init__ function nor build_index_from_nodes has taken index_name as a parameter at any point in the file's history.

Version

0.10.30

Steps to Reproduce

I am using the following packages (llama-index-core was 0.10.30; I have since updated to the latest release, which is 0.10.50 at the time of writing):

llama-index
llama-index-embeddings-azure-openai
llama-index-llms-azure-openai
llama-index-embeddings-huggingface
llama-index-embeddings-instructor
llama-index-readers-file
llama-index-indices-managed-colbert
colbert-ir

Extra installation via the conda pytorch channel:
pytorch-cpu=1.13

Then run the following code:

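import pathlib

import pandas as pd
from llama_index.core import Document
from llama_index.indices.managed.colbert import ColbertIndex

# Load the product catalogue and drop duplicate rows
df = pd.read_csv(pathlib.Path("./data/products.csv"), sep=",").drop_duplicates()
rows = df.to_dict("records")

# Build one Document per product row
nodes = []
for row in rows:
    doc = Document(
        text=row["DISPLAYNAME"],
        metadata={"ITEM_NUMBER": row["ITEM_NUMBER"], "DESCRIPTION": row["DESCRIPTION"]},
    )
    nodes.append(doc)

# This call raises the TypeError shown in the traceback
colBertIndex = ColbertIndex.from_documents(documents=nodes)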

Relevant Logs/Tracebacks

File "/mnt/c/Users/user_name/projects/GenAI/env_name/retriever.py", line 110, in <module>
    load_products()
  File "/mnt/c/Users/user_name/projects/GenAI/env_name/retriever.py", line 54, in load_products
    colBertIndex = ColbertIndex.from_documents(documents=nodes)
  File "/home/user_name/miniconda3/envs/env_name/lib/python3.10/site-packages/llama_index/core/indices/base.py", line 145, in from_documents
    return cls(
  File "/home/user_name/miniconda3/envs/env_name/lib/python3.10/site-packages/llama_index/indices/managed/colbert/base.py", line 82, in __init__
    super().__init__(
  File "/home/user_name/miniconda3/envs/env_name/lib/python3.10/site-packages/llama_index/core/indices/base.py", line 94, in __init__
    index_struct = self.build_index_from_nodes(
  File "/home/user_name/miniconda3/envs/env_name/lib/python3.10/site-packages/llama_index/core/indices/base.py", line 216, in build_index_from_nodes
    return self._build_index_from_nodes(nodes, **build_kwargs)
TypeError: ColbertIndex._build_index_from_nodes() got an unexpected keyword argument 'index_name'

dosubot · 2024-06-26T07:02:24Z

@gavinHuang Hello there! I'm here to assist you with any bugs, questions, or contributions. Let's work together to solve this issue.

The TypeError you're encountering is due to the index_name parameter being passed to the _build_index_from_nodes method, which does not accept it as an argument. This issue arises because the ColbertIndex class constructor includes index_name as a parameter, but the _build_index_from_nodes method does not.

To resolve this issue, you need to ensure that the index_name parameter is not passed to the _build_index_from_nodes method. Here is a modified version of your code that should work:

import pandas as pd
from llama_index.core.schema import TextNode
from llama_index.indices.managed.colbert import ColbertIndex

# Load data
df = pd.read_csv("./data/products.csv", sep=",").drop_duplicates()
rows = df.to_dict("records")

# Convert rows to nodes; TextNode is used because BaseNode is abstract
nodes = []
for row in rows:
    node = TextNode(
        id_=str(row["ITEM_NUMBER"]),
        text=row["DISPLAYNAME"],
        metadata={"DESCRIPTION": row["DESCRIPTION"]},
    )
    nodes.append(node)

# Create ColbertIndex using nodes
colBertIndex = ColbertIndex(nodes=nodes)

Ensure that you have the necessary dependencies installed, including the colbert-ir library. If you need to convert documents to BaseNode objects, you can use the following approach:

from llama_index.core.schema import TextNode
from llama_index.indices.managed.colbert import ColbertIndex

# Assuming `docs` is a list of document contents
docs = ["Document content 1", "Document content 2"]

# Convert documents to nodes; TextNode is used because BaseNode is abstract
nodes = [TextNode(id_=str(i), text=doc) for i, doc in enumerate(docs)]

# Create ColbertIndex using nodes
colBertIndex = ColbertIndex(nodes=nodes)

This should help you avoid the TypeError by ensuring that the index_name parameter is not incorrectly passed to the _build_index_from_nodes method ^[1]^[2].

To continue talking to Dosu, mention @dosu.

gavinHuang · 2024-06-26T07:33:30Z

I understand this is a bot-generated post, but it missed the point.
No one is passing the index_name parameter except the ColbertIndex class itself, via PR #9656, commit af2910e.
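
To illustrate the mechanism, here is a minimal sketch that reproduces the same kind of error without installing llama-index. ToyBaseIndex and ToyColbertIndex are hypothetical stand-ins, not the library's classes, and the sketch assumes the parent forwards its extra keyword arguments to the build step, as the traceback suggests.

```python
from typing import Any, Dict, List


class ToyBaseIndex:
    """Hypothetical stand-in for the parent index class."""

    def __init__(self, nodes: List[str], **kwargs: Any) -> None:
        # Assumption: extra keyword arguments are forwarded to the build step.
        self.index_struct = self.build_index_from_nodes(nodes, **kwargs)

    def build_index_from_nodes(self, nodes: List[str], **build_kwargs: Any) -> Dict[str, str]:
        return self._build_index_from_nodes(nodes, **build_kwargs)

    def _build_index_from_nodes(self, nodes: List[str], **build_kwargs: Any) -> Dict[str, str]:
        raise NotImplementedError


class ToyColbertIndex(ToyBaseIndex):
    """Hypothetical stand-in for the subclass that triggers the error."""

    def __init__(self, nodes: List[str], index_name: str = "") -> None:
        self.index_name = index_name
        # index_name is passed up as a keyword, so it lands in the parent's **kwargs.
        super().__init__(nodes=nodes, index_name=index_name)

    def _build_index_from_nodes(self, nodes: List[str]) -> Dict[str, str]:
        # The override accepts only `nodes`, so the forwarded keyword is rejected.
        return {str(i): node for i, node in enumerate(nodes)}


def reproduce() -> str:
    """Return the TypeError message raised by constructing the toy index."""
    try:
        ToyColbertIndex(nodes=["a", "b"])
    except TypeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(reproduce())
```

The caller never mentions index_name here either; the subclass constructor injects it, which is why changing the calling code cannot avoid the error.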

dosubot · 2024-06-26T07:43:07Z

The TypeError you're encountering is due to the index_name parameter being passed internally by the ColbertIndex class itself, not by your code. This issue was introduced in a recent commit and pull request.

To resolve this, you need to ensure that the index_name parameter is not passed to the _build_index_from_nodes method. Here is the relevant part of the ColbertIndex class:

def _build_index_from_nodes(self, nodes: Sequence[BaseNode]) -> IndexDict:
    """Generate a PLAID index from the ColBERT checkpoint via its hugging face model_name."""
    from colbert import Indexer, Searcher
    from colbert.infra import ColBERTConfig, Run, RunConfig

    index_struct = IndexDict()

    docs_list = []
    for i, node in enumerate(nodes):
        docs_list.append(node.get_content())
        self._docs_pos_to_node_id[i] = node.node_id
        index_struct.add_node(node, text_id=str(i))

    with Run().context(
        RunConfig(index_root=self.index_path, nranks=self.ranks, gpus=self.gpus)
    ):
        config = ColBERTConfig(
            doc_maxlen=self.doc_maxlen,
            query_maxlen=self.query_maxlen,
            nbits=self.nbits,
            kmeans_niters=self.kmeans_niters,
        )
        indexer = Indexer(checkpoint=self.model_name, config=config)
        indexer.index(name=self.index_name, collection=docs_list, overwrite=True)
        self.store = Searcher(
            index=self.index_name, collection=docs_list, checkpoint=self.model_name
        )
    return index_struct

The _build_index_from_nodes method only accepts nodes as a parameter. Ensure that when you call this method, you do not pass index_name or any other unexpected keyword arguments.

Additionally, when initializing the ColbertIndex, you can set the index_name as an attribute of the class, which will be used internally by the _build_index_from_nodes method:

class ColbertIndex(BaseIndex[IndexDict]):
    def __init__(
        self,
        nodes: Optional[Sequence[BaseNode]] = None,
        objects: Optional[Sequence[IndexNode]] = None,
        index_struct: Optional[IndexDict] = None,
        storage_context: Optional[StorageContext] = None,
        model_name: str = "colbert-ir/colbertv2.0",
        index_name: str = "",
        show_progress: bool = False,
        nbits: int = 2,
        gpus: int = 0,
        ranks: int = 1,
        doc_maxlen: int = 120,
        query_maxlen: int = 60,
        kmeans_niters: int = 4,
        **kwargs: Any,
    ) -> None:
        self.model_name = model_name
        self.index_path = "storage/colbert_index"
        self.index_name = index_name
        self.nbits = nbits
        self.gpus = gpus
        self.ranks = ranks
        self.doc_maxlen = doc_maxlen
        self.query_maxlen = query_maxlen
        self.kmeans_niters = kmeans_niters
        self._docs_pos_to_node_id: Dict[int, str] = {}
        super().__init__(
            nodes=nodes,
            index_struct=index_struct,
            index_name=index_name,
            storage_context=storage_context,
            show_progress=show_progress,
            objects=objects,
            **kwargs,
        )

By ensuring that index_name is set during initialization and not passed to _build_index_from_nodes, you can avoid the TypeError ^[1]^[2].
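
As a rough sketch of what that could look like in practice, the two options below either let the override tolerate extra keyword arguments or stop forwarding index_name to the parent. All class names are hypothetical stand-ins rather than llama-index code, and this is not necessarily how the library itself resolves the problem.

```python
from typing import Any, Dict, List


class ToyBaseIndex:
    """Hypothetical parent that forwards extra keyword arguments to the build step."""

    def __init__(self, nodes: List[str], **kwargs: Any) -> None:
        self.index_struct = self._build_index_from_nodes(nodes, **kwargs)

    def _build_index_from_nodes(self, nodes: List[str], **build_kwargs: Any) -> Dict[str, str]:
        raise NotImplementedError


class TolerantIndex(ToyBaseIndex):
    """Option A: the override accepts and ignores extra keyword arguments."""

    def __init__(self, nodes: List[str], index_name: str = "") -> None:
        self.index_name = index_name
        super().__init__(nodes=nodes, index_name=index_name)

    def _build_index_from_nodes(self, nodes: List[str], **build_kwargs: Any) -> Dict[str, str]:
        # build_kwargs may contain index_name; the attribute set in __init__ is used instead.
        return {str(i): node for i, node in enumerate(nodes)}


class NonForwardingIndex(ToyBaseIndex):
    """Option B: keep index_name as an attribute and do not pass it to the parent."""

    def __init__(self, nodes: List[str], index_name: str = "") -> None:
        self.index_name = index_name
        super().__init__(nodes=nodes)

    def _build_index_from_nodes(self, nodes: List[str]) -> Dict[str, str]:
        return {str(i): node for i, node in enumerate(nodes)}


if __name__ == "__main__":
    print(TolerantIndex(["a", "b"], index_name="demo").index_struct)
    print(NonForwardingIndex(["a", "b"], index_name="demo").index_struct)
```

Option A keeps working if the parent later forwards more keywords, while Option B keeps the override's signature strict; either way the change belongs in the index class, not in the calling code.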

To continue talking to Dosu, mention @dosu.

gavinHuang added the bug and triage labels Jun 26, 2024

logan-markewich added the P2 label and removed the triage label Jun 27, 2024

tibor-reiss mentioned this issue Jul 2, 2024

Fix: follow up with kwargs propagation due to change in parent class #14522

Merged

20 tasks

logan-markewich closed this as completed in #14522 Jul 3, 2024

Aaronchangji mentioned this issue Jul 8, 2024

[Bug]: KnowledgeGraphIndex._build_index_from_nodes() got an unexpected keyword argument 'space_name' #14629

Open

marchburg mentioned this issue Aug 7, 2024

Fix Colbert Indexing with Llama Index climateandtech/factchecker#18

Open

3 tasks
