neo4j_graphrag.exceptions.SchemaExtractionError: LLM response is not valid JSON when using Ollama with Simple KGPipeline #376

@Frederic-Zhou

Description:
I am trying to use the SimpleKGPipeline to build a knowledge graph using the Ollama LLM for entity and relation extraction. However, I am encountering the following error:
neo4j_graphrag.exceptions.SchemaExtractionError: LLM response is not valid JSON.

Expected Behavior:
I expect that the SimpleKGPipeline will be able to automatically extract the schema from the text and build the knowledge graph successfully.

Actual Behavior:
The SchemaExtractionError is raised due to an invalid LLM response format.

Questions:
1. Is this error occurring because the Ollama model does not support the expected JSON response format for schema extraction?
2. Which Ollama models support schema extraction with valid JSON response formats?
3. Are there any specific configurations or models that I should be using to avoid this issue?

Additional Information:
• I used the qwen3 and Mistral models for the LLM and the bge-m3 model for embeddings, all served through Ollama.
• The issue seems to be related to the schema extraction step where the LLM response is not in the expected JSON format.
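To narrow down whether the raw model output is the cause, a minimal sketch like the one below could be run against the text the model returns. It assumes the response may wrap the JSON in a `<think>` block (which qwen3 can emit in thinking mode) or in Markdown code fences. `extract_json_object` is a hypothetical helper for debugging, not part of neo4j_graphrag.

```python
import json
import re


def extract_json_object(raw: str) -> dict:
    """Best-effort parse of a JSON object from raw LLM output (hypothetical helper)."""
    # Drop reasoning blocks such as <think>...</think>, which some models emit.
    cleaned = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Drop Markdown code-fence markers (three backticks, optionally followed by "json").
    cleaned = re.sub(r"`{3}(?:json)?", "", cleaned)
    # Keep only the span from the first "{" to the last "}".
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in LLM output")
    return json.loads(cleaned[start : end + 1])
```

If this helper parses the output while a plain `json.loads` on the same text fails, the extra wrapping around the JSON is the likely reason for the SchemaExtractionError.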

Any help or insights would be greatly appreciated!

Full Code:

import asyncio
from pathlib import Path
import neo4j
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm import OllamaLLM
from neo4j_graphrag.embeddings import OllamaEmbeddings


# Configuration
NEO4J_URI = "neo4j://127.0.0.1:7687"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "********"
NEO4J_DATABASE = "neo4j"
OLLAMA_BASE_URL = "http://localhost:11434"
LLM_MODEL = "qwen3"
EMBEDDING_MODEL = "bge-m3"


async def build_knowledge_graph(
    text=None, file_path=None, perform_entity_resolution=True
):
    """Build knowledge graph from text or PDF file using automatic schema extraction"""

    # Validate input
    if not text and not file_path:
        raise ValueError("You must provide either text or file_path.")
    if text and file_path:
        raise ValueError("Only one of text or file_path should be provided.")
    if file_path and not Path(file_path).exists():
        raise FileNotFoundError(f"File does not exist: {file_path}")

    # Initialize components
    driver = neo4j.GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))
    llm = OllamaLLM(
        model_name=LLM_MODEL,
        host=OLLAMA_BASE_URL,
        model_params={
            "temperature": 0.1,
            "max_tokens": 2000,
            "response_format": {"type": "json_object"},
        },
    )
    embedder = OllamaEmbeddings(model=EMBEDDING_MODEL, host=OLLAMA_BASE_URL)

    try:
        # Create and run pipeline
        kg_builder = SimpleKGPipeline(
            llm=llm,
            driver=driver,
            embedder=embedder,
            from_pdf=bool(file_path),
            schema="EXTRACTED",
            perform_entity_resolution=perform_entity_resolution,
            neo4j_database=NEO4J_DATABASE,
            on_error="IGNORE",
        )

        # Process input
        if file_path:
            print(f"Processing PDF file: {file_path}")
            await kg_builder.run_async(file_path=str(file_path))
        else:
            print("Processing text content...")
            await kg_builder.run_async(text=text)

        print("Knowledge graph construction completed!")

    finally:
        driver.close()


# Example usage
async def example_usage():
    """Example of how to use the functions"""

    literature_text = """
    Li Bai was a famous poet of the Tang Dynasty, known as the 'Poet Immortal'. His representative works include 'Quiet Night Thought', 'Bring in the Wine', etc. Li Bai's poetry style is bold and unrestrained, and has had a profound impact on later literature.
    """

    await build_knowledge_graph(text=literature_text, perform_entity_resolution=False)


if __name__ == "__main__":
    # Run the example
    asyncio.run(example_usage())
