Feature 122 extend research context #123

ydamit · 2024-06-24T14:02:06Z

Added three new files to handle the Usecase Model, Import Port, and Usecase code for a new feature to allow for extending existing Research Contexts. Since adding new source data to an existing Research Context isn't a practical solution for several reasons, this feature expects that the end user will provide new data sources together with identifying information about the existing Research Context, which the Usecase can then use to create a new Research Context by combining (and deduplicating) the new source data with the source data already present in the provided Research Context.

These files were created by cloning and adapting the corresponding files for the New Research Context feature. Models outside the scope of this training exercise (e.g. View Model) were not updated. A new error was created for the scenario in which no actually new source data is provided (i.e. all provided source data is duplicative of source data already found in the existing Research Context).

alebg · 2024-06-25T21:02:31Z

lib/core/usecase/extend_research_context_usecase.py

+
+            new_client_source_data_list = new_client_source_data_list_dto.data
+
+            if not isinstance(new_client_source_data_list, list):


@maany quick question here: is this check with isinstance OK? Don't know if I'm misremembering, but I think you had some opinion against using this kind of check at runtime

@ydamit so, the DTO you're getting is a pydantic model (see ListSourceDataDTO at lib/core/dto/client_repository_dto.py), so pydantic already enforces at runtime that data is in fact a list. I.e., we don't need this check with isinstance. What you'll need to check instead is that the list is not empty (and thanks to pydantic, if it's not empty, it's guaranteed to be a list of SourceData in particular).

That's pretty convenient! Do we still need it, then, in the New Research Context feature (which is where I copied it from)? Or is there something about that use case that does require it, that's different here?

Oh, my bad, that other one should be a check of a non-empty list too. There's some more cleanup / refactoring to do in the current KP, but we'll tackle that some other day. For now I'll just open a different issue.

alebg · 2024-06-25T21:10:17Z

lib/core/usecase_models/extend_research_context_usecase_models.py

+        description="Title of the new research context to be created to include new data sources."
+    )
+    new_research_context_description: str = Field(description="Description of the research context to be created.")
+    client_sub: str = Field(description="SUB of the client for which the research context is to be created.")


@ydamit There's a repository function in sqla_research_context_repository called get_research_context_client. You can use this one, instead of asking for a client_sub as an input. If you do this, you'll directly get the client you need and you wouldn't need to check if the client provided as an input matches the client of the old research_context

Noted! But since we need client_sub to create new research contexts, I just want to triple-check: we never want to allow one SUB to extend a research context originally created by another SUB?

Actually, since we're at it, you're right, we might want this eventually. If we implement this, then we'll change your check for some sort of permission check. So no problem, please just leave this as is!

alebg

@ydamit Great job! Looks organized enough. Apart from the inline comment I made, please:

Please provide a smaller amount of commits that are more conceptually unified. Navigating this git history will be kinda messy as it is right now. We don't have a specific standard at the moment (but we might soon), so please just take inspiration from our previous commits. If you're comfortable with git, to sort this out, please use git rebase to squash your commits into more organized ones (anything between 1 and 3 commits should be enough for all you've done so far). Then, force push back to this branch. If you don't feel comfortable enough with git to do this (rebasing + force pushes are potentially destructive of your work), let me know and we'll do it over a call
Also please provide a test for this usecase. We haven't done many usecase-specific tests, but since this one doesn't have new repositories, it is worth it. The usecase tests are at tests/api. The test_new_message_feature.py contains a substantial usecase test you can check to find your way about instantiating what you need. Let me know if you have any questions about this
Once these changes are done, I'll show you how to implement the rest of the chain (presenter, controller, and API endpoint) in a call. You'll need to implement those in this same branch before we accept this PR

ydamit · 2024-06-25T21:43:29Z

Sounds good, @alebg , I'll work on all this tomorrow!

…Usecase

…reqs for feature test

alebg · 2024-06-28T18:56:31Z

@ydamit Great job!!! next up would be implementing the rest to have the whole feature

…re Container

tests/api/test_extend_research_context_feature.py

ydamit · 2024-07-01T22:52:43Z

@alebg , it took me until it was pretty late for you to figure out the flaw in my logic and fix everything. But it should be done now! All tests passing, and no references remaining anywhere to any component of the New Research Context feature!

alebg · 2024-07-02T14:57:14Z

@ydamit looks great! Only piece missing that we discussed yesterday is a test to check if you try to "extend" an existing research context but not providing truly new source data IDs. I ran locally the test below (appended to the file with your tests, it's just your usecase success test with a slight modification towards the end), and it passed, so that means the code is handling that case properly now. But just please add a test for this error case. Mine was like this, but feel free to change it if you wish, and then commit it. After that we're good to merge!:

def test_error_extend_research_context_usecase_no_new_source_data(
    app_initialization_container: ApplicationContainer,
    db_session: TDatabaseFactory,
    fake: Faker,
    fake_client_with_research_context_and_sources: SQLAClient,  # Fake client with Research Context
    fake_source_data_list: list[SQLAResearchContext],
) -> None:
    usecase: ExtendResearchContextUseCase = app_initialization_container.extend_research_context_feature.usecase()

    assert usecase is not None

    client_with_context = fake_client_with_research_context_and_sources
    llm_name = fake.name()

    existing_research_context = random.choice(client_with_context.research_contexts)
    # Populate source data into the existing research context
    existing_source_data_ids = []
    for source_datum in client_with_context.source_data:
        existing_research_context.source_data.append(source_datum)
        existing_source_data_ids.append(source_datum.id)

    incoming_source_data_list = fake_source_data_list
    new_source_data_list: List[SQLASourceData] = []
    for source_data in incoming_source_data_list:
        while source_data.id is None or source_data.id in existing_source_data_ids + new_source_data_list:
            source_data.id = random.randint(1001, 2000)
        new_source_data_list.append(source_data)

    client_with_context.source_data.extend(new_source_data_list)
    new_source_data_ids = [sd.id for sd in new_source_data_list]
    assert new_source_data_ids != []

    llm = SQLALLM(
        llm_name=llm_name,
        research_contexts=client_with_context.research_contexts,
    )

    # Make title unique to query later
    existing_research_context_title = f"{existing_research_context.title}-{uuid.uuid4()}"
    existing_research_context.title = existing_research_context_title

    with db_session() as session:
        client_with_context.save(session=session, flush=True)
        session.commit()

    with db_session() as session:
        queried_existing_research_context = (
            session.query(SQLAResearchContext).filter_by(title=existing_research_context_title).first()
        )

        assert queried_existing_research_context is not None
        queried_existing_source_data_list = queried_existing_research_context.source_data
        queried_existing_source_data_ids = [sd.id for sd in queried_existing_source_data_list]

        queried_client = session.get(SQLAClient, queried_existing_research_context.client_id)

        assert queried_client is not None

        # Details for new Research Context data
        new_research_context_title = f"{fake.name().replace(' ', '_')}-{uuid.uuid4()}"
        new_research_context_description = f"{fake.text()}-{uuid.uuid4()}"

        request = ExtendResearchContextRequest(
            new_research_context_title=new_research_context_title,
            new_research_context_description=new_research_context_description,
            client_sub=queried_client.sub,
            llm_name=llm_name,
            new_source_data_ids=queried_existing_source_data_ids,
            existing_research_context_id=queried_existing_research_context.id,
        )
        response = usecase.execute(request=request)

        assert response is not None
        assert isinstance(response, ExtendResearchContextError)

alebg

Great job @ydamit ! Merging this :)

tests/api/test_extend_research_context_feature.py

ydamit requested review from alebg June 24, 2024 19:06

alebg reviewed Jun 25, 2024

View reviewed changes

alebg requested changes Jun 25, 2024

View reviewed changes

ydamit force-pushed the feature-122-extend_research_context branch 3 times, most recently from 972c18d to f603aa1 Compare June 26, 2024 15:46

Extend Research Context Feature: Usecase Models, Primary Input Port, …

937db2c

…Usecase

ydamit force-pushed the feature-122-extend_research_context branch from 82ee9c3 to 937db2c Compare June 26, 2024 15:47

ydamit added 3 commits June 26, 2024 14:44

Extend Research Context Feature: update containers and YAML with min …

2fa4c6e

…reqs for feature test

Extend Research Context Feature: Initial tests

3c11511

Extend Research Context Feature: Additional changes to satisfy testing

0dfcb64

ydamit force-pushed the feature-122-extend_research_context branch from 6fc208e to 0dfcb64 Compare June 28, 2024 00:12

ydamit added 2 commits June 28, 2024 18:18

Extend Research Context Feature: Presenter, Controller, config, Featu…

612966a

…re Container

Extend Research Context Feature: Controller Test

5a67594

ydamit commented Jul 1, 2024

View reviewed changes

tests/api/test_extend_research_context_feature.py Show resolved Hide resolved

ydamit commented Jul 1, 2024

View reviewed changes

tests/api/test_extend_research_context_feature.py Show resolved Hide resolved

ydamit added 2 commits July 1, 2024 18:25

Extend Research Context Feature: test and error improvements

1fc26ee

Extend Research Context Feature: View model and fastAPI endpoint config

2517f70

Extend Research Context Feature: test redundant source data error

b0abd5f

ydamit force-pushed the feature-122-extend_research_context branch from 8238b56 to b0abd5f Compare July 3, 2024 16:22

maany approved these changes Jul 4, 2024

View reviewed changes

alebg approved these changes Jul 4, 2024

View reviewed changes

tests/api/test_extend_research_context_feature.py Show resolved Hide resolved

alebg merged commit 2db6843 into dream-aim-deliver:main Jul 4, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature 122 extend research context #123

Feature 122 extend research context #123

ydamit commented Jun 24, 2024

alebg Jun 25, 2024

alebg Jun 25, 2024

ydamit Jun 25, 2024

alebg Jun 26, 2024

alebg Jun 25, 2024

ydamit Jun 25, 2024

alebg Jun 25, 2024

alebg left a comment

ydamit commented Jun 25, 2024

alebg commented Jun 28, 2024

ydamit commented Jul 1, 2024

alebg commented Jul 2, 2024

alebg left a comment


		new_client_source_data_list = new_client_source_data_list_dto.data

		if not isinstance(new_client_source_data_list, list):

Feature 122 extend research context #123

Feature 122 extend research context #123

Conversation

ydamit commented Jun 24, 2024

alebg Jun 25, 2024

Choose a reason for hiding this comment

alebg Jun 25, 2024

Choose a reason for hiding this comment

ydamit Jun 25, 2024

Choose a reason for hiding this comment

alebg Jun 26, 2024

Choose a reason for hiding this comment

alebg Jun 25, 2024

Choose a reason for hiding this comment

ydamit Jun 25, 2024

Choose a reason for hiding this comment

alebg Jun 25, 2024

Choose a reason for hiding this comment

alebg left a comment

Choose a reason for hiding this comment

ydamit commented Jun 25, 2024

alebg commented Jun 28, 2024

ydamit commented Jul 1, 2024

alebg commented Jul 2, 2024

alebg left a comment

Choose a reason for hiding this comment