-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature 122 extend research context #123
Feature 122 extend research context #123
Conversation
|
||
new_client_source_data_list = new_client_source_data_list_dto.data | ||
|
||
if not isinstance(new_client_source_data_list, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maany quick question here: is this check with isinstance
OK? Don't know if I'm misremembering, but I think you had some opinion against using this kind of check at runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ydamit so, the DTO you're getting is a pydantic model (see ListSourceDataDTO
at lib/core/dto/client_repository_dto.py
), so pydantic already enforces at runtime that data
is in fact a list. I.e., we don't need this check with isinstance
. What you'll need to check instead is that the list is not empty (and thanks to pydantic, if it's not empty, it's guaranteed to be a list of SourceData
in particular).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's pretty convenient! Do we still need it, then, in the New Research Context feature (which is where I copied it from)? Or is there something about that use case that does require it, that's different here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, my bad, that other one should be a check of a non-empty list too. There's some more cleanup / refactoring to do in the current KP, but we'll tackle that some other day. For now I'll just open a different issue.
description="Title of the new research context to be created to include new data sources." | ||
) | ||
new_research_context_description: str = Field(description="Description of the research context to be created.") | ||
client_sub: str = Field(description="SUB of the client for which the research context is to be created.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ydamit There's a repository function in sqla_research_context_repository called get_research_context_client
. You can use this one, instead of asking for a client_sub
as an input. If you do this, you'll directly get the client
you need and you wouldn't need to check if the client
provided as an input matches the client
of the old research_context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted! But since we need client_sub
to create new research contexts, I just want to triple-check: we never want to allow one SUB to extend a research context originally created by another SUB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since we're at it, you're right, we might want this eventually. If we implement this, then we'll change your check for some sort of permission check. So no problem, please just leave this as is!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ydamit Great job! Looks organized enough. Apart from the inline comment I made, please:
-
Please provide a smaller amount of commits that are more conceptually unified. Navigating this git history will be kinda messy as it is right now. We don't have a specific standard at the moment (but we might soon), so please just take inspiration from our previous commits. If you're comfortable with git, to sort this out, please use git rebase to
squash
your commits into more organized ones (anything between 1 and 3 commits should be enough for all you've done so far). Then, force push back to this branch. If you don't feel comfortable enough with git to do this (rebasing + force pushes are potentially destructive of your work), let me know and we'll do it over a call -
Also please provide a test for this usecase. We haven't done many usecase-specific tests, but since this one doesn't have new repositories, it is worth it. The usecase tests are at
tests/api
. Thetest_new_message_feature.py
contains a substantial usecase test you can check to find your way about instantiating what you need. Let me know if you have any questions about this -
Once these changes are done, I'll show you how to implement the rest of the chain (presenter, controller, and API endpoint) in a call. You'll need to implement those in this same branch before we accept this PR
Sounds good, @alebg , I'll work on all this tomorrow! |
972c18d
to
f603aa1
Compare
82ee9c3
to
937db2c
Compare
6fc208e
to
0dfcb64
Compare
@ydamit Great job!!! next up would be implementing the rest to have the whole feature |
@alebg , it took me until it was pretty late for you to figure out the flaw in my logic and fix everything. But it should be done now! All tests passing, and no references remaining anywhere to any component of the New Research Context feature! |
@ydamit looks great! Only piece missing that we discussed yesterday is a test to check if you try to "extend" an existing research context but not providing truly new source data IDs. I ran locally the test below (appended to the file with your tests, it's just your usecase success test with a slight modification towards the end), and it passed, so that means the code is handling that case properly now. But just please add a test for this error case. Mine was like this, but feel free to change it if you wish, and then commit it. After that we're good to merge!: def test_error_extend_research_context_usecase_no_new_source_data(
app_initialization_container: ApplicationContainer,
db_session: TDatabaseFactory,
fake: Faker,
fake_client_with_research_context_and_sources: SQLAClient, # Fake client with Research Context
fake_source_data_list: list[SQLAResearchContext],
) -> None:
usecase: ExtendResearchContextUseCase = app_initialization_container.extend_research_context_feature.usecase()
assert usecase is not None
client_with_context = fake_client_with_research_context_and_sources
llm_name = fake.name()
existing_research_context = random.choice(client_with_context.research_contexts)
# Populate source data into the existing research context
existing_source_data_ids = []
for source_datum in client_with_context.source_data:
existing_research_context.source_data.append(source_datum)
existing_source_data_ids.append(source_datum.id)
incoming_source_data_list = fake_source_data_list
new_source_data_list: List[SQLASourceData] = []
for source_data in incoming_source_data_list:
while source_data.id is None or source_data.id in existing_source_data_ids + new_source_data_list:
source_data.id = random.randint(1001, 2000)
new_source_data_list.append(source_data)
client_with_context.source_data.extend(new_source_data_list)
new_source_data_ids = [sd.id for sd in new_source_data_list]
assert new_source_data_ids != []
llm = SQLALLM(
llm_name=llm_name,
research_contexts=client_with_context.research_contexts,
)
# Make title unique to query later
existing_research_context_title = f"{existing_research_context.title}-{uuid.uuid4()}"
existing_research_context.title = existing_research_context_title
with db_session() as session:
client_with_context.save(session=session, flush=True)
session.commit()
with db_session() as session:
queried_existing_research_context = (
session.query(SQLAResearchContext).filter_by(title=existing_research_context_title).first()
)
assert queried_existing_research_context is not None
queried_existing_source_data_list = queried_existing_research_context.source_data
queried_existing_source_data_ids = [sd.id for sd in queried_existing_source_data_list]
queried_client = session.get(SQLAClient, queried_existing_research_context.client_id)
assert queried_client is not None
# Details for new Research Context data
new_research_context_title = f"{fake.name().replace(' ', '_')}-{uuid.uuid4()}"
new_research_context_description = f"{fake.text()}-{uuid.uuid4()}"
request = ExtendResearchContextRequest(
new_research_context_title=new_research_context_title,
new_research_context_description=new_research_context_description,
client_sub=queried_client.sub,
llm_name=llm_name,
new_source_data_ids=queried_existing_source_data_ids,
existing_research_context_id=queried_existing_research_context.id,
)
response = usecase.execute(request=request)
assert response is not None
assert isinstance(response, ExtendResearchContextError) |
8238b56
to
b0abd5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job @ydamit ! Merging this :)
Added three new files to handle the Usecase Model, Import Port, and Usecase code for a new feature to allow for extending existing Research Contexts. Since adding new source data to an existing Research Context isn't a practical solution for several reasons, this feature expects that the end user will provide new data sources together with identifying information about the existing Research Context, which the Usecase can then use to create a new Research Context by combining (and deduplicating) the new source data with the source data already present in the provided Research Context.
These files were created by cloning and adapting the corresponding files for the New Research Context feature. Models outside the scope of this training exercise (e.g. View Model) were not updated. A new error was created for the scenario in which no actually new source data is provided (i.e. all provided source data is duplicative of source data already found in the existing Research Context).