-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: DIA-1450: Image Classification support in adala #264
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #264 +/- ##
==========================================
+ Coverage 66.49% 67.97% +1.48%
==========================================
Files 47 47
Lines 2462 2498 +36
==========================================
+ Hits 1637 1698 +61
+ Misses 825 800 -25 ☔ View full report in Codecov by Sentry. |
|
||
def split_message_into_chunks( | ||
input_template: str, input_field_types: Dict[str, MessageChunkType], **input_fields | ||
) -> List[MessageChunk]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tempted to ask "why not make the return type Message
?" which leads to what I assume is the reason: this is actually one particular type of Message
, and that's importantly different from the str
version.
Which leads me to: I'm generally uneasy about us having these two different paths, and ideally it'd be nice to just have like a "normalize message" function that includes both get_messages
and split_message_into_chunks
.
Generally, it feels like our pre-processing is getting pretty convoluted, and this layer seems like plenty to push us to "just have one normalize function that defines the permitted inputs, and outputs a single well-understood output type" (which I imagine would be a List of Dicts, each with a "role" and "content" entry. I'm torn on whether this should just be a pydantic model or dataclass or something though bc, while it seems clearly cleaner, I feel like we're going to have an annoying amount of serialization/deserialization overhead at that point).
Happy to hop on a call to discuss all this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, exactly, the multimodal vs text-only input paths still seem importantly different enough to me, I don't want to coerce them all into the same pipeline yet without better understanding any more special handling we'll have to do, nor do I want to port existing text-only inputs from str
to {'type': 'text', 'text': str}
for no reason other than clean code. Definitely feeling the lack of well defined data models here but I'm willing to let this be an "exploration" phase which we follow with a "consolidation" phase
adala/runtimes/_litellm.py
Outdated
current_chunk: Optional[MessageChunk] = None | ||
|
||
def add_to_current_chunk(chunk): | ||
nonlocal current_chunk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing you've tried already, but would definitely like to avoid all this nonlocal business... I'd think we could accomplish the push_current_chunk
behavior by using a generator function and iterating through that? And then for add_to_current_chunk
we could accumulate into a var within the generator function, and empty it after yield
ing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or for add_to_current_chunk
we could just return the new chunk. So like:
def add_to_chunk(current_chunk, additional_chunk):
if current_chunk:
current_chunk["text"] += chunk["text"]
else:
current_chunk = chunk
return current_chunk
Then we don't have to reason about nonlocal state in the loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can just paste the function bodies 2-3x each, just wanted to make the intent of the code clearer by giving them descriptive names there's no algorithmic reason it has to be like this :)
[x] parse input variables to split message based on media type
[x] plumb up vision runtime
[x] generalize error handling
[x] let LabelStudioSkill set input variable types
[x] add tests