-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add path as an option for uploading attachments #1331
Conversation
attachments = { | ||
a: v for a, v in self.attachments.items() if isinstance(v, tuple) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was done strictly for typing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is v ever not a tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so might be worth doing in a for loop and raising an error if not. I think would be unexpected. But mypy probably knows better than me, so there might be a case we're not handling here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v is never not a tuple here, this is just because of the union types I believe and it thinks that something could be passed here that is never passed
class Config: | ||
"""Configuration class for the schema.""" | ||
|
||
arbitrary_types_allowed = True | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed to allow the AttachmentInfo
acc_parts.append( | ||
( | ||
f"attachment.{op.id}.{n}", | ||
( | ||
None, | ||
open(file_path, "rb"), # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we not need to read this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, I believe the multipart endpoint allows a buffer to be passed (I also added a test that should cover this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty cve worthy so we should be careful about cases where this is allowed/encouraged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion would be to require either an environment variable of LANGSMITH_DANGEROUSLY_ALLOW_FILESYSTEM or some client init variable saying this. Otherwise throw an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to explicitly close
the files the are opened here, using some try, finally
pattern. I would structure the code such that the multipart form is assembled and sent within the same method we are opening the file handles:
try:
# Prepare the files for the multipart request
# Make the POST request
finally:
# Explicitly close the files
@@ -370,7 +368,9 @@ class RunBase(BaseModel): | |||
tags: Optional[List[str]] = None | |||
"""Tags for categorizing or annotating the run.""" | |||
|
|||
attachments: Attachments = Field(default_factory=dict) | |||
attachments: Union[Attachments, Dict[str, AttachmentInfo]] = Field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this related to this pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah need this for the changes in read_run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left comments. I think main issue is the arbitrary file read, which I could be convinced isn't a cve if we feel data exfiltration to langsmith is by design. But it feels moderately easy to construct a payload as a user of an app that would make the server stream /etc/passwd to whichever langsmith url you have, which is unideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add the flag for runs (creation and updating)
acc_parts.append( | ||
( | ||
f"attachment.{op.id}.{n}", | ||
( | ||
None, | ||
open(file_path, "rb"), # type: ignore[arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to explicitly close
the files the are opened here, using some try, finally
pattern. I would structure the code such that the multipart form is assembled and sent within the same method we are opening the file handles:
try:
# Prepare the files for the multipart request
# Make the POST request
finally:
# Explicitly close the files
@@ -377,6 +378,41 @@ def test_persist_update_run(langchain_client: Client) -> None: | |||
langchain_client.delete_project(project_name=project_name) | |||
|
|||
|
|||
def test_update_run_attachments(langchain_client: Client) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to add a test case for traceable
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added one in test_runs
because that is where the previous traceable tests were
try: | ||
file.close() | ||
except Exception: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably good to log something here
f"trace={op.trace_id},id={op.id}", | ||
else: | ||
file_size = os.path.getsize(data) | ||
file = open(data, "rb") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we check dangerously_use_filesystem
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We check it before hand (in client.py). Everywhere this function gets called, dangerously_use_filesystem
has already been checked (you should verify this though, I have double checked and am 99% certain but another pass is a good idea)
From the Rust bindings, import the `_serialize_json` function from `langsmith._internal._serde`, then use it as the default fallback if `orjson` serialization can't handle some object. This makes the Rust serialization code equivalent to the `_orjson.dumps()` call inside `langsmith._internal._serde.dumps_json`. I will handle UTF surrogate characters in a subsequent PR.
Co-authored-by: Erick Friis <erick@langchain.dev>
No description provided.