Question: How can we efficiently retrieve existing annotation data by searching based on key and value? #32

Open
tenzin3 opened this issue Oct 8, 2024 · 1 comment
Assignees
Labels
question Further information is requested

Comments


tenzin3 commented Oct 8, 2024

# If the annotation data already exists, reuse it; otherwise create a new entry with a new ID.
prepared_ann_data = []
for k, v in ann_data.items():
    try:
        # Look up existing annotation data with the same set, key, and value.
        ann_datas = list(ann_store.data(set=ann_dataset.id(), key=k, value=v))
        prepared_ann_data.append(ann_datas[0])
    except Exception:  # the lookup raised (e.g. unknown key) or returned no match
        prepared_ann_data.append(
            {"id": get_uuid(), "set": ann_dataset.id(), "key": k, "value": v}
        )

ann_store.annotate(target=text_selector, data=prepared_ann_data, id=get_uuid())

In ann_data, we have annotation data that we want to associate with an annotation. We aim to avoid creating a new annotation data entry with a new ID if it already exists. If annotation data with the same key and value is already present, we want to link it to the incoming annotation instead of duplicating it. The current code works, but I wanted to know if there's a better solution using the STAM API.

Apparently, if the key doesn't exist in the annotation data set, the lookup throws an error.

Collaborator

proycon commented Oct 9, 2024

STAM will already do something similar internally, assigning a new random ID for the annotation data if it is new, and reusing the existing one if not, so you can just pass something like:

ann_store.annotate(
    target=text_selector,
    data=[
        {"set": ann_dataset.id(), "key": k, "value": v},
        {"set": ann_dataset.id(), "key": k2, "value": v2},
    ],
    id=get_uuid(),
)

Note that I omitted the AnnotationData ID here, which means an ID will be assigned automatically. STAM assigns a random 21-character nanoid rather than a UUID, as that takes less space; see https://crates.io/crates/nanoid.
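To illustrate the reuse-or-assign behaviour described above, here is a minimal sketch in plain Python. It is not STAM's actual implementation: `DataIndex`, `id_for`, and `random_id` are hypothetical names introduced for illustration, and the ID generator only imitates the nanoid style (21 characters from a URL-safe alphabet).

```python
import secrets
import string

# URL-safe alphabet in the style of nanoid (assumption for illustration).
ALPHABET = string.ascii_letters + string.digits + "_-"


def random_id(size=21):
    """Return a random nanoid-style ID of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(size))


class DataIndex:
    """Hypothetical index mapping (set, key, value) to a single shared ID."""

    def __init__(self):
        self._ids = {}

    def id_for(self, dataset, key, value):
        # Values must be hashable for this simple dict-based sketch.
        triple = (dataset, key, value)
        if triple not in self._ids:
            # New data: assign a fresh random ID.
            self._ids[triple] = random_id()
        # Existing data: the previously assigned ID is reused.
        return self._ids[triple]


index = DataIndex()
first = index.id_for("myset", "pos", "noun")
again = index.id_for("myset", "pos", "noun")  # same triple, same ID
other = index.id_for("myset", "pos", "verb")  # different value, new ID
```

The point of the sketch is only that identical set/key/value triples resolve to one ID, which is why the caller does not need to look the data up first.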

If you really do want to assign the AnnotationData ID explicitly, then the method you used is okay, but the lookup inside the try block can be made slightly faster:

prepared_ann_data.append(next(ann_store.data(set=ann_dataset.id(), key=k, value=v, limit=1)))
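The reason `next(...)` is cheaper than `list(...)[0]` can be shown with an ordinary Python generator. This is a sketch under assumptions: `matches` is a hypothetical stand-in for a lazy query and does not use the STAM API; it only counts how many items get produced.

```python
def matches(counter):
    """Hypothetical lazy query: yields 1000 items and counts each one produced."""
    for i in range(1000):
        counter["produced"] += 1
        yield {"id": f"data-{i}"}


# Building a list consumes the whole iterator before indexing into it.
eager = {"produced": 0}
first_eager = list(matches(eager))[0]

# next() stops as soon as the first item is available.
lazy = {"produced": 0}
first_lazy = next(matches(lazy))

print(eager["produced"], lazy["produced"])  # 1000 1

# With a default, next() returns it instead of raising StopIteration on no results.
missing = next(iter([]), None)
```

Both approaches return the same first item; only the amount of work differs. Passing a default to `next` covers the empty-result case, though it would not catch an error raised by the lookup itself.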

@proycon proycon self-assigned this Oct 9, 2024
@proycon proycon added the question Further information is requested label Oct 9, 2024