Fix for JSON mistakes by Claude #953

Merged: 1 commit, merged Jan 29, 2025
src/autolabel/tasks/attribute_extraction.py: 8 changes (7 additions, 1 deletion)
@@ -2,6 +2,7 @@
 import json
 import logging
 import pickle
+import re
 from collections import defaultdict
 from typing import Callable, Dict, List, Optional, Tuple, Union
 
@@ -32,14 +33,14 @@
 
 
 class AttributeExtractionTask(BaseTask):
     NULL_LABEL = {}
Check failure (GitHub Actions / lint, Ruff RUF012): src/autolabel/tasks/attribute_extraction.py:36:18: Mutable class attributes should be annotated with `typing.ClassVar`
     DEFAULT_TASK_GUIDELINES = "You are an expert at extracting attributes from text. Given a piece of text, extract the required attributes."
Check failure (GitHub Actions / lint, Ruff E501): src/autolabel/tasks/attribute_extraction.py:37:89: Line too long (141 > 88)
     DEFAULT_OUTPUT_GUIDELINES = "You will return the extracted attributes as a json with the following keys:\n{attribute_json}. \n Do not include keys in the final JSON that don't have any valid value extracted."
Check failure (GitHub Actions / lint, Ruff E501): src/autolabel/tasks/attribute_extraction.py:38:89: Line too long (212 > 88)
     LABEL_FORMAT_IN_EXPLANATION = (
         " The explanation should end with - 'so, the answer is <label>.'"
     )
     EXCLUDE_LABEL_IN_EXPLANATION = " Do not repeat the output of the task - simply provide an explanation for the provided output. The provided label was generated by you in a previous step and your job now is to only provided an explanation for the output. Your job is not verify the output but instead explain why it might have been generated, even if it is incorrect. If you think the provided output is incorrect, give an explanation of why it might have been generated anyway but don't say that the output may be incorrect or incorrectly generated.'"
Check failure (GitHub Actions / lint, Ruff E501): src/autolabel/tasks/attribute_extraction.py:42:89: Line too long (555 > 88)
     GENERATE_EXPLANATION_PROMPT = "You are an expert at providing a well reasoned explanation for the output of a given task. \n\nBEGIN TASK DESCRIPTION\n{task_guidelines}\nEND TASK DESCRIPTION\nYou will be given an input example and the output for one of the attributes. Your job is to provide an explanation for why the output for that attribute is correct for the task above.\nYour explanation should be at most two sentences.{label_format}\n{labeled_example}\nCurrent Attribute:{attribute}.\nExplanation: "
Check failure (GitHub Actions / lint, Ruff E501): src/autolabel/tasks/attribute_extraction.py:43:89: Line too long (510 > 88)
     OUTPUT_DICT_KEY = "output_dict"
 
     def __init__(self, config: AutolabelConfig) -> None:
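The lint annotations on this hunk are pre-existing and unrelated to the JSON fix: RUF012 flags `NULL_LABEL = {}` as a mutable class attribute without a `typing.ClassVar` annotation, and E501 flags the prompt strings as longer than the 88-column limit reported in the messages. A minimal sketch of how attributes like these could satisfy both rules; `ExampleTask` is a hypothetical stand-in, not the repository's class, and only the first prompt string is shown:

```python
from typing import ClassVar, Dict


class ExampleTask:  # hypothetical stand-in for AttributeExtractionTask
    # RUF012: ClassVar marks the dict as shared class-level state rather than
    # a per-instance field, which is what the lint rule asks for.
    NULL_LABEL: ClassVar[Dict] = {}

    # E501: adjacent string literals inside parentheses are joined at compile
    # time, so the prompt can be wrapped without changing its value.
    DEFAULT_TASK_GUIDELINES = (
        "You are an expert at extracting attributes from text. "
        "Given a piece of text, extract the required attributes."
    )
```

Wrapping by implicit concatenation keeps the runtime string byte-for-byte identical, so prompts built from these constants would not change.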
@@ -54,9 +55,9 @@
         if self.config.confidence():
             self.metrics.append(AUROCMetric())
 
     def _construct_attribute_json(
Check failure (GitHub Actions / lint, Ruff C901): src/autolabel/tasks/attribute_extraction.py:58:9: `_construct_attribute_json` is too complex (12 > 10)
Check failure (GitHub Actions / lint, Ruff D417): src/autolabel/tasks/attribute_extraction.py:58:9: Missing argument descriptions in the docstring for `_construct_attribute_json`: `selected_labels_desc_map`, `selected_labels_map`
         self,
         selected_labels_map: Dict[str, List[str]] = None,
Check failure (GitHub Actions / lint, Ruff FA100): src/autolabel/tasks/attribute_extraction.py:60:30: Add `from __future__ import annotations` to simplify `typing.Dict`
Check failure (GitHub Actions / lint, Ruff RUF013): src/autolabel/tasks/attribute_extraction.py:60:30: PEP 484 prohibits implicit `Optional`
Check failure (GitHub Actions / lint, Ruff FA100): src/autolabel/tasks/attribute_extraction.py:60:40: Add `from __future__ import annotations` to simplify `typing.List`
         selected_labels_desc_map: Dict[str, Dict[str, str]] = None,
     ) -> Tuple[str, Dict]:
         """
@@ -365,9 +366,14 @@
         )
         try:
             json_start, json_end = response.text.find("{"), response.text.rfind("}")
+            json_str = re.sub(
+                r'"[^"]*"',
+                lambda m: m.group().replace("\n", "\\n"),
+                response.text[json_start : json_end + 1],
+            )
             llm_label = {}
             for k, v in json5.loads(
-                response.text[json_start : json_end + 1],
+                json_str,
             ).items():
                 if isinstance(v, list) or isinstance(v, dict):
                     llm_label[k] = v
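The last hunk is the actual fix: before the extracted `{...}` span reaches `json5.loads`, every double-quoted substring is rewritten so that raw line breaks become the two-character escape `\n`. An unescaped newline inside a JSON string is invalid, so a response that breaks a value across lines (presumably the "JSON mistakes" in the PR title) would otherwise fail to parse. Below is a standalone sketch of the same steps; it swaps in the standard-library `json` module for `json5` so it runs without extra dependencies, and `extract_json_object` is a hypothetical name, not a function in autolabel.

```python
import json
import re


def extract_json_object(text: str) -> dict:
    """Parse the outermost {...} span of a model response (sketch of the PR's fix)."""
    # Locate the outermost braces, as the diff does with find/rfind.
    json_start, json_end = text.find("{"), text.rfind("}")
    # Escape raw newlines inside double-quoted strings only; newlines between
    # tokens are left untouched because they are valid JSON whitespace.
    json_str = re.sub(
        r'"[^"]*"',
        lambda m: m.group().replace("\n", "\\n"),
        text[json_start : json_end + 1],
    )
    # Stdlib json stands in for json5 here; it also rejects a literal newline
    # inside a string, which is the failure this step works around.
    return json.loads(json_str)


response = 'Here is the output:\n{\n  "name": "Acme\nCorp",\n  "city": "Paris"\n}'
print(extract_json_object(response))  # {'name': 'Acme\nCorp', 'city': 'Paris'}
```

Because `"[^"]*"` pairs quotes strictly left to right, a value containing an escaped quote (`\"`) can shift the pairing and leave a later newline unescaped; an escape-aware pattern such as `r'"(?:\\.|[^"\\])*"'` would be one way to harden this, though it is not what the PR uses.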