
KOL-3467 - Thresholded #249

Merged: 24 commits merged into trunk from KOL-3467 on Nov 6, 2023

Conversation

diegokolena (Contributor)

Linked issue(s):

KOL-3467

What change does this PR introduce and why?

  • We introduce support for Dict on MetricsTestSample. This allows modeling thresholded metrics like this:
@dataclass(frozen=True)
class TestSampleMetric(DataObject):
    a: float

@dataclass(frozen=True)
class CustomMetric(MetricsTestSample):
    t1: Dict[float, TestSampleMetric]

Please check if the PR fulfills these requirements

  • Include reference to internal ticket and/or GitHub issue "Fixes #NNNN" (if applicable)
  • Relevant tests for the changes have been added
  • Relevant docs have been added / updated

codecov bot commented Oct 3, 2023

Codecov Report

Merging #249 (206ef07) into trunk (9cb317a) will decrease coverage by 11.04%.
Report is 2 commits behind head on trunk.
The diff coverage is 96.00%.

@@             Coverage Diff             @@
##            trunk     #249       +/-   ##
===========================================
- Coverage   93.69%   82.66%   -11.04%     
===========================================
  Files         113      114        +1     
  Lines        7361     7384       +23     
  Branches      878      884        +6     
===========================================
- Hits         6897     6104      -793     
- Misses        358     1198      +840     
+ Partials      106       82       -24     
Flag Coverage Δ
integration 72.81% <96.00%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
kolena/workflow/__init__.py 100.00% <100.00%> (ø)
kolena/workflow/_validators.py 92.94% <100.00%> (+0.08%) ⬆️
kolena/workflow/evaluator.py 90.90% <100.00%> (ø)
kolena/workflow/thresholded.py 95.23% <95.23%> (ø)

... and 35 files with indirect coverage changes

@munkyshi (Contributor) left a comment:

Might be jumping ahead with the review here, but sharing my thoughts:

I'm a bit confused at the direction here. It looks like we're extending the base data object to allow arbitrary dicts, but then restricting that to test cases only.

  • What about test samples? The original intent was to support this at the test sample level
  • How would the webapp know how to handle these? Does it automatically parse any dict as if it were a thresholded object?

kolena/workflow/_validators.py (outdated comment, resolved)
@diegokolena (Contributor Author)

Might be jumping ahead with the review here, but sharing my thoughts:

I'm a bit confused at the direction here. It looks like we're extending the base data object to allow arbitrary dicts, but then restricting that to test cases only.

  • What about test samples? The original intent was to support this at the test sample level

I've now added it for test samples too. I was planning to split this into two PRs.

  • How would the webapp know how to handle these? Does it automatically parse any dict as if it were a thresholded object?

Initially, I considered introducing a hidden "__type" parameter to help the webapp differentiate between standard and thresholded objects. However, after reviewing the existing codebase, I observed that this approach would be inconsistent with our current practices. Instead, I'm leaning towards a more seamless integration. The webapp could dynamically adapt its behavior when it detects a MetricsTestSample object containing a dict. What are your thoughts on this?
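For illustration only (not code from this PR), a minimal sketch of the kind of detection being described; the helper name is hypothetical:

from dataclasses import fields, is_dataclass

def has_dict_valued_field(metrics: object) -> bool:
    # Hypothetical helper: report whether any field on a metrics dataclass holds a dict,
    # which is the signal the webapp would key off of in the approach described above.
    if not is_dataclass(metrics):
        return False
    return any(isinstance(getattr(metrics, f.name), dict) for f in fields(metrics))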

@diegokolena diegokolena marked this pull request as ready for review October 5, 2023 13:32
@diegokolena diegokolena requested review from a team as code owners October 5, 2023 13:32
@@ -93,6 +93,7 @@ class AggregateMetrics(MetricsTestCase):
macro_F1: float
mAP: float
PerClass: List[PerClassMetrics]
ThresholdMetrics: Dict[float, PerClassMetrics]
@yifanwu-kolena (Contributor) commented Oct 5, 2023

Just want to double-check that using float as the key is intentional, because floating-point numbers have precision issues, for instance:

>>> 0.1+0.2
0.30000000000000004

>>> x = {1.12381902839021830912830981209381092380912830982190380192839081230982190: 42}
>>> x
{1.1238190283902183: 42}

Contributor Author

I've considered the precision issues associated with using floats as keys. However, it's important to note that access to these dictionary values will be managed exclusively by us through the webapp. We plan to control this access via a slider, which will operate on a restricted set of values. This approach mitigates the risks associated with float imprecision.

@munkyshi @yifanwu-kolena Is there any other use case I'm missing where we could run into issues?

Contributor

I would consider storing the keys as strings so we can deterministically match them. I don't see any big downside to storing the thresholds as strings, and it also means we won't have potential precision issues in JS once these objects come across the network.

Contributor

I would avoid using float as a key, considering we are using JSON for data serialization/deserialization. Handling float precision is already something we need to be cautious of, and using floats as keys, which is not valid in JSON, could add more landmines.
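To illustrate the JSON point (an aside, not part of the PR): float keys are coerced to strings on serialization, so a round-trip no longer yields float keys at all:

>>> import json
>>> json.loads(json.dumps({0.1 + 0.2: "tp"}))
{'0.30000000000000004': 'tp'}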

Contributor

Side note: is this the right place to update? This PR is about test-sample-level metrics, and the threshold discussion is scoped to test samples only (so far), right?

@munkyshi (Contributor) commented Oct 9, 2023

@diegokolena I'm wondering if the dictionary based approach is the right change for our client API.

Concerns:

  • We have typically forbidden nesting dictionaries as data object values (unless specifically wrapped with a different explicitly declared data object, such as annotations)
  • We're running into pain points for explicitly using numeric keys
  • The List[DataObject] pattern is pretty common in our client API already (and is less friction for users to adapt to), in places like List[BoundingBox] or List[NestedTestCaseMetric]

Rather than pushing forward with this approach, I wonder if it's worth revisiting the original List[Thresholded] approach?

@diegokolena (Contributor Author) commented Oct 10, 2023

@diegokolena I'm wondering if the dictionary based approach is the right change for our client API.

Concerns:

  • We have typically forbidden nesting dictionaries as data object values (unless specifically wrapped with a different explicitly declared data object, such as annotations)
  • We're running into pain points for explicitly using numeric keys
  • The List[DataObject] pattern is pretty common in our client API already (and is less friction for users to adapt to), in places like List[BoundingBox] or List[NestedTestCaseMetric]

Rather than pushing forward with this approach, I wonder if it's worth revisiting the original List[Thresholded] approach?

In this approach, each dictionary inside "Thresholded" would include the threshold parameter as one of its key/value pairs, as follows:

car_values = Thresholded([
    {"threshold": 0.01, "tp": 100, "fp": 50, "fn": 0},
    {"threshold": 0.02, "tp": 99, "fp": 50, "fn": 1},
    {"threshold": 0.03, "tp": 98, "fp": 49, "fn": 2},
    ...
    {"threshold": 0.99, "tp": 1, "fp": 1, "fn": 99},
])

How will we determine which key/value pair represents the threshold? Should we use a naming convention?
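(Aside, for context: one alternative to a key-naming convention, sketched hypothetically here, is reserving threshold as an explicit field on a base data object, which is roughly the shape the PR eventually adopts with ThresholdedMetrics.)

from dataclasses import dataclass
from typing import List

# Hypothetical sketch: a reserved "threshold" field on a base class instead of a dict key convention.
@dataclass(frozen=True)
class Thresholded:
    threshold: float

@dataclass(frozen=True)
class ClassThresholded(Thresholded):
    tp: int
    fp: int
    fn: int

car_values: List[ClassThresholded] = [
    ClassThresholded(threshold=0.01, tp=100, fp=50, fn=0),
    ClassThresholded(threshold=0.02, tp=99, fp=50, fn=1),
]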

@diegokolena diegokolena reopened this Oct 10, 2023
@diegokolena diegokolena marked this pull request as draft October 13, 2023 14:29
@gordonhart (Member)

Let me know when this is ready for review, @diegokolena!

@diegokolena diegokolena marked this pull request as ready for review October 19, 2023 13:03
a: List[str]

with pytest.raises(TypeError):
MyThresholdedMetrics(a=List["1"])
Contributor

Including List changes the intended type, correct?

Suggested change
MyThresholdedMetrics(a=List["1"])
MyThresholdedMetrics(a=["1"])

Contributor Author

Updated.

a: Dict[str, str]

with pytest.raises(TypeError):
MyThresholdedMetrics(a=Dict["1", "1"])
Contributor

Should this include threshold, so that the test doesn't fail simply because threshold is missing?

Suggested change
MyThresholdedMetrics(a=Dict["1", "1"])
MyThresholdedMetrics(threshold=1, a={"key": "value"})

Contributor Author

Updated.

kolena/workflow/evaluator.py (two outdated comments, resolved)
validate_scalar_data_object_type(metrics_test_case_type, supported_list_types=[MetricsTestCase])
validate_scalar_data_object_type(
metrics_test_case_type,
supported_list_types=[MetricsTestCase, ThresholdedMetrics],
Contributor

Do we need to validate that ThresholdedMetrics is only used as a list type?

In other words, it doesn't make sense to have something like:

class ObjectDetectionThresholded(ThresholdedMetrics):
    tp: List[BoundingBox]
    fp: List[BoundingBox]
    fn: List[BoundingBox]
    precision: float
    recall: float

# makes sense
class MyMetrics(MetricsTestSample):
    car_detections: List[ObjectDetectionThresholded]

# doesn't make sense
class MyMetrics2(MetricsTestSample):
    car_detections: ObjectDetectionThresholded
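(Aside: a sketch only, not kolena's actual validator, whose real signature is only hinted at in the snippet above. A check of that shape could walk the class annotations and reject bare thresholded fields:)

from typing import get_type_hints

def assert_thresholded_only_in_lists(metrics_cls: type, thresholded_base: type) -> None:
    # Hypothetical check: a thresholded type may only appear wrapped in List[...],
    # so any field annotated with the bare class is rejected.
    for name, hint in get_type_hints(metrics_cls).items():
        if isinstance(hint, type) and issubclass(hint, thresholded_base):
            raise TypeError(f"field '{name}' should be List[{hint.__name__}], not bare {hint.__name__}")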

tests/unit/workflow/test_validators.py (comment resolved)
@munkyshi (Contributor) left a comment:

Approving with one final request



@dataclass(frozen=True)
class ThresholdedMetrics(TypedDataObject, metaclass=PreventThresholdOverrideMeta):
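(Aside, for context: the PreventThresholdOverrideMeta implementation is not visible in this view; a hypothetical sketch of what a guard with that name might do, assuming it only blocks subclasses from re-declaring the reserved threshold field:)

class PreventThresholdOverrideMeta(type):
    # Hypothetical sketch: refuse class bodies that re-declare a "threshold" field
    # already declared by one of their bases.
    def __new__(mcs, name, bases, namespace):
        redeclared = "threshold" in namespace.get("__annotations__", {}) and any(
            "threshold" in getattr(base, "__annotations__", {}) for base in bases
        )
        if redeclared:
            raise TypeError(f"{name} may not override the reserved 'threshold' field")
        return super().__new__(mcs, name, bases, namespace)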
Contributor

Sorry, last change here -- since this will be public to our users, let's add a docstring about usage.

Contributor Author

Added

Contributor

Would prefer to have this a bit more detailed / user actionable (example: nested test case metrics).

I'd structure this similarly:

    A data class representing metrics associated with a specified threshold.

    `List[ThresholdedMetrics]` is intended for use as a field type on a [`MetricsTestSample`][kolena.workflow.MetricsTestSample] or [`MetricsTestCase`][kolena.workflow.MetricsTestCase]. The list is expected to contain threshold-specific metric values for a discrete set of thresholds that are consistent across [`TestSample`][kolena.workflow.TestSample]s within a test run.

    `ThresholdedMetrics` does not allow dictionary objects as field values and ensures
    that the threshold cannot be overridden.
    It should be subclassed with the relevant fields for a given workflow.

    Example usage:

    ```python
    from dataclasses import dataclass
    from typing import List

    from kolena.workflow import MetricsTestSample
    from kolena.workflow import ThresholdedMetrics

    @dataclass(frozen=True)
    class ClassThresholdedMetrics(ThresholdedMetrics):
        precision: float
        recall: float
        f1: float

    @dataclass(frozen=True)
    class TestSampleMetrics(MetricsTestSample):
        car: List[ClassThresholdedMetrics]
        pedestrian: List[ClassThresholdedMetrics]

    # metrics object construction
    metric = TestSampleMetrics(
        car=[
            ClassThresholdedMetrics(
                threshold=0.3,
                precision=0.5,
                recall=0.8,
                f1=0.615,
            ),
            ClassThresholdedMetrics(
                threshold=0.4,
                precision=0.6,
                recall=0.6,
                f1=0.6,
            ),
            ClassThresholdedMetrics(
                threshold=0.5,
                precision=0.8,
                recall=0.4,
                f1=0.533,
            ),
            ...
        ],
        pedestrian=[
            ClassThresholdedMetrics(
                threshold=0.3,
                precision=0.6,
                recall=0.9,
                f1=0.72,
            ),
            ClassThresholdedMetrics(
                threshold=0.4,
                precision=0.7,
                recall=0.7,
                f1=0.7,
            ),
            ClassThresholdedMetrics(
                threshold=0.5,
                precision=0.8,
                recall=0.6,
                f1=0.686,
            ),
            ...
        ],
    )
    ```

    Raises:
        TypeError: If any field value is a dictionary.

Contributor

Should this be non-private, since this is meant to be used by our users?

Contributor Author

My rationale for using _thresholded.py was that it essentially serves as an extension of TypedDataObject, which is found in _datatypes.py.

But if you think it makes more sense to rename it to thresholded.py, that's fine with me.

Contributor

I think it makes sense as thresholded.py -- it's analogous to the other things we expect our users to see and extend (such as MetricsTestSample in evaluator.py). We don't expect our users to know of and extend TypedDataObject directly, so that's still private.

@diegokolena diegokolena merged commit c85ef45 into trunk Nov 6, 2023
6 checks passed
@diegokolena diegokolena deleted the KOL-3467 branch November 6, 2023 21:40
munkyshi added a commit that referenced this pull request Nov 7, 2023
munkyshi added a commit that referenced this pull request Nov 7, 2023
* Revert "Attach docs for ThresholdedMetrics (#306)"

This reverts commit 5b111ab.

* Revert "KOL-3467 - Thresholded (#249)"

This reverts commit c85ef45.