Old annotations anchoring in the wrong places #258

Closed
nickstenning opened this issue Feb 22, 2017 · 5 comments

Comments

@nickstenning
Contributor

Originally reported as hypothesis/h#4328 by @lenazun.

Steps to reproduce

  1. Go to https://hypothes.is/jobs/
  2. This annotation https://hyp.is/4fD2qZSYTRKneIx8iJWhbg/hypothes.is/jobs/ anchors in the fragment " developers and desi" when the original selected text was "Developer, Frontend".
  3. It seems to me that the prefix, suffix and even the selected fragment are all very different.

Expected behaviour

This annotation should not anchor in the new text, since it's clearly different.

Actual behaviour

This annotation anchors to new text that is very different from the old text.

Browser/system information

Version: 0.52.0
User agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36
URL: https://hypothes.is/jobs/
Date: 25 Jan 2017 17:35:12 -0800

Additional details

I have a couple of user tickets talking about this behavior. I think reanchoring is inevitable in some cases, but it seems like the context of the text is different enough that we may want to adjust what's going on here.

@nickstenning
Contributor Author

@lenazun you mention Zendesk tickets in the above. Any additional test cases you can provide will be very helpful in tuning this behaviour.

@dwhly
Member

dwhly commented Jul 7, 2017

Additional examples here:
#480

@castedo

castedo commented Apr 12, 2022

Looks like this type of bug is not getting fixed.

If there is interest in fixing these types of bugs, here are the details of another case of this bug, taken from the output of https://hypothes.is/api/annotations/YHjZCrhREey4v5OrWLPSyA

The annotation should be an orphan now. Instead this HTML is incorrectly matching:

 and</p></li>
<li><p>an obse

After Hypothesis annotates, the DOM is this:

<ol type="1">
<li><p>a past time horizon when all ancestors are members of separate
non-admixed subpopulations,<hypothesis-highlight class="hypothesis-highlight"> and</hypothesis-highlight></p></li>
<li><p><hypothesis-highlight class="hypothesis-highlight">an obse</hypothesis-highlight>rvation time, such as the present.</p></li>
</ol>

Here's a brief summary of the API output for the annotation:

{ ...
    "text": "The current definition has equal weighting between maternal and paternal lineages. So the distribution will be the same from one chromosome to another chromosome. So it doesn't make sense to talk about distributions across chromosomes.",
...
    "target": [
        {
            "source": "https://castedo.com/doc/151/doc.html",
            "selector": [
                {
                    "type": "RangeSelector",
                    "endOffset": 126,
                    "startOffset": 110,
                    "endContainer": "/main[1]/div[2]/div[1]/p[7]",
                    "startContainer": "/main[1]/div[2]/div[1]/p[7]"
                },
                {
                    "end": 3498,
                    "type": "TextPositionSelector",
                    "start": 3482
                },
                {
                    "type": "TextQuoteSelector",
                    "exact": " and\nchromosomes",
                    "prefix": "ividuals, groups of individuals,",
                    "suffix": " will prove to be a useful mathe"
                }
            ]
        }
...
}

This is my first annotation and I hit this bug after the 2nd time the page text changed.
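
For readers following along, the selector list above can be inspected with a few lines of Python. This is only a sketch: the JSON below is a trimmed copy of the API output quoted in this comment (the elided fields are left out), and the helper name `selectors_by_type` is made up for illustration. It shows that the three selectors agree on the length of the originally selected text.

```python
import json

# Trimmed copy of the annotation quoted above; elided fields are omitted.
# A raw string keeps the JSON escape "\n" intact for json.loads to decode.
ANNOTATION_JSON = r'''
{
  "target": [
    {
      "source": "https://castedo.com/doc/151/doc.html",
      "selector": [
        {"type": "RangeSelector", "endOffset": 126, "startOffset": 110,
         "endContainer": "/main[1]/div[2]/div[1]/p[7]",
         "startContainer": "/main[1]/div[2]/div[1]/p[7]"},
        {"end": 3498, "type": "TextPositionSelector", "start": 3482},
        {"type": "TextQuoteSelector", "exact": " and\nchromosomes",
         "prefix": "ividuals, groups of individuals,",
         "suffix": " will prove to be a useful mathe"}
      ]
    }
  ]
}
'''


def selectors_by_type(annotation: dict) -> dict:
    """Index the selectors of the first target by their "type" field (hypothetical helper)."""
    return {sel["type"]: sel for sel in annotation["target"][0]["selector"]}


annotation = json.loads(ANNOTATION_JSON)
selectors = selectors_by_type(annotation)

quote = selectors["TextQuoteSelector"]["exact"]
range_len = selectors["RangeSelector"]["endOffset"] - selectors["RangeSelector"]["startOffset"]
position_len = selectors["TextPositionSelector"]["end"] - selectors["TextPositionSelector"]["start"]

# All three selectors describe a 16-character selection.
print(repr(quote), len(quote), range_len, position_len)
```

All three lengths come out as 16, so the stored selectors are consistent with each other; the mismatch is between them and the edited page.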

@robertknight
Member

> The annotation should be an orphan now. Instead this HTML is incorrectly matching:

The quote anchoring logic tolerates edits to the original document and has to pick some threshold for when the closest match in the document is different enough from the original that it should not match. Where the threshold is set trades off precision vs recall of matches. The current logic uses a very liberal threshold that tolerates up to 50% character-level edits between the original quote and the nearest match.
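
To make the threshold idea concrete, here is a minimal sketch in Python. It is not the client's actual anchoring code: it uses plain Levenshtein distance over whole strings, the function names and the `max_error_ratio` parameter are hypothetical, and the candidate text assumes that a single newline separates the two highlighted fragments in the example from the previous comment.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def accepts_match(quote: str, candidate: str, max_error_ratio: float = 0.5) -> bool:
    """Accept the candidate if its edits stay within a fraction of the quote length.

    The 0.5 default mirrors the "up to 50%" figure mentioned above; how the
    real logic counts and weights errors is not shown here.
    """
    if not quote:
        return False
    return edit_distance(quote, candidate) <= max_error_ratio * len(quote)


quote = " and\nchromosomes"
candidate = " and\nan obse"  # assumed text of the two highlights joined by a newline

errors = edit_distance(quote, candidate)
print(errors, len(quote))                                       # edits vs. quote length
print(accepts_match(quote, candidate))                          # liberal 50% threshold
print(accepts_match(quote, candidate, max_error_ratio=0.25))    # stricter threshold
```

Under these assumptions the distance is 8 against a quote length of 16, which sits exactly on a 50% limit, so the sketch accepts the match; lowering the ratio to 0.25 rejects it. That is the precision versus recall trade-off in miniature: a stricter ratio would orphan this annotation, but would also orphan annotations whose quotes were only lightly edited.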

@castedo

castedo commented Apr 12, 2022

Thx for the explanation. I bet it's nearly impossible to create threshold logic that makes content authors, annotators and readers happy without the input/feedback of the content authors, annotators or readers.

Perhaps publisher groups are a solution: someone in the publisher group could manually remove annotations that are no longer helpful to readers.
