Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Note: Add body & author #4481

Closed
wants to merge 23 commits into from
Closed

Conversation

grekko
Copy link
Contributor

@grekko grekko commented Jan 16, 2024

Why are the changes necessary?

Addresses #3831 which describes how a set of bugs related to the current Note-model can be solved by merging the "special first comment" into the Note-record itself.

This PR is meant to serve as a discussion starter. I gave my best to make sure the code changes are not introducing any undesired behavioural changes, but I am sure there are things I've missed, so please regard this PR a work in progress.

What has changed?

  • Adjusts the Note-Model to hold body and the author
    • adds db fields notes.{body,author_id,author_ip}

Open TODOs

I am not sure if I successfully migrated the logic of Api::NotesController#feed. I understand that at least the behaviour of the .limit(result_limit)-call will involuntarily return more results than it currently does. This part of the PR needs more focus and probably I'll have to add some tests first.

Rollout

Step 1) Merge and deploy the db and code changes

  • New Note-records will be created with a body- and author-information
  • Web UI and API should still render the the same output

Step 2) Start backfilling process

Run the migrate_notes.rake-script either via long-running shell or I can also rewrite the code so it can be run as an ActiveJob in the background.

The script copies the author- and body-information from the opening NoteComment to the Note-record.

Step 3) Sanity check backfilled data

  • all Note-records should now have a non-empty body AFAIU (maybe there are exceptions? Notes and Note-Comments have been around since 2010. Back then it was called MapBugs though ^^)
  • TBD: Are there more things to sanity check?

Step 4) Clean up if everything looks good

Until this point all operations have been non-destructive. If everything looks good and no issues have been discovered, the NoteComments-records with an open-event can be removed.

Also all FIXME annotated code can be removed or updated.

@grekko grekko force-pushed the notes_refactoring_2 branch from c48393d to cfd304d Compare January 16, 2024 04:37
@AntonKhorev
Copy link
Collaborator

This PR is meant to serve as a discussion starter. I gave my best to make sure the code changes are not introducing any undesired behavioural changes, but I am sure there are things I've missed, so please regard this PR a work in progress.

It's missing the entire deleted users aspect that @gravitystorm for some reason avoided mentioning.

@AntonKhorev
Copy link
Collaborator

Short summary:

  • note comments of deleted users were made hidden for spam-fighting purposes
  • this became the source of bugs
  • refactor plans never addressed what to do with comments of deleted users
  • @tomhughes opposes any changes to visibility of anything related to deleted users because he doesn't want spam to resurface
  • @grekko in this pull request chose to reveal the opening comment, both in the web ui and in the api

@tomhughes
Copy link
Member

I do not oppose any change to visibility of anything related to deleted users.

I want us to properly define how deleted users should be handled, and I specifically said that showing diary entries and comments for deleted uses would be a problem because historically we have hidden spam diary content by deleting the user without marking the content as hidden explicitly. I never said that applied to notes.

@AntonKhorev
Copy link
Collaborator

I never said that applied to notes.

That's not how your Jan 10 03:37 comment on the dwg channel reads because you didn't say that it doesn't apply to notes. Since you also object to visibility changes of changesets, why would you be ok with the same thing for notes?

@AntonKhorev
Copy link
Collaborator

@tomhughes you made this change to hide all of the deleted users' note comments 6027c42

@tomhughes
Copy link
Member

Sure, because our general operating principle AT THE TIME was to hide things created by deleted users.

As you are well aware that policy is currently being reviewed with the intention of seeking LWG approval for a new policy around what needs to be and/or should be hidden.

Nothing is set in stone, everything can be reviewed.

I wasn't even saying we can't change our policy around diary entries, just that we would have to consider how to avoid suddenly having lots of spam re-appear.

@grekko
Copy link
Contributor Author

grekko commented Jan 17, 2024

@grekko in this pull request chose to reveal the opening comment, both in the web ui and in the api

That is true. The code changes in this PR effectively work around the special handling introduced in 6027c42, keeping the Note body of then-deleted users accessible. Thanks for pointing this out.

Just to be explicit: No User-related information would be accessible. The author would be displayed as anonymous. Also note comments of deleted Users would continue NOT to be accessible.

AFAIU the current policy of User#soft_destroy we would need to keep the information of Note-records by deleted Users hidden. Here a screenshot:

SCR-20240116-qskd

At this point I think it is useful to clarify if there are good reasons not to continue to comply with the aforementioned policy. If there are none, I can think of the following changes to move forward:

  • Notes of deleted users are not hidden per se, but the associated information (body and author) is redacted – effectively hiding the Notes comment. The author is already redacted.
  • Associated comments to the Note continue to be accessible (unless their authors are also deleted).

Or alternatively:

  • The Note and all of its associated comments are hidden once the original author of the Note is deleted.

Although the latter approach seems rather invasive to other users. I can imagine that Note comments carry a lot of value even though if the message of the initial Note is redacted. Also it seems wrong that a User can effectively manipulate the visibility of other Users comments.

@AntonKhorev Do you think I'm missing here something else?

@tordans
Copy link
Contributor

tordans commented Jan 17, 2024

@grekko I think we should split this in two scenarios:

  • Scenario 1 Spammer: User creates 10 notes, all without comments. User gets deleted, so user data and content is hidden. BUT in this case, the nodes should IMO disappear as well. They are not actionable and just noise.
  • Scenario 2: User creates a note, other users comment. Creator deletes account and data is hidden. Now the discussion is still valid and should be visible until resolved (and later auto-hidden from the map)

Just FYI

Just to be explicit: No User-related information would NOT be accessible. The author would be displayed as anonymous. Also note comments of deleted Users would continue NOT to be accessible.

I don't understand this sentence :-). But I think I got the overall message of the thread / that is clear.

@grekko
Copy link
Contributor Author

grekko commented Jan 17, 2024

I don't understand this sentence :-). But I think I got the overall message of the thread / that is clear.

The NOT was superfluous. I edited the comment above.

@grekko
Copy link
Contributor Author

grekko commented Jan 17, 2024

I think we should split this in two scenarios

Agreed. I think for Scenario 1 there are tools for Moderators in place to hide such Notes.
And to cover Scenario 2 too, it would be useful to keep Notes of deleted Users still accessible and just redact the Note body.

@gravitystorm
Copy link
Collaborator

a) Should the data migration (to backfill the notes.{body,author_id,author_ip} run in the scope of the migration or can this be a rake task which is running in the background and allows to perform some sanity checks on the data?

I'm slightly torn on this one. My instinct is that the steps should be decoupled - adding columns in a transaction, and then a separate rake task that can be (re)-run as required to do the migration. In particular, if there's some record somewhere that causes an error (or the process times out or whatever) then I'd rather not be rolling everything back.

However, I'm not sure how this will work for other deployments, who might not be so hot on following the issue tracker and keeping up to date with everything. At some point they will update their installation and run the database migrations, but will they know that there's a rake task to run too?

At some point in the future, after the migration is completed on osm.org we'll want to do a code cleanup, and probably make body text NOT NULL. Perhaps that will be the point that they realise there's something that needs to be done, when that cleanup migration fails? On the plus side, anyone without any notes can run both the migrations fine back-to-back since the cleanup migration would work without error on an empty db.

Copy link
Collaborator

@gravitystorm gravitystorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just an intermediate review, I've skimmed the rest but I haven't gone into detail yet.

lib/tasks/migrate_notes.rake Outdated Show resolved Hide resolved
@tomhughes
Copy link
Member

I'm slightly torn on this one. My instinct is that the steps should be decoupled - adding columns in a transaction, and then a separate rake task that can be (re)-run as required to do the migration. In particular, if there's some record somewhere that causes an error (or the process times out or whatever) then I'd rather not be rolling everything back.

It is generally better to do things in a migration so they just work but here we do have four million records which will need to be migrated so we do need to think about the risk of a failure and how long a migration might take and what lock contention issues it might involve...

@tomhughes
Copy link
Member

I did a quick test and running the big UPDATE in a single transaction in production is definitely going to cause lock contention issues - not sure how long it would actually take because enough locks were piling up that I had to abort it.

@grekko grekko force-pushed the notes_refactoring_2 branch 5 times, most recently from 62c6caa to 5f4a414 Compare February 7, 2024 15:15
@grekko
Copy link
Contributor Author

grekko commented Feb 7, 2024

A summary of the recent changes:

  • 1c3d3b6 redacts Note#body if Note#author is soft-deleted
  • edfab03 extracted Note-Migration code into Note::MigrateOpenedComment and added basic tests for that class
  • 5f4a414 hides the authors display name in api/notes/_note.rss.builder if the author is deleted (the same way the author name is hidden in the Web UI)

@gravitystorm could you give this PR a rough review so we can spot weak spots or issues I can address?

@tomhughes thanks for trying out the UPDATE-SQL Statement.
I'd be interested if there are any NoteComment-records with event: "opened" and with a body either being NULL or EMPTY.
The migration code assumes that there will be a non-zero-length body but there are neither DB constraints nor Model Validations enforcing this.

@fititnt
Copy link

fititnt commented Apr 16, 2024

Please can you take another look?

cc @gravitystorm @tomhughes

@tordans
Copy link
Contributor

tordans commented Jun 19, 2024

@gravitystorm this EWG project is waiting for input for a few month now. I hope it's OK to ping here as a reminder to move it up in the review queue.

@mmd-osm mmd-osm added the notes Related to the notes feature label Jun 25, 2024
@grekko grekko force-pushed the notes_refactoring_2 branch from 12dfc7c to 12cd233 Compare September 23, 2024 07:20
@grekko grekko force-pushed the notes_refactoring_2 branch from 12cd233 to 9e6109d Compare September 23, 2024 07:23
@grekko
Copy link
Contributor Author

grekko commented Sep 23, 2024

@gravitystorm I've reworked the PR a bit based on your feedback.

Regarding the readability of the history perspective on the PR changes.

Rebasing the branch destroys the git history which breaks the conversation messages attached to specific commits, but I recall that you prefer a rebase because you'd like to avoid merge commits since PRs are not squash-merged into the master-branch.

At this point I'm happy to use this PR as a draft/proposal and rather have merge conflicts in this Proposal PR than breaking the Github Conversation UI.

If this PR reaches a state in which maintainers are happy to introduce the changes into the master-branch I will create one or more clean slate PRs.

Would that work for you?

@gravitystorm
Copy link
Collaborator

you prefer a rebase because you'd like to avoid merge commits

I prefer a rebase so that when everything is ready to go, i.e. the finished code and branch commits list are both fine, I can just press "Merge pull request" on the github UI. 😄 We'll have a merge commit when the branch is merged, but that's fine.

However, if you want to keep this open as a draft, and then make a different one when development is finished, that's fine with me too. If so, please mark this as a draft so that I don't get too keen and accidentally merge it (e.g. for PRs with a WIP-style commit history)

@grekko grekko marked this pull request as draft October 7, 2024 08:07
@grekko
Copy link
Contributor Author

grekko commented Oct 7, 2024

If so, please mark this as a draft so that I don't get too keen and accidentally merge it

That's a good compromise for me. Done.

@grekko
Copy link
Contributor Author

grekko commented Jan 27, 2025

I will close this PR since @nenad-vujicic is making progress with their Notes refactoring efforts which cover and surpass the scope of this PR.

@grekko grekko closed this Jan 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
notes Related to the notes feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants