-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User accounts deduping #3246
Comments
Have you tested what is the next auth behavior if we remove the unique index from the email column? |
I can test it locally, but that would not help much. That being said, I think leaving it unique is the right choice, we don't want multiple users with the same email. |
When the user login Discord. If they change their discord email and login again, next-auth still matches to old account and doesn't update the email. |
Discussion from meeting 09.06.2023: The main point should be to use the same ID across the entire stack. Merging multiple accounts in the backend should be possible, if we know the corresponding frontend ids. I will try to analyze the databases and see how much overlap do we have and how easy would it be to migrate. |
Sorry I couldn't join the meeting last night, but may I ask why we decided to stick with the ID from the backend? Especially we are wanting to decouple the frontend and the backend to allow deploy the chat app only. |
because the ID in the backend is the hardest to change, since it is connected to everything else. "if possible" because if that does not work, we can still use frontend id. I imagine using the frontend id being the easiest, and then we could just take the frontend db and make it to its own user service if we want to. |
Then I would suggest the backend store the frontend id, then identify the user via a header as we would do now. It still uses the old id for internal usage |
I think like you said we can start by adding the frontend user id. in the mean time, I will try to see how we can get rid of duplicate accounts. I imagine we add an endpoint into the backend, and do a post request from frontend with all of the accounts & provider that the user has, and make sure that all of these accounts get merged in data backend, and the frontend user id is also saved somewhere. Hopefully it is easily doable |
Here is a dump of all the stuff I found out in the last 2 day: Contextwe have roughly 6000 users (and growing) who have more than one account and are affect by the duplication. However, all of these users, regardless of how many providers or accounts of providers they used for login, they have a maximum of 2 accounts in the backend. This is because we always use the first provider Open-Assistant/website/src/lib/users.ts Lines 42 to 46 in 8ec4ebc
This also means, that users who logged in with email first, and then discord, the first account in the data backend that references the email is basically never used again, and the data is orphaned. MergingThe backend uses the its own user id as foreign key in almost all tables. The "easy" way to merge could be to simply update the id of all users to the same id, but this won't work because we don't have cascade updates on the user id to other tables (easy to solve), and would cause duplicate ids in the user table which is unique index (hard to solve). The "hard" way would be to do this manually, with a rough pseudo code: users = # get all duplicates
id = users[0].id
for user in users[1:]:
for table in [messages, tasks, troll_stats, ...]:
table.where(table.user_id == id).set(id=id)
userstable.delete(user) And it seems that this is the only way, but I might be wrong. APIThe user identification over the api boundary uses the auth method and provider id
Possible values:
This means that even if we merge the users, the backend would still not now that the users are merged, and would create new ones. We need a new policy of communicating with the backend, ideally we use a single identifier for cross boundary communication, two options:
Gradual Migration?Maybe it would be easy if we can start by solving the problem for new users, preventing the duplication from happening in the first place. Saving the backend id in the frontend db seems to be the easiest way, if we find it the database, we use it, otherwise fallback to the old method for old users. Next stepsI will get some sleep. Getting ideas and suggestions from the team would greatly appreciated. |
First backend support for #3246 : Adds a new `/api/v1/admin/merge_users` endpoint to merge one or more source user accounts into an existing destination user account. The source user accounts are deleted in the process and all objects that belonged to the source users are transferred to the destination user. Rows belonging to the source accounts in the `user_stats` and `troll_stats` tables are also deleted.
@AbdBarho Merging backend users is now possible via POST to For the "X-OASST-USER" header I suggest that we use |
Context
There are some users who are facing problem when logged in through multiple methods, i.e. email first and then google / discord second, or the other way around.
#3136 #3232 #2637 #1695 #1268 and probably many more. We don't want their efforts to go to waste.
In the frontend db we have
users
andaccounts
, in theory, it should be possible for a user to have multiple accounts from different providers and still be the same user.However, the way we identify users in the data backend is a combination of user id and auth method, which makes it harder to reconnect the two.
This problem is more pronounced when taking the discord bot into consideration, since it has no information about the user other than their discord id.
For chat, we only use the user's frontend db id, which should make it easier.
Solution Idea
Ideally, we should be able to identify a user through our entire stack using a single piece of information, which is identical through all auth methods we provide.
I see the email address being the only valid option(users might change their emails), which is used bynext-auth
for connecting multiple accounts to the same user.This would require at least:
This will probably be a gradual process where we would have to support both old and new formats while we do the migration.
Open questions
@andreaskoepf @yk @notmd @olliestanley would love to hear your opinions on this, since it is a huge topic and basically touches everything.
The text was updated successfully, but these errors were encountered: