Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrongly used index slows down queries on action table (dashboard view) #16665

Closed
2 of 6 tasks
zyclonite opened this issue Aug 11, 2021 · 11 comments · Fixed by #19472
Closed
2 of 6 tasks

Wrongly used index slows down queries on action table (dashboard view) #16665

zyclonite opened this issue Aug 11, 2021 · 11 comments · Fixed by #19472
Labels
performance/speed performance issues with slow downs
Milestone

Comments

@zyclonite
Copy link
Contributor

  • Gitea version (or commit ref): 1.14.5 / 1.14.6
  • Git version: 2.30.2
  • Operating system: Fedora CoreOS, running gitea officical container on podman
  • Database (use [x]):
    • PostgreSQL
    • MySQL (mariadb)
    • MSSQL
    • SQLite
  • Can you reproduce the bug at https://try.gitea.io:
    • Yes (provide example URL)
    • No

Description

Super slow loading of the Dashboard where the list of actions is displayed.
There are about 300k action entries in the database for the last year.
Most users have around 10k up to 30k actions if selected on their user_id.

When doing an EXPLAIN on

SELECT `id`, `user_id`, `op_type`, `act_user_id`, `repo_id`, `comment_id`, `is_deleted`, `ref_name`, `is_private`, `content`, `created_unix`
FROM `action`
WHERE user_id=1 AND is_deleted=0
ORDER BY `id` DESC LIMIT 20;

there will be two indexes merged (on disk) IDX_action_is_deleted and IDX_action_user_id
it seems the is_deleted one is taken first and so the query takes around 5 seconds

there are three ways of temporarily fixing the issue

  1. creating a composite index on both fields
  2. adding a FORCE INDEX(IDX_action_user_id) to the query changes the order of the indexes used
  3. (currently the best results) just dropping the IDX_action_is_deleted index

as a result the speed for the query goes down to sub milliseconds

maybe just removing the index from the deleted field in the action model would be the easiest one?

@lunny lunny added the performance/speed performance issues with slow downs label Aug 11, 2021
@lafriks
Copy link
Member

lafriks commented Aug 11, 2021

Yes it should be composite index (user_id, is_deleted)

@noerw noerw added this to the 1.16.0 milestone Aug 23, 2021
@lunny lunny modified the milestones: 1.16.0, 1.17.0 Nov 15, 2021
@zyclonite
Copy link
Contributor Author

seems this gets even worse with 1.16.0

the workaround with dropping the deleted index does not really have an effect any longer (maybe there is another slow query)

@wxiaoguang

This comment was marked as outdated.

@42wim
Copy link
Member

42wim commented Mar 11, 2022

We're hitting the same issue, using use INDEX(IDX_action_created_unix) gets a 130 times speed-up.
It seems xorm.io doesn't support those index hints though (I didn't see a way to do it)

SELECT `id`, `user_id`, `op_type`, `act_user_id`, `repo_id`, `comment_id`, `is_deleted`, `ref_name`, `is_private`, `content`, `created_unix` 
FROM `action` use INDEX(IDX_action_created_unix) 
WHERE repo_id IN (SELECT id FROM repository WHERE (`repository`.is_private=false AND `repository`.owner_id NOT IN (SELECT id FROM `user` WHERE type=1 AND visibility IN ("private"))) OR `repository`.id IN (SELECT repo_id FROM `access` WHERE `access`.user_id=1 AND `access`.mode>0) OR `repository`.owner_id=1 OR `repository`.id IN (SELECT `team_repo`.repo_id FROM team_repo INNER JOIN team_user ON `team_user`.team_id = `team_repo`.team_id WHERE `team_user`.uid=1) OR (`repository`.is_private=false AND `repository`.owner_id IN (SELECT `org_user`.org_id FROM org_user WHERE `org_user`.uid=1))) AND user_id=1 AND is_deleted=false ORDER BY `created_unix` DESC LIMIT 20;

For those who build their own gitea and use mysql, you can find a quick hack/fix on 42wim@5854c79

@lunny
Copy link
Member

lunny commented Mar 14, 2022

There is an issue for that https://gitea.com/xorm/xorm/issues/1456 . Is that the index names or the columns names?

@42wim
Copy link
Member

42wim commented Mar 14, 2022

it's an index name, more information about almost the same issue on https://www.percona.com/blog/2012/12/14/the-optimization-that-often-isnt-index-merge-intersection/

@zeripath
Copy link
Contributor

zeripath commented Apr 23, 2022

There are a lot of indices on the Action table and I think that most of them are likely to be useless, unhelpful and unused.

So, I think we need to look again at these indexes in action (and across Gitea as whole)

Current Status

To this end let's start with the current status:

type Action struct {
	ID          int64 `xorm:"pk autoincr"`
	UserID      int64 `xorm:"INDEX"` // Receiver user id.
	OpType      ActionType
	ActUserID   int64                  `xorm:"INDEX"` // Action user id.
	ActUser     *user_model.User       `xorm:"-"`
	RepoID      int64                  `xorm:"INDEX"`
	Repo        *repo_model.Repository `xorm:"-"`
	CommentID   int64                  `xorm:"INDEX"`
	Comment     *Comment               `xorm:"-"`
	IsDeleted   bool                   `xorm:"INDEX NOT NULL DEFAULT false"`
	RefName     string
	IsPrivate   bool               `xorm:"INDEX NOT NULL DEFAULT false"`
	Content     string             `xorm:"TEXT"`
	CreatedUnix timeutil.TimeStamp `xorm:"INDEX created"`
}

Now... we need to think about our common queries.

  • @42wim do you have a list of common SQL queries on the actions table? It might be useful to look at what actually gets requested and set the indexes to fit those.

First thoughts

However, we can look at the code and reason a little about this:

  • Do we ever do a query on the Action table which is primarily dependent on is_deleted or uses is_deleted only?
    • No. Therefore this INDEX is useless and it should be dropped.
    • Instead we should consider adding a composite index on User and IsDeleted as this appears to be a common query.
    • We may even want to add a 3-composite and 4-composite index with act_user_id and act_user_id&repo_id
  • What about the User index?
    • Yes - in user_heatmap.go:getUserHeatmapData() which always has the created_unix in it.
    • Often however there are queries which have a RepoID in them and an act_user_id too.
  • What about the RepoID index?
    • When the user is an admin then the act_user_id may be missing. However, often this will also be present.
  • CommentID?
    • Yes this appears to be commonly by itself
  • IsPrivate?
    • Again never by itself and therefore this index is useless and actively unhelpful. We should consider a composite index
  • CreatedUnix?
    • Never alone. Again this should be dropped.

Now we need to think about our common queries. Which are (likely to be):

  1. User/Org/Team Dashboard/Feed
  2. Heatmap
  3. RepoFeed
  4. UserFeed
  5. UserProfile activity

User/Org/Team Dashboard/Feed

func Dashboard(ctx *context.Context) {
:

This first gets the heatmap data which we'll skip for now.

ctx.Data["Feeds"], err = models.GetFeeds(ctx, models.GetFeedsOptions{
RequestedUser: ctxUser,
RequestedTeam: ctx.Org.Team,
Actor: ctx.Doer,
IncludePrivate: true,
OnlyPerformedBy: false,
IncludeDeleted: false,
Date: ctx.FormString("date"),
})

*date may or may not be set.

The options are passed in to:

  • func activityQueryCondition(opts GetFeedsOptions) (builder.Cond, error) {
  • Which has essentially two forks: Admin doer or non-admin doer
    • For non-admin users, the condition will query act_user_id repo_id user_id is_deleted
    • For admin users, the condition will query user_id and is_deleted +/- repo_id
    • created_unix will rarely be queried in addition.

Heatmap

data, err := models.GetUserHeatmapDataByUserTeam(ctxUser, ctx.Org.Team, ctx.Doer)

  • with user = ctx.Doer, team = nil, doer = ctx.Doer

cond, err := activityQueryCondition(GetFeedsOptions{
RequestedUser: user,
RequestedTeam: team,
Actor: doer,
IncludePrivate: true, // don't filter by private, as we already filter by repo access
IncludeDeleted: true,
// * Heatmaps for individual users only include actions that the user themself did.
// * For organizations actions by all users that were made in owned
// repositories are counted.
OnlyPerformedBy: !user.IsOrganization(),
})

  • Again essentially two forks: Admin or non-admin
    • For non-admin users, the condition will query act_user_id repo_id, user_id
    • For admin users, the condition will query user_id +/- repo_id

RepoFeed

func ShowRepoFeed(ctx *context.Context, repo *repo_model.Repository, formatType string) {
actions, err := models.GetFeeds(ctx, models.GetFeedsOptions{
RequestedRepo: repo,
Actor: ctx.Doer,
IncludePrivate: true,
Date: ctx.FormString("date"),
})
if err != nil {
ctx.ServerError("GetFeeds", err)
return
}

The options are passed in to:

  • func activityQueryCondition(opts GetFeedsOptions) (builder.Cond, error) {
  • Which has essentially two forks: Admin doer or non-admin doer
    • For non-admin users, the condition will query act_user_id repo_id is_deleted
    • For admin users, the condition will query repo_id and is_deleted
    • created_unix will rarely be queried in addition.

UserFeed

func showUserFeed(ctx *context.Context, formatType string) {
actions, err := models.GetFeeds(ctx, models.GetFeedsOptions{
RequestedUser: ctx.ContextUser,
Actor: ctx.Doer,
IncludePrivate: false,
OnlyPerformedBy: !ctx.ContextUser.IsOrganization(),
IncludeDeleted: false,
Date: ctx.FormString("date"),
})

The options are passed in to:

  • func activityQueryCondition(opts GetFeedsOptions) (builder.Cond, error) {
  • Which has essentially two forks: Admin doer or non-admin doer
    • For non-admin users, the condition will query user_id act_user_id repo_id is_deleted
    • For admin users, the condition will query repo_id user_id is_deleted +/- act_user_id
    • created_unix will rarely be queried in addition.

UserProfile Activity

case "activity":
ctx.Data["Feeds"], err = models.GetFeeds(ctx, models.GetFeedsOptions{
RequestedUser: ctx.ContextUser,
Actor: ctx.Doer,
IncludePrivate: showPrivate,
OnlyPerformedBy: true,
IncludeDeleted: false,
Date: ctx.FormString("date"),
})
if err != nil {

  • Similar to above. but occasionally with is_private

Summary

OK what should we do:

  1. Drop the is_deleted, is_private and created_unix indices - they are useless
  2. At the minimum we should add:
  • act_user_id repo_id user_id
  • repo_id user_id
  • act_user_id repo_id
  1. Consider adding in:
  • act_user_id repo_id user_id is_deleted
  1. We could consider adding further composites with is_deleted added in to the above but it will depend on how frequent is_deleted and I think we would need to actually do some live testing on this.

I'll try to put up a PR for this.

EDIT: I forgot we always sort by created_unix which is a good reason to include the created_unix in these proposed indices.

@zeripath
Copy link
Contributor

zeripath commented Apr 23, 2022

Proposed Action struct:

// Action represents user operation type and other information to
// repository. It implemented interface base.Actioner so that can be
// used in template render.
type Action struct {
	ID          int64 `xorm:"pk autoincr"`
	UserID      int64 `xorm:"INDEX(u_ua_and_r)"` // Receiver user id.
	OpType      ActionType
	ActUserID   int64                  `xorm:"INDEX(u_ua_and_r) INDEX(ua_and_r)"` // Action user id.
	ActUser     *user_model.User       `xorm:"-"`
	RepoID      int64                  `xorm:"INDEX(u_ua_and_r) INDEX(ua_and_r) INDEX(r)"`
	Repo        *repo_model.Repository `xorm:"-"`
	CommentID   int64                  `xorm:"INDEX"`
	Comment     *Comment               `xorm:"-"`
	IsDeleted   bool                   `xorm:"NOT NULL DEFAULT false"`
	RefName     string
	IsPrivate   bool               `xorm:"NOT NULL DEFAULT false"`
	Content     string             `xorm:"TEXT"`
	CreatedUnix timeutil.TimeStamp `xorm:"INDEX(u_ua_and_r) INDEX(ua_and_r) INDEX(r) created"`
}

EDIT: I've include created_unix in the User and ActUser and Repo individual indices.

zeripath added a commit to zeripath/gitea that referenced this issue Apr 23, 2022
Improve the indices on the action table by creating a covering index that covers the
common queries and removes unhelpful indices.

Fix go-gitea#16665

Signed-off-by: Andrew Thornton <art27@cantab.net>
@techknowlogick techknowlogick modified the milestones: 1.17.0, 1.18.0 Jun 8, 2022
@zeripath
Copy link
Contributor

We need to go through all of the indices on our tables and through the databases individually to see if we can improve things.

Realistically this would require someone with a reasonably sized db to do this to get the best results but we can try to reason about things by looking at the common queries from sites like try.gitea.io

@delvh delvh modified the milestones: 1.18.0, 1.17.0 Jun 18, 2022
@zyclonite
Copy link
Contributor Author

this just got worse not better... 1.17.x up to 1.18.0 did fix the issue

@go-gitea go-gitea locked and limited conversation to collaborators May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance/speed performance issues with slow downs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants