Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue when viewing all packages of an organisation #28255

Closed
Mik4sa opened this issue Nov 28, 2023 · 14 comments · Fixed by #30520
Closed

Performance issue when viewing all packages of an organisation #28255

Mik4sa opened this issue Nov 28, 2023 · 14 comments · Fixed by #30520
Labels
performance/speed performance issues with slow downs topic/packages type/bug

Comments

@Mik4sa
Copy link

Mik4sa commented Nov 28, 2023

Description

In addition to #25953 there is atleast on more very slow scenario for a very lot of packages when viewing all packages of an organisation. The total time was about 4m.

I discovered atleast two very slow queries:

exec sp_executesql N'SELECT * FROM [package_version] LEFT JOIN [package_version] [pv2] ON (package_version.package_id = pv2.package_id AND (package_version.created_unix < pv2.created_unix OR (package_version.created_unix = pv2.created_unix AND package_version.id < pv2.id))) AND pv2.is_internal=@p1 INNER JOIN [package] ON package.id = package_version.package_id WHERE package_version.is_internal=@p2 AND package.owner_id=@p3 AND (pv2.id IS NULL) ORDER BY [package_version].[created_unix] DESC OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY',N'@p1 bit,@p2 bit,@p3 bigint',@p1=0,@p2=0,@p3=2

and

exec sp_executesql N'SELECT count(*) FROM [package_version] LEFT JOIN [package_version] [pv2] ON (package_version.package_id = pv2.package_id AND (package_version.created_unix < pv2.created_unix OR (package_version.created_unix = pv2.created_unix AND package_version.id < pv2.id))) AND pv2.is_internal=@p1 INNER JOIN [package] ON package.id = package_version.package_id WHERE package_version.is_internal=@p2 AND package.owner_id=@p3 AND (pv2.id IS NULL)',N'@p1 bit,@p2 bit,@p3 bigint',@p1=0,@p2=0,@p3=2

The first one run 2m46s and the second one 1m20s.
There might be more slow queries.

I'm happy to help and execute queries for test if necessary

Gitea Version

Gitea version 1.21.1 built with GNU Make 4.3, go1.21.4 : bindata, sqlite, sqlite_unlock_notify

Can you reproduce the bug on the Gitea demo site?

No

Log Gist

No response

Screenshots

No response

Git Version

git version 2.39.1.windows.1

Operating System

Windows Server 2016 - Version 1607 (Build 14393.6452)

How are you running Gitea?

  • Using gitea-1.21.1-windows-4.0-amd64.exe from your download page
  • Registered as a Windows service with the following command line: D:\gitea\gitea.exe web --config D:\gitea\custom\conf\app.ini

Database

MSSQL

@lunny lunny added the performance/speed performance issues with slow downs label Nov 28, 2023
@lunny
Copy link
Member

lunny commented Nov 28, 2023

What's your Gitea version.

@Mik4sa
Copy link
Author

Mik4sa commented Nov 28, 2023

Hi @lunny, as stated above Gitea version 1.21.1

@KN4CK3R
Copy link
Member

KN4CK3R commented Nov 28, 2023

Could you please check if the queries use indices? All used fields should be covered but maybe we need a combined index.

@Mik4sa
Copy link
Author

Mik4sa commented Nov 28, 2023

Does this help you? This is the execution plan of the first query. I hope german is fine. Tell me otherwise
image

Note: As far as I could see all fields have an index, yes. Also I just refreshed all statistics using the maintenance plan "Gitea"
My workmate just told me that he has created that maintenance plan. He isn't sure though ;)

@Mik4sa
Copy link
Author

Mik4sa commented Nov 28, 2023

When moving the condition (pv2.id IS NULL) from the WHERE clause to the LEFT JOIN clause the query is fast. It finishes in under 1 second. Shouldn't be this fine?

So instead of

exec sp_executesql N'SELECT * FROM [package_version] LEFT JOIN [package_version] [pv2] ON (package_version.package_id = pv2.package_id AND (package_version.created_unix < pv2.created_unix OR (package_version.created_unix = pv2.created_unix AND package_version.id < pv2.id))) AND pv2.is_internal=@p1 INNER JOIN [package] ON package.id = package_version.package_id WHERE package_version.is_internal=@p2 AND package.owner_id=@p3 AND (pv2.id IS NULL) ORDER BY [package_version].[created_unix] DESC OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY',N'@p1 bit,@p2 bit,@p3 bigint',@p1=0,@p2=0,@p3=2

I executed this one:

exec sp_executesql N'SELECT * FROM [package_version] LEFT JOIN [package_version] [pv2] ON (package_version.package_id = pv2.package_id AND (package_version.created_unix < pv2.created_unix OR (package_version.created_unix = pv2.created_unix AND package_version.id < pv2.id))) AND pv2.is_internal=@p1 AND (pv2.id IS NULL) INNER JOIN [package] ON package.id = package_version.package_id WHERE package_version.is_internal=@p2 AND package.owner_id=@p3 ORDER BY [package_version].[created_unix] DESC OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY',N'@p1 bit,@p2 bit,@p3 bigint',@p1=0,@p2=0,@p3=2

@KN4CK3R
Copy link
Member

KN4CK3R commented Nov 28, 2023

I hope german is fine.

Immer doch! 😄

I executed this one:

exec sp_executesql N'SELECT * FROM [package_version] LEFT JOIN [package_version] [pv2] ON (package_version.package_id = pv2.package_id AND (package_version.created_unix < pv2.created_unix OR (package_version.created_unix = pv2.created_unix AND package_version.id < pv2.id))) AND pv2.is_internal=@p1 AND (pv2.id IS NULL) INNER JOIN [package] ON package.id = package_version.package_id WHERE package_version.is_internal=@p2 AND package.owner_id=@p3 ORDER BY [package_version].[created_unix] DESC OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY',N'@p1 bit,@p2 bit,@p3 bigint',@p1=0,@p2=0,@p3=2

That should just produce a fast SELECT * FROM package_version result I guess.

Could you provide screenshots from mouse-hovering the "Parallelität" blocks?

@Mik4sa
Copy link
Author

Mik4sa commented Nov 28, 2023

I executed this one:

exec sp_executesql N'SELECT * FROM [package_version] LEFT JOIN [package_version] [pv2] ON (package_version.package_id = pv2.package_id AND (package_version.created_unix < pv2.created_unix OR (package_version.created_unix = pv2.created_unix AND package_version.id < pv2.id))) AND pv2.is_internal=@p1 AND (pv2.id IS NULL) INNER JOIN [package] ON package.id = package_version.package_id WHERE package_version.is_internal=@p2 AND package.owner_id=@p3 ORDER BY [package_version].[created_unix] DESC OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY',N'@p1 bit,@p2 bit,@p3 bigint',@p1=0,@p2=0,@p3=2

That should just produce a fast SELECT * FROM package_version result I guess.

I'm unsure what you want to tell me with that. The changed query from above is fast. I just tested it. But personally I'm just not 100% sure whether the query is still the same and correct. So that I didn't modified it in a way so it returns different results

Could you provide screenshots from mouse-hovering the "Parallelität" blocks?

Here the images for the first, original query:
image


image


image


image


image

@KazzmanK
Copy link
Contributor

KazzmanK commented Nov 28, 2023

If you add
OPTION (MAXDOP 1)
it will remove all the parallel stuff from query plan and make it easier to tune and identify issues.

looks like query gets latest package versions. Not so good design to handle large amount of data.
Probably larger ID belongs to larger created_unix, so no need of OR, OR is evil here.
Also , select top 1 with ties + order by row_number () over(partition by version order by id desc) may perform better, but mssql specific.

@KN4CK3R
Copy link
Member

KN4CK3R commented Dec 5, 2023

@Mik4sa Could you please test how fast/slow this query is?
SELECT * FROM package_version WHERE package_version.id IN (SELECT MAX(package_version.id) FROM package_version INNER JOIN package ON package.id = package_version.package_id WHERE package_version.is_internal=0 AND package.owner_id=2 GROUP BY package_version.package_id) ORDER BY package_version.created_unix DESC, package_version.id ASC
It should list all latest versions of packages with the owner = 2 like your query does. In my tests there are no differences in the result set but it's way faster.

@Mik4sa
Copy link
Author

Mik4sa commented Dec 5, 2023

Yes, the query is fast, under 1 second.
The result set is identical but (just in case you missed) not in terms of sorting and not in terms of column count (everything after the first download_count column is "missing").

Also, you search for the latest package version by id as it seems. Should we change this so that a date is used? I mean, what if the ids get reused some day for example?

@Mik4sa
Copy link
Author

Mik4sa commented Feb 20, 2024

@KN4CK3R
What is the current status of this? The query was blazing fast and I would be very happy to have this in the final product.
Is there anything missing here?

@Mik4sa
Copy link
Author

Mik4sa commented Apr 15, 2024

@KN4CK3R Still no update? I thought we get this done very fast. It felt so when you were first replying.
Are there any open tasks?

@lunny
Copy link
Member

lunny commented Apr 16, 2024

I will do some investigations.

@KN4CK3R
Copy link
Member

KN4CK3R commented Apr 16, 2024

Sorry, didn't published my changes back then. Created #30520.

silverwind pushed a commit that referenced this issue Apr 20, 2024
Fixes #28255

The new query uses the id field to sort by "newer". This most not be
correct (usually it is) but it's faster (see #28255).
If someone has a better idea, please propose changes.

Co-authored-by: Giteabot <teabot@gitea.io>
GiteaBot added a commit to GiteaBot/gitea that referenced this issue Apr 20, 2024
Fixes go-gitea#28255

The new query uses the id field to sort by "newer". This most not be
correct (usually it is) but it's faster (see go-gitea#28255).
If someone has a better idea, please propose changes.

Co-authored-by: Giteabot <teabot@gitea.io>
lunny pushed a commit that referenced this issue Apr 21, 2024
Backport #30520 by @KN4CK3R

Fixes #28255

The new query uses the id field to sort by "newer". This most not be
correct (usually it is) but it's faster (see #28255).
If someone has a better idea, please propose changes.

Co-authored-by: KN4CK3R <admin@oldschoolhack.me>
@go-gitea go-gitea locked as resolved and limited conversation to collaborators Jul 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance/speed performance issues with slow downs topic/packages type/bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants