Add API to get Index Metrics #74

Merged
merged 15 commits into from
Mar 7, 2022

Jiaweihu08 (Member) commented Feb 9, 2022

Add an API to QbeastTable to retrieve OTree index metrics more easily!

This is how you would use it:

```scala
import io.qbeast.spark.QbeastTable

val qbeastTable = QbeastTable.forPath(spark, tmpDir) // tmpDir: path to an existing Qbeast table
val metrics = qbeastTable.getIndexMetrics()
```

The metrics included so far are:
General index metadata:

  • desiredCubeSize
  • number of cubes
  • depth
  • average fan-out of the cubes
  • dimension count
  • number of rows
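To make the relationship between these numbers concrete, here is a minimal Scala sketch that counts cubes and computes an average fan-out on a toy tree. It assumes that avgFanOut is the mean number of children over the non-leaf cubes, and it models the tree as a hypothetical `Map` from a string cube id to its children's ids instead of qbeast-spark's own `CubeId`; neither choice is confirmed by this PR. Under that assumption, a root with two children that each have two children gives 7 cubes and an average fan-out of 2.0, the same values as in the example output below.

```scala
// Hypothetical stand-in for an OTree: each cube id maps to the ids of its children.
// Real qbeast-spark CubeIds are not plain strings; this is only for illustration.
object FanOutSketch {

  type Tree = Map[String, Seq[String]]

  /** Number of cubes: every id that appears either as a parent or as a child. */
  def cubeCount(tree: Tree): Int =
    (tree.keySet ++ tree.values.flatten).size

  /** Assumed definition: average number of children over the non-leaf cubes (0.0 if none). */
  def avgFanOut(tree: Tree): Double = {
    val nonLeaf = tree.values.filter(_.nonEmpty)
    if (nonLeaf.isEmpty) 0.0
    else nonLeaf.map(_.size).sum.toDouble / nonLeaf.size
  }
}
```

For example, `FanOutSketch.avgFanOut(Map("root" -> Seq("A", "B"), "A" -> Seq("AA", "AB"), "B" -> Seq("BA", "BB")))` averages 6 children over 3 non-leaf cubes.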

It also reports some more specific details, such as depthOverLogNumNodes = depth / log(cubeCounts) and depthOnBalance = depth / log(rowCount / desiredCubeSize), where both logarithms use base dimensionCount.
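Both ratios can be recomputed from the plain numbers in the example output below. The sketch takes `Int`/`Long` arguments through hypothetical helper names (not the IndexMetrics API) and uses the change-of-base rule for the logarithm. With dimensionCount = 2, depth = 2, cubeCounts = 7, elementCount = 1001 and desiredCubeSize = 100 it returns about 0.7124 and 0.6021, matching the sample. The second value only matches if rowCount / desiredCubeSize is an integer division (1001 / 100 = 10); with 10.01 it would be about 0.6018, so the sketch assumes integer division.

```scala
object DepthRatios {

  /** Logarithm of x in the given base, via the change-of-base rule. */
  def logBase(x: Double, base: Double): Double = math.log(x) / math.log(base)

  /** depth / log_dimensionCount(cubeCount) */
  def depthOverLogNumNodes(depth: Int, cubeCount: Int, dimensionCount: Int): Double =
    depth / logBase(cubeCount.toDouble, dimensionCount.toDouble)

  /** depth / log_dimensionCount(rowCount / desiredCubeSize).
    * Integer (Long) division is an assumption: it is what reproduces the sample output. */
  def depthOnBalance(depth: Int, rowCount: Long, desiredCubeSize: Int, dimensionCount: Int): Double =
    depth / logBase((rowCount / desiredCubeSize).toDouble, dimensionCount.toDouble)
}
```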

We also take a closer look at the non-leaf cube sizes. NonLeafCubeSizeDetails contains their min, max, quantiles, and how far each cube size is from the desiredCubeSize.
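A minimal sketch of such a summary over a list of non-leaf cube sizes follows. The quantile method is an assumption: the PR does not say which one it uses, but the nearest-rank method reproduces the sample's quartile lines for the sizes 3729, 4832 and 5084. The exact formula behind the `dev` line isn't given either, so rather than guess it, the sketch returns each cube's signed distance from desiredCubeSize through a hypothetical `distances` helper.

```scala
object CubeSizeStats {

  /** Nearest-rank quantile (assumed method) of an already sorted, non-empty sequence. */
  def nearestRank(sorted: IndexedSeq[Long], p: Double): Long = {
    val rank = math.ceil(p * sorted.size).toInt // 1-based rank
    sorted(math.max(rank, 1) - 1)
  }

  /** (min, q1, q2, q3, max); all zeros when there are no non-leaf cubes, as in the sample output. */
  def summary(sizes: Seq[Long]): (Long, Long, Long, Long, Long) =
    if (sizes.isEmpty) (0L, 0L, 0L, 0L, 0L)
    else {
      val s = sizes.sorted.toIndexedSeq
      (s.head, nearestRank(s, 0.25), nearestRank(s, 0.5), nearestRank(s, 0.75), s.last)
    }

  /** Hypothetical distance measure: size minus desiredCubeSize (negative means under-filled). */
  def distances(sizes: Seq[Long], desiredCubeSize: Long): Seq[Long] =
    sizes.map(_ - desiredCubeSize)
}
```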

The index's Map[CubeId, CubeStatus] is also returned, since some of the information stored in CubeStatus is interesting to analyze, such as the distribution of cube weights across different indexes.

You can access this information through metrics.cubeStatuses.
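As one example of that kind of analysis, the sketch below buckets cubes by weight to show how the weights are distributed. It uses a hypothetical `CubeStatusStub` holding only a normalized weight, assumed to lie in [0, 1], and string keys instead of `CubeId`. To apply it to `metrics.cubeStatuses` you would first map each real `CubeStatus` to its weight; which field to read is not shown in this PR.

```scala
// Hypothetical stand-in for CubeStatus: only a normalized weight, assumed in [0, 1], is modeled.
final case class CubeStatusStub(normalizedWeight: Double)

object WeightDistribution {

  /** Counts cubes per weight bucket: bucket i covers [i / buckets, (i + 1) / buckets). */
  def histogram(statuses: Map[String, CubeStatusStub], buckets: Int): Vector[Int] = {
    require(buckets > 0, "buckets must be positive")
    val counts = Array.fill(buckets)(0)
    statuses.values.foreach { s =>
      val raw = (s.normalizedWeight * buckets).toInt
      // Clamp so weights at or above 1.0 land in the last bucket and negatives in the first.
      val i = math.min(math.max(raw, 0), buckets - 1)
      counts(i) += 1
    }
    counts.toVector
  }
}
```

Comparing these histograms for two indexes built on the same data is one way to see how differently their weights are spread.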

Example output:

```
OTree Index Metrics:
dimensionCount: 2
elementCount: 1001
depth: 2
cubeCounts: 7
desiredCubeSize: 100
avgFanOut: 2.0
depthOverLogNumNodes: 0.7124143742160444
depthOnBalance: 0.6020599913279624
Non-lead Cube Size Stats:
(All values are 0 if there's no non-leaf cubes):
- min: 3729
- firstQuartile: 3729
- secondQuartile: 4832
- thirdQuartile: 5084
- max: 5084
- dev: 2.0133907E7
```

Jiaweihu08 requested review from cugni and osopardo1 February 9, 2022 14:47
osopardo1 (Member) commented:

Seems good! Whenever you can, @Jiaweihu08, please resolve the conflicts and merge it!

Jiaweihu08 marked this pull request as ready for review February 28, 2022 11:10
Jiaweihu08 requested a review from osopardo1 February 28, 2022 11:10
Jiaweihu08 (Member, Author) commented Feb 28, 2022

Shouldn't we add IndexStatuses as well? @osopardo1
The distribution of weights could also be interesting to analyze. We can include it in IndexMetrics and leave toString untouched.

osopardo1 (Member) commented:

Do you mean the Cube and Weight map? I'm not sure including the files makes sense. But yeah, sure!

Also, let's add a document to docs explaining the API (sorry, I missed this part when I added the new methods). Let me know whether you'll take the task; otherwise I will!

Jiaweihu08 (Member, Author) commented:

Yes, I meant Cube Statuses. I'll do the documentation. Thanks!

Jiaweihu08 merged commit 73571e8 into Qbeast-io:main on Mar 7, 2022