Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Only initialize BusinessCalendar caches when data is accessed and improve caching performance #5378

Merged
merged 57 commits into from
Oct 23, 2024

Conversation

chipkent
Copy link
Member

@chipkent chipkent commented Apr 17, 2024

Makes BusinessCalendar caches populate only when accessed to avoid slow worker initialization. Additionally, caching in various operations has been improved to yield better performance.

Resolves #5377

Benchmark:

n = 1_000_000
npart = 10
t = emptyTable(n)
        .updateView("Part = ii%npart", "Today = todayLocalDate()")

tp = t.partitionBy("Part").proxy()

c = calendar()

timeit = { name, table, query ->
    c.clearCache()
    start = System.currentTimeMillis()
    rst = table.update(query)
    end = System.currentTimeMillis()
    println("TIME: ${name}: ${end-start} ms")
}

istart = parseInstant("2022-01-01T01:01:01 ET")
iend = parseInstant("2024-06-01T01:01:01 ET")

timeit("calendarDayOld-NORMAL", t, "CD = c.calendarDay(Today)")
timeit("calendarDayOld-PART", tp, "CD = c.calendarDay(Today)")

timeit("diffBusinessYearsOld-NORMAL", t, "CD = c.diffBusinessYears(istart, iend)")
timeit("diffBusinessYearsOld-PART", tp, "CD = c.diffBusinessYears(istart, iend)")

println("DONE")

Performance:

// 1M - NEW
//TIME: calendarDayOld-NORMAL: 266 ms
//TIME: calendarDayOld-PART: 299 ms
//TIME: diffBusinessYearsOld-NORMAL: 5937 ms
//TIME: diffBusinessYearsOld-PART: 6664 ms <<<<

// 1M - v0.33.3
//TIME: calendarDayOld-NORMAL: 366 ms
//TIME: calendarDayOld-PART: 327 ms
//TIME: diffBusinessYearsOld-NORMAL: 20384 ms
//TIME: diffBusinessYearsOld-PART: 5549 ms <<<<

// 10M - NEW
//TIME: calendarDayOld-NORMAL: 449 ms
//TIME: calendarDayOld-PART: 548 ms
//TIME: diffBusinessYearsOld-NORMAL: 58883 ms
//TIME: diffBusinessYearsOld-PART: 67362 ms <<<<

// 10M - v0.33.3
//TIME: calendarDayOld-NORMAL: 522 ms
//TIME: calendarDayOld-PART: 704 ms
//TIME: diffBusinessYearsOld-NORMAL: 195916 ms
//TIME: diffBusinessYearsOld-PART: 49411 ms <<<<

Note1: v0.33.0 times do not include initialization times, but the new values do.
Note2: There appears to be contention or initialization overhead in the partitioned table case yielding slightly worse performance.

@chipkent chipkent added this to the 1. March 2024 milestone Apr 17, 2024
@chipkent chipkent self-assigned this Apr 17, 2024
@chipkent chipkent changed the title Only initialize BusinessCalendar caches when data is accessed Only initialize BusinessCalendar caches when data is accessed and improve caching performance Apr 18, 2024
@chipkent chipkent added NoDocumentationNeeded NoReleaseNotesNeeded No release notes are needed. devrel-watch DevRel team is watching labels Apr 19, 2024
@chipkent chipkent requested a review from jjbrosnan April 19, 2024 04:35
Copy link
Contributor

@jjbrosnan jjbrosnan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any specific comments on the code. I was unable to break things when testing this locally. I tested against docs CI, and ran my own tests locally with no issues.

Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see #5378 (comment)

@chipkent chipkent changed the title Only initialize BusinessCalendar caches when data is accessed and improve caching performance feat: Only initialize BusinessCalendar caches when data is accessed and improve caching performance Jun 24, 2024
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some trivial comments.

@chipkent chipkent requested a review from rcaudy October 3, 2024 22:09
Minimizing the lifetime of recomputed values.
@chipkent chipkent requested a review from rcaudy October 23, 2024 19:23
Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed enough, has great test coverage.

@chipkent chipkent enabled auto-merge (squash) October 23, 2024 19:27
@chipkent chipkent merged commit 1635218 into deephaven:main Oct 23, 2024
16 checks passed
@chipkent chipkent deleted the bus_cal_init_speed branch October 23, 2024 19:53
@github-actions github-actions bot locked and limited conversation to collaborators Oct 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BusinessCalendar initialization is causing initialization delays
5 participants