This repository has been archived by the owner on Feb 27, 2024. It is now read-only.

Speeding up cachetxt additions #36

Open
mattsinc opened this issue Apr 5, 2018 · 5 comments


mattsinc commented Apr 5, 2018

I was wondering if any of you have any feedback on ways I could speed up the cachetxt additions to the kernel cache? I know these files change occasionally, but they are relatively constant, so it seems like creating a .so file (with the values for the different components set in an array or something similar) and loading from that would improve performance of this step significantly. However, since this hasn't been done already, I wanted to check if my understanding about this component was off and there's a reason why the different kernels are being loaded into MIOpenGEMM every time it executes?

Thanks,
Matt


newling commented Apr 7, 2018

Having one / several source files for the different cachetxt files sounds like a good idea to me. Last time I checked (a few months ago), the compilation wasn't all that slow though (?)

Note that there are no OpenCL kernels (no kernel source, no binary, nothing) when MIOpenGEMM is compiled, and so a large .cachetxt just means a large map / unordered_map. I'm not sure how large this needs to be to feel the slowdown. Also, actually generating a new cachetxt (which involves generating and compiling many OpenCL kernels) is surely slower than compiling MIOpenGEMM? So maybe I'm missing something here.


mattsinc commented Apr 9, 2018

I'm seeing that it takes about 0.65 seconds to read the 4 cachetxt files, then parse them and choose the correct kernel. However, I spend those same 0.65 seconds every time I run the program, processing the same information over and over. It seems like we only need to process this information once -- the first time I run a given commit of MIOpenGEMM.

Just to be clear, I am not proposing to make the cachetxt OpenCL kernels. Rather I am thinking of making the cachetxt files into some alternative format that doesn't require setting everything and parsing everything every time I run.

Matt


newling commented Apr 11, 2018

I don't understand why it is so slow. It looks like there are O(10^3) entries in the cachetxt files. If it's taking O(1) seconds to (i) build the std::map and (ii) find the nearest match for a given geometry, that's like O(10^6) cycles per entry which seems like too many (each entry is just a few strings).

I think this pull request is related to the slowness: #30
Maybe @zjing14 has some ideas.

Matt are you proposing to have the std::map constructed at compile time? I'm not sure how to do this.

James


zjing14 commented Apr 14, 2018

In #30, we had to increase the threshold on cache search time, since we added many entries to cachetxt. Otherwise, MIOpenGEMM failed in Jenkins.


mattsinc commented Apr 16, 2018

@newling, yeah, it appears that it takes much longer now (thanks @zjing14). I don't have a perfect plan for fixing it, but perhaps a std::map that is set up by default in the code would work? It seems like this would tie in well with the vals unordered_map already used in the KernelCache class (there isn't a constructor for KernelCache, so perhaps it could just be initialized inline in kernelcache.hpp -- although that would make kernelcache.hpp really long).

Then if/when we wanted to add a new entry, instead of adding it to a text file, we could just add another entry to the map.

Matt
