Speeding up cachetxt additions #36
Having one / several source files for the different cachetxt files sounds like a good idea to me. Last time I checked (a few months ago) the compilation wasn't all that slow though (?) Note that there are no OpenCL kernels (no kernel source, no binary, nothing) when MIOpenGEMM is compiled, so a large .cachetxt just means a large map / unordered_map. I'm not sure how large it needs to be before the slowdown is noticeable. Also, actually generating a new cachetxt (which involves generating and compiling many OpenCL kernels) is surely slower than compiling MIOpenGEMM, so maybe I'm missing something here.
I'm seeing that it takes about 0.65 seconds to read the 4 cachetxt files, parse them, and choose the correct kernel. However, I spend those same 0.65 seconds every time I run the program, processing the same information over and over. It seems like we only need to process this information once -- the first time I run a given commit of MIOpenGEMM. Just to be clear, I am not proposing to make the cachetxt files into OpenCL kernels. Rather, I am thinking of converting the cachetxt files into some alternative format that doesn't require setting and parsing everything on every run. Matt
I don't understand why it is so slow. It looks like there are O(10^3) entries in the cachetxt files. If it's taking O(1) seconds to (i) build the std::map and (ii) find the nearest match for a given geometry, that's like O(10^6) cycles per entry, which seems like far too many (each entry is just a few strings). I think this pull request is related to the slowness: #30. Matt, are you proposing to have the std::map constructed at compile time? I'm not sure how to do this. James
In #30, we had to increase the threshold on cache search time, since we added many entries to the cachetxt files. Otherwise, MIOpenGEMM failed in Jenkins.
@newling, yeah it appears that it takes much longer now (thanks @zjing14). I don't have a perfect plan for how to fix it, but perhaps a std::map that is simply set up by default in the code would work? This seems like it would tie in well with the vals unordered_map already used in the KernelCache class (there isn't a constructor for KernelCache, so perhaps it could just be initialized inline in kernelcache.hpp, although that would make kernelcache.hpp very long). Then, if/when we wanted to add a new entry, instead of adding it to a text file we could just add another entry to the map. Matt
I was wondering if any of you have feedback on ways I could speed up the cachetxt additions to the kernel cache. I know these files change occasionally, but they are relatively constant, so it seems like creating a .so file (with the values for the different components set in an array or something similar) and loading from that would improve the performance of this step significantly. However, since this hasn't been done already, I wanted to check whether my understanding of this component is off and there's a reason why the different kernels are loaded into MIOpenGEMM every time it executes.
Thanks,
Matt