Older News

  • June 18:
    • some changes under-the-hood:
      • migrated from pthreads to C++11 threads: C++11 threads are easier to use, more standard, hopefully portable-ish
      • migrated from using the bash script cocl to a new python script cocl_py as the main compilation entry-point
        • you can continue to use cocl for now, if you wish, but it is unlikely to be maintained, even if it isn't physically deleted
        • this does mean that python 2.7 is now a runtime dependency, but I think python 2.7 is relatively ubiquitous?
      • the Coriander library and executables now build ok on Windows, which isn't to say they will run on Windows, but baby steps...
    • created plugin architecture
      • see coriander-dnn for proof of concept for creating plugins :-)
      • it uses the pluggable branch of Coriander (update: this branch has now been merged to master)
      • the idea is that you can pick some cool functionality that doesn't exist yet, and create your own project to implement it
      • to install a plugin, simply do eg cocl_plugins.py install --repo-url https://github.com/hughperkins/coriander-dnn
      • from then on, cocl_py will automatically add its includes and libraries when building :-)
  • June 11:
  • June 4:
    • added cmake macros cocl_add_executable and cocl_add_library
    • these replace the previous add_cocl_executable, and have the advantage that they create standard targets, which you can use with target_link_libraries and so on
    • see cmake usage
  • May 31:
    • added a developer debugging option COCL_DUMP_CONFIG, to allow easy inspection of buffers returned by kernel calls, see options
  • May 28:
    • revamped how we choose the type of buffer offsets passed into the kernels:
      • it's always done at runtime now, never at compile time
      • when you run an already built app, simply set the environment variable COCL_OFFSETS_32BIT to the string 1 to use 32-bit offsets
      • otherwise it will default to 64-bit offsets (meaning it can address more memory)
      • basically, unless you're using Beignet, you can ignore this, and stop having to think about the 32-bit offsets variables any more :-)
    • if you build with BUILD_TESTS set to OFF, you can still build the tests, eg by doing make cocl_unittests, and you can still run them, eg by doing make run-tests; it's just that plain make no longer builds them by default
  • May 27:
    • updated to LLVM 4.0. Thank you to @iame6162013 for inspiring me to do this
    • Tensorflow random_op_gpu.cc compiles and runs ok now :-). There were a few hoops to jump through, #24
  • May 20:
    • renamed to Coriander
  • May 18:
    • Presented Coriander at this year's IWOCL :-) Full IWOCL program here, and there is a link to my own slides
  • May 5:
  • May 1:
    • dnn tests pass on Radeon Pro 450, on Mac Sierra now
    • fix crash bugs in pooling forward/backward, on Mac Sierra
    • thanks to my employer ASAPP giving me the use of a nice 4th-generation MacBook Pro with a Radeon Pro 450, unit tests now pass on said hardware :-)
  • April 29:
    • Updated to latest EasyCL. This lets you use the environment variable CL_GPUOFFSET to choose different GPUs, eg set it to 1 to use the second GPU, 2 to use the third GPU, etc
  • April 15:
  • April 14:
    • added backwards implementation for convolution, including data, filters, and bias
  • April 13:
    • added CLBlast wrappers for: sgemv, sscal, saxpy
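
A minimal sketch of calling one of these wrappers, assuming they mirror the standard cuBLAS v2 handle-based signatures; the function name and device buffers here are illustrative, not taken from the Coriander tests:

```cpp
// saxpy (y = alpha * x + y) through the cuBLAS-style wrapper; the header and
// surrounding harness are assumptions, and error checking is omitted
#include <cublas_v2.h>

void saxpy_example(float *d_x, float *d_y, int n) {
    cublasHandle_t handle;
    cublasCreate(&handle);                            // create a BLAS handle

    float alpha = 2.0f;
    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);   // y = alpha * x + y on device buffers

    cublasDestroy(handle);
}
```
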
  • April 4:
    • merged in the current dnn branch, which provides a forward convolution implementation for the cudnn API, using im2col over Cedric Nugteren's CLBlast
    • Coriander got accepted for a technical presentation at this year's IWOCL conference :-) Conference sessions here: IWOCL 2017 Conference program
  • Nov 25:
  • Nov 24:
    • merge from branch clwriter:
      • lots of refactoring under-the-hood
      • can handle determining the address-space of functions returning pointers
      • opencl generation is at runtime now => facilitates determining address-space; and, counter-intuitively, it is actually faster, because there is less OpenCL for the GPU driver to compile
  • Nov 18:
  • Nov 17:
    • merged runtime-compile branch into master branch. This brings a few changes:
      • opencl generation is now at runtime, rather than at compile time
        • this lets us build only the one specific kernel we need
        • means more information is available at generation time, facilitating the generation process
      • build on Mac OS X is more or less working, eg https://travis-ci.org/hughperkins/Coriander/builds/176580716
      • code radically refactored underneath
      • removed --run_branch_transforms and --branches_as_switch, for now
  • Nov 8:
    • exposed generation options as cocl options, eg --run_branching_transforms, --branches_as_switch, and the --devicell-opt [opt] options
  • Nov 6:
    • created dockerfiles for Beignet and NVIDIA docker
  • Nov 5:
  • Nov 4:
    • merged in changes that remove labels and gotos, and replace them with ifs, whiles, fors. This is a bit flaky/beta/duct-tape, but the unit tests do all pass...
  • Nov 1:
    • turned on rpath, switched from static to shared compilation
  • Oct 29:
    • negative infinity float constants handled correctly now (pre-requisite for reduce_min working in tensorflow)
    • properties now return correct device name, total memory, and a few other device parameters (see the sketch below)
    • added callbacks
    • remember to cache the kernels between calls :-P (this should make things run quite a lot faster now...)
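
A sketch of querying the properties mentioned above, using the standard runtime call cudaGetDeviceProperties; whether Coriander routes properties through this exact entry point is an assumption:

```cpp
// print the device name and total memory for device 0; error checking omitted
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("device name:  %s\n", prop.name);
    printf("total memory: %zu bytes\n", (size_t)prop.totalGlobalMem);
    return 0;
}
```
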
  • Oct 28:
    • denormalized generated OpenCL out of SSA form, to make it more human-readable
    • added support to pass null pointers into kernels
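
A minimal sketch of the null-pointer support just mentioned; the kernel and buffer names are illustrative:

```cpp
// a kernel with an optional bias pointer, launched once with a real buffer
// and once with NULL; error checking omitted
#include <cuda_runtime.h>
#include <cstddef>

__global__ void add_bias(float *data, const float *bias, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += (bias != NULL) ? bias[i] : 0.0f;
    }
}

void launch(float *d_data, float *d_bias, int n) {
    add_bias<<<(n + 255) / 256, 256>>>(d_data, d_bias, n);   // with a bias buffer
    add_bias<<<(n + 255) / 256, 256>>>(d_data, NULL, n);     // null pointer passed into the kernel
    cudaDeviceSynchronize();
}
```
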
  • Oct 26:
  • Oct 25:
    • BLAS wrapper handles memory offsets correctly now
  • Oct 24:
    • fixed pow, min, max (beta)
  • Oct 23:
    • fixed float4s. This is a critical bug-fix, without which Eigen componentwise operations work less well in Tensorflow :-P
    • added BLAS, using Cedric Nugteren's CLBlast
  • Oct 22:
    • arrays of structs can be passed to kernels again, as long as they contain no pointers
      • (structs containing pointers can be passed only by-value)
    • possible to call kernels with offsets added now, as in eg test/cocl/offsetkernelargs.cu
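
An illustrative sketch of the two items above (not the actual test code): an array of pointer-free structs passed to a kernel, plus a launch on an offset into a device buffer, in the spirit of test/cocl/offsetkernelargs.cu:

```cpp
#include <cuda_runtime.h>

struct Params {          // plain data only: no pointer members
    float scale;
    int   count;
};

__global__ void scale_kernel(float *data, const Params *params) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < params[0].count) {
        data[i] *= params[0].scale;
    }
}

void launch(float *d_buf, Params *d_params, int n) {
    // the kernel receives d_buf + 16, ie a pointer with an offset added
    scale_kernel<<<(n + 255) / 256, 256>>>(d_buf + 16, d_params);
    cudaDeviceSynchronize();
}
```
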
  • Oct 20:
    • fix bug where threadIdx.x was being incorrectly written as get_global_id instead of get_local_id ...
      • magically, the test_cuda_elementwise kernel works much better now :-)
  • Oct 18:
    • installs to /usr/local now
    • libcocl.a contains libEasyCL.a now, no need for libEasyCL.so at runtime
    • fixed bug with linking multiple compiled .cu files causing error about 'multiple definitions of __opencl_source'
  • Oct 16:
    • added streams, including kernel launch on a non-default stream (see the sketch below)
    • removed pinned memory: cuMemHostAlloc now just calls malloc, see design.md for analysis and thoughts on this. Let me know if you have any ideas (eg via an issue).
    • added ability to copy to/from device memory, with an offset added
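
A minimal sketch covering the stream and offset-copy items above; my_kernel and the sizes are placeholders, and error checking is omitted:

```cpp
#include <cuda_runtime.h>

__global__ void my_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

void run(float *d_buf, float *h_buf, int n) {
    // copy into the device buffer starting 128 floats in (copy with an offset added)
    cudaMemcpy(d_buf + 128, h_buf, (n - 128) * sizeof(float), cudaMemcpyHostToDevice);

    // kernel launch on a non-default stream
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    my_kernel<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n);
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}
```
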
  • Oct 15:
    • fixed critical bug where return; wasn't being written out. This didn't matter when it was at the end of a kernel, but mattered more when it was the only exit condition for a kernel :-P
    • added event handling (see the sketch below)
    • added pinned memory handling
    • added a bunch of api call implementations for getting information about the driver (mostly stubbed out for now...)
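
A minimal sketch of the event handling mentioned above: timing a kernel with CUDA events. some_kernel is a placeholder, and exactly which event calls are covered is an assumption:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void some_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

void time_kernel(float *d_buf, int n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    some_kernel<<<(n + 255) / 256, 256>>>(d_buf, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel took %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```
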
  • Oct 10:
  • Oct 8:
  • Oct 5:
  • Oct 4:
    • added llvm.memcpy
    • added insertvalue
    • added dumpinttoptr, trunc, srem (beta)
  • Oct 3:
    • added float4 (beta)
    • added local memory (beta)
  • Oct 2:
    • added structs
  • Oct 1:
    • first working end-to-end kernel launch, using both host-side and device-side code :-)
  • Sept 30:
    • added initial unit tests, that use pyopencl to compile the generated OpenCL code, and run tests against it
  • Sept 27:
    • first created