Per Dataset Caching #38
Conversation
… a global way, this converts the cache size limit to per dataset
… once the dataset is deleted.
…with RasterBands that do not have a dataset.
I realized that GDALGetCacheMax() and GDALGetCacheMax64() are used in several places, and those call sites need to be corrected to use the dataset cache max where appropriate. I will submit a change for this soon. Additionally, new tests will likely need to be written that run with "GDAL_DATASET_CACHING=YES". I'm not quite sure yet how to do this in the test framework; I will look into it sometime soon.
…heMax in various spots in the code. In many situations this should pull the non-global cache size when a dataset-specific cache size was specified. Added methods to access the RBM from the Dataset and RasterBand as well.
Happy to see that you managed to do the implementation in accordance with our discussion.
…ible order all to fix a very small problem
…ng protected object.
I am slightly concerned that without a global cache size, the logic in gcore/rasterio.cpp might need to be revisited. Would two differently sized caches be an issue?
For rasterio.cpp we might perhaps use the min of the source and target dataset cache sizes for the default value of nTargetSwathSize. And for the if (bDstIsCompressed && bInterleave && nTargetSwathSize > poRBM->GetCacheMax64()) test, we should compare against the target dataset cache size.
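The swath-size suggestion above can be sketched roughly as follows. This is a hypothetical stand-in, not the actual rasterio.cpp code: the DatasetCache struct and both helper functions are invented for illustration, assuming each dataset exposes its own 64-bit cache maximum as discussed in this thread.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a per-dataset block manager: each dataset
// reports its own cache ceiling instead of consulting a global value.
struct DatasetCache {
    int64_t cacheMax;  // per-dataset cache limit in bytes
    int64_t GetCacheMax64() const { return cacheMax; }
};

// Default swath size for dataset-to-dataset copies, bounded by the
// *smaller* of the two per-dataset caches (the "min of source and
// target" idea above), so a copy cannot blow out the tighter cache.
int64_t DefaultSwathSize(const DatasetCache& src, const DatasetCache& dst)
{
    const int64_t kPreferred = 10 * 1024 * 1024;  // illustrative 10 MB start
    return std::min(kPreferred,
                    std::min(src.GetCacheMax64(), dst.GetCacheMax64()));
}

// For the compressed, interleaved case, clamp against the *target*
// dataset's cache, since that is where the swath must fit while blocks
// are flushed in order.
int64_t ClampSwathForCompressedTarget(int64_t swath, const DatasetCache& dst)
{
    return std::min(swath, dst.GetCacheMax64());
}
```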
Removed the xxxx64 suffixes from the GDALRasterBlockManager class.
…r during driver destruction.
…es during rasterio operations.
Sorry for the confusion. I attempted to rebase, as I should have done the first time, and it seems to have caused a lot of extra noise in the pull request...
…eventing a segfault when destroying a mutex that was already NULL.
Where in the automated documentation should I put this information? It seems like some of it is currently only available via the wiki.
Doing some testing on multithreading with the cache system, I found an interesting issue. I keep getting this failure due to multithreading:
Basically this is caused when in GDALRasterBand, there is a deletion of a GDALRasterBlock. Like this:
Should there be a method in the GDALRasterBlockManager that handles the deletion of GDALRasterBlocks such that it will not delete them if the lock count is higher than 0? Additionally, this would mean that a mutex would need to be wrapped around GDALRasterBlock::AddLock() and GDALRasterBlock::DropLock().
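The guard proposed above could look something like the following. This is only a minimal sketch: the Block and BlockManager structs are stripped-down stand-ins for GDALRasterBlock and GDALRasterBlockManager, and SafeDeleteBlock is a hypothetical name, assuming the manager refuses to free any block whose lock count is still positive.

```cpp
// Minimal stand-ins for GDALRasterBlock / GDALRasterBlockManager, just
// to illustrate the proposed guard; the real classes carry far more state.
struct Block {
    int nLockCount = 0;   // > 0 while some thread holds a reference
    bool deleted = false; // models whether the block has been freed
};

struct BlockManager {
    // Proposed helper: only free a block whose lock count has dropped
    // to zero. A locked block is left alone and the caller would mark
    // it for later deletion instead of destroying it under a reader.
    bool SafeDeleteBlock(Block* b) {
        if (b->nLockCount > 0)
            return false;   // still in use somewhere: refuse to delete
        b->deleted = true;  // real code would delete/free the block here
        return true;
    }
};
```

Note that for this check to be meaningful, the lock count itself must be read and written under synchronization, which is exactly why AddLock()/DropLock() would need a mutex (or an atomic counter) as suggested above.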
…atasets in a multi-threaded manner.
Conflicts: gdal/gcore/gdalrasterblock.cpp
Started the RFC; I know it probably needs more details: http://trac.osgeo.org/gdal/wiki/rfc47_dataset_caching I still haven't had time to learn how the testing suite works in order to add the tests for this code properly, as I have mostly been using my own code to test it.
Regarding "The current scope of this mutex makes it lock for the entire read and writing process of the underlying driver": unless I'm seriously mistaken, this is wrong. At least the read part occurs outside of the rasterblock mutex. The writing may occur under the mutex from GDALRasterBlock::Internalize() when the block cache is saturated and there are dirty writable blocks to flush.

I'd also like a clear explanation in the RFC of what is thread-safe and what is not. I'm afraid people will think that using a GDAL dataset handle is now thread-safe, whereas it generally is not, except in the rare cases where the underlying driver is thread-safe, which must be pretty much only the MEM driver. And even that is not completely the case: for example, GDALRasterBand::GetMaskBand() is not thread-safe, and most Set methods in MEMRasterBand are not thread-safe either.

I'm still worried that we can have deadlock issues between the per-dataset rasterblock mutex and the per-band mutex. Imagine a 2-band dataset whose IReadBlock() calls GetLockedBlockRef() on the other band (similar to what is done in the GTiff driver, for example). Now imagine thread T1 calls GetLockedBlockRef() on block (1,1) of band 1 at the same time that thread T2 calls GetLockedBlockRef() on block (1,1) of band 2. In T1: as block (1,1) of band 1 is not cached yet, it goes on to call IReadBlock() under the per-band-1 mutex. IReadBlock() then calls GetLockedBlockRef() on block (1,1) of band 2, but the per-band-2 mutex is already held by thread T2. The same happens in T2 with respect to T1. Deadlock.

Regarding the code:
- The comment of GDALRasterBlockManager::SafeLockBlock() about "touching" the block no longer applies, since the call to Touch() has been suppressed.
- In GDALRasterBand::Fill(), I think all changes are whitespace changes.
- GDALRasterBlock::AddLock() and DropLock() could likely be implemented with CPLAtomicInc() and CPLAtomicDec() instead of full mutex locking.
- It is inconsistent to take the mutex in GDALRasterBlock::MarkDirty() but not in MarkClean().
- GDALRasterBlock::Delete() should perhaps be renamed MarkForDeletion() or something like that.
- GDALRasterBand::FlushBlock(): why are the last 2 operations on the block not under the mutex? (Not saying this is wrong, but it probably deserves some comments.)
- GDALRasterBlock::Internalize(): the comment about flushing other blocks is no longer up to date.

Regarding testing: as this heavily involves multi-threading, which Python is of no help to test, some parts of the tests will have to be written as C/C++ code in autotest/cpp.
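The CPLAtomicInc()/CPLAtomicDec() suggestion amounts to replacing the mutex around the lock counter with an atomic integer. A minimal portable sketch using std::atomic (in-tree code would presumably use GDAL's own CPL atomic primitives; BlockRef is an invented stand-in for GDALRasterBlock):

```cpp
#include <atomic>

// Lock counting with an atomic counter instead of a full mutex:
// AddLock()/DropLock() become single atomic instructions that are safe
// to call concurrently from many threads, with no lock contention.
struct BlockRef {
    std::atomic<int> nLockCount{0};

    void AddLock() { nLockCount.fetch_add(1, std::memory_order_relaxed); }

    // Returns the count *after* the drop, so a caller can detect the
    // "now unreferenced" transition, i.e. the point where deletion
    // becomes a candidate.
    int DropLock() { return nLockCount.fetch_sub(1, std::memory_order_acq_rel) - 1; }
};
```

The atomic counter only protects the count itself; any decision taken on the result (such as deleting the block when it reaches zero) still needs coordination with the block manager's own locking.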
It is wrong; I was mistaken in my understanding of the previous scope of the mutex. I will correct it here soon. It does lock during writing within GDALRasterBlock::Internalize(), as you stated; this is where it typically remained locked the longest during my testing. Working on addressing the other items as quickly as possible.
…s into more distinct roles
…o make locking and unlocking more efficient.
This should perhaps help fix, or at least work around, the following issue seen randomly on the gcc52_stdcpp14_sanitize Travis-CI target, e.g. on https://api.travis-ci.org/v3/job/351015753/log.txt:

```
Running gdrivers/gdalhttp.py...
TEST: http_1 ... success
ERROR 1: Range downloading not supported by this server!
ERROR 1: Request for 8-407 failed
TEST: http_2 ... success
TEST: http_3 ... success
ERROR 4: `/vsicurl/ftp://download.osgeo.org/gdal/data/gtiff/utm.tif' not recognized as a supported file format.
TEST: http_4 ... HTTP service for ftp://download.osgeo.org/gdal/data/gtiff/utm.tif is down (HTTP Error: ftp error: 425 Security: Bad IP connecting.)
cannot open URL
skip
TEST: http_5 ... success
ERROR 1: JSON parsing error: unexpected character (at offset 0)
TEST: http_6 ... success
==31274==WARNING: AddressSanitizer failed to allocate 0x62d00019040f bytes
==31274==AddressSanitizer's allocator is terminating the process instead of returning 0
==31274==If you don't like this behavior set allocator_may_return_null=1
==31274==AddressSanitizer CHECK failed: ../../.././libsanitizer/sanitizer_common/sanitizer_allocator.cc:147 "((0)) != (0)" (0x0, 0x0)
    #0 0x7f3f527259f4 (/home/travis/build/OSGeo/gdal/install-gcc-5.2.0/lib64/libasan.so.2.0.0+0x9e9f4)
    #1 0x7f3f5272a453 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) (/home/travis/build/OSGeo/gdal/install-gcc-5.2.0/lib64/libasan.so.2.0.0+0xa3453)
    #2 0x7f3f526a7461 (/home/travis/build/OSGeo/gdal/install-gcc-5.2.0/lib64/libasan.so.2.0.0+0x20461)
    #3 0x7f3f527286d5 (/home/travis/build/OSGeo/gdal/install-gcc-5.2.0/lib64/libasan.so.2.0.0+0xa16d5)
    #4 0x7f3f526acb3d (/home/travis/build/OSGeo/gdal/install-gcc-5.2.0/lib64/libasan.so.2.0.0+0x25b3d)
    #5 0x7f3f526adbd5 (/home/travis/build/OSGeo/gdal/install-gcc-5.2.0/lib64/libasan.so.2.0.0+0x26bd5)
    #6 0x7f3f5271e32e in realloc (/home/travis/build/OSGeo/gdal/install-gcc-5.2.0/lib64/libasan.so.2.0.0+0x9732e)
    #7 0x7f3f3ea2a940 in VSIRealloc /home/travis/build/OSGeo/gdal/gdal/port/cpl_vsisimple.cpp:814
    #8 0x7f3f3e8e3958 in VSICurlHandleWriteFunc /home/travis/build/OSGeo/gdal/gdal/port/cpl_vsil_curl.cpp:839
    #9 0x7f3f2ff61442 in Curl_client_write (/usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4+0x16442)
    #10 0x7f3f2ff8bc46 in Curl_pp_readresp (/usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4+0x40c46)
    #11 0x7f3f2ff621ab (/usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4+0x171ab)
    #12 0x7f3f2ff64546 (/usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4+0x19546)
    #13 0x7f3f2ff61bbf (/usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4+0x16bbf)
    #14 0x7f3f2ff61cd1 (/usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4+0x16cd1)
    #15 0x7f3f2ff6776b in Curl_disconnect (/usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4+0x1c76b)
    #16 0x7f3f2ff7c170 in curl_multi_cleanup (/usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4+0x31170)
    #17 0x7f3f3e902c6b in ClearCache /home/travis/build/OSGeo/gdal/gdal/port/cpl_vsil_curl.cpp:2913
    #18 0x7f3f3e8fc9f7 in ~VSICurlFilesystemHandler /home/travis/build/OSGeo/gdal/gdal/port/cpl_vsil_curl.cpp:2564
    #19 0x7f3f3e8fcf23 in ~VSICurlFilesystemHandler /home/travis/build/OSGeo/gdal/gdal/port/cpl_vsil_curl.cpp:2569
    #20 0x7f3f3e9e4a02 in VSIFileManager::~VSIFileManager() /home/travis/build/OSGeo/gdal/gdal/port/cpl_vsil.cpp:1805
    #21 0x7f3f3e9e5baa in VSICleanupFileManager /home/travis/build/OSGeo/gdal/gdal/port/cpl_vsil.cpp:1966
    #22 0x7f3f3e533e42 in GDALDriverManager::~GDALDriverManager() /home/travis/build/OSGeo/gdal/gdal/gcore/gdaldrivermanager.cpp:262
    #23 0x7f3f3e53418d in GDALDriverManager::~GDALDriverManager() /home/travis/build/OSGeo/gdal/gdal/gcore/gdaldrivermanager.cpp:329
    #24 0x7f3f3e53a14e in GDALDestroyDriverManager /home/travis/build/OSGeo/gdal/gdal/gcore/gdaldrivermanager.cpp:898
    #25 0x7f3f27794ea3 in ffi_call_unix64 (/usr/lib/python2.7/lib-dynload/_ctypes.so+0x1aea3)
    #26 0x7f3f277948c4 in ffi_call (/usr/lib/python2.7/lib-dynload/_ctypes.so+0x1a8c4)
    #27 0x7f3f277852c1 in _ctypes_callproc (/usr/lib/python2.7/lib-dynload/_ctypes.so+0xb2c1)
    #28 0x7f3f27785aa1 (/usr/lib/python2.7/lib-dynload/_ctypes.so+0xbaa1)
    #29 0x4c2645 in PyObject_Call (/usr/bin/python2.7+0x4c2645)
    #30 0x537589 in PyEval_EvalFrameEx (/usr/bin/python2.7+0x537589)
    #31 0x5376f1 in PyEval_EvalFrameEx (/usr/bin/python2.7+0x5376f1)
    #32 0x5376f1 in PyEval_EvalFrameEx (/usr/bin/python2.7+0x5376f1)
    #33 0x53e2af in PyEval_EvalCodeEx (/usr/bin/python2.7+0x53e2af)
    #34 0x536c45 in PyRun_StringFlags (/usr/bin/python2.7+0x536c45)
    #35 0x53ecb4 in PyRun_SimpleStringFlags (/usr/bin/python2.7+0x53ecb4)
    #36 0x51e62d in Py_Main (/usr/bin/python2.7+0x51e62d)
    #37 0x7f3f504657ec in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x217ec)
    #38 0x41bad0 (/usr/bin/python2.7+0x41bad0)
```

git-svn-id: https://svn.osgeo.org/gdal/trunk@41669 f0d54148-0727-0410-94bb-9a71ac55c965
Updated the way caching works so that caches can be set per dataset or globally; this converts the cache size limit to per-dataset if the per-dataset option is selected.