Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metric for heap fragmentation #5090

Merged
merged 15 commits into from
Sep 10, 2018
Merged

metric for heap fragmentation #5090

merged 15 commits into from
Sep 10, 2018

Conversation

d-a-v
Copy link
Collaborator

@d-a-v d-a-v commented Aug 28, 2018

Available only in debug mode, to help debugging.

ESP.getHeapUnfragness() calculates "unfragmentation-ness" of the heap.
Whatever free heap value, it gives 1000 when the whole free memory is contiguous and decreases with fragmentation.
For example it gives 707 (which is 1000/sqrt(2)) for the same free size when it is in fact two equal-size continuous blocks. 999 is given with the same free size with a huge free block and a small one.

Here is a sketch log when allocating many blocks, then removing half of them (every other), then removing again half of them, with free heap and fragmentation metric:

allocating 24576 bytes with 24 blocks of 1024 bytes each
unfragness: 999/1000 (freeheap: 53304) <- start
unfragness: 999/1000 (freeheap: 28232) <- allocation
unfragness: 698/1000 (freeheap: 40712) <- freeing every other blocks
unfragness: 622/1000 (freeheap: 46952) <- freeing every other remaining blocks
unfragness: 999/1000 (freeheap: 53192) <- freeing everything
unfragness: 999/1000 (freeheap: 53304) <- end

allocating 21312 bytes with 1332 blocks of 16 bytes each
unfragness: 999/1000 (freeheap: 53304) <- start
unfragness: 998/1000 (freeheap:  5336) <- allocation
unfragness: 202/1000 (freeheap: 26648) <- freeing every other blocks
unfragness: 150/1000 (freeheap: 37304) <- freeing every other remaining blocks

(plenty of available space, but heap is in very bad shape: 150/1000)

unfragness: 999/1000 (freeheap: 47960) <- freeing everything
unfragness: 999/1000 (freeheap: 53304) <- end

Copy link
Collaborator

@devyte devyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is a must-have that should always be available to the user. It adds to a "health" metric of the esp, along with free heap, and the stack metrics.
I'd like it more without global state, though. Maybe a new function specific to the calculation or something?

#include "coredecls.h"
#include "Esp.h"

uint16_t EspClass::getHeapUnfragness(uint32_t* freeHeap)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I personally love the name, and the term "unfragmentation-ness", maybe it would be better for users to rename to something like contigousness (or an abbreviation like contig), or calculate the complement and call it fragmentation, which is a well-known term.


uint16_t EspClass::getHeapUnfragness(uint32_t* freeHeap)
{
ummFreeSize2 = 1; // enable processing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like global state. Shouldn't these lines be protected against interrupts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to do this without global state?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this code be explained with comments or a url or doc reference?

Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/

#ifdef DEBUG_ESP_PORT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good metric to have available as a tool, just like free heap. I suggest not making it debug-only.

@@ -1025,6 +1029,11 @@ void ICACHE_FLASH_ATTR *umm_info( void *ptr, int force ) {
++ummHeapInfo.freeEntries;
ummHeapInfo.freeBlocks += curBlocks;

#ifdef DEBUG_ESP_PORT
if (ummFreeSize2)
ummFreeSize2 += (uint64_t)curBlocks * (uint64_t)sizeof(umm_block) * (uint64_t)curBlocks * (uint64_t)sizeof(umm_block);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary for this to be a uint64_t? I think for the esp it's not necessary to use 64bit arithmetic, but as a general extension to umm it may be good. I can't tell what should be done

@d-a-v
Copy link
Collaborator Author

d-a-v commented Aug 29, 2018

or calculate the complement and call it fragmentation, which is a well-known term

I have no idea. I reckon getUnfragness is quite a twisted name. I like getHeapContig() and its complement getHeapFrag() too.

ummFreeSize2 looks like global state. Shouldn't these lines be protected against interrupts?
Is there a way to do this without global state?

It is global but only used during the calculation. It's the way umm works: it uses a global structure (ummHeapInfo) and fills it from umm_info() / getFreeHeap(). Unfragness is an optional calculation within umm_info() and uses the same structure. To do this without global, umm_info() would have to change. With no side effect nor memory change this global can be moved to the (global) struct.

Can this code be explained with comments or a url or doc reference?

It's the L2 / Euclidian norm of free block sizes. We have getFreeHeap()=sum(hole-size), and
unfragness = 1000 * (sqrt(sum(hole-size²))) / sum(hole-size)
I will add that.

This is a good metric to have available as a tool, just like free heap. I suggest not making it debug-only.

As it is and if it is always enabled, it takes ~160 bytes in flash when not called, and 1KB when called (I guess sqrt is the reason, I'll check for integer version for this).

Is it necessary for this to be a uint64_t?

sqrt(2^32)=65536 so 32 bits is valid if we have always less then 64K free ram (which is the case today). I will change that.

@d-a-v
Copy link
Collaborator Author

d-a-v commented Aug 29, 2018

Now always enabled. Adds 16 bytes in flash, and another 128 bytes when used.
bonus: cheap and accurate integer square root added for everybody isqrt32()
(tested across the full 32 bits range, thanks to Tristan Muntsinger @ http://www.codecodex.com)

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still traveling so haven't looked in depth yet, but this seems like a great idea!

@devyte
Copy link
Collaborator

devyte commented Aug 31, 2018

If
unfragness = 1000 * (sqrt(sum(hole-size²))) / sum(hole-size)
then
fragness = 1000 - 1000 * (sqrt(sum(hole-size²))) / sum(hole-size)
so an unfragness of 1000 is the same as a fragmentation of 0, and they both mean one contiguous block.

@d-a-v
Copy link
Collaborator Author

d-a-v commented Sep 4, 2018

22:35:59.233 -> 
22:35:59.233 -> Filling memory with blocks of 8000 bytes each
22:35:59.233 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- start
22:35:59.233 -> frag:   1% (freeheap:  5360 - maxblock= 5352) <- allocation
22:35:59.233 -> frag:  50% (freeheap: 29384 - maxblock= 8008) <- freeing every other blocks
22:35:59.233 -> frag:  30% (freeheap: 45400 - maxblock=24024) <- freeing every other remaining blocks
22:35:59.263 -> frag:   1% (freeheap: 53408 - maxblock=53400) <- freeing everything
22:35:59.263 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- end
22:35:59.263 -> 
22:35:59.263 -> Filling memory with blocks of 4000 bytes each
22:35:59.263 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- start
22:35:59.263 -> frag:   1% (freeheap:  1256 - maxblock= 1248) <- allocation
22:35:59.297 -> frag:  63% (freeheap: 29312 - maxblock= 5256) <- freeing every other blocks
22:35:59.297 -> frag:  49% (freeheap: 41336 - maxblock=12024) <- freeing every other remaining blocks
22:35:59.297 -> frag:   1% (freeheap: 53360 - maxblock=53352) <- freeing everything
22:35:59.297 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- end
22:35:59.297 -> 
22:35:59.297 -> Filling memory with blocks of 2000 bytes each
22:35:59.297 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- start
22:35:59.329 -> frag:   1% (freeheap:  1040 - maxblock= 1032) <- allocation
22:35:59.329 -> frag:  74% (freeheap: 27144 - maxblock= 2008) <- freeing every other blocks
22:35:59.329 -> frag:  63% (freeheap: 41200 - maxblock= 6024) <- freeing every other remaining blocks
22:35:59.329 -> frag:   1% (freeheap: 53248 - maxblock=53240) <- freeing everything
22:35:59.330 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- end
22:35:59.330 -> 
22:35:59.330 -> Filling memory with blocks of 1000 bytes each
22:35:59.363 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- start
22:35:59.363 -> frag:   2% (freeheap:   624 - maxblock=  616) <- allocation
22:35:59.363 -> frag:  81% (freeheap: 26832 - maxblock= 1008) <- freeing every other blocks
22:35:59.363 -> frag:  73% (freeheap: 39936 - maxblock= 3024) <- freeing every other remaining blocks
22:35:59.363 -> frag:   1% (freeheap: 53040 - maxblock=53032) <- freeing everything
22:35:59.363 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- end
22:35:59.396 -> 
22:35:59.396 -> Filling memory with blocks of 500 bytes each
22:35:59.396 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- start
22:35:59.396 -> frag:   5% (freeheap:   192 - maxblock=  184) <- allocation
22:35:59.396 -> frag:  87% (freeheap: 26400 - maxblock=  504) <- freeing every other blocks
22:35:59.396 -> frag:  81% (freeheap: 39504 - maxblock= 1512) <- freeing every other remaining blocks
22:35:59.396 -> frag:   1% (freeheap: 52608 - maxblock=52600) <- freeing everything
22:35:59.429 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- end
22:35:59.429 -> 
22:35:59.429 -> Filling memory with blocks of 200 bytes each
22:35:59.429 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- start
22:35:59.429 -> frag:   5% (freeheap:   192 - maxblock=  184) <- allocation
22:35:59.429 -> frag:  92% (freeheap: 25776 - maxblock=  208) <- freeing every other blocks
22:35:59.429 -> frag:  88% (freeheap: 38672 - maxblock=  624) <- freeing every other remaining blocks
22:35:59.462 -> frag:   1% (freeheap: 51360 - maxblock=51352) <- freeing everything
22:35:59.462 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- end
22:35:59.462 -> 
22:35:59.463 -> Filling memory with blocks of 100 bytes each
22:35:59.463 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- start
22:35:59.463 -> frag:  17% (freeheap:    48 - maxblock=   40) <- allocation
22:35:59.463 -> frag:  94% (freeheap: 24696 - maxblock=  104) <- freeing every other blocks
22:35:59.532 -> frag:  91% (freeheap: 37072 - maxblock=  312) <- freeing every other remaining blocks
22:35:59.532 -> frag:   1% (freeheap: 49344 - maxblock=49336) <- freeing everything
22:35:59.532 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- end
22:35:59.532 -> 
22:35:59.532 -> Filling memory with blocks of 50 bytes each
22:35:59.532 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- start
22:35:59.532 -> frag: 100% (freeheap:     8 - maxblock=    8) <- allocation
22:35:59.532 -> frag:  96% (freeheap: 22800 - maxblock=   56) <- freeing every other blocks
22:35:59.532 -> frag:  93% (freeheap: 34168 - maxblock=  168) <- freeing every other remaining blocks
22:35:59.532 -> frag:   1% (freeheap: 45536 - maxblock=45528) <- freeing everything
22:35:59.532 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- end
22:35:59.532 -> 
22:35:59.532 -> Filling memory with blocks of 15 bytes each
22:35:59.532 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- start
22:35:59.561 -> frag:  34% (freeheap:    24 - maxblock=   16) <- allocation
22:35:59.561 -> frag:  97% (freeheap: 15504 - maxblock=   40) <- freeing every other blocks
22:35:59.561 -> frag:  95% (freeheap: 23232 - maxblock=   72) <- freeing every other remaining blocks
22:35:59.561 -> frag:   1% (freeheap: 30960 - maxblock=30952) <- freeing everything
22:35:59.561 -> frag:   1% (freeheap: 53464 - maxblock=53456) <- end

@@ -171,6 +171,11 @@ uint32_t EspClass::getFreeHeap(void)
return system_get_free_heap_size();
}

uint16_t EspClass::getMaxFreeBlockSize(void)
{
return system_get_free_heap_size();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this and the return via maxBlock pointer in getHeapFragmentation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mistake :) (the example uses getHeapFragmentation parameters)

#include "coredecls.h"
#include "Esp.h"

uint8_t EspClass::getHeapFragmentation (uint32_t* freeHeap, uint16_t* maxBlock)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it bothers me that we have a getFreeHeap() that takes no args and return only the free heap, a getMaxBlockSize() that takes no args and returns only the max block size, and this that does take 2 optional args for free heap and maxblock and returns the fragmentation. It just seems inconsistent to me. I think I'd prefer to have 3 standalone functions one for each, plus a 4th function getHeapStats() or whatever that takes 3 args for getting everything at once (performance) and returns nothing (void).
Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the three functions (Free, MaxBlock, Fragmentation), the same umm_info() function is called which goes through every heap chunks. I figured that those who would call Fragmentation may optionally give pointers to fill those results for free (as this one may not be quite often used).
OK with getHeapStats().

@devyte
Copy link
Collaborator

devyte commented Sep 4, 2018

Awesome test sequence. I was thinking it'd be great adding that as a device test, but then we'd have to update the test every time we implement something that frees more heap, because the numbers could change. Not sure if that's desirable. Maybe just add it as an example, and include the expected output at the time of implementation?

@d-a-v
Copy link
Collaborator Author

d-a-v commented Sep 5, 2018

I will add the example.

Copy link
Collaborator

@devyte devyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still some remaining questions at this point.

#include "coredecls.h"
#include "Esp.h"

void EspClass::getHeapStats(uint32_t* free, uint16_t* max, uint8_t* frag)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arg "free" shadows free(), arg max can shadow max() (many apps #define it). would be better to change the names.

@@ -1761,4 +1762,13 @@ size_t ICACHE_FLASH_ATTR umm_free_heap_size( void ) {
return (size_t)ummHeapInfo.freeBlocks * sizeof(umm_block);
}

uint16_t ICACHE_FLASH_ATTR umm_max_block_size( void ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the umm code doesn't seem to rely on these uint8_t/uint16_t/uint32_t types. Maybe it would be better to stick to the types used in the rest of the code, i.e.: go low level adventurer and declare them as unsigned char/unsigned short/unsigned int

@@ -1024,6 +1024,7 @@ void ICACHE_FLASH_ATTR *umm_info( void *ptr, int force ) {
if( UMM_NBLOCK(blockNo) & UMM_FREELIST_MASK ) {
++ummHeapInfo.freeEntries;
ummHeapInfo.freeBlocks += curBlocks;
ummHeapInfo.ummFreeSize2 += (uint32_t)curBlocks * (uint32_t)sizeof(umm_block) * (uint32_t)curBlocks * (uint32_t)sizeof(umm_block);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe don't rely on these types and use basic types like in the rest of umm

if (max)
*max = ummHeapInfo.maxFreeContiguousBlocks * block_size;
if (frag)
*frag = 100 - (sqrt32(ummHeapInfo.ummFreeSize2) * 100) / fh;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something like this be more precise, or same output? I ask because paranoid about numerical stability.
100 - (sqrt32( ummHeapInfo.ummFreeSize2 * 10000) / (fh*fh) );

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it'd be more numerically stable, but it would takes more flash (16 bytes), more ram (4 more bytes) and more times (64 instead of 32 bits ints).
About range, the current version works with at most 64K free heap.

@@ -171,6 +172,11 @@ uint32_t EspClass::getFreeHeap(void)
return system_get_free_heap_size();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that this sdk api is the same as:

uint32_t fh = ummHeapInfo.freeBlocks * block_size;
...
*free = fh;

used in getHeapStats()? After a quick look I can't tell how the integration works between that sdk api and umm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this is a fw call that calls fw api'sxPortGetFreeHeapSize in which we call umm (heap.c).

@TD-er
Copy link
Contributor

TD-er commented Sep 8, 2018

This is a great addition.
Just one question, does it take a lot of resources to compute the fragmentation?

And just curious, what would be a good default allocation size for Strings to minimize fragmentation?
For example blocks of multiple of 32 Bytes for those strings that may be longer than a few bytes and have a short expected life span?

@d-a-v
Copy link
Collaborator Author

d-a-v commented Sep 9, 2018

@TD-er I did not measure time.
getFreeHeap now also internally computes the sum of (int) squares,
and calling getHeapFragmentation additionally computes a (int) square-root (15 iterations of simple 32bits operations including only one multiplication each).

And just curious, what would be a good default allocation size for Strings to minimize fragmentation?
For example blocks of multiple of 32 Bytes for those strings that may be longer than a few bytes and have a short expected life span?

Only testing would answer this tricky question because it depends on everything (how are these strings used).
I believe we can use it to compare an algorithm against another, or like you say, adjust the granularity of some structure often calling realloc. I made it to measure the mess in memory according to policies of some libraries.
Another use is for the user to trigger a reboot when one or both of HeapFragmentation and MaxFreeBlockSize go wrong.

@devyte devyte merged commit ce28a76 into esp8266:master Sep 10, 2018
@ayushsharma82
Copy link
Contributor

V2.5.0 gives me this error now:

'class EspClass' has no member named 'getHeapFragmentation'

             stats["heapFragmentation"] = ESP.getHeapFragmentation();

Did you guys just removed the getHeapFragmentation() completly?

@d-a-v d-a-v deleted the fragness branch September 13, 2019 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants