Are frequent parameter changes dangerous? #158

Closed
gerw opened this issue Feb 7, 2024 · 6 comments
Labels
question Further information is requested

Comments

Collaborator

gerw commented Feb 7, 2024

Recently, I came across an interesting post, which reads:

I can't comment much on older Luxtronik versions, but if you would reverse engineer, you would see that the 2.1 Luxtronik has a Micron 1.8V 128MB NAND, which pretty certainly means it's one of those 100k erase count chips, and trust me they do balance the eraseblocks. There is an Atmel ARM-board in the luxtronik. Without going too much into the details my take is that changing a setting once or twice a day isn't going to wear that thing out any time soon, but I wouldn't go excessive on those settings changes.

Similarly, a Finnish post (maybe by the same person) translates to:

The little birds sang that Luxtronik has Micron's 128 megabyte 1.8V Nand flash, which based on that voltage is most likely SLC technology and lasts up to 100,000 erase cycles per erase block. I hear that on average there are not fifty writes per erase block per year, so we can get to the point that the luxtronik will never die in terms of the number of writes. Even if you fiddle with its settings several times a day, it won't.

So I think tuning the relays is pointless. That Nand is very durable, something the Germans have done right.

Now I am wondering what "I wouldn't go excessive" really means.

I tried to make up some numbers. The first thing I don't know is the size of the erase blocks. In what follows, I assume 128 KiB, but I might be totally wrong. Maybe we can use the featurebug and get this information via ssh? With 128 MB of flash, this gives roughly 1000 erase blocks. If we have 100k erase cycles per erase block, this means that we have about 100M "write actions" in total (each one writing one erase block).

The next thing that we need to know is how many periodically occurring write actions we have.

  • Every two hours, the controller writes the DTA file NewProc (size 350 kB). [This can be seen from the timestamp, since this file is served by the web server.] At 128 KiB per erase block, this means 3 write actions + file system overhead.
  • For each parameter change, the file appl_param2 is removed, appl_param1 is moved to appl_param2 and appl_param1 is rewritten. [Again, these files are served by the web server.] It is not clear to me how many write actions this needs (in particular, the file system overhead again).
  • Every two hours, there is a parameter update (related to Heizgrenze), maybe only in the heating period?

Does anyone have an idea how to estimate the file system overhead? Would it help to know the file system?

I think there are no other frequently occurring write actions. I think we can safely ignore firmware updates and other stuff which only happens now and then.

So, let us assume that each DTA file write takes 5 write actions and each parameter change takes 3 write actions. Crunching some numbers, I get the following lifetime estimates:

without any parameter changes:   2851.9 years
1 parameter change per hour:     1629.7 years
10 parameter changes per hour:    335.5 years
1 parameter change per minute:     62.0 years
1 parameter change per second:      1.1 years

These numbers should be taken with a huge bag of salt (as I do not have any clue what is happening on the NAND level). However, the lifetime of 2850 years fits nicely with the 50 erase cycles per block per year from above.
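For anyone who wants to play with the assumptions, here is a minimal sketch that reproduces the table above. The constants are a reconstruction, not something stated explicitly in the post: a budget of 100M write actions, a year of 365.25 days, and a baseline of one DTA write (5 write actions) plus one Heizgrenze parameter update (3 write actions) every two hours. With these values the output matches the table, which suggests (but does not prove) that this is how the numbers were obtained.

```python
# Assumed constants (reconstructed from the post, not confirmed):
TOTAL_WRITE_ACTIONS = 100e6      # ~1000 erase blocks x 100k erase cycles
DAYS_PER_YEAR = 365.25
WRITES_PER_DTA_FILE = 5          # 3 erase blocks + guessed file system overhead
WRITES_PER_PARAM_CHANGE = 3      # guessed, including file system overhead

# Baseline load: every two hours one DTA write and one Heizgrenze update.
BASELINE_WRITES_PER_DAY = 12 * WRITES_PER_DTA_FILE + 12 * WRITES_PER_PARAM_CHANGE


def lifetime_years(param_changes_per_day: float) -> float:
    """Years until the assumed write budget is used up."""
    writes_per_day = (
        BASELINE_WRITES_PER_DAY
        + param_changes_per_day * WRITES_PER_PARAM_CHANGE
    )
    return TOTAL_WRITE_ACTIONS / writes_per_day / DAYS_PER_YEAR


scenarios = {
    "without any parameter changes": 0,
    "1 parameter change per hour": 24,
    "10 parameter changes per hour": 240,
    "1 parameter change per minute": 24 * 60,
    "1 parameter change per second": 24 * 60 * 60,
}

for label, changes_per_day in scenarios.items():
    print(f"{label + ':':<32}{lifetime_years(changes_per_day):8.1f} years")
```

Changing `WRITES_PER_PARAM_CHANGE` or `TOTAL_WRITE_ACTIONS` by a factor of 10 or 100 shows how sensitive the estimates are to the guessed overhead.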

If these numbers are roughly correct, writing a parameter each second will kill your heat pump controller within one year. Moreover, one should bear in mind that these numbers could easily be wrong by one or two orders of magnitude. In this case, even 1-2 parameter changes per hour could be dangerous in the long term.

Does anybody have some ideas how to check/validate/improve these numbers?

Collaborator Author

gerw commented Feb 10, 2024

I just asked my heat pump:

# ubinfo /dev/ubi0
ubi0:
Volumes count:                           6
Logical eraseblock size:                 129024
Total amount of logical eraseblocks:     1024 (132120576 bytes, 126.0 MiB)
Amount of available logical eraseblocks: 62 (7999488 bytes, 7.6 MiB)
Maximum count of volumes                 128
Count of bad physical eraseblocks:       0
Count of reserved physical eraseblocks:  10
Current maximum erase counter value:     179
Minimum input/output unit size:          2048 bytes
Character device major/minor:            253:0
Present volumes:                         0, 1, 2, 3, 4, 5

The erase block size is indeed 128 KiB. If I interpret these numbers correctly, I have (at least) one erase block with 179 erases; the others have lower counts. My heat pump is now almost three years old, so again the numbers align well with the above posts. Since October 2023, I have been changing parameters quite often (50-100 times per day).

At the time of the last boot (229 days ago), the counters were different:

# dmesg | grep "NAND device" -A 75 
NAND device: Manufacturer ID: 0xc8, Chip ID: 0x61 (Unknown ESMT NAND 128MiB 1,8V 8-bit)
Scanning device for bad blocks
1 cmdlinepart partitions found on MTD device atmel_nand
Creating 1 MTD partitions on "atmel_nand":
0x000000000000-0x000008000000 : "UBI"
UBI: attaching mtd0 to ubi0
UBI: physical eraseblock size:   131072 bytes (128 KiB)
UBI: logical eraseblock size:    129024 bytes
UBI: smallest flash I/O unit:    2048
UBI: sub-page size:              512
UBI: VID header offset:          512 (aligned 512)
UBI: data offset:                2048
UBI: attached mtd0 to ubi0
UBI: MTD device name:            "UBI"
UBI: MTD device size:            128 MiB
UBI: number of good PEBs:        1024
UBI: number of bad PEBs:         0
UBI: max. allowed volumes:       128
UBI: wear-leveling threshold:    4096
UBI: number of internal volumes: 1
UBI: number of user volumes:     6
UBI: available PEBs:             62
UBI: total number of reserved PEBs: 962
UBI: number of PEBs reserved for bad PEB handling: 10
UBI: max/mean erase counter: 117/91
UBI: image sequence number: 0
[...]
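A rough way to turn the two counter readings into a wear rate is sketched below. It uses only the maximum erase counter at boot (117) and now (179), the 229 days in between, and the 100k rated cycles from the quoted posts, which remain an assumption. The maximum counter may belong to different blocks at the two points in time, and the period includes the months with 50-100 parameter changes per day, so this is an order-of-magnitude check at best.

```python
# Values read from the dmesg and ubinfo output above.
DAYS_SINCE_BOOT = 229
MAX_ERASE_COUNT_AT_BOOT = 117
MAX_ERASE_COUNT_NOW = 179

# Assumption from the quoted posts, not confirmed for this chip.
RATED_ERASE_CYCLES = 100_000

# Observed growth of the maximum erase counter, extrapolated linearly.
erases_per_year = (
    (MAX_ERASE_COUNT_NOW - MAX_ERASE_COUNT_AT_BOOT) / DAYS_SINCE_BOOT * 365.25
)
years_until_rated_limit = (
    (RATED_ERASE_CYCLES - MAX_ERASE_COUNT_NOW) / erases_per_year
)

print(f"max erase counter grows by ~{erases_per_year:.0f} per year")
print(f"rated limit reached in ~{years_until_rated_limit:.0f} years")
```

Under these assumptions the most-worn block gains on the order of 100 erases per year, which puts the rated limit roughly a thousand years away.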

Collaborator

BenPru commented Feb 10, 2024

Great question. I have asked myself the same thing in the past.
So I'm very interested.

Collaborator Author

gerw commented Feb 11, 2024

I read a little bit about the file system UBIFS, which is at work on the heat pump. I realized that I had a misconception in my first post: On the NAND memory, the data is written in chunks of 2048 bytes (a "page"). An erase block only has to be erased once all pages in it have been written. Hence, the situation should be a little bit better than the estimates in my first post, since the 9 kB of appl_param1 can be written in 5 pages (+ file system overhead) and do not need a full erase block.

However, I still think that one should not attempt to change parameters every second. If one limits oneself to one change per minute, one should be safe, but I will offer no guarantees...
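The page arithmetic from this comment can be sketched as follows. The page size and logical erase block size come from the `ubinfo` and `dmesg` output; taking "9 kB" as 9 × 1024 bytes is an assumption, and UBIFS metadata, journaling, compression and garbage collection are ignored entirely.

```python
import math

PAGE_SIZE = 2048                 # "Minimum input/output unit size" from ubinfo
LOGICAL_ERASEBLOCK_SIZE = 129024  # "Logical eraseblock size" from ubinfo
PARAM_FILE_BYTES = 9 * 1024      # "9 kB" from the comment; exact size assumed

# How many pages fit into one logical erase block.
pages_per_eraseblock = LOGICAL_ERASEBLOCK_SIZE // PAGE_SIZE

# How many pages one rewrite of appl_param1 occupies (rounded up).
pages_per_param_write = math.ceil(PARAM_FILE_BYTES / PAGE_SIZE)

# How many rewrites fit into one erase block before it must be erased.
param_writes_per_eraseblock = pages_per_eraseblock // pages_per_param_write

print(f"pages per erase block:        {pages_per_eraseblock}")
print(f"pages per appl_param1 write:  {pages_per_param_write}")
print(f"rewrites per erase block:     {param_writes_per_eraseblock}")
```

So, before file system overhead, one erase block could absorb about a dozen rewrites of appl_param1 instead of costing several erases per parameter change.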

Owner

Bouni commented Feb 12, 2024

Do you think we should add a default throttle that prevents high-frequency writes (but can be overridden if a user wants to and is aware of the shortening of the heat pump's lifespan)?
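As an illustration of what such a throttle could look like, here is a minimal sketch. The class `WriteThrottle`, its `allow` method and the `force` override are hypothetical names and not part of any existing API; the 60 second default is taken from the "one change per minute" remark above.

```python
import time


class WriteThrottle:
    """Hypothetical helper that limits how often parameter writes are sent."""

    def __init__(self, min_interval_s=60.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_write = None      # time of the last permitted write

    def allow(self, force=False):
        """Return True if a write may be sent now.

        force=True is the explicit override for users who accept the
        additional flash wear.
        """
        now = self._clock()
        too_soon = (
            self._last_write is not None
            and now - self._last_write < self.min_interval_s
        )
        if too_soon and not force:
            return False
        self._last_write = now
        return True


throttle = WriteThrottle(min_interval_s=60.0)
if throttle.allow():
    print("write permitted")   # the actual parameter write would go here
if not throttle.allow():
    print("write skipped, last write was less than 60 s ago")
```

Whether a skipped write should be dropped, queued, or raised as an error is a separate design decision that this sketch leaves open.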

@Bouni Bouni added the question Further information is requested label Feb 12, 2024
Collaborator Author

gerw commented Feb 12, 2024

I am not sure. I can imagine that the situation is a little bit better than in my rough calculations, and then the frequent writes should not be a problem. If they really are a problem, then this is rather a bug in the Luxtronik controller itself and should be fixed by the firmware.

I would vote for adding a warning to the README (maybe including a link to this issue).

Owner

Bouni commented Feb 12, 2024

> I would vote for adding a warning to the README (maybe including a link to this issue).

Sounds legit! Let's do it this way 👍🏽

gerw added a commit that referenced this issue Feb 12, 2024
@gerw gerw mentioned this issue Feb 12, 2024
gerw added a commit that referenced this issue Feb 13, 2024
@gerw gerw closed this as completed Feb 13, 2024