-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
digitalWrite is slow #58
Comments
I haven't done any benchmarks. How much faster is the 328 compared to 4809? I'm not sure how I feel about this really. On one side it's nice to keep compatibility with the official megaAVR core, but on the other side, it sucks that the speed and efficiency aren't what it could be. Maybe some improved non-constant code could be made as a PR to the official repo? What would non-constant code look like, and how much faster would it be? |
Hmm. Actually, digitalWrite is NOT so much slower than 328p, with the current cores. They both run a while(1) loop at pretty close to 150kHz. (I could swear it was worse the last time I checked. Maybe that was a PWM pin, or before the removal of the unnecessary pgmspace stuff.)
Alas, the official repository has been very resistant to "performance improvements." I guess they prefer stability to speed-ups that no one needs. I can understand that, but I was wondering whether a new 3rd party core for new chips would be a good demo/proving-ground/example. |
Wouldn't this break the "Arduino compatibility" where you pretty much can't screw things up? |
I'd propose to add "digitalWriteFast()" with the modifications (and the possibility of single instructions for the constant-based case), while leaving the original digitalWrite() function alone. |
I'm definitely up for adding a separate "extended" digitalWriteFast() function while leaving the original alone. If you post your code here I can format it the way I like it and so some testing on various hardware before I add it to the main repo. I'm sure megaTinyCore would benefit from this as well. |
Any updates on the digitalWriteFast/digitalReadFast implementation? I'd love to have "fast" digitalRead/digitalWrite implemented! |
Hmm; some progress. I made a test shield so I can see if things "seem" to be working.
|
It would be great if it were as fast as possible, without any "safety features" like digitalWrite() has. A little documentation is all that's needed. I'm fine with adding this as a core feature, and not a library. On the other side, will it take PinStatus or a boolean value as argument? Personally, I think PinStatus was and is a dumb idea that breaks lots of existing code. This core has always LTO enabled, so 4. won't be a problem. Another, slightly related thing. I'm not sure how It can be achieved, but it would be awesome if there was a way to support the "old" pinMode()/digitalRead()/digitalWrite() functions with the old arguments. However, the existing functions are written in C (and not C++), so the can't be overridden. I've tried to move them around and rename some files, but without luck. |
The fancy (and cool) LTO detection code isn't necessary if you use macros like Wiring does. I suggested this while back for MiniCore (but didn't create a separate issue for it). I suggested the same thing on the Arduino forums years ago, to no avail. Here's the board defs from Wiring. MicroCore already uses the same technique (macros instead of pin mapping arrays), so you shouldn't have any concerns over compatibility, Hans. If type safety and readability of compiler errors from macros is a concern, extern inline functions may be worth trying. |
@nerdralph the reason why I've been hesitant about replacing the digitalRead/digitalWrite functions is because I'm afraid people's projects rely on these being slow. However, as you pointed out, there are many things that affect the speed of these functions anyways. Maybe I should try to improve these functions in the next release, and see if I get any issues related to it? |
Arduino is actually a fork of Wiring, so the fact that the "upstream" repo made the optimizations should give you some confidence. Wiring optimized the digitalRead/Write functions after the Arduino fork, but Arduino did not merge any upstream changes. There seems to be no technical reasons for this; I suspect it was really about Massimo's ego. Secondly, if commonly used Arduino/Wiring libraries relied on digitalWrite/digitalRead being slow, then they would already have issues on the non-AVR Arduino cores like esp8266, STM32, or even the "official" Arduino cores like ARM (Arduino SAM & SAMD boards). About 5 years ago I got so annoyed with the bloat and lousy coding in the official AVR core that I started working on a small/optimized core for the mX8 series, based on Wiring (but never finished it). |
The macros used in wiring are really ugly, hard to maintain, and presumably make the non-fast versions slower, at least eventually (ie they don't scale, especially to the bigger chips where the mappings are increasingly ungrouped.) I wish I could talk people into ALWAYS using macros or functions for accessing pin-info arrays instead of occasionally directly accessing g_aPinDescription[] or whatever. You should be able to replace the mapping mechanism without having to change the actual driver code. :-( |
I tried to implement the same type of macros as the Wiring core file does, but I did not see a significant bump in speed or reduced compiled size. I've taken a look at the digitalWriteFast library, but this library expects constants, which I can't do because I'm trying to keep this core as compatible as possible. @WestfW have you been able to create a faster digitalWrite that's fully compatible (no compiler warnings or errors) with the "stock" one? |
Actually I find the Wiring macros reasonably clean and easy to follow. Here's an example:
I actually find the pin mapping arrays uglier and harder to maintain. Look at how tricky it is to catch the missing line in the code below, even with the comments.
Macros can be trickier to debug, so an inline function would probably be even better. |
If you tried it with MiniCore, you'd certainly see a difference. With Wiring, a digtitalWrite(13, HIGH) compiles to a single sbi instruction which takes two cycles. For Minicore, there's a few table lookups (a couple ldi instructions followed by lpm), then several more instructions to save SREG, then read/modify/write the port register. Overall it's about 10x slower and 10x the code. |
@nerdralph could you upload a code snippet that I can test on? I tried to implement the Wiring macros to MegaCore, but I did not gain any flash savings or speed. |
I think the wiring-style macros really fall apart compared to the table-based approach as the mapping between Arduino pin numbers and ports becomes more random and scattered. For example the (beginning of the) Uno WiFi 2 table looks like:
(and it goes on for 40 "pins.") Which is not going to make for a pretty "if" statement.
It's a little over-complicated because of the two styles of the array that have existed, and I've been dragging my feet on the "faster but not a single sbi/cbi" versions. |
@WestfW thanks for sharing! I did a test on an ATmega4809 running at 16MHz. I got close to 4 MHz toggle speed! But why include this code? if (digital_pin_to_bit_position[0] == 0x99) { // never true.
LTO_Not_Enabled();
}
if (port > 0x1F) {
/*
* "FASTER" port logic is low bits of PORTB struct address
* spaced every 32 bytes. PORTA is still 0.
*/
vport = (VPORT_t *) ((port / 8));
} It's never true for any of the megaAVR-0 series, and afaik they all have vports. |
I tried to use the same implementation for digitalReadFast: #define digitalReadFast(pin) \
if (__builtin_constant_p(pin) /*&& __builtin_constant_p(val)*/) {_digitalReadFast(pin);} else {digitalRead(pin);}
static inline __attribute__((always_inline)) int _digitalReadFast(uint8_t pin)
{
// Mega-0, Tiny-1 style IOPORTs
// Assumes VPORTs exist starting at 0 for each PORT structure
uint8_t mask = 1 << digital_pin_to_bit_position[pin];
uint8_t port = digital_pin_to_port[pin];
VPORT_t *vport;
// Old style port logic is a small integer 0 for PORTA, 1 for PORTB etc.
vport = (VPORT_t *)(port * 4);
// Read pin value from VPORTx.IN register
return !!(vport->IN & mask);
} However, I'm not able to do this (compiler is complaining): digitalWriteFast(LED_BUILTIN, !digitalReadFast(LED_BUILTIN)); But this works: digitalWriteFast(LED_BUILTIN, !_digitalReadFast(LED_BUILTIN)); Any idea why using the digitalReadFast macro doesn't work? |
|
Ah; probably needs to be a ternary expression in order to return a value. Good job noticing! (untested...)
|
That didn't work at all...
|
Nevermind, I figured it out: #define digitalReadFast(pin) \
(__builtin_constant_p(pin) ? /*&& __builtin_constant_p(val)*/ _drfast(pin) : digitalRead(pin)) |
I've written some prototype code for the traditional AVRs in which digitalPinToPort returns the address of the PORT. portOutputRegister returns this address cast as volatile uint8_t*, portModeRegister returns the address - 1, and portInputRegister returns the address -2. |
I am disappointed that digitalWrite() is slower on the 4809 than it is on the 328 series.
This is a feeler to see how willing "we" are to let this repository get out-of-sync with the official Arduino repository. I already have digitalWriteFast() for the case where the pin number is constant, but I can also envision an intermediate-speed version even for the non-constant case (that doesn't do the input_pullup check, for instance.)
Or, there might be value to re-doing the whole digitalPinToXXX macros and logic...
(but it depends on how close we want to stay to the source...)
The text was updated successfully, but these errors were encountered: