Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Due #82

Merged
merged 4 commits into from
Sep 12, 2021
Merged

Due #82

merged 4 commits into from
Sep 12, 2021

Conversation

clazarowitz
Copy link
Collaborator

As my commit message said, I forgot a lot of changes last time. You were probably quite confused about me saying I added comments to poormanfloat, added 21MHz constants, etc. It actually took me a long time to figure out what directory I had checked out from git since I have been merging into a pristine copy, and have another "dirty" copy in my Documents/arduino/libraries directory :)

Anyhow, this code looks really good. I found a few minor problems in code cleanup that really locked things in. Now, not only do I get smooth accelerations and decelerations, the delay between the channels remains constant, as it should in a deterministic system ;) Delays all work perfectly, though I didn't test up to 260 seconds.

I didn't run the make scripts...I'm probably in the minority of arduino devs, but I actually use Visual Studio, not VSCode, full on Visual Studio with Visual Micro. I've been using Visual Studio since it was just Visual C++ 4.5....yeah, that long! So while I can use vim, vscode, qt's IDE, and many others, my preference has been for Visual Studio, and at my age, I don't think that preference will change! :) I didn't want to go through and setup an entire arduino env on my ubuntu vm before committing. I did use git diff to find where winmerge inserted windows newlines, and remove those, as well as find any trailing whitespace and remove those. I also just temporarily set visual studio's clang-format to google and hit the format shortcut, so I think it should match pretty well.

You'll also see I have the manageSteppers on a timer ISR. I picked channel 5 because it is not connected to any pins. Note the confusion, TC1->TC_CHANNEL[2] is channel 5 :) Atmel's datasheet kills me :)

I think that's it. You won't hurt my feelings if you don't want the added comments in PoorManFloat. It was just what I used to figure out exactly how to generate new constants. And since I have already forgotten everything, those comments would help me (hopefully others as well) generate new constants if it was necessary.

I forgot multiple files...and I'm not sure how considering when using a
text editor for commit messages it shows you what you're about to
commit!

This code actually understands pausing (!hasSteps || e->steps==0) so
thats a major improvement.  I also have the PWM peripheral interrupts
and PIO interrupts playing perfectly together.  The waveforms are all
deterministic, slowly increase in frequency then decrease to a stop the
way its supposed to.  Channel 0 is slightly ahead of channel 1 (by about
160 microseconds).  I've run some pretty crazy longevity tests on this,
and will run an "all day" test tomorrow.  I also ran a test with crazy
acceleration so it hit max speed of 50KHz, no steps lost.  So I think
its good enough to commit.

Hopefully this is the last time
#ifndef ARDUINO_ARCH_SAM
#define ARDUINO_ARCH_SAM
#endif
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition of ARDUINO_ARCH_SAM causes for the other architectures errors for double defined functions. Any reason to keep this ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, now that was an old line I completely forgot about. One quirck of Visual Studio is that it doesn't pick up compiler definitions, so without the #define, all the code I was working on was greyed out, no autocomplete, no intellisense, etc. So yes, I should not be there. I had thought I'd remember it, but the time away from this caused me to easily forget it.

return true;
}
}
Copy link
Owner

@gin66 gin66 Sep 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no legal return value in false case. Better use

bool StepperQueue::isRunning() {
   return _hasISRactive;
 }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right. I had at one point changed the function, and yes, this was another thing forgotten by my time away from the project. I suspect I was getting by with undefined behavior. Never a good way to operate!

channelsUsed[pinToChannel(step_pin)] = true;
numChannels++;
return true;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The channel allocation should not be here. This function is only a test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmm....I don't even remember doing that, but yes, I think that was supposed to be in the init function...With this, I decided to make all the changes and will test as soon as I get to the bottom of the list. Obviously I spent more time on the ISRs than the support code. :)

// run manage steppers...
fas_engine->manageSteppers();
TC1->TC_CHANNEL[2].TC_CCR = TC_CCR_SWTRG | TC_CCR_CLKEN; // CLKEN and SWTRG
Copy link
Owner

@gin66 gin66 Sep 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the avr code, the interrupts are enabled before calling manageSteppers().
Reason is, that executing manageSteppers() in the interrupt context will block other interrupts on avr and as such will cause the pulse generation to have hiccups. How is this with due ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the Arm Cortex M3 might not be the big beast that the STM32H series are, but it still has some impressive hardware. The nested vectored interrupt controller is one such piece. Its unlikely to be the same NVIC on the STM32H series, but it does already allow for interrupt nesting. I intentionally set the timer 5 IRQ to a priority lower than the PIO IRQs. However, that did bring me to where I never explicitly set the PWM IRQ. (I believe 0 is default, but just like undefined behavior, suspected default isn't good enough and should be explicitly called out!) I have had that adjustment in the code and commented some other code around it. So manage steppers should be the lowest priority IRQ currently defined. PIO/PWM are priority 0 (which is the highest priority), timer5 which runs manageSteppers is run at priority 1. A user who needs this to be lower priority can easily call NVIC_SetPriority(TC5_IRQn, lowerPriority);

static const PWMCHANNELMAPPING* mapping = fas_queue[Q].mapping; \
static Pio* port = fas_queue[Q].mapping->port; \
if (!q->_hasISRactive || q->_pauseCommanded) { \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the above commands, fas_queue[Q]. can be replaced with q->.
Should be faster, if the compiler cannot see this optimization.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, its static, so I thought it was initialized only once. If the compiler isn't doing that, then sure, removing an indirection helps....however, there are other optimizations that could be made....I didn't go right for them and to them because I'm not sure how big an impact they will have, and technically, if the optimizer is doing its job (something I don't have a lot of faith in having hand optimized X86/SSE and AMDIL GPU code) it should be recognizing constants looked up multiple times like mapping->channel and storing them in a register that persists throughout the function...I have seen where the compiler is adamant about not storing it in a register even when register pressure is -20! So, perhaps a static const would be warranted for many of these variables. However, I had working code, I wanted it committed before I went off making changes, and further, this was a distraction from my main project using the Due, amd so while I don't want to forget about this, working code is working code, which means I can move on to the mower, get that working, then in the winter, when inevitable downtime shows up, I can come back and see the assembly the compiler is generating and decide if its worth the changes in C code. I know one side likes to argue that the compiler can see optimizations like this and is good at it, but I've seen too many times where it just doesn't, so I feel its worth the check. (For example: Get Visual Studio, GCC, Clang, or whatever your favorite optimizing compiler is to use RSI/RDI for accumulation. Even when you do not need another pointer for the rest of the function. It simply will not, and will throw stuff on the stack!)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. I have not really seen, that this is defined as static per channel. It just looks bit inconsistent, if a pointer is defined, but is not always used.

And yes, first push a working code and polish later is same, what I do. And this comment should only support the polishing.

gChannelMap[queue_num].port = g_APinDescription[step_pin].pPort;
gChannelMap[queue_num].channelMask = 1 << gChannelMap[queue_num].channel;
static bool isPWMEnabled = false;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the above statements can replace gChannelMap[queue_num]. with mapping->

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is run just once, and mapping is defined as const, since after the init function, it should never be written to. I had the same idea as you, and had made the change last night. It doesn't compile :) the one that could be replaced is the the mask the channel will use, not the setting of the variable itself, but the read of channel used to shift. Again, its in init, so I don't have emotions about it one way or the other, so if you want to cast it so it can be written to, or make the one possible change, feel free. Doesn't seem like its worth it to me. I'm totally willing to add a comment though :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but above the pointer is defined and not using it in the subsequent statements looks just inconsistent. But again, just a polishing thing.

break;
}
_connected = true;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can replace the above switch with this one liner ?:

attachInterrupt(digitalPinToInterrupt(_step_pin), ISRQueueName(_queue_num), RISING);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it can be. The reason it exists the way it does is from when I thought the AtmelICE would be useful. I reorganized code to not use variables in some places where speed didn't matter to make it clear what the variable was. Even that didn't work so well. I thought I removed all of that code. You can see in there I even ended up resorting to Serial.prints despite trying to use the hardware JTAG debugger! Eventually I did give up on the debugger. The hardware is still around me somewhere....but I'm not even sure where :) I'll make that change and test it as well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nope, I did try to remove it previously. Its because its generating a function name, so it needs to paste in the value passed. passing queue_num results in a function called Queuequeue_numISR which obviously doesn't exist ;) cant mix variables with preprocessor directives. I knew that, and tried it once already, and now twice!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the inconvenience. I will later check, how/if platformio can be configured to build the examples for due, too. Then this just comes out for free from github actions’ CI

uint8_t read_idx; // ISR stops if readptr == next_writeptr
uint8_t next_write_idx;
volatile uint8_t read_idx; // ISR stops if readptr == next_writeptr
volatile uint8_t next_write_idx;
Copy link
Owner

@gin66 gin66 Sep 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This volatile is problematic, because it impacts the other architectures. If this is necessary, it should be conditionally compiled for sam only. But I tend to assume, this is not needed.

Why is it important to avoid ?
In the ISR, those two variables are accessed more than once. The volatile prevents the compiler to apply any register optimization, and for avr the ISR service time being as short as possible is crucial. So this compiler optimization is needed.

Why is volatile not needed ?
For the ISR those two variables are not volatile at all. During ISR execution no one will change it. In the cyclic task/interrupt, the volatile nature is mitigated by 1.disabling the interrupts 2. not reading more than once.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm....My understanding of volatile is quite different, and perhaps its handled differently in the embedded world. volatile is a hint to the compiler to warn it that it may have been written to by another variable, and to not cache the value. I didn't test it without, but I did wonder how this ever worked anywhere. It may even be the gcc for AVR has no concept of variable caching. All it should ever do though is hint the compiler to always read the value from the source. Unfortunately in bigger systems, it is just a hint and the compiler does ignore it sometimes. It should not disable optimizations. Again, I've spent most of my time in much much much higher powered multiprocessor, kernel threaded systems. I'd be pretty annoyed though if something so basic as volatile was interpreted differently on smaller architectures.

Copy link
Owner

@gin66 gin66 Sep 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volatile is IMHO not really a hint. It should be a must for the compiler to not cache accordingly declared variables. If a compiler is just ignoring it, then I would be surprised. How can someone then write a reliable device driver ? That’s why most peripheral registers should be defined as volatile. For example a port input register may yield on every read a different value due to changed external signals. Imagine you need to busy wait for a high on an input signal to trigger a time critical action. With a wrongly optimizing compiler this turns into an endless loop and it is impossible to solve this without resorting to assembly code.

In any “parallel” execution environment (just using interrupts), shared variables are critical and just mark those as volatile is just an easy and working way out for the developer, but sacrifices performance a bit. But avr is with one interrupt per step already close to the limits. That’s why I have explained, why I do not see the need for volatile in this case and the volatile needs to be removed for at least avr (esp perhaps not) and for sam it’s up to you/your code.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ifndef should be an ifdef, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started writing it one way, then implemented it the other....I was doing ifndef and moving the volatile, but reversed it...give me a second...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and it turns out, I'm running with that mistake, so I do not have volatile....I had intended to run an all day torture test. Its up to 12 million pulses without error, and I don't really want to stop it now, so I'll leave my extreme long term test running....guess I'll prove to myself whether it is necessary...

@gin66
Copy link
Owner

gin66 commented Sep 12, 2021

Great stuff. Thanks a lot. A pleasure to read your code and comments. I have added a few comments to the code. Please have a look at it.

For the DIR pin delay. As there are several places, where you have issued a 30 microsecond delay, better to move this to a higher level, while enqueueing commands. At that place, the delay can be easily implemented with a pause performing the dir change and latter step commands do not need to change the DIR-Pin. Additional benefit, the API can be extended to add a configurable delay (>MIN_CMD_TICKS) or turn it off.

Later I will change it accordingly and remove those delays in the sam architecture.

The only real "big" thing was the change to isRunnig, which always
returned true, caused the need for startPreparedQueue, and a small
change in order to commandAddedToQueue (moving the write pointer
increment outside of the conditional of start)

I also made the volatile rp and wp for SAM only.  Again, it should be
just a hint, and should not remove any optimizations beyond variable
caching, but gin66 is sure its unnecessary.  Its the kind of thing that
requires some longevity tests though, so I just ifdef'd it for the Due.
Unless the compiler for the AVR/ESP is written incorrectly, this should
not be able to impact performance on these architecturees unless it was
incorrectly caching the variable!

I've restested (short term) that these changes are still functional.
@clazarowitz
Copy link
Collaborator Author

Oh, another point, right now I have artificially limited the pulse rate on the Due to 50KHz - that's 10 microsecond pulses at 50% duty cycle. This is already 2.5x faster than my junk driver can manage, and 25% of the speed the CLA86T can manage. I did test it at 50KHz to make sure the ISR was keeping up. I realized I should make the make the pulse width configurable, or be based on the speed rather than hard coded, but again, I wanted to get something up and running for my project, then do more torture tests later. I ran my 50KHz test with 2 channels. So I can say for sure this works. Anything else, well, hasn't been tested :)

@gin66 gin66 merged commit b88c3cc into gin66:master Sep 12, 2021
@clazarowitz clazarowitz deleted the Due branch September 12, 2021 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants