SEE misses out showing every byte when the name cannot be found #26

Closed
RigTig opened this issue May 14, 2017 · 9 comments
@RigTig
Collaborator

RigTig commented May 14, 2017

SEE misses out showing every byte when the name cannot be found and then misses out on some names too.

For example:

see see
' CR 1- 2+ DUP @ DUP 83 94 CD 51 83 94 CD 8F .ID 1+ 5 93 U. NUF? 83 94 CC 72 7D 57 ...

A better SEE is:

see3 see
' CR 1- 2+ DUP @ DUP CD 83 4E 94 9E >NAME CD 83 49 94 AE SPACE .ID 1+ 20 5 AD 93 U. NUF? CD 83 4E 94 8A DROP 94 7D ...

What you get to see depends upon what word headers exist. The option flags in globconf.inc control the inclusion (or exclusion) of both complete words or just their headers. In this example, 'CD 83 4E' would have shown as '?branch' if the WORDS_LINKRUNTI option had been set to 1 when the STM8 was flashed.

The Forth code for SEE3 is
: SEE3 ( -- ; )
\ A simple decompiler.
\ Updated for byte machines.
' CR 1- BEGIN
2+ DUP @ DUP IF
>NAME THEN
?DUP IF
SPACE .ID 1+ ELSE
1- DUP C@ U. THEN
NUF? UNTIL
DROP
;
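
The SEE3 loop above can be traced off-target with a small Python model. This is only a sketch under assumptions: `see3`, the `names` dictionary (standing in for >NAME) and the sample addresses 0x8010 and 0x8020 are hypothetical, and a step limit stands in for NUF?. It shows why every byte is printed when no name is found, and why the CD opcode in front of a named call does not appear.

```python
def see3(mem, start, names, max_steps=64):
    """Model of the SEE3 loop over a byte image.

    mem:   bytes of the code image (hypothetical test data)
    start: index of the first byte of the word body
    names: dict mapping a 16-bit code address to a word name,
           a stand-in for what >NAME finds on the target
    """
    out = []
    addr = start - 1                                # ' word 1-
    for _ in range(max_steps):                      # stands in for NUF? UNTIL
        addr += 2                                   # 2+
        if addr + 1 >= len(mem):
            break                                   # ran off the test image
        cell = (mem[addr] << 8) | mem[addr + 1]     # @ (STM8 is big-endian)
        name = names.get(cell) if cell else None    # DUP IF >NAME THEN
        if name:                                    # ?DUP IF
            out.append(name)                        # SPACE .ID
            addr += 1                               # 1+
        else:
            addr -= 1                               # 1-
            out.append(f"{mem[addr]:X}")            # DUP C@ U.
    return out


# CALL DUP, CALL to a headerless word at 834E, inline cell 949E, CALL >NAME
image = bytes([0xCD, 0x80, 0x10, 0xCD, 0x83, 0x4E, 0x94, 0x9E, 0xCD, 0x80, 0x20])
print(" ".join(see3(image, 0, {0x8010: "DUP", 0x8020: ">NAME"})))
```

With this test image the model prints the same shape as the see3 output quoted earlier: a name where a header exists, and single bytes everywhere else.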

To fix: in forth.asm, replace
SEE3: CALLR DUPPCAT
with
SEE3: CALL ONEM
CALLR DUPPCAT

Note: I am working backwards through forth.asm and adding more to the list of words removed with the BAREBONES option in globconf.inc. I am choosing words that can be added back in NVM mode or just put in ram for a temporary use because they can be written in Forth code. I may find other issues as I pick my way through the word list.

@TG9541 TG9541 self-assigned this May 14, 2017
@TG9541
Owner

TG9541 commented May 14, 2017

Hi RigTig,
when I started working on Dr. C.H. Ting's STM8EF, SEE was a bit more broken for STC than it is now. At that point, some of the execution threads were already tangled (e.g. a word ended with a jump to DROP). In my pursuit of denser code, I did more things that broke SEE than I can remember:

  • use more opportunities to merge threads with tail jumps
  • use relative jumps instead of CALL BRAN
  • use assembly code when it's more efficient than STC Forth
  • factor out assembly segments that are not Forth threads
  • use more tricks (e.g. optionally use registers for return values, swap X/Y)
  • ...

I can see several things here:

  1. SEE is an important tool for reverse engineering
  2. there are many opportunities for making code more dense I may have missed by optimizing ROM code (e.g. structure words like IF, THEN, or WHILE ...)
  3. you have better ideas than I have, and your Forth skills are much better than mine :-)

I'd like to propose the following:

  • please fork the repository (this makes it easier to integrate and track changes with pull requests)
  • the SEE issue gets handled in this GitHub issue (SEE misses out showing every byte when the name cannot be found #26)
  • we look for more opportunities to save ROM, e.g:
    • include options for moving interpreted core words to RAM (i.e. define them before compiling the rest)
    • make the EEPROM usable for compiled code (it would also be an option to move a part of the interpreter there)

Sounds like fun :-)

@RigTig
Collaborator Author

RigTig commented May 15, 2017

I am learning about github here, but fork is done. We'll continue our quest to further shrink STM8EF and make it even more useful in that fork.
On SEE, it is just a utility for learning and development, especially when addresses move on changing compilation and flashing options. A disassembler is also a handy tool. I might just have a crack at it, and learn lots about STM8 in the process. As SEE stands with this proposed change, it does not show whether the command for a Forth word is a jump always or a jump subroutine, but it still helps see what it actually has in it and whether the header exists.
Definitely fun and a worthwhile challenge to boot :-)

@TG9541
Owner

TG9541 commented May 15, 2017

Great, the first step worked! It looks to me like your STM8EF master branch was a bit out of date, and a lot of changes were missing. I put some comments here. The first and the last comment are most important. I wrote the comments in the middle before I understood what had happened:

https://github.com/RigTig/stm8ef/commit/556c9c7a67a60c508c05af90b4b1491d0e9e580f#diff-4f1b15a588a349787acbd2e56d41d6d1

Normally merging changes between two different "baselines" in git works fine, but one has to merge changes from "upstream" first (i.e. first update your forth.asm in master, then "pull" your changes to the main repository). It's also good practice to do all development in a development or feature branch (e.g. develop or barebones, not in master). This makes merging changes much easier!

I'm also in learning mode with the GitHub workflow. Here is a generic intro: https://guides.github.com/introduction/flow/

@TG9541
Owner

TG9541 commented May 25, 2017

Good writeup by @RigTig in this commit comment.
Many good ideas there (with some more ideas from my side):

  1. define a real core vocabulary that must be linked for bootstrapping an interactive Forth
  2. keep headers in a separate memory area, so that they are only in memory while they are needed
    a. RAM: fully volatile
    b. EEPROM: extend Flash space, non volatile
    c. Flash: remove the scaffolding
  3. Move to ITC (Indirect Threaded Forth)
    a. mostly re-write ("soft" inner interpreter imposes limits to an "interrupts in Forth" feature)
    b. use a TRAP to mix-and-merge with STC
  4. Take advantage of the SWIM interface in Forth programming (ICP instead of the serial interface)
    a. generate a list of entry points for headerless core words, and combine with 2) or 3)

A (non functional) demonstration of 2a is : : RAM : $CC c, $6e @ , NVM ; (what's needed is an additional level of redirection in NAME>).

One bug to be fixed:

  • make sure ABORT" and abort" can be told apart in case-insensitive mode

@RigTig
Collaborator Author

RigTig commented May 27, 2017

Now you are getting ahead of me!! YooHoo!! Keep it going. I'll catch up soon. But, I can contribute something useful, relevant to 4a.
I just needed to use some headerless code again, and had to go through the process of figuring out the address again. Mmm...there is always a better way. I couldn't figure out a strategy based upon the compiled image, but I can get all I need from the listing of the relocated code (forth.rst). So I wrote a utility in python to scan the forth.rst and create a list of all the headerless code available in that flash. Of course, the most useful list is one ready to be loaded into Forth, though just selecting the ones you need is far less wasteful of our valuable ram. So the list looks like:

\ Scanning out/W1209/forth.rst
: ?RXP [ OVERT $CC C, $822B ,
: TXP! [ OVERT $CC C, $8232 ,
: branch [ OVERT $CC C, $8336 ,
: EXIT [ OVERT $CC C, $834C ,
: doVar [ OVERT $CC C, $83C6 ,
...
: $COMPILE [ OVERT $CC C, $8DB9 ,
: OVERT [ OVERT $CC C, $8DDD ,
: ULOCKF [ OVERT $CC C, $8F73 ,
: LOCKF [ OVERT $CC C, $8F7E ,

So you just pick out the ones you want and copy into your favourite serial terminal. Well, not quite. In this set, you'll notice that OVERT is one of the hidden pieces of code and you cannot use a word until it is defined. So, need to just waste a byte to define OVERT as follows (obviously before you use anything else):
: OVERT [ $CC C, $8DDD , ] ;

I'll put the python code into my barebones branch so you can see it. I actually have it installed in the folder above all of my branches. Usage is simple enough. Assuming working in barebones folder:
../getHeaders.py out/W1209/forth.rst >headers.f
I suppose the headers.f really belongs in the out/W1209/ folder since it will be different for each build.
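
The output step of such a utility can be sketched in Python. This is not the getHeaders.py script itself: `alias_lines` is a hypothetical name, and parsing forth.rst is left out because the listing layout is not shown in this thread. The sketch only takes name and address pairs and formats them like the list above, with OVERT emitted first in its self-contained form so that the later lines can use it.

```python
def alias_lines(entries):
    """Format (name, address) pairs as Forth alias definitions.

    entries: iterable of (name, 16-bit address) pairs, assumed to have
             been extracted from the listing by some other step
    """
    table = dict(entries)
    lines = []
    # OVERT has to exist before any other alias line can call it
    if "OVERT" in table:
        lines.append(f": OVERT [ $CC C, ${table['OVERT']:04X} , ] ;")
    for name, addr in table.items():
        if name == "OVERT":
            continue
        # $CC is the jump opcode used in the list above, then the target
        lines.append(f": {name} [ OVERT $CC C, ${addr:04X} ,")
    return lines


for line in alias_lines([("?RXP", 0x822B), ("OVERT", 0x8DDD), ("EXIT", 0x834C)]):
    print(line)
```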

@TG9541
Owner

TG9541 commented May 28, 2017

Hehe the discussion is now unfolding in two "issue" threads. I wrote a similar script in AWK (not the most popular scripting language these days but still incredibly useful :-)

The ultimate goal is splitting the headers from the code. But how do you feel about writing an address list to the upper 512 bytes of the 128 bytes EEPROM ;-) ? Done right the index would only depend on the order of the words in forth.rst. Pointers to words excluded in a configuration could refer to an abort word. Does this sound practical?

@RigTig
Collaborator Author

RigTig commented Jun 3, 2017

I haven't used awk since I first started with SGML (and then XML). It is very useful, but I've forgotten most of it. Can you please put a copy of your awk script in my BAREBONES branch (or somewhere I can see it)?
EEPROM is too small for a header list: 128 bytes is only 64 eForth word addresses. Maybe we could use it as an experiment, but I reckon on using the top of flash memory. Maybe top-down? Not sure about any dependency upon order in source at all: after all, it is just a look-up table. If code is moved, just rewrite the new address. If code is removed, then put in the address for abort, or re-use the slot (provided no other word refers to it). We need another branch of STM8EF to explore this. Do you want to do it or will I? I reckon we'll need to re-capture lots of our thoughts from these issues into that branch too.
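
The look-up table idea and the size limit can be checked with a few lines of Python. This is a sketch under assumptions: `build_table`, the fixed word order and the abort address 0x8000 are hypothetical, and the table is packed as big-endian 16-bit cells because that is how the STM8 stores addresses.

```python
import struct

ABORT_ADDR = 0x8000  # hypothetical address of an abort word


def build_table(word_order, addresses, capacity_bytes, abort_addr=ABORT_ADDR):
    """Pack one 16-bit address per word, indexed by position in word_order.

    Words missing from `addresses` (excluded from the build) point to abort.
    """
    if 2 * len(word_order) > capacity_bytes:
        raise ValueError(
            f"{len(word_order)} words need {2 * len(word_order)} bytes, "
            f"only {capacity_bytes} available"
        )
    cells = [addresses.get(name, abort_addr) for name in word_order]
    return struct.pack(f">{len(cells)}H", *cells)


order = ["EXIT", "doVar", "LOCKF"]
present = {"EXIT": 0x834C, "LOCKF": 0x8F7E}   # doVar excluded in this build
print(build_table(order, present, capacity_bytes=128).hex())
print(128 // 2, "addresses fit into 128 bytes")
```

If a word moves, only its cell changes; the index of every other word stays the same as long as the word order is fixed.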

@TG9541
Owner

TG9541 commented Jun 3, 2017

@RigTig I worked a bit on the subject above.

The attached (g)AWK file produces output like this:

: OVERT [ $CC C, $8B8C , ] ;
: \ [ OVERT $CC C, $884F ,
: abort" [ OVERT $CC C, $89CB ,
: HERE [ OVERT $CC C, $85AE ,
: HAND [ OVERT $CC C, $8457 ,
: $,n [ OVERT $CC C, $8B33 ,
: AND [ OVERT $CC C, $83DA ,
: SAVEC [ OVERT $CC C, $8DD9 ,
: IRET [ OVERT $CC C, $8DE0 ,
: NEGATE [ OVERT $CC C, $8565 ,
: HOLD [ OVERT $CC C, $8655 ,
: ."| [ OVERT $CC C, $8782 ,
: ULOCKF [ OVERT $CC C, $8D64 ,
...

genalias.zip

BTW: encoding the ITC index table in about 250 bytes is possible if it's expanded to RAM (based on the assumption that no core routine is longer than 255 bytes).
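
One possible reading of this remark, sketched in Python: store a single base address plus one byte per routine (the distance to the next entry point), and rebuild the full 16-bit addresses in RAM with a running sum. The encoding itself is an assumption, not something specified in this thread; `encode`, `expand` and the sample addresses are hypothetical.

```python
def encode(addresses):
    """Return (base, deltas): first address plus one-byte gaps between entries."""
    addrs = sorted(addresses)
    deltas = []
    for prev, cur in zip(addrs, addrs[1:]):
        gap = cur - prev
        if gap > 255:
            # violates the "no routine longer than 255 bytes" assumption
            raise ValueError(f"gap of {gap} bytes before ${cur:04X}")
        deltas.append(gap)
    return addrs[0], bytes(deltas)


def expand(base, deltas):
    """Rebuild the full address list, as would happen when expanding to RAM."""
    out = [base]
    for d in deltas:
        out.append(out[-1] + d)
    return out


entry_points = [0x822B, 0x8232, 0x82F0, 0x8336]   # hypothetical entry points
base, deltas = encode(entry_points)
print(f"${base:04X}", deltas.hex(), [f"${a:04X}" for a in expand(base, deltas)])
```

Each entry costs one byte instead of two in flash, at the price of a full-size table in RAM after expansion.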

TG9541 added a commit that referenced this issue Jun 4, 2017
@TG9541
Owner

TG9541 commented Jun 4, 2017

The alias feature has just been added. I'd propose discussion to be continued in #27. ITC is a story in its own right.

@TG9541 TG9541 closed this as completed Jun 4, 2017
hexagon5un pushed a commit that referenced this issue Jun 6, 2017
* 'master' of https://github.com/TG9541/stm8ef:
  fixes #26 support for *RigTig style* aliases
  closes #30
  fixes #25 Z for 7-seg LED is wrong way around