Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please, use other/create new {SourcePos, Pos} datatypes #940

Closed
Anton-Latukha opened this issue May 23, 2021 · 3 comments
Closed

Please, use other/create new {SourcePos, Pos} datatypes #940

Anton-Latukha opened this issue May 23, 2021 · 3 comments

Comments

@Anton-Latukha
Copy link
Collaborator

Anton-Latukha commented May 23, 2021

The megaparsec discussion thread that spurred this report: mrkkrp/megaparsec#450

Thoughts about it and these data types.

Seems like {megaparsec, parsec, text-position, located, attoparsec}, sorry me please, make a "not so good" design in this question.

And I am talking about much more than having a Monoid.

"Positioning in the document" seems like the perfect case of "The information should reveal its nature in the data type & type-level code".

Aka to say, positioning is not only a Monoid. Files, texts, expressions have BOF, & EOF (latter is determined by size) & positioning should never go out beyond BOF & EOF, so positioning is Bounded, where {minBound=BOF; maxBound=EOF}. So when positioning is created - the EOF is determined & <> accounts to that, EOF is at least a right zero of the <>, 0 is mempty of the <>. Positioning can be absolute & delta, absolute positions can not be added, deltas can (but never go out of bounds) and absolute <> delta is legitimate.

It is clearly a specific data type with type-level logic, not "just Int".

Implementation of it would allow Pos, SourcePos, SrcSpan to become Monoid and become intuitive, fast, typesafe & would clean-up the code as a result.

It may be more of an idea to a new package then relating to HNix.

@expipiplus1
Copy link
Collaborator

expipiplus1 commented Nov 1, 2021

As mentioned before (#746), it would actually be convenient to have the byte offset reported somewhere. IIRC it would be possible (and faster) to parse into an annotated tree with byte offsets. If line and column are needed later then one could just map byteposition -> line*col over the annotations (in O(n)).

@Anton-Latukha
Copy link
Collaborator Author

Anton-Latukha commented Nov 2, 2021

I obviously regret writing some of these reports sometimes. I mean, why I opened that issue.

What you are saying about byte offset.
In any case, again - the type should be abstracted by newtype first, instances introduced, code around newtype polymorphed, then internal type can be changed easily. Or even polymorph also.

If position is changed to bytes - it would work faster & AST would look simplier/tidier, but the source code address would be unreadable by a human, so then the pretty show needs to be made for human to read in error messages.

And also byte offset is very low level & synchronization between bytecount & UTF8(to which I standardized codebase) or other character type - can desync, so byte address can point into the middle of a character. Also Megaparsec API ifaik alien to byte offset.

So I propose - what I do in the code for last 1-2 years - newtype, abstract & polymorph. It is always a part of useful work done already for any case of events. I want to eventually arrive at the project that is very easy to read (despite my current fascination with point free style & lifting code away from []), understand & modify into the needed shape.

@Anton-Latukha
Copy link
Collaborator Author

Anton-Latukha commented Nov 2, 2021

Related with #940

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants