Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Bijective Read/Show instances for patterns #465

Merged
merged 9 commits into from
Jul 10, 2020

Conversation

archaephyrryx
Copy link
Contributor

adds helper module Network.Socket.ReadShow for defining bijections
between types, to be used for simultaneous read-show equivalence
definitions

implements read/show instances for SocketOption, SockType, and Family
according to this paradigm

additionally removes a bug in Network.Socket.Options where
the OOBInline pattern name was unintentionally allcaps-ed in
one of the CPP ifdef branches

@vdukhovni
Copy link

I see that the COMPLETE pragmas I added in Network/Socket/Types.hs are generating unknown pragma warnings in older versions of GHC. They therefore need a CPP guard #if __GLASGOW_HASKELL__ >= 806.

While this is a not result of your pull-request, perhaps you'd be willing to add a fix for this?

@kazu-yamamoto
Copy link
Collaborator

They therefore need a CPP guard #if GLASGOW_HASKELL >= 806.

This has been done in master.

@archaephyrryx
Copy link
Contributor Author

I am taking a look at the other two types that have user-readable pattern synonyms, CmsgId and MsgFlag.

MsgFlag is a bitmask so it is non-trivial to serialize and unserialize and the pattern translation would only occur in cases where only one flag was used. I am probably not going to implement any read or show instances unless there is a compelling reason to, and even then would need to know how exactly we wanted to read and show them.

For CmsgId, I noticed that the pattern CmsgIdFd is not exported in Network.Socket, and is only available indirectly via an instance declaration of ControlMessage Fd on the RHS of the controlMessageId method. I was wondering whether the omission of CmsgIdFd from the export list of Network.Socket was intentional, as it seems that the newtype constructor CmsgId is exported, which is different from most of the other cases I have encountered.
There is also no "generic" pattern for user-specified CmsgId values, unlike SocketType, Family, and SocketOption. Granted, I am nowhere near as familiar with the specifics of network programming as anyone else contributing to this project would be, but I wanted to make sure that everything was as it should be before going ahead and implementing bijective read-show instances for CmsgId as well.

@archaephyrryx
Copy link
Contributor Author

I also noticed that the second field of CmsgId is named cmsglType and I was wondering whether the lowercase l was intentional, or whether it was meant to be cmsgType instead. It seems like this is a non-issue either way because the constructor itself is exported by Network.Socket, but its field names are not.

@archaephyrryx
Copy link
Contributor Author

I rebased onto the updated master and added instances for CmsgId. As long as everything I have implemented so far works (subject to style change on the actual Show strings), the only pattern-defined type remaining is MsgFlag, which is unique to begin with.

@kazu-yamamoto
Copy link
Collaborator

I also noticed that the second field of CmsgId is named cmsglType and I was wondering whether the lowercase l was intentional, or whether it was meant to be cmsgType instead. It seems like this is a non-issue either way because the constructor itself is exported by Network.Socket, but its field names are not.

Good catch! This is just a typo. I have fixed it in master.

@kazu-yamamoto
Copy link
Collaborator

additionally removes a bug in Network.Socket.Options where
the OOBInline pattern name was unintentionally allcaps-ed in
one of the CPP ifdef branches

Good catch! Thanks!

@kazu-yamamoto
Copy link
Collaborator

For CmsgId, I noticed that the pattern CmsgIdFd is not exported in Network.Socket, and is only available indirectly via an instance declaration of ControlMessage Fd on the RHS of the controlMessageId method. I was wondering whether the omission of CmsgIdFd from the export list of Network.Socket was intentional, as it seems that the newtype constructor CmsgId is exported, which is different from most of the other cases I have encountered.

I added CmsgIdFd to replace cbits/ancilData.c and did not export it since it is internal use only.

@vdukhovni Do you think that CmsgIdFd should be exported?

@vdukhovni
Copy link

vdukhovni commented Jul 7, 2020

For CmsgId, I noticed that the pattern CmsgIdFd is not exported in Network.Socket, and is only available indirectly via an instance declaration of ControlMessage Fd on the RHS of the controlMessageId method. I was wondering whether the omission of CmsgIdFd from the export list of Network.Socket was intentional, as it seems that the newtype constructor CmsgId is exported, which is different from most of the other cases I have encountered.

I added CmsgIdFd to replace cbits/ancilData.c and did not export it since it is internal use only.

@vdukhovni Do you think that CmsgIdFd should be exported?

I see that this is presently only used via sendFd and recvFd, which make it possible to send a file descriptor with no accompanying data, but IIRC it is in principle allowed to send both a file descriptor and data (though I have a vague memory that this may expose some portability differences between Linux and BSD).

I don't see any obvious reason why users should not be allowed to use this control message directly, provided they know it only works for unix-domain sockets.

So my instinct is that it should be exported. Users should be able to pattern match on it, and can in any case instantiate it numerically, so they may as well have access to the pattern, unless you know of a reason for them to not be able to? The documentation for the pattern in question can then explain that they are typically supposed to use it via sendFd and recvFd. But they could for example receive and process file descriptors over unix-domain sockets opportunistically, along with whatever else is sent, and would then be able to write code essentially identical recvFd for themselves:

    case (lookupCmsg CmsgIdFd cmsgs >>= decodeCmsg) :: Maybe Fd of
      Nothing      -> return (-1)
      Just (Fd fd) -> return fd

Does the above make sense to you?

@kazu-yamamoto
Copy link
Collaborator

@vdukhovni Thanks. I have exported it in master.

@kazu-yamamoto
Copy link
Collaborator

@archaephyrryx Did I answer to all questions? If you wish, I will rebase this PR onto master and merge.

@vdukhovni
Copy link

@archaephyrryx Did I answer to all questions? If you wish, I will rebase this PR onto master and merge.

I think he also asked about MsgFlag. It is a bitmapped field. How would you like to see it shown (which determines in turn what it would take to have it read back). The default Show instance? Just a number? An expression of the form (Name1 .|. ... .|. NameN)? A string of the form "Name1|...|NameN" (which can be parsed with a fromString instance)? Lots of choices here. The simplest, if perhaps not the most user-friendly is to let the derived instance stand...

@kazu-yamamoto
Copy link
Collaborator

First of all, MsgFlag is a new comer. We don't worry about the backward compatibility. So, the following is not a MUST.

The current behavior:

> MSG_OOB
MsgFlag {fromMsgFlag = 1}

I'm OK with this if it is not pre-defined. But for pre-defined ones, I would like to have:

> MSG_OOB
MSG_OOB

@vdukhovni
Copy link

First of all, MsgFlag is a new comer. We don't worry about the backward compatibility. So, the following is not a MUST.

The current behavior:

> MSG_OOB
MsgFlag {fromMsgFlag = 1}

I'm OK with this if it is not pre-defined. But for pre-defined ones, I would like to have:

> MSG_OOB
MSG_OOB

IIRC the data type in question is a Monoid over bitwise or (.|.). How would you like to display MSG_OOB <> MSG_PEEK (however unlikely a combination that may or may not be). Should it be (MSG_OOB <> MSG_PEEK) or something else? Should the identity (MsgFlag 0) value display as mempty?

@kazu-yamamoto
Copy link
Collaborator

Ah. I understand the question! Yes, it's hard. Displaying CInt (the current behavior) is good enough, I think.

adds helper module Network.Socket.ReadShow for defining bijections
between types, to be used for simultaneous read-show equivalence
definitions

implements read/show instances for SocketOption, SockType, and Family
according to this paradigm

additionally removes a bug in Network.Socket.Options where
the OOBInline pattern name was unintentionally allcaps-ed in
one of the CPP ifdef branches
implements bijective read/show instances for CmsgId

adds missing ghc-dependent cpp guards around version-dependent
pragmas and inlined haddock annotations

adds commented-out pre-implementation of bijective read/show
boilerplate for MsgFlag type
@archaephyrryx
Copy link
Contributor Author

I rebased onto the updated master to include the various changes that were made since my last push.

I think it would still be worthwhile to make the code I wrote more robust and less WIP, before it gets included in any releases at least. I haven't had a chance to rigorously test the behavior of the bijective instances I wrote, and in any case some parts could definitely be cleaned up a bit since some of what I did was proof-of-concept over polish.

As far as the actual read and show methods are concerned, all of the nullary patterns should behave correctly since there is no real ambiguity as to how to represent them. For types that have a unary pattern on a CInt, the encoding currently appends the CInt representation to the name of the generic pattern/constructor without any intervening characters, so that it is considered a single token to make parsing more easy. For patterns that take two CInts in a tuple, the first CInt is directly appended, and the second is appended after the second with an intervening '_' character so that they can be parsed from a single token.

One pathological case that @vdukhovni pointed out to me directly was that, in cases where a CInt happens to have a negative value, a '-' character would appear that would potentially cause the tokenization to break. I wanted to discuss stylistic choices in any case so I would probably wait for your feedback before merely patching that by conversion to an unsigned integral type. The choice to make all encodings single-token was largely to avoid having to deal with parentheses and precedence levels, which are mostly irrelevant as far as the nullary patterns were concerned. It also simplified the bijective definition a great deal. I am open to feedback on alternative approaches if the style isn't quite what you had in mind.

@archaephyrryx
Copy link
Contributor Author

I also wanted to point out that because CmsgIdFd is not defined for the Windows version of the code, exporting it in Network.Socket breaks the AppVeyor test suite completely.

Promotes UnsupportedSocketType pattern to first element of
pairlist to prevent patterns whose CPP macros are undefined
from greedily displaying when calling show for -1 value

Adds UnsupportedCmsgId pattern to Network.Socket.*.Cmsg
and ensured proper short-circuiting

Reimplements read/show used in Network.Socket.Posix.Cmsg
in Network.Socket.Win32.Cmsg, omitting POSIX-exclusive CmsgIdFd pattern

Removes POSIX-exclusive CmsgIdFd from Network.Socket export list
Adds UnsupportedCmsgId pattern to Network.Socket export list
@archaephyrryx
Copy link
Contributor Author

It seems that some of my commits have wiped the conversation history, so I will attempt to restate the major points I noted previously.

  1. Currently there is no CmsgIdFd pattern in Network.Socket.Win32.Cmsg and so including it in the Network.Socket export list as-is broke Travis; I temporarily fixed this by removing the export in my latest commit, but it may be appropriate to instead a (possibly empty) pattern in the Win32 version and keep the export instead
  2. Negative-valued CInts are displayed with '-' characters, which would prevent single-token parsing. I have ensured that at least -1 values for unary types and -1 -1 values for binary types are handled as unified Unsupported cases, which now properly short-circuit before they can be caught by any pattern whose CPP macro is undefined and defaults to -1.
  3. I can implement certain style-changes without breaking my current framework, such as changing the prefix name before non-nullary patterns, or adding separator characters that don't constitute a token break, but otherwise would need to refactor the code I added. Let me know what style changes, if any, you consider appropriate.
  4. I wrote some of the earlier code as a WIP proof-of-concept and not as polished library-code, so it hasn't gone through rigorous testing and could probably be cleaned up or reimplemented more elegantly. If we need to change our approach, however, that would end up being moot, so I wanted to hear your opinion on the style first.

Otherwise, I am willing to continue working on polishing some of the rougher implementations, though more major changes might take more time.

adds pattern UnsupportedSocketOption for short-circuit,
as addendum to previous commit. Also adds export of said
pattern to Network.Socket
@kazu-yamamoto
Copy link
Collaborator

  1. Currently there is no CmsgIdFd pattern in Network.Socket.Win32.Cmsg and so including it in the Network.Socket export list as-is broke Travis; I temporarily fixed this by removing the export in my latest commit, but it may be appropriate to instead a (possibly empty) pattern in the Win32 version and keep the export instead

The current policy avoids partial exporting. Things MUST be exported on all platforms. So, if CmsgIdFd is not defined on Windows, it must not be exported. Good catch!

@kazu-yamamoto
Copy link
Collaborator

  1. Negative-valued CInts are displayed with '-' characters, which would prevent single-token parsing. I have ensured that at least -1 values for unary types and -1 -1 values for binary types are handled as unified Unsupported cases, which now properly short-circuit before they can be caught by any pattern whose CPP macro is undefined and defaults to -1.

This short-circuit approach looks fine to me.

@kazu-yamamoto
Copy link
Collaborator

  1. I can implement certain style-changes without breaking my current framework, such as changing the prefix name before non-nullary patterns, or adding separator characters that don't constitute a token break, but otherwise would need to refactor the code I added. Let me know what style changes, if any, you consider appropriate.

If you can keep backward compatibility, you can do what you want.

@kazu-yamamoto
Copy link
Collaborator

  1. I wrote some of the earlier code as a WIP proof-of-concept and not as polished library-code, so it hasn't gone through rigorous testing and could probably be cleaned up or reimplemented more elegantly. If we need to change our approach, however, that would end up being moot, so I wanted to hear your opinion on the style first.

You are doing more than I expected. ;-) Please move forward.

@kazu-yamamoto
Copy link
Collaborator

Otherwise, I am willing to continue working on polishing some of the rougher implementations, though more major changes might take more time.

No rush. I would like to merge your PR when you are satisfied.


socketOptionPairs :: [Pair SocketOption String]
socketOptionPairs =
[ (UnsupportedSocketOption, "Unsupported")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to "UnsupportedSocketOption" so that the string is the same as the pattern synonym.

socketOptionBijection = Bijection{..}
where
cso = "CustomSockOpt"
defFwd = \(CustomSockOpt (n,m)) -> cso++show n++"_"++show m
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like have a space (' ') before/after ++.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving out spaces ensures that the read conversion only needs to see a single token. While the style would be improved, it would be difficult to implement without requiring more invasive changes on the other side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you meant space characters in the source rather than space characters in the function result.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant cso ++ show n ++ "_" ++ show m. But this is not mandatory. :-)

Unifies several common usage patterns of Network.Socket.ReadShow
for types with bijective read/show definitions, consolidating
reused boilerplate code to in-module functions
refactors definitions of _parse and _show in Network.Socket.ReadShow
to be specific to Int-like types, with a single-quote character
separating the two tuple elements

introduces _readInt and _showInt
for representation and parsing of negative numbers encoded with
underscore ('_') instead of minus ('-')

changes variable names in defRead and defShow to more appropriate
descriptors

explicitly reimplements bijections for Family and SocketType to use
`_readInt` and `_showInt` instead of regular `read` and `show` methods
of CInt; SocketOptions and CmsgId code unchanged as _parse and _show
are internally reimplemented but their calls are unchanged
@archaephyrryx
Copy link
Contributor Author

As far as this approach goes, I think that there isn't much more left to do. I am not planning on working more along these lines, though I may go with the suggestion of @vdukhovni to later implement a more feature-complete variant (on a separate branch) that doesn't ignore precedence and displays values in such a way that they would be valid in source code. As it stands, however, I am finished with this particular pull request.

@vdukhovni
Copy link

As far as this approach goes, I think that there isn't much more left to do. I am not planning on working more along these lines, though I may go with the suggestion of @vdukhovni to later implement a more feature-complete variant (on a separate branch) that doesn't ignore precedence and displays values in such a way that they would be valid in source code. As it stands, however, I am finished with this particular pull request.

@kazu-yamamoto The basic issue that remains to decide is whether we want to have Show instances that work by the book, and are valid Haskell expressions in source code. Given all the requisite imports in scope, the more strictly correct variant would not only round-trip show/read, but actually be syntactically valid.

For example, instead of GeneralFamily_2147483648 the alternative would produce GeneralFamily (-2147483648) or when part of a larger product type: P p1 ... (GeneralFamily (-2147483648)) ... pN, while of course representing the known cases by their pattern names: P p1 ... AF_INET ... pN.

I think that this PR is not far from the more strictly correct approach, and unless you'd rather go with the current version as-is, I'll encourage @archaephyrryx to take the next step...

@kazu-yamamoto kazu-yamamoto merged commit 33d0043 into haskell:master Jul 10, 2020
@kazu-yamamoto
Copy link
Collaborator

I think that this PR is not far from the more strictly correct approach, and unless you'd rather go with the current version as-is, I'll encourage @archaephyrryx to take the next step...

I have merged this PR.

@archaephyrryx Thank you for your great effort! If you took the next step, it would be appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants