Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stripping control codes #462

Open
spreadred opened this issue Feb 24, 2014 · 14 comments
Open

Stripping control codes #462

spreadred opened this issue Feb 24, 2014 · 14 comments
Assignees
Milestone

Comments

@spreadred
Copy link

I am trying to write a bot using @rules to trigger certain functions based on channel text. mIRC control codes seem to interfere with this. Is there a way an option to get willie to strip all mIRC control codes before passing the text to @rule?

@elad661
Copy link
Contributor

elad661 commented Feb 28, 2014

Right now we don't have this, but I assume the proper fix would be adding an optional boolean parameter to @rule, that will probably be named strip_control_codes.

When that boolean set to true, willie will know to strip control codes from the message before matching the regex. Message passed to the callable will be then clean from control codes as well.

As not all modules will want that, the default will be set to False.

To implement that, we'll need to make a list of all possible control codes.

@spreadred
Copy link
Author

I'm not sure if these are all the possible control codes, but this seems to work to strip the most common (maybe all). I got this expression from a user on stackexchange: regex = re.compile(r"\x1f|\x02|\x12|\x0f|\x16|\x03(?:\d{1,2}(?:,\d{1,2})?)?", re.UNICODE)

@elad661
Copy link
Contributor

elad661 commented Feb 28, 2014

I don't put code copypasted from stackoverflow into my projects.

When you write code, you should not use copy-paste, you should understand what you're doing. In the case of this regex, for example, we don't know what each of this character bytecodes is, and we don't know why re.UNICODE is used. Therefor, this should not be used.

The proper way to implement that would be reading up on these control codes. If they are standard ascii control codes, we already have comprehensive documentation. If not, we need to learn the format and make a list of all possible codes. Leaving a regex like that in will result in cryptic code which will be a pain in the ass to debug.

@spreadred
Copy link
Author

I wasn't suggesting that you implement the code from SE. I'm not a professional programmer and I'm completely new to python. I was simply trying to help. As for the usage of control codes, I'm not sure if you can find a definitive document listing them all, as they seem to differ a bit based on which client someone may be using. This might be helpful if you're interested in implementing this stripping feature, although, as I said, I'm unsure if this is an accurate or exhaustive list. http://www.ircbeginner.com/ircinfo/colors.html

@Exirel
Copy link
Contributor

Exirel commented Jan 12, 2019

Are mIRC control codes still a thing today?

@spreadred
Copy link
Author

@Exirel mIRC is still alive and well, although not under active development IIRC. If you happen to find yourself in channels dedicated to serving files via XDCC and/or other methods, you likely will encounter these control codes, which are used to make their "pack listings" stand out. Although many modern clients seem to transparently strip them.

@Exirel
Copy link
Contributor

Exirel commented Jan 12, 2019

Yeah, I saw the latest release of mIRC is from 2018, so it's alive, that's nice! I see that @dgw used some regex in #34, so maybe that could be reused to implement a @rule(..., skip_control_code=True)?

@dgw
Copy link
Member

dgw commented Jan 12, 2019

@Exirel That should be sopel-irc/sopel-extras#34, but yes. 😁

I'm actually going to retitle the issue, because referring to them as "mIRC control codes" is no longer accurate. While mIRC may have begun the trend toward formatting support for IRC, it's far from the only client to recognize formatting characters today. #1302 tracked implementation of all remaining (not yet supported by sopel.tools.formatting methods) codes known to exist in at least one client, as documented by modern.ircdocs.horse at the time.

@dgw dgw changed the title stripping mirc control codes Stripping control codes Jan 12, 2019
@Exirel
Copy link
Contributor

Exirel commented Jan 13, 2019

Hm, in this case, I even wonder if that skip_control_code argument could be False in 7.x, then True in 8.x if the usage is so common? I mean, that looks reasonable to me.

@dgw
Copy link
Member

dgw commented May 8, 2020

Implementation thought: #1768 (comment)

@dgw dgw modified the milestones: 7.1.0, 8.0.0 Aug 4, 2020
@dgw
Copy link
Member

dgw commented Aug 4, 2020

After implementation discussion on IRC, we've (@Exirel and I) decided that it'll be cleaner to do this in 8.0 after we no longer have to worry about Python 2 support.

@dgw
Copy link
Member

dgw commented Jun 22, 2021

We are now in the 8.0 dev cycle and firmly located in py3-only land. This can be implemented with the use of keyword-only arguments to the relevant decorators now.

@Exirel Exirel self-assigned this Jan 22, 2022
@Exirel
Copy link
Contributor

Exirel commented May 27, 2022

Oh no. I fell down the rabbit hole:

  • plugin.rule is one thing... but they are others: find and search for instance
  • lazy versions are painful to deal with, since the current loader spec is to just return a list of patterns, no mention of control code or anything like that
  • as pointed out by @dgw someone will, at some point, want to match both with and without control code for the same pattern, so making it really flexible is not easy at all

However, I've a counter offer to make:

from sopel import plugin

def _rule_handler(bot, trigger):
    ...

@plugin.plain  # or @plugin.plain(True)
@plugin.rule('.*')
def plain_trigger(bot, trigger):
    _rule_handler(bot, trigger)

@plugin.plain(False)  # the default
@plugin.rule('.*')
def plain_trigger(bot, trigger):
    _rule_handler(bot, trigger)

If someone is really into having both version at the same time, they always can. Making plain a single flag on a plugin callable makes everything more simple and "just work" with very few changes.

Sure, that completely throw away the "but we have to wait until Python 3+ only" thing and also it forces plugin authors to create wrappers if they want both. On the other hand... what are the odd, exactly, to match both version?

... but wait! I've another idea: instead of a boolean flag, plain (or whatever the flag will be), could be more than a binary value. For instance, what if:

@plugin.plain(plugin.PLAIN_ONLY)
def plain_trigger(bot, trigger):
    ...

@plugin.plain(plugin.RAW_ONLY)  # the default for now
def raw_trigger(bot, trigger):
    ...

@plugin.plain(plugin.PLAIN_OR_RAW)
def plain_or_raw_trigger(bot, trigger):
    ...

I'm not convinced that someone will want to say "this pattern will match if and only if it's plain while the second pattern will match only if and only if raw" on the same plugin callable... while having pattern that could match on the wrong version.

So yeah, that one last solution is my best offer.

@dgw
Copy link
Member

dgw commented Jul 13, 2022

Postponed for rework of internals to make this and other rule-related features work better. You can see some of the discussion here (the rest was on IRC).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants