
Samsung Messages #5962

Closed
creadone opened this issue May 14, 2019 · 3 comments

@creadone
Contributor

FYI

One of the clients of our link-shortening service informed us about inconsistent statistics. According to him, we over-count his clicks by at least a factor of two. The client runs SMS campaigns, so accurate statistics are important to him.

We pulled the Nginx logs back from the archive and found strange behavior behind an ordinary-looking user agent. Please have a look:

Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0

At first glance it is just a desktop Ubuntu user running an old Firefox released on September 17, 2013. However, this "user" follows each unique link three times within a single second, every time from different IPs.

This is not a bot, and this is not a young hacker: it is SMS Link Rich Preview by Samsung. Okay, we know that everyone lies, but this just looks like poor-quality software.

  1. Discussion on Stack Overflow: https://stackoverflow.com/q/48068227/1597964
  2. Unanswered question on the Samsung Developer forum: https://developer.samsung.com/forum/thread/sms-link-rich-preview/201/346325
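
For what it's worth, a minimal sketch of how such bursts could be filtered out of raw logs (the `Hit` record, the `drop_preview_bursts` helper, and the thresholds are made up for illustration; the only hard fact here is the exact UA string above):

```python
from collections import defaultdict
from dataclasses import dataclass

# The exact UA sent by the Samsung preview fetcher, as seen in our logs.
PREVIEW_UA = (
    "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) "
    "Gecko/20100101 Firefox/24.0"
)

@dataclass(frozen=True)
class Hit:
    link_id: str
    ip: str
    user_agent: str
    timestamp: int  # unix time, seconds

def drop_preview_bursts(hits: list[Hit]) -> list[Hit]:
    """Collapse bursts of preview requests (same link, same second,
    preview UA, different IPs) into at most one counted hit."""
    buckets: dict[tuple[str, int], list[Hit]] = defaultdict(list)
    for h in hits:
        buckets[(h.link_id, h.timestamp)].append(h)

    kept: list[Hit] = []
    for group in buckets.values():
        preview = [h for h in group if h.user_agent == PREVIEW_UA]
        others = [h for h in group if h.user_agent != PREVIEW_UA]
        if len(preview) >= 2 and len({h.ip for h in preview}) >= 2:
            # Looks like the SMS rich-preview burst: count it once at most.
            kept.append(preview[0])
        else:
            kept.extend(preview)
        kept.extend(others)
    return kept
```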
@sgiehl
Member

sgiehl commented Oct 28, 2019

Sorry for the late response.
As the user agent actually only contains valid details, there is nothing we can do in this library.
If you want to sort those requests out, that would need to be done based on the detection results.

@creadone
Contributor Author

@sgiehl, not a problem. We have added this UA to our exception rules.
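
Roughly, the exception rule amounts to something like this (`SMS_PREVIEW_UAS` and `is_sms_preview` are made-up names for illustration; this is not part of device_detector):

```python
# Made-up names for illustration; not part of device_detector.
SMS_PREVIEW_UAS = {
    "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0",
}

def is_sms_preview(user_agent: str) -> bool:
    """True for user agents we treat as link-preview fetchers and
    exclude from click statistics before detection runs."""
    return user_agent in SMS_PREVIEW_UAS
```

The obvious downside is that it matches only the exact string, so any change in the preview fetcher's UA will slip through again.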

I have one question and would be glad if you could take the time to answer. I maintain the device_detector shard based on your regexes. It's not very convenient, because each new UA may go unrecognized, and I have to reprocess the raw statistics data after updating the regexes.

A solution that might be slightly easier to maintain is a grammar-based (e.g. BNF) parser. It is a more flexible tool because you don't need to describe every known UA with its own regex rule; you describe the types of UA and then extract the data. For example, I found a grammar for the ANTLR parser generator. It worked, and I think it requires less effort to support.

Have you considered the option of grammar-based parsers?
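
To illustrate what I mean, a toy sketch in plain Python (no ANTLR, and `parse_ua` is not how device_detector or my shard actually works): a single tiny grammar, roughly `ua := (product | comment)+`, `product := name ["/" version]`, `comment := "(" part (";" part)* ")"`, splits any UA into structured tokens, and the product- or device-specific knowledge is applied afterwards instead of being baked into one regex per UA.

```python
def parse_ua(ua: str):
    """Toy grammar-driven scanner: splits a UA into product/version pairs
    and parenthesized comment parts without knowing any concrete UA."""
    products, comments = [], []
    i, n = 0, len(ua)
    while i < n:
        if ua[i].isspace():
            i += 1
        elif ua[i] == "(":
            # comment := "(" part (";" part)* ")"  -- naive, assumes no nesting
            j = ua.index(")", i)
            comments.append([part.strip() for part in ua[i + 1:j].split(";")])
            i = j + 1
        else:
            # product := name ["/" version]
            j = i
            while j < n and not ua[j].isspace() and ua[j] != "(":
                j += 1
            name, _, version = ua[i:j].partition("/")
            products.append((name, version or None))
            i = j
    return products, comments

ua = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0"
print(parse_ua(ua))
# ([('Mozilla', '5.0'), ('Gecko', '20100101'), ('Firefox', '24.0')],
#  [['X11', 'Ubuntu', 'Linux i686', 'rv:24.0']])
```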

@sgiehl
Member

sgiehl commented Oct 28, 2019

No, actually I haven't considered something like this yet. I'll try to have a look when I have some time. But I'm not sure whether that would make the detection faster or slower.
