Discussion of automated spam detection techniques (formerly spam dashboard discussion) #2377
In this URL scheme, we could use `TypeaheadService.new.users(keyword)` (using this code: https://github.com/publiclab/plots2/blob/master/app/services/typeahead_service.rb#L15-L25)
Let's start with a single-keyword system, then follow up with a multiple-keyword system. The new rich search may automatically incorporate multiple keywords, so let's also see how that works. A sketch of the single-keyword version is below.
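A minimal sketch of what that could look like, assuming a hypothetical `spam#profiles` route and controller; only `TypeaheadService#users` comes from the linked code:

```ruby
# config/routes.rb -- hypothetical route for a single-keyword profile search
get 'spam/profiles/:keyword', to: 'spam#profiles'

# app/controllers/spam_controller.rb (sketch; controller and view are assumed)
class SpamController < ApplicationController
  def profiles
    # Reuse the existing typeahead lookup for one keyword;
    # TypeaheadService#users is the method linked above.
    @users = TypeaheadService.new.users(params[:keyword])
  end
end
```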
Not to derail, but to add more information: here are examples of non-offensive single words that are present in thousands of spam profiles on our site, but aren't sole criteria for banning a profile. Even working through a supportive interface such as publiclab.org/spam/profiles/keyword would still require a massive investment of manual time: payday
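To gauge that scale, a one-off query could count how common a single term is; this sketch assumes a `User` model with a plain-text `bio` column (model and column names are assumptions, not confirmed schema):

```ruby
# Count profiles whose bio mentions one of the example terms above.
keyword = 'payday'
matches = User.where('bio LIKE ?', "%#{keyword}%")
puts "#{matches.count} profiles mention #{keyword.inspect}"
```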
Adding to the list:
Could we achieve this through a route like
This is a little like what @skilfullycurled has recommended on keywords. Just flagging the similarity.
@steviepubliclab, I completely agree with you. Two thoughts:
Perhaps instead of seeing the keywords, we could just give a number or score? I think the difficult thing about seeing all of the keywords is that there could be a lot of them. We could also fold other factors into the score. Some that I've found that I think would be pretty easy catches (see the sketch below): a. Spam users have bios (we wouldn't know otherwise unless they post)
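A rough sketch of that scoring idea; the signals and weights here are illustrative placeholders, not tuned values:

```ruby
# Hypothetical spam score: a weighted sum of per-profile signals.
SPAM_KEYWORDS = %w[payday].freeze # extend with the keyword lists above

def spam_score(user)
  score = 0
  # (a) above: spam accounts tend to fill in a bio
  score += 1 if user.bio.present?
  # weight keyword hits in the bio more heavily than the bare bio signal
  score += 2 * SPAM_KEYWORDS.count { |kw| user.bio.to_s.downcase.include?(kw) }
  # further signals (link counts, posting behavior, ...) would be added here
  score
end
```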
Hi, this seems like a nice project that could potentially be part of GSoC. We could workshop how to develop minimal and more fleshed-out versions of this, ranging from simply adding some kind of "score" marker to a list of recent profiles, to a more complete system with bulk spamming tools, different filtering methods, and a more flexible "spam dashboard".
For a truly automated system, we would want to include a pretty rigorous section on identifying false positives and negatives, tuning the filter, manually evaluating the results, etc. On the other hand, just adding some filtering tools to support manual banning could be done faster, and would be a good stepping stone on the way to a more automated system, since the same methods could be used to "collect" batches for evaluation and, eventually, automated spam moderation. Another relatively easy filter would be Akismet integration, using WordPress's pretty sophisticated spam filter: https://github.com/jonahb/akismet/ So there are a number of different possible sub-projects to be considered and prioritized, which could make up a nice Summer of Code project. Thanks!
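For reference, that gem exposes roughly this interface (sketched from its README; the sample inputs and the idea of feeding a profile bio through a comment check are assumptions):

```ruby
require 'akismet'

# Placeholder inputs -- in the app these would come from the signup request.
api_key    = ENV['AKISMET_API_KEY']
user_ip    = '203.0.113.7'
user_agent = 'Mozilla/5.0'
bio_text   = 'Get fast payday loans now!'

client = Akismet::Client.new(api_key, 'https://publiclab.org')
# check returns a pair of booleans: [spam?, blatantly_spam?]
spam, blatant = client.check(user_ip, user_agent, type: 'comment', text: bio_text)
puts 'flag profile for review' if spam || blatant
```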
Benjamin, I wonder if you are working on a paper or anything from your analysis of spam? I really appreciated your points a) b) c) d) from your comment above, especially d), which I instantly recognized as a decisive pattern but would not have named on my own!
@barry, I recently took a very good class which inadvertently made me question the utility of the vast majority of papers. I think it was meant to make you more critical of research, but I took it a bit to heart. So no, I don't have any plans to write any papers at the moment. Admittedly, this is a bit of a problem, but I'll figure it out. As for spam in general, we'd need a novel angle, which could be some sort of examination of community-designed classification. I think this is an interesting area because right now, machine learning classifiers are developed in an all-purpose manner, the goal of which is ideally to create something perfectly predictive. As people consider the flaws and biases of machine learning, an interesting question (I think) is: what if these programs were designed and tuned by the community, and they were the ones who decided which biases and trade-offs they feel comfortable with?
Regarding your comment @jywarren, I'm now leaning heavily towards your idea of building some tools which help moderators do their job with less overhead instead of trying to remove the job entirely.
Yes, this would all be a part of the training process, but this brings up a good point which makes me think we need to take a step back: instead of asking how we can end the human cost of spam moderation, we should probably be asking where the pain points are. This is what is nice about the question @ebarry is asking here: she has a specific thing that would help her.

Classification may not lessen the amount of effort it promises to. The classifier will need to be continually re-evaluated to avoid The Parable of Google Flu, in which Google's flu predictions worked amazingly until people's search habits changed and then they didn't anymore. Ideally, when a person is a false positive, they would contact us, we'd un-ban them, and that would be the human manual evaluation. However, we do not email people if they are marked as spam (for good reason), so it would rely on the user logging in again to find out. That will work technically, but it won't really be a good "Welcome to Public Lab!" experience, because they probably won't think to log in until a lot of time has passed and they're wondering why they still haven't been approved. So that may not reduce the labor involved in moderation, since the determination of the classifier would need to be confirmed anyway.

My experience with Akismet is similar, but that was for personal email; I'm not sure how user-friendly it is on a WordPress site. But it does give a score, which is nice.
Agreed. This seems like a more prudent way to go.
Oh wow, this is a really interesting subject! I'd love to contribute to building this system 👍 |
@Uzay-G, I'm really glad you chimed in on this and the ML thread at #4660, because it made me realize that there are two parts to this.
Okay. The classification is back on! @Uzay-G, I'm going to ping you on #5450 and you can see if you're interested in that aspect (and it's okay if you're not). The nice thing about that issue is that you can get started and work on it whenever you want, however you want. You don't have to wait for planning or anything, because it's not integrated with the website; it's just a dataset from the website.
@jywarren @skilfullycurled This seems quite interesting! Would love to contribute to this ❤️ and take it up as part of my proposal. ✌️ Keep me in the loop! 😄 |
@skilfullycurled I have been a bit busy with GCI but I will be happy to work on this when I have time 👍 |
Hi all! I'm going to rename this to narrow in on automated spam detection techniques, as the bulk moderation and filtering UI work is complete or nearly complete in the spam2 project in #7885 -- thanks! |
@jywarren here, just stepping in to try to organize some of the ideas here. I see a few different "groupings" --

- having a 'users' tab as part of our spam dashboard with ban/approve etc., similar to other tabs -- COMPLETE in GSoC '20: "Spam Management Dashboard" Project Planning #7885

(3) above is very big and would need significant planning, UI, etc., but 1 is relatively simple and 2 is a medium-sized project. Thanks, all!
Please describe the problem (or idea)
This idea is about extending the Spam Page to assist moderators in banning spam profiles.
The current page at https://publiclab.org/spam/ is only visible to moderators, so here's a screenshot:
What the current spam page achieves is an efficient listing of content so that moderators can more quickly "ban" without multiple page reloads.
This idea is to add a tab for viewing profiles, and to allow searching within profile bio content via URL:
https://publiclab.org/spam/people/_keyword_
or
https://publiclab.org/spam/profiles/_keyword_
An outstanding question I have: most of my profile moderation depends on having more than one offensive keyword present. Can this sort of URL query take multiple keywords?
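One way it could, sketched below: a comma-separated list in a single URL segment, split in the controller. The route, the separator, and the AND semantics are all assumptions for discussion, as is a `User` model with a `bio` column:

```ruby
# config/routes.rb -- e.g. /spam/profiles/payday,loan
get 'spam/profiles/:keywords', to: 'spam#profiles'

# app/controllers/spam_controller.rb (sketch)
class SpamController < ApplicationController
  def profiles
    keywords = params[:keywords].to_s.split(',')
    # AND logic: every keyword must appear in the bio, matching the
    # "more than one offensive keyword" pattern described above.
    # Note: LIKE wildcards in the keywords are not escaped in this sketch.
    @users = keywords.reduce(User.all) do |scope, kw|
      scope.where('bio LIKE ?', "%#{kw}%")
    end
  end
end
```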
Thanks for thinking about this!!!