A bot that tweets any time devs swear on GitHub

(thenextweb.com)

50 points · deepsy · 8y ago · 31 comments

26 comments · 10 top-level

uiri · 8y ago · 4 in thread

Author here. I made this almost five years ago when I was in high school. The Scunthorpe problem is real - "shit" is used in a lot of good compound swears.

The code isn't public because I was concerned about people taking it to make a more popular version of the same thing. Not that it is difficult to glue together the two APIs. It is also an embarrassing mess of around 150 lines of Python.

One issue with linking to the commits or repo is naming & shaming. The other is, as I mentioned, people trying to get on the bot intentionally.

It scans a GitHub API once a minute so as not to put noticeable strain on their API. I think the firehose of constant commit messages has only gotten worse.
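Since the code is not public, the following is only a guess at the shape of such a bot, not the author's implementation. It is a minimal Python sketch of the poll-and-filter loop described above. It assumes the public events feed at `https://api.github.com/events` and that push events carry a `commits` list with `message` fields; the word list and all function names are hypothetical.

```python
import json
import re
import time
import urllib.request

# Assumption: the public events feed; the comment only says "a GitHub API".
EVENTS_URL = "https://api.github.com/events"

# Hypothetical word list. Substrings are matched on purpose so compound
# swears are caught, at the cost of Scunthorpe-style false positives.
SWEARS = re.compile(r"shit|fuck", re.IGNORECASE)


def commit_messages(events):
    """Yield commit messages from a decoded list of GitHub events."""
    for event in events:
        if event.get("type") != "PushEvent":
            continue
        for commit in event.get("payload", {}).get("commits", []):
            yield commit.get("message", "")


def sweary_messages(events):
    """Return the commit messages that contain a listed swear."""
    return [msg for msg in commit_messages(events) if SWEARS.search(msg)]


def poll_forever(handle, interval_seconds=60):
    """Fetch the feed once per interval and pass each match to handle()."""
    while True:
        with urllib.request.urlopen(EVENTS_URL) as response:
            events = json.load(response)
        for msg in sweary_messages(events):
            handle(msg)
        time.sleep(interval_seconds)
```

Calling `poll_forever(print)` would print matches roughly once a minute. A real bot would also need to remember which events it has already seen, since consecutive fetches can overlap.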

tzs · 8y ago

I did the communications server for a small online gaming site once, and it had a profanity filter on the chat channels. The approach we took mostly avoided the Scunthorpe problem, and also caught when people tried to sneak profanity past it by splitting it across words (e.g., knowing that "motherfucker" would be filtered, someone might try "motherfu cker", or maybe even "m0th3rfu ck3r").

Here is what we did, as far as I recall. This worked a heck of a lot better than I expected it to.

1. Make a copy of the text and replace all the common non-letters that look somewhat like letters with those letters. E.g., 0 => O, 1 => L, 3 => E, 4 => A, 7 => T, @ => A.

2. Do a spell check on each word. Set a flag on each character that comes from a word that passed the spell check.

3. Scan the sequence of characters, ignoring all punctuation and white space, looking for sequences that match known profanity.

4. For each sequence that matches, mark it as profanity if any of its characters were not flagged in step #2.

As described above, this probably would not scale well to high traffic because of the spell check. That could be improved by changing the order: look for potential profanity first. Only if some is found would you then do the spell check, and you would only need to spell check the words that span the potential profanity.
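A minimal Python sketch of the steps above, using the reordering suggested in the last paragraph (find candidates first, spell check only the words they touch). This is an illustration, not the original server code: `DICTIONARY` is a tiny stand-in for a real spell checker, `PROFANITY` is a hypothetical word list, and all function names are made up.

```python
import re

# Step 1 substitutions, taken from the examples in the comment.
LOOKALIKES = str.maketrans("01347@", "OLETAA")

# Hypothetical stand-ins: a real system would use a proper spell checker
# and a much longer profanity list.
DICTIONARY = {"the", "town", "of", "scunthorpe", "you"}
PROFANITY = ("fuck", "cunt")


def spelled_correctly(word):
    """Stand-in for the spell check in step 2."""
    return word in DICTIONARY


def find_profanity(text):
    """Return the profane sequences found in text."""
    normalized = text.translate(LOOKALIKES).lower()

    # Keep only letters, remembering which word each letter came from.
    words = ["".join(ch for ch in w if ch.isalpha()) for w in normalized.split()]
    stream = "".join(words)
    owner = [i for i, w in enumerate(words) for _ in w]

    hits = []
    for bad in PROFANITY:
        # Step 3: scan the letter stream, ignoring spaces and punctuation.
        for match in re.finditer(re.escape(bad), stream):
            spanned = set(owner[match.start():match.end()])
            # Steps 2 and 4, deferred: spell check only the words the
            # candidate touches; one misspelled word makes it profanity.
            if not all(spelled_correctly(words[i]) for i in spanned):
                hits.append(bad)
    return hits
```

With these stand-in lists, `find_profanity("the town of Scunthorpe")` finds nothing because the only candidate sits entirely inside a correctly spelled word, while `find_profanity("m0th3rfu ck3r")` reports a hit because the candidate spans two words that both fail the spell check.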

Daviey · 8y ago

>code isn't public because I was concerned about people taking it to make a more popular version of the same thing.

Why would that be a bad thing?

painbody · 8y ago

Some interesting collisions here: https://en.wikipedia.org/wiki/Scunthorpe_problem

"Problems can occur with the words socialism, socialist, and specialist because they contain the substring Cialis, the brand name for an erectile dysfunction medication commonly advertised in spam e-mails."

flavio81 · 8y ago

When I saw this, I thought the reason for making this bot was obvious:

Measuring code quality in "WTFs per minute" (as shown in this famous comic: http://i.imgur.com/J1svNp7.jpg)

The fewer WTFs/min, the better the code is.

painbody · 8y ago · 5 in thread

I fear this will be used to further arguments that programmers create a hostile environment.

I know people in all occupations swear, but this puts the focus on it and allows people to quantify it.

Just imagine the article:

"One report showed that the f-word was used over 5,000 times in a single month. No other profession that's been measured has showed near this level of profane, unwelcome environment. How are stay-at-home parents supposed to feel welcome in this community?"

Klathmon · 8y ago

Note: This kind of has less to do with your comment than I thought it was going to, but I feel it's still worth submitting.

I think we as a society will soon need to learn how to "forgive and forget".

As technology becomes more ubiquitous, we are quickly getting to the point of having the majority of our time in this world recorded and stored permanently. Every mistake, every poor decision, every fashion fad, every comment, every choice seared into the never forgetting internet.

We will need to move away from the idea that people can't change, and that something a person did 10 years ago means anything significant today. We should celebrate changing opinions, not attack someone for "waffling". We should be giving significantly less weight to actions and opinions far in the past, and more to things that happen more recently.

There are commits on some projects I've worked on that I absolutely wouldn't word that way now. There are silly jokes and things like it in messages that I wouldn't do now. But my past is my past, and it honestly says very little about my future. I know people that are consistent with some things throughout their whole lives, and others that change drastically over the course of a year.

It won't be pretty, but we will need to learn as a species that people make mistakes, and that a mistake shouldn't be a lifelong, never-ending mark against someone. In some cases the opposite is true! I lost a significant amount of money years ago because a hard drive died and my backup hard drive died as well. Because of that experience, I now have a militant backup strategy that I am unwavering on.

I don't think we should shy away from tools like this, in fact I think they will be more important in the future. Expose people's "skeletons" that are hidden in plain sight. Show the world that it's not a big deal. It will hurt at first, but hopefully over time society will begin to change course, and "character assassination" style headlines will be a thing of the past when "Person did something bad 10 years ago" isn't a big deal since they have 9 years of "newer" history showing they are not the same as they were.

People fear the unknown, people fear change. Showing them that the "foul programmer" is no different than they are is (in my opinion) a better way of dealing with the problem than hiding it and hoping the public doesn't find out.

cr0sh · 8y ago

> No other profession that's been measured has showed near this level of profane, unwelcome environment.

Author obviously has never worked a blue collar job...

taxpayer9 · 8y ago

I'm sorry, in what way is the word "fuck" creating a hostile environment? It depends on context. Compare these two commit messages:

"Get the fucking login button working"

"Fuck painbody, they don't know how to wire up a login button"

I write a lot of commit messages with "offensive" language, but it's not directed at anybody and it's just to blow off steam. How are stay-at-home parents supposed to feel welcome in this community? Release the death grip on their pearls and realize that people swear and that there's no way a "bad word", on its own, can hurt you. Unless someone is directing harsh language at you, you have no reason to be offended by the word "fuck". Good god, it's just a word.

SeanDav · 8y ago

> "Good god, it's just a word"

Extremely naive to think that just because it is a word, it does not matter.

A single word in the wrong place or at the wrong time could destroy your career, make you a social pariah or even get you killed. Words are powerful.

2 more replies

hanoz · 8y ago

"Good god, it's just a word" works marvelously when applied to words which offended our parents' and grandparents' ears, but good luck with that defense when you've offended the current generation's sensibilities.

hasbot · 8y ago · 2 in thread

> one bored Microsoft programmer has built a Twitter bot

Hmm, I'm not sure I would want to be labeled, in public, as a "bored Microsoft programmer." His manager is wondering "Did he write this on Microsoft time?" "What else is he doing to alleviate his boredom?" "I give him plenty of work to do so why is he bored?"

hooksfordays · 8y ago

Problem is, this was developed 5 years ago, well before the developer joined Microsoft. I understand where you're coming from, but the article is misleading.

wutbrodo · 8y ago

I don't think Microsoft has the best work culture in the world, but this is just a cartoonishly evil caricature. The phrase doesn't imply that he was bored while on the job, and it's downright neo-Dickensian to imagine a manager thinking that an employee being bored in his off hours must need more work.

strictnein · 8y ago

The Twitter account:

https://twitter.com/gitlost

leejo · 8y ago

Erazal · 8y ago · 1 in thread

I just swore in a commit and it did not appear. Makes me wonder how the Twitter API works, how often it updates, etc. Off to discover new horizons!

_qxjp · 8y ago

Can confirm, it doesn't always work. Most likely due to Twitter's 2,400 tweets/day API limit.
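To put that figure in perspective, here is a small back-of-the-envelope sketch in Python. It takes the 2,400/day number from the comment at face value and assumes, purely for illustration, that the budget is spread evenly over the day and that the bot polls once a minute as the author describes; the function names are hypothetical.

```python
DAILY_LIMIT = 2400            # figure quoted in the comment above
MINUTES_PER_DAY = 24 * 60


def tweets_allowed_per_poll(poll_interval_minutes=1):
    """Average tweet budget per poll if the daily limit is spread evenly."""
    return DAILY_LIMIT * poll_interval_minutes / MINUTES_PER_DAY


def dropped(matches_per_poll, poll_interval_minutes=1):
    """Average number of matches per poll that cannot be tweeted."""
    budget = tweets_allowed_per_poll(poll_interval_minutes)
    return max(0.0, matches_per_poll - budget)
```

Under these assumptions the budget works out to 2400 / 1440, or about 1.67 tweets per one-minute poll, so any poll that turns up more matching commits than that has to drop some of them.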

mrighele · 8y ago

A bit more evil would be to do something similar every time someone mentions "password" or "secret" in a commit message...

MrQuincle · 8y ago · 4 in thread

Moby Dick is not a swearword :-)

amarraja · 8y ago

We had a small bespoke customer support system we wrote back in 2000 or so. The system would allow an operator to type the message body, and would automatically generate the mail text e.g. "Dear {customer name}, {body}, Regards {operator name}". Cutting edge stuff back then!

One day, we got a call that an operator couldn't send an email since the third word was classed as "offensive", even though the message body was fine. After firing up the debugger, it became obvious... the customer's surname was "Dick".

We never got to the bottom of how to solve it, so hacked something in. I wonder how many times Mr. Dick has issues with false positives in profanity filters.

atomwaffel · 8y ago

Probably often enough to be used to it, although it's still bad design and needs to be fixed. (I'm not sure why a non-public-facing system needed a profanity filter in the first place.) https://en.wikipedia.org/wiki/Scunthorpe_problem

I find it interesting that this Twitter bot seems to have the same problem in reverse: it can't reliably filter out things that aren't offensive.

Edit: Thinking about it, it's really still the same problem, i.e. false positives when trying to automatically determine whether or not a given string contains swearwords.

bpicolo · 8y ago

`fix: previousHitLabel and nextHitLabel for intl norwegian`

needs some \b
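To illustrate the point, a short Python sketch comparing a plain substring match with a `\b`-anchored one on the commit message quoted above. How the bot actually matches is not public, so the patterns here are assumptions for illustration only.

```python
import re

WORD = "shit"

# Plain substring match: fires on "previou-sHit-Label".
naive = re.compile(WORD, re.IGNORECASE)

# Word-boundary match: the word must stand on its own.
bounded = re.compile(rf"\b{WORD}\b", re.IGNORECASE)

message = "fix: previousHitLabel and nextHitLabel for intl norwegian"

naive_hit = naive.search(message) is not None
bounded_hit = bounded.search(message) is not None
```

Here `naive_hit` is true and `bounded_hit` is false. The trade-off is the one the author mentions upthread: with `\b` on both sides, compound swears such as "bullshit" no longer match either.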

1 more reply

jmkni · 8y ago

Neither is Hancock

pvinis · 8y ago

Would be nice to have the repo where the commit happened too.

sAbakumoff · 8y ago

Is it implemented using GitHub webhooks?
