Traffic Jam, a program that helps track prostitution rings by using public data

(broadly.vice.com)

54 points by eegilbert 10y ago | 24 comments

24 comments

Jemaclus 10y ago

Clever. One of the more interesting challenges that I've run into in the last few years is just the sheer amount of raw data out there. It's mind-boggling how many problems could be solved if we could sift through that data quickly, from human trafficking down to weather. I'm particularly fascinated by her intuition that writing patterns and templates can identify pimps. I'm not sure how long it would have taken me to come to that conclusion, but now that it's out there, it's obvious.

I wonder what other problems we can solve with the same toolset.
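The template intuition can be made concrete with near-duplicate detection. Below is a minimal sketch, assuming ads are compared by overlapping three-word "shingles" and Jaccard similarity. This is a standard near-duplicate technique, not necessarily what Traffic Jam does. The function names, the k=3 window, and the 0.5 threshold are illustrative choices.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_same_template(ads: list[str], threshold: float = 0.5,
                         k: int = 3) -> list[tuple[int, int, float]]:
    """Return (i, j, score) for every pair of ads whose shingle overlap >= threshold.

    Hypothetical helper: compares all pairs, so it is O(n^2) in the number of ads.
    """
    sets = [shingles(ad, k) for ad in ads]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            score = jaccard(sets[i], sets[j])
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs


ads = [
    "Vintage terminal for sale, works great, call after 6pm, no lowballers",
    "Vintage keyboard for sale, works great, call after 6pm, no lowballers",
    "Selling a bicycle in good condition, message me for details",
]
print(likely_same_template(ads))
```

The first two listings share most of their wording and get flagged as a pair, while the third does not. At real scale, one would typically swap the all-pairs loop for MinHash or locality-sensitive hashing.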

jsprogrammer 10y ago

People have been trying to do textual analysis to divine all kinds of things about the people who wrote the text. You'll even find papers in psychology or psychiatry journals claiming to be able to distinguish mental illnesses, based purely on a textual analysis. Also, all manner of religions are essentially founded on textual analysis.

It is correct to call such analyses tools. They cannot give you answers that you can rely on. At best, they may give you hints of other places to look. However, one problem such analysis can run into is that when the signal (the patterns and templates the analysis was looking for) disappears, the tool becomes rather worthless (in which case, you may want to consider how useful, or how correct for the job, the tool really was).

In the case described in the article, it seemed like an appropriate use of textual analysis.

nemo44x 10y ago

The USGS learns about earthquakes in regions without censors via Twitter. [1]

I'm of the opinion we have only scratched the surface of what is possible to predict by analyzing realtime data from social networks, user groups and message board communities.

[1] https://blog.twitter.com/2015/usgs-twitter-data-earthquake-d...

J_Darnley 10y ago

You mean sensors not censors. Sensors sense, censors censor.

learning_still 10y ago

"When I asked her how detectives differentiate Traffic Jam's data between trafficked victims and sex workers, she said that they rely on their intuition and knowledge of the community they protect."

It sounds like she's busting low-class pimps and hoping that a few of them are human traffickers.

dean 10y ago

This is interesting. The article doesn't talk about how she implemented the Traffic Jam program, but it does discuss how she came to 'know' sex ads as a way to keep tabs on pimps.

"'I would literally just spend hours on these websites, looking at ads, getting a sense for what was the norm,' she said. She began to pick up the nuances of every post, understand how a template was made, and get a feel for the different voices behind these ads."

I don't know how this information fits with her implementation, but I was reminded of an old article by Paul Graham, "A Plan for Spam" (http://www.paulgraham.com/spam.html), where he talks about automating the process of detecting spam using Bayesian filtering.

"I think it's possible to stop spam, and that content-based filters are the way to do it. The Achilles heel of the spammers is their message. They can circumvent any other barrier you set up. They have so far, at least. But they have to deliver their message, whatever it is. If we can write software that recognizes their messages, there is no way they can get around that."

Substitute the spam message for the sex message, and we're talking about the same thing. It would be an interesting exercise to try Bayesian Filtering on sex ads, or any other kind of message, to see where it leads.
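Graham's essay scores each token by how often it appears in spam versus legitimate mail, then combines the most "interesting" token probabilities (those furthest from 0.5). Below is a simplified Python sketch in the spirit of that essay, not a faithful port. The class name, the 0.4 score for unseen tokens, the 0.01/0.99 clamps, the doubling of "good" counts, and the 15-token cutoff are assumptions to check against the essay itself.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercased word tokens; keeps digits, apostrophes, and dollar signs."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class GrahamStyleFilter:
    """Simplified per-token probability filter in the spirit of "A Plan for Spam"."""

    def __init__(self, unknown: float = 0.4, n_interesting: int = 15):
        self.good: Counter = Counter()   # token counts in legitimate messages
        self.bad: Counter = Counter()    # token counts in target (e.g. spam) messages
        self.n_good = 0                  # number of legitimate messages seen
        self.n_bad = 0                   # number of target messages seen
        self.unknown = unknown           # assumed probability for never-seen tokens
        self.n_interesting = n_interesting

    def train(self, text: str, is_target: bool) -> None:
        if is_target:
            self.bad.update(tokens(text))
            self.n_bad += 1
        else:
            self.good.update(tokens(text))
            self.n_good += 1

    def token_prob(self, tok: str) -> float:
        """Estimated probability that a message containing tok is a target message."""
        g = 2 * self.good[tok]   # doubled to bias against false positives
        b = self.bad[tok]
        if g + b == 0:
            return self.unknown
        pg = min(1.0, g / max(self.n_good, 1))
        pb = min(1.0, b / max(self.n_bad, 1))
        return max(0.01, min(0.99, pb / (pg + pb)))

    def score(self, text: str) -> float:
        """Combine the most extreme token probabilities: prod(p) / (prod(p) + prod(1 - p))."""
        probs = [self.token_prob(t) for t in set(tokens(text))]
        probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
        top = probs[: self.n_interesting]
        # Work in log space so long products do not underflow.
        log_p = sum(math.log(p) for p in top)
        log_q = sum(math.log(1 - p) for p in top)
        return 1 / (1 + math.exp(log_q - log_p))
```

After training on labeled examples with `train`, messages whose `score` exceeds a chosen cutoff (for example 0.9) would be flagged. The "target" class could be any kind of message, which is the substitution suggested above.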

jcromartie 10y ago

I could imagine a naive Bayes classifier would do the trick when it comes to figuring out which ads were written by the same person.
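Here is a minimal sketch of the naive Bayes idea for authorship. It assumes character trigrams as features and add-one (Laplace) smoothing, both common stylometry choices rather than anything the article describes. `AuthorNB`, `char_ngrams`, and the toy listings are hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Character n-grams of the lowercased, whitespace-normalized text."""
    t = " ".join(text.lower().split())
    return [t[i:i + n] for i in range(len(t) - n + 1)]


class AuthorNB:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n: int = 3):
        self.n = n
        self.counts: dict[str, Counter] = defaultdict(Counter)  # author -> n-gram counts
        self.docs: Counter = Counter()                          # author -> training documents
        self.vocab: set[str] = set()

    def fit(self, texts: list[str], authors: list[str]) -> "AuthorNB":
        for text, author in zip(texts, authors):
            grams = char_ngrams(text, self.n)
            self.counts[author].update(grams)
            self.docs[author] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text: str) -> str:
        """Return the author with the highest log-posterior for text (assumes fit was called)."""
        grams = char_ngrams(text, self.n)
        total_docs = sum(self.docs.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for author, c in self.counts.items():
            total = sum(c.values())
            score = math.log(self.docs[author] / total_docs)  # log prior
            for g in grams:
                score += math.log((c[g] + 1) / (total + v))  # smoothed log likelihood
            if score > best_score:
                best, best_score = author, score
        return best


texts = [
    "Mint condition!!! Works perfect!!! Cash only!!!",
    "Mint condition!!! Runs perfect!!! No trades!!!",
    "Gently used item in good working order. Please email with questions.",
    "Lightly used item in good working order. Please email to arrange pickup.",
]
model = AuthorNB().fit(texts, ["A", "A", "B", "B"])
print(model.predict("Like new!!! Works perfect!!! Cash only!!!"))
```

Character n-grams pick up punctuation habits such as "!!!" as well as recurring phrases, which is roughly the "voice" the article mentions. A toy model like this would still need far more text per author before its guesses meant much.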

llamataboot 10y ago

But what does the software /do/ ?

asdf_asdf_asdf 10y ago

If they told you, maybe you'd try to defeat it or create your own version in furtherance of patronage. Instead just be aware:

There's software. Out there. Doing something. (and it's always watching)

ChuckMcM 10y ago

Fascinating article, trying to automate what the CIA would call an Analyst. Back when I was building my old computer collection, I would read hundreds of eBay listings to find the "good stuff" and started to recognize sellers who listed under a variety of user names, or buyers who were also sellers, just by the way they talked about the hardware: whether they called it by its "common" name or its product catalog name, etc. Never thought about making a research project out of it, though.

Buetol 10y ago

The product page: http://www.marinusanalytics.com/trafficjam/

guelo 10y ago

I'm really uneasy with police analyzing our social media data, it's heading into thought-crime territory. But there's tons of money to be made off of it.

kirkbackus 10y ago

Why are you saying this? There is literally no mention of social media in this article. The data (at least from the original project) is mined from publicly available data, which could be social media data, but that data has to be made public in the first place.

It is possible that the "Research Grade" version of the program does use that data, but there is no evidence of that here.

macrael 10y ago

Some happy medium between this title and the original title could be found.

dang 10y ago

We changed the article's baity title to its subtitle (shortened to fit 80 chars) in accordance with the HN guidelines. If you or anyone would like to suggest a better title, we can change it again.

https://news.ycombinator.com/newsguidelines.html

eegilbert (OP) 10y ago

Much better.

searine 10y ago

Interesting software, horribly written article.

dalacv 10y ago

Word

dalacv 10y ago

Why the downvotes? I meant 'Word' as in "I agree": http://www.urbandictionary.com/define.php?term=Word

andrewclunn 10y ago

Oh, it's used to find them so you can CRACK DOWN on them. Yeah, that's totally what I expected, not an app or anything like that...

dang 10y ago

Please don't do this here.