Yeah I see. If you're essentially doing a multi-round elimination test based on your custom keywords, you might be able to reduce your noise a bit by preprocessing the 'phrase' stream with something like tf-idf (it looks like you're working with the raw data right now?). Then you get a list of the keywords in the document, an estimate of how important they are to that document, and you already know how important certain keywords are to you. With that info you can try to classify such that you admit jobs with a high intersection of keywords that are important to both of you.
I think something along the lines of a kNN classifier would be pretty efficient at doing that. Anyhow, just a suggestion, I'll leave you alone now ;-)
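For the kNN part, something like this could work, assuming you've hand-labelled a few postings as relevant (1) or not (0) — labels and postings here are purely illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical postings with your own relevance judgements
jobs = [
    "senior python developer django postgres",
    "java enterprise architect spring",
    "python data engineer spark airflow",
    "frontend react typescript developer",
]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(jobs)

# Cosine distance on tf-idf vectors is the usual choice for text
knn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
knn.fit(X, labels)

# Classify an unseen posting using your labelled examples
new_job = vec.transform(["python backend developer wanted"])
print(knn.predict(new_job))
```

With more labelled examples you'd bump `n_neighbors` up a bit so one odd posting doesn't dominate.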