Yeah I see. If you're essentially doing a multi-round elimination test based on your custom keywords, you might be able to reduce your noise a bit by preprocessing the 'phrase' stream with something like tf-idf (it looks like you're working with the raw data right now?). Then you get a list of the keywords in the document, an estimate of how important they are to that document, and you already know how important certain keywords are to you. With that info you can try to classify such that you admit jobs with a high intersection of keywords that are important to both of you.
I think something along the lines of a kNN classifier would be pretty efficient at doing that. Anyhow, just a suggestion, I'll leave you alone now ;-)
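For the kNN part, something like this could work, assuming you've hand-labelled a few postings as relevant (1) or not (0) — labels and postings here are purely illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical postings with your own relevance judgements
jobs = [
    "senior python developer django postgres",
    "java enterprise architect spring",
    "python data engineer spark airflow",
    "frontend react typescript developer",
]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(jobs)

# Cosine distance on tf-idf vectors is the usual choice for text
knn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
knn.fit(X, labels)

# Classify an unseen posting using your labelled examples
new_job = vec.transform(["python backend developer wanted"])
print(knn.predict(new_job))
```

With more labelled examples you'd bump `n_neighbors` up a bit so one odd posting doesn't dominate.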