It's your lucky day. There are two data sets you can purchase for a decent price:
- 30M news headlines and 500K web sources, 30 GB of JSON data ($300)
- The 15K most popular news domains in the US market ($100)
These were gathered by Andrew Montalenti, co-founder of Parse.ly. See more info here: http://pixelmonkey.org/pub/python-crawling-slides/