"... This was one dataset we used to get started but, we intend to incorporate data from Angel List and other sources as well. ..."
The sooner the better. TechCrunch data is iffy (see http://www.flickr.com/photos/bootload/2913315731/), though it's useful as a starting point. Did you run a select on the companies to check for multiple listings?
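
For what it's worth, here's a rough sketch of the kind of check I mean. I don't know your schema, so the `companies` table, the `name` column, and the `find_multiple_listings` helper are all made up for illustration, and the sample rows are invented, not real TechCrunch data. It just groups on a normalized name and keeps anything that shows up more than once:

```python
import sqlite3


def find_multiple_listings(conn):
    """Return (normalized_name, count) rows for names listed more than once.

    Assumes a hypothetical `companies` table with a text `name` column.
    """
    query = """
        SELECT LOWER(TRIM(name)) AS norm_name, COUNT(*) AS listings
        FROM companies
        GROUP BY LOWER(TRIM(name))
        HAVING COUNT(*) > 1
        ORDER BY listings DESC, norm_name
    """
    return conn.execute(query).fetchall()


# Invented sample rows, purely to exercise the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO companies (name) VALUES (?)",
    [("Acme",), ("acme ",), ("Widgets Inc",), ("Foo",), ("FOO",), ("foo",)],
)

duplicates = find_multiple_listings(conn)
for norm_name, listings in duplicates:
    print(norm_name, listings)
```

Lowercasing and trimming only catches the trivial cases; variants like "Acme" vs "Acme Inc." would need fuzzier matching.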