I recommend watching this TED talk [0] on the subject. In it, you'll see that Google had, as of last year, already solved bicyclist hand signals, lane markings, unforeseen obstacles, etc. And they did it __without__ human annotators. Just straight-up convolutional neural networks.
[0]: https://www.ted.com/talks/chris_urmson_how_a_driverless_car_...