And yet imbalanced datasets are used all over the place, e.g. to identify "criminals" in China (https://www.newscientist.com/article/2114900-concerns-as-fac...) and the US (https://www.engadget.com/2019/08/14/aclu-facial-recognition-...)
I'm off the street. I still look and dress the same, in part because I currently do freelance work from home. I don't have to meet a dress code.
While homeless and in downtown San Diego, I fairly often gave away food I had been given but couldn't eat, either because of dietary restrictions or time limits (a large amount of stuff that should be refrigerated would spoil before I could eat it). I mostly tried to offer it to other homeless people.
One woman who panhandled regularly was reluctant to accept too much food from me, explaining, "I'm not homeless." She panhandled because she was a retiree in high-priced downtown San Diego living on a fixed income. I told her to take it home, stick it in the fridge and eat some tomorrow. I assured her it was fine; I didn't have a fridge.
Another woman got mad at me for offering and told me to feed it to my dog. She was sitting on a curb in a neighborhood near a lot of homeless services where sitting on the curb outside was often a sign of homelessness.
She was also black and I'm white. She likely lived in the apartment building she was in front of and probably thought I was being a racist bitch. She was insulted at my sincere offer of charity and attempt to give away most of the fresh fruit I had been given so it wouldn't go to waste.
There are a lot of stereotypes about what homeless people look like. The reality is that there are a lot of homeless people with jobs and/or attending college and/or living in their car who successfully manage to pass for "normal" much of the time.
I have no idea what criteria Google used to target homeless people, but I'm skeptical that the dataset:
A. Is representative of homeless people generally.
B. Was chosen based on people looking homeless, rather than people behaving homeless.
C. Actually reflects a 100% match between people believed to be homeless and people who were in fact homeless.
The examples you give are blatant misuses of data sets. How you source the data has little bearing on the dumb ideas people come up with for how to use it.