Our ground truth should reflect the "correct" output expected of the model with regard to its training. So while in many cases "truth" and "correct" should align, there are many, many cases where "truth" is subjective, and so we must settle for "correct".
Case in point: we've trained a model to parse out addresses from a wide array of forms. Here is an example address as it would appear on a form.
Address: J Smith 123 Example St
City: LA State: CA Zip: 85001
Our ground truth says it should be rendered as such:
Address Line 1: J Smith
Address Line 2: 123 Example St
City: LA
State: CA
ZipCode: 85001
However, our model outputs it like this:
Address Line 1: J Smith 123 Example St
Address Line 2:
City: LA
State: CA
ZipCode: 85001
That may be true, as there is only one address line and we have a field for "Address Line 1", but it is not correct. Sure, there may be a problem with our taxonomy, training data, or any number of other things, but as far as ground truth goes it is not correct.
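To make the distinction concrete, here is a minimal sketch of how that evaluation plays out in practice: each field is scored only by exact match against the ground truth, regardless of whether the model's reading is defensible. The field names and values come from the address example above; the `field_accuracy` helper is hypothetical, not from any particular evaluation library.

```python
# Ground truth: the answer key we score against.
ground_truth = {
    "Address Line 1": "J Smith",
    "Address Line 2": "123 Example St",
    "City": "LA",
    "State": "CA",
    "ZipCode": "85001",
}

# What the model actually produced.
model_output = {
    "Address Line 1": "J Smith 123 Example St",
    "Address Line 2": "",
    "City": "LA",
    "State": "CA",
    "ZipCode": "85001",
}

def field_accuracy(truth, pred):
    """Score each field as correct only if it exactly matches ground truth."""
    matches = {field: pred.get(field, "") == value
               for field, value in truth.items()}
    return matches, sum(matches.values()) / len(matches)

matches, accuracy = field_accuracy(ground_truth, model_output)
print(matches)   # both address-line fields score as incorrect
print(accuracy)  # 0.6
```

Notice that the scorer never asks whether cramming everything into "Address Line 1" is a reasonable reading of the form. It only asks whether the output matches what was written down, which is the whole point.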
Are you trying to tell me that the COCO labelling of the cars is what you call correct?
If, as it seems in the article, they are using COCO to establish ground truth, i.e. what COCO says is correct, then whatever COCO comes up with is, by definition, "correct". It is, in effect, the answer, the measuring stick, the scoring card. Now what you're hinting at is that, in this instance, that's a really bad way to establish ground truth. I agree. But that doesn't change what ground truth is or how we use it.
Think of it another way:
- Your job is to pass a test.
- To pass a test you must answer a question correctly.
- The answer to that question has already been written down somewhere.
To pass the test does your answer need to be true, or does it need to match what is already written down?
When we do model evaluation, the answer needs to match what is already written down.