It's intuitively obvious that a result that is unlikely under the null hypothesis constitutes some evidence in favor of the alternative hypothesis. But the precise nature of that relationship depends on information that is not usually available, such as a prior estimate of how likely each hypothesis is to be true. If such information is available, you can use Bayesian statistics to answer the question you really want to ask (e.g. "What is the probability that the alternative hypothesis is true, given this data?"), instead of using p-values to answer the only question they can answer (roughly, "How probable is data this extreme, given the null hypothesis?"), even though that answer isn't a particularly useful one.
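Here's a minimal sketch of what that Bayesian calculation looks like. The specific numbers (prior probability that the alternative is true, the test's power, and the significance threshold) are made-up assumptions for illustration, not values from any real study:

```python
# Hedged sketch: given a prior on the alternative hypothesis and the test's
# power, Bayes' theorem gives the probability that a "significant" result
# is actually a true positive. All three inputs below are assumptions.

def posterior_alt(prior_alt, power, alpha):
    """P(alternative is true | result was significant), via Bayes' theorem."""
    p_significant = power * prior_alt + alpha * (1 - prior_alt)
    return power * prior_alt / p_significant

# If only 1 in 10 tested hypotheses is actually true, a p < 0.05 result
# from a test with 80% power is a true positive far less often than the
# naive reading of "95% confidence" suggests:
print(round(posterior_alt(0.10, 0.80, 0.05), 3))  # prints 0.64
```

The point of the sketch: the posterior depends heavily on the prior, which is exactly the information a p-value alone cannot give you.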
For a concrete example, xkcd comes to the rescue: https://xkcd.com/882/
Consider that, when testing the 20 flavors, you expect about one p-value below 0.05 by random chance alone, since 0.05 = 1 in 20; the probability of getting at least one false positive is 1 - 0.95^20, or about 64%. So in this specific case there's actually a very high probability (much higher than 5%, even higher than 50%) that the result is bullshit. But even when you're doing a single test, not 20 of them, a p-value of 0.05 can still mean a probability of bullshit much higher than 5%. Or it could be much lower.
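The jelly-bean arithmetic is short enough to check directly (pure arithmetic, no real data involved):

```python
# The xkcd scenario: 20 independent tests at alpha = 0.05, with every
# null hypothesis actually true.
alpha, tests = 0.05, 20

# On average, one of the 20 tests will be a false positive.
expected_false_positives = alpha * tests

# Probability that at least one test comes up "significant" by chance.
p_at_least_one = 1 - (1 - alpha) ** tests

print(round(expected_false_positives, 2), round(p_at_least_one, 2))
```

So even before the green jelly beans are tested, there's roughly a 64% chance that *some* flavor will produce a headline.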
Lastly, note that "confidence intervals" are just a restatement of p-value thresholds. For example, the 95% confidence interval contains the null-hypothesis value if and only if the p-value for testing that null is greater than 0.05. So everything I said above about p-values applies equally well to confidence intervals. In particular, a "95% confidence interval" does NOT mean "95% probability that the true value is within this interval".
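A small simulation can illustrate that duality. This is a sketch under simplifying assumptions I'm choosing for convenience (normally distributed data with known sigma, a two-sided z-test, standard library only):

```python
import math
import random

random.seed(0)
null_mu, sigma, n = 0.0, 1.0, 30
zc = 1.959963985  # two-sided critical value for alpha = 0.05

agree = 0
for _ in range(1000):
    # Simulated data whose true mean (0.3) differs from the null (0.0).
    xs = [random.gauss(0.3, sigma) for _ in range(n)]
    mean = sum(xs) / n
    se = sigma / math.sqrt(n)
    z = abs(mean - null_mu) / se
    p = 1 - math.erf(z / math.sqrt(2))  # two-sided z-test p-value
    ci_excludes_null = z > zc           # 95% CI is mean +/- zc * se
    agree += (p < 0.05) == ci_excludes_null

print(agree, "of 1000 trials: p < 0.05 exactly when the 95% CI excludes the null")
```

The two criteria agree in every trial, because they are algebraically the same condition: |z| exceeding the same critical value.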
If you want to ask me some more questions, email me at rct at thompsonclan dot org.