I had a similar experience when I finally found a copy of Barbour’s _The End of Time_ and discovered, much to my chagrin, that it wasn’t nearly as mystical or complicated as EY makes it seem in the Timeless Physics “sequence”. Barbour’s account was much more readable and much easier to understand.
Yudkowsky just isn’t that great of a popular science writer. It’s not his specialty, so this shouldn’t be surprising.
Jaynes' book is a game changer, but I particularly love that you mentioned Barbour and his work.
On Barbour's work: apart from writing an incredibly interesting book, I was amazed that he was a sort of "outsider", writing papers and books on his own (or at least outside of academia) while making money through technical translations. That's a really clever way to stay free to explore any interesting avenue one might find. Einstein had the right idea too...
(Sadly, it's also something that wouldn't be as feasible nowadays, but who knows...)
Here's a list of errata and commentary, collected by a fan: https://ksvanhorn.com/bayes/jaynes/index.html.
I had spotted some errors here and there, but it's always good to have them in one place.
I think we're all in the same boat here: even with those rough edges, Jaynes' book is a kind of transformative experience for anyone who has already been "conditioned" by other probability texts.
For example, for me Feller is a great intro to "start working with Probability," but Jaynes is where one starts actually "thinking in Probability."
The whole Maximum Entropy thing was mind blowing for me.
And if you want to read what he has to say on the optional stopping problem, scroll down to page 196 (page 166 by the book's own numbering), to the heading "6.9.1 Digression on optional stopping".
I don't personally think Jaynes is much easier to read than Yudkowsky, but he's definitely more rigorous.
How you infer the shape of that distribution based on the experiment is a function of the distribution of all courses your experiment could have taken. This set of paths is different in each case, which means the inference we make must also be different.
There is no inconsistency. The confusion seems to be in assuming that the experimental result was a true statement about the nature of the world rather than a true statement about simply what happened.
edit: This seems to me to be a specific case of a general class of difficult thinking where you ask yourself: "what are all the worlds that I might be in that are consistent with what I'm presently observing".
Do you not notice that your inference is less accurate using this line of reasoning? Does that not suggest that it's simply wrong?
We don't actually care at all about what happened in the two experiments per se, we care about the information provided by the experiments about future or other events.
If somehow we learned that both experiments were totally unreplicable and a product purely of that time and location with no implications for anything else ever before or since we wouldn't care about them except maybe as a historical curiosity.
Intention is a red herring; what matters is our expectation about what might be observed if we were to repeat the experiments again.
In that sense, there's variability in the second experiment's results due to sample size being random. So we interpret and infer based on that potential experiment we could do, not what happened to be observed at a particular moment.
I'm also confused about what this has to do with Bayesian versus non-Bayesian inference as you could approach either experiment from either paradigm, and there are different forms of Bayesianism, including nonsubjective Bayesianism.
How can the experiments provide relevant information other than through what happened?
If what happened is exactly the same (first patient with such and such characteristics had this outcome, etc.) what information can be provided by the things that didn’t happen in either?
How could it matter that the things that didn’t happen in one experiment are different from the things that didn’t happen in the other when we are interested in the information provided by what did happen?
We don't actually care at all about the distribution of things that could have happened per se, we care about the information provided by the experiments about future or other events.
But what distribution? What is this "distribution" that we are taking a sample from?
The frequentist says: because the two experimenters have different intentions, the experiments they ran are samples from different distributions.
But the Bayesian says: the experimenter's intentions can't affect things like how dice rolls come out or how well a given treatment works on a given patient. The actual "distribution" is the set of all factors that do affect how the dice rolls come out or how well the treatment works on each patient. And those factors are the same for both experimenters; their different intentions don't affect that. So both sets of data are samples from the same distribution, not different ones.
> How you infer the shape of that distribution based on the experiment is a function of the distribution of all courses your experiment could have taken.
If you're going to state it this way, then the Bayesian response is: "all courses your experiment could have taken" has nothing to do with the experimenter's intentions. The experimenters can't magically make the physical world and the biology of humans work differently depending on what stopping criterion they choose. And the physical world and the biology of humans is what determines "the courses your experiment could have taken".
In other words, when the frequentist makes up "distributions" based on the experimenter's stopping criterion, they are, whether they admit it (or even realize it) or not, making a claim about how the physical world and the biology of humans works that is obviously false.
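To make this concrete, here is a minimal sketch (mine, with made-up numbers): the fixed-n experimenter has a binomial likelihood and the fixed-r experimenter a negative binomial one, but the two differ only by a constant factor in theta, so any Bayesian posterior comes out the same.

```python
# Minimal sketch (not from the thread; numbers are hypothetical).
# Fixed-n likelihood:  C(n, r)     * theta^r * (1 - theta)^(n - r)
# Fixed-r likelihood:  C(n-1, r-1) * theta^r * (1 - theta)^(n - r)
# The theta-dependent factor is identical, so with any given prior the
# posterior is the same under both stopping rules.
from scipy.stats import beta

r, n = 7, 12                        # say, 7 successes in 12 trials
posterior = beta(1 + r, 1 + n - r)  # Beta(1, 1) prior; same for both designs
print(posterior.mean())             # identical for both experimenters
print(posterior.interval(0.95))     # identical 95% credible interval
```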
Yeah, and then you stack some beliefs on top of that.
And then you discover the evidence wasn’t actually true. Remind me again what the normative Bayesian update looks like in that instance.
Unfortunately it’s turtles all the way down.
P(B|I saw E, P) = P(I saw E|B,P) * P(B|P) / P(I saw E|P)
P(B|E was false, I saw E, P) = P(E was false|B,I saw E,P) * P(B|P,I saw E) / P(E was false|P, I saw E)
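Plugging made-up numbers into those two formulas (everything below is hypothetical, just to exercise the algebra):

```python
# Hypothetical numbers only: B = hypothesis, E = the reported evidence.
p_B = 0.5                # prior P(B|P)
p_sawE_B = 0.9           # P(I saw E | B, P)
p_sawE_notB = 0.3        # P(I saw E | not B, P)

# First update: condition on "I saw E".
p_sawE = p_sawE_B * p_B + p_sawE_notB * (1 - p_B)
p_B_sawE = p_sawE_B * p_B / p_sawE

# Second update: we then learn "E was false". Assume (hypothetically) that
# false sightings are likelier when B is false than when B is true.
p_false_B = 0.1          # P(E was false | B, I saw E, P)
p_false_notB = 0.6       # P(E was false | not B, I saw E, P)
p_false = p_false_B * p_B_sawE + p_false_notB * (1 - p_B_sawE)
p_B_final = p_false_B * p_B_sawE / p_false

print(p_B_sawE)   # 0.75: belief rises on the sighting
print(p_B_final)  # ~0.33: and falls below the prior on the retraction
```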
This is a pretty basic application of Bayes' theorem. But just move the argument one level down: "I saw E" turns out to be false, and so does "E was false". So then what? Condition on ""E was false" was false"?
Turtles all the way down.
At some point something has to be “true” in order to conditionalise on it.
Alternatively, brains ARE Bayesian networks with hard coded priors that cannot be changed without CRISPR.
Not really going to vouch for the normative Bayesian approach, but you might just consider this new (strong) evidence for applying an update.
That is, you say, for the update, "the probability that this trial came out with X successes given everything else that I take for granted, and also that the hypothesis is true" vs. "the probability that this trial came out with X successes given everything else that I take for granted, and also that the hypothesis is false." So you actually say in both cases the fragment, "this trial came out with X successes."
What happens if it didn't really? Well, the proper Bayesian approach is to state that you phrased this fragment wrong. You actually needed to qualify "the probability that I saw this trial come out with X successes given ...", and those probabilities might have been different than the trial actually coming out with X successes.
OK but what happens if that didn't really, either. Well, the proper Bayesian approach is to state that you phrased the fragment doubly wrong. You actually needed to qualify it as "the probability that I thought I saw this trial come out with X successes given...". So now you are properly guarded, like a good Bayesian, against the possibility that maybe you sneezed while you were reading the experiment results and even though you saw 51, it got scrambled in your head and you thought you saw 15.
OK but what happens if that didn't really, either either. You thought that you thought that you saw something, but actually you didn't think you saw anything, because you were in The Matrix or had dementia or any number of other things that mess with our perceptions of ourselves. So you, good Bayesian that you wish to be, needed to qualify this thing extra!
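For what it's worth, the "I saw X" correction is not just philosophy; it changes the arithmetic. Here is a toy sketch, with entirely made-up numbers, folding a misreading rate into the likelihood (the 51/15 transposition from above):

```python
# Hypothetical: the trial's true count is either 51 (more likely under H)
# or 15 (more likely under not-H), and with probability eps the reader
# transposes the digits. "Seeing 51" is then weaker evidence than the
# trial actually producing 51. All numbers are illustrative.
eps = 0.05                                  # assumed misreading rate

p_51_H, p_51_notH = 0.8, 0.1                # likelihood of the raw outcome
p_15_H, p_15_notH = 0.2, 0.9

# P(saw 51 | .) = P(outcome 51)*(1 - eps) + P(outcome 15)*eps
p_saw51_H = p_51_H * (1 - eps) + p_15_H * eps
p_saw51_notH = p_51_notH * (1 - eps) + p_15_notH * eps

print(p_51_H / p_51_notH)        # 8.0: likelihood ratio with perfect reading
print(p_saw51_H / p_saw51_notH)  # ~5.5: attenuated once misreads are allowed
```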
The idea is that Bayesianism is one of those "if all you have is a hammer you see everything as a nail" type of things. It's not that you can't see a screw as a really inefficient nail, that is totally one valid perspective on screwness. It's also not that the hammer doesn't have any valid uses. It does, it's very useful, but when you start trying to chase all of human rationality with it, you start to run into some really weird issues.
For instance, the proper Bayesian view of intuitions is that they are a form of evidence (because what else would they be?). They are extremely reliable when they point to lawlike metaphysical statements (otherwise we have trouble with "1 + 1 = 2" and "reality is not self-contradictory" and other metaphysical laws we take for granted), but correspondingly unreliable when we intuit things other than metaphysical laws, such as the existence of a monster in the closet, a murderer hiding under the bed, or that the only explanation for our missing (actually misplaced) laptop is that someone must have stolen it in the middle of the night. You need to do this to build up the "ground truth" that allows you to get to the vanilla epistemology stuff that you then take for granted, like "okay, we can run experiments to try to figure out stuff about the world, and those experiments say that the monster in the closet isn't actually there."
> Then the possibility seems open that, for different priors, different functions r(x1,..., xn) of the data may take on the role of sufficient statistics. This means that use of a particular prior may make certain particular aspects of the data irrelevant. Then a different prior may make different aspects of the data irrelevant. One who is not prepared for this may think that a contradiction or paradox has been found.
I think this explains one of the confusions many commenters have: for an experimenter who repeats observations until they reach their desired ratio r/(n-r), the ratio r/(n-r) is not a sufficient statistic! But when the experimenter has pre-registered n, the ratio r/(n-r) is a sufficient statistic. However, in either case:
> We did not include n in the conditioning statements in p(D|θ I) because, in the problem as defined, it is from the data D that we learn both n and r. But nothing prevents us from considering a different problem in which we decide in advance how many trials we shall make; then it is proper to add n to the prior information and write the sampling probability as p(D|nθ I). Or, we might decide in advance to continue the Bernoulli trials until we have achieved a certain number r of successes, or a certain log-odds u = log[r/(n − r)]; then it would be proper to write the sampling probability as p(D|rθ I) or p(D|uθ I), and so on. Does this matter for our conclusions about θ?
> In deductive logic (Boolean algebra) it is a triviality that AA = A; if you say: ‘A is true’ twice, this is logically no different from saying it once. This property is retained in probability theory as logic, since it was one of our basic desiderata that, in the context of a given problem, propositions with the same truth value are always assigned the same probability. In practice this means that there is no need to ensure that the different pieces of information given to the robot are independent; our formalism has automatically the property that redundant information is not counted twice.
Bayes' theorem holds because it can be proven. Precisely because it holds, situations can be constructed where considering identical data without considering priors gives nonsense conclusions. For example, if we happen to know a priori that P(outcome of the experiment is a certain ratio) = P(experiment is completed), then that must be considered when interpreting the results.
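A hypothetical simulation of that kind of situation: if the experiment only "completes" (and gets reported) when a target ratio is reached, then completion itself carries information, and a naive estimate that ignores it is biased.

```python
import random

# Hypothetical design: stop as soon as successes - failures >= 3;
# give up (and never report) after 100 trials. True theta is 0.5.
def run(theta, rng):
    s = f = 0
    for _ in range(100):
        if rng.random() < theta:
            s += 1
        else:
            f += 1
        if s - f >= 3:
            return s, s + f        # "completed" and reported
    return None                    # abandoned, never reported

rng = random.Random(0)
reported = [r for r in (run(0.5, rng) for _ in range(20_000)) if r]
naive = sum(s / n for s, n in reported) / len(reported)
print(naive)   # well above the true 0.5: completion selected for lucky runs
```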
But laws are tools, and their aesthetic and intellectual elegance is an epiphenomenal bonus, or a means of keeping the human psyche motivated and focused despite all the other attention sinks that life throws at it.
And that applies to "law" in both the judicial and the scientific senses.
There is nothing unusual about different mathematical methods/models producing different results. E.g., the number of roots of the very same quadratic equation may depend on "private" thoughts, such as whether complex roots are of interest (sometimes they are, sometimes they aren't). All models are wrong; some are useful.
You are confusing ambiguity in a problem statement due to imprecise human language with two identical, well-specified experimental results yielding different conclusions due to the intentions of the human carrying them out.
Is arithmetic a religion because there's "one true way" of adding integers?
The Map is not the Territory.
Different maps can be useful. No true map.
Only about the things that can be mathematically proven. Which is just like any other branch of math.
It is true that some Bayesians (and EY can be argued to be among them) like to talk as though Bayesian computation is a drop-in replacement for your brain. Of course it isn't, and Bayesianism, like any mathematical approach, should be taken with a good-sized dose of humility. As Bertrand Russell said, to the extent that mathematical propositions refer to reality, they are not certain, and to the extent that they are certain, they do not refer to reality.
No. The number of roots that you care about might depend on your private thoughts; but the number of roots itself does not. It's a mathematical fact. It just might not be a mathematical fact that you actually care about. But what you care about is not part of math.
There are no laws for applying probability to the real world. To think so puts too much faith in your models. Remember, all models are wrong. Applying probability to the real world requires a host of assumptions, regardless of the methods you use.
Frequentist and Bayesian methods have different goals; both have their place.
For a counterweight to the strong likelihood principle, see Larry Wasserman's discussions: https://youtu.be/Z-YvWyM6dRQ?si=qwzRiaPbj9ruiUEv
And for a balanced discussion of why both are great, see Michael Jordan: https://youtu.be/HUAE26lNDuE?si=cwg6wpRS1gXL6r1Y
But so then the data _are_ different between the two experiments, because they were observing different random variables -- so why is it concerning if they arrive at different conclusions? In fact, the _fact that the 2nd experiment finished_ is also an observation on its own (e.g. if the treatment was in fact a dangerous poison, perhaps it would have been infeasible for the 2nd researcher to reach their stopping criteria).
It's illogical to deride one of those two result-sets as telling us less about the objective universe just because the researcher had a different private intent (e.g. "p-hacking") for stopping at n=100.
_________________
> According to old-fashioned statistical procedure [...] It’s quite possible that the first experiment will be “statistically significant,” the second not. [...]
> But the likelihood of a given state of Nature producing the data we have seen, has nothing to do with the researcher’s private intentions. So whatever our hypotheses about Nature, the likelihood ratio is the same, and the evidential impact is the same, and the posterior belief should be the same, between the two experiments. At least one of the two Old Style methods must discard relevant information—or simply do the wrong calculation—for the two methods to arrive at different answers.
However, if you know that the first researcher just happened to get a positive result on their first try (and therefore didn't actually have to modify parameters), Bayesian math says that their intentions didn't matter, only their result. If, however, they did 100 experiments and chose the best one, then their intentions... still don't matter! but their behavior does matter, and so we can discount their paper.
Now, if you _only_ know their intentions but not their final behavior (because they didn't say how many experiments they did before publishing), then their intentions matter because we can predict their behavior based on their intentions. But once you know their behavior (how many experiments they attempted), you no longer care about their intentions; the data speaks for itself.
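A quick back-of-the-envelope sketch of why the "best of 100" behavior discounts the result (assuming, hypothetically, a 5% false-positive rate per experiment under the null):

```python
# If each experiment has a 5% chance of a spurious positive, a researcher
# who runs 100 and publishes the best one almost surely has a "positive".
p_false_positive = 0.05
n_experiments = 100
p_at_least_one = 1 - (1 - p_false_positive) ** n_experiments
print(p_at_least_one)   # ~0.994: a best-of-100 positive is weak evidence
```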
I think the author means that two methods which happen to be equivalent in the data they collect may draw different conclusions based on their initial assumptions. The question is how to make coherent sense of that.
At level 1 depth it’s insightful.
At level 2 depth it’s a straw man.
At level 3 depth, just keep drinking until you’re back at level 1 depth.
I believe that "definitely greater than 60%" is supposed to imply that the researcher is stopping when the p-value of their HA (theta>=60%) is below alpha, so an optional stopping (ie. "p-hacking") situation.
There are so many cracks in the Bayesian edifice promoted in TFA!
These problems are well-known in the Theories of Probability community [1] (which is only a subset of the larger set of theorists recognizing the limits of mechanical Bayesian reasoning in decision problems).
Here are a couple.
(1)
Bayesian approaches force you to assign a sharp probability to every event. How do we map any event to a sharp probability? E.g., I need to give a number for the probability of rain tomorrow, a non-repeating event. How do I map that to a number? Not through relative frequencies- it’s non-repeating. If two people give different numbers, how do we decide who is right?
This problem is what Peter Walley has called the “Bayesian dogma of precision.” [2]
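One standard move in that literature is to replace the single sharp prior with a _set_ of priors and report the resulting range of posteriors. A toy robust-Bayes sweep (my numbers, purely illustrative):

```python
from scipy.stats import beta

# Hypothetical data: 7 successes in 12 trials. Instead of committing to one
# Beta(a, b) prior, sweep a small set of priors and report the spread of
# posterior means: an interval, not a single sharp number.
r, n = 7, 12
means = [beta(a + r, b + n - r).mean()
         for a in (0.5, 1, 2, 5)
         for b in (0.5, 1, 2, 5)]
print(min(means), max(means))   # ~0.43 to ~0.69, depending on the prior
```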
(2)
As noted above in an aside, we have a hard time computing probabilities. This is a practical problem that we all are aware of, but often discount.
In what we could call CMP (Conventional Mathematical Probability - Kolmogorov’s axioms) we typically can’t even correctly enumerate the sample space. We’re always forgetting something, so our models are too confident. (In the “Dutch book” analogy alluded to in TFA, we are following the axioms but are somehow always losing money, in a very real sense.)
Related to this problem of computing probabilities, we don't have a rigorous way to determine when two real-world events are independent. Yet we constantly invoke independence to construct models. Kolmogorov's 1933 manuscript was clear on this problem. [3]
Not satisfied with this, we go on to hypothesize conditional independence relationships in order to feed our complex “rational” Bayesian machine. It’s thirsty for numbers, and we just make them up!
*
This all sounds somewhat hypothetical. It’s not. In my day job, I compute supposed Bayesian credible intervals for various physical variables.
The people downstream who use those variables to assimilate into physical models typically multiply our credible intervals by 2. My friend across lab has it even worse, they multiply his Bayesian intervals by 3.
This is not a well-functioning machine.
[1] E.g., https://isipta23.sipta.org/, or https://plato.stanford.edu/entries/imprecise-probabilities/#...
[2] https://issuu.com/impreciseprobabilities/docs/imprecise_prob..., first paragraph, although the whole short article is on-point
[3] from memory, the quote is something like, “determining the conditions under which events may be judged independent is one of the major outstanding problems in theory of probability“