I'm not sure what you mean by this. Many quality studies are A/B tests. "A/B" just refers to the two IV states you're testing, against which you then observe a DV: sales, engagement, errors, etc.
A/B tests can be double-blinded (don't tell the error-monitoring people which results are from a trial), and can have a high number of samples, far beyond even most pharmaceutical trials.
They can also be really crappy, changing too many variables at once, etc. But they are certainly "real science".
EDIT: an example, Drug vs placebo - is an A/B test.
For example, take changing the font size of a button:
Your null hypothesis is that there is no difference in the number of clicks. Your alternative hypothesis is that there is an increase in the number of clicks.
Your IV is the button font size. Your DV is the number of button clicks over a set period of time.
You randomly assign 50% of the population to State A (unchanged font size) and put the other group into State B (increased font size).
You observe the number of clicks of the button.
You analyze this data and determine whether the observed difference is statistically significant, i.e., whether you can reject the null hypothesis in favor of the alternative.
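A minimal sketch of that last step in Python, with invented click counts (a chi-squared test here; a one-tailed two-proportion z-test would match the one-sided alternative more directly):

```python
# Hypothetical data: (clicks, non-clicks) per group over the test period.
from scipy.stats import chi2_contingency

group_a = (480, 9520)  # State A: unchanged font size
group_b = (560, 9440)  # State B: increased font size

chi2, p_value, dof, expected = chi2_contingency([group_a, group_b])
print(f"p = {p_value:.4f}")  # reject the null if p < your chosen alpha, e.g. 0.05
```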
Science is more “what’s true if humans didn’t exist.”
Marketing is more “what widget generates more revenue?”
Did it? The nature of Tesla and his other current businesses buffers that a bit even if it has been his approach, and it seems to have gotten him thrown out as CEO of X.com twice. Among other things, Twitter looks like Musk trying to relitigate his failure at X.com without other investors being in a position to kick him out, but he seems to be piling up existential threats without resolving them.
Where A/B studies may go wrong, in my view, is on a few other counts:
- A/B studies have difficulty in determining differences based on multiple interacting characteristics. In fairness, so does empirical science, and the principle of "holding all else constant" is a frequent assumption of scientific processes.
- A/B studies face an inherent self-selection / exclusion bias: the participants in this round of A/B testing are those who've not been driven off the project/product by past experiments and design changes. Given that many Web 2.0 companies eventually dance with pushing people right up to the border of tolerance, it's quite possible that A/B testing has a long-term effect of pushing those participants whose tolerance has been exceeded out of the study population entirely (a toy simulation at the end of this comment sketches this effect). I don't know how large a factor this is, though loud / rage quitters are certainly a prominent (if not necessarily large) cohort. Whether or not they're also influential, or perhaps more importantly when they become influential, is another question. Again, this is a fairly common problem with any social experiment, including natural social experiments; see various forms of brain drain and social flight.
- A/B testing tends to focus on short term changes and behaviours, which may mask longer-term outcomes. This has some overlap with the above, but also with subjects' general response to change. See the classic case of this in the Hawthorne Effect (<https://en.wikipedia.org/wiki/Hawthorne_effect>).
The upshot is that A/B testing can be valid and useful, but that experimental design, particularly in the case of social and psychological experiments, where subject feed back into the study and its methodology itself, is exceptionally thorny.
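To make the self-selection point above concrete, here's a toy simulation (all distributions and parameters invented): users carry a random "annoyance budget", each round of optimization spends a bit of it, and anyone pushed over budget leaves and never shows up in later test populations.

```python
# Toy model of survivorship bias in repeated A/B testing.
# All parameters here are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
tolerance = rng.normal(1.0, 0.3, size=100_000)  # per-user annoyance budget
annoyance = np.zeros_like(tolerance)

for round_no in range(1, 11):  # ten successive "optimizations"
    annoyance += rng.uniform(0.0, 0.15, size=annoyance.size)
    stayed = annoyance < tolerance  # budget exceeded -> user quits for good
    tolerance, annoyance = tolerance[stayed], annoyance[stayed]
    print(f"round {round_no}: {tolerance.size} users left, "
          f"mean tolerance {tolerance.mean():.2f}")
```

The surviving pool shrinks every round and its mean tolerance drifts upward, so each successive experiment measures an increasingly unrepresentative population.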
I'd compare this to how evaporation rate increases with temperature, as more particles find themselves with enough energy to escape the liquid.
From my personal experience, even if I can tolerate a lot of UX abuse, each such "optimization" lowers my threshold of switching to a competitor. Software in general, and SaaS specifically, resists commoditization, but every now and then an actual alternative to a product/service I'm using shows up - and whether or not I switch (and when) is correlated with how much I resent the incumbent for their UX "improvements".
I'd add one bullet point to your list:
- Unlike regular scientific experimentation, A/B testing is a methodology spread primarily in business circles through the usual hype channels. That is, the average practitioner is not qualified to execute it correctly, which is one of the reasons I see A/B testing more as a tool to launder arbitrary decisions. Because the consequences of doing it wrong are typically not immediately apparent or obvious, both companies and customers suffer (and a vast space for fraudsters is created).
I'm in a charitable mood, so I'm not passing judgement on people for not having a PhD-level understanding of statistics - just pointing out that, to a much larger degree than in the sciences (even soft ones, which suffer some of the same structural problems), there's little pressure to do such tests correctly (and there are lots of ways to make money or status by doing them without regard for correctness).
From what I hear, a common way of executing an A/B test badly and getting bullshit results is terminating the test early when it shows the relevant metrics improving for the test group, vs. running it longer if no big improvements are observed (or the metrics start getting worse for the test group). This biases the experiment towards false positives. The problem was big enough that there was a debacle around Optimizely a few years ago, whose UI was accused of encouraging this early termination of tests. The cynical take I'm still somewhat partial to is that it wasn't an accident (if not done on purpose, then possibly... a result of an A/B test!): false positives make the (statistically naive) users feel they're getting more value from Optimizely than they actually are.
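To see why this peeking inflates false positives, here's a toy simulation under the null hypothesis (both variants identical, all numbers made up) where the experimenter stops the moment p dips below 0.05:

```python
# Simulate "stop the test as soon as it looks significant" under H0.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
trials, batches, batch_size, p_click = 1_000, 20, 200, 0.05
false_positives = 0

for _ in range(trials):
    a_clicks = b_clicks = n = 0
    for _ in range(batches):  # peek at the results after every batch
        a_clicks += rng.binomial(batch_size, p_click)
        b_clicks += rng.binomial(batch_size, p_click)
        n += batch_size
        table = [[a_clicks, n - a_clicks], [b_clicks, n - b_clicks]]
        if chi2_contingency(table)[1] < 0.05:  # "winner!" -- ship it
            false_positives += 1
            break

# With no real effect, a single fixed-horizon test would be wrong ~5% of
# the time; repeated peeking pushes the rate well above that.
print(false_positives / trials)
```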
There's a reason that the technical term for "A/B testing in SaaS products" is gaslighting.
O hai werd uv yeer!
Sociologists will rarely have access to such a large participant pool under near-ideal experimental conditions, with such good ways to observe behavior. And the stuff you have to keep in mind when running experiments is not terribly complex: a bit of statistics, a few things you absolutely have to get right, that's it.
Obviously there are reasons why A/B tests are often not run rigorously: statistical illiteracy, pressure to get things done quickly, and pressure to produce tangible results as often as possible. All three might lead you to run underpowered experiments with too few participants and to stop testing early, which produces too many false positives. However, abandoning experiments altogether (and instead just releasing new stuff and observing the reaction) isn't an improvement that leads to better outcomes.
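For a sense of what "underpowered" means in practice, here's the standard two-proportion sample-size formula with made-up rates; honestly detecting a small lift takes a surprising number of participants:

```python
# Sample size per group for a two-sided two-proportion z-test.
# The baseline and target rates below are invented for illustration.
from scipy.stats import norm

p1, p2 = 0.048, 0.054       # baseline and hoped-for click-through rates
alpha, power = 0.05, 0.80   # 5% false-positive rate, 80% power

z_a = norm.ppf(1 - alpha / 2)
z_b = norm.ppf(power)
n = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
print(f"~{n:,.0f} users per group")  # roughly 21,000 here
```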
A case could be made that A/B testing is insufficiently rigorous given specific goals, resources, limitations, context, etc. But that case isn't being made here.
Thanks though. I've narrowed my original comment to more accurately represent the scope I am referring to.
Science is just observation and experimentation.
Science doesn't dictate how you do the above. Now, if someone were to find it impossible to reproduce your findings, that would just suggest bad science.
For example, if your metric is "time spent interacting on the platform" and the rollout of the feature being tested ends up lengthening page load times, users will spend more time on the site simply because they're waiting for pages to load. The metric goes up, and management decides the feature is a good idea.
That's not enough. If you don't include some sense of both 'systematic' and 'rigorous' (and yes, these terms are slippery), you aren't doing science.
I mean, I'm sure the parameters to the math are proprietary. But the basic math seems simple enough.
Trying to tease out the pieces that aren't coupled to Twitter's User class is probably more effort than it's worth