
thewataccount 3y ago

> actual science

I'm not sure what you mean by this. Many quality studies are A/B tests. A/B just refers to the two states of the independent variable (IV) you're testing, while you observe a dependent variable (DV): sales, engagement, errors, etc.

A/B tests can be double blinded (don't tell the error monitoring people which results are from a trial), and can have a high number of samples, far beyond even most pharmaceutical trials.

They can also be really crappy, changing too many variables at once, etc. But they are certainly "real science".

EDIT: an example: drug vs. placebo is an A/B test.


zackees 3y ago

Technically it's not science because it doesn't follow the scientific method. Instead it's the close cousin, empiricism.

https://www.merriam-webster.com/dictionary/empirical

thewataccount (OP) 3y ago

A/B testing is literally just the scientific method; A and B are the two states of the IV. A drug vs. placebo experiment is an A/B test.

For example with changing the font size of a button:

Your null hypothesis is there is no difference in the number of clicks. Your alternative hypothesis is that there is an increase in number of clicks.

Your IV is the button font size. Your DV is the number of button clicks over a set period of time.

You randomly assign 50% of the population to State A (same font size). You put the other group into State B (increased font size).

You observe the number of clicks of the button.

You analyze this data, and can determine whether the difference is statistically significant, i.e. whether you can reject the null hypothesis in favor of the alternative.
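To make the analysis step concrete, here is a minimal sketch of one way to do it, not the only one. It assumes each user either clicks or doesn't (so the DV becomes a click-through rate rather than a raw click count) and uses a one-sided two-proportion z-test. The function name and the example numbers are made up for illustration.

```python
import math
from statistics import NormalDist


def one_sided_two_proportion_z_test(clicks_a, users_a, clicks_b, users_b):
    """Test H0: rate_b == rate_a against H1: rate_b > rate_a.

    Returns (z, p_value). Assumes one click/no-click outcome per user
    and samples large enough for the normal approximation.
    """
    rate_a = clicks_a / users_a
    rate_b = clicks_b / users_b
    # Under H0 both groups share one rate, so pool them for the standard error.
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_err
    # One-sided: probability of a z at least this large if H0 were true.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical numbers: State A = same font size, State B = increased font size.
z, p = one_sided_two_proportion_z_test(
    clicks_a=100, users_a=1000, clicks_b=130, users_b=1000
)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

You'd reject the null hypothesis when the p-value falls below the significance level you picked before running the test (0.05 is the usual convention).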

compiskey 3y ago

Advertisers and marketers use the equations but measure contemporary trends.

Science is more “what’s true if humans didn’t exist.”

Marketing is more “what widget generates more revenue?”

typest 3y ago

This isn’t true. Science is ultimately the scientific method — make a hypothesis, test a change, observe the results, repeat. It’s an algorithm for learning and broadly gaining information about reality. It can equally be applied to things having to do with humans and things not having to do with humans.

sdrinf 3y ago

Not gp, but there's a significant kink when this applies to humans; namely, that humans have the ability to reflect on publicly known outcomes, and change their behavior en masse in light of information so gained.

I put this earlier under the phrase "reflection completeness": https://sdrinf.com/reflection-completeness i.e. there are things which stop working when people know about them.

In particular with A/B testing, this means that the initial A/B test intermingles at least 3 effects. It measures how the naive population's behavior changes as a function of new functionality being made available, and that is heavily, heavily time-dependent: there's a "novelty effect" (early data collection will not be representative of long-term usage patterns), and there's a "reflection effect" (once the outcome of the test is widely known, people can change their behavior based on that). Controlling for the first is difficult, but possible; controlling for the second, beyond just "keeping everything secret", is significantly more so, as the timelines for that might be years in length.

I strongly suspect GP was pointing at this timeline factor, and specifically that market engineering, as currently and widely practiced, is grounded on the immediately available signal of "does it increase sales in 2 weeks of A/B test running". Given novelty effects, that is heavily biased towards "yes"; and these people aren't incentivized (nor do they have the time/energy) to measure _very_ long-term effects beyond the novelty and reflection periods.
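A toy illustration of the novelty-effect bias, as a sketch only: it assumes the measured lift is a small permanent lift plus a novelty boost that decays exponentially. The model and every parameter value below are invented for illustration, not measured from anything.

```python
import math


def daily_lift(day, long_term_lift, novelty_boost, decay_days):
    """Hypothetical lift on a given day: permanent part plus decaying novelty."""
    return long_term_lift + novelty_boost * math.exp(-day / decay_days)


def average_lift(first_day, last_day, **params):
    """Mean lift over an inclusive window of days."""
    days = range(first_day, last_day + 1)
    return sum(daily_lift(d, **params) for d in days) / len(days)


# Made-up parameters: 1% lasting lift, 5% initial novelty boost, ~1 week decay.
params = dict(long_term_lift=0.01, novelty_boost=0.05, decay_days=7.0)

two_week_test = average_lift(0, 13, **params)    # the usual 2-week A/B window
three_months_in = average_lift(90, 103, **params)  # same-length window, much later

print(f"lift seen in a 2-week test:  {two_week_test:.4f}")
print(f"lift seen three months in:   {three_months_in:.4f}")
```

Under these assumptions the 2-week window reports a noticeably larger lift than the later window, which is the bias towards "yes" described above. It says nothing about the reflection effect, which this simple model doesn't capture.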

thewataccount (OP) 3y ago

I agree that it can be a difficult thing to analyze. There's also the Hawthorne effect at play here. But those are just confounding variables; they do not negate the fact that A/B tests are still "real science".

An A/B test just refers to observing how a dependent variable changes when an independent variable is in two different states, State A and State B.

Drug vs. placebo is an A/B test.

eachro 3y ago

Most companies (or at least the ones doing things properly) will also have a long running retro test to see if impact persists (new test group = don't use the new changes).
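A minimal sketch of how such a long-running holdback group might be assigned, assuming users have stable string IDs. The function name, the experiment label, and the 5% fraction are hypothetical, not any particular company's setup.

```python
import hashlib


def in_holdback(user_id: str, experiment: str, holdback_fraction: float = 0.05) -> bool:
    """Deterministically place a fixed fraction of users in the holdback group.

    Holdback users keep the old behavior after launch, so the long-term
    impact of the change can still be compared against them later.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < holdback_fraction


# Hypothetical usage: the same user always gets the same answer.
print(in_holdback("user-12345", "bigger-button-font"))
```

Hashing the experiment name together with the user ID keeps the assignment stable over months without storing it, and gives different experiments independent holdback groups.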

ilyt 3y ago

I feel like it's especially bad for any UI changes that relate to long-term productivity: measuring how a given change affects existing users, and whether performance will go back to the previous level or drop below it after a few weeks or a month.

compiskey 3y ago

Agreed. My point is I am not going to see a marketer as aiming for the same goal as an experimental physicist.

To borrow the Lindy effect: whether someone likes the jacket in color A or B is of such short-lived value that it’s a huge waste of the resources that went into the pipeline needed to come to the conclusion.

Here’s an A/B test: rethink logistics to increase customization of outputs, or continue to create design jobs that define what’s trendy and acceptable?

thewataccount (OP) 3y ago

I think we're getting caught up on what's being tested.

In the context of what we're talking about, you can A/B test more than marketing; you can test variables like UI/UX.

Yes, clothes fall in and out of fashion, but the effect of changing the placement, color, or size of the "add to cart" button isn't something that's going to be changing frequently.

Another example might be adding a "trending" tab to the top navigation of a page, or testing whether "what's trending" or "what you like" provides more engagement as the default page.

YouTube recently tested randomly lowering people's video resolution to see who changed it back, to gauge the importance of the resolution to their customers.

