morley 9y ago

How would you have fixed it?

Assuming you implement the obvious thing -- tracking each page of a book the user has opened --

(a) How could you reliably track this if the user always keeps their Kindle on airplane mode?

(b) How could you track this accurately if the user reads a few hundred pages in the subway, where there's no Internet service?

(c) How could you distinguish that between someone who hopped around a book in that same time period?

(d) If Kindles don't already track page views this way, how do you update the software on all Kindles to start tracking this way? When do you switch your billing script to track purchases like that?

(e) If you're QAing a Kindle and you spot this loophole, how do you do all these fixes? How long are you willing to keep the software from shipping? How certain are you that your theoretical solution is better than what's already shipped?

Product development is hard, and it makes me angry when people handwave it as "gross incompetence" from a position of ignorance.

phonon 9y ago

Uh, store a log file of every user action in that book, and send those log files to the mothership periodically, as internet is available? It does not have to be same day, just eventually.

Analyzing log files for duration/pages visited is probably easier than the equivalent for web server logs, and there are very many services that will analyze those for you.
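
A minimal sketch of the "log locally, send when internet is available" idea, assuming a hypothetical `upload` callback that returns True on success. The class and field names are illustrative, not anything Amazon is known to ship.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class ReadingEvent:
    book_id: str
    location: int
    timestamp: float


@dataclass
class OfflineLog:
    """Buffers reading events locally until a connection is available."""
    pending: list = field(default_factory=list)

    def record(self, book_id, location, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self.pending.append(ReadingEvent(book_id, location, ts))

    def sync(self, upload, online):
        """Try to flush the buffer; keep events if offline or if upload fails."""
        if not online or not self.pending:
            return 0
        payload = json.dumps([asdict(e) for e in self.pending])
        if not upload(payload):  # hypothetical callback, True on success
            return 0
        sent = len(self.pending)
        self.pending.clear()
        return sent
```

Events recorded in airplane mode or on the subway simply wait in `pending` until the next successful `sync`, which matches the "eventually, not same day" requirement.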

bduerst 9y ago

Yeah, I'm not getting the "This is a very difficult problem to solve" take on this.

The books currently track by location point, and you could log in blocks (e.g. every 100 location points progressed).

Amazon's books are already DRM'd to hell, meaning the kindle has to use the unlimited books through the marketplace. Then it's just a matter of reporting user stats, which can be covered in the Unlimited TOS.

levemi 9y ago

You're still trusting the client. Any system that trusts the client is flawed. Time to read per page varies, and if you read the original article, it says the scammers reduce the chance of alerts by clicking through a book over a three-day period. When a scam makes $60,000 a month, you're going to find people somewhere in the world who will click through very cheaply.

bduerst 9y ago

You would still need to force the users through each page.

Fast-reading bad actor accounts can be flagged as abusers through pattern recognition. Since a subscription is necessary, creating numerous accounts to game the system becomes expensive fast.

otterley 9y ago

Amazon writes the client software (and ships hardware). If the clients communicate securely with the servers, Amazon should be able to trust them.

(I'm excluding Kindle Web Viewer, of course. Perhaps it should not have access to Kindle Unlimited.)

DanBC 9y ago

Fucking horrific level of data collection.

kbenson 9y ago

Yes, it is, but unfortunately I'm not sure how to get around it in a system where you aren't actually buying the goods, but borrowing them and then they are required to know how much of it you used. Thankfully you can still buy books outright if you don't want to be tracked (sort of. All KU books are Amazon exclusive, so Amazon will at least track that you bought it).

That said, Amazon is already syncing your location, and any annotations you've made[1], so they persist across all Kindle devices, so there's already a bunch of tracking in place. Given that there's already some tracking, I wouldn't be too opposed to a per-page bit for whether it was read, triggered when the page has been lingered on for five or more seconds (scaled down to 1 second for partial pages, such as ends of chapters).

1: Anyone remember the big episode years back over Amazon realizing they didn't have the license to a book, then removing it from all Kindle devices automatically, including the annotations made? In what is possibly the most ironic situation I can imagine, the book was 1984.
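
A rough sketch of the per-page "read" bit with a dwell threshold. The 5-second and 1-second figures come from the comment; the linear scaling by how full the page is, and the `fill_fraction` parameter itself, are assumptions made for illustration.

```python
FULL_PAGE_SECONDS = 5.0  # threshold for a full page, from the comment
MIN_SECONDS = 1.0        # floor for partial pages, from the comment


def required_dwell(fill_fraction):
    """Seconds a page must stay on screen before it counts as read.

    fill_fraction is a hypothetical 0.0-1.0 measure of how full the page is;
    scaling linearly between the two thresholds is an assumption.
    """
    fill = min(max(fill_fraction, 0.0), 1.0)
    return max(MIN_SECONDS, FULL_PAGE_SECONDS * fill)


def mark_read(read_flags, page, dwell_seconds, fill_fraction=1.0):
    """Set the page's bit if it was on screen long enough; return the bit."""
    if dwell_seconds >= required_dwell(fill_fraction):
        read_flags[page] = True
    return read_flags[page]
```

Once a bit is set it stays set, so flipping back through a page quickly does not un-read it.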

DanHulton 9y ago

Way more in-depth levels of data collection happen on literally every single page of the web, for reference.

If you're comfortable browsing the internet, this level of reporting on a Kindle seems almost quaint by comparison.

vkjv 9y ago

This. It doesn't even really need to be a log. A bitset with each bit representing a page and a `1` representing "this page read" would do the trick. On a massive 8000-page tome, that's only 1 kB.

If Amazon doesn't need the exact pages read, POPCNT the total and send that.
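
A small sketch of that bitset, assuming pages are numbered from 0. `PageBitset` is a made-up name, and the population count is done in plain Python rather than with a hardware POPCNT instruction.

```python
class PageBitset:
    """One bit per page; a set bit means the page was read."""

    def __init__(self, page_count):
        self.page_count = page_count
        self.bits = bytearray((page_count + 7) // 8)  # round up to whole bytes

    def mark(self, page):
        if not 0 <= page < self.page_count:
            raise IndexError(page)
        self.bits[page // 8] |= 1 << (page % 8)

    def is_read(self, page):
        return bool(self.bits[page // 8] & (1 << (page % 8)))

    def pages_read(self):
        # Population count over every byte of the bitset.
        return sum(bin(b).count("1") for b in self.bits)
```

For an 8000-page book, `len(PageBitset(8000).bits)` is 1000 bytes, and marking the same page twice does not change `pages_read()`.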

y4mi 9y ago

...that wouldn't change anything. They'd just change the report file to sync straight 1s... no, you still need obfuscation and encryption, bloating it to at least 100kb.

but that's still pretty minor

Spivak 9y ago

And you don't think that those logs can be faked? It might stop the casual, "hey fans, read this 'book' to support me" but it won't stop the real scammers or people who would buy reads for revenue and ratings.

phonon 9y ago

Kindles are pretty locked down...it's not that difficult to have the kindle sign the data it sends (probably does that already). Being scammed by hacked kindles is one thing, but they're not even trying here...

odbol_ 9y ago

You could easily sign the log with the same certificate that is providing the DRM on the book itself. Or a different certificate. Encrypting things is not new, nor hard.
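
A sketch of tamper-evident reports. The comments talk about certificate-based signing; this example swaps in an HMAC with a per-device secret instead, because that is what the Python standard library offers. The key handling and function names are assumptions.

```python
import hashlib
import hmac


def sign_report(device_key: bytes, report: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the report (key is a hypothetical device secret)."""
    return hmac.new(device_key, report, hashlib.sha256).hexdigest()


def verify_report(device_key: bytes, report: bytes, tag: str) -> bool:
    """Server-side check that the report was not altered after signing."""
    expected = sign_report(device_key, report)
    return hmac.compare_digest(expected, tag)  # constant-time comparison
```

This only shows that the report came from something holding the key. It does not help if the device itself is compromised, which is the "trusting the client" objection raised earlier in the thread.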

bduerst 9y ago

Doesn't work.

You would need to fake the logs for paid accounts, and since rev sharing is a formula of all paid subscriptions, you'd be hard pressed to make positive returns.

cwyers 9y ago

Not that any of those are difficult -- and all of the problems you list around connectivity are, uh... it's not like tracking it poorly resolves that problem, they're still finding some way to eventually sync the data up now, it's just lower-quality data.

But... even if those ARE difficult problems, shouldn't you try to solve them BEFORE you launch a business model where you promise people (ie, your authors) that you can do these things? Hell, especially if they're difficult problems, you should fix them before telling people you've solved them.

coldtea 9y ago

>(a) How could you reliably track this if the user always keeps their Kindle on airplane mode?

I wouldn't let users who are always in airplane mode participate in the program. Actually, I'm pretty sure they already can't, as they need to connect to get books from KU.

>(b) How could you track this accurately if the user reads a few hundred pages in the subway, where there's no Internet service?

By using this magic thing called computer storage, and syncing later...

>(c) How could you distinguish that between someone who hopped around a book in that same time period?

By observing how much time they spend in each page (with some allowances for different reading speeds, skipping, speed reading etc) and making sure they've legitimately read a good portion of the book.

Even if they haven't actually read it, but only mimicked the above, this constraint just made the fraudsters' process much much slower to complete.
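
One way the time-per-page check could look on the analysis side, as a sketch only. The 3-second minimum and the 50% "good portion" cutoff are invented thresholds, and a real system would need allowances for different reading speeds.

```python
def credited_pages(dwell_seconds, min_seconds=3.0):
    """Count pages whose on-screen time meets a minimum (hypothetical threshold)."""
    return sum(1 for s in dwell_seconds if s >= min_seconds)


def looks_like_reading(dwell_seconds, min_seconds=3.0, min_fraction=0.5):
    """True if at least min_fraction of the viewed pages were lingered on."""
    if not dwell_seconds:
        return False
    share = credited_pages(dwell_seconds, min_seconds) / len(dwell_seconds)
    return share >= min_fraction
```

A click-through that spends a fraction of a second on every page fails the check, while a fraudster who waits out the threshold on each page passes it but takes far longer per book.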

>(d) If Kindles don't already track page views this way, how do you update the software on all Kindles to start tracking this way?

You simply require users to update their software to continue participating in KU, and give them a deadline.

Users need to connect to browse/get new books anyway.

>When do you switch your billing script to track purchases like that?

After the deadline, only people with updated KU software will be there, so no problem, you just switch it.

In between, you could always switch it on an account basis (like you already have KU and non-KU account and other tiers) -- those who already updated get the new behavior, etc.

>(e) If you're QAing a Kindle and you spot this loophole, how do you do all these fixes? How long are you willing to keep the software from shipping? How certain are you that your theoretical solution is better than what's already shipped?

It's not like these things are rocket science. Companies do such QA and hold back BS products for a few months all the time. Even companies losing billions from doing so, like Apple. For Amazon, which barely breaks even and where lots of offerings are loss leaders, that's even easier.

>Product development is hard, and it makes me angry when people handwave it as "gross incompetence" from a position of ignorance.

Hard or not, there are always lots of cases of actual, bona fide, certified, 100% legit, "gross incompetence" too...

jonnathanson 9y ago

Right. There is no easy, tradeoff-free way to automate the tracking and proportional payment process.

Which means that Amazon really does need to move to a more Apple-like human curation process for all new authors, and/or for all new titles. Doing so will immediately tank precious vanity metrics like # of titles added to the store each month. But the alternative is an ever-growing jungle of weeds crowding out the legitimate works. The more that happens, the harder it will be to eventually weed the garden.

lifeformed 9y ago

I don't see why it needs to be always online. Just locally record the number of pages that were in view for X amount of time.

star0zero 9y ago

I'm also assuming that they had to take a bit of a lowest common denominator approach to m2m communication given that they have cell-based (read - costs amazon money) and wifi (does not cost amazon money) enabled versions of the device. If they tracked every page read and sent a log periodically, that _could_ get expensive quickly on the part of the cell-based versions depending on what network agreement they have (numerex, for example, still charges by the kb for this type of low byte traffic). Given that the rules needed to be the same for both types of devices, you couldn't necessarily have an if(wifi){ //send log} else { //send last page syncd} code branch. This is just a giant guess given that I know nothing of amazon's partner network agreements.

moheeb 9y ago

I have foobar2000 setup to track my plays for each song. It has an adjustable slider that I have set for 35%. Once 35% of the song has been played it increments the play count. It doesn't even need internet to do this! This stuff isn't that hard.

tedmiston 9y ago

Spotify does it similarly: 30 seconds == a stream.

tedmiston 9y ago

> (a) How could you reliably track this if the user always keeps their Kindle on airplane mode?

Spotify's solution is to require a device to sync at least once every 30 days or the offline content expires.

> (b) How could you track this accurately if the user reads a few hundred pages in the subway, where there's no Internet service?

With a log of events that syncs when they do reconnect.

> (c) How could you distinguish that between someone who hopped around a book in that same time period?

By measuring the duration spent on each page.

Aissen 9y ago

Are you serious? All of this is fixable in software, in a way that would just penalize revenue from (too) fast readers.

LoSboccacc 9y ago

We run a small microsite service for designers and once enabled single page view tracking metrics. We had very few customers at the time and yet managed to smash through our 50k keen.io event allowance in a single day. Can't imagine it on something where books have hundreds of pages and users number in the millions.

joeld42 9y ago

You can do the aggregation locally on the device. You wouldn't want to send every page view as an immediate event, just send the aggregates every 15 minutes or at the start or end of each session.
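
A sketch of on-device aggregation along those lines, assuming the caller passes in the current time in seconds. The 15-minute interval is from the comment; the class and method names are illustrative.

```python
from collections import Counter

FLUSH_INTERVAL_SECONDS = 15 * 60  # "every 15 minutes", from the comment


class SessionAggregator:
    """Counts page views per book and emits one summary per flush."""

    def __init__(self, now):
        self.counts = Counter()
        self.last_flush = now

    def page_viewed(self, book_id):
        self.counts[book_id] += 1

    def maybe_flush(self, now, session_ended=False):
        """Return an aggregate dict when a flush is due, else None."""
        due = session_ended or now - self.last_flush >= FLUSH_INTERVAL_SECONDS
        if not due or not self.counts:
            return None
        summary = dict(self.counts)
        self.counts.clear()
        self.last_flush = now
        return summary
```

Thirty page turns then become a single event such as `{"b1": 30}` rather than thirty separate ones, which is the point about not blowing through an event allowance.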

takno 9y ago

That seems like an exceptionally low allowance for any kind of page view tracking. Given that they have the technology in place server-side, and it's not an incredibly hard problem, the server-side costs to Amazon of doing this would be tiny. No comment on the cost of designing a decent algorithm and keeping ahead in the cat and mouse games.

LoSboccacc 9y ago

eh it's the cost of using a prepackaged solution. I'd move to an internal one but up until now developing features for the app had more precedence than developing a state of the art event tracking solution.

phonon 9y ago

You don't think amazon can handle a few GB of log files every day for the entire kindle unlimited service? Really? Worst case you do some sampling.

LoSboccacc 9y ago

I'm sure they can; they bill themselves at a different price.

ceejayoz 9y ago

It needn't be a tracking event per page.

They're already firing {"current_page":3000}. They just need to start doing something like {"current_page":3000,"pages_seen":5}.
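
A sketch of how that extended sync message might be built and tallied. The two field names are the ones in the comment; the function names, and the idea that `pages_seen` counts pages since the previous sync, are assumptions.

```python
import json


def build_sync_payload(current_page, pages_seen_since_last_sync):
    """Serialize the position update plus a page counter as compact JSON."""
    body = {"current_page": current_page, "pages_seen": pages_seen_since_last_sync}
    return json.dumps(body, separators=(",", ":"))


def apply_payload(totals, book_id, payload):
    """Server-side: add the reported pages to a running per-book total."""
    data = json.loads(payload)
    totals[book_id] = totals.get(book_id, 0) + int(data["pages_seen"])
    return totals[book_id]
```

The message stays one small object per sync regardless of how many pages were turned in between.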

silverbax88 9y ago

All of these issues were solved over a decade ago by sales contact software.

rmc 9y ago

Kindles could store, locally, the pages that are displayed, and then upload to the server when they have an internet connection.
