undefined | Better HN

0 pointssegasaturn1y ago0 comments

>suffered from a catastrophic data loss that several of my Photojournalism classmates fell victim to

How does that happen? Forgetting to periodically save their work and have the app crash, or was it saving incorrectly and producing corrupted files?

0 comments

spacedcowboy1y ago

Aperture was utterly paranoid about data-loss.

There was the SQLite database that was run on its own thread, and regularly synced to disk, the hard-sync that waited until the data had flushed through to the disk platters.

In addition to that there was a whole structure of plist files, one per image, that meant the database could be reconstructed from all these individual files, so if something had somehow corrupted the SQLite database, it could be rebuilt. There was an option to do that in the menu or settings, I forget which. The plists were write-once, so they couldn't be corrupted by the app after they'd been written-and-verified on ingest.

Finally, there were archives you could make which would back up the database (and plist files) to another location. This wasn't automated (like Time Machine is) but you could set it running overnight and come back to a verified-and-known-good restore-point.

If there was a catastrophic data loss, it's (IMHO much) more likely there was a disk failure than anything in the application itself causing problems - and unless you only ever had one instance of your data, and further that the disk problem was across both the platter-area that stored plists and well as database, it ought to have been recoverable.

Source: I wrote the database code for Aperture. I tested it with various databases holding up to 1M photos on a nightly basis, with scripts that randomly corrupted parts of the database, did a rebuild, and compared the rebuilt with a known-good db. We regarded the database as a cache, and the plists as "truth"

I'm not saying it was impossible that it was a bug in Aperture - it was a very big program, but we ran a lot of tests on that thing, we were very aware that people are highly attached to their photos, and we also knew that when you have millions of users, even a 1-in-a-million corner-case problem can be a really big issue - no-one wanted to read "Aperture lost all my photos", ever.

vr461y ago

Again, thanks for the interesting insights.

I personally witnessed one incident I mentioned, and for my sins tried to help my panicking classmate, I think we reached a good-enough outcome. On the subject of raw files processing, I have yet to find an ideal system, if it is even possible, where edits to get a RAW photo to its final state are handled and stored in some deterministic format, yet somehow connected to said image, in a way that allows the combination of the edit and raw to travel around.

Everything I've tried - let's see, Aperture, Lightroom, Capture One - have to use some kind of library or database and there's no great way of managing the whole show. The edits ARE the final image and the only solution I had that ever works was to maintain a Mac Pro with RAID and an old copy of Lightroom, and run all images through that.

IIRC, I never understood the Aperture filesystem, probably not meant for humans, which didn't help. Does that sound right?

spacedcowboy1y ago

Adobe have (had?) a DNG file-format that encompasses the RAW data, JPEGs and the changes, but by the simple fact that adjustments are application-specific anything you do to modify the image won't be portable. It's basically a TIFF file with specific tags for photography.

The thing is, if you want any sort of history, or even just adequate performance, you want a database backing the application - it's not feasible to open and decode a TIFF file every time you want to view a file, or scan through versions, or do searches based on metadata, or ... It's just too much to do, compared to doing a SQL query.

The Aperture Library was just a directory, but we made it a filesystem-type as a sort of hint not to go fiddling around inside it. If you right-clicked on it, you could still open it up and see something like <1>

Masters were in the 'Masters' folder, previews (JPEGs) inside the 'Previews' folder, Thumbnails (small previews) were in the 'Thumbnails' folder. Versions (being a database object) had their own 'Versions' folder inside the 'Database' folder. This was where we had a plist per master + a plist per version describing what had been done to the master to make the version.

We didn't want people spelunking around inside but it was all fairly logically laid out. Masters could later be referenced from places outside the Library (with a lower certainty of actually being available) but they'd still have all their metadata/previews/thumbnails etc inside the Library folder.

1: https://imgur.com/a/disk-structure-within-aperture-library-m...

1 more reply

vr461y ago

Going back to 2007, so can't remember super clearly, but IIRC the db was a sqlite like thing and all info about everything was stored in this, and it was vulnerable to corruption, plus all versions and thumbnails were mixed together with original image files - a total mess. The digital photo management landscape wasn't so mature then, and some people trusted Aperture with their original images whereas later versions allowed or encouraged people to keep their "masters" elsewhere.

Because the whole thing was as slow as a slug dragging a ball-and-chain, pre-SSD, issues with that filesystem or master database were sometimes mistaken for just general slowness. I jumped to Lightroom faster than you could say Gordon Parks.

spacedcowboy1y ago

Aperture 1.0 was very slow. The stories I could tell about its genesis...

I came on board just before 1.0 release, and for 1.5 we cleaned things up a bit. For 2.0 we (mainly I) completely rewrote the database code, and got between 10x and 100x improvements by using SQLite directly rather than going through CoreData. CoreData has since improved, but it was a nascent technology itself back then, and not suited to the sort of database use we needed.

The SQLite database wasn't "vulnerable to corruption", SQLite has several articles about its excellent ACID nature. The design of the application was flawed at the beginning though, with bindings used frequently in the UI to managed objects persisted in the database, which meant (amongst other things) that:

- User changes a slider

- Change is propagated through bindings

- CoreData picks up the binding and syncs it to disk

- But the database is on another thread, which invalidates the ManagedObjectContext

- Which means the context has to re-read everything from the database

- Which takes time

- By now the user has moved the slider again.

So: slow. I fixed that - see the other post I made.

vr461y ago

Thanks for the lovely insight, super interesting - I don't think I made it to Aperture 2 - but sounds like some unusual decisions made in that editing process. I suspect, based on my own history with disk problems, that the filesystem issues that would regularly pop up and not dealt with by your average technically-over-trusting student were the root cause, but exacerbated by the choices of image management and application speed.

j / k navigate · click thread line to collapse

0 comments

spacedcowboy1y ago

Aperture was utterly paranoid about data-loss.

There was the SQLite database that was run on its own thread, and regularly synced to disk, the hard-sync that waited until the data had flushed through to the disk platters.

vr461y ago

Again, thanks for the interesting insights.

IIRC, I never understood the Aperture filesystem, probably not meant for humans, which didn't help. Does that sound right?

spacedcowboy1y ago

1: https://imgur.com/a/disk-structure-within-aperture-library-m...

1 more reply

vr461y ago

spacedcowboy1y ago

Aperture 1.0 was very slow. The stories I could tell about its genesis...

- User changes a slider

- Change is propagated through bindings

- CoreData picks up the binding and syncs it to disk