
PaulHoule 9y ago

I discovered this a few months ago, when I went looking for XMP metadata in the filesystem and used the magic number trick to extract it from files of all kinds.
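For anyone who wants to try the magic number trick, here is a minimal sketch. It assumes the packet is stored in an 8-bit encoding such as UTF-8 and relies on the fixed `id` string that XMP packet headers carry; the function name `find_xmp_packets` and the 64-byte look-back window are my own choices for illustration, not something from the spec.

```python
import re

# The xpacket header carries a fixed id string, which works as a magic number.
XMP_MAGIC = b"W5M0MpCehiHzreSzNTczkc9d"
PACKET_START = b"<?xpacket begin="
PACKET_END = re.compile(rb"<\?xpacket end=[\"'][rw][\"']\?>")


def find_xmp_packets(data: bytes) -> list[bytes]:
    """Return every XMP packet found in a raw byte string (8-bit encodings only)."""
    packets = []
    pos = 0
    while True:
        magic_at = data.find(XMP_MAGIC, pos)
        if magic_at == -1:
            break
        # The header opens shortly before the id; 64 bytes is an assumed window.
        start = data.rfind(PACKET_START, max(pos, magic_at - 64), magic_at)
        end = PACKET_END.search(data, magic_at)
        if start == -1 or end is None:
            # Stray magic string without a complete packet around it: skip it.
            pos = magic_at + len(XMP_MAGIC)
            continue
        packets.append(data[start:end.end()])
        pos = end.end()
    return packets
```

Reading a file with `open(path, "rb").read()` and passing the bytes in works for any container, as long as the packet itself was stored uncompressed.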

I found that XMP commonly turns up in media files embedded inside Windows EXEs, as well as Linux binaries, JARs, Microsoft Word documents and other composite formats.

Complex media objects frequently use an encapsulation system such as ZIP. When a PNG file is incorporated into a JAR or a Word document, the XMP content in the file may not be compressed, because the archiver may not attempt to compress the PNG file on the assumption that its data is already compressed.
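A quick way to check this for a given JAR or .docx is to list the ZIP members that were stored rather than deflated. This is only a sketch using Python's standard `zipfile` module; `stored_members` is a hypothetical helper name.

```python
import zipfile


def stored_members(archive) -> list[str]:
    """Names of ZIP members that were stored without compression."""
    with zipfile.ZipFile(archive) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.compress_type == zipfile.ZIP_STORED and not info.is_dir()
        ]
```

Any PNG that shows up in that list has its XMP packet sitting in the archive as plain text, provided the PNG itself did not compress it.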

XMP is very good from the viewpoint of content creators: comprehensive metadata is incorporated into the files themselves, so it does not get out of sync. XMP data is RDF data using an improved version of Dublin Core, IPTC and other industry RDF vocabularies. You can write SPARQL queries right away, plus XMP specifies a way to make an XMP packet based on pre-existing metadata in common industry schemes.

The XMP packets can get big, and you sometimes see a tiny GIF image (say a single transparent pixel) bulked up 100x by its metadata. Once you package data for delivery to consumers you want to strip all that stuff out.

The XMP spec is here:

http://www.adobe.com/devnet/xmp.html

There is some brilliant thinking in there, but also things that will make your head explode, such as the method for embedding an XMP packet into a GIF.


jszymborski 9y ago

Hmm... would be interesting if we started taking XMP into account when designing compression programs then...

abdias 9y ago

You could actually take any ancillary chunks into consideration, i.e. chunks whose type name starts with a lower-case letter. These are non-critical: a decoder is not required to understand them.
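To make the lower-case rule concrete, here is a rough sketch that walks a PNG's chunks and totals the bytes held in ancillary ones. The helper names are made up for this example, and CRCs are skipped rather than verified.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_ancillary(chunk_type: bytes) -> bool:
    """Bit 5 of the first type byte is set (lower-case letter) for ancillary chunks."""
    return bool(chunk_type[0] & 0x20)


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk; CRCs are skipped, not verified."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        yield chunk_type, png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def ancillary_sizes(png: bytes) -> dict[bytes, int]:
    """Total data bytes per ancillary chunk type."""
    sizes: dict[bytes, int] = {}
    for chunk_type, data in iter_chunks(png):
        if is_ancillary(chunk_type):
            sizes[chunk_type] = sizes.get(chunk_type, 0) + len(data)
    return sizes
```

A compressor or optimizer could use a table like this to decide which chunks are worth recompressing or dropping.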

TazeTSchnitzel 9y ago

> When a PNG file is incorporated into a JAR or a Word Document, the XMP content in the file may not be compressed because the archiver may not attempt to compress the png file since it assumes the data is already compressed.

PNG can apply DEFLATE to blocks though, right? Does XMP not use it?

abdias 9y ago

Deflating can be applied to some chunks, but not at will. The zTXt chunk is always compressed, while the tEXt chunk never is. The newer iTXt chunk can be either, depending on a compression flag in the chunk.

The two former are limited in scope and language encoding support, so iTXt is typically used for extended textual data such as XML/XMP etc. But whether it is saved compressed or not depends on the PNG encoder/host used (there can also be multiple instances of these chunks in the same file).
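As an illustration of how a reader can tell the two cases apart, here is a sketch that splits an iTXt chunk body into its fields and inflates the text only when the compression flag says so. It assumes the field order from the PNG spec (keyword, compression flag, compression method, language tag, translated keyword, text); `parse_itxt` is a hypothetical name, and `XML:com.adobe.xmp` is, as far as I know, the keyword the XMP spec uses for PNG.

```python
import zlib

XMP_KEYWORD = b"XML:com.adobe.xmp"  # iTXt keyword assumed for XMP packets


def parse_itxt(data: bytes) -> tuple[bytes, bool, bytes]:
    """Split an iTXt chunk body into (keyword, was_compressed, text bytes)."""
    keyword, rest = data.split(b"\x00", 1)
    compression_flag, compression_method = rest[0], rest[1]
    # Language tag and translated keyword are both null-terminated.
    _language_tag, rest = rest[2:].split(b"\x00", 1)
    _translated_keyword, text = rest.split(b"\x00", 1)
    compressed = compression_flag == 1
    if compressed:
        if compression_method != 0:
            raise ValueError("unknown iTXt compression method")
        text = zlib.decompress(text)  # method 0 is a zlib/deflate stream
    return keyword, compressed, text
```

Feeding it the data of each `iTXt` chunk and checking for `XMP_KEYWORD` shows whether a given encoder wrote the packet compressed or not.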

Photoshop, for instance, saves uncompressed, I guess to give fast access for performance reasons (i.e. file viewers showing galleries of numerous images while displaying their metadata).
