This page documents the effort to add to Tiki the ability to view and edit metadata embedded in files uploaded to the file gallery. For image files, metadata from any image displayed through PluginImg should also be viewable.
Below are the key features that could be added, roughly in order of priority:
- View metadata from any image displayed using PluginImg
- Edit metadata through PluginImg for images stored in a file gallery
- Option to extract metadata from images and store it in serialized form in the image's file gallery database record, so that the data can be searched and used for the first two items
- Allow adding name, title and description to the image's embedded metadata (instead of, or in addition to, storing them in the database) upon upload to the file gallery
- Do all of the above for other file types as well (e.g., PDF, video files, etc.)
The key issues to deal with from a coding perspective are:
Multiple Types within One File
With images for instance, there are three common methods for storing metadata:
- IPTC - standards first established in 1979 by the International Press Telecommunications Council
- Exif - Exchangeable Image File Format first published in 1995
- XMP - Adobe's Extensible Metadata Platform first established in 2001
An image may have all three types of metadata and there are somewhat complex industry guidelines for how to display, edit and maintain information when there is duplication across the types of metadata. These guidelines try to deal with the fact that the various metadata types have been introduced at different times over the years and applications recognize and handle the different types inconsistently.
Adobe's XMP, which is based on XML and uses the W3C's Resource Description Framework (RDF), is the newest of the three. It aims to become the standard and can incorporate Exif and IPTC data. However, it is too soon to tell whether it will succeed, and some older applications do not recognize it, which complicates the reconciliation guidelines. XMP is also used in non-image files, which can have their own "native" metadata, so mapping and reconciliation are also necessary for these files when both XMP and native metadata are present. (This document gives examples for different file types.)
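Because XMP is plain XML/RDF embedded in the file, a packet can often be located and parsed with a generic XML parser. The following is a minimal sketch in Python (used here only for illustration; Tiki itself is PHP). It assumes the packet uses the conventional `x:xmpmeta` wrapper element and is stored in one contiguous piece; the function names `find_xmp_packet` and `read_dc_title` and the sample bytes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP, RDF and Dublin Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def find_xmp_packet(data: bytes):
    """Return the raw <x:xmpmeta> element found in data, or None.

    Assumption: the packet is contiguous and uses the usual "x" prefix.
    """
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return None
    end_tag = b"</x:xmpmeta>"
    end = data.find(end_tag, start)
    if end == -1:
        return None
    return data[start:end + len(end_tag)]


def read_dc_title(packet: bytes):
    """Return the first dc:title alternative in an XMP packet, or None."""
    root = ET.fromstring(packet)
    item = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    return item.text if item is not None else None


# Hypothetical file content: binary data surrounding an XMP packet.
SAMPLE = (
    b"\xff\xd8 binary image data "
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" '
    b'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Harbour at dusk'
    b"</rdf:li></rdf:Alt></dc:title>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    b" more binary data \xff\xd9"
)

if __name__ == "__main__":
    print(read_dc_title(find_xmp_packet(SAMPLE)))
```

A byte scan like this is only a shortcut for inspection. Writing XMP back, or reading packets that a format splits across several containers, requires the per-format handling described in the next section.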
Varied Methods of Storing Metadata
Each file type stores metadata in different locations and uses different markers. For example, JPEG, TIFF and PSD files can all carry Exif, IPTC and XMP data, but each stores them differently, as shown in the Guidelines for Handling Image Metadata in the section "More complex reconciliation in popular image formats". (The Adobe document XMP Specifications Part 3 - Storage in Files also has helpful information on how different file types, including non-images, store metadata.) This means each file type will need to be handled separately in the code.
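To make the JPEG case concrete: in a JPEG file, Exif normally sits in an APP1 segment whose payload starts with `Exif\0\0`, XMP in an APP1 segment starting with `http://ns.adobe.com/xap/1.0/\0`, and IPTC inside a Photoshop image resource block in an APP13 segment starting with `Photoshop 3.0\0`. The sketch below, in Python for illustration only, walks the segments before the image data and reports which containers are present. The function name `classify_jpeg_metadata` is hypothetical, and the code does not parse the containers themselves.

```python
import struct

# Identifier strings that open each kind of metadata segment in a JPEG file.
EXIF_ID = b"Exif\x00\x00"
XMP_ID = b"http://ns.adobe.com/xap/1.0/\x00"
PHOTOSHOP_ID = b"Photoshop 3.0\x00"


def classify_jpeg_metadata(data: bytes):
    """List the metadata containers that appear before the image data.

    A Photoshop IRB segment may hold IPTC data; this sketch reports the
    container only and does not look inside it.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    found = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # malformed stream, stop scanning
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(EXIF_ID):
            found.append("Exif")
        elif marker == 0xE1 and payload.startswith(XMP_ID):
            found.append("XMP")
        elif marker == 0xED and payload.startswith(PHOTOSHOP_ID):
            found.append("Photoshop IRB")
        pos += 2 + length
    return found
```

TIFF and PSD files need different walkers (TIFF tags and Photoshop image resources respectively), which is why a per-format reader layer seems unavoidable.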
Limited Existing Libraries
PHP has some limited image metadata functions built in:
- Exif - several functions for reading (but not writing) Exif data in JPEG and TIFF files only
- IPTC - iptcparse and iptcembed for reading and writing IPTC data in JPEG files only
XMP data in PDF files can be manipulated using Zend Framework's Zend_PDF; otherwise, there are no third-party functions for reading XMP data in the Tiki code base.
As far as other open source programs that could be leveraged, there don't seem to be good alternatives:
None of these has the right license, and none appears to be a stable third-party program that could be relied on for inclusion in the code base.
The overlapping nature of the three metadata formats used in images is illustrated below:
Below is a summary of the Guidelines for Handling Image Metadata, which should be reflected in how the code is written for this feature. (Numbers in parentheses are section numbers from the November 2010 version of the document.)
Guidelines for Handling Image Metadata
- The different forms of metadata must be reconciled when displayed
- Information added by a user must only be deleted by explicit user intent (3.1.2)
- Data not added by a user should only be modified or deleted if inaccurate or problematic (3.1.3)
- All forms of metadata that are modified should be kept in sync with one another (3.1.3)
- This can be accomplished by deleting one metadata form
When Exif and XMP are Used
- Read both but prefer Exif if property is found in both
- Exception for date/time: XMP has the time zone whereas Exif does not, therefore use XMP as long as the date is the same as in Exif, but do not convert into computer's local time for display
- When updating, if a value is found in both, both should be updated
- If the file format supports Exif natively, Exif and TIFF device properties (e.g. XResolution, YResolution, WhitePoint, etc.) should not be duplicated in the XMP exif: and tiff: namespaces.
- Exif metadata is formatted as a TIFF stream, even in JPEG files. TIFF streams have an explicit indication of being big endian or little endian - the existing byte-order should be preserved when writing data.
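The read rules for Exif and XMP can be sketched as two small functions. This is a minimal illustration in Python, not Tiki code: the names `pick_value` and `pick_datetime` are hypothetical, values are assumed to have been extracted into plain strings already, and "the date is the same" is interpreted here as the wall-clock date and time matching once the XMP time zone is set aside.

```python
from datetime import datetime


def pick_value(exif_value, xmp_value):
    """General rule: prefer Exif when a property exists in both."""
    return exif_value if exif_value is not None else xmp_value


def pick_datetime(exif_value, xmp_value):
    """Date/time exception: use XMP when it agrees with Exif.

    exif_value uses the Exif layout "YYYY:MM:DD HH:MM:SS".
    xmp_value is an ISO 8601 string such as "2010-11-05T14:30:00+02:00".
    The chosen string is returned unchanged, so nothing is converted to
    the computer's local time.
    """
    if exif_value is None or xmp_value is None:
        return pick_value(exif_value, xmp_value)
    exif_dt = datetime.strptime(exif_value, "%Y:%m:%d %H:%M:%S")
    xmp_dt = datetime.fromisoformat(xmp_value)
    # Compare wall-clock values only; the offset is ignored, not applied.
    if xmp_dt.replace(tzinfo=None, microsecond=0) == exif_dt:
        return xmp_value
    return exif_value


if __name__ == "__main__":
    print(pick_datetime("2010:11:05 14:30:00", "2010-11-05T14:30:00+02:00"))
```

Returning the original string rather than a converted timestamp keeps the time zone exactly as stored, in line with the rule that the value must not be shifted for display.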
When IPTC and XMP are Used
- When reading the data, if a checksum for the IPTC data is missing, or is there and matches the IPTC block, then prefer the XMP value, but use the IPTC value if the XMP value is missing
- Otherwise, if the stored checksum for the IPTC block does not match a current checksum calculation for the block, then compare each common IPTC and XMP field after making any necessary truncations to the XMP field to match size limitations for the corresponding IPTC field. If a field matches, use the full XMP value, otherwise use the IPTC field.
- If IPTC is already in the file, data should be written back to the file in both XMP and IPTC; otherwise, only XMP should be written
- If IPTC and XMP are both present, whether changed or not, the checksum value must be created or updated
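The IPTC/XMP read rules above can be sketched as follows, in Python for illustration only. Several details are assumptions rather than statements from this page: the checksum is computed as an MD5 digest of the raw IPTC block, field size limits are counted in UTF-8 bytes, and the names `iptc_block_unchanged`, `reconcile_field` and `iptc_limit` are hypothetical.

```python
import hashlib


def iptc_block_unchanged(iptc_block: bytes, stored_digest):
    """True when no digest is stored, or the stored digest matches the block.

    Assumption: the stored checksum is an MD5 digest of the IPTC block.
    """
    if stored_digest is None:
        return True
    return hashlib.md5(iptc_block).digest() == stored_digest


def reconcile_field(iptc_value, xmp_value, unchanged, iptc_limit):
    """Choose the value to display for one field.

    unchanged  -- result of iptc_block_unchanged() for the whole block
    iptc_limit -- maximum size of the IPTC field in bytes (assumed UTF-8)
    """
    if xmp_value is None:
        return iptc_value
    if iptc_value is None:
        return xmp_value
    if unchanged:
        return xmp_value  # checksum missing or matching: prefer XMP
    # Checksum mismatch: truncate XMP to the IPTC limit before comparing.
    truncated = xmp_value.encode("utf-8")[:iptc_limit]
    if truncated == iptc_value.encode("utf-8"):
        return xmp_value  # same content, so keep the full XMP value
    return iptc_value  # IPTC was edited by an application unaware of XMP
```

On write, the same digest function would be used to create or refresh the stored checksum whenever both IPTC and XMP are present.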
All Three Formats Used
- If there is a conflict between Exif and IPTC, prefer Exif when the IPTC-IIM checksum matches or does not exist, and prefer IPTC when the checksum does not match
- XMP metadata must be read/written as Unicode in the form appropriate to the file. For JPEG, TIFF and PSD files this is UTF-8.
- IPTC metadata must be read as UTF-8 if a 1:90 DataSet is present indicating the use of UTF-8. When written, it must be written as UTF-8, and must include a 1:90 DataSet indicating the use of UTF-8.
- Encoding for Exif fields is complex - see the document for details
- Time-zone information must not be implicitly added and existing values should be preserved.
- Keywords should be completely replaced when reconciling
- Specific rules on keeping date/times in sync - see document
- Specific rules on Orientation, Rating, Copyright, Creator, Location - see document
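The IPTC encoding rule can be illustrated with a short Python sketch. In IPTC-IIM, a 1:90 DataSet containing the escape sequence ESC % G signals UTF-8. What to do when 1:90 is absent is not stated in the summary above, so the `latin-1` fallback below is purely an assumption, and the function names are hypothetical.

```python
# ESC % G: the escape sequence in a 1:90 DataSet that indicates UTF-8.
UTF8_ESCAPE = b"\x1b%G"


def decode_iptc_text(raw: bytes, coded_character_set, fallback="latin-1"):
    """Decode an IPTC text value using the 1:90 DataSet, if any.

    coded_character_set -- raw value of the 1:90 DataSet, or None if absent
    fallback            -- encoding assumed when UTF-8 is not declared
                           (an assumption of this sketch)
    """
    if coded_character_set == UTF8_ESCAPE:
        return raw.decode("utf-8")
    return raw.decode(fallback)


def encode_iptc_text(text: str):
    """Return (1:90 value, encoded bytes) for writing; always UTF-8."""
    return UTF8_ESCAPE, text.encode("utf-8")


if __name__ == "__main__":
    marker, data = encode_iptc_text("Z\u00fcrich")
    print(decode_iptc_text(data, marker))
```

Always writing the 1:90 DataSet together with UTF-8 text means a later reader does not have to guess the encoding.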
- Metadata Working Group - sets industry standards for programs that display or edit image metadata. They publish the Guidelines for Handling Image Metadata summarized above.
- IPTC Photo Metadata - IPTC metadata page with links to documentation, such as:
- IPTC Standard Photo Metadata - document mapping IPTC fields to XMP
- Exif Specifications - specifications for Exif metadata in images
- Adobe XMP Developer Center - documentation related to Adobe's Extensible Metadata Platform specifications
- JPEG Specifications - Specifications for all aspects of JPEG images
PHP Reader - Zend_Media - Zend_Io
- PHP Reader is a small, well-documented library written in PHP to read and write media files and their information headers in an object-oriented manner. Currently supported formats are ASF (Windows Media Player files, i.e. WMA, WMV, etc.), ID3, including both ID3v1 and ID3v2 (MPEG files, i.e. MP3), MPEG Audio Bit Stream (i.e. ABS, MP1, MP2, MP3), MPEG Program Stream (MPEG movies, and DVD and HD DVD video discs, i.e. MPG, MPEG, VOB, EVO), and ISO Base Media File Format (e.g. QuickTime, MPEG-4 and iTunes AAC files, i.e. QT, MOV, MP4, M4A, M4B, M4P, M4V, etc.).