
File Metadata

This page documents efforts to add to Tiki the ability to view or edit metadata embedded in files uploaded to the file gallery. For image files, metadata from any image displayed through PluginImg should also be viewable.

Potential Functionality

Below are the key features that could be added, roughly in order of priority:

  • View metadata from any image displayed using PluginImg
  • Edit metadata through PluginImg for images stored in a file gallery
  • Option to extract serialized metadata from images and add it to the image's file gallery database record, so the data can be searched and used for the first two items
  • Allow adding a name, title and description to the image's embedded metadata (instead of, or in addition to, storing them in the database) upon upload to the file gallery
  • Do all of the above for other file types as well (e.g. PDF and video files)

Overview

The key issues to deal with from a coding perspective are:

Multiple Types within One File

For images, for instance, there are three common methods for storing metadata:

  1. IPTC - standards first established in 1979 by the International Press Telecommunications Council
  2. Exif - Exchangeable Image File Format, first published in 1995
  3. XMP - Adobe's Extensible Metadata Platform, first established in 2001

An image may carry all three types of metadata, and there are fairly complex industry guidelines for how to display, edit and maintain information that is duplicated across them. These guidelines exist because the metadata types were introduced at different times over the years, and applications recognize and handle them inconsistently.

Adobe's XMP, which is based on XML and uses the W3C's Resource Description Framework (RDF), is the newest of the three. It aims to become the standard and can incorporate Exif and IPTC data within it. However, it is too soon to tell whether it will become the standard, and some older applications do not recognize it, which further complicates the reconciliation guidelines. XMP is also used in non-image files, which can have their own separate "native" metadata, so mapping and reconciliation are also necessary for those files when both XMP and native metadata are used (this document gives examples for different file types).

Varied Methods of Storing Metadata

Each file type stores metadata in different locations and uses different markers. For example, JPEG, TIFF and PSD files can all carry Exif, IPTC and XMP data, but each stores them differently, as shown in section 4.2.3.4 ("More complex reconciliation in popular image formats") of the Guidelines for Handling Image Metadata. (The Adobe document XMP Specification Part 3: Storage in Files also has helpful information on how different file types, including non-images, store metadata.) This means each file type will need to be handled separately in the code.
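
Because each format needs its own reader and writer, the code would probably begin by identifying the file type from its leading bytes and then dispatching to a format-specific handler. The sketch below is in Python for illustration only (a Tiki implementation would be PHP). The function `sniff_format` and the `METADATA_LOCATIONS` table are hypothetical. The table is a short orientation summary of where each format commonly keeps its metadata; it does not reproduce the guideline or Adobe documents.

```python
# Illustrative sketch: identify a file's format from its magic number so that a
# format-specific metadata handler can be chosen. All names are hypothetical.

# Where each format commonly stores its metadata blocks (orientation only).
METADATA_LOCATIONS = {
    "jpeg": ["Exif: APP1 segment with an 'Exif' header",
             "XMP: APP1 segment with the XMP namespace header",
             "IPTC: APP13 segment, Photoshop image resource 1028"],
    "tiff": ["Exif: IFD0 and the Exif sub-IFD (tag 34665)",
             "IPTC: tag 33723",
             "XMP: tag 700"],
    "psd": ["Exif: image resource 1058",
            "IPTC: image resource 1028",
            "XMP: image resource 1060"],
    "pdf": ["XMP: metadata stream",
            "native: document Info dictionary"],
}


def sniff_format(head: bytes) -> str:
    """Return a format name based on the file's first bytes, or 'unknown'."""
    if head.startswith(b"\xff\xd8\xff"):            # JPEG start-of-image marker
        return "jpeg"
    if head.startswith((b"II*\x00", b"MM\x00*")):   # TIFF, little- or big-endian
        return "tiff"
    if head.startswith(b"8BPS"):                    # Photoshop PSD signature
        return "psd"
    if head.startswith(b"%PDF-"):                   # PDF header
        return "pdf"
    return "unknown"
```

A caller would read the first few bytes of the uploaded file, call `sniff_format`, and route the file to the matching reader. Anything reported as "unknown" would simply have no embedded metadata support.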

Limited Existing Libraries

PHP has some limited image metadata functions built in:

  • Exif - several functions (such as exif_read_data) for reading, but not writing, Exif data in JPEG and TIFF files only
  • IPTC - iptcparse and iptcembed for reading and writing IPTC data in JPEG files only
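
To show what PHP's iptcparse works with, the following Python sketch parses a raw IPTC-IIM block (for example, the IPTC payload found inside a JPEG's APP13 segment) into a dictionary. Its keys follow the "record#dataset" form that iptcparse uses, such as "2#025" for Keywords. The name `parse_iim` is hypothetical. Values are left as raw bytes because their encoding depends on the 1:90 DataSet described under Text Encodings.

```python
import struct


def parse_iim(data: bytes) -> dict:
    """Parse a raw IPTC-IIM block into {"record#dataset": [raw values]}.

    Keys mimic the "%d#%03d" style of PHP's iptcparse(); repeated DataSets
    (e.g. several keywords) are collected into one list in file order.
    """
    result = {}
    pos = 0
    while pos + 5 <= len(data):
        if data[pos] != 0x1C:                 # every DataSet starts with tag marker 0x1C
            break
        record, dataset = data[pos + 1], data[pos + 2]
        (length,) = struct.unpack(">H", data[pos + 3:pos + 5])  # big-endian length
        pos += 5
        if length & 0x8000:
            # Extended DataSet: the low 15 bits give the size of the real length field.
            count = length & 0x7FFF
            length = int.from_bytes(data[pos:pos + count], "big")
            pos += count
        value = data[pos:pos + length]
        pos += length
        result.setdefault(f"{record}#{dataset:03d}", []).append(value)
    return result
```

For example, a block holding an Object Name (2:05) and two Keywords (2:25) yields `{"2#005": [b"Hello"], "2#025": [b"cat", b"dog"]}`.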


Apart from Zend Framework's Zend_Pdf, which can manipulate XMP data in PDF files, the Tiki code base contains no third-party functions for reading XMP data.

As for other open source programs that could be leveraged, there do not seem to be good alternatives:

Program                    | License | Language | Comments
ExifTool                   | GPL     | Perl     | Excellent program, but written in Perl and maintained by a single developer
getID3()                   | GPL     | PHP      | Covers many file types, but not very actively developed
PHP JPEG Metadata Toolkit  | GPL     | PHP      | JPEG only; last updated in 2005
PEL: PHP Exif Library      | GPL     | PHP      | Exif in JPEG and TIFF only; maintained by a single developer

None of these has a license suitable for inclusion in Tiki (which is LGPL-licensed), and none appears stable enough to be relied on as a third-party library in the code base.

Image Metadata

The overlapping nature of the three metadata formats used in images is illustrated below:
[Figure: overlap of the IPTC, Exif and XMP metadata formats in images]

Below is a summary of the Guidelines for Handling Image Metadata, which should be reflected in how the code for this feature is written. (Numbers in parentheses are section numbers from the November 2010 version of the document.)

Guidelines for Handling Image Metadata

General

  • The different forms of metadata must be reconciled when displayed
  • Information added by a user must only be deleted by explicit user intent (3.1.2)
  • Data not added by a user should only be modified or deleted if inaccurate or problematic (3.1.3)
  • All forms of metadata that are modified should be kept in sync with one another (3.1.3)
    • This can be accomplished by deleting one of the metadata forms

When Exif and XMP are Used

(4.2.3.1)

  • Read both, but prefer Exif if a property is found in both
    • Exception for date/time: XMP can record a time zone whereas Exif does not, so use the XMP value as long as its date and time match the Exif value, but do not convert it to the computer's local time for display
  • When updating, if a value is found in both, update both
  • If the file format supports Exif natively, Exif and TIFF device properties (e.g. XResolution, YResolution, WhitePoint, etc.) should not be duplicated in the XMP exif: and tiff: namespaces.
  • Exif metadata is formatted as a TIFF stream, even in JPEG files. TIFF streams explicitly indicate whether they are big-endian or little-endian, and the existing byte order should be preserved when writing data.
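
The following minimal Python sketch applies the reading rules above. It assumes the Exif and XMP values have already been extracted into dictionaries keyed by a shared property name. The property names in `DATE_PROPS` and all function names are hypothetical. Exif is preferred, except that an XMP date/time (with its time zone) is kept when it names the same local date and time as the Exif value. The byte-order helpers show how a value can be rewritten inside an Exif TIFF stream without changing its endianness. The guideline's full date/time rules are more detailed than this.

```python
import struct
from datetime import datetime

# Hypothetical shared property names for date/time values present in both forms.
DATE_PROPS = {"DateTimeOriginal", "DateTimeDigitized", "ModifyDate"}


def _same_local_time(exif_value: str, xmp_value: str) -> bool:
    """True if an Exif 'YYYY:MM:DD HH:MM:SS' value and an XMP ISO 8601 value
    name the same local date and time (the XMP time zone is ignored)."""
    try:
        exif_dt = datetime.strptime(exif_value, "%Y:%m:%d %H:%M:%S")
        xmp_dt = datetime.fromisoformat(xmp_value)
    except ValueError:
        return False                        # unparseable: fall back to Exif
    return xmp_dt.replace(tzinfo=None) == exif_dt


def reconcile_exif_xmp(exif: dict, xmp: dict) -> dict:
    """Merge Exif and XMP values: Exif wins, except for matching date/times."""
    merged = dict(xmp)                      # start with properties only XMP has
    for prop, exif_value in exif.items():
        xmp_value = xmp.get(prop)
        if (prop in DATE_PROPS and xmp_value is not None
                and _same_local_time(exif_value, xmp_value)):
            merged[prop] = xmp_value        # keep XMP's time zone; no local-time conversion
        else:
            merged[prop] = exif_value       # Exif is preferred when both exist
    return merged


def tiff_byte_order(tiff) -> str:
    """Return the struct prefix for a TIFF stream: '<' for 'II', '>' for 'MM'."""
    if tiff[:4] == b"II*\x00":
        return "<"
    if tiff[:4] == b"MM\x00*":
        return ">"
    raise ValueError("not a TIFF stream")


def write_short(tiff: bytearray, offset: int, value: int) -> None:
    """Overwrite a 16-bit value in place using the stream's own byte order."""
    struct.pack_into(tiff_byte_order(tiff) + "H", tiff, offset, value)
```

Writing through `write_short` means a big-endian ("MM") stream stays big-endian after an edit, which is the behavior the guideline asks for.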

When IPTC and XMP are Used

(4.2.3.2)

  • When reading, if the checksum for the IPTC data is missing, or is present and matches the IPTC block, prefer the XMP value, but use the IPTC value if the XMP value is missing
    • Otherwise, if the stored checksum for the IPTC block does not match a fresh checksum calculation for the block, compare each common IPTC and XMP field after truncating the XMP value as needed to fit the size limit of the corresponding IPTC field. If the fields match, use the full XMP value; otherwise use the IPTC value.
  • If IPTC is already in the file, data should be written back to the file in both XMP and IPTC; otherwise only XMP should be written
  • If IPTC and XMP are both present, whether changed or not, the checksum value must be created or updated
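
A Python sketch of these checksum rules follows. It assumes the IPTC and XMP values are already decoded into dictionaries keyed by a shared property name. It also assumes the stored checksum is the 16-byte MD5 digest of the raw IPTC-IIM block, as kept in Photoshop image resource 1061. All names are hypothetical. The `IPTC_MAX_LEN` limits are illustrative values and should be checked against the IIM specification.

```python
import hashlib
from typing import Optional

# Illustrative IPTC-IIM length limits in bytes (verify against the IIM spec).
IPTC_MAX_LEN = {"ObjectName": 64, "Headline": 256, "Caption": 2000}


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", "ignore")


def iptc_digest(iptc_block: bytes) -> bytes:
    """MD5 digest of the raw IPTC-IIM block, to be created or updated on write."""
    return hashlib.md5(iptc_block).digest()


def reconcile_iptc_xmp(iptc: dict, xmp: dict, iptc_block: bytes,
                       stored_digest: Optional[bytes]) -> dict:
    """Choose between IPTC and XMP values following the checksum rules."""
    digest_ok = stored_digest is None or iptc_digest(iptc_block) == stored_digest
    merged = {}
    for prop in set(iptc) | set(xmp):
        iptc_value, xmp_value = iptc.get(prop), xmp.get(prop)
        if xmp_value is None:
            merged[prop] = iptc_value       # only IPTC has this property
        elif iptc_value is None or digest_ok:
            merged[prop] = xmp_value        # XMP preferred: no IPTC value, or digest missing/matching
        else:
            # Digest mismatch: compare against the XMP value truncated to the IPTC limit.
            limit = IPTC_MAX_LEN.get(prop)
            candidate = xmp_value if limit is None else truncate_utf8(xmp_value, limit)
            merged[prop] = xmp_value if candidate == iptc_value else iptc_value
    return merged
```

On write, `iptc_digest` would be recomputed over the new IPTC block and stored alongside it, so the next read sees a matching checksum.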

All Three Formats Used

  • If Exif and IPTC conflict, prefer Exif when the IPTC-IIM checksum matches or does not exist, and prefer IPTC when the checksum does not match
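
The Exif/IPTC tie-break can be expressed as a single decision on the IPTC checksum. In this hypothetical Python helper, the checksum is assumed to be the MD5 digest of the raw IPTC-IIM block, as in the IPTC/XMP rules above.

```python
import hashlib
from typing import Optional


def resolve_exif_iptc_conflict(exif_value: str, iptc_value: str,
                               iptc_block: bytes,
                               stored_digest: Optional[bytes]) -> str:
    """Pick between conflicting Exif and IPTC values for the same property."""
    if stored_digest is None:
        return exif_value                   # no checksum: Exif preferred
    if hashlib.md5(iptc_block).digest() == stored_digest:
        return exif_value                   # checksum matches: Exif preferred
    return iptc_value                       # checksum mismatch: IPTC preferred
```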

Text Encodings

  • XMP metadata must be read and written as Unicode in the form appropriate to the file. For JPEG, TIFF and PSD files this is UTF-8.
  • IPTC metadata must be read as UTF-8 if a 1:90 DataSet indicating UTF-8 is present. When written, it must be written as UTF-8 and must include a 1:90 DataSet indicating UTF-8.
  • Encoding for Exif fields is complex - see the document for details
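
The IPTC encoding rule is shown below as a small Python sketch. A 1:90 (Coded Character Set) DataSet whose value is the escape sequence ESC % G marks the data as UTF-8. When writing, the sketch emits that DataSet first and then encodes every text value as UTF-8. The function names are hypothetical. The Latin-1 fallback for files without a UTF-8 marker is an assumption, since such data has no declared encoding. Extended (over 32767-byte) DataSets are not handled.

```python
from typing import Optional

UTF8_MARKER = b"\x1b%G"   # ESC % G: ISO 2022 escape sequence meaning UTF-8


def decode_iptc_value(raw: bytes, coded_charset: Optional[bytes],
                      fallback: str = "latin-1") -> str:
    """Decode an IPTC value; coded_charset is the 1:90 DataSet value, if any."""
    if coded_charset == UTF8_MARKER:
        return raw.decode("utf-8")
    # Assumption: without a UTF-8 1:90 DataSet the encoding is unknown, so a
    # configurable legacy fallback (Latin-1 by default) is used.
    return raw.decode(fallback, "replace")


def _dataset(record: int, number: int, value: bytes) -> bytes:
    """Encode one standard (non-extended) IIM DataSet."""
    if len(value) > 0x7FFF:
        raise ValueError("extended DataSets are not handled in this sketch")
    return bytes([0x1C, record, number]) + len(value).to_bytes(2, "big") + value


def build_iim_block(datasets: list) -> bytes:
    """Write record-2 (record, dataset, text) tuples as UTF-8, preceded by 1:90."""
    out = bytearray(_dataset(1, 90, UTF8_MARKER))
    for record, number, text in sorted(datasets, key=lambda d: (d[0], d[1])):
        out += _dataset(record, number, text.encode("utf-8"))
    return bytes(out)
```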

Other Cases

  • Time-zone information must not be implicitly added and existing values should be preserved.
  • Keywords should be completely replaced when reconciling
  • Specific rules on keeping date/times in sync - see the document
  • Specific rules on Orientation, Rating, Copyright, Creator and Location - see the document

PHP Reader - Zend_Media - Zend_Io
