February 7th, Montreal

Louis-Philippe Huberdeau Monday 07 February, 2011

Several features are still to be desired in file galleries. Among others, alternate storage engines, using file galleries for attachments across Tiki and customizable views may be desired. Some features are partially available, like uploading files to public locations to avoid the overhead of permission checking for files part of the site.

In order to see what was possible and explore the current code, some low level refactoring was performed and included:

These changes left the code in a substancially better condition, leaving the library at just over 3000 lines of code. However, the process highlighted several flaws:

Less code paths

These multiple factors lead to inconsistencies in how Tiki behaves, breaking user expectations. Cleaner rules are needed to identify which files are valid and where they should be stored. One of the major issues while refactoring was the widely duplicate condiitions to identify where podcast gallery files should be stored. While this is mostly resolved at this time, there are other cases where this occurs, and other conditions where it should.

There are multiple entry points which are unavoidable and desired, including file upload, batch upload, attachment upload, webdav upload, batch import from directory, and such. Right now, each of those handles the full path to the storage. The initial refactoring brought some helpers to handle common parts, but the differences in handling could not be unified without breaking some existing behavior, no matter how wrong they are.

The code needs to transition to a state where only the input is identified and then moved along to a single code path to handle validation:

  1. Identify input as a wrapper
  2. Identify destination
  3. Collect file information (using fileinfo when available)
  4. Perform validation
  5. Store the file and reference
  6. Store meta-data (Optionally)
  7. Post-processing
    • Fill missing meta-data (default names and such)
    • Index content
    • Handle various notifications, watches, ...

This transition is a significant refactoring effort that will lead to some rules to some validation rules to change.

Re-think purposes

File galleries serve multiple purposes at this time.

All of these concepts serve different users and could be maintained separately. By using categories as the means for organization and navigation, a regular filesystem could be replicated, except that files could naturally live in multiple locations to serve different navigation requirements. The user would simply select the desired view, like a thumbnail view, just like it is done in operating systems. These views could be configured by administrators, but would remain independent from the individual galleries.

This would leave the physical properties to file galleries, leaving them as mere partitions determining where files are stored and the available capacity, a tool for administrators to handle system requirements. Files could be migrated between galleries over time without affecting the navigation. For example, old files that are not accessed frequently could be stored on a remote SAN with slower access times, but higher storage capacity. Public files could be stored directly in a gallery where the directory is web-accessible, avoiding the PHP overhead for images on the site. By detecting this at link creation, the correct link could be built when using the appropriate plugin.

This model is a significant change from the current implementation and is not without migration challenges. One of the earliest changes that could be performed is around removing the view properties from the galleries and introducing a separate view concept.

To combine with the effort to reduce the amount of code paths, the file gallery should probably optionally define a location to store files, removing this condition that is currently based on the file type.

Just files

One aspect where different decisions in the code must be made is around all of the meta-data surrounding files. When uploading a single file, adding a name and description field comes in naturally, but when uploading multiple files are uploaded or files are uploaded without a form, those gaps are left empty and different behaviors may be used. More broadly, a user may question the very use of having a different name and a description. Once the content is indexed, those serve a much smaller purpose. Depending on the usage, alternate meta-data may be preferable altogether.

The files should only contain the file relevant to the file itself. Other properties could be deferred to trackers, perhaps using specific values per file type.

If tracker attachments were stored in file galleries and benefited from the complete indexing capabilities, the result with separated concerns would not be much different. Files that do not need meta-data could simply use file galleries, those that do would go through trackers. Some user interface could be added to attach meta-data to an existing file, essentially creating a new tracker item and automatically adding the attachment.

Similarly, other attachments could be treated the same way. In order to make the attachments available through WebDAV, some category synchronization between the container object and the attachment would be required, or at least user interfaces to allow selecting them.


Permalink: https://dev.tiki.org/blogpost19-February-7th-Montreal