It feels like I’ve just attached an intergalactic infinity-drive engine to a steam train... should be an interesting ride!
So if you are developing a theme on server A and want to copy it to server B, it's no longer just about copying over the theme files, but also about reconfiguring the modules.
This is time-consuming and error-prone. Profiles are perfect for this, but building and maintaining profiles in wiki pages is time-consuming too. Now, thanks to the profile export, it becomes easier and faster.
For now, go to tiki-admin.php?page=profiles -> Export
And on the new site, tiki-admin.php?page=profiles -> Advanced -> Profile tester
The code for this is in r42346.
Tiki7 is yet to be released, but some important changes are still happening in trunk. Over the last few days, I rewrote most of the UI for comments. The objective was to use the same comments for trackers as for the rest of Tiki, but without having to handle those includes and global variables. The UI is now a full AJAX interface that only needs to be linked from a template to be enabled. Old tracker comments were converted, and tracker comments now benefit from threading, locking, archiving, moderation and all of the other features expected from comments.
The new AJAX comment interface was also deployed everywhere object commenting was used. This transition allowed us to clean up the old comments code that used to be shared between comments and forums (which still use the same database storage). From now on, comments.php and comments.tpl are only used for... forums. This should leave the code in a state that is much easier to understand and allow the forums to improve.
However, the biggest change is in how the comments are built. They are powered by a micro-MVC framework with the common controller/action paradigm. It handles content negotiation, so it will return JSON or HTML depending on what is requested. When HTML is requested, it uses a template matching the controller and action to render the data that would otherwise be sent to the client as JSON. If the call is made outside of an AJAX request, it will even render the response inside the tiki.tpl frame. Essentially, there is a full MVC hidden inside tiki-ajax_services.php.
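The dispatch described above can be sketched as follows, in Python rather than Tiki's PHP. Everything here is hypothetical: the `action_` method prefix, the template registry keyed by `controller/action`, and the `<html><body>` wrapper standing in for tiki.tpl are illustrations of the idea, not the actual tiki-ajax_services.php code.

```python
import json
from typing import Callable, Dict, Tuple


class CommentController:
    """Hypothetical controller: each action returns plain data, never markup."""

    def action_list(self, params: dict) -> dict:
        # A real controller would query the comments library; stubbed here.
        return {"object": params.get("object"), "comments": ["First!", "Nice page"]}


def render_comment_list(data: dict) -> str:
    items = "".join(f"<li>{text}</li>" for text in data["comments"])
    return f"<ul>{items}</ul>"


# Templates are looked up by convention from "controller/action".
TEMPLATES: Dict[str, Callable[[dict], str]] = {"comment/list": render_comment_list}


class Broker:
    def __init__(self, controllers: Dict[str, object],
                 templates: Dict[str, Callable[[dict], str]]) -> None:
        self.controllers = controllers
        self.templates = templates

    def process(self, controller: str, action: str, params: dict,
                accept: str = "text/html", is_ajax: bool = True) -> Tuple[str, str]:
        handler = getattr(self.controllers[controller], "action_" + action)
        data = handler(params)
        # Content negotiation: JSON callers get the raw data.
        if "application/json" in accept:
            return "application/json", json.dumps(data)
        body = self.templates[f"{controller}/{action}"](data)
        if not is_ajax:
            # Direct (non-AJAX) requests get the fragment wrapped in a full page frame.
            body = f"<html><body>{body}</body></html>"
        return "text/html", body


broker = Broker({"comment": CommentController()}, TEMPLATES)
```

Because the controller never produces markup, the same action serves the AJAX UI, a JSON client and a full-page request; only the broker decides the representation.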
What does this change?
Much of the controller code that traditionally lived in the PHP file is moving to controller classes, where helper methods can be created. Each action becomes an independent entity, reducing the dependencies between pieces of code and the initialization mess typically found. Each action also gets its own template, and those templates are separated into different directories by convention. Fewer files in the main folders might help retain new developers. We also get a JSON API for free, although it is not a formally defined API that will stay stable across releases.
The URLs are not pretty at this time; they would need rewrite rules if they end up being used significantly. The broker code can be called from other files as well, and the setup required is minimal.
Check it out in trunk:
Tiki8 is coming with a few minor changes. The permission definitions were moved out of the database, making maintenance and the addition of new properties easier. The category permission screen no longer lists the global-only permissions, and the object permission lists now also display additional permissions that can be applied to the object outside of its direct type. When permissions are made global, the correct list is also extracted, fixing long-standing issues with scope checking of permissions.
The newest change is the introduction of a Signal/Slot type of mechanism to handle events within Tiki. It will allow features to be connected through hooks rather than by going directly into the code to check the feature and add an extra call, a practice that leads to extremely long functions with sequential operations.
As a side benefit, it will also allow customizers to plug in new behaviors without affecting the core, making it easier to contribute and to maintain, for those cases where the feature is too specific to be contributed back.
So far, it has been partially deployed for wiki page update and creation. Much of the functionality in there still needs to be extracted, but the skeleton is in place. Events are organized in hierarchies, allowing for functionality that applies to all object types to be bound at a single location. For example,
$tikilib->update_page(...) will trigger tiki.wiki.update, which chains to tiki.wiki.save, which chains to tiki.save. Each event can have multiple functions or methods bound to it.
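The chaining can be sketched as a small dispatcher (in Python, not Tiki's PHP). The event names come from the text; the `EventManager` class, its `chain`/`bind`/`trigger` methods and the two bound handlers are hypothetical illustrations, not Tiki's actual API.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventManager:
    """Minimal signal/slot dispatcher with chained (hierarchical) events."""

    def __init__(self) -> None:
        self.parents: Dict[str, List[str]] = defaultdict(list)
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)

    def chain(self, event: str, parent: str) -> None:
        # After the handlers of `event` run, `parent` is triggered with the same arguments.
        self.parents[event].append(parent)

    def bind(self, event: str, handler: Handler) -> None:
        # Several handlers may be bound to one event; they run in binding order.
        self.handlers[event].append(handler)

    def trigger(self, event: str, args: dict) -> None:
        for handler in self.handlers[event]:
            handler(args)
        # Assumes the hierarchy is a tree: no cycles, no ancestor reached twice.
        for parent in self.parents[event]:
            self.trigger(parent, args)


# Central setup, in the spirit of lib/setup/events.php.
events = EventManager()
events.chain("tiki.wiki.update", "tiki.wiki.save")
events.chain("tiki.wiki.save", "tiki.save")

log: List[tuple] = []
events.bind("tiki.wiki.save", lambda args: log.append(("refresh backlinks", args["page"])))
events.bind("tiki.save", lambda args: log.append(("update search index", args["page"])))

# What an update_page() call would do once the page is written.
events.trigger("tiki.wiki.update", {"page": "HomePage"})
```

A handler bound to `tiki.save` runs for any object type chained into it, which is what lets cross-cutting behavior be bound at a single location.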
At this time, binding happens in a single location: lib/setup/events.php. This will change in the future to allow late event binding and speed up the setup process. However, since very little is bound so far, it's not much of an issue.
From Admin > Features > Programmer, you can see a graphical representation of the events in the system, updated live as they are deployed. The graph engine used by transitions was refactored to reduce code duplication and can now be used with very few lines of code.
Several features are still missing from file galleries. Among others, alternate storage engines, the use of file galleries for attachments across Tiki, and customizable views may be desired. Some features are partially available, like uploading files to public locations to avoid the overhead of permission checking for files that are part of the site.
In order to see what was possible and explore the current code, some low level refactoring was performed and included:
These changes left the code in a substantially better condition, leaving the library at just over 3000 lines of code. However, the process highlighted several flaws:
These multiple factors lead to inconsistencies in how Tiki behaves, breaking user expectations. Cleaner rules are needed to identify which files are valid and where they should be stored. One of the major issues while refactoring was the widely duplicated conditions used to identify where podcast gallery files should be stored. While this is mostly resolved at this time, there are other cases where such duplication occurs, and other places where the check should be applied but is not.
There are multiple entry points, which are unavoidable and desirable, including file upload, batch upload, attachment upload, WebDAV upload, batch import from a directory, and so on. Right now, each of them handles the full path to the storage. The initial refactoring brought some helpers to handle common parts, but the differences in handling could not be unified without breaking some existing behavior, however wrong that behavior may be.
The code needs to transition to a state where only the input is identified and then moved along to a single code path to handle validation:
This transition is a significant refactoring effort that will cause some validation rules to change.
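A minimal sketch of that target shape, in Python rather than Tiki's PHP: each entry point only wraps its input into a common record and hands it to one validation and storage path. `FileInput`, `GALLERY_RULES`, the size and extension rules, and the in-memory `STORED` list are all hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileInput:
    gallery_id: int
    name: str
    data: bytes
    source: str  # entry point that produced it, e.g. "upload", "webdav", "import"


class ValidationError(Exception):
    pass


# Hypothetical per-gallery rules; real ones would come from gallery settings.
GALLERY_RULES = {1: {"max_size": 1024, "extensions": {".png", ".jpg"}}}
STORED: List[FileInput] = []


def validate(item: FileInput) -> None:
    rules = GALLERY_RULES.get(item.gallery_id)
    if rules is None:
        raise ValidationError(f"{item.name}: unknown gallery {item.gallery_id}")
    if len(item.data) > rules["max_size"]:
        raise ValidationError(f"{item.name}: file too large")
    if os.path.splitext(item.name)[1].lower() not in rules["extensions"]:
        raise ValidationError(f"{item.name}: extension not allowed")


def store_file(item: FileInput) -> None:
    """The single code path every entry point ends up in."""
    validate(item)
    STORED.append(item)  # stand-in for writing to the storage backend


# Entry points only identify the input, then hand it over.
def from_form_upload(gallery_id: int, filename: str, content: bytes) -> None:
    store_file(FileInput(gallery_id, filename, content, "upload"))


def from_directory_import(gallery_id: int, files: Dict[str, bytes]) -> List[str]:
    errors = []
    for name, content in sorted(files.items()):
        try:
            store_file(FileInput(gallery_id, name, content, "import"))
        except ValidationError as exc:
            errors.append(str(exc))  # batch imports report errors instead of aborting
    return errors
```

Entry points still differ in how they report errors, but none of them can skip or reimplement a rule.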
File galleries serve multiple purposes at this time.
All of these concepts serve different users and could be maintained separately. By using categories as the means for organization and navigation, a regular filesystem could be replicated, except that files could naturally live in multiple locations to serve different navigation requirements. The user would simply select the desired view, like a thumbnail view, just like it is done in operating systems. These views could be configured by administrators, but would remain independent from the individual galleries.
This would leave the physical properties to file galleries, leaving them as mere partitions determining where files are stored and the available capacity, a tool for administrators to handle system requirements. Files could be migrated between galleries over time without affecting the navigation. For example, old files that are not accessed frequently could be stored on a remote SAN with slower access times, but higher storage capacity. Public files could be stored directly in a gallery where the directory is web-accessible, avoiding the PHP overhead for images on the site. By detecting this at link creation, the correct link could be built when using the appropriate plugin.
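A rough sketch of galleries reduced to storage partitions (in Python, not Tiki's PHP). The `public_url` setting and the `migrate` helper are hypothetical, and the download script name `tiki-download_file.php?fileId=` is assumed here for the non-public case. The link is chosen when it is built, and moving a file between galleries leaves its categories, and therefore its navigation, untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gallery:
    gallery_id: int
    storage_path: str                  # where the bytes live: local disk, a mounted SAN, ...
    public_url: Optional[str] = None   # set only when storage_path is web-accessible


@dataclass
class StoredFile:
    file_id: int
    name: str
    gallery: Gallery
    categories: tuple = ()             # navigation lives here, not in the gallery


def file_link(f: StoredFile) -> str:
    if f.gallery.public_url:
        # Served directly by the web server: no PHP, no permission check.
        return f"{f.gallery.public_url.rstrip('/')}/{f.name}"
    # Otherwise go through a permission-checked download script.
    return f"tiki-download_file.php?fileId={f.file_id}"


def migrate(f: StoredFile, target: Gallery) -> None:
    # Copying the bytes between storage paths is elided; categories stay untouched.
    f.gallery = target


public = Gallery(1, "/var/www/tiki/public", public_url="/public/")
archive = Gallery(2, "/mnt/san/archive")
logo = StoredFile(10, "logo.png", public, categories=("Branding",))
```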
This model is a significant change from the current implementation and is not without migration challenges. One of the earliest changes that could be performed is around removing the view properties from the galleries and introducing a separate view concept.
To combine with the effort to reduce the number of code paths, file galleries should probably be able to optionally define a location to store files, removing the current condition based on the file type.
One aspect where different decisions must be made in the code is all of the meta-data surrounding files. When uploading a single file, adding name and description fields comes naturally, but when multiple files are uploaded at once or files are uploaded without a form, those fields are left empty and different behaviors may be used. More broadly, a user may question the very point of having a separate name and description. Once the content is indexed, those serve a much smaller purpose. Depending on the usage, alternate meta-data may be preferable altogether.
Files should only carry the information relevant to the file itself. Other properties could be deferred to trackers, perhaps using specific values per file type.
If tracker attachments were stored in file galleries and benefited from the complete indexing capabilities, the result with separated concerns would not be much different. Files that do not need meta-data could simply use file galleries, those that do would go through trackers. Some user interface could be added to attach meta-data to an existing file, essentially creating a new tracker item and automatically adding the attachment.
Similarly, other attachments could be treated the same way. In order to make the attachments available through WebDAV, some category synchronization between the container object and the attachment would be required, or at least user interfaces to allow selecting them.
Tiki's source code is often criticized for not being object-oriented enough. This brings up several issues when adapting the code to new needs, chiefly code duplication, inconsistent behavior and growing functions that mix loosely coupled concepts. There have been multiple proposals in the past, which produced little to no results. The following factors may be the cause:
To this day, one of the most successful refactorings made in Tiki was the dynamic preferences, even if it required a massive effort and extended over multiple releases (hello pkdille!). The reasons that led to its success were:
Not all required changes are as easy to deploy or bring as direct a benefit to the end user, but changes that are purely aesthetic are almost guaranteed to fail to gain traction. Simply going 'object oriented' won't cut it. Object-oriented design methods are good, but they are neither a holy grail nor a goal to aim for. Great design patterns are built and described using object-oriented features, but implementing a pattern is not an objective either. The desired properties and behaviors of the code must be identified first; then a design can be selected.
The classes used as function libraries are not a bad thing in themselves. TikiLib is an issue because it is too large and contains completely unrelated functions, but the other ones provide a good overview of what is available for a given feature, something that is often crucially missing in well-designed object-oriented libraries. The concept is entrenched in Tiki, and thinking it will go away is wishful thinking.
The problems lie within those libraries, not in their existence. Tiki should provide better utilities for developers than it does now, but going for a shared parent class would only bring us back to the same issues we have right now with TikiLib.
There are a few common patterns in Tiki that lead to code smells. Targeting those patterns would lead to better code. It may be subtle, but the work began a long time ago.
The Unified Search project addresses the listing issue by providing a flexible search index interface that allows for customizable output formatting and advanced filtering. As we move forward, the need for these parameters will shrink, and perhaps we can begin to remove them.
The SQL query building part could easily be handled using database utility methods. It would reduce the number of errors made and simplify the code, and it can easily be deployed over time.
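As an illustration of such a utility method (in Python with the standard `sqlite3` module, not Tiki's PHP database layer), a hypothetical `build_select` helper turns a column-to-value map into a parameterized query, so callers stop concatenating SQL by hand. The `pages` table is a simplified, made-up schema.

```python
import sqlite3
from typing import Any, Dict, List, Sequence, Tuple


def build_select(table: str, columns: Sequence[str],
                 conditions: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a parameterized SELECT from a dict of column -> value.

    Values are always bound as parameters; table and column names are
    interpolated, so they must come from code, never from user input.
    """
    clauses: List[str] = []
    params: List[Any] = []
    for column, value in conditions.items():
        if isinstance(value, (list, tuple)):
            if not value:
                clauses.append("1 = 0")  # empty IN list: match nothing
                continue
            clauses.append(f"{column} IN ({', '.join('?' for _ in value)})")
            params.extend(value)
        elif value is None:
            clauses.append(f"{column} IS NULL")
        else:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Usage against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (pageName TEXT, lang TEXT)")
conn.executemany("INSERT INTO pages VALUES (?, ?)",
                 [("Home", "en"), ("Accueil", "fr"), ("Inicio", "es")])
sql, params = build_select("pages", ["pageName"], {"lang": ["en", "fr"]})
rows = conn.execute(sql + " ORDER BY pageName", params).fetchall()
```

The edge cases (empty lists, NULL comparisons) are handled once in the helper instead of being rediscovered in every list function.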
The pre and post behavior could be triggered through the SignalSlot pattern, allowing external code to hook into the save process without affecting the actual library. A data container could be sent around for the pre hooks to allow them to alter the data or interrupt the chain altogether. The cross-feature behaviors could be registered only once and apply where needed.
A drawback of this approach is that the code becomes harder to trace. As there are more dynamic bindings, knowing which pieces of code run becomes more of an issue. To mitigate this, the data container could contain a debug log for the different pieces of code to indicate that they ran and summarize what they did, allowing developers to look into what happened.
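The pre/post hooks, the data container and its debug log can be sketched together (in Python, not Tiki's PHP). `EventData`, `Hooks` and the three example hooks are hypothetical: a pre hook may alter the values or stop the chain before the library action runs, and every piece of code records what it did.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class EventData:
    """Container passed along the chain: hooks may alter values or stop the chain."""
    values: Dict[str, Any]
    stopped: bool = False
    log: List[str] = field(default_factory=list)  # debug trail of what ran

    def stop(self, reason: str) -> None:
        self.stopped = True
        self.log.append(f"stopped by {reason}")


Hook = Callable[[EventData], None]


class Hooks:
    def __init__(self) -> None:
        self.pre: Dict[str, List[Hook]] = {}
        self.post: Dict[str, List[Hook]] = {}

    def bind_pre(self, event: str, hook: Hook) -> None:
        self.pre.setdefault(event, []).append(hook)

    def bind_post(self, event: str, hook: Hook) -> None:
        self.post.setdefault(event, []).append(hook)

    def run(self, event: str, values: Dict[str, Any],
            action: Callable[[Dict[str, Any]], None]) -> EventData:
        data = EventData(dict(values))
        for hook in self.pre.get(event, []):
            hook(data)
            if data.stopped:
                return data  # the library action never runs
        action(data.values)
        data.log.append(f"{event}: saved")
        for hook in self.post.get(event, []):
            hook(data)
        return data


# Hypothetical cross-feature hooks, registered once.
def trim_title(data: EventData) -> None:
    data.values["title"] = data.values["title"].strip()
    data.log.append("trim_title: whitespace removed")


def block_spam(data: EventData) -> None:
    if "casino" in data.values["body"].lower():
        data.stop("block_spam: spam keyword")


def notify_watchers(data: EventData) -> None:
    data.log.append("notify_watchers: notification queued")


hooks = Hooks()
hooks.bind_pre("tiki.wiki.save", trim_title)
hooks.bind_pre("tiki.wiki.save", block_spam)
hooks.bind_post("tiki.wiki.save", notify_watchers)

saved: List[Dict[str, Any]] = []
ok = hooks.run("tiki.wiki.save", {"title": " Home ", "body": "Welcome"}, saved.append)
spam = hooks.run("tiki.wiki.save", {"title": "Deal", "body": "Best Casino"}, saved.append)
```

After a run, `ok.log` reads as a trace of every hook that fired, which addresses the traceability concern without a debugger.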
Forms are always defined in template files. The PHP file then collects the input in an ad hoc manner and converts it into the format expected by the library (which is close to the database in most cases). The library then pushes the data to the database. Better facilities are required to collect the data from the request, and in many cases the format taken by the library has to be standardized. Just like with list functions, the order of the arguments is arbitrary in most cases.
Tiki should provide functionality to support form validation, both on the PHP side and on the browser side. This could be done in a number of ways, but could be composed of form field configurations and smarty plugins to include within the forms. The additions should remain a toolkit rather than an obligation to allow for gradual deployment and preserve the flexibility.
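One possible shape for such a toolkit, sketched in Python rather than PHP/Smarty: a single hypothetical field configuration drives both server-side collection and validation (`collect`) and the rendering of inputs with HTML5 constraint attributes for the browser (`render_input`, standing in for a Smarty plugin). The field names and rules are made up.

```python
from typing import Any, Dict, Tuple

# Hypothetical field configuration, shared by server-side validation
# and the template helper that renders the inputs.
FIELDS: Dict[str, Dict[str, Any]] = {
    "title": {"type": "text", "required": True, "max_length": 40},
    "priority": {"type": "int", "min": 1, "max": 5, "default": 3},
}


def collect(request: Dict[str, str], fields=FIELDS) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Pull fields out of the request, convert them, and report errors per field."""
    values: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for name, conf in fields.items():
        raw = request.get(name, "").strip()
        if raw == "":
            if conf.get("required"):
                errors[name] = "required"
            else:
                values[name] = conf.get("default")
            continue
        if conf["type"] == "int":
            try:
                value = int(raw)
            except ValueError:
                errors[name] = "not a number"
                continue
            if not conf.get("min", value) <= value <= conf.get("max", value):
                errors[name] = "out of range"
                continue
        else:
            value = raw
            if len(value) > conf.get("max_length", len(value)):
                errors[name] = "too long"
                continue
        values[name] = value
    return values, errors


def render_input(name: str, fields=FIELDS) -> str:
    """Emit HTML5 attributes so the browser can apply the same rules."""
    conf = fields[name]
    attrs = [f'name="{name}"']
    if conf["type"] == "int":
        attrs.append('type="number"')
        for key in ("min", "max"):
            if key in conf:
                attrs.append(f'{key}="{conf[key]}"')
    else:
        attrs.append('type="text"')
        if "max_length" in conf:
            attrs.append(f'maxlength="{conf["max_length"]}"')
    if conf.get("required"):
        attrs.append("required")
    return "<input " + " ".join(attrs) + ">"
```

Both helpers are optional: a form can use `collect` alone, `render_input` alone, or neither, which keeps gradual deployment possible.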
Preserving the environment is always an issue. Compatibility has to be maintained, and expectations must be met. Developers have been used to flexibility in Tiki, and any framework that constrains what can be done (remember Magic?) is unlikely to gain support. The codebase is large and not everything can be done within a single release cycle.
It has been a while since the last report. Work has been going slowly. After the major milestones, I never really feel like starting a new one right away. First there was the feature-completeness of the engine, then the coding of most of the content sources and global sources I had on my list. The next big step was to test it all with real data. That's scary.
I was right to be scared. The first tests were inconclusive. Downright failures. Testing on a database dump from doc.tiki.org, the indexer first crashed. That does not start the day too well. To put things in perspective, that database is old. It has been upgraded for many years, starting long before I got involved in the project. In most cases, if something has been done in a twisted way, it was done there. When I rewrote the parser plugin, I tested it with the home page. That failed the first time too.
It turns out it was a minor issue. The redirect plugin would simply terminate the process, so it was just a matter of making sure that one would not execute. There were a few other issues related to various plugins. I had to fix countless notices around the code. Turns out I had plenty of time to do that, because the indexing was painfully slow. Not that much slower than I expected, but still slow enough to be annoying to test.
The execution was also far too long to profile: the disk filled up, and the resulting file could not be opened anyway. I was hoping to see a quick win that would make the rest of the testing faster. However, the little I saw on partial runs indicated that most of the time was actually spent indexing in Lucene, and not collecting data in the unified search part of the code. I didn't know whether it was a relief or a terrible technological decision at the time.
Some searching around led to the conclusion that the default values for how the index is built are very conservative. I was happy the indexing kept memory usage low, but more speed is a nice thing when you want to index data. There are three knobs you can play with in Lucene to adjust the trade-off between memory usage and speed. Sadly, none of them is expressed as a memory limit. It's just a matter of document counts and merge cycles. That makes life hard. I moved up some values and got better results, but it's not optimal yet. I hope I won't have to expose those settings in our admin panels. I barely understand them myself at this time.
Good news is, I was able to cut indexing time in half. A good first day of work.
I then tried to actually search the index. I shouldn't have done that before the weekend. It just timed out after 60 seconds. Profiling indicated a huge mess. I did not know where to start. The call counts in there made no sense at all. Terrible technological decision? The total index size was just over 2M at the time for approximately 2000 documents. I was really wondering why they even bothered to mention that the maximum index size on a 32-bit system was 2G.
To be continued.
Still in a hotel lobby. I hope this won't become a recurring theme. On the road from Berlin to Strasbourg, I figured out a way to reuse an existing piece of abstraction to achieve a design objective I was struggling with.
The objective was quite simple. The indexer gathers a whole lot of information and stores it in various fields to be indexed. Depending on the type, some of those fields can be retrieved and some cannot. You can configure the index to store a copy of all the data, but that has a huge impact on the index size and memory requirements. Zend_Search_Lucene is a PHP implementation and comes with several limitations: some fields are transformed to allow indexing anyway and cannot be reverted to their original form.
The objective was to be able to retrieve the information from the database on the fly for the results. Essentially, it's the same work the content sources and global sources do. The issue is that the format was not quite right. The sources return the data encapsulated in objects indicating their type, allowing different indexes to index them optimally. For example, multi-value fields like those used to index categories become a string of individual tokens generated by hashing values and replacing numbers, because Lucene does not seem to like those.
The solution was quite simple. Sources already used a factory to select the proper implementation to use based on a reduced list of supported value types. Really all that was needed was to provide a different factory that would only provide pass-through implementations and retrieve the value. Simple, but not that obvious. I was really scared I would have to duplicate code for this, but it turns out the sources did not require any change to retrieve the data.
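The trick can be sketched in Python (not Tiki's PHP) with two factories sharing one interface. All class names are hypothetical, and the SHA-1 token scheme below is only a stand-in for the hashing-and-number-replacement encoding mentioned above. The source declares value types through the factory and never changes; swapping the factory switches between lossy index values and raw display values.

```python
import hashlib
from typing import Dict, List


class IndexingFactory:
    """Turns typed values into forms the index handles well (possibly lossy)."""

    def plaintext(self, value: str) -> str:
        return value.lower()  # lossy: the original casing cannot be recovered

    def multivalue(self, values: List[str]) -> str:
        # Stand-in encoding: one letters-only token per value, so numeric
        # identifiers never reach the analyzer as numbers.
        return " ".join(self._token(v) for v in values)

    @staticmethod
    def _token(value: str) -> str:
        digest = hashlib.sha1(value.encode("utf-8")).hexdigest()
        return digest.translate(str.maketrans("0123456789", "ghijklmnop"))


class PassThroughFactory:
    """Same interface, but hands the values back untouched for display."""

    def plaintext(self, value: str) -> str:
        return value

    def multivalue(self, values: List[str]) -> List[str]:
        return list(values)


class WikiPageSource:
    """Hypothetical content source; a real one would read from the database."""

    PAGES = {"HomePage": {"title": "Home Page", "categories": ["1", "12"]}}

    def get_document(self, page: str, factory) -> Dict[str, object]:
        row = self.PAGES[page]
        # The source only declares value types; the factory decides the representation.
        return {
            "title": factory.plaintext(row["title"]),
            "categories": factory.multivalue(row["categories"]),
        }


source = WikiPageSource()
for_index = source.get_document("HomePage", IndexingFactory())
for_display = source.get_document("HomePage", PassThroughFactory())
```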
I implemented the design yesterday. Or maybe it was the day before. Can't tell. Now the unified search can display any information it indexes, allowing for really powerful formatting that does not require knowing where the information actually comes from.
I also added a value formatter to render any value as a link to the object. There wasn't really a way to link to the object before that. It did make the thing unusable, but it wasn't really critical. Anyway, going through there made me realize that the way URLs are generated really is inconsistent. There are two Smarty plugins: one function that is useless and one modifier that is really the only thing anyone should use. The category library also attempts to generate URLs, but entirely ignores any sefurl configuration you may have. It should use the modifier's implementation. But later. There are more fun issues to deal with.
Next step is to write a generic table formatter for the unified search, then I think it will be ready for massive implementation.
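A generic table formatter could look roughly like this (in Python, not Tiki's PHP; the function name and the `(field, label)` column format are hypothetical). It only knows field names, so results of different object types can share one table, and every value is HTML-escaped.

```python
import html
from typing import Dict, List, Sequence, Tuple


def format_table(results: List[Dict[str, object]],
                 columns: Sequence[Tuple[str, str]]) -> str:
    """Render search results as an HTML table.

    `columns` lists (field name, header label) pairs. Fields missing from a
    result (e.g. a status on a wiki page) render as empty cells.
    """
    header = "".join(f"<th>{html.escape(label)}</th>" for _, label in columns)
    rows = []
    for result in results:
        cells = "".join(
            f"<td>{html.escape(str(result.get(name, '')))}</td>" for name, _ in columns
        )
        rows.append(f"<tr>{cells}</tr>")
    return f"<table><tr>{header}</tr>{''.join(rows)}</table>"


results = [
    {"object_type": "wiki page", "title": "Home"},
    {"object_type": "trackeritem", "title": "Bug <1>", "status": "open"},
]
table = format_table(results, [("title", "Title"), ("status", "Status")])
```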
November 15th, sitting in a hotel lobby, waiting to move to the next location for even more discussions. There is so much done, and yet so much left to be done. I see plenty of ideas around, but my mind is mostly impermeable. I have a single focus and can't keep my thoughts away from it. I can see what I'm trying to reach. It's far away, but I can almost feel it. A few weeks back, all my thoughts aligned for the first time on resolving searches and listings. It has been years. I had heard countless problems. I had seen code crumbling. Heard stories of despair and sleepless nights attempting to fix yet another bug. Until now, I could not see a sustainable solution. Years.
I began working right away. We need to have a better content index. One index that can be efficient enough for advanced category filtering. One index that can provide WYSIWYCA listings without killing servers. One index that will lead to maintainable code we can grow with. One index to rule them all. Skip that last one.
The work is progressing. The last few weeks have been hard. It's cold out there in the abstract. But finally, the feature is now somewhat usable in trunk. It's still tiny, but it breathes.
I know it does not do much. It's only wiki pages at this time, but that is the least of my concerns, because adding more is just a matter of doing it. It's not even hard. All that is needed is to implement Search_ContentSource_Interface for each feature and add it to the indexer. Global features like categories will just work. No need for individual hacks in each plugin anymore, and the current filters are more powerful than any others.
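The division of labor can be sketched in Python as a loose analogue of Search_ContentSource_Interface. The method names `get_documents`/`get_document`, the `GlobalSource` and `Indexer` classes and the category data are assumptions for illustration, not Tiki's actual signatures. Adding a feature means adding one content source; global data such as categories is merged into every document without per-feature code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple


class ContentSource(ABC):
    """Loose analogue of Search_ContentSource_Interface (method names assumed)."""

    @abstractmethod
    def get_documents(self) -> Iterable[str]:
        """Return the ids of every object this feature wants indexed."""

    @abstractmethod
    def get_document(self, object_id: str) -> Dict[str, object]:
        """Return the indexable fields of one object."""


class GlobalSource(ABC):
    """Data that applies to every object type, such as categories."""

    @abstractmethod
    def get_data(self, object_type: str, object_id: str) -> Dict[str, object]:
        """Return extra fields for any object."""


class WikiSource(ContentSource):
    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages

    def get_documents(self) -> Iterable[str]:
        return sorted(self.pages)

    def get_document(self, object_id: str) -> Dict[str, object]:
        return {"title": object_id, "contents": self.pages[object_id]}


class CategorySource(GlobalSource):
    def __init__(self, assignments: Dict[Tuple[str, str], List[int]]) -> None:
        self.assignments = assignments

    def get_data(self, object_type: str, object_id: str) -> Dict[str, object]:
        return {"categories": self.assignments.get((object_type, object_id), [])}


class Indexer:
    def __init__(self) -> None:
        self.content_sources: Dict[str, ContentSource] = {}
        self.global_sources: List[GlobalSource] = []

    def add_content_source(self, object_type: str, source: ContentSource) -> None:
        self.content_sources[object_type] = source

    def add_global_source(self, source: GlobalSource) -> None:
        self.global_sources.append(source)

    def rebuild(self) -> List[Dict[str, object]]:
        documents = []
        for object_type, source in self.content_sources.items():
            for object_id in source.get_documents():
                doc: Dict[str, object] = {"object_type": object_type, "object_id": object_id}
                doc.update(source.get_document(object_id))
                # Global sources enrich every document, whatever its type.
                for global_source in self.global_sources:
                    doc.update(global_source.get_data(object_type, object_id))
                documents.append(doc)
        return documents


indexer = Indexer()
indexer.add_content_source("wiki page", WikiSource({"HomePage": "Welcome", "About": "Us"}))
indexer.add_global_source(CategorySource({("wiki page", "HomePage"): [1, 12]}))
index = indexer.rebuild()
```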
I don't know how long it will take. I don't know in which order it will happen. Call it development time dependency resolution.
Kind of short term
Want to help out? Got ideas?