In the trenches

December 14th, Montréal

Louis-Philippe Huberdeau Wednesday 15 December, 2010

It has been a while since the last report. Work has been going slowly. After a major milestone, I never really feel like starting the next one right away. First there was the feature-completeness of the engine, then the coding of most of the content sources and global sources I had on my list. The next big step was to test it all with real data. That's scary.

I was right to be scared. The first tests were inconclusive. Downright failures. Testing on a database dump from doc.tiki.org, the indexer first crashed. That does not start the day too well. To put things in perspective, that database is old. It has been upgraded for many years, starting long before I got involved in the project. In most cases, if something has been done in a twisted way, it was done there. When I rewrote the parser plugin, I tested it with the home page. That failed the first time too.

It turned out to be a minor issue. The redirect plugin would simply terminate the indexer, so the fix was to make sure that plugin would not execute during indexing. There were a few other issues related to various plugins, and I had to fix countless notices throughout the code. It turns out I had plenty of time to do that, because the indexing was painfully slow. Not that much slower than I expected, but still slow enough to be annoying to test.

The run was also far too long to profile: the profiler output filled up the disk, and I would not have been able to open a file that size anyway. I was hoping to see a quick win that would make the rest of the testing faster. However, the little I saw on partial runs indicated that most of the time was actually spent indexing in Lucene, and not collecting data in the unified search part of the code. At the time, I didn't know whether that was a relief or the sign of a terrible technological decision.

Some searching around led to the conclusion that the default values for how the index is built are very conservative. I was happy the indexing kept the memory usage low, but more speed is a nice thing when you want to index data. There are three knobs you can play with in Lucene to adjust the trade-off between memory usage and speed. Sadly, none of them is expressed as a memory limit. It's just a matter of document counts and merge cycles. That makes life hard. I moved up some values and got better results, but it's not optimal yet. I hope I won't have to expose those settings in our admin panels. I barely understand them myself at this time.
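To get a feel for why "document counts and merge cycles" matter, here is a toy model in Python. It is only a sketch of a simplified log-merge policy, not Lucene's actual code. The post does not name its three knobs, so `max_buffered_docs` and `merge_factor` are assumed names modelled loosely on commonly documented Lucene-style settings, and a third knob (a cap on merged segment size) is left out. `simulate_indexing` is a hypothetical helper.

```python
def simulate_indexing(total_docs, max_buffered_docs=10, merge_factor=10):
    """Toy model of segment flushing and merging (not Lucene's real algorithm).

    Documents are buffered in memory until max_buffered_docs is reached,
    then flushed to disk as one segment. Whenever the newest merge_factor
    segments have the same size, they are merged into a single segment,
    which means every document in them is written again.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")

    segments = []        # sizes (in documents) of on-disk segments, oldest first
    buffered = 0         # documents currently held in memory
    flushes = 0          # number of times the buffer was written to disk
    docs_rewritten = 0   # documents written again because of merges

    for _ in range(total_docs):
        buffered += 1
        if buffered == max_buffered_docs:
            segments.append(buffered)
            buffered = 0
            flushes += 1
            # Cascade merges while the newest merge_factor segments match in size.
            while (len(segments) >= merge_factor
                   and len(set(segments[-merge_factor:])) == 1):
                merged = sum(segments[-merge_factor:])
                del segments[-merge_factor:]
                segments.append(merged)
                docs_rewritten += merged

    if buffered:
        # Flush whatever is left in memory at the end of the run.
        segments.append(buffered)
        flushes += 1

    return {
        "flushes": flushes,
        "docs_rewritten": docs_rewritten,
        "segments": len(segments),
    }


if __name__ == "__main__":
    # Compare a small buffer with a larger one for the same document count.
    print(simulate_indexing(2000, max_buffered_docs=10, merge_factor=10))
    print(simulate_indexing(2000, max_buffered_docs=100, merge_factor=10))
```

In this model, a larger buffer means fewer flushes and less rewriting during merges, at the price of holding more documents in memory at once. That is the same shape of trade-off described above, though the real costs depend on document size and on how the library actually schedules its merges.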

Good news is, I was able to cut indexing time in half. A good first day of work.

I then tried to actually search the index. I shouldn't have done that before the weekend. It just timed out after 60 seconds. Profiling indicated a huge mess. I did not know where to start. The call counts in there made no sense at all. Terrible technological decision? The total index size was just over 2 MB at the time, for approximately 2000 documents. I was really wondering why they even bothered to mention that the maximum index size on a 32-bit system was 2 GB.

To be continued.
