Manticore Search

The community reaction, both in public and in private, has been very enthusiastic, so the proposal was accepted and work started in January 2022. We hit our first bug, but it was already fixed for the next version :-)


Latest code: https://gitlab.com/kroky/tiki/-/commits/feature/manticore/ (will be merged in Tiki 25)

Wow! Manticore 5.0 will have so many new features! https://github.com/manticoresoftware/manticoresearch/blob/master/manual/Changelog.md

Next step before 25.0 is released: update the integration to use Manticore 5.x (and, if easy, still support 4.2.1+).

What

This is a proposal to add Manticore Search support to Tiki. Manticore is a fork of Sphinx Search.

Why

As of 2021-11-28, we have 2 good options for the Unified Index:

  • MariaDB/MySQL Full Text Search is the default and built-in.
  • Elasticsearch for advanced projects

Details are here: Unified Index Comparison.

While MariaDB/MySQL Full Text Search is great for small projects, and is impressively fast, we hit limits on more advanced projects. For example, faceted search is not supported. We could add support, but the feature requests would just keep coming (Stored Search, “Did you mean?”, etc.).
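
To illustrate what is missing: faceted search returns, alongside the hits, a count of hits per value of a chosen field, so users can narrow the results. The following is only a minimal Python sketch of that idea, not how any of the engines implement it; the documents, the `type` field and the `facet_counts` helper are made up for illustration.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count how many documents carry each value of `field`."""
    counts = Counter()
    for doc in documents:
        value = doc.get(field)
        if value is not None:  # documents without the field are skipped
            counts[value] += 1
    return dict(counts)


# Hypothetical hits, assumed to be already filtered by the full-text query.
hits = [
    {"title": "Install guide", "type": "wiki page"},
    {"title": "Release notes", "type": "blog post"},
    {"title": "Upgrade guide", "type": "wiki page"},
]

print(facet_counts(hits, "type"))
```

A search engine with native facet support does this counting inside the index, which is why adding it on top of MariaDB/MySQL Full Text Search would be extra work on our side.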

Elasticsearch is great for big projects, but it requires beefy servers and can be complex to manage. Elasticsearch is also no longer available under an OSI-approved license, so a fork has emerged: OpenSearch. We can speculate that the two will try to stay close, to make it easy to support both (like MariaDB vs MySQL). However, sooner or later, we can also expect them to diverge.


This is a good time to re-explore our abstraction layer, and add support for Manticore.

Benefits of Manticore

Planned enhancements

  • Secondary indexes ➡️ higher performance.
  • Docstore for columnar attributes ➡️ higher performance.
  • Read-only listeners ➡️ better security.
  • Bulk insert/replace via HTTP JSON ➡️ higher performance.
  • Keepalive support in HTTP for multi-queries ➡️ ease of use.
  • Further columnar storage performance optimizations.
  • Making full-text optional. Manticore is not only about full-text, but still requires at least one full-text field in each index. It's time to change it.

Drawbacks

  • This is a major project. It will take away resources from other projects.

Risks


Questions

  • Is it possible to develop this without any risk to the current setup? That said, we don't want to maintain too many things in parallel in the long term.
  • Do we need extra conversion, or to publish a guide, for parameters like:
    • ft_min_word_len (or innodb_ft_min_token_size)
    • ft_stopword_file (or INNODB_FT_DEFAULT_STOPWORD)
    • ...
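
One possible starting point for such a guide is a small translation table. The Python sketch below assumes that Manticore's `min_word_len` and `stopwords` table settings are the closest counterparts of the MySQL/MariaDB variables listed above. That correspondence and the helper name `to_manticore_settings` are assumptions to verify against the Manticore manual, not an agreed design.

```python
# Assumed correspondence between MySQL/MariaDB full-text variables and
# Manticore table settings. Verify against the Manticore manual before use.
MYSQL_TO_MANTICORE = {
    "ft_min_word_len": "min_word_len",
    "innodb_ft_min_token_size": "min_word_len",
    "ft_stopword_file": "stopwords",
}


def to_manticore_settings(mysql_vars):
    """Split MySQL variables into translated settings and unmapped leftovers."""
    translated, unmapped = {}, {}
    for name, value in mysql_vars.items():
        target = MYSQL_TO_MANTICORE.get(name)
        if target is None:
            unmapped[name] = value  # needs a manual decision
        else:
            translated[target] = value
    return translated, unmapped
```

Anything that ends up in the unmapped part is exactly what a published guide would need to cover by hand.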

About revamping Unified Index

  • In the course of this project
    • Make it clear which engine supports what

Some notes from Victor (who led the analysis)

Did quite a review of their docs, sources and manual/courses. Forked from Sphinx and written in C++, which sounds like a win on the performance side.
The SQL search and indexing language is a huge win over Elasticsearch. Integration will be much easier. They even have SphinxSE-style extensions to store the index inside an existing MySQL database, but that's more of an edge case to explore.
I see they support sub-queries, which work somewhat like SQL joins, e.g. the ability to get all tasks with a timesheet not in category XYZ. This is a classic example of a join we currently don't support but Manticore seems to, which is very nice.
In terms of existing ES functionality, we seem mostly covered. Full text, range queries, boolean operators, geospatial search, faceted search, ranking, sorting, filtering: everything seems to be there. There is also an NLP module for lemmatization, stemming and other pre-processing, which might allow newer usages.
Our current MySQL index stores everything in one big table and has a huge performance overhead because of this. ES also stores all mappings in the index at once. Manticore, on the contrary, allows us to define a schema: different "tables" for different types of documents, which might open up even more use cases. We can potentially store per-tracker index tables, which makes searching even faster, support sub-queries and joins to retrieve data from different document types more easily, etc. I'd be excited to integrate that into Tiki if we have the chance to do that...
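
As a rough illustration of the per-type schema idea, the Python sketch below builds one `CREATE TABLE` statement per document type. The table and column names are hypothetical, not Tiki's actual index layout, and the statement form is assumed to match Manticore's SQL for real-time tables, so check it against the manual for the version in use.

```python
def create_table_sql(table, columns):
    """Build a CREATE TABLE statement from a {column: type} mapping."""
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
    return f"CREATE TABLE {table} ({cols})"


# Hypothetical schemas: one table per document type instead of one big table.
schemas = {
    "tiki_wiki": {"title": "text", "contents": "text", "modified": "timestamp"},
    "tiki_tracker_5": {"title": "text", "status": "string", "hours": "float"},
}

for table, columns in schemas.items():
    print(create_table_sql(table, columns))
```

Each table only carries the columns its document type needs, which is the contrast with the single wide table of the current MySQL index.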

1. We can use their PHP lib, but they also support an SQL query interface, which we already support with our MySQL search index, so bypassing their PHP lib would take little effort.
2. Federated search could be implemented relatively easily with Manticore Search. It uses a MySQL-like table per index and we can just search in multiple tables to combine results. Furthermore, we can even search in different Manticore servers and combine results. I think we are good to go here.
3. I don't think we need to have separate Manticore processes running on the same machine. We can do logical segmentation and use index prefixes for different clients. We can also do it the other way around (https://manual.manticoresearch.com/Creating_a_cluster/Creating_a_cluster): use a distributed index and load balancing to improve performance on high-traffic sites.
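
For point 2, combining results from several tables or servers could, in the simplest case, be done on the Tiki side. The Python sketch below merges result lists that are each already sorted by descending score. The hit structure and the `merge_results` helper are hypothetical, and it assumes scores from different sources are comparable, which would need checking in practice.

```python
import heapq
from itertools import islice


def merge_results(*result_lists, limit=10):
    """Merge result lists (each sorted by descending score) into one ranking."""
    # heapq.merge expects ascending input, so sort on the negated score.
    merged = heapq.merge(*result_lists, key=lambda hit: -hit["score"])
    return list(islice(merged, limit))


# Hypothetical hits from two tables (or two servers).
wiki_hits = [{"id": "wiki:1", "score": 9.0}, {"id": "wiki:2", "score": 3.0}]
blog_hits = [{"id": "blog:7", "score": 5.0}]

print([hit["id"] for hit in merge_results(wiki_hits, blog_hits)])
```

If the engine can combine the tables itself, as the notes above suggest, this client-side step is not needed.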

