URL Rewriting Revamp

2013-01-28: A PHP route.php file now replaces the rewrite rules from _htaccess for Tiki11 (http://sourceforge.net/p/tikiwiki/code/44661/).

Very important: adding to tiki log or error log so we catch them all through usage


At the TikiFest, we decided to Update all links to have friendly URLs

Let's attempt to solve the URL rewriting problem in Tiki once and for all. Here are the goals:

  1. Server-agnostic, enabling Tiki users to deploy on any web server.
  2. Reduce maintenance of server-specific config files (.htaccess vs web.config, etc.)
  3. Have great SEO
  4. Nice to read for humans
  5. To work with Canonical URLs
    • For existing ones (wiki, blog, articles) and future ones (forums, trackers, etc.)
  6. New feature: if something changes to a wiki page, article, blog post, etc. and this breaks the URL, make it an option (default on) to have a redirect 301
    • Articles & blogs: nothing needs to be done because it's well handled by canonicals
    • Wiki pages: could offer to append a page alias or something else?
  7. Be able to deal with a URL being chopped off accidentally (in an email message line breakage) or intentionally to shorten URL
    • Current SEFURL for articles and blogs make this possible
  8. Seamless upgrade path
    • All old URLs should have 301 redirects to the new ones
    • All currently accepted characters should continue to be. Ex.: a-z . _
  9. When a URL is broken (a site moves from another tech to Tiki), all info should be thrown to the search engine to try to find new URL for the old page
    • So this would bypass Apache error pages?
  10. Work for non-Latin character set
    • Test with Korean pages on doc.tiki.org
    • Tiki blog post with title "ä, ë, ï, ö, ü, ÿ Ä, Ë, Ï, Ö, Ü, Ÿ å, Å æ, Æ œ, Œ ç, Ç ð, Ð ø, Ø ¿ ¡ ß" becomes blogpost3-a-e-i-o-u-A-E-I-O-U-a-A-ae-AE-c-C-D-o-O-s
    • luci reported that in Czech language, "Most" and "mošt" are two very different things (a bridge and a cider). In the blog post, this is not a problem because there is a blogpost ID. But if you were making a glossary in Czech language, there would be a collision. This is an edge case. Perhaps Tiki could detect this and offer a disambiguation page?
      • I just checked in trunk on demo and there is no way to create "Most" and "mošt" as different pages. So a manual disambiguation page is already necessary.
  11. Pollution of URLs by sending relative links to nonexistent subdirectories
  12. Be consistent throughout Tiki features
  13. Non-ambiguous correspondence back and forth (Jyhem: can you explain?)
  14. all the rewriting could be done in index.php and use the database, so everyone could customise their sefurls to suit the site
  15. Performance
  16. Make it possible to have a new optional feature where URL is different than wiki page name, blog post title, etc. Ex.: blog title: "Best Practices for Optimizing URL Structure" : url: best-practice-url-structure
    • And this is used for the canonical
    • Someone could use this for the old URL before a migration to Tiki
  17. Should there be any link with Sitemap?
  18. Should be able to handle when
    • Tiki is not in the root directory
    • Other apps than Tiki are installed in the same root as Tiki or in a subdirectory
  19. Ideally, we'd like to make it possible to put everything that needs writable access in one directory (temp, templates_c, etc)
  20. Ideally, we'd like all site customizations to be done in one directory so you can easily see what is modified, without SVN. All .php/.tpl/.css/.png files there should "magically" override Tiki defaults. Historically, for a Tiki theme, you need to put some files in templates/styles/abc/* and styles/abc/* and styles/abc.css
    • custom/tiki-calendar.php
    • custom/templates/tiki-calendar.tpl
    • custom/templates/styles/abc/tiki-calendar.tpl
    • etc.
  21. Future-proof
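
Goals 5 to 7 fit together when a URL starts with a type and a numeric ID, as the current article and blog SEFURLs do: the ID alone identifies the object, so a chopped or outdated title can be answered with a 301 to the canonical URL. The sketch below only illustrates that idea. `parse_sef_segment` and `canonical_action` are hypothetical names, not existing Tiki functions, and the current slug is passed in as an argument instead of being looked up in the database.

```php
<?php
// Hypothetical helper: split an ID-prefixed segment such as
// "blogpost3-some-title" into its type, numeric ID and slug.
function parse_sef_segment(string $segment): ?array
{
    if (!preg_match('/^(article|blogpost)(\d+)(?:-(.*))?$/', $segment, $m)) {
        return null; // not an ID-prefixed URL (for example a wiki page name)
    }
    return ['type' => $m[1], 'id' => (int) $m[2], 'slug' => $m[3] ?? ''];
}

// Decide how to answer a request, given the slug currently stored for the object.
// Returns 'pass' (not our kind of URL), 'serve' (already canonical) or 'redirect'.
function canonical_action(string $segment, string $currentSlug): array
{
    $parsed = parse_sef_segment($segment);
    if ($parsed === null) {
        return ['action' => 'pass', 'location' => null];
    }
    $canonical = $parsed['type'] . $parsed['id'] . '-' . $currentSlug;
    if ($segment === $canonical) {
        return ['action' => 'serve', 'location' => null];
    }
    // Chopped or renamed title: the ID is still enough to find the object.
    return ['action' => 'redirect', 'location' => $canonical];
}
```

For a `redirect` result, the caller would send something like `header('Location: ' . $location, true, 301);`. Wiki pages have no numeric ID in the URL, so they would still need another mechanism such as the page alias mentioned in goal 6.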

Issues with status quo

  • While researching the possibility of using web servers other than Apache for deploying Tiki, we found the following post explaining that:
    “...according to RFC 3986 using ‘+’ outside of the scheme or query string in a URI as the Tiki shortlinks do is invalid...and Tiki is clearly at fault for generating non-compliant URIs...”

Steve Streeting wrote:

My involvement was just trying to get it working as best I could with the minimum of changes, and my hack was pretty nasty:


--- tiki-index.php.old 2012-05-06 17:26:12.000000000 +0100
+++ tiki-index.php 2012-05-20 12:06:13.000000000 +0100
@@ -98,6 +98,11 @@
 $info = null;
 
+// SJS Hack for Nginx compatibility
+// Nginx encodes '+' characters when going through rewrite, so replace them
+$_REQUEST["page"] = strtr($_REQUEST["page"], "+", " ");
+// End SJS Hack
+
 $structs_with_perm = array();
 if( $prefs['feature_wiki_structure'] == 'y' ) {
 $structure = 'n';

So basically, this undoes Nginx's (strictly correct) encoding of the '+' characters and puts them back as spaces, which then works when Tiki looks up the page. However, if any page name actually includes a REAL '+' character (say if a page was called 'C++'), then those pages would fail to be resolved, because we'd turn them into spaces - but we can't tell the difference by the time we receive them. There aren't that many pages with actual '+' characters in the name though, so this hack fixed more things than it broke. Good enough for government work ;)

A permanent fix which is compliant with the stricter rules Nginx is adhering to would be awesome though.

Cheers
Steve
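
The ambiguity Steve describes can be reproduced in a few lines. This is a minimal sketch: `sjs_hack` simply wraps the `strtr` call from the hack, and `decode_path_segment` is a hypothetical alternative that decodes a raw path segment with PHP's `rawurldecode`, which only decodes percent-escapes and leaves a literal '+' alone.

```php
<?php
// The hack described above: every '+' becomes a space.
function sjs_hack(string $page): string
{
    return strtr($page, '+', ' ');
}

// Hypothetical alternative: decode only percent-escapes, so '+' stays '+'.
function decode_path_segment(string $rawSegment): string
{
    return rawurldecode($rawSegment);
}

// "Tiki+page" resolves as intended with the hack...
var_dump(sjs_hack('Tiki+page') === 'Tiki page');
// ...but a page really named "C++" is turned into "C" plus two spaces.
var_dump(sjs_hack('C++') === 'C  ');
// Percent-decoding keeps both cases distinguishable.
var_dump(decode_path_segment('Tiki%20page') === 'Tiki page');
var_dump(decode_path_segment('C%2B%2B') === 'C++');
```

Strict decoding would stop treating '+' in existing short links as a space, so old URLs of the form Tiki+page would need a redirect to keep working, in line with the upgrade-path goal.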

Who

  • Gour
  • Marc Laporte
  • You?


What


Tiki's present problem with URL rewriting on non-Apache servers is nicely explained by Jean-Marc Libs in this post:

“More precisely, no RFC ever made + a *replacement* for space. + is actually an alternate, visually friendly *url-encoding* for space.

The classic, visually unfriendly one being %20.

So, if the Tiki page is called "Tiki page", the URL should be sent as tiki-index.php?page=Tiki+page or tiki-index.php?page=Tiki%20page (equivalent ref RFC). And that is what the web server should get, post rewriting rules.
If the page is called "Tiki+page", then the RFC says it should be tiki-index.php?page=Tiki%2Bpage

Since this is a very common misconception among developers, Apache does allow for replacing space with +.

For similar reasons, firefox will detect %20 or + in the url and display space instead. You can still see it's really %20 if you cut-paste the urls field of the browser to an editor.

Since Tiki confuses both spaces and +, Tiki does not allow + in page names. That's not a very good way of dealing with the issue.

Hope this makes things clearer.”
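
The encodings described in this post correspond to two standard PHP functions. As a small sketch (the wrapper names `page_query_url` and `page_query_url_rfc3986` are made up for illustration): `urlencode` produces the form-style encoding where a space becomes '+', and `rawurlencode` produces the RFC 3986 encoding where a space becomes %20. Both encode a literal '+' as %2B.

```php
<?php
// Form-style encoding: space -> '+', literal '+' -> '%2B'.
function page_query_url(string $page): string
{
    return 'tiki-index.php?page=' . urlencode($page);
}

// RFC 3986 encoding: space -> '%20', literal '+' -> '%2B'.
function page_query_url_rfc3986(string $page): string
{
    return 'tiki-index.php?page=' . rawurlencode($page);
}

echo page_query_url('Tiki page'), "\n";         // tiki-index.php?page=Tiki+page
echo page_query_url_rfc3986('Tiki page'), "\n"; // tiki-index.php?page=Tiki%20page
echo page_query_url('Tiki+page'), "\n";         // tiki-index.php?page=Tiki%2Bpage
```

On the decoding side, `urldecode('Tiki+page')` gives "Tiki page" while `urldecode('Tiki%2Bpage')` gives "Tiki+page", so the two page names stay distinct as long as the URL was encoded correctly in the first place.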


A possible solution, which would be server-agnostic and therefore much easier to maintain, was suggested by the developer of the Hiawatha server, who wrote (see this thread):

“But, I seriously think that there is a better way to solve your issue. Create a new PHP file, say route.php and rewrite every request for a non-existing file to that route.php. In route.php, you determine what other PHP file should have been called. At the end of route.php, simply include that PHP file. In other words, transfer the routing business logic to where it should be: inside your application and away from the webserver's configuration.”


Some participants in the tiki-devel thread expressed concern about backward compatibility for existing Tiki sites, and Hugo Leisink, the main developer of the Hiawatha webserver, addressed it with:

“That should also not be a concern, because I'm also very sure that it's perfectly possible to create a PHP script that has the same functionality as the URL rewrite rules.

The only thing current tiki-users have to do is to place the new route.php on their server and change the current URL rewriting rules with the rewrite-requests-for-non-existing-files-to-route.php one.”
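
The routing idea suggested here could look roughly like the following. This is a sketch under stated assumptions, not the route.php that was committed: the `ROUTES` table, the `resolve_route` function, and the script and parameter names in it are illustrative, and the path is assumed to be already relative to the Tiki directory.

```php
<?php
// Hypothetical route table: pattern => [target script, query parameter name].
const ROUTES = [
    '#^article(\d+)(?:-.*)?$#'  => ['tiki-read_article.php', 'articleId'],
    '#^blogpost(\d+)(?:-.*)?$#' => ['tiki-view_blog_post.php', 'postId'],
];

// Map a request path to the script that should handle it and its parameters.
function resolve_route(string $path): array
{
    // Decode percent-escapes only, so a literal '+' is preserved.
    $path = rawurldecode(trim($path, '/'));
    if ($path === '') {
        return ['script' => 'tiki-index.php', 'params' => []];
    }
    foreach (ROUTES as $pattern => [$script, $param]) {
        if (preg_match($pattern, $path, $m)) {
            return ['script' => $script, 'params' => [$param => $m[1]]];
        }
    }
    // Fallback: treat anything else as a wiki page name.
    return ['script' => 'tiki-index.php', 'params' => ['page' => $path]];
}

// Inside route.php this could be used roughly as follows (not executed here):
//   $path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH) ?: '/';
//   $route = resolve_route($path);
//   $_GET  = $route['params'] + $_GET;
//   include $route['script'];
```

Because the included script only ever comes from the fixed table or the fallback, the request itself cannot choose an arbitrary file to include. The web server then needs a single rule that sends requests for non-existing files to route.php.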

Related things to think about


When

Nice to have

Handling 404

Now that we have route.php, we could easily support

Ideal rewrite rules

https://www.google.com/search?q=best+practices+for+SEO+in+URLs

  • So hyphen is reportedly a better separator (Matt Cutts on hyphens).
  • Should we keep case or put to all lowercase?
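
To make the two questions above concrete, here is a minimal slug sketch with hyphens as separators and an optional lowercase step. `make_slug` and the tiny `TRANSLIT` table are assumptions for illustration only. A real table would cover far more characters, and this is not how Tiki currently builds its SEFURLs.

```php
<?php
// Hypothetical, deliberately tiny transliteration table (UTF-8 source file assumed).
const TRANSLIT = [
    'š' => 's', 'Š' => 'S', 'ä' => 'a', 'Ä' => 'A',
    'ö' => 'o', 'ü' => 'u', 'ß' => 's', 'ç' => 'c',
];

function make_slug(string $title, bool $lowercase = true): string
{
    // Replace the accented letters we know about with ASCII equivalents.
    $ascii = strtr($title, TRANSLIT);
    if ($lowercase) {
        $ascii = strtolower($ascii);
    }
    // Collapse every run of remaining non-alphanumeric bytes into one hyphen.
    $slug = preg_replace('/[^A-Za-z0-9]+/', '-', $ascii);
    return trim($slug, '-');
}

echo make_slug('Best Practices for Optimizing URL Structure'), "\n";
// best-practices-for-optimizing-url-structure
```

With lowercasing on, "Most" and "mošt" both become "most", which is the collision described in the goals. With lowercasing off they become "Most" and "most", which are distinct but differ only by case.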

Examples

Wiki

In trunk, we have a feature to prevent certain characters in page names. If you try, you get:
The page name specified contains unallowed characters. It will not be possible to save the page until those are removed: /?#[]@$&+;=<>
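
A check of this kind can be sketched in a few lines. The character list is copied from the message above, while `find_disallowed_chars` is a hypothetical name and not the function Tiki uses.

```php
<?php
// Characters listed in the error message above.
const DISALLOWED_CHARS = '/?#[]@$&+;=<>';

// Return the disallowed characters found in a page name (empty array if none).
function find_disallowed_chars(string $pageName): array
{
    $found = [];
    foreach (str_split(DISALLOWED_CHARS) as $char) {
        if (strpos($pageName, $char) !== false) {
            $found[] = $char;
        }
    }
    return $found;
}

var_dump(find_disallowed_chars('Tiki page')); // empty array: the name is accepted
var_dump(find_disallowed_chars('C++'));       // contains '+': the name is rejected
```

Every character in the list is an ASCII character that RFC 3986 reserves as a delimiter or excludes from URIs, which is why these characters cause trouble when a page name is placed directly in a URL path.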


We would want all variants

Blog


Perhaps example.com/test+page could become a search for those two terms and lead to example.com/test-page or search results?

Related links


alias
