Although the RFC (for example, http://www.robotstxt.org/wc/norobots-rfc.html) doesn't explicitly require a leading slash (/) before the page name, I have found that, as of late October 2005, many 'bots, including Googlebot, have started requiring one.
For example, before the change,
Disallow: tiki-pagehistory.php
would prevent well-behaved 'bots from trying to index tiki-pagehistory.php. After the change, however, I had to use:
Disallow: /tiki-pagehistory.php
in robots.txt; otherwise all of my page history would be indexed! I verified this in my server logs, and also by running Google searches against my site for phrases that appear only in page history. I have every reason to believe this problem affects all other TikiWiki-based sites as well.
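The slash matters because a crawler compares each Disallow value against the path of the requested URL, and that path always begins with "/". The Python sketch below assumes a crawler that applies the plain "any URL that starts with this value" prefix rule; the function name `is_disallowed` and the example.com URL are hypothetical, not taken from Googlebot or TikiWiki.

```python
from urllib.parse import urlsplit


def is_disallowed(url: str, disallow_values: list[str]) -> bool:
    """Return True if the URL's path (plus query) starts with any Disallow value.

    Hypothetical sketch of strict prefix matching: the path being compared
    always begins with "/", so a rule without a leading slash never matches.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # An empty Disallow value means "allow everything", so it never blocks.
    return any(value and target.startswith(value) for value in disallow_values)


url = "http://example.com/tiki-pagehistory.php?page=HomePage"
print(is_disallowed(url, ["tiki-pagehistory.php"]))   # False: the rule never matches
print(is_disallowed(url, ["/tiki-pagehistory.php"]))  # True: page history is blocked
```

Under this strict reading, "Disallow: tiki-pagehistory.php" blocks nothing at all, which matches the symptom described above.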
Others have noticed this. There is discussion in the forums at:
Putting a leading slash before all page references in robots.txt solved the problem. See http://ihuck.com/robots.txt (a TikiWiki site) and compare it with, for example, http://dupli.tikiwiki.org/robots.txt.
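For an existing robots.txt, the fix can be applied mechanically. The following is a minimal Python sketch, assuming a plain-text robots.txt where each Disallow rule sits on its own line; the helper name `add_leading_slashes` is hypothetical, and the /tiki-admin.php entry in the sample is only illustrative. Values that already start with "/" (or "*"), empty values, and comments are left unchanged.

```python
import re

# "Disallow:" (any case) followed by a value that does not already start
# with "/", "*", "#", or whitespace; group 1 keeps the field name and spacing.
_RELATIVE_RULE = re.compile(r"^(\s*disallow\s*:\s*)([^/*\s#].*)$", re.IGNORECASE)


def add_leading_slashes(robots_txt: str) -> str:
    """Prefix every relative Disallow value with "/", leaving other lines alone."""
    fixed = []
    for line in robots_txt.splitlines():
        match = _RELATIVE_RULE.match(line)
        fixed.append(match.group(1) + "/" + match.group(2) if match else line)
    return "\n".join(fixed) + "\n"


sample = "User-agent: *\nDisallow: tiki-pagehistory.php\nDisallow: /tiki-admin.php\n"
print(add_leading_slashes(sample), end="")
```

Only the relative rule gains a slash; the already-absolute rule passes through untouched, so running the script twice is harmless.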
I have many years of web development experience and more than two years of experience with PHP and TikiWiki, but almost no CVS experience. I'm happy to learn CVS and implement this solution, but first I'd like some feedback from the community on whether I have overlooked any reason not to make these changes. Thanks.
Assign this back to me and I'll start working on the changes (except for the third change, which a *.tw.o admin will need to make).