
Monitoring revamp

Context

The Tiki community manages several Tiki-powered sites (and a few non-Tiki sites, but those are not important for this page). Ref:

All these sites run on various servers and are managed by various people. When a site goes down or performs poorly, it is usually hard to find the root cause and fix it. The main problem is that we are somewhat blind to what is happening on these servers.

We do have a Zabbix server, but as of 2021-07-19 it reports a lot of noise. We know we need to move to better servers. And for some messages, what are we supposed to do? See the sample below:

Friday, 16 July 2021
(04:21) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: HTTP service is down on server.promo.suite.wiki
Trigger status: PROBLEM
Trigger severity: High
Trigger URL:
Item values:
1. HTTP service is running (server.promo.suite.wiki:net.tcp.servicehttp): Down (0)
Original event ID: 109333
(04:25) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: HTTP service is down on server.promo.suite.wiki
Trigger status: OK
Trigger severity: High
Trigger URL:
Item values:
1. HTTP service is running (server.promo.suite.wiki:net.tcp.servicehttp): Up (1)
Original event ID: 109333
(04:25) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: HTTPS service is down on server.promo.suite.wiki
Trigger status: OK
Trigger severity: High
Trigger URL:
Item values:
1. HTTPS service is running (server.promo.suite.wiki:net.tcp.servicehttps): Up (1)
Original event ID: 109334
(16:27) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: Processor load is too high on server.promo.suite.wiki
Trigger status: PROBLEM
Trigger severity: Warning
Trigger URL:
Item values:
1. Processor load (1 min average per core) (server.promo.suite.wiki:system.cpu.loadpercpu,avg1): 4.48
Original event ID: 109349
(16:31) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: Processor load is too high on server.promo.suite.wiki
Trigger status: OK
Trigger severity: Warning
Trigger URL:
Item values:
1. Processor load (1 min average per core) (server.promo.suite.wiki:system.cpu.loadpercpu,avg1): 4.27
Original event ID: 109349
Saturday, 17 July 2021
(02:08) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: Free disk space is less than 20% on volume /
Trigger status: PROBLEM
Trigger severity: Warning
Trigger URL:
Item values:
1. Free disk space on / (percentage) (server.promo.suite.wiki:vfs.fs.size/,pfree): 20 %
Original event ID: 109355
(02:11) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: Free disk space is less than 20% on volume /
Trigger status: OK
Trigger severity: Warning
Trigger URL:
Item values:
1. Free disk space on / (percentage) (server.promo.suite.wiki:vfs.fs.size/,pfree): 20 %
Original event ID: 109355
(04:22) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: HTTP service is down on server.promo.suite.wiki
Trigger status: PROBLEM
Trigger severity: High
Trigger URL:
Item values:
1. HTTP service is running (server.promo.suite.wiki:net.tcp.servicehttp): Down (0)
Original event ID: 109358
(04:22) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: HTTPS service is down on server.promo.suite.wiki
Trigger status: PROBLEM
Trigger severity: High
Trigger URL:
Item values:
1. HTTPS service is running (server.promo.suite.wiki:net.tcp.servicehttps): Down (0)
Original event ID: 109359
(04:26) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: HTTPS service is down on server.promo.suite.wiki
Trigger status: OK
Trigger severity: High
Trigger URL:
Item values:
1. HTTPS service is running (server.promo.suite.wiki:net.tcp.servicehttps): Up (1)
Original event ID: 109359
(21:44) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: Too many processes on tiki.suite.wiki
Trigger status: PROBLEM
Trigger severity: Warning
Trigger URL:
Item values:
1. Number of processes (tiki.suite.wiki:proc.num[]): 389
Original event ID: 109373
(21:44) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: Too many processes on tiki.suite.wiki
Trigger status: OK
Trigger severity: Warning
Trigger URL:
Item values:
1. Number of processes (tiki.suite.wiki:proc.num[]): 389
Original event ID: 109373
Monday, 19 July 2021
(00:18) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: HTTP service is down on tiki.suite.wiki
Trigger status: PROBLEM
Trigger severity: High
Trigger URL:
Item values:
1. HTTP service is running (tiki.suite.wiki:net.tcp.servicehttp): Down (0)
Original event ID: 109398
(00:18) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: HTTPS service is down on tiki.suite.wiki
Trigger status: PROBLEM
Trigger severity: High
Trigger URL:
Item values:
1. HTTPS service is running (tiki.suite.wiki:net.tcp.servicehttps): Down (0)
Original event ID: 109399
(04:22) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: HTTPS service is down on server.promo.suite.wiki
Trigger status: PROBLEM
Trigger severity: High
Trigger URL:
Item values:
1. HTTPS service is running (server.promo.suite.wiki:net.tcp.servicehttps): Down (0)
Original event ID: 109401
(04:22) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: HTTP service is down on server.promo.suite.wiki
Trigger status: PROBLEM
Trigger severity: High
Trigger URL:
Item values:
1. HTTP service is running (server.promo.suite.wiki:net.tcp.servicehttp): Down (0)
Original event ID: 109400
(04:22) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: HTTPS service is down on server.promo.suite.wiki
Trigger status: OK
Trigger severity: High
Trigger URL:
Item values:
1. HTTPS service is running (server.promo.suite.wiki:net.tcp.servicehttps): Up (1)
Original event ID: 109401
(04:23) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: HTTP service is down on server.promo.suite.wiki
Trigger status: OK
Trigger severity: High
Trigger URL:
Item values:
1. HTTP service is running (server.promo.suite.wiki:net.tcp.servicehttp): Up (1)
Original event ID: 109400
(05:34) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: HTTPS service is down on tiki.suite.wiki
Trigger status: OK
Trigger severity: High
Trigger URL:
Item values:
1. HTTPS service is running (tiki.suite.wiki:net.tcp.servicehttps): Up (1)
Original event ID: 109399
(05:37) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: Too many processes running on tiki.suite.wiki
Trigger status: PROBLEM
Trigger severity: Warning
Trigger URL:
Item values:
1. Number of running processes (tiki.suite.wiki:proc.num,,run): 51
Original event ID: 109406
(05:39) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: Processor load is too high on tiki.suite.wiki
Trigger status: PROBLEM
Trigger severity: Warning
Trigger URL:
Item values:
1. Processor load (1 min average per core) (tiki.suite.wiki:system.cpu.loadpercpu,avg1): 9.3
Original event ID: 109407
(05:45) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: Processor load is too high on tiki.suite.wiki
Trigger status: OK
Trigger severity: Warning
Trigger URL:
Item values:
1. Processor load (1 min average per core) (tiki.suite.wiki:system.cpu.loadpercpu,avg1): 1.506667
Original event ID: 109407
(06:42) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: Host information was changed on tiki.suite.wiki
Trigger status: PROBLEM
Trigger severity: Information
Trigger URL:
Item values:
1. System information (tiki.suite.wiki:system.uname): Linux tiki.suite.wiki 5.12.2-x86_64-linode144 #1 SMP Mon May 10 13:10:23 EDT 2021 x86_64
Original event ID: 109414
(07:41) bot.zabbix at diablo.montefuscolo.com.br: OK
---
Trigger: Host information was changed on tiki.suite.wiki
Trigger status: OK
Trigger severity: Information
Trigger URL:
Item values:
1. System information (tiki.suite.wiki:system.uname): Linux tiki.suite.wiki 5.12.2-x86_64-linode144 #1 SMP Mon May 10 13:10:23 EDT 2021 x86_64
Original event ID: 109414
(14:49) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM
---
Trigger: Free disk space is less than 20% on volume /
Trigger status: PROBLEM
Trigger severity: Warning
Trigger URL:
Item values:
1. Free disk space on / (percentage) (server.promo.suite.wiki:vfs.fs.size/,pfree): 20 %
Original event ID: 109416
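
Much of this noise is flapping: a trigger goes to PROBLEM and back to OK within a few minutes (for example event 109333, 04:21 to 04:25). A minimal Python sketch for measuring that from the bot's messages, assuming the plain-text layout shown above (a header line, a `Trigger:` line and an `Original event ID:` line per message) and that a PROBLEM and its OK fall on the same day; the 10-minute cutoff is an arbitrary illustrative value:

```python
import re
from dataclasses import dataclass

# Header of each bot message, e.g. "(04:21) bot.zabbix at diablo.montefuscolo.com.br: PROBLEM"
HEADER = re.compile(r"^\((\d{2}):(\d{2})\) bot\.zabbix at \S+: (PROBLEM|OK)$")
TRIGGER = re.compile(r"^Trigger: (.+)$")  # does not match "Trigger status:" etc.
EVENT = re.compile(r"^Original event ID: (\d+)$")


@dataclass
class Flap:
    event_id: int
    trigger: str
    minutes: int  # time between the PROBLEM and the matching OK


def find_flaps(lines, max_minutes=10):
    """Pair PROBLEM/OK messages by event ID and return those resolved quickly.

    Times are minutes since midnight, so pairs spanning midnight are skipped.
    """
    opened = {}  # event_id -> (minute_of_day, trigger)
    flaps = []
    status = minute = trigger = None
    for raw in lines:
        line = raw.strip()
        if m := HEADER.match(line):
            minute = int(m.group(1)) * 60 + int(m.group(2))
            status = m.group(3)
        elif m := TRIGGER.match(line):
            trigger = m.group(1)
        elif m := EVENT.match(line):
            event_id = int(m.group(1))
            if status == "PROBLEM":
                opened[event_id] = (minute, trigger)
            elif status == "OK" and event_id in opened:
                start, trig = opened.pop(event_id)
                duration = minute - start
                # Negative durations mean the pair crossed midnight; ignore them.
                if 0 <= duration <= max_minutes:
                    flaps.append(Flap(event_id, trig, duration))
    return flaps
```

Run over the log above, this would report, for instance, event 109333 (4 minutes) but not event 109399, which stayed open from 00:18 to 05:34.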



Is it:

  • A server issue?
  • A network issue?
  • A Tiki bug?
  • A Tiki misconfiguration?
  • A spike in traffic?
  • A PHP bug? We struggled for over a year with random server crashes, and in the end, this was the cause: https://bugs.php.net/bug.php?id=71135


Different issues require different skill sets to resolve.

We need better tools and processes. These will help:

  1. GlitchTip
    • Let's add a trigger on slow pages
  2. The upcoming Virtualmin + Debian 10 infrastructure: https://gitlab.com/wikisuite/virtualmin-installer
  3. Real User Measurement
  4. A monitoring solution that informs the right people in real time. We have selected Zabbix and NetData, and we need better integration with Tiki, while keeping it open-ended so that any monitoring system can be integrated.
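
To keep the integration open-ended, Tiki could raise each alert once and hand it to pluggable delivery backends. A minimal Python sketch of that idea; the `Alert` fields, the `AlertDispatcher` class and the sink callables are hypothetical names, not existing Tiki code:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Alert:
    source: str    # hypothetical, e.g. "tiki" or "tiki-manager"
    key: str       # hypothetical identifier, e.g. "tiki.opcache.free_ratio"
    severity: str  # e.g. "information", "warning", "high" (levels seen in Zabbix)
    message: str


# A sink is any callable that delivers an alert somewhere (Zabbix, NetData, XMPP, ...).
Sink = Callable[[Alert], None]


class AlertDispatcher:
    """Fan alerts out to every registered sink; one failing sink does not block the others."""

    def __init__(self) -> None:
        self._sinks: List[Sink] = []
        self.errors: List[str] = []

    def register(self, sink: Sink) -> None:
        self._sinks.append(sink)

    def emit(self, alert: Alert) -> int:
        """Deliver the alert to all sinks and return how many succeeded."""
        delivered = 0
        for sink in self._sinks:
            try:
                sink(alert)
                delivered += 1
            except Exception as exc:  # record the failure and continue with the next sink
                self.errors.append(f"{getattr(sink, '__name__', sink)}: {exc}")
        return delivered
```

A Zabbix sink, a NetData sink and an XMPP sink for the monitoring room would each be one callable passed to `register()`, so supporting another monitoring system would not require changing the code that raises alerts.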

Steps

1.1.1. Set up Zabbix (Fabio) and NetData (Horia)

  • Send alerts to the XMPP room: xmpp:monitoring@conference.wikisuite.chat
  • Include generic server monitoring (disk space, CPU, etc.)
  • Is OPcache covered? We want to avoid this message in Tiki: "Little memory available. Thrashing likely to occur. The values to increase are apc.shm_size (for APC), xcache.size (for XCache) or opcache.memory_consumption (for OPcache)." If Zabbix doesn't handle this well, we should add a Tiki-specific alert.
  • Use NetData in Virtualmin
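
For the OPcache question, the relevant numbers come from PHP's `opcache_get_status()`, whose `memory_usage` entry reports `used_memory`, `free_memory` and `wasted_memory` in bytes. A minimal Python sketch of the check, assuming those values have been exported (for example as JSON by a small PHP script) and using a hypothetical 10% free-memory threshold rather than the one Tiki uses internally:

```python
def opcache_memory_status(memory_usage, min_free_ratio=0.10):
    """Classify OPcache memory from the `memory_usage` part of opcache_get_status().

    `memory_usage` is assumed to be a dict with byte counts for used_memory,
    free_memory and wasted_memory; their sum is treated as the cache size.
    `min_free_ratio` is a hypothetical threshold.
    """
    used = memory_usage["used_memory"]
    free = memory_usage["free_memory"]
    wasted = memory_usage["wasted_memory"]
    total = used + free + wasted
    free_ratio = free / total if total else 0.0
    # Report in the same PROBLEM/OK vocabulary as the Zabbix bot.
    status = "PROBLEM" if free_ratio < min_free_ratio else "OK"
    return status, round(free_ratio, 3)
```

A PROBLEM result here would correspond to the situation where raising `opcache.memory_consumption` is suggested.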


1.1.2. Improve Tiki code to provide Tiki-specific alerts to Zabbix and NetData (help needed)
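
NetData can collect custom metrics through external plugins, which print a simple text protocol on stdout: `CHART` and `DIMENSION` lines define a chart once, then `BEGIN`, `SET` and `END` lines push each update. A minimal Python sketch of formatting Tiki-specific values in that protocol; the chart id `tiki.health` and metric names such as `scheduler_failures` are hypothetical, and how Tiki would compute them is left open:

```python
def netdata_chart(chart_id, title, units, dims, update_every=60):
    """Return the CHART/DIMENSION lines that define one chart for a NetData external plugin.

    chart_id is "type.id"; the empty name means NetData shows the id.
    Family "tiki", line chart, priority 1000; every dimension uses the
    'absolute' algorithm with multiplier 1 and divisor 1.
    """
    lines = [f'CHART {chart_id} "" "{title}" "{units}" tiki {chart_id} line 1000 {update_every}']
    lines += [f"DIMENSION {dim} {dim} absolute 1 1" for dim in dims]
    return lines


def netdata_sample(chart_id, values):
    """Return the BEGIN/SET/END lines that push one set of integer values for a chart."""
    lines = [f"BEGIN {chart_id}"]
    lines += [f"SET {dim} = {int(value)}" for dim, value in values.items()]
    lines.append("END")
    return lines
```

A plugin would print the `netdata_chart()` lines once at startup, then the `netdata_sample()` lines every `update_every` seconds, flushing stdout after each batch. For Zabbix, the same values could instead be pushed with `zabbix_sender` to trapper items.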




Here is some code, which needs a review and a revamp:


1.1.3. Improve Tiki Manager code to provide specific alerts to Zabbix and NetData (help needed)


1.1.4. Real User Measurement

  • Real User Measurement
  • tiki-performance-stats.php will provide a list of the slowest pages, so we can focus our energy on them.
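
Ranking pages by the 95th percentile of their load time, rather than the mean, reflects what the slower fraction of visitors actually experiences. A minimal Python sketch, assuming Real User Measurement yields `(page, load time in ms)` pairs; this is not how tiki-performance-stats.php works, and `min_samples` is an illustrative cutoff that skips pages with too few measurements to judge:

```python
from collections import defaultdict


def p95(samples):
    """95th percentile using the nearest-rank method (integer arithmetic for the rank)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def slowest_pages(measurements, top=10, min_samples=5):
    """Rank pages by p95 load time from (page, milliseconds) pairs.

    Returns (page, p95_ms, sample_count) tuples, slowest first.
    """
    by_page = defaultdict(list)
    for page, ms in measurements:
        by_page[page].append(ms)
    ranked = [
        (page, p95(times), len(times))
        for page, times in by_page.items()
        if len(times) >= min_samples
    ]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked[:top]
```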




