Discussion around the Zabbix configuration

This is the page where we hold the debate on the specific final configuration of the distributed monitor based on Zabbix. Feel free to edit pros and cons, add comments, etc, but please be sure to tag your input with your name and the date.

The ground rules

  1. The whole point of the monitor is to know what's going on at various levels at each site, in a way that is site-independent - meaning that every site has to be configured in exactly the same way.
  2. Input is appreciated if it will contribute to
    • maintainability
    • stability
    • functionality of the monitor. Otherwise you are just making noise or being polemical ;-)
  3. Keep an eye to the future and always imagine the point of view of a new site administrator who will have to take over from you one day knowing nothing about the previous discussion.
  4. Be sure to provide evidence in support of your position or comment
  5. Bear in mind that any change is to be borne by everyone
  6. From time to time, votes will be held, but in the interest of order, the coordinator (BruceBecker) will arbitrate.

Which operating system to use?

An initial debate sprang up around which operating system to base the monitor on. The original choice posed (around June) was between Ubuntu and CentOS. TimothyCarr had experience with setting up Zabbix on Ubuntu, while StavrosLambropoulos volunteered to implement a reference installation and configuration on CentOS. A vote was taken to decide whether to use Ubuntu or CentOS, and CentOS won. At that point HannesKriel suggested that we use the OS which is standard for all our grid services (SL). It should be noted, however, that the remaining choice, between CentOS and SL, is in any case between two very compatible RPM-based distros. Extracts of some of the discussion:

(from Sean) I(iThemba) vote with you Hannes for SL.M

(from Sergio) I also think that Hannes' arguments make a lot of sense... but I'm just wondering whether the differences between SL and CentOS are actually relevant. It may be more important to decide which kernel release to choose, make sure everybody updates at the same time, and use the same zabbix binaries. The SL releases tend to come a little bit later than CentOS if I'm not mistaken.

Coordinator's word

The balance between standardisation and reliability is most important. For now, it seems that the CentOS repositories are more reliable, and very compatible with SL. I don't see a strong motivation to change from CentOS; however, if an SL implementation can be proven to be similar to the CentOS installation, I would consider a change in the standard OS.

How to maintain configurations across updates

There was some discussion of how to deal with package updates. SeanMurray suggested going with an RPM-based approach, possibly invoking a custom script pre- or post-installation. However, this was countered by SergioBallestrero:

(From SeanMurray)

> I have an SL rpm for zabbix 1.6.5 which installed fine its a modified
> version of the zabbix rpm on epel. So that does the agent and server
> hence agents on nodes is not a problem.
>
> I would propose someone (?) to extend it to include the config
> settings inside the rpm or via a shell script (I vote former) to preconfig
> the entire software.

(From SergioBallestrero) hmm.. I'm not convinced. RPMs tend not to work well for configs, especially with automatic updates. If you just touch the file (say, add an innocuous space while viewing in an editor), the new one remains as .rpmnew, and you may not notice that you need to manually intervene. If you don't mark it as config, then it just gets overwritten - and you may lose actually important changes.
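The risk Sergio describes, a leftover .rpmnew file that nobody notices, could be caught with a periodic check. The following is only a minimal sketch: the function name find_rpm_leftovers and the directory /etc/zabbix are assumptions for illustration, not part of any agreed setup. It also looks for .rpmsave files, which rpm writes when it replaces a locally modified config file.

```python
import os

# Suffixes rpm uses when a packaged config file and a locally edited one diverge.
LEFTOVER_SUFFIXES = (".rpmnew", ".rpmsave")


def find_rpm_leftovers(root):
    """Return a sorted list of files under root that still need a manual merge."""
    leftovers = []
    # os.walk silently yields nothing if root does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(LEFTOVER_SUFFIXES):
                leftovers.append(os.path.join(dirpath, name))
    return sorted(leftovers)
```

Calling find_rpm_leftovers("/etc/zabbix") (path assumed) after each update and alerting on a non-empty result would at least make the need for manual intervention visible. It does not solve the underlying problem of configuration drifting between sites.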

How uniform can we actually make the sites? Speaking from the iThemba point of view, I have 2 3Com switches, 1 Cisco router, 1 NTU and a pfSense firewall between my grid services and the rest of the world. Are we going to monitor all of them, together with the standard grid services? If so, all sites will vary greatly. If not, then we should be able to write something to parse the site-info.def file and pick out the relevant machines to put into Zabbix? Or am I talking absolute junk? -- Main.SeanMurray - 07 Aug 2009
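As a rough illustration of the parsing idea in the comment above, the sketch below pulls host names out of the text of a site-info.def file. It rests on several assumptions: that the file consists of simple shell-style NAME=value lines, that the machines of interest are held in variables whose names end in _HOST, and that values referring to other variables (containing $) can be skipped rather than expanded. The function name hosts_from_siteinfo and the example host names are hypothetical.

```python
import re

# Matches simple shell-style assignments such as CE_HOST=ce.example.org
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def hosts_from_siteinfo(text):
    """Map variable names ending in _HOST to their literal values."""
    hosts = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        name, value = match.groups()
        value = value.strip().strip('"\'')
        # Assumption: skip values that reference other variables,
        # since this sketch does not do shell expansion.
        if name.endswith("_HOST") and value and "$" not in value:
            hosts[name] = value
    return hosts
```

The resulting dictionary could then be used to generate the host entries for Zabbix. Network equipment such as switches and firewalls does not appear in site-info.def, so it would still have to be added by hand at each site.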

Coordinator's word

BruceBecker thinks that the configuration should not be distributed as an RPM - there is too much risk of losing configuration across updates. The best-case scenario is a virtual appliance, pre-configured in a single instance and distributed atomically to all sites simultaneously.

-- SeanMurray - 07 Aug 2009 But this ultimately was why gLite chose to standardise on an OS? If one distributes a script (inside or outside an RPM) that pulls everything from either the site-info.def or from LDAP queries to the site BDII, one could automate it without requiring too much standardisation in names and physical setup? Or am I suffering the effects of a late night working?

-- BruceBecker - 08 Aug 2009 The software is distributed as an RPM for gLite, and the dependencies are managed in a metadata repository - first apt, then yum. However, the configuration is only done through YAIM. So, if we want to be consistent, we should provide the YAIM scripts and targets to configure Zabbix. However, this leaves us open to other problems. I'm still convinced that the monitor should be distributed and configured in a way entirely independent of the middleware which it is supposed to monitor (amongst other things).

Autodiscovery functionality

SeanMurray considers that the functionality of Zabbix's autodiscovery is inferior to that of Zenoss. Nothing more to add on this point so far. -- BruceBecker - 07 Aug 2009

-- SeanMurray - 07 Aug 2009 This is effectively a moot point, as the discovery capabilities are irrelevant to our problem. It will be manually preconfigured?

Topic revision: r3 - 08 Aug 2009 - 07:50:17 - BruceBecker
 