Standard Operating Procedures for SAGrid.
Coordination
Announcements and Communication
Communication is fundamental to the proper functioning of the grid services and to the user experience. To properly channel information to the right place at the right time, please follow the communications SOP below.
Site intervention and downtime announcements
Downtimes or interventions on any service at your site must be announced through the relevant channel, with at least the minimum notice period given below.
| Incident | Channel (1) | Minimum notice period |
| service failure | AfricaROC | immediately |
| site service restart | SAGrid Ops mailing list | 30 minutes |
| planned site downtime | AfricaGrid ROC | 1-2 days |
| service upgrade | SAGrid Ops mailing list | 1 week |
| new VO SLA | GGUS, AfricaGrid ROC | 1 month |
(1) ROC: AfricaGrid Regional Operations Centre, https://roc.africa-grid.org. AfricaGrid GOCDB: Regional Operations Centre Database for AfricaGrid.
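As a rough illustration of how the table above could be applied, the following Python sketch looks up the channel and minimum notice period for an incident type and computes the earliest time an intervention may start. It is not part of any SAGrid tool: the function name and dictionary keys are hypothetical, the lower bound of the "1-2 days" range is used, and "1 month" is approximated as 30 days.

```python
from datetime import datetime, timedelta

# Channel and minimum notice per incident type, copied from the table above.
# Assumptions: "1-2 days" uses the lower bound (1 day); "1 month" is taken as 30 days.
NOTICE = {
    "service failure": ("AfricaROC", timedelta(0)),
    "site service restart": ("SAGrid Ops mailing list", timedelta(minutes=30)),
    "planned site downtime": ("AfricaGrid ROC", timedelta(days=1)),
    "service upgrade": ("SAGrid Ops mailing list", timedelta(weeks=1)),
    "new VO SLA": ("GGUS, AfricaGrid ROC", timedelta(days=30)),
}


def earliest_start(incident, announced_at):
    """Return (channel, earliest permitted start) for an announcement.

    Hypothetical helper: raises KeyError for incident types not in the table.
    """
    channel, notice = NOTICE[incident]
    return channel, announced_at + notice


if __name__ == "__main__":
    channel, start = earliest_start("service upgrade", datetime(2011, 10, 3, 8, 0))
    print(f"Announce via {channel}; do not start before {start.isoformat()}")
```

A service failure has zero notice in this sketch because the table requires it to be reported immediately rather than planned ahead.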
Site Admin availability or replacement
Meeting scheduling
Procedure to schedule site operations meetings.
Issue reporting, routing and management
Procedure to report issues, route and assign them to the responsible support unit, and manage them.
Checklists
Site Operator On Shift Checklist
Core Services Operator On Shift Checklist
- StartOfCoreServicesShiftChecklist
- Service Checklists
- WMSChecklist
- BDIIChecklist
- LFCChecklist
- AMGAChecklist
- EndOfShiftChecklist
User Support On Shift Checklist
Application Support On Shift Checklist
Site Deployment and integration
Site deployment is usually done only once and should follow the best practices of the federation as far as possible. Deployment should follow the outline in SiteDeploymentSOP.
Follow the EMI middleware deployment guide.
Site service testing
Site BDII
Compute element
Storage element
Site integration
- Add a line for the site to bdii.conf
- Check whether the site appears in the top-BDII
- Add the site to the GOCDB
- Add the services to the site entry in the GOCDB
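The first two integration steps can be sketched in Python as follows. This is only a sketch under stated assumptions: the site name `ZA-EXAMPLE` and the host names are hypothetical, the `bdii.conf` line format (site name followed by the site BDII LDAP URL) is assumed from common top-BDII conventions and may differ in your deployment, and port 2170 is the conventional BDII port. The code only builds the strings; it does not contact any server.

```python
def site_bdii_url(site_name, bdii_host, port=2170):
    """LDAP URL of a site BDII (assumed conventional form)."""
    return f"ldap://{bdii_host}:{port}/mds-vo-name={site_name},o=grid"


def bdii_conf_line(site_name, bdii_host):
    """Line to add to bdii.conf; the format is an assumption, check your top-BDII."""
    return f"{site_name} {site_bdii_url(site_name, bdii_host)}"


def top_bdii_check_command(site_name, top_bdii_host, port=2170):
    """ldapsearch argument list that looks up the site's entry in a top-BDII."""
    base = f"mds-vo-name={site_name},mds-vo-name=local,o=grid"
    return [
        "ldapsearch", "-x", "-LLL",                 # simple bind, terse LDIF output
        "-H", f"ldap://{top_bdii_host}:{port}",
        "-b", base,
        "-s", "base",                               # only the site entry itself
        "1.1",                                      # return the DN, no attributes
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(bdii_conf_line("ZA-EXAMPLE", "bdii.example.org"))
    print(" ".join(top_bdii_check_command("ZA-EXAMPLE", "top-bdii.example.org")))
```

If the printed `ldapsearch` command returns the site's DN when run against the top-BDII, the site is being published; an empty result suggests the `bdii.conf` entry has not been picked up yet.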
Site Upgrade
SAGrid Operations Meetings
As a member of the SAGrid Operations team, you are required to participate in the weekly meetings and to provide input on the issues at your site.
Reference Documentation and Manuals
How to use these references
It's a big, complicated world and things can seem confusing sometimes. The documentation linked below is under permanent construction as the usage and technology of the grid change. It is not even uncommon to find blatant conflicts between what one set of documentation or reference suggests and what another suggests. Here are some guiding principles to help you through the confusion.
Federation and Interoperability : Site Priorities
The shared and collaborative nature of the grid means that your site and the services that it provides are often used by many different groups of people, which results in some conflicting requests. You should take decisions and apply procedures based on the priorities of your site. These are generally hierarchical:
- Your site should be configured and maintained as closely as possible to the way the other sites in SAGrid are.
Grid Infrastructure Reference Documentation
- EGEE Operational Procedures for ROCs and Sites
- EGEE Operational Procedures for Regional Operator on Duty
- EGI Operations Manuals
- EGI Operations Best Practices
- EUMedGrid Site Deployment Guide
Middleware and Service Reference Documentation
-- BruceBecker - 01 Aug 2011
Topic revision: r7 - 03 Oct 2011 - 08:42:10 - BruceBecker