02 August 2004 1
OraMonPlans 08/04
Topics
Enhancements
– OraMon DB redundancy layer
– Compare and fix OraMon configurations
– Expiry of historical data
– Saving disk space
OraMonArch
Bugs
Others
– OraMon OO development with Together
– OraMon changes for Maciej’s alarm interfacing system?
OraMon DB redundancy layer
Requirements:
1. OraMon should retry connecting after losing the DB connection
   Currently (as of OraMon 0.0.3), upon DB connection failure, OraMon issues a [FATAL] log (+ failure kind) and stops
2. OraMon should support a ‘Do(Not)InsertSamples’ command
   Currently, OraMon inserts or does not insert samples according to the value of the environment variable MR_READONLY
3. OraMon should have a ‘HeartBeat’ command
   Currently, one may check whether an OraMon instance is alive by issuing an MR API query to it (via lemon-utils/lemon-cli.pl).
Approaches to satisfy ‘retry connect’ and ‘Do(Not)InsertSamples’:
– ‘External’: (set some variables and) start OraMon
  Pros: Simple to implement, no internal changes to OraMon
  Con: A few minutes of downtime
– ‘Internal’: Change OraMon to satisfy the requirement by adding specific code
  Pros and cons are the opposite of those of ‘External’
OraMon DB redundancy layer
‘External’ solutions:
1. Retry connect after losing DB connection
   A simple (restart-oramon like) service that issues:
   /etc/rc.d/init.d/OraMon start
   after OraMon stops, if the ‘failure kind’ belongs to a TBD failure set.
2. ‘InsertSamples’ command to OraMon
   Restart OraMon after un/setting MR_READONLY:
   • Do insert: unset MR_READONLY ; /etc/rc.d/init.d/OraMon restart
   • Do not insert: set MR_READONLY=yes ; /etc/rc.d/init.d/OraMon restart
3. OraMon ‘HeartBeat’
   Check for a sane response to a lemon-cli.pl query
   Should not get: Failed to MRs_getSamples() : #-1 : Connection refused
   Example: perl lemon-utils/lemon-cli.pl --metrics="10002" --nodes="lcgmon002d" --remote-server="http://ccs002d:12510"
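The ‘HeartBeat’ check could be scripted roughly as follows. This is only a sketch: it runs the lemon-cli.pl query quoted above and treats the quoted "Connection refused" failure as "not alive". That an empty reply is not sane, and that lemon-cli.pl exits with status 0 on success, are assumptions; the function names are hypothetical.

```python
import subprocess

# Failure markers taken from the error text quoted on the slide.
FAILURE_MARKERS = ("Failed to MRs_getSamples()", "Connection refused")


def is_sane_response(output):
    """Return True if the query output is non-empty and shows no known failure marker."""
    text = output.strip()
    # Assumption: an empty reply is not a sane reply.
    return bool(text) and not any(marker in text for marker in FAILURE_MARKERS)


def heartbeat(metric="10002", node="lcgmon002d",
              server="http://ccs002d:12510", timeout=30):
    """Issue the example query and report whether OraMon answered sanely."""
    cmd = ["perl", "lemon-utils/lemon-cli.pl",
           "--metrics=" + metric, "--nodes=" + node,
           "--remote-server=" + server]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False  # perl missing, or no answer within the timeout
    # Assumption: lemon-cli.pl exits with status 0 when the query succeeds.
    return result.returncode == 0 and is_sane_response(result.stdout + result.stderr)
```

A monitoring service could call heartbeat() periodically and trigger the ‘External’ restart when it returns False.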
OraMon DB redundancy layer
‘Internal’ solutions:
1. Retry connect after losing DB connection
   Change OraMon code: when an SQL command fails with a failure from a TBD failure set, do not stop, but rather try to reconnect first (a few times, sleeping between tries)
2. ‘InsertSamples’ command to OraMon
   Reuse and extend the existing proprietary ‘insert samples’ protocol:
   • Define a set of ‘pseudo’ metricIds that OraMon interprets as commands rather than as metrics to be inserted
   • Commands arrive either from a specific port or from the samples port.
   • Commands may be added to a ‘metrics configuration’-like configuration
3. OraMon ‘HeartBeat’: the same as for ‘External’
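The two ‘Internal’ changes might look roughly like this. It is a sketch, not OraMon code: RetryableDbError stands in for the TBD failure set, the execute/connect/insert callables are supplied by the caller, and the pseudo metricId values -1 and -2 are made up for illustration.

```python
import time


class RetryableDbError(Exception):
    """Hypothetical stand-in for a failure that belongs to the TBD failure set."""


def run_with_reconnect(execute, connect, max_retries=3, delay_seconds=5.0,
                       sleep=time.sleep):
    """Run execute(); on a retryable failure sleep, reconnect and try again."""
    attempt = 0
    while True:
        try:
            return execute()
        except RetryableDbError:
            if attempt >= max_retries:
                raise  # give up; the caller may log [FATAL] and stop, as today
            attempt += 1
            sleep(delay_seconds)
            try:
                connect()
            except RetryableDbError:
                pass  # still down; the next execute() fails and uses up a retry


# Hypothetical pseudo metricIds that are read as commands, not as samples.
CMD_DO_INSERT_SAMPLES = -1
CMD_DO_NOT_INSERT_SAMPLES = -2


class SampleReceiver:
    """Dispatch incoming (metricId, sample) pairs to commands or to insertion."""

    def __init__(self, insert):
        self.insert = insert          # caller-supplied function storing a sample
        self.insert_enabled = True

    def handle(self, metric_id, sample):
        if metric_id == CMD_DO_INSERT_SAMPLES:
            self.insert_enabled = True
        elif metric_id == CMD_DO_NOT_INSERT_SAMPLES:
            self.insert_enabled = False
        elif self.insert_enabled:
            self.insert(metric_id, sample)
        # otherwise the sample is dropped, as with MR_READONLY set
```

Passing sleep as a parameter keeps the retry loop testable without real delays.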
Changing metrics configuration
Related OraMon documentation: Changing metrics configuration
German’s email of 19/7, “[Lemon] changes in metric data fields”:
- changes (adding/removing/changing data fields) to latestOnly metrics: ok
  David: - ok.
  - When applying a new configuration, all (TBD: changed) latest tables and views will be automatically dropped
- changes to latestOnly metrics which have a historical table defined but no longer used (reconfigured from 'latestOnly=false' to true): drop the historical table altogether.
  David: - ok.
  - Also, drop tables of removed metrics?
  (- Also, is archiving of tables to be dropped required?)
Changing metrics configuration (cont.)
- changes to 'historical' metrics (not latestOnly):
  - added data fields: OK
    David: TBD: ok iff adding fields does not complicate restoring old data that lack the new fields
  - removed and changed data fields: drop historical values in DB, or refuse (global OraMon configuration Boolean parameter).
    David: I doubt that dropping historical data will avoid potential problems while restoring older data. Assuming this is correct, ‘refuse’ will always be applied.
  - changes where historical values should be preserved: define a new metric ID. I don't think any conversion magic is appropriate, and to be consistent, it would also have to be applied to all historical data already archived into CASTOR, which is far from trivial.
    David: As a rule of thumb, I suggest avoiding changes to archived data
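The policy discussed above can be summarised as a small decision function. This is a sketch of the rules as stated in the email thread, not of OraMon itself: the Action names, the field representation (name mapped to type) and the parameter names are assumptions, and refuse_incompatible plays the role of the global Boolean configuration parameter.

```python
from enum import Enum


class Action(Enum):
    NO_CHANGE = "no change"
    RECREATE_LATEST = "drop and recreate latest tables and views"
    DROP_HISTORICAL_TABLE = "drop the unused historical table"
    ADD_FIELDS = "add fields to the historical table"
    DROP_HISTORICAL_VALUES = "drop historical values"
    REFUSE = "refuse; define a new metric ID instead"


def decide(old_fields, new_fields, latest_only,
           had_historical_table=False, refuse_incompatible=True):
    """Pick the action for one metric when a new configuration is applied.

    old_fields and new_fields map a field name to its type.
    latest_only is the metric's setting in the new configuration.
    """
    if latest_only and had_historical_table:
        # Reconfigured from latestOnly=false to true: the table is no longer used.
        # (Latest tables are recreated as well if the fields changed.)
        return Action.DROP_HISTORICAL_TABLE
    if old_fields == new_fields:
        return Action.NO_CHANGE
    if latest_only:
        return Action.RECREATE_LATEST
    removed_or_changed = any(
        name not in new_fields or new_fields[name] != ftype
        for name, ftype in old_fields.items()
    )
    if removed_or_changed:
        return Action.REFUSE if refuse_incompatible else Action.DROP_HISTORICAL_VALUES
    return Action.ADD_FIELDS  # only new fields were added
```

With refuse_incompatible left at True, this matches David's expectation that ‘refuse’ will always be applied to removed or changed fields.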
Changing metrics configuration
David’s suggestions
- Observation: The OraMon level of complexity to add a field is similar to that of applying other ‘compatible’ changes: remove field, change length
- In order to avoid clashes between existing OraMon data schemas and previously archived data, I suggest that:
  - Each change to a metricClass will have new metricIds
  - Previous metricIds will be marked ‘obsolete’ by a new metadata field
  - Previous metricIds may have a ‘replaced by metricId’ metadata field
- In order to preserve older data and allow data schema changes, I suggest that when a ‘compatible’ change is applied to a metricClass, its existing historical table will be renamed to the new name, and automatic fixes will be applied by OraMon.
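The suggested ‘obsolete’ and ‘replaced by metricId’ metadata could be modelled as below. This is an illustrative sketch only: the MetricMeta record, the registry dictionary and both function names are hypothetical, and nothing here reflects how OraMon actually stores metric metadata.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MetricMeta:
    metric_id: int
    metric_class: str
    obsolete: bool = False             # proposed 'obsolete' metadata field
    replaced_by: Optional[int] = None  # proposed 'replaced by metricId' field


def replace_metric(registry: Dict[int, MetricMeta], old_id: int, new_id: int) -> MetricMeta:
    """Register new_id as the successor of old_id and mark old_id obsolete."""
    if new_id in registry:
        raise ValueError("metricId %d is already in use" % new_id)
    old = registry[old_id]
    old.obsolete = True
    old.replaced_by = new_id
    registry[new_id] = MetricMeta(new_id, old.metric_class)
    return registry[new_id]


def current_id(registry: Dict[int, MetricMeta], metric_id: int) -> int:
    """Follow 'replaced by' links until the latest metricId is reached."""
    seen = set()
    while registry[metric_id].replaced_by is not None:
        if metric_id in seen:
            raise ValueError("cycle in 'replaced by' links")
        seen.add(metric_id)
        metric_id = registry[metric_id].replaced_by
    return metric_id
```

Queries against archived data would keep using the old metricId, while new samples use the id returned by current_id().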
Expiry of historical data
4162 | expiry of historical data. To be discussed at CERN | 2004-Jul-19 12:14 | jveldik
Saving disk space
Compress partitions
– Howto: OraMon partitions thread to compress partitions that are at least one day old
– TBD: May cause unexpected complications
– Saving space is important, but not urgent
Make numbers (and strings) smaller
– May be applied after applying all ‘Changing metrics configuration’ items
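Selecting the partitions to compress might be sketched as follows. How OraMon names and tracks its partitions is not stated here, so the input (partition name mapped to the upper time bound of its data) and the function names are assumptions. The generated statements use Oracle's ALTER TABLE ... MOVE PARTITION ... COMPRESS form and are only built as strings, not executed; indexes on a moved partition may need rebuilding afterwards, which is one possible source of the ‘unexpected complications’.

```python
from datetime import datetime, timedelta


def partitions_to_compress(partitions, now, min_age=timedelta(days=1)):
    """Return names of partitions whose data ended at least min_age before now.

    partitions maps a partition name to the upper time bound of its data
    (a hypothetical input format).
    """
    cutoff = now - min_age
    return sorted(name for name, upper in partitions.items() if upper <= cutoff)


def compress_statements(table, names):
    """Build one DDL string per partition; nothing is executed in this sketch."""
    return ["ALTER TABLE %s MOVE PARTITION %s COMPRESS" % (table, name)
            for name in names]
```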
OraMonArch
OraMonArch documentation
• If ‘archive and not drop’ is required, the implementation should be enhanced, since the current implementation drops and returns data
• Two OraMonArch instances: continuous and non-continuous
  Non-continuous requests cannot be queued
• OraMonArch transaction error when it stops/crashes after a DDL command and before updating the relevant checkpoint
bug reports
Item ID | Summary | Submitted on | Submitted by
4000 | OraMon packaging issues, broken restart-oramon. Minor: understand a minor rpm mistake: restart-oramon is installed by the OraMon non-config rpm | 2004-Jul-05 07:33 | gcancio
4001 | LSB compliance for OraMon. Minor | 2004-Jul-05 07:40 | gcancio
4002 | OraMon should continue running with old metadata if incompatibility is found. Medium: See: Compare and fix OraMon configurations | 2004-Jul-05 07:58 | gcancio
4004 | Floating point exception error using OraMonAdmin. Small: Fix a bug | 2004-Jul-05 08:35 | gcancio
4015 | define/document policy for valid / invalid configuration changes. Small: OraMon should also check for valid characters and keywords, e.g. for metric field descriptions. This should be part of the documentation as well. Add: OraMon and/or the script that creates the metrics configuration may be enhanced to check against using Oracle reserved words as identifiers. http://www-rohan.sdsu.edu/doc/oracle/server803/A54661_01/ares.htm Make sure that OraMon will not fail with fieldNames that consist of more than one word + strange chars (see email from 19/7) | 2004-Jul-05 12:43 | gcancio
4074 | OraMon - Validation Failures. Minor | 2004-Jul-08 10:22 | waldron
4097 | Add OraMon possible errors to its documentation. Small | 2004-Jul-12 12:28 | dfront
4162 | expiry of historical data. To be discussed at CERN | 2004-Jul-19 12:14 | jveldik
4180 | OraMon should support number sizes and a boolean type. Small. Add: Learn if OraMon and agent can use the same code for metric validation. | 2004-Jul-21 05:58 | dfront
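For item 4015, a field-name check could look roughly like this. It is a sketch under assumptions: the reserved-word set is only a small subset of Oracle's list (the full list is in the Oracle documentation linked in the item), the 30-character limit applies to the older Oracle releases, and the function name is hypothetical.

```python
import re

# Partial list only; the complete set of reserved words is in the Oracle docs.
ORACLE_RESERVED_SUBSET = frozenset({
    "ACCESS", "ALTER", "AND", "AS", "BY", "CHAR", "COLUMN", "COMMENT", "CREATE",
    "DATE", "DELETE", "DROP", "FLOAT", "FROM", "GROUP", "INDEX", "INSERT",
    "INTEGER", "LEVEL", "MODE", "NOT", "NULL", "NUMBER", "ORDER", "ROW",
    "SELECT", "SESSION", "SET", "SIZE", "TABLE", "UPDATE", "USER", "VALUES",
    "VARCHAR2", "VIEW", "WHERE",
})
MAX_IDENTIFIER_LENGTH = 30  # limit for non-quoted identifiers in older Oracle releases
_IDENTIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9_$#]*")


def field_name_problems(name):
    """Return a list of reasons why name is unsafe as a column identifier."""
    problems = []
    if not _IDENTIFIER_RE.fullmatch(name):
        problems.append("not a plain identifier (letter first, then letters, digits, _, $, #)")
    if len(name) > MAX_IDENTIFIER_LENGTH:
        problems.append("longer than %d characters" % MAX_IDENTIFIER_LENGTH)
    if name.upper() in ORACLE_RESERVED_SUBSET:
        problems.append("Oracle reserved word")
    return problems
```

An empty list means the name passed all three checks; multi-word names and ‘strange chars’ are caught by the first one.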
Bugs found while installing OraMon 0.0.3
1) OraMon views indicate time that is later by one hour than the real time
2) OraMonArch/Cont service script (/etc/rc.d/init.d/OraMonArchContCtl): Returns only after completing the work; it should return immediately. May cause the computer to hang at reboot.
3) Probable problem: metric validation errors at lcgmon002d differ from those at ccs002d
4) To be addressed to German: recognizing a metric configuration change by its date causes the rpm update to fail by mistake. Suggested fix: a hard-coded date attribute.
5) To be checked: I suspect that logrotate does not work at ccs002d for /var/log/OraMon.log, because the file had grown to 66M as of 27/7
6) OraMonArch transaction error when it stops/crashes after a DDL command and before updating the relevant checkpoint (see above)