Copyright © 2014 Splunk Inc.
Mark Runals, Sr. Security Engineer, The Ohio State University
Taming Your Data
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Agenda
• OSU Splunk deployment – environmental background
• Props/field extraction score methodology
• Look at Data Curator app
FYI - Splunk Admin Focused Presentation
Some Background & Program Drivers
OSU Environment
135 distributed IT units around OSU
• Each group is autonomous
• No standardization
• Huge variety of technologies
• Splunk use not mandatory
+
Desired lightweight onboarding process
• For units & for Splunk team
=
Incredible roll-on/adoption rate
Fast Forward a Year or 2 +/-
• 2 TB of data
• 1,800+ Splunk agents
• 10k devices
• 12 types of firewalls
• Multiple OS
• 90+ teams with data in Splunk
• 700+ sourcetypes – many ‘learned’
• 350+ people
Is data being ingested correctly? What fields have been defined? Where? What types of data are in Splunk? What’s not configured correctly?
Issue Overview
Out of the box and without specific data definition, Splunk will generally ingest data correctly:
• Host names
• Sourcetypes
• Timestamp
• Line breaking
• Auto key-value fields
At best, though, this isn’t efficient. At worst, it can strain your deployment and may drop/lose events.
Factors in play
• Hardware
• Ratio of indexers to total log volume
• Sourcetype velocity
• Data distribution (forwarders pre-5.0.4 will favor the first indexer listed in autoLB outputs.conf)
• Weird date/time information in your logs
• Etc.
Data Import/Definition Pipeline (Mark’s View)
Get Data to Splunk → Data Management → Knowledge Management
DM = Index Time Processing
• Sourcetyping
• Line breaking
• Timestamp
• Host field
• etc.
KM = Search Time Processing
• Base level field extraction
• Normalized field names
• Field name alignment within Common Information Model (CIM)
• Knowledge objects
The Plan
Data Curator App
• Data Management: score based on the ‘Getting Data in Correctly’ .conf 2012 preso
• Knowledge Management: score based on length of fields relative to _raw length (conversation with Kevin Meeks)
• Data Taxonomy: create a way to classify sourcetypes
• Identify Common Issues: munge through internal logs
Data Management – Props Score
[mah_data_stanza]
TIME_PREFIX =
MAX_TIMESTAMP_LOOKAHEAD =
TIME_FORMAT =
SHOULD_LINEMERGE =
LINE_BREAKER =
TRUNCATE =
TZ =
TIME_PREFIX, MAX_TIMESTAMP_LOOKAHEAD, TIME_FORMAT: +1 each, OR DATETIME_CONFIG = +3
SHOULD_LINEMERGE = False: +1
...but what if my data should be merged?
SHOULD_LINEMERGE = True AND one of these is populated: +1
• BREAK_ONLY_BEFORE
• MUST_BREAK_AFTER
• MUST_NOT_BREAK_BEFORE
• MUST_NOT_BREAK_AFTER
LINE_BREAKER: +1
Default is ([\r\n]+)
Don’t want to line break? ((?!)) or ((*FAIL)) are a couple of options*
*http://answers.splunk.com/answers/106075/each-file-as-one-single-splunk-event
TRUNCATE: +1
Default is 10000, which scores +0
Game your score! Set this to anything other than the default, e.g. 10001 or 999999
TZ: +1
If setting this across your environment isn’t possible/practical, reduce the max score macro in the app. It’s used as a variable.
Macro: props_score_upper_bounds = 6 (reduced from 7)
Max Score = 7
(st_score * `props_score_scale`) / `props_score_upper_bounds` 10
Props Score Caveats
There are a lot of additional props settings that could be applicable for your data/environment.
This method/app doesn’t address host fields that are incorrect.
syslog Default host field?
Splunk UF
Field Extraction Score Methodology
10.10.10.10 - - [20/Aug/2014:13:44:03.151 -0400] "POST /services/broker/phonehome/connection_10.10.10.10_8089_10.10.10.10_TEST-TS_68D82260-CC1D-4203-83CA-6E24F9FE6538 HTTP/1.0" 200 24 - - - 1ms
Length of fields:
1. Account for any autokv field names
2. Do convoluted search to get length of fields
3. Account for timestamp in log
4. Get total length
_raw length:
1. Remove spaces
2. Remove newline characters
3. Get _raw length
Length of Fields / _raw length = % of event that has fields defined
Field lengths marked on the example event: 11, 2, 3, 11, 11, 7, 36, 8, 3, 4
* Not a great example – Splunk forwarder phonehome logs actually have over 100% field length compared to _raw
Caveats/Considerations
• Doesn’t account for field aliases (will artificially inflate score)
• If field extraction % is over 100, the score is set to 100
• Directionally correct is about the best this will get
• Fields extracted != field value
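A minimal Python sketch of the ratio described above, including the cap at 100. It is not the app's search: the function names are hypothetical, the caller is assumed to supply the extracted field values (with the timestamp text among them), and the accounting for autokv field names is left out.

```python
import re


def raw_length(raw):
    """Length of _raw after removing spaces and newline characters."""
    return len(re.sub(r"[ \r\n]", "", raw))


def field_extraction_score(raw, field_values):
    """Percent of the event covered by extracted field values, capped at 100."""
    total = raw_length(raw)
    if total == 0:
        return 0.0
    # Spaces are dropped from values too, so both sides are measured alike.
    covered = sum(len(str(value).replace(" ", "")) for value in field_values)
    return min(100.0, 100.0 * covered / total)
```

Because the same characters can be counted under more than one field, the uncapped ratio can exceed 100, which is why the cap is applied.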
Data Taxonomy
Version 1 – deprecated out of the box
Designed to answer “What type of data is in Splunk?”
Created a 2nd field classification CSV for several hundred sourcetypes
• Data family
• Data subtype
Very useful, but too many one-to-many relationships based on data use
• netstat: Configuration? Networking?
• Too many “Server *” families: Server Monitoring, Server Information, Server Configuration, Server Performance
Data Taxonomy – Interactive Host Dashboard
Host A
Host B
Data Curator App
Goals
• Flexible scoring scale
• Generate aggregate, system maturity scores
• Generate ~accurate individual maturity score
• Show what app/package contained props settings
• Show current props settings
• Highlight issues related to/solvable by props settings
  – Line breaking
  – Timestamp
  – Transforms issues
Take Note!
• Will NOT tell you what the settings should be
• Requires Splunk 6 search head
• Only able to work through issues I saw in my environment; you may have others
• I can troubleshoot my app, not your deployment =)
Deployment At A Glance
Props Score Breakdown
Holy Crap!! Lots of Work
...but before you slit your wrists
Learned Sourcetypes (-too_small OR -#)
Beware of diminishing returns on working the ‘long tail’
Sourcetype Deep Dive Dashboard
Avamar Logs
Not all items factor into score
Loaded score based on volume of events per punct. Score created on the fly
Based on volume of events per punct. Quick way to see how unique logs in a particular sourcetype are.
Had 75 unique punct
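To make "volume of events per punct" concrete, here is a rough Python sketch. The signature function only approximates Splunk's punct field (punctuation kept, spaces shown as underscores, letters and digits dropped); the 30-character limit and the function names are assumptions for illustration.

```python
from collections import Counter


def punct_signature(event, limit=30):
    """Rough stand-in for punct: keep punctuation, mark spaces as '_'."""
    first_line = event.splitlines()[0] if event else ""
    out = []
    for ch in first_line:
        if ch == " ":
            out.append("_")
        elif not ch.isalnum() and not ch.isspace():
            out.append(ch)
    return "".join(out)[:limit]


def punct_volume(events):
    """Count events per punct signature, most common first."""
    return Counter(punct_signature(event) for event in events).most_common()
```

A sourcetype whose events collapse into a handful of signatures is fairly uniform; one with many signatures, like the 75 seen here, holds many different log shapes.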
ABDCB (learned)
Argus
Identifying Date/Time Issues
These events don’t have timestamps!
What if Splunk thinks the last known good timestamp was 6 years ago?
Date/Time Workspace Dashboard
Pre-populated with sourcetypes having issues
(DATETIME_CONFIG added to view after screenshot)
Additional Dashboard Elements
• Clustered internal logs giving you a level of visibility
• 100 most recent events
(No time information set)
Line Breaking/Truncate Workspace Dashboard
Line Breaking Sanity Check Dashboard
Sourcetypes have line breaking set but have multiple line counts in recent events
Set in multiple apps; potential problem down the road?
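The sanity check amounts to flagging sourcetypes whose recent events do not all have the same number of lines. A small Python sketch of that idea follows; the function name and the (sourcetype, raw text) input shape are assumptions, and the dashboard itself works from Splunk searches rather than code like this.

```python
from collections import defaultdict


def sourcetypes_with_mixed_linecounts(events):
    """events: iterable of (sourcetype, raw_text) pairs.

    Returns {sourcetype: sorted distinct line counts} for sourcetypes where
    more than one distinct line count was seen.
    """
    counts = defaultdict(set)
    for sourcetype, raw in events:
        # An empty event still counts as one line.
        counts[sourcetype].add(len(raw.splitlines()) or 1)
    return {st: sorted(seen) for st, seen in counts.items() if len(seen) > 1}
```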
Query Troubleshooting
Two main scheduled searches that are somewhat computationally expensive. Dashboard allows admin to compare run length & frequency to coverage
Sourcetype field length percentage query
Extract/Report/Transforms Issues
Example Internal Warning Logs
08-21-2014 08:55:46.348 -0400 WARN SearchOperator:kv - IndexOutOfBounds invalid The FORMAT capturing group id: id=7, transform_name='Message'
08-21-2014 08:59:02.854 -0400 WARN SearchOperator:kv - Invalid key-value parser, ignoring it, transform_name='extract_cmd_change'
08-21-2014 08:59:03.345 -0400 WARN SearchOperator:kv - Invalid key-value parser, ignoring it, transform_name='(?i)^(?:[^\|]*\|){3}(?P<dest_domain>[^\|]+)'
…wut? Which app? In props or transforms?
Solution: grep -r through 520+ packages in deployment-apps directory for ‘Message’?
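As an alternative to a manual grep, the lookup can be sketched in Python: pull the transform_name out of each warning, then search props.conf and transforms.conf under a directory. This is an illustrative sketch, not how the app does it; the function names and the regular expression are assumptions based only on the three warning lines shown above.

```python
import re
from pathlib import Path

# Assumes transform_name='...' is the last thing on the warning line.
TRANSFORM_RE = re.compile(r"transform_name='(.*)'\s*$")


def transform_names(log_lines):
    """Collect transform_name values from SearchOperator:kv warning lines."""
    names = []
    for line in log_lines:
        match = TRANSFORM_RE.search(line)
        if match and "SearchOperator:kv" in line:
            names.append(match.group(1))
    return names


def find_in_conf_files(root, needle):
    """Yield (path, line_number, text) for props/transforms lines containing needle."""
    for path in sorted(Path(root).rglob("*.conf")):
        if path.name not in ("props.conf", "transforms.conf"):
            continue
        lines = path.read_text(errors="replace").splitlines()
        for number, text in enumerate(lines, 1):
            if needle in text:
                yield path, number, text.strip()
```

Running `find_in_conf_files` against the deployment-apps directory with a name returned by `transform_names` answers both "which app?" and "props or transforms?" from the matching path.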
Only 5 tokens
Anyone know what the issue is?
Should be an EXTRACT
KM – Sourcetype Fields Comparison
Bottom of explanatory text. There is a freeform text search box at top of dashboard
App Roadmap
Now
• Props maturity scores
• Field extraction scores
• Issues workspaces
• Data taxonomy
Relatively non-scaling
Next
• Dashboard optimization (i.e. searchTemplate)
• Tag-based data taxonomy
• Any initial app bug fixes
After Next
• Tie in data model fields
• Field value?
• Expand issue troubleshooting
Based on community feedback
?
Check out the Forwarder Health app in Splunkbase
Blog: runals.blogspot.com
.conf 14 updated Getting Data in Correctly presentation – Andrew Duca
THANK YOU