Splunk takes any type of data, up to millions of entries, and lets you process it into reports, dashboards, and alerts.
It’s great at parsing machine data. We can train Splunk to look for certain patterns in data and label those patterns as fields.
Planning Splunk Deployments
A note on config files
Everything Splunk does is governed by configuration files. They’re stored in /etc and have a .conf extension.
They’re layered. You can have files with the same name in several directories. You might have a global-level conf file and an app-specific conf file. Splunk checks which one to use based on the current app.
The /etc/<app>/default directory contains preconfigured versions of .conf files. You should never edit those files; instead, edit the copies in the /etc/<app>/local directory.
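As a minimal (hypothetical) sketch of this layering: a props.conf stanza placed in /etc/<app>/local overrides the same stanza shipped in /etc/<app>/default. The sourcetype name and attribute below are placeholders, not from the course.
[my_sourcetype]
TIME_FORMAT = %Y-%m-%d %H:%M:%S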
Inside a conf file there are stanzas:
[Stanza]
Attribute = Value
[Stanza]
Attribute = Value
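As a concrete (hypothetical) example, an inputs.conf stanza that monitors a log file could look like this; the path, index, and sourcetype values are placeholders:
[monitor:///var/log/secure]
index = main
sourcetype = linux_secure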
Deployment Models
Splunk offers two types of deployments:
- On cloud
- On premise
No matter which type you choose, the data pipeline has the same input, parsing, indexing, and searching stages, and both deployments are scalable.
The Splunk Data Pipeline
- Input: Splunk consumes data from sources. It does not look at its contents.
- Parsing: It happens on the index or heavy forwarder. Splunk examines, analyzes and transforms the data, and identifies timestamps. It also adds metadata.
- Indexing: Splunk takes the parsed data and writes it to indexes on disk, in the form of flat files stored on the indexer in directories called buckets.
- Searching: Where users interact with Splunk to search through the data.
Data Storage
When Splunk processes raw data, it adds it to indexes. Indexes map to locations on disk called buckets. Splunk comes with several built-in indexes, and you can create your own indexes as well.
During the indexing phase, Splunk transforms raw data into events and then stores those events in these buckets. An event is a single row of data that has metadata attached to it. The default index is called main.
Splunk also has an _internal index that stores Splunk’s own internal logs.
An event is a single row of data. It has fields, which are key-value pairs.
key | value |
---|---|
username | fred |
Splunk adds default fields to all events:
- timestamp
- host
- source: name of the file or stream the data came from
- sourcetype: format of the data
In Splunk, an index contains compressed raw data along with associated index files. These files are spread across different directories depending on their age: hot, warm, cold, frozen, and thawed.
Searching and Reporting
What is SPL? (Search Processing Language)
SPL encompasses all the search commands and their functions, arguments, and clauses. Its scope includes data searching, filtering, modification, manipulation, insertion, and deletion.
Time
The _time field stores time in epoch format. You can specify absolute time ranges using SPL:
earliest=01/14/2021:16:32:00 latest=07/14/2021:21:00:00
You can also specify relative ranges using - or + to indicate the offset:
-30m -> 30 mins ago
-7d -> 7 days ago
+1d -> 1 day from now
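As an illustrative sketch (the index name is a placeholder), a relative time range can be applied directly in a search:
index=main earliest=-24h latest=now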
There are a lot of time variables to format time in Splunk.
Variable | Description |
---|---|
%c | Date and time in server’s format |
%H | Hour (24-hour clock) |
%I | Hour (12-hour clock) |
%M | Minutes (00-59) |
%p | AM or PM |
%S | Seconds (00-59) |
We can also format dates using variables:
Variable | Description |
---|---|
%F | ISO 8601 format (yyyy-mm-dd) |
%A | Full weekday name (Monday) |
%d | Day of month (01-31) |
%j | Day of year |
%B | Full month name (January) |
%m | Month (01-12) |
%y | Year as two digit number (0-99) |
%Y | Year as four digit number (yyyy) |
We can convert time into the format we want during search time by using an eval expression and time variables on the _time field:
eval New_Time = strftime(_time, "%I:%M, %p")
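For example, a minimal sketch of this in a full search (the index name and field alias are illustrative):
index=main | eval New_Time = strftime(_time, "%I:%M, %p") | table _time New_Time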
Basic Searching
The search pipeline starts off with a big glob of data. The first thing to do is filter out as much data as we can. One of the best ways to do that is to use one of Splunk’s metadata fields like index, host, source, or sourcetype. If we know any of these, we should start with them.
index = main, index = default
host = server.com, host = 192.168.1.1
source = /var/lib, sourcetype = csv
Broad search terms
For a broad search, we may use:
- Literal keywords: failed, error
- Phrases: "failed login"
- Fields as key-value pairs: user=user1.domain.com
- Wildcards: *ailed, fail*, user=*
- Booleans: AND, OR, NOT
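Putting these together, an illustrative sketch (the index, sourcetype, and field values are placeholders):
index=main sourcetype=linux_secure "failed login" NOT user=root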
Basic search commands
After we have our broad search terms, we can add the first pipe, and after the pipe we can start using search commands:
- chart/timechart: returns results in tabular output for charting
- rename: renames specific fields
- sort: sorts results by specified fields
- stats: statistics
- eval: calculates an expression
- dedup: removes duplicates
- table: builds a table with specified fields
Constructing a basic search
host=myhost.lcl source=hstlogs user=* (message=fail* OR message=lock*)
| table _time user message
| sort -_time
Field extraction
Splunk has built-in field discovery, although custom field extractions can be added with regex, which extracts fields based on patterns.
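As a hedged sketch of an inline regex extraction with the rex command (the index, sourcetype, and pattern are illustrative):
index=main sourcetype=hstlogs | rex field=_raw "user=(?<user>\w+)" | table _time user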
Intermediate Searching
These are some of the most used commands. They come after a pipe in the search pipeline.
top
- returns the most common values of a given field
- defaults to returning 10 results
- can be combined with limit=<number>
- automatically builds a table with count and percent columns
- can be used with multiple fields (return the top value for a field organized by another field)
top user
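For instance, a fuller sketch of top with a limit and a BY field (the index, sourcetype, and field names are placeholders):
index=main sourcetype=hstlogs | top limit=5 message BY user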
rare
- opposite of top
- returns the least common values of a field
- options are identical to top
rare user
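An illustrative sketch with a limit (the index, sourcetype, and field are placeholders):
index=main sourcetype=hstlogs | rare limit=3 user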
stats
- it’s used with a function applied to a field, optionally grouped BY another field
- some common functions: count, avg, max, mean, median, sum, stdev, values, list
stats avg(kbps) BY host
stats count(failed_logins) BY user
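A hedged sketch combining several functions (the index, sourcetype, and field names are illustrative):
index=main sourcetype=hstlogs | stats count AS events, dc(user) AS unique_users BY host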
Reference(s)
https://deloittedevelopment.udemy.com/course/splunker/learn/lecture/22765743#overview