Splunk

Splunk takes any type of data, potentially millions of entries, and allows you to process it into reports, dashboards, and alerts.

It’s great at parsing machine data. We can train Splunk to look for certain patterns in data and label those patterns as fields.

Planning Splunk Deployments

A note on config files

Everything Splunk does is governed by configuration files. They’re stored under $SPLUNK_HOME/etc and have a .conf extension.

They’re layered: you can have files with the same name in several directories, for example a global (system-level) conf file and an app-specific conf file. Splunk decides which one to use based on a precedence order that takes the current app context into account.

The default directory of an app ($SPLUNK_HOME/etc/apps/<app>/default) contains preconfigured versions of the .conf files. You should never edit those files; instead, put your changes in the app's local directory ($SPLUNK_HOME/etc/apps/<app>/local).

Inside a conf file there are stanzas:

[Stanza]
Attribute = Value

[Stanza]
Attribute = Value
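
As a concrete sketch, a stanza in an app's local inputs.conf could look like this (the monitored path, sourcetype, and index are just illustrative assumptions):

# $SPLUNK_HOME/etc/apps/search/local/inputs.conf
[monitor:///var/log/secure]
sourcetype = linux_secure
index = main

Because this file lives in local, its settings override any values for the same stanza in the app's default directory.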

Deployment Models

Splunk offers two deployment models:

  • Cloud
  • On premises

No matter which model you choose, the data pipeline has the same stages (input, parsing, indexing, and searching), and both models are scalable.

The Splunk Data Pipeline

  • Input: Splunk consumes data from sources; it does not yet look at the data's contents.
  • Parsing: Happens on the indexer or a heavy forwarder. Splunk examines, analyzes, and transforms the data, and identifies timestamps. It also adds metadata.
  • Indexing: Splunk takes the parsed data and writes it to indexes on disk, as flat files stored on the indexer in directories called buckets.
  • Searching: Where users interact with Splunk to search through the data.
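
As a hedged sketch of the parsing stage (the sourcetype name and timestamp layout are assumptions), props.conf attributes such as TIME_PREFIX and TIME_FORMAT tell Splunk how to locate and interpret timestamps:

# $SPLUNK_HOME/etc/apps/search/local/props.conf
[my_custom_sourcetype]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19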

Data Storage

When Splunk processes raw data, it adds it into indexes. Indexes map to locations on disk called buckets. Splunk comes with several built-in indexes, and you can create your own indexes as well.
During the indexing phase, Splunk transforms raw data into events and then stores those events in these buckets. An event is a single row of data that has metadata attached to it. The default index is called main.

Splunk also has an _internal index that stores Splunk's own internal logs.

An event is a single row of data. It has fields, which are key-value pairs.

key        value
username   fred

Splunk adds default fields to all events:

  • timestamp
  • host
  • source: name of the file, stream, or other input the event came from
  • sourcetype: format of the data
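
A quick way to see these default fields is to table them in a search; a minimal sketch against the main index:

index=main
| table _time host source sourcetype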

In Splunk, an index contains compressed raw data along with associated index files. These files are spread out into different directories (buckets) depending on their age: hot, warm, cold, frozen, and thawed.
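
As an illustrative sketch only (the index name and retention period are assumptions), a custom index and its bucket paths can be defined in indexes.conf:

# $SPLUNK_HOME/etc/apps/search/local/indexes.conf
[my_index]
homePath   = $SPLUNK_DB/my_index/db
coldPath   = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
# roll buckets to frozen (archive or delete) after roughly 90 days
frozenTimePeriodInSecs = 7776000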

Searching and Reporting

What is SPL? (Search Processing Language)

SPL encompasses all the search commands and their functions, arguments, and clauses. Its scope includes data searching, filtering, modification, manipulation, insertion, and deletion.

Time

The _time field stores time in epoch format. You can specify absolute time ranges using SPL.

earliest=01/14/2021:16:32:00 latest=07/14/2021:21:00:00

You can also specify relative ranges using - or + to indicate the offset

-30m -> 30 mins ago
-7d -> 7 days ago
+1d -> 1 day from now
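
For example, a relative range can go straight into a search (assuming the main index):

index=main earliest=-24h latest=now
| stats count BY host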

There are many time variables for formatting time in Splunk.

Variable   Description
%c         Date and time in the server's format
%H         Hour (24-hour clock)
%I         Hour (12-hour clock)
%M         Minutes (00-59)
%p         AM or PM
%S         Seconds (00-59)

We can also format dates using variables.

Variable   Description
%F         ISO 8601 format (yyyy-mm-dd)
%A         Full weekday name (Monday)
%d         Day of month (01-31)
%j         Day of year
%B         Full month name (January)
%m         Month (01-12)
%y         Year as a two-digit number (00-99)
%Y         Year as a four-digit number (yyyy)

We can convert time into the format we want at search time by using an eval expression with time variables on the _time field.

eval New_Time = strftime(_time, "%I:%M, %p")
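
In a full search this might look like the following sketch (the index and sourcetype are assumptions):

index=main sourcetype=access_combined
| eval New_Time = strftime(_time, "%I:%M, %p")
| table _time New_Time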

Basic Searching

The search pipeline starts off with a big glob of data. The first thing to do is filter out as much data as we can. One of the best ways to do that is to use one of Splunk’s metadata fields, like index, host, source, or sourcetype. If we know any of these, we should start with them.

index = main, index = default
host = server.com, host = 192.168.1.1
source = /var/lib, sourcetype = csv

Broad search terms

For broad searches, we may use:

  • Literal keywords. failed, error
  • Phrases. "failed login"
  • Fields as key value pairs. user=user1.domain.com
  • Wildcards. *ailed, fail*, user=*
  • Booleans. AND, OR, NOT
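
Putting a few of these together in one sketch (the sourcetype and user values are hypothetical):

index=main sourcetype=linux_secure "failed login" user=* NOT user=admin*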

Basic search commands

After we have our broad search terms, we add the first pipe; after the pipe we can start chaining search commands:

  • chart / timechart. returns results in tabular output for charting
  • rename. renames specific field
  • sort. sorts results by specified fields
  • stats. statistics
  • eval. calculates an expression
  • dedup. removes duplicates
  • table. builds a table with specified fields

host=myhost.lcl source=hstlogs user=* (message=fail* OR message=lock*)
| table _time user message
| sort -_time
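
Another sketch chaining a few of these commands, assuming Splunk's sample web access data (sourcetype access_combined with clientip and status fields):

index=main sourcetype=access_combined status=404
| stats count BY clientip
| rename clientip AS client_ip
| sort -count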

Field extraction

Splunk has built-in field discovery, although custom field extractions can also be defined with regular expressions that extract fields based on patterns.
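
For example, the rex command extracts fields inline using a regular expression; the log format and field name below are hypothetical:

index=main sourcetype=linux_secure
| rex field=_raw "user=(?<username>\w+)"
| table _time username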

Intermediate Searching

These are some of the most commonly used commands. They come after the pipe.

top

  • returns the most common values of a given field
  • defaults to 10 results
  • can be combined with limit=<number>
  • automatically builds a table with count and percent columns
  • can be used with multiple fields (returns the top values of one field organized by another field)

top user

rare

  • the opposite of top
  • returns the least common values of a field
  • its options are identical to top's

rare user

stats

  • used along with an aggregation function, typically split BY a field
  • some common functions: count, avg, max, mean, median, sum, stdev, values, list

stats avg(kbps) BY host
stats count(failed_logins) BY user
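
As a sketch of the limit and multi-field options mentioned above (index, sourcetype, and fields are assumed):

index=main sourcetype=linux_secure
| top limit=5 user BY host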
