Imagine uploading machine data into a Splunk engine: you are essentially uploading a pile of data that is most likely unstructured and cannot be understood or stored in a structured way by a traditional relational database. Yet Splunk can return search results in less than a second. Splunk must have some special way to classify the data. Have you ever wondered how Splunk reads your machine data?
The selling point of Splunk is its unique ability to index machine data. This ability allows Splunk to search quickly for analysis, reporting, and alerting.
What? Splunk indexes machine data? Yes! Remember, in the previous blog, Splunk Tutorial 03: Licensing of Splunk 7.1.1, we mentioned that Splunk charges by the volume of indexed data. Splunk actually reads your data by indexing it in its own way.
The following components are how Splunk indexes your data:
- Indexing Pipeline
- Search Head
After Splunk receives the raw data, either from a forwarder or a user upload, its indexing pipeline first reads the machine data, divides it into many separate events, and identifies some default fields.
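Conceptually, the event-breaking step can be sketched like this. This is only a minimal illustration, not Splunk's actual implementation: here we assume a new event starts at every line that begins with a timestamp, so a multi-line entry such as a stack trace stays attached to the event that produced it.

```python
import re

# A minimal sketch of event breaking (NOT Splunk's real implementation):
# a new event begins at each line that starts with a timestamp.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def break_into_events(raw: str) -> list[str]:
    events = []
    for line in raw.splitlines():
        if TIMESTAMP.match(line) or not events:
            events.append(line)          # timestamp line starts a new event
        else:
            events[-1] += "\n" + line    # continuation line (e.g. stack trace)
    return events

raw = (
    "2018-07-28 10:15:02 ERROR NullPointerException\n"
    "    at com.example.Main.run(Main.java:42)\n"
    "2018-07-28 10:15:03 INFO request served in 12ms\n"
)
print(len(break_into_events(raw)))  # 2 events: the error plus its trace, then the info line
```

Real Splunk decides event boundaries from sourcetype settings (line breaking and timestamp recognition), but the idea is the same: raw text in, discrete events out.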
Remember, an event can be as simple as one line in the raw data, or as complicated as a stack trace spanning a few hundred lines.
The following four fields are always indexed:

| Field | Description | Example |
| --- | --- | --- |
| Source | Identifies the source of the data | WinEventLog:Security |
| SourceType | Identifies what kind of data it is | WinEventLog:Security |
| Host | Name of the host or machine the data comes from | OraclePC-PC |
| _time | When the event happened | 28/07/2018 |
Following is an example:
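To make the default fields concrete, here is a small sketch of what one indexed event might carry. The field names follow Splunk's defaults; the values are illustrative only, not real output.

```python
# Illustrative only: one indexed event with its four default fields.
# The field names match Splunk's defaults; the values are made-up examples.
event = {
    "source": "WinEventLog:Security",       # where the data came from
    "sourcetype": "WinEventLog:Security",   # what kind of data it is
    "host": "OraclePC-PC",                  # machine that produced the data
    "_time": "2018-07-28T00:00:00",         # when the event happened
    "_raw": "An account was successfully logged on.",  # original event text
}
print(event["sourcetype"])  # WinEventLog:Security
```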
Beyond these defaults, Splunk also looks for any other interesting fields in the raw data and indexes them.
The raw data is then copied into the index and will be available during the search process.
The search head distributes the search across many indexers and consolidates the results.
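This distribute-and-consolidate step can be sketched loosely as a map/reduce over indexers. This is not Splunk's actual protocol; the two in-memory lists below stand in for the event slices held by two hypothetical indexers, and the "search" is a simple count of matching events.

```python
from collections import Counter

# A loose sketch of distributed search (not Splunk's actual protocol):
# each indexer counts matching events in its own slice of the data,
# and the search head merges the partial results into one answer.
indexer_a = ["ERROR disk full", "INFO ok", "ERROR timeout"]
indexer_b = ["INFO ok", "ERROR disk full"]

def search_peer(events, term):
    """Map step: an indexer counts matching events locally."""
    return Counter(e.split()[1] for e in events if e.startswith(term))

def search_head(peers, term):
    """Reduce step: the search head consolidates the partial counts."""
    total = Counter()
    for events in peers:
        total += search_peer(events, term)
    return total

print(search_head([indexer_a, indexer_b], "ERROR"))
# Counter({'disk': 2, 'timeout': 1})
```

The point of this design is that the heavy lifting (scanning events) happens in parallel on the indexers, while the search head only merges small partial results, which is why searches stay fast as data volume grows.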