May 8, 2018

The 5-Step Process for Onboarding Custom Data into Splunk

NuHarbor Security
Overview

Thanks for taking a few minutes to read our guide for onboarding custom data into Splunk. For support with Splunk Managed Services, please contact us here.

This guide is meant to help Splunk administrators onboard new data when a Splunkbase TA isn't available. It covers how to properly onboard data at the index level and introduces search-time considerations for Common Information Model (CIM) mapping. We'll look at IIS logs, props, transforms, inputs, and the nullQueue.

1. Understand Your Data

Before you can properly onboard data, you'll need a sample to understand the file format, the timestamp, and what constitutes an event. An event can be one line or multiline, like a Java debug dump.
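For contrast, here's a hypothetical multiline event, a short Java stack trace, where the whole trace is a single event even though it spans several lines (the class names and message are invented for illustration):

2018-05-08 09:14:22 ERROR OrderService - Failed to process order
java.lang.NullPointerException: customer was null
    at com.example.OrderService.process(OrderService.java:42)
    at com.example.OrderController.submit(OrderController.java:17)

Only the first line carries a timestamp, so event breaking has to key off that timestamp pattern rather than every newline.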

Sample Data – Location on disk: C:\inetpub\logs\LogFiles\W3SVC2\u_ex180277.log

Sample IIS Log

#Software: Microsoft Internet Information Services 8.5
#Version: 1.0
#Date: 2016-06-30 14:44:43
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
2016-06-30 14:44:43 172.30.16.210 GET / - 80 - 96.84.222.21 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 - 200 0 0 546
2016-06-30 14:44:43 172.30.16.210 GET /favicon.ico - 80 - 96.84.222.21 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://54.224.49.25/ 404 0 2 187
2016-06-30 14:46:33 172.30.16.210 GET / - 80 - 96.84.222.21 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 - 200 0 0 62
2016-06-30 14:46:34 172.30.16.210 GET /page1.html - 80 - 96.84.222.21 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://54.224.49.25/ 200 0 0 62
2016-06-30 14:46:37 172.30.16.210 GET /page2.html - 80 - 96.84.222.21 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://54.224.49.25/page1.html 200 0 0 62
2016-06-30 14:46:37 172.30.16.210 GET /page3.html - 80 - 96.84.222.21 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/51.0.2704.103+Safari/537.36 http://54.224.49.25/page2.html 200 0 0 62

Key observations from the above sample IIS log:

  • Each line appears to be its own event.
  • Timestamps are at the beginning of each event.
  • Events actually start at line 5; the first four lines are just header information.
  • The file is structured (space-delimited).

2. Get Control of Standard Props.conf Configuration

Props.conf controls many things; in this section we'll focus on index-time operations. Index-time operations happen during the parsing pipeline, before the data is written to disk. Once data is written to disk, it can't be changed, so getting these settings correct is the first step of onboarding data.

Props.conf Example

[iis_example]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
SHOULD_LINEMERGE = FALSE
TRUNCATE = 9999

In the example above, we're:

  • Creating a new sourcetype called iis_example (line 1).
  • Telling the indexer where the timestamp is located (line 2). In this case, the timestamp is at the start of the line, which is denoted by the ^ character.
  • Telling Splunk the format of the timestamp (line 3). While Splunk is pretty good at determining the format on its own, specifying it means Splunk doesn't have to guess, which improves indexer performance. This is in strptime format.
  • Telling Splunk how many characters past TIME_PREFIX to read to find the full timestamp (line 4). Here, the timestamp 2016-06-30 14:44:43 is 19 characters, so 20 leaves a small margin. Ensure you account for situations where the date can be a single digit (e.g., Feb 8 2018 instead of Feb 08 2018).
  • Telling Splunk when a new event starts (line 5). ([\r\n]+) consumes the hard return characters and marks the beginning of a new line. Be as specific as possible so the pattern only breaks where it should (e.g., in this case we also require a date after the hard return). A Java debug dump can be hundreds of lines long, but it usually starts with a predictable timestamp format. This setting is a regular expression.
  • Telling Splunk not to merge lines back together (line 6). Making Splunk "glue" events back together before writing them to disk is very costly in performance. This should always be FALSE if LINE_BREAKER is set correctly.
  • Telling Splunk the maximum number of characters an event can be (line 7). When Splunk reaches this limit, it automatically breaks the event and starts a new one. Ensure this is large enough to cover the longest expected event; otherwise events will be truncated prematurely, with an adverse effect on your ability to search the data.
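Before deploying, it's worth confirming which settings Splunk will actually apply for the new sourcetype, since stanzas in multiple apps can override one another. One quick sanity check (a sketch, assuming a Unix-style install under /opt/splunk and that the stanza is already on that machine) is the btool utility:

/opt/splunk/bin/splunk btool props list iis_example --debug

The --debug flag prints the file each setting comes from, which makes precedence problems easy to spot.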

Helpful References

  • The Splunk Quick Reference Guide has common regular expression and date/time formatting examples. Download the guide here: https://www.splunk.com/pdfs/solution-guides/splunk-quick-reference-guide.pdf
  • A helpful way to verify the regular expressions and date/time formats you create is https://regex101.com/, where you can test your regex against a sample event string and confirm it matches as expected.
3. Work on Advanced Props.conf Configuration: nullQueue

One of the observations above is that the events don't actually start until line 5. If we used only the props listed above, we'd end up indexing the header rows (i.e., junk data) as events. This section works through how to tell Splunk to ignore these rows and not index the data.

First, create a regex to identify these rows. They all start with a #, so we can use that as the identifier to tell Splunk to ignore them. We want to be as specific as possible to ensure we don't toss out good data, so in this case our transforms.conf (line 2) looks for rows that start with a # followed by one or more word characters. This matches #Software, #Version, #Date, and #Fields, but won't match a # inside an event, because the regex is anchored to the start of the line.

Next, we want to set the queue to a special value, nullQueue, which instructs Splunk to discard the data before writing it to disk. Remember, your license is based on the amount of data written to disk, so discarded data will not count against your license.

transforms.conf

[eliminate-iis-header-row]
REGEX=^#\w+
DEST_KEY=queue
FORMAT=nullQueue

Lastly, we need to update the iis_example sourcetype in props.conf to use the new transform (the TRANSFORMS line at the end of the stanza below).

Updated props.conf

[iis_example]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
SHOULD_LINEMERGE = FALSE
TRUNCATE = 9999
# Apply Transform
TRANSFORMS-remove-header=eliminate-iis-header-row

At this point, we can deploy the props and transforms to the indexers. We need to ensure these are deployed prior to sending any data.

4. Send the Data to the Indexers

The next step is to send the data to the indexers. This assumes you already know how to use a deployment server, understand server classes, and have outputs.conf set up correctly; this section focuses specifically on the inputs.conf file.
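For completeness, a minimal sketch of what that assumed outputs.conf on the forwarder might look like is below (the group name and indexer addresses are hypothetical placeholders):

[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997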

We know the location of this file is C:\inetpub\logs\LogFiles\W3SVC2\u_ex180277.log, so we want to onboard all .log files in the C:\inetpub\logs\LogFiles\W3SVC2 directory as this new iis_example sourcetype (the stanza name in props.conf) and put them into an index called web_logs.

inputs.conf

[monitor://C:\inetpub\logs\LogFiles\W3SVC2\*.log]
index=web_logs
sourcetype=iis_example
disabled=false

After this is deployed, we can validate from the search head that the data is coming in and parsing correctly. You may need to wait for all the data to be indexed before validating time zones. Consider setting your search time range to 'All Time' for validation.

Example Search

index=web_logs sourcetype=iis_example
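Beyond eyeballing a few events, two quick spot checks help confirm the configuration (sketches that assume the index and sourcetype names used above). First, if the nullQueue transform is working, a search for events beginning with # should return zero results:

index=web_logs sourcetype=iis_example | regex _raw="^#"

Second, comparing parsed event time to index time can surface parsing problems; a negative lag, or for live data a lag much larger than your forwarding delay, usually points at TIME_FORMAT or time zone issues:

index=web_logs sourcetype=iis_example | eval lag_seconds=_indextime-_time | stats min(lag_seconds) max(lag_seconds)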

Garbage in = Garbage out

Validation is a crucial part of onboarding data. The major things you want to look for are as follows:

  • Is the timestamp correct? If _time doesn't match the timestamp in the event, you likely have an issue with your TIME_FORMAT or MAX_TIMESTAMP_LOOKAHEAD.
  • Is the data in the past or the future? You may have a time zone issue and need to define the zone in your props.conf (TZ=XXX); see the snippet after this list.
  • Are events breaking in the correct location, or is every line of a Java debug dump its own event?
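If you do hit a time zone problem, it can be pinned per sourcetype in props.conf. As a sketch: IIS writes its W3C logs in UTC by default, so adding something like the following to the iis_example stanza would correct the offset (the zone here is an assumption; use whatever zone your logs are actually written in):

[iis_example]
TZ = UTC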
5. Search Your Data With Knowledge Objects

The last step is to make the data usable by creating fields. This can be done using regular expressions, evaluations, and field aliases. You can also add context to the data, such as showing a human-readable status code with lookup tables (IIS subject matter experts will quickly know status 404 is "page not found," but end users may not). You may also want to normalize the data into a CIM data model using event types and tags so that you can view all your web data, regardless of technology (e.g., IIS, Apache, Tomcat, WebLogic), in the same dashboard. A sketch of one approach to field creation follows.
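Since the #Fields header gives us the column order, one way to create fields is a search-time, space-delimited extraction. This is a sketch rather than the definitive approach; the transform name and underscored field names below are our own choices, derived from the sample's #Fields line:

transforms.conf

[iis_example_fields]
DELIMS = " "
FIELDS = date, time, s_ip, cs_method, cs_uri_stem, cs_uri_query, s_port, cs_username, c_ip, cs_user_agent, cs_referer, sc_status, sc_substatus, sc_win32_status, time_taken

props.conf (search head)

[iis_example]
REPORT-iis-fields = iis_example_fields

Because REPORT extractions run at search time, they can be added or changed after the data is indexed, unlike the index-time settings earlier in this guide.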

Thanks for taking a few minutes to read our guide for onboarding custom data into Splunk. For Splunk support or details on Splunk Managed Services, contact NuHarbor today!
