
Splunk Data Onboarding Success with Syslog, Part 2: The Setup

By: Dan Potter, Senior Security Engineer

If you’re just getting started be sure to check out Part 1 of our series: Data Onboarding Success Part 1 – Success with Syslog-NG and Splunk.

How do I install syslog-ng?

Make sure you download the latest version of syslog-ng. The version in your distribution's package manager may not be current.

You can find the latest version in the VERSION file of the GitHub repository for syslog-ng: https://github.com/balabit/syslog-ng/blob/master/VERSION

**Note on SELinux – If you are using SELinux, syslog-ng will only be allowed to open the default syslog ports and write to the default syslog directories. Either put SELinux in permissive mode and audit the events that don't match policy, or update your SELinux policy to allow the custom ports and output directories you configure below.

Installation instructions can be found on the syslog-ng website.

Configuration goals:

We want to make the process of bringing new data into Splunk as easy and efficient as possible.  To do this, we have the following goals for our syslog-ng infrastructure:

  • Accept data on TCP and UDP
  • Ensure log files are readable by the ‘splunk’ user, without root permissions
  • Collect clean hostname values
  • Sort data into directories based on sourcetypes
  • Sort data into directories based on hostname
  • Ensure we have complete timestamp data on all events
  • Minimize likelihood of dropped or truncated events
  • Capture all events that are received by syslog server and route any ‘unknown’ devices into a catch_all location
  • Allow easy log rotation/cleanup
  • Ensure events are in plain text long enough for Splunk to read them

Sections of a syslog-ng configuration

Let’s talk about the syslog-ng.conf file.  This is where we configure the syslog server, and there are six primary sections of this configuration.

  • Options, settings that apply to all other sections.
  • Sources, where we define what data we are listening for.
  • Templates, how we should format the logs we save to disk.
  • Destinations, where to save the logs.
  • Filters, determine what data to log to which destination.
  • Logs, bring together all prior sections and determine which events are written to which destination based on their source and filter.

Next we will discuss how we can use configurations in those sections to meet our Splunk success goals.

Accept data on TCP and UDP

This is done in the Sources section. Each source line spawns a CPU thread. UDP sources are single-threaded, while TCP sources are multithreaded.

# Sources -
## ***Don't syslog across subnets, or your data may appear to come from the router that sends the data***
## For large environments, recommend sending big UDP data sources to unique ports
## to resolve UDP receive buffer issues.
source s_tcp_network { network(ip(0.0.0.0) port(514) transport(tcp)); };
##See comments below for environments with high UDP syslog volume
source s_udp_network { network(ip(0.0.0.0) port(514) transport(udp)) ; };
##Additional Source UDP Port. Useful when you have multiple high volume syslog sources
#source s_udp_1514 { network(ip(0.0.0.0) port(1514) transport(udp)); };

Ensure log files are readable by the ‘splunk’ user, without root permissions

This is done via the Options section.


## Create output directory if it doesn’t already exist
create_dirs (yes);
## Configure the directory owner, group, and permissions.
dir_owner("splunk");
dir_group("splunk");
dir_perm(0755);
## Configure the file owner, group, and permissions
owner("splunk");
group("splunk");
perm(0644);
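As a sanity check, here is what those modes look like on disk, reproduced by hand in a throwaway directory (the demo path and filename are made up; this mirrors how you would audit the real output tree):

```shell
# Recreate the modes set by dir_perm(0755) and perm(0644), then confirm them.
DEMO=$(mktemp -d)
install -d -m 0755 "$DEMO/logs/cisco-asa/ciscofw"                  # dir_perm(0755)
install -m 0644 /dev/null "$DEMO/logs/cisco-asa/ciscofw/test.log"  # perm(0644)
# Print the octal mode of the directory and the file
stat -c '%a %n' "$DEMO/logs/cisco-asa/ciscofw" "$DEMO/logs/cisco-asa/ciscofw/test.log"
rm -rf "$DEMO"
```

With 0755 directories and 0644 files owned by splunk, the forwarder can traverse and read everything without needing root.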

Collect clean hostname values

Ensure that you have good, valid DNS A and PTR records.  There are also some key options to pay attention to in the Options section: check_hostname, keep_hostname, normalize_hostnames, and use_dns.  Note that setting keep_hostname(yes) causes syslog-ng to ignore several of these options when the original event already contains a host value.


# Check client hostnames for valid DNS characters
check_hostname (yes);
##setting keep_hostname(yes) = if the message contains a $HOST value we will use that
## keep_hostname(no) = use DNS to resolve the $HOST name from the received event
keep_hostname(no);
## If keep_hostname(yes) and the event has a hostname the following options have no effect:
## use_dns(), use_fqdn(), normalize_hostnames(), and dns_cache() options
use_dns(yes);
use_fqdn(no);
normalize_hostnames(yes);
##cache results to reduce the number of lookups required to filter by $HOST or output to $HOST
dns_cache(yes);
## Number of clients to cache: default value is 1007
dns_cache_size(1007);
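Before trusting use_dns(), it is worth spot-checking that forward and reverse lookups actually work for your clients. A quick sketch with getent, using the ciscofw / 10.10.10.11 placeholders from this article:

```shell
# Forward (A) lookup: hostname -> address
getent hosts ciscofw || echo "no A record for ciscofw"
# Reverse (PTR) lookup: address -> hostname
getent hosts 10.10.10.11 || echo "no PTR record for 10.10.10.11"
```

If either direction fails or returns a stale name, fix DNS before onboarding the device, or your directory names and Splunk host values will be wrong.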

Sort data into directories based on hostname and sourcetypes

This is done via the Destinations configuration section.  In the example below, we have created four destinations: cisco-asa, juniper, paloalto, and a catch_all directory.  You must create a destination for each sourcetype you wish to collect in Splunk; the host value is filled in automatically via the ${HOST} variable from the event message.

The remaining path variables in the destination lines allow easy log rotation/cleanup: ${YEAR}-${MONTH}-${DAY}-${HOUR}00 splits the output into a new file each hour.  This way you can configure log rotation to either tar.gz or delete any files older than X hours.

If you only want to create a new file by the day, just remove the ${HOUR} variable.


### Destinations
## destination lines define the output path format. Here, we put the sourcetype as the 4th directory, the host as the 5th directory, and the log file with the Year-Month-Day-Hour as part of the log name.
destination d_cisco_asa { file("/opt/syslog/logs/cisco-asa/${HOST}/${YEAR}-${MONTH}-${DAY}-${HOUR}00-cisco-asa.log" template(t_default_format) ); };
destination d_juniper { file("/opt/syslog/logs/juniper/${HOST}/${YEAR}-${MONTH}-${DAY}-${HOUR}00-juniper.log" template(t_default_format) ); };
destination d_palo_alto { file("/opt/syslog/logs/paloalto/${HOST}/${YEAR}-${MONTH}-${DAY}-${HOUR}00-palo.log" template(t_msg_only) ); };
destination d_all { file("/opt/syslog/logs/catch_all/$HOST-$SOURCEIP/$YEAR-$MONTH-$DAY-${HOUR}00-catch_all.log" template(t_default_format) ); };
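For example, with d_cisco_asa above, an event from a host named ciscofw received at 1 PM on May 14th would land at a path like this (hostname and date are illustrative):

```
/opt/syslog/logs/cisco-asa/ciscofw/2019-05-14-1300-cisco-asa.log
```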

The Filters section determines what data is routed to those destinations, and also ensures that we capture all events received by the syslog server, routing any ‘unknown’ devices into a catch_all location.


### Filters
## filter lines are processed in order, so your first filter should be for your sourcetype with the most events per second. Generally, this will be your primary firewall, or a proxy server.
## Cisco:ASA filter, by a regex match for something that should only appear in ASA messages, as well as netmask and host filters
filter f_cisco_asa {
match("%ASA-[0-9]-[0-9]{6}") or
netmask(10.10.10.11/32) or
host("ciscofw");
};
## Juniper filter, by values that appear in Juniper syslog messages
filter f_juniper {
match("\\d\\sJnpr Syslog\\s\\d+|\\sJuniper:\\s|\\sjunos-[\\w-]+\\s|\\sIDP:\\s|:\\sNetScreen\\s");
};
## PaloAlto filter by netmask or host name
filter f_palo_alto {
netmask(10.10.10.11/32) or
host("paloalto");
};
## filter anything that doesn’t match our existing filters
filter f_all { not (
filter(f_cisco_asa) or
filter(f_juniper) or
filter(f_palo_alto)
); };
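You can sanity-check a match pattern outside syslog-ng before deploying it, for example with grep. The sample event below is made up, and [0-9] is the POSIX spelling of the \d used in the filter:

```shell
# %ASA-<severity>-<6 digit message id> is the marker f_cisco_asa keys on.
sample='<166>May 14 13:00:00 ciscofw %ASA-6-302013: Built inbound TCP connection'
echo "$sample" | grep -Eq '%ASA-[0-9]-[0-9]{6}' && echo "match"
```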
 

Ensure we have complete timestamp data on all events

This is handled by the Templates section of our config.   In the example below, I have defined two templates we can use to set the output format of our log file.  There are many more fields available for file formatting than I have below.


### Template
## template lines allow us to define the output log line format of messages. In the examples below, we have a ‘t_msg_only’ template, which is useful for events such as JSON formatted events. The ‘t_default_format’ template mirrors default output settings, but we use the timestamp from the Source of the event rather than the default setting of using the Syslog Receiver timestamp
template t_msg_only { template("${MSG}\n"); template_escape(no); };
template t_default_format { template("${DATE} ${HOST} ${MSGHDR}${MESSAGE}\n"); template_escape(no); };

Most of your destination lines will use template(t_default_format). This ensures that all messages have a timestamp and end with a new line, making parsing easy.  Additionally, if an event already has a timestamp, we standardize the format rather than use the format of the event itself.

The t_msg_only template is useful for events that already contain a timestamp and other useful data, such as JSON events.  We don’t want to prepend syslog headers to JSON events because that would make them invalid JSON. The same goes for XML or CEF formatted syslog messages. We want Splunk to ingest events as close to the original format as possible.
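A quick way to see why, using python3’s JSON parser as a stand-in for Splunk’s (both sample events are made up):

```shell
# A bare JSON payload parses fine; the same payload behind a syslog header does not.
echo '{"user":"dan","action":"login"}' | python3 -m json.tool >/dev/null && echo "valid JSON"
echo 'May 14 13:00:00 fw1 {"user":"dan","action":"login"}' \
  | python3 -m json.tool >/dev/null 2>&1 || echo "invalid JSON"
```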

Minimize likelihood of dropped or truncated events

There are a few things that will help reduce your chances of dropping events. The first is to use a separate UDP source for each high-volume UDP syslog data source.  UDP inputs are single threaded, and a very high-volume source will hog the connection and prevent other devices from reporting.  Edge or primary firewalls and proxy hosts tend to be the highest volume sources. If you are sending events from those systems over UDP, configure a separate source for each device.

Additionally, we need enough buffer for events to sit in memory until they can be written to disk, and we need to be able to accept large messages.


## The size of the output buffer. Each destination has its own buffer; if you set
## this globally, one buffer of this size is created per destination. This is the
## memory buffer that holds events before they are written to disk.
## Below are the default values
log_fifo_size(2048);
## Maximum length of a message in bytes. Increase this if events are being truncated.
log_msg_size(8192);

If you only expect large messages from a single source, it’s better to add the log_msg_size option to that source line rather than change it globally in Options.  If we want to double the maximum event size for one source, here is what that looks like:


## The configuration below may not work without first increasing kernel buffer sizes,
## due to the use of so_rcvbuf(). Even without it, you will still see a large benefit
## from simply giving your high volume syslog device its own UDP source.
source s_udp_large_events { network(ip(0.0.0.0) port(11514) transport(udp) so_rcvbuf(2097152) log_msg_size(16384)); };
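so_rcvbuf() can only grow the socket buffer up to the kernel’s ceiling, so check that ceiling first. On Linux the relevant knob is net.core.rmem_max; the 2 MB value below mirrors the source above:

```shell
# Current kernel cap on SO_RCVBUF, in bytes
cat /proc/sys/net/core/rmem_max
# Raise it to fit the 2 MB buffer requested above (run as root; add the same
# setting under /etc/sysctl.d/ to make it persist across reboots):
# sysctl -w net.core.rmem_max=2097152
```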

Bringing it all together, let’s see that syslog-ng.conf!


@version: 3.17
@include "scl.conf"
@include "/etc/syslog-ng/conf.d/*.conf"
### Options
options {
create_dirs (yes);
dir_owner("splunk");
dir_group("splunk");
dir_perm(0755);
owner("splunk");
group("splunk");
perm(0644);
log_fifo_size(2048);
log_msg_size(8192);
#frac_digits(3);
time_reopen (10);
check_hostname (yes);
keep_hostname(no);
use_dns(yes);
use_fqdn(no);
normalize_hostnames(yes);
dns_cache(yes);
dns_cache_size(1007);
};
### Source
source s_tcp_network { network(ip(0.0.0.0) port(514) transport(tcp)); };
source s_udp_network { network(ip(0.0.0.0) port(514) transport(udp)) ; };
### Template
template t_msg_only { template("${MSG}\n"); template_escape(no); };
template t_default_format { template("${DATE} ${HOST} ${MSGHDR}${MESSAGE}\n"); template_escape(no); };
### Destinations
destination d_cisco_asa { file("/opt/syslog/logs/cisco-asa/${HOST}/${YEAR}-${MONTH}-${DAY}-${HOUR}00-cisco-asa.log" template(t_default_format) ); };
destination d_juniper { file("/opt/syslog/logs/juniper/${HOST}/${YEAR}-${MONTH}-${DAY}-${HOUR}00-juniper.log" template(t_default_format) ); };
destination d_palo_alto { file("/opt/syslog/logs/paloalto/${HOST}/${YEAR}-${MONTH}-${DAY}-${HOUR}00-palo.log" template(t_msg_only) ); };
destination d_all { file("/opt/syslog/logs/catch_all/$HOST-$SOURCEIP/$YEAR-$MONTH-$DAY-${HOUR}00-catch_all.log" template(t_default_format) ); };
### Filters
filter f_cisco_asa { match("%ASA-[0-9]-[0-9]{6}") or
netmask(10.10.10.11/32) or
host("ciscofw");
};
filter f_juniper {
match("\\d\\sJnpr Syslog\\s\\d+|\\sJuniper:\\s|\\sjunos-[\\w-]+\\s|\\sIDP:\\s|:\\sNetScreen\\s");
};
filter f_palo_alto {
netmask(10.10.10.11/32) or
host("paloalto");
};
filter f_all { not (
filter(f_cisco_asa) or
filter(f_juniper) or
filter(f_palo_alto)
); };
### Logs
## These should be listed in order from most common to least common events, since they are processed in order. flags(final) instructs syslog-ng to stop processing a message once it matches a log statement, ensuring you don’t end up with duplicate data.
log { source(s_udp_network); source(s_tcp_network); filter(f_cisco_asa); destination(d_cisco_asa); flags(final); };
log { source(s_udp_network); source(s_tcp_network); filter(f_juniper); destination(d_juniper); flags(final); };
log { source(s_udp_network); source(s_tcp_network); filter(f_palo_alto); destination(d_palo_alto); flags(final); };
log { source(s_udp_network); source(s_tcp_network); filter(f_all); destination(d_all); flags(catchall); };
 

Now that we have our data sorted into directory names that indicate our expected sourcetype, and we have our hostnames in the directory path, how do we bring this data in to Splunk?

The first step is to install the Splunk Universal Forwarder (UF).  There are a few reasons we want to use a UF and not a Heavy Forwarder (HF):

  • UF is lightweight and requires minimal system resources
  • The start/stop/restart process is much faster for a UF
  • UFs just send the data
  • HFs process data before sending to the indexers. This increases the data size and sends what is called “cooked” data rather than “raw” data.

With the UF installed, we need to tell it what to collect.  To do this, we create or update an inputs.conf file and push it out from a Deployment Server (DS). If you do not have a DS, you can configure inputs locally on the collector, either in splunk/etc/system/local/inputs.conf or, preferably, in a stand-alone app such as splunk/etc/apps/my_syslog_server_configuration_app.

For the actual inputs.conf file, here is what we would want to add to collect our sample data for Cisco-ASA, Juniper, and PaloAlto devices. We do not want to ingest the catch_all directory to Splunk.


[monitor:///opt/syslog/logs/cisco-asa/*/*.log]
disabled = 0
## host_segment defines the subdirectory in our path that contains the hostname of our device
host_segment = 5
## where to save the data in Splunk
index = network_firewall
## The sourcetype that we are using for our parsing within Splunk
sourcetype = cisco:asa
[monitor:///opt/syslog/logs/juniper/*/*.log]
disabled = 0
host_segment = 5
index = network_firewall
sourcetype = juniper
[monitor:///opt/syslog/logs/paloalto/*/*.log]
disabled = 0
host_segment = 5
index = network_firewall
sourcetype = pan:log
 

What about “logrotate” to remove old files?

Since we are splitting files by Year-Month-Day-Hour, we can run a cron job to clean up our old files. If you only want to keep two weeks of data, add the following to cron:

find /opt/syslog/logs/ -daystart -mtime +14 -type f -exec rm {} \;
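Before wiring the rm into cron, you can dry-run the retention logic on a throwaway directory (the paths and file ages below are made up for the demo):

```shell
# old.log is 20 days old and should be selected; new.log should not be.
DEMO=$(mktemp -d)
touch -d '20 days ago' "$DEMO/old.log"
touch "$DEMO/new.log"
find "$DEMO" -daystart -mtime +14 -type f    # lists only old.log
rm -rf "$DEMO"
```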

Next Steps?
Look out for Part 3 – Troubleshooting and advanced topics!
