Splunk Data Onboarding Success with Syslog Part 3: Troubleshooting
By: Dan Potter, Senior Security Engineer

If you’re just getting started, be sure to check out Parts 1 and 2 of our series:
Data Onboarding Success Part 1 – Success with Syslog-NG and Splunk, The Install and Setup.
Data Onboarding Success Part 2 – Success with Syslog-NG and Splunk. The Basics.

Troubleshooting syslog-ng and Splunk

Below is an assortment of common troubleshooting situations you’ll face with syslog-ng and Splunk.

Service Issues

1. syslog-ng service won’t start

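To begin troubleshooting syslog-ng service issues, check the configuration syntax first. This command parses the configuration file and reports any errors without starting the service:

syslog-ng --syntax-only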

Organizing Issues

2. keeping track of events from multiple syslog servers

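We can add new metadata to all events that come from your syslog server so that we can keep track of which syslog server sent the data to Splunk. To do this, we need to update inputs.conf on the universal forwarder running on the syslog server and add a fields.conf file (fields.conf belongs on your search heads, so that searches treat the new field as indexed).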

Add the following to your inputs.conf (replace SERVERNAME with the name of this syslog server):

[default]
_meta = splunk_syslog_server::SERVERNAME

Create a fields.conf file with the following:

[splunk_syslog_server]
INDEXED = true
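If you run several syslog servers, it can help to generate these two stanzas rather than edit them by hand. The Python sketch below is only an illustration: the helper names are hypothetical (they are not part of Splunk), and it assumes the _meta value is a space-separated list of key::value pairs, so it rejects server names containing whitespace instead of guessing at quoting rules.

```python
# Hypothetical helpers (not part of Splunk): render the two stanzas above for
# one syslog server so the same files can be templated across many servers.
FIELD = "splunk_syslog_server"  # indexed field name used in this article


def render_inputs_conf(server_name: str, field: str = FIELD) -> str:
    """Return an inputs.conf [default] stanza that tags every event."""
    # _meta holds space-separated key::value pairs, so whitespace inside the
    # value would split it; this sketch simply rejects such names.
    if not server_name or any(ch.isspace() for ch in server_name):
        raise ValueError("server name must be non-empty with no whitespace")
    return f"[default]\n_meta = {field}::{server_name}\n"


def render_fields_conf(field: str = FIELD) -> str:
    """Return the fields.conf stanza that declares the field as indexed."""
    return f"[{field}]\nINDEXED = true\n"


if __name__ == "__main__":
    print(render_inputs_conf("syslog01"), end="")
    print(render_fields_conf(), end="")
```

Once events carry the indexed field, you can search by server, for example splunk_syslog_server::syslog01 (syslog01 is a placeholder name).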

3. multiple departments/networks with their own syslog servers that feed into a central syslog server, where all events should show the original syslog server

By default, syslog records the “last hop” value for the hostname or IP address. So if data passes through an intermediate syslog server (a relay), you will need to modify the options on that intermediate server to chain the hostnames.

# If the log message is forwarded to the logserver via a relay, and the
# chain_hostnames() option is 'yes', the relay adds its own hostname to
# the hostname of the client, separated with a / character.
chain_hostnames(yes);

Outside of this relay scenario, you will generally want to leave chain_hostnames(no), which is the default.
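With chain_hostnames(yes) on the relay, the host value carries the whole path, and the original sender is the first segment. The Python sketch below shows how to recover it, for example when post-processing exported events. The function name is hypothetical, and it assumes the value arrives as client/relay with / separators, as the config comment above describes.

```python
from typing import List, Tuple


def split_chained_host(host: str) -> Tuple[str, List[str]]:
    """Split 'client/relay1/relay2' into ('client', ['relay1', 'relay2'])."""
    # Assumption: each relay with chain_hostnames(yes) appends "/<its name>",
    # so the first segment is the device that originally sent the message.
    parts = [p for p in host.split("/") if p]
    if not parts:
        raise ValueError("empty host value")
    return parts[0], parts[1:]


print(split_chained_host("fw01/syslog-dept-a"))
```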

Resourcing Issues

4. syslog server is receiving events faster than it can send them to Splunk

You will need to raise the forwarder’s output limit so your universal forwarder (UF) can send data as fast as it receives it. To do this, modify limits.conf on the UF running on the syslog server.

# By default, a universal or light forwarder is limited to 256 KB/s.
# Either set a different limit in KB/s, or set the value to zero for
# no limit.
# Note that a full-speed UF can overwhelm a single indexer.
[thruput]
# maxKBps = 0 means unlimited throughput, which is what we want for our
# syslog server so that syslog data is sent as fast as it is received.
maxKBps = 0
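To judge whether the 256 KB/s default would throttle your syslog server, multiply the message rate by the average event size. The Python sketch below does that arithmetic; the helper names and the example event sizes are hypothetical (measure your own), and it treats 1 KB as 1,024 bytes, which is an assumption that does not change the conclusion much.

```python
import math

DEFAULT_UF_MAX_KBPS = 256  # default universal forwarder limit, per the comment above


def required_kbps(messages_per_second: float, avg_event_bytes: float) -> int:
    """Estimate the sustained throughput (KB/s, 1 KB = 1024 bytes) the UF must ship."""
    return math.ceil(messages_per_second * avg_event_bytes / 1024)


def is_throttled(messages_per_second: float, avg_event_bytes: float,
                 max_kbps: int = DEFAULT_UF_MAX_KBPS) -> bool:
    """True if the estimated load exceeds maxKBps (0 means unlimited)."""
    if max_kbps == 0:
        return False
    return required_kbps(messages_per_second, avg_event_bytes) > max_kbps
```

For example, 2,000 events per second at roughly 400 bytes each needs about 782 KB/s, roughly three times the default cap, which is why maxKBps = 0 is the safer setting for a dedicated syslog forwarder.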

5. monitor for dropped packets

netstat -su | grep "receive errors"
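On Linux, netstat -su reads its UDP counters from /proc/net/snmp, where they appear as two Udp: lines (field names, then values). The counters are cumulative since boot, so what matters is whether they grow. The Python sketch below parses those lines and compares two samples; the helper names are hypothetical, and it assumes the InErrors and RcvbufErrors fields found on typical Linux kernels.

```python
from typing import Dict, Iterable


def parse_udp_counters(snmp_text: str) -> Dict[str, int]:
    """Parse the two 'Udp:' lines (header, then values) of /proc/net/snmp."""
    # "UdpLite:" lines do not start with "Udp:", so they are skipped.
    udp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp: header/value pair found")
    header, values = udp_lines[0], udp_lines[1]
    return dict(zip(header, (int(v) for v in values)))


def read_udp_counters(path: str = "/proc/net/snmp") -> Dict[str, int]:
    """Read the live counters (Linux only)."""
    with open(path) as fh:
        return parse_udp_counters(fh.read())


def counter_increase(before: Dict[str, int], after: Dict[str, int],
                     keys: Iterable[str] = ("InErrors", "RcvbufErrors")) -> Dict[str, int]:
    """How much each error counter grew between two samples."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}
```

Take one sample with read_udp_counters(), wait a minute, take another, and pass both to counter_increase(). A steadily rising RcvbufErrors count points at a full socket receive buffer, which the buffer settings below address.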

If you are dropping UDP packets, please see below:
1. Adjust so-rcvbuf:

When receiving messages over UDP, increase the size of the UDP receive buffer on the syslog receiver host. On certain platforms, even a low message load (~200 messages per second) can result in message loss unless the so-rcvbuf() option of the source is increased.
In such cases, you will need to increase the net.core.rmem_max parameter of the host, but do not modify the net.core.rmem_default parameter. As a general rule, increase so-rcvbuf() so that the buffer size in kilobytes is higher than the rate of incoming messages per second.

For example, to receive 2,000 messages per second, set so-rcvbuf() to at least 2,097,152 bytes (2,048 KB):
source s_udp_network { udp(ip(0.0.0.0) port(514) so_rcvbuf(2097152)); };

2. Adjust the net.core.rmem_max parameter (use large kernel buffers):

If syslog-ng cannot read messages from the UDP socket fast enough, the kernel receive buffers start to fill, and once the configured limit is reached, the kernel starts discarding messages. In this case, adjust the buffer size accordingly: use the sysctl command to raise net.core.rmem_max, then raise the so-rcvbuf() option of the syslog-ng source definition as well, so that syslog-ng can use the larger kernel receive buffers.

In a high-traffic environment, a value as high as 256 MB might be necessary.

Enter the value in bytes:
sysctl -w net.core.rmem_max=268435456

In the example above, 256*1024*1024 = 268,435,456 bytes.
As a rule of thumb, this buffer should be large enough to hold the incoming peak message rate for at least one second. Note that sysctl -w does not persist across reboots; add the setting to /etc/sysctl.conf (or a file under /etc/sysctl.d/) to make it permanent.

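3. Adjust log_fifo_size()

To handle message bursts, increase the value of the log_fifo_size() option as well. Note that log_fifo_size() is measured in messages, not bytes, so rather than copying the net.core.rmem_max value from the previous step directly, size it to hold roughly as many messages as fit in that kernel buffer (the buffer size divided by your average message size).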
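The three sizing rules above can be turned into a small calculator. The Python sketch below is a rough planning aid under stated assumptions, not an official formula: rounding up to powers of two is our own choice, the average message size is something you would measure, and converting the kernel buffer into a log_fifo_size() message count is one reasonable reading of "match the buffer". It keeps rmem_max at least as large as so_rcvbuf because the kernel caps a socket's receive buffer at net.core.rmem_max.

```python
from dataclasses import dataclass


def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


@dataclass
class UdpBufferPlan:
    so_rcvbuf_bytes: int  # value for so_rcvbuf() in the syslog-ng source
    rmem_max_bytes: int   # value for sysctl net.core.rmem_max
    log_fifo_size: int    # value for log_fifo_size(); counted in messages


def plan_udp_buffers(msgs_per_sec: int, peak_msgs_per_sec: int,
                     avg_msg_bytes: int) -> UdpBufferPlan:
    # Step 1: buffer size in KB above the message rate, rounded up.
    so_rcvbuf = next_power_of_two(msgs_per_sec * 1024)
    # Step 2: hold at least one second of peak traffic, and never less than
    # so_rcvbuf, because the kernel caps the socket buffer at rmem_max.
    rmem_max = max(so_rcvbuf, next_power_of_two(peak_msgs_per_sec * avg_msg_bytes))
    # Step 3: log_fifo_size() counts messages, so convert bytes to messages.
    fifo = rmem_max // avg_msg_bytes
    return UdpBufferPlan(so_rcvbuf, rmem_max, fifo)
```

For a steady 2,000 messages per second with hypothetical peaks of 20,000 messages per second at about 1 KB each, plan_udp_buffers(2000, 20000, 1024) suggests so_rcvbuf(2097152), net.core.rmem_max=33554432 and log_fifo_size(32768).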

Configuration Pointers

6. TLS syslog input

You will need to add a new source definition and update your log statement:

source s_tls_remotes {
    network(
        ip(0.0.0.0) port(10514)
        transport("tls")
        tls(
            key-file("/opt/syslog-ng/key.d/privkey.pem")
            cert-file("/opt/syslog-ng/cert.d/cacert.pem")
            peer-verify(optional-untrusted)
        )
    );
};
### Logs
log {
    source(s_tls_remotes);
    filter(f_TLS_SOURCETYPE);
    destination(d_TLS_SOURCETYPE);
};

In the example above, we enable a new listener on port 10514, specify TLS as the transport, and point to a valid private key and certificate file on the syslog server. The peer-verify(optional-untrusted) option tells syslog-ng to accept connections even when the sending client presents no certificate or an untrusted one. If you need to validate client certificates, comment out or remove the peer-verify line (the default is required-trusted) and configure ca-dir() with the CA certificates you trust.
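To confirm the new listener accepts connections, you can send a test message from another host. The Python sketch below is a hypothetical test client: it assumes the network() source accepts newline-terminated BSD-style lines over TLS, and its verify_server switch only controls whether the client checks the server’s certificate (peer-verify() is the server-side check of the client and is unaffected).

```python
import socket
import ssl
from datetime import datetime


def build_test_message(hostname: str, text: str, when: datetime) -> bytes:
    """Newline-terminated BSD-style line: <PRI>Mmm dd HH:MM:SS HOST TAG: TEXT."""
    # <14> = facility user (1) * 8 + severity info (6).
    # BSD timestamps pad single-digit days with a space ("Jun  3").
    stamp = f"{when.strftime('%b')} {when.day:2d} {when.strftime('%H:%M:%S')}"
    return f"<14>{stamp} {hostname} tls-test: {text}\n".encode("utf-8")


def send_over_tls(server: str, port: int, payload: bytes,
                  verify_server: bool = False) -> None:
    """Open a TLS connection to the listener and send one message."""
    context = ssl.create_default_context()
    if not verify_server:
        # Testing only: accept a self-signed or otherwise untrusted server cert.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((server, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=server) as tls:
            tls.sendall(payload)
```

Usage, with syslog.example.com as a placeholder for your server: send_over_tls("syslog.example.com", 10514, build_test_message("testhost", "hello over TLS", datetime.now())). Then check that the message lands in the destination behind f_TLS_SOURCETYPE.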

If you need further assistance with TLS inputs, refer to the Administration Guide at https://support.oneidentity.com/technical-documents/syslog-ng-premium-edition/7.0.9/administration-guide/52#TOPIC-955503

7. sorting out ‘Info’ and ‘Debug’ messages to only show warnings and errors

You can filter by a single log level or by a range of levels. You can then tell Splunk to onboard only the messages you care about while retaining or discarding the rest.

## Filters
## Useful filters
## level(emerg, alert, crit, err, warning, notice, info, debug)
## The level() filter selects messages corresponding to a single importance level, or a level-range.
## To select messages of a specific level, use the name of the level as a filter parameter,
## for example use the following to select warning messages: level(warning)
## To select a range of levels, include the beginning and the ending level in the filter,
## separated with two dots (..). For example, to select every message of error or higher level, use the following filter:
## level(err..emerg)
filter f_alert { level(alert); };
filter f_crit { level(crit); };
filter f_debug { level(debug); };
filter f_emerg { level(emerg); };
filter f_err { level(err); };
filter f_info { level(info); };
filter f_notice { level(notice); };
filter f_warn { level(warning); };
#destination d_cisco_asa_info { file("/opt/syslog/logs/cisco/asa/$HOST/$YEAR-$MONTH-$DAY-cisco-asa-INFO.log" template(t_default_format) ); };
#destination d_cisco_asa_notice { file("/opt/syslog/logs/cisco/asa/$HOST/$YEAR-$MONTH-$DAY-cisco-asa-NOTICE.log" template(t_default_format) ); };
#destination d_cisco_asa_warn { file("/opt/syslog/logs/cisco/asa/$HOST/$YEAR-$MONTH-$DAY-cisco-asa-WARN.log" template(t_default_format) ); };
#destination d_cisco_asa_crit { file("/opt/syslog/logs/cisco/asa/$HOST/$YEAR-$MONTH-$DAY-cisco-asa-CRIT.log" template(t_default_format) ); };
#destination d_cisco_asa_err { file("/opt/syslog/logs/cisco/asa/$HOST/$YEAR-$MONTH-$DAY-cisco-asa-ERR.log" template(t_default_format) ); };
#destination d_cisco_asa_emerg { file("/opt/syslog/logs/cisco/asa/$HOST/$YEAR-$MONTH-$DAY-cisco-asa-EMERG.log" template(t_default_format) ); };
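To check which messages a level range would keep, remember that syslog severities are numbered 0 (emerg) to 7 (debug) and that the <PRI> value at the start of each raw message encodes facility * 8 + severity (RFC 5424). The Python sketch below mirrors that selection for illustration only; the helper names are hypothetical, it is not syslog-ng’s own code, and accepting the two range ends in either order is our own convenience.

```python
from typing import Set, Tuple

# Syslog severities 0-7 (RFC 5424), named as syslog-ng's level() filter names them.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def level_range(spec: str) -> Set[str]:
    """Expand 'err..emerg' (or a single name such as 'warning') into level names."""
    first, _, last = spec.partition("..")
    last = last or first
    lo, hi = sorted((SEVERITIES.index(first), SEVERITIES.index(last)))
    return set(SEVERITIES[lo:hi + 1])


def decode_pri(pri: int) -> Tuple[int, str]:
    """Split a <PRI> value into (facility, severity name); PRI = facility * 8 + severity."""
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


def kept_by(pri: int, spec: str) -> bool:
    """Would a filter such as level(warning..emerg) keep a message with this PRI?"""
    return decode_pri(pri)[1] in level_range(spec)
```

For example, a message starting with <187> is facility 23 (local7) at severity err, so a warning-or-worse filter keeps it, while <14> (user, info) is dropped.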

8. devices sending syslog from multiple timezones

You will probably want to modify your t_default_format template to use ${ISODATE} rather than ${DATE}. This ensures the timezone offset is included in every logged event, so Splunk can use it when parsing the timestamps.
Help! I need a different template for one of my log sources. What other options are there?
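The benefit of the offset is that events from devices in different timezones land on one timeline. As a small illustration, the Python sketch below parses ${ISODATE}-style strings and normalizes them to UTC. The two timestamps are made-up values for two devices logging the same moment, and the helper name is hypothetical; it is meant for spot-checking your own data, not as a replacement for Splunk’s timestamp parsing.

```python
from datetime import datetime, timezone


def to_utc(isodate: str) -> datetime:
    """Parse an ${ISODATE}-style value (with UTC offset) and convert it to UTC."""
    stamp = datetime.fromisoformat(isodate)
    if stamp.tzinfo is None:
        # Without an offset there is no way to know which timezone was meant.
        raise ValueError(f"no UTC offset in {isodate!r}")
    return stamp.astimezone(timezone.utc)


london = to_utc("2019-06-13T15:58:00+01:00")
new_york = to_utc("2019-06-13T10:58:00-04:00")
print(london == new_york, london.isoformat())
```

Both devices resolve to 14:58 UTC, whereas the ${DATE} form (Jun 13 15:58:00 versus Jun 13 10:58:00) would make them look five hours apart.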
Here are some additional fields that may be useful in formatting your syslog messages:

• An S_ prefix on a date macro means the value is taken from the source of the event, R_ means the time the syslog server received the message, and C_ means the current time.
o S_ macros are useful for grabbing the correct timestamp from the sending device rather than the syslog server’s receive time, e.g. ${S_DATE}.
o To use the S_ macros, the keep-timestamp() option must be enabled (this is the default behavior of syslog-ng OSE).
• ${DATE} = Jun 13 15:58:00
• ${ISODATE} = 2019-06-13T15:58:00+01:00 (ISO 8601 format, including the year and UTC offset)
• ${HOST} = server that generated the message
• ${LOGHOST} = server that received the message (if you use DNS lookups, the host must be in DNS or it will show as localhost.localdomain)
• ${MSGHDR} = the ${PROGRAM} and ${PID} fields, i.e. the program name and process ID of the syslog message
o Format: PROGRAM[PID]: (includes a trailing space after the colon)
• ${MSG} = alias for the ${MESSAGE} macro: the text of the log message without the ${MSGHDR} header (the program name and PID are available separately in the ${PROGRAM} and ${PID} macros)
• ${SOURCEIP} = alternative to ${HOST}, useful if you don’t have DNS for a host
Additional macro information: https://syslog-ng.com/documents/html/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/reference-macros.html
If you have issues with escape characters, see this guide: https://syslog-ng.com/documents/html/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/configuring-macros.html
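Before changing a template, it can help to preview what a line will look like. Python’s string.Template happens to use the same ${NAME} placeholder syntax, so the sketch below fills a hypothetical template with made-up macro values for one event. syslog-ng expands the macros itself at runtime; this is only a preview, and the template and values are illustrative assumptions.

```python
from string import Template

# Hypothetical template in the style of a syslog-ng file destination template.
T_EXAMPLE = Template("${ISODATE} ${HOST} ${SOURCEIP} ${MSGHDR}${MSG}\n")

# Illustrative macro values for one event (not real syslog-ng output).
event = {
    "ISODATE": "2019-06-13T15:58:00+01:00",
    "HOST": "fw01",
    "SOURCEIP": "10.0.0.5",
    "PROGRAM": "sshd",
    "PID": "1234",
    "MSG": "Accepted publickey for admin",
}
# ${MSGHDR} is "PROGRAM[PID]: ", including the trailing space.
event["MSGHDR"] = f"{event['PROGRAM']}[{event['PID']}]: "

line = T_EXAMPLE.substitute(event)
print(line, end="")
```

Because ${MSGHDR} already ends in a space, the template places ${MSG} directly after it with no extra separator.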

9. ${S_DATE} used in template and now getting events with incorrect timestamps (e.g., Dec 31, 1900)

When syslog-ng accepts whatever timestamp is in the original event, we can end up with bad time data. It is crucial that NTP is enabled on all devices that send syslog data before you configure ${S_DATE}. Additionally, some devices, such as routers, may start up with a bad date until they connect to an NTP server and update their time.

10. ${DATE} used in template and now some events have a current timestamp rather than the original event timestamp

This is expected behavior and can occur if a device loses connectivity for a while and then reconnects. To account for issues like this, you may want to include two timestamps in your template, ${S_DATE} and ${DATE}. This is also useful for determining the lag between the source and the syslog receiver.
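Once both timestamps are logged, the lag between them flags both problems from sections 9 and 10. The Python sketch below assumes the two values are logged with timezone information (for example via the S_ and R_ variants of ${ISODATE}) and parsed into timezone-aware datetimes; the helper names and thresholds are hypothetical and should be tuned for your network.

```python
from datetime import datetime, timedelta, timezone


def source_lag(source_ts: datetime, received_ts: datetime) -> timedelta:
    """How long after the device's own timestamp (S_) the server received it (R_)."""
    return received_ts - source_ts


def classify(source_ts: datetime, received_ts: datetime,
             max_lag: timedelta = timedelta(minutes=5),
             max_ahead: timedelta = timedelta(minutes=1)) -> str:
    """Label an event using hypothetical thresholds."""
    lag = source_lag(source_ts, received_ts)
    if lag < -max_ahead:
        return "source clock ahead"
    if lag > max_lag:
        # Covers both a device that was offline and a badly set clock (e.g. 1900).
        return "late or bad source clock"
    return "ok"


received = datetime(2019, 6, 13, 14, 58, tzinfo=timezone.utc)
print(classify(received - timedelta(seconds=3), received))
```

A large positive lag alone cannot tell a device that was offline from one with a bad clock, so check the device’s NTP status before trusting either explanation.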

11. filters aren’t working and all the data in the catch_all directory says it’s coming from the same device name/IP

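This can happen if you are sending syslog across subnets and the data passes through a router or firewall that rewrites the source address (for example, with NAT), so every message appears to come from that one device. Ensure that your devices can communicate directly with the syslog server without passing through routers or firewalls.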
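A quick way to see which address the syslog server records is to send a probe with a unique token from the device’s network segment and look it up on the server. The Python sketch below sends one BSD-style line over UDP; the helper names, the probe tag, and the default port of 514 are assumptions for illustration.

```python
import socket
from datetime import datetime
from typing import Optional


def build_probe(hostname: str, token: str, when: datetime) -> bytes:
    """BSD-style line; <14> = facility user (1) * 8 + severity info (6)."""
    # BSD timestamps pad single-digit days with a space ("Jun  3").
    stamp = f"{when.strftime('%b')} {when.day:2d} {when.strftime('%H:%M:%S')}"
    return f"<14>{stamp} {hostname} probe: {token}".encode("ascii")


def send_probe(server: str, token: str, port: int = 514,
               hostname: str = "probe-host",
               when: Optional[datetime] = None) -> bytes:
    """Send one probe over UDP (fire-and-forget) and return the bytes sent."""
    payload = build_probe(hostname, token, when if when is not None else datetime.now())
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
    return payload
```

Search the server’s logs for the token and compare the recorded host or ${SOURCEIP} with the sending machine’s real address. If it shows the router’s or firewall’s address instead, the path is rewriting the source.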

Still want to know more?

More than you ever wanted to know about syslog – https://tools.ietf.org/html/rfc5424
Here is a link to the current version of the Syslog-NG Open Source Edition Administrators Guide – https://www.syslog-ng.com/documents/html/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/

Missed the first two parts?

Splunk Data Onboarding: Success With Syslog-NG and Splunk – Part 1
Splunk Data Onboarding: Success With Syslog-NG and Splunk – Part 2