April 5, 2019

Building a Security Operations Center with Splunk

Technology is Making it Easier to Build Your Own SOC

Many organizations are considering building their own security operations center (SOC). This desire for a DIY-style security operations center marks a significant industry change, and there are a variety of reasons for this shift.

One common reason is that security monitoring technology is now more readily available, which lowers the bar for building a SOC. Another is that the security industry has long been dominated by black box MSSP models that frustrate corporate users. Today’s technology landscape allows corporations and their users to see their data in real time and gives them the ability to create a white box model for real-time data monitoring. Now more than ever, organizations are choosing to build their own SOC for these exact reasons.

Having worked with hundreds of organizations on building their SOC, the single best piece of advice I would offer to anyone starting this effort is not to let perfect be the enemy of good enough. As you get going, there will be many things that you probably want to do better, and you probably want perfection, but in today’s shifting threat landscape, perfect today is substandard tomorrow. By focusing on good enough, you can begin executing on building your SOC and making progress. If you can commit to making incremental gains daily and minor improvements over the course of a year, you’ll realize that you’re much closer to having the SOC that you originally envisioned.

Many security folks say that people, process, and technology are the key to building out a strong SOC, but this narrow approach fails to acknowledge the overall strategy and purpose. Until you’re able to fundamentally answer the “purpose” question, it doesn’t matter how many people, how many processes, or what technology is in place, because you fail to acknowledge the fundamental security use case driving you to build this in the first place. Before anything else, my recommendation is to figure out the who, what, when, where, why, and how of building your security operations center. Once you’ve answered those questions, you’ll better understand your purpose and desired outcomes. You’ll be able to inform the type of personnel required to staff your SOC, build the correct process documentation, and most importantly, select the appropriate technology and tools for the job.

Getting Started Building Your SOC

Congratulations on deciding to build your own SOC! It’s an exciting endeavor, and for many security professionals it might be the first and only time you get to build your own SOC. With this experience under your belt, you’ll be able to better inform and enhance existing security operations programs in the future. As you start down the path of building your SOC, the most important questions you can ask yourself are:

Why are you building a security operations center?
What will you monitor and be able to detect in your security operations center?
When do you need to have your security operations center completed?
Where will you position your security technology? And where will your staff be located?
How do you plan to build your security operations center?

Answer this question last:

Who will staff your security operations center, based on all the previous requirements collected?

What Are You Monitoring in Your SOC?

This is actually a very hard question to answer. The answer can’t be, “We’re going to do the secur-it-tay.” You need more than that. Answering this question, even partially, as a starting point will dictate your approach to constructing your SOC and the type of technology you’ll need to procure. If you get jammed up by this question and find yourself consistently trying to boil the ocean, I’d recommend starting small. One place I commonly see folks get an effective start is with monitoring perimeter infrastructure and user authentication activity. This is because the log files generated by today’s common perimeter infrastructure come in common log formats that are easily ingested into your SOC. Windows authentication, Active Directory, and LDAP have also been around for a long time and exist in common standard log formats that are easily ingested into your SOC and easily correlated based on event activity.

I commonly see folks overlook the security of their applications. If your business is an e-commerce operation or you have core applications that support your business, these applications extend your attack surface beyond your perimeter infrastructure and are likely viable attack targets. One of the biggest issues in the application security space, especially for custom-developed applications, is that those applications are not designed with a robust logging structure and monitoring capabilities. Because of this, it’s common that key log files don’t exist for core applications. If this is the case for you, send a request to your development team to build out this logging capability. If you’re in a position where application logs don’t exist, then snag the IIS or Apache log files. It’s better than nothing, and these web activity logs will enhance any logging capability that is eventually developed.

My recommendation? Start small, establish a beachhead for security, and expand your reach as you’re able. You can’t overengineer and overanalyze this. Every day spent overengineering is one more day that you could miss a potential attack and infiltration.

Tools and Architecture

Tools and architecture might be one of the most overlooked components of any SOC. The security industry has muddy waters when it comes to the capabilities of today’s commonly available security monitoring technologies, which levels the playing field for substandard solutions. The bottom line is that not all technologies are created equal, and as you begin to build your SOC, scalability becomes incredibly important. I regularly see folks cut corners to try and save a couple bucks. The long-term effects of this decision could ultimately undermine the overall viability and success of your SOC. It’s important to understand how the technology you use is architected, including how data is stored and indexed and how it’s being searched and correlated against. I can’t stress enough how much this matters. Everything runs great when no data is stored, but over time, data storage increases and the volume of data to be searched increases, so it’s important to consider a fileless data structure.

Flexible Ingest of Data Sources for Monitoring

Many security and event logging solutions restrict the data sources available to ingest and onboard into the platform. It’s important that you can customize your solution in terms of log ingest; otherwise, your security monitoring software solution is dictating the capabilities of your SOC. For example, Cisco and Check Point have very different log formats, so your security monitoring solution needs the ability to ingest both types of log files. If you have Bro logs, or maybe an Office 365 blob file, you also need to be able to ingest and correlate events on those. You need to have the ability and flexibility to ingest all data formats with your technology solution. Whatever you do, avoid cobbling together a solution with disparate technologies. You’ll incur security technology debt, and left unchecked, you’ll be unable to staff your SOC.

Reports/Dashboards

You need the ability to build custom reports and dashboards. Many security incident and event management tools have fixed and static reporting capabilities. Your monitoring solution shouldn’t dictate what you see from your solution. Your ability to customize these reports will help to ensure relevant metrics are consistently reported for your security program.

Consider Flexible Deployment Options

Having the ability to decide on your deployment options now and in the future should be an important part of your SOC. Businesses change, and your SOC capabilities must be able to adjust with the changing business strategy. Your ability to deploy on-premises and in the cloud greatly increases the efficiency and effectiveness of your security operations center. If your business strictly operates from one physical location, this may be less of a concern. However, if you’re operating on a larger scale, having the ability to deploy data collection closer to the source and search closer to the data might reduce network latency when time is critical during an investigation.

Data Indexing

How data is indexed matters a lot. The capabilities of today’s security incident and event management tools have come a long way. Most current-day security incident and event management platforms parse data at the time of ingest, which is a notable improvement over legacy platforms. Legacy platforms relied on ingesting the data first and parsing it second. With this model, parsers needed to be built before data could be indexed, which significantly lengthened incident response times because investigators needed to know what data they were looking for, design the report, build the parsers for the report, and then let the parser index the data to populate the report. This could take hours to complete when time is of the essence during an investigation. In the SOCs of today, it’s common for data to be parsed and indexed at the time of ingestion. This gives cybersecurity investigators the ability to perform a Google-like log search.
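
To make the ingest-time indexing idea concrete, here is a minimal Python sketch of an inverted index that is updated as each event arrives. It illustrates the general technique only and is not how Splunk or any particular platform implements indexing. The class name IngestIndex, the tokenizer, and the sample events are all hypothetical.

```python
import re
from collections import defaultdict

TOKEN_PATTERN = re.compile(r"[a-z0-9_.]+")


class IngestIndex:
    """Toy inverted index: tokens are indexed the moment an event is ingested."""

    def __init__(self):
        self.events = []                  # raw events, in arrival order
        self.postings = defaultdict(set)  # token -> ids of events containing it

    def ingest(self, raw_event):
        """Store the event and index its tokens immediately."""
        event_id = len(self.events)
        self.events.append(raw_event)
        for token in TOKEN_PATTERN.findall(raw_event.lower()):
            self.postings[token].add(event_id)
        return event_id

    def search(self, query):
        """Return events that contain every term in the query (AND search)."""
        terms = TOKEN_PATTERN.findall(query.lower())
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.events[i] for i in sorted(ids)]


index = IngestIndex()
index.ingest("2019-04-05 sshd failed login user=alice src=10.0.0.5")
index.ingest("2019-04-05 sshd accepted login user=bob src=10.0.0.9")
print(index.search("failed alice"))
```

Because the index is built during ingest, the investigator can type free-form terms right away instead of waiting for a parser and report to be built first.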

Consider Adoption

Building it yourself is tough. If you can find a solution that has a broad adoption base, established use cases, and extensible “plug-in” capabilities, you can save time. Simply put, if others have already done what you’re trying to do and have published their results, this will make it easier for you to research (e.g., Google) the solution as you go.

Splunk SOC Considerations

NuHarbor Security builds a lot of security operations centers, and we deliver managed security services as an MSSP for numerous organizations and institutions. Full disclosure: we use Splunk to build our solutions, primarily because it scales and we’re not limited to what the software provider tells us we should monitor. If you’re looking to build your security operations center with Splunk, here are some things to think about:

Splunk Data Sources

Start with the end goal in mind. Understanding what you want to monitor and what results you want to see is step one for determining what data sources you should ingest into Splunk. Due to Splunk’s licensing model, it’s important to be clear as to what data is being ingested and the value of that data. If you know what you want to monitor and what dashboards you need to see, it becomes easy to reverse engineer the data sources needed to realize your goal. If you’re a Splunk Enterprise Security customer, I’d recommend choosing a list of notable events from out-of-the-box Enterprise Security and then reverse engineering the data sources required to realize those notable events and the corresponding correlations. This link is a good starting point: http://dev.splunk.com/view/dev-guide/SP-CAAAE3A.

Splunk Indexes

If you’re not familiar with Splunk indexes, you can think of them as logical locations where your log file data is stored. An index is the directory that is searched by the Splunk search heads. Splunk indexes play a critical role in the overall security operations architecture for Splunk. How you structure your indexes becomes incredibly important to your ability to implement a robust access model for your data. As you set up your Splunk SOC, considering how many indexes you have and what data goes into each one will help you ensure logical segmentation of certain data types. Once your indexes are set up and your strategy is established, you can specify which indexes should be searched for security purposes. This idea of logical data segmentation also becomes incredibly important when you have data from different international locations. For the purposes of GDPR and similar privacy regulations, you can keep data in-country and limit searching to those with a valid need to know. These links will get you started:

https://docs.splunk.com/Documentation/Splunk/7.2.5/Data/WhatSplunkcanmonitor

https://docs.splunk.com/Documentation/Splunk/7.2.5/Indexer/Aboutindexesandindexers

Splunk Storage

Giving some thought to the amount and types of data you want to retain should be an important consideration for any SOC. Splunk can separate data into hot and warm storage that is quickly searchable, cold storage that can take a little longer to search, and an archive of frozen data kept for historical purposes. The industry you’re in could dictate how long you need to retain your data.

The Payment Card Industry Data Security Standard has its own regulations and requirements for how long data should be stored (i.e., 90 days readily searchable, and 1-year overall retention). Other industries, such as the financial vertical or State and Local Government and Higher Education (SLED), often have a 7-year retention requirement for data. Many organizations choose to keep 30-90 days of data in a hot or warm state on flash-based storage for faster write and read times. Data older than 90 days is normally stored in a cold state on disk with slower IOPS to save on costs. Any data that needs to be retained beyond the cold state is archived in a frozen state and retrieved only if it needs to be recalled for historical purposes. Every organization is different, and your data storage strategy should reflect your organizational requirements. If you’re stuck on storage ideas, here are some reference links to get you started:

https://docs.splunk.com/Documentation/Splunk/7.2.4/Indexer/Configureindexstorage

https://splunk-sizing.appspot.com/
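
The retention arithmetic above can be sketched in a few lines of Python. This is a rough back-of-the-envelope illustration, not a sizing tool. The 50 GB/day ingest figure and the 0.5 on-disk ratio are assumptions I made up for the example, so measure your own data before buying hardware. Splunk’s indexes.conf expresses the age at which data is frozen in seconds (the frozenTimePeriodInSecs setting), which is why the day-to-second conversion is included.

```python
SECONDS_PER_DAY = 86_400


def retention_seconds(days):
    """Convert a retention period in days to seconds."""
    return days * SECONDS_PER_DAY


def estimate_disk_gb(daily_ingest_gb, retention_days, disk_ratio=0.5):
    """Rough disk estimate: daily raw ingest x days retained x on-disk ratio.

    disk_ratio is an ASSUMPTION (on-disk size relative to raw data size).
    """
    return daily_ingest_gb * retention_days * disk_ratio


# Hypothetical plan: 90 days hot/warm, 365 days total before data is frozen.
plan = {"hot_warm_days": 90, "total_days": 365, "daily_ingest_gb": 50}

hot_warm_gb = estimate_disk_gb(plan["daily_ingest_gb"], plan["hot_warm_days"])
cold_gb = estimate_disk_gb(
    plan["daily_ingest_gb"], plan["total_days"] - plan["hot_warm_days"]
)

print("frozenTimePeriodInSecs =", retention_seconds(plan["total_days"]))
print("fast (hot/warm) storage, GB:", hot_warm_gb)
print("slower (cold) storage, GB:", cold_gb)
```

Splitting the estimate by tier makes the cost trade-off visible: only the first 90 days need to sit on the expensive flash storage.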

Splunk Universal Forwarders, Heavy Forwarders, and Syslog

The Splunk universal forwarder collects data from a data source or another forwarder and sends it to a forwarder or another Splunk deployment. The most notable benefit of a Splunk universal forwarder is that it uses significantly fewer hardware resources than other Splunk software products. For example, it can coexist on a host that runs a Splunk Enterprise instance. It’s also more scalable than other Splunk products, as you can install thousands of universal forwarders with little impact on network and host performance.

A heavy forwarder has a smaller footprint than a Splunk Enterprise indexer but retains most of the capabilities of an indexer. An exception is that it cannot perform distributed searches. You can disable some services to further reduce its footprint size (e.g., Splunk Web).

Unlike other forwarder types, a heavy forwarder parses data before forwarding it and can route data based on criteria such as source or type of event. It can also index data locally while forwarding the data to another indexer.

In most situations, the universal forwarder is the best way to forward data to indexers. Its main limitation is that it forwards only unparsed data, except in certain cases, such as structured data. You must use a heavy forwarder to route data based on event contents.

The universal forwarder is ideal for collecting files from disk (e.g., a syslog server) or for use as an intermediate forwarder (e.g., aggregation layer) due to network or security requirements. Avoid intermediate forwarding unless absolutely necessary. Some data collection add-ons require a full instance of Splunk, which means a heavy forwarder (e.g., DB Connect, OPSEC LEA, etc.).

When building your Splunk SOC, you generally need to have forwarders. There are ways around using forwarders, and there are other methods of collection, but while most of these other collection methods are supported, they’re not entirely scalable.

In super simple terms, think of a universal forwarder as a lightweight agent, forwarding data to a central Splunk deployment. A heavy forwarder is used when additional data preparation needs to occur before being sent to the Splunk deployment. Syslog is commonly used where a universal forwarder can’t be installed (e.g., network devices and appliances); the syslog data is then passed to a forwarder and sent to a Splunk deployment. If you’re still confused about when to use a universal or heavy forwarder, check out this link: https://www.splunk.com/blog/2016/12/12/universal-or-heavy-that-is-the-question.html
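
The rules of thumb in this section can be written down as a small decision function. This is my own simplification in Python and not an official Splunk decision tree; the function name, parameter names, and returned labels are hypothetical.

```python
def choose_collection_method(can_install_agent,
                             needs_content_routing=False,
                             addon_needs_full_splunk=False):
    """Pick a data collection method using the rules of thumb in this section."""
    if addon_needs_full_splunk:
        # Some add-ons require a full Splunk instance.
        return "heavy forwarder"
    if not can_install_agent:
        # The source can only emit syslog, so land it on a syslog server
        # and let a forwarder pick the files up from disk.
        return "syslog to forwarder"
    if needs_content_routing:
        # Routing on event contents requires parsing before forwarding.
        return "heavy forwarder"
    # Default: the lightweight agent.
    return "universal forwarder"


print(choose_collection_method(can_install_agent=True))
print(choose_collection_method(can_install_agent=False))
print(choose_collection_method(can_install_agent=True, needs_content_routing=True))
```

The default falls through to the universal forwarder on purpose, since that is the best fit in most situations.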

Splunk Search Heads

A Splunk search head is important to the search workload and can dictate who can see what data in your Splunk indexes. It’s set up the same as any other distributed searcher, but because it has no local indexes, all results come from remote nodes. Multiple search heads can be configured, but the indexers that store the data still have to perform significant work for each search.

As you design your Splunk SOC, being purposeful about which search heads can inspect certain indexes will help ensure the logical separation and preservation of your data sources. Some juicy links here: https://docs.splunk.com/Documentation/Splunk/7.2.5/Indexer/Configurethesearchhead

Splunk Access Model

When designing your SOC with Splunk, your access model becomes incredibly important. Simply put, you are aggregating all of your logs in one spot, and it can become incredibly easy to circumvent access controls at the local system or network level. The nice thing about Splunk is that you can implement a granular level of access control on your data stored in various locations. In the case of security, you can limit access to the search head directly or to the index itself to limit what data can be seen.

Now, if you are in the situation where you are performing security monitoring but don’t have administrative control of your Splunk instance, you can set up access restrictions at the index level that will effectively give administrators the ability to maintain the infrastructure but not see any of the data.

Be very cautious of your access model and how you choose to go forward here. If you’re not careful you might inadvertently circumvent access controls at different levels of your technology stack. Check this link for access control considerations:

https://docs.splunk.com/Documentation/Splunk/7.2.5/Security/UseaccesscontroltosecureSplunkdata
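
As a way to reason about an index-level access model before you configure anything, here is a conceptual Python sketch. It is a model of the idea only and not Splunk configuration; in Splunk itself, this kind of restriction is set per role (for example, with the srchIndexesAllowed setting in authorize.conf). The role names and index names below are hypothetical.

```python
# Hypothetical roles and index names, for illustration only.
ROLE_INDEXES = {
    "soc_analyst": {"firewall", "auth", "web"},
    "eu_analyst": {"auth_eu"},
    "platform_admin": set(),  # maintains the infrastructure, sees no event data
}


def allowed_indexes(roles):
    """Union of the indexes granted to each of a user's roles."""
    allowed = set()
    for role in roles:
        allowed |= ROLE_INDEXES.get(role, set())
    return allowed


def can_search(roles, index):
    """True if any of the user's roles grants access to the index."""
    return index in allowed_indexes(roles)


print(can_search(["soc_analyst"], "auth"))         # True
print(can_search(["soc_analyst"], "auth_eu"))      # False
print(can_search(["platform_admin"], "firewall"))  # False
```

Writing the mapping out like this makes it easy to spot a role that quietly grants more than intended, such as an administrator role that can read security data.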

Splunk Correlations

If you’re planning to do any meaningful correlations then you really need to make sure you use Splunk’s Enterprise Security application (i.e., the security module for Splunk). This is a required module if you’re doing advanced correlations and looking to detect patterns in your data. Enterprise Security will automatically review the events in a security-relevant way using searches that correlate many streams of data. Additionally, Splunk compares the identified event against the assets and asset value in your environment to prepare a comprehensive view of enterprise security risk. Splunk Enterprise Security also contains other valuable features such as incident review, investigation management tools, Glass Tables, etc.

Splunk Security Essentials Is Not the Same as Splunk Enterprise Security

If you’re getting started with correlation searches and you’re not sure where to begin, I’d recommend starting small. Start with a population of correlations that are easy to comprehend and conceptually track. As a starting point, I recommend getting used to operating 10 correlation searches. Consider adding more as you become increasingly comfortable tuning and optimizing searches. One consideration when adding correlation searches is the impact on CPU and the compute horsepower needed to process the larger number of data streams. This link has the 411: https://www.splunk.com/en_us/software/enterprise-security.html
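
If you’re wondering what an easy-to-comprehend correlation looks like, here is a minimal Python sketch of one: several failed logins from a source followed by a success inside a time window. This shows the concept only and is not how Splunk Enterprise Security implements correlation searches. The function name, the thresholds, and the sample events are hypothetical.

```python
from collections import defaultdict


def detect_failures_then_success(events, threshold=3, window_seconds=300):
    """Flag sources with >= threshold recent failed logins followed by a success.

    events: iterable of (timestamp_seconds, source, outcome) tuples,
    where outcome is "failure" or "success".
    """
    failures = defaultdict(list)  # source -> timestamps of recent failures
    alerts = []
    for ts, src, outcome in sorted(events):
        # Keep only failures that are still inside the time window.
        recent = [t for t in failures[src] if ts - t <= window_seconds]
        if outcome == "failure":
            recent.append(ts)
            failures[src] = recent
        elif outcome == "success":
            if len(recent) >= threshold:
                alerts.append((src, ts, len(recent)))
            failures[src] = []  # reset after a successful login
    return alerts


events = [
    (0, "10.0.0.5", "failure"),
    (30, "10.0.0.5", "failure"),
    (60, "10.0.0.5", "failure"),
    (90, "10.0.0.5", "success"),
    (100, "10.0.0.9", "failure"),
    (120, "10.0.0.9", "success"),
]
print(detect_failures_then_success(events))
```

Tuning in practice mostly means adjusting the threshold and window until the alert volume is something your analysts can actually work.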

Processes

Process matters in a SOC, even more so if you could be investigating a crime. The more people you have operating your SOC, the more important a tight and robust process becomes. If you’re operating a 24/7 SOC, even having a system to coordinate handoff between shifts can be a differentiator for timely investigation of events. SOC complexity can vary widely, but at a minimum, the processes you should implement from the start include an incident response playbook, standardization of which security alerts to investigate, and a definition of what constitutes an escalation from tier 1 to tier 2 to tier 3. Depending on the nature of your SOC (e.g., investigating potential crime), implementing processes to ensure sound forensic handling of information and systems can assist in preventing spoliation of forensic data. Only some lawyers want spoliation of evidence.

Threat Intelligence

Threat intelligence is crucial for any SOC. It’s important to be smart and judicious about the threat intelligence you use and how it’s integrated into your security correlations. A bad threat intelligence feed could increase false positives within your environment; conversely, a high-confidence threat feed may streamline your work and give you the ability to automate your security alerting capabilities. There are many options for threat intelligence in the market today. Threat intelligence sources can be incredibly expensive, but the value can be fleeting because threat intel becomes outdated quickly in a shifting threat landscape. It’s possible to build your own threat intel platform, but it does take time if you’re building it from scratch. If you start down this path and establish a collection of threat intelligence feeds, I encourage you to be patient and never become complacent about your feeds. Always be testing the quality of your threat intelligence feeds, as even the best intelligence feeds can become outdated over time. Additionally, if you’re looking to build your threat intelligence platform from the ground up, there are many free high-confidence threat intelligence feeds in the market today. Having seen many organizations invest heavily in threat intelligence, I would caution against investing too much in this activity unless the value of the effort is truly worth it to your organization.
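
One simple way to keep testing feed quality is to measure how much of a feed is stale and to correlate only against indicators that are both recent and high confidence. The Python sketch below assumes a hypothetical feed format (a list of dicts with value, last_seen, and confidence fields) and made-up thresholds; real feeds vary, so treat it as an illustration rather than a standard. The IP addresses come from ranges reserved for documentation.

```python
from datetime import date


def fresh_indicators(indicators, today, max_age_days=30, min_confidence=70):
    """Return indicator values that are both recent and high confidence."""
    return {
        ind["value"]
        for ind in indicators
        if (today - ind["last_seen"]).days <= max_age_days
        and ind["confidence"] >= min_confidence
    }


def stale_fraction(indicators, today, max_age_days=30):
    """Fraction of a feed's indicators older than max_age_days."""
    if not indicators:
        return 0.0
    stale = sum(
        1 for ind in indicators
        if (today - ind["last_seen"]).days > max_age_days
    )
    return stale / len(indicators)


# Hypothetical feed contents.
feed = [
    {"value": "203.0.113.7", "last_seen": date(2019, 4, 1), "confidence": 90},
    {"value": "198.51.100.23", "last_seen": date(2018, 11, 15), "confidence": 95},
    {"value": "203.0.113.50", "last_seen": date(2019, 3, 30), "confidence": 40},
]
today = date(2019, 4, 5)

print(fresh_indicators(feed, today))
print(round(stale_fraction(feed, today), 2))
```

Tracking the stale fraction per feed over time gives you a cheap signal for when a feed has stopped earning its keep.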

People

Last but not least: people. The big question for any SOC is how many people you need to operate your security function. The answer to this question is that it depends on the ability of your staff to meet the established service level agreement(s) of your SOC. It also depends on whether your SOC operates on a 24/7 model or is required to use onshore resources. Three very different types of staffing models include (but are not limited to) the following:

There’s the simple pager duty model. This means you have security staff operating during the day and critical alerts are escalated to an on-call analyst after hours.
A direct staffing model of an analyst escalating to a manager. In this model, target five security analysts to one manager. If you operate 24/7, triple your staff counts to account for second and third shift.
An advanced staffing model of tier 1 analyst, tier 2 senior analyst/engineer, and tier 3 security manager. In this model it’s common to see five tier 1 analysts for every tier 2 senior analyst, and three tier 2 senior analysts for every tier 3 security manager. With this model I typically see robust security investigation processes to ensure clean and reliable handoffs between teams.
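
The ratios in the staffing models above are easy to turn into quick arithmetic. This Python sketch applies the 5:1 and 3:1 ratios and multiplies by the number of shifts. Rounding up and multiplying every tier by the shift count are my assumptions, and the sketch ignores weekends, vacation, and turnover, so treat the output as a starting point only.

```python
import math


def tiered_staffing(tier1_per_shift, shifts=1, t1_per_t2=5, t2_per_t3=3):
    """Estimate headcount per tier from the tier 1 analysts needed per shift."""
    tier2 = math.ceil(tier1_per_shift / t1_per_t2)  # five tier 1 per tier 2
    tier3 = math.ceil(tier2 / t2_per_t3)            # three tier 2 per tier 3
    per_shift = {"tier1": tier1_per_shift, "tier2": tier2, "tier3": tier3}
    # ASSUMPTION: every tier is multiplied by the number of shifts.
    return {role: count * shifts for role, count in per_shift.items()}


print(tiered_staffing(15))            # single shift
print(tiered_staffing(15, shifts=3))  # 24/7 coverage across three shifts
```

For 15 tier 1 analysts per shift, this gives 3 tier 2 and 1 tier 3 on a single shift, and 45, 9, and 3 across three shifts.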

When it comes to staffing your SOC, there's no firm rule to follow for how to best staff your operations. Every company is different, and every company must factor in the security risk, the complexity of doing business, and the cost of running a fully burdened SOC.

If You Still Need Advice

If you need help with your SOC or building out your security muscle with Splunk, check out our industry-leading Splunk MSSP and Managed Services options or contact us today!

Justin Fimlaid

Justin (he/him) is the founder and CEO of NuHarbor Security, where he continues to advance modern integrated cybersecurity services. He has over 20 years of cybersecurity experience, much of it earned while leading security efforts for multinational corporations, most recently serving as global CISO at Keurig Green Mountain Coffee. Justin serves multiple local organizations in the public interest, including his board membership at Champlain College.