Technology is making it easier to build your own SOC
Many organizations are considering building their own security operations center (SOC). This do-it-yourself approach marks a change in the industry, and there are a variety of reasons for the shift. The most common reason I see is that security monitoring technology is now readily available to more companies, lowering the barrier to entry for building your own security operations center. Another significant reason I hear is frustration with the black-box MSSP models that have long dominated the industry. Today's technology landscape lets corporations and their users see their own data in real time and build a white-box monitoring model instead. More than ever, corporations and institutions are choosing to build their own security operations centers for exactly these reasons.
Building a security operations center is not a trivial task, but for an organization dedicated to the effort, it is realistic to think that you, within your own organization, can build one. Having worked with hundreds of organizations to build their security operations centers, the single piece of advice I would offer anyone starting this effort is "do not let perfect be the enemy of good enough." As you start, there will be many things you want to do better, and you will be tempted to aim for perfection, but in today's shifting threat landscape, perfect today is substandard tomorrow. By focusing on good enough, you can begin executing on your security operations center and take baby steps toward progress.
If you can commit to making incremental gains daily and minor improvements as you go, then over the course of a year you will be able to look back at the effort and realize you are that much closer to the security operations center you originally envisioned. Many folks in the security space say that people, process, and technology are the keys to building a strong security operations center, but this narrow approach fails to acknowledge the overall strategy and purpose behind building one. Until you can fundamentally answer the "purpose" question, it does not matter how many people or processes you have, or what technology is in place, because you have not established the fundamental reason, and the security use cases, for building a SOC in the first place. My recommendation to anyone getting started is to figure out the who, what, when, where, why, and how of your security operations center. Once you have answered those questions, you will better understand the purpose of the SOC and what you are specifically trying to achieve. The answers will also inform the type of personnel required to staff your security operations center, help you build correct and complete process documentation, and, most importantly, allow you to select the right technology and tools for the job.
Getting started building your SOC
Congratulations on deciding to build your own security operations center! This is an exciting task, and for many security professionals it may be the first, and only, time you get to build one from scratch. Having gone through this exercise at least once in your career, you will be better able to inform and enhance an existing security operations center if you have the chance to work in one. As you start down this path, the most important questions you can ask yourself are:
- Why are you building a security operations center?
- What will you monitor and be able to detect in your security operation center?
- When do you need to have your security operation center completed by?
- Where will you position your security technology? And where will your staff be located?
- How do you plan to build your security operation center?
- Who will staff your security operations center? Answer this question last, based on all of the previous requirements you collected.
What are you monitoring in your security operations center?
This is actually a very hard question for any security professional looking to build a security operations center. The answer is not "we're going to do the Secur-it-tay"; you need more than that. Answering this question, even partially, will dictate the type of technology and your approach to constructing your security operations center. If you get jammed on this question and find yourself consistently trying to boil the ocean, I recommend starting small. One area where I commonly see folks start is monitoring perimeter infrastructure and user authentication activity. The log files generated by today's common perimeter infrastructure come in standard formats that are easily ingested into your security operations center. Windows authentication, Active Directory, and LDAP have also been around a long time, exist in common standard log formats that are easily ingested, and are easily correlated based on event activity (but more on that later).
A common area I see folks overlook is the security of their applications. If your business is an e-commerce operation, or you have core applications that support your business, then those applications extend your attack surface beyond your perimeter infrastructure and are likely viable attack targets. One of the biggest issues in the application security space, especially for custom-developed applications, is that those applications are not designed with robust logging and monitoring capabilities. Because of this, key log files often DO NOT exist for key applications. If this is your situation, I recommend adding a development request for the team to build out this logging capability. In the meantime, if application logs do not exist, grab the IIS or Apache log files. It's better than nothing, and these web activity logs will enhance whatever logging capability is eventually developed.
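Since the Apache/IIS fallback comes up so often, here is a minimal sketch of what turning one of those web activity log lines into a structured event looks like. This is illustrative only; the field names and the sample line are made up, and a real SOC would lean on its monitoring platform's built-in parsers rather than hand-rolled regex:

```python
import re

# Apache "combined"-style log: host ident user [time] "request" status bytes ...
COMBINED = re.compile(
    r'(?P<src>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_access_line(line):
    """Return one web-access event as a dict, or None if the line doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

sample = ('203.0.113.9 - alice [10/Oct/2023:13:55:36 +0000] '
          '"GET /login HTTP/1.1" 401 217')
print(parse_access_line(sample)["status"])  # "401"
```

Even this crude level of structure is enough to start asking security questions, such as counting 401s per source address.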
My recommendation: start small, establish a beachhead for security, and expand your reach as you can. BUT, you have to start. You can't over-engineer and over-analyze this. Every day spent over-engineering is one more day you might miss a potential attack and infiltration.
Tools and Architecture
Tools and architecture might be the most overlooked components of any security operations center. The security industry has muddied the waters around the capabilities of today's commonly available monitoring technologies, and this levels the playing field for substandard solutions. The bottom line is that not all technologies are created equal, and as you begin to build your security operations center, scalability becomes incredibly important. One issue I commonly see is folks cutting corners to save a couple of bucks; the long-term impact of that decision can ultimately determine the overall viability and success of your security operations center. It's really important to understand how the technology you use is architected, including how data is stored and indexed, and how data is searched and correlated. I can't tell you how much this matters. Everything runs great when little data is stored, but as data storage grows and the volume of data to be searched increases, the underlying data structure determines whether searches stay fast at scale.
Flexible Ingest of Data Sources for Monitoring
Many security and event logging solutions restrict the data sources you can ingest and onboard into the platform. It's important to be able to customize log ingest; otherwise your security monitoring software is dictating the capabilities of your security operations center. One scenario: Cisco and Checkpoint have very different log formats, and your monitoring solution needs to be able to ingest both. If you have Bro logs, or maybe an Office 365 blob file, you need to be able to ingest and correlate events from those as well. You need the flexibility to ingest all data formats in your technology solution. Whatever you do, try to avoid cobbling together a solution from disparate technologies; you'll incur security tech debt, and left unchecked you'll be unable to staff your security operations center.
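To make this concrete in Splunk terms (Splunk is discussed at length later in this post), per-source parsing flexibility is expressed in `props.conf`, where each source type gets its own stanza. The stanza names and values below are illustrative assumptions, not a recommended configuration:

```
# props.conf -- one stanza per source type; names/values are examples only

[cisco:asa]
SHOULD_LINEMERGE = false
TIME_PREFIX = ^
TIME_FORMAT = %b %d %Y %H:%M:%S

[checkpoint:firewall]
SHOULD_LINEMERGE = false
KV_MODE = auto
```

The point is not the specific settings, but that the platform lets you describe each vendor's format yourself instead of being limited to a fixed catalog of supported sources.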
You also need the ability to build custom reports and dashboards. Many security information and event management (SIEM) tools have fixed, static reporting capabilities. Your monitoring solution should not dictate what you can and cannot see from your own data. The ability to customize reports helps ensure that relevant metrics are consistently reported for your security program.
Consider Flexible Deployment Options
Having the ability to choose your deployment options, now and in the future, should be an important part of your security operations center. Businesses change, and your SOC's capabilities should be able to adjust with changing business strategy. Being able to deploy on premises, in the cloud, or in another location can greatly increase the efficiency and effectiveness of your security operations center. If your business operates from a single physical location, this may be less of a concern. However, if you operate at larger scale, the ability to deploy data collection closer to the source, and to search closer to the data, can reduce network latency when time is critical during an investigation.
How data is indexed matters…a lot. The capabilities of today's security information and event management (SIEM) tools have come a long way. Most current platforms parse data at the time of ingest, a notable improvement over legacy SIEM platforms, which ingested the data first and parsed it second. In the legacy model, parsers needed to be built before data could be indexed, which significantly elongated incident response times: investigators needed to know what data they were looking for, design the report, build the parsers for the report, and then let the parser index the data to populate the report. This sometimes took hours when time was of the essence during an investigation. In today's security operations center, data is commonly parsed and indexed at ingestion, giving cybersecurity investigators the ability to search logs in a Google-like fashion.
Having to build everything yourself is tough. Finding a solution with a broad adoption base, established use cases, and extensible capabilities you can "plug in" will save you time. Simply put, if others have already done what you are trying to do and have published their results, it will be easier for you to "Google" the solution as you go.
Splunk Security Operations Center Considerations
If you are using Splunk here’s some additional considerations
NuHarbor Security builds a lot of security operations centers, and we deliver managed security services provider (MSSP) services for many organizations and institutions. Full disclosure: we usually use Splunk to build our solutions, primarily because it scales and we're not limited to what the software provider tells us we should monitor. If you're looking to build your security operations center with Splunk, here are some things to think about:
Splunk Data Sources
Start with the end goal in mind. Understanding what you want to monitor and what results you want to see is a good starting point for determining which data sources you should ingest into Splunk. Due to Splunk's licensing model, it's important to be clear about what data is being ingested and the value of that data. If you know what you want to monitor and what dashboards you need to see, it becomes easy to reverse engineer the data sources needed to realize your goal. If you are a Splunk Enterprise Security customer, I recommend choosing a list of notable events from out-of-the-box Enterprise Security and then reverse engineering the data sources required to realize those notable events and the corresponding correlations. This link is a good starting point: http://dev.splunk.com/view/dev-guide/SP-CAAAE3A
Splunk Indexes
If you are not familiar with Splunk indexes, you can think of them as the logical locations where your log file data is stored; an index is the "directory" that is searched by the Splunk search heads. Indexes play an important role in the overall security operations architecture for Splunk, and how you structure them becomes incredibly important to your ability to implement a robust access model for your data. As you set up your Splunk security operations center, giving consideration to how many indexes you have, and which data goes into each, helps ensure logical segmentation of specific data types. Once your indexes are set up and your strategy is established, you can specify which indexes should be searched for security purposes. Logical data segmentation also becomes incredibly important when you have data from different international locations: for the purposes of GDPR and similar privacy regulations, you can keep data in country and limit searching to those with a valid need to know.
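As a sketch of what this segmentation looks like on disk, each index gets its own stanza in `indexes.conf`. The index names and paths below are illustrative, not a recommended layout:

```
# indexes.conf -- one stanza per index; names and paths are examples only

[fw_logs]
homePath   = $SPLUNK_DB/fw_logs/db
coldPath   = $SPLUNK_DB/fw_logs/colddb
thawedPath = $SPLUNK_DB/fw_logs/thaweddb

# EU-originated authentication data kept in its own index so GDPR-scoped
# access and retention rules can be applied to it independently
[auth_logs_eu]
homePath   = $SPLUNK_DB/auth_logs_eu/db
coldPath   = $SPLUNK_DB/auth_logs_eu/colddb
thawedPath = $SPLUNK_DB/auth_logs_eu/thaweddb
```

Because access control and retention are both applied per index, the index layout you choose here is effectively the foundation of your data governance model.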
Splunk Data Retention
Giving some thought to the amount and types of data you want to retain should be an important consideration for any security operations center. Splunk can separate data by storage tier: hot and warm storage that is quickly searchable, cold storage that can take a little longer to search, and a frozen archive retained for historical purposes. The industry you are in can dictate how long you need to retain your data. The Payment Card Industry Data Security Standard has its own requirements for how long data should be stored (90 days readily searchable, and one year of overall retention). Other industries, such as financial services or state and local government, often have a seven-year retention requirement. Many organizations keep 30 to 90 days of data in a hot or warm state on flash-based storage for faster read and write times. Data older than 90 days is normally stored in a cold state on disks with slower IOPS to save cost. Anything you want to retain beyond the cold state is archived in a frozen state and retrieved only if the data needs to be recalled for historical purposes. Every organization is different, and your data storage strategy should reflect your organizational requirements.
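As a sketch, a PCI-style retention policy maps onto a few per-index settings in `indexes.conf`. The index name, paths, and values below are examples for illustration, not recommendations:

```
# indexes.conf -- PCI-style retention sketch; name, paths, values are examples

[cardholder_env]
# hot/warm on fast storage for the "readily searchable" window
homePath = /fast_storage/splunk/cardholder_env/db
# cold on cheaper, slower disk
coldPath = /slow_storage/splunk/cardholder_env/colddb
thawedPath = /slow_storage/splunk/cardholder_env/thaweddb
# roll buckets to frozen after ~one year (365 days, in seconds)
frozenTimePeriodInSecs = 31536000
# archive frozen buckets instead of deleting them
coldToFrozenDir = /archive/splunk/cardholder_env
```

If `coldToFrozenDir` is omitted, Splunk deletes frozen buckets rather than archiving them, so that one line is often the difference between "retained" and "gone."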
Splunk Universal Forwarders, Heavy Forwarders, and Syslog
The Splunk universal forwarder collects data from a data source or another forwarder and sends it to a forwarder or another Splunk deployment. The most notable benefit of a Splunk universal forwarder is that it uses significantly fewer hardware resources than other Splunk software products. It can, for example, coexist on a host that runs a Splunk Enterprise instance. It also is more scalable than the other Splunk products, as you can install thousands of universal forwarders with little impact on network and host performance.
A heavy forwarder has a smaller footprint than a Splunk Enterprise indexer but retains most of the capabilities of an indexer. An exception is that it cannot perform distributed searches. You can disable some services, such as Splunk Web, to further reduce its footprint size.
Unlike other forwarder types, a heavy forwarder parses data before forwarding it and can route data based on criteria such as source or type of event. It can also index data locally while forwarding the data to another indexer.
In most situations, the universal forwarder is the best way to forward data to indexers. Its main limitation is that it forwards only unparsed data, except in certain cases, such as structured data. You must use a heavy forwarder to route data based on event contents.
The universal forwarder is ideal for collecting files from disk (e.g. a syslog server) or for use as an intermediate forwarder (aggregation layer) due to network or security requirements. Limit the use of intermediate forwarding unless absolutely necessary. Some data collection add-ons require a full instance of Splunk, which means a heavy forwarder (e.g. DB Connect, OPSEC LEA, etc.).
When building your Splunk security operations center, you will generally need forwarders. There are ways around using forwarders, and other collection methods exist; most of these other methods are supported but not entirely scalable.
In super simple terms, you can think of a universal forwarder as a lightweight agent forwarding data to a central Splunk deployment. A heavy forwarder is used when additional data preparation needs to occur before data is sent to the Splunk deployment. Syslog is commonly used where a forwarder can't be installed (e.g. on a network appliance); the syslog output is collected, picked up by a forwarder, and sent on to the Splunk deployment. Still confused about when to use a universal or heavy forwarder? Check out this link: https://www.splunk.com/blog/2016/12/12/universal-or-heavy-that-is-the-question.html
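For a concrete example, a universal forwarder tailing files on a syslog collector needs only two small configuration files. The monitored path, index name, and indexer hostnames below are assumptions for illustration:

```
# inputs.conf on the universal forwarder -- path and index are examples
[monitor:///var/log/syslog-collector/*.log]
sourcetype = syslog
index = network

# outputs.conf on the universal forwarder -- send to the indexing tier
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
```

Listing multiple indexers under `server` gives you simple load balancing and some resilience if one indexer goes down.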
Splunk Search Heads
A Splunk search head is key to the search workload and can dictate who can see what data in your Splunk indexes. It is set up the same as any other distributed search node, but because it has no local indexes, all results come from remote nodes. Multiple search heads can be configured, but the indexers that store the data still have to perform significant work for each search.
As you design your Splunk security operations center, being purposeful about which search heads can inspect which indexes will help ensure the logical separation and preservation of your data sources. Some juicy links here: https://docs.splunk.com/Documentation/Splunk/7.2.5/Indexer/Configurethesearchhead
Splunk Access Model
When designing your security operations center with Splunk, your access model becomes incredibly important. Simply put, you are aggregating all of your logs in one spot, and it can become incredibly easy to circumvent access controls that exist at the local system or network level. The nice thing about Splunk is that you can implement granular access control on your data wherever it is stored. For security purposes, you can limit access to the search head directly, or to the index itself, to restrict what data can be seen.
Now, if you are in a situation where you are doing security monitoring but do not have administrative control of your Splunk instance, you can set up access restrictions at the index level that effectively give administrators the ability to maintain the infrastructure without seeing any of the data.
Be very cautious of your access model and how you choose to go forward here. If you are not careful you might inadvertently circumvent access controls at different levels of your technology stack. Check this link for access control considerations: https://docs.splunk.com/Documentation/Splunk/7.2.5/Security/UseaccesscontroltosecureSplunkdata
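As a sketch of what index-level restriction looks like, roles in `authorize.conf` can be limited to specific indexes. The role and index names below are illustrative assumptions:

```
# authorize.conf -- restrict a role to specific indexes; names are examples

[role_soc_analyst]
importRoles = user
# indexes this role is allowed to search
srchIndexesAllowed = fw_logs;auth_logs
# indexes searched when no index is specified in the query
srchIndexesDefault = fw_logs;auth_logs
```

Pairing roles like this with a deliberate index layout is what makes the "administrators maintain the platform but can't read the data" model workable.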
Splunk Enterprise Security and Correlation Searches
If you are planning to do any meaningful correlation, you really need Splunk's Enterprise Security application, the security module for Splunk. It is a required module if you are doing advanced correlations and looking to detect patterns in your data. Enterprise Security automatically reviews events in a security-relevant way using searches that correlate many streams of data. Additionally, Splunk compares each identified event against the assets, and asset values, in your environment to prepare a comprehensive view of enterprise security risk. Splunk Enterprise Security also contains other valuable features such as Incident Review, investigation management tools, Glass Tables, and more.
If you're getting started with correlation searches and are not sure where to begin, I recommend starting small. Start with a population of correlations that are easy to comprehend and conceptually track. As a starting point, I usually recommend getting comfortable operating about 10 correlation searches, and as you become more comfortable tuning and optimizing them, adding more when you're ready. One consideration when adding correlation searches is the impact on CPU and the compute horsepower needed to process the additional data streams. This link has the 411: https://www.splunk.com/en_us/software/enterprise-security.html
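To make "easy to comprehend and conceptually track" concrete, here is a sketch of a simple correlation-style search in SPL that flags possible brute-force activity. The index name, field names, and threshold are illustrative assumptions, not a tuned detection:

```
index=auth_logs action=failure
| stats count AS failures, dc(user) AS distinct_users BY src
| where failures > 25
```

A search like this is easy to reason about and easy to tune, which is exactly the kind of correlation to cut your teeth on before layering in more complex, multi-source logic.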
Process
Process does matter in a security operations center, and it matters even more if you are in a position where you could be investigating a crime. The more people operating your security operations center, the more important a tight, robust process becomes. If you operate a 24/7 security operations center, even having a system to coordinate handoffs between shifts can be a differentiator in investigating events in a timely manner. The complexity of a security operations center can vary widely, but at a minimum, I would implement the following processes from the start: an incident response playbook; standardization of which security alerts to investigate; and definitions of what constitutes an escalation from tier 1 to tier 2, from tier 2 to tier 3, and so on. Depending on the criticality of your security operations center (e.g. investigating potential crime), implementing processes to ensure sound forensic handling of information and systems can help prevent spoliation of forensic data. Only some lawyers want spoliation of evidence.
Threat Intelligence
Threat intelligence is important for any security operations center, but it is really important to be smart and judicious about the threat intelligence you use and how it is integrated into your security correlations. A bad threat intelligence feed can flood your environment with false positives; conversely, a high-confidence feed may streamline your security alerting and give you the ability to automate it. There are a lot of threat intelligence options in the market today. Threat intelligence sources can be incredibly expensive, and the value can be fleeting: intelligence becomes outdated quickly in a shifting threat landscape. It is possible to build your own threat intelligence platform, but it takes time if you are building from scratch. If you start down the path of building your own platform and establishing a collection of feeds, I would encourage you to be patient and never let your feeds become complacent. You should always be testing the quality of your threat intelligence feeds, as even the best feeds become outdated over time. Additionally, if you are building your platform from the ground up, there are many free, high-confidence threat intelligence feeds available today. Having seen many organizations invest heavily in threat intelligence, I would caution against investing too much in this activity unless the value of the effort is truly worth it to your organization.
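As a sketch of how a feed plugs into correlation in Splunk, indicators loaded into a lookup can be matched against live events. The lookup name and field names below are hypothetical:

```
index=fw_logs
| lookup threat_iocs ioc AS dest_ip OUTPUT threat_desc
| where isnotnull(threat_desc)
```

Because the search only fires on lookup matches, the quality of the alert stream is exactly the quality of the feed behind the lookup, which is why testing and refreshing feeds matters so much.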
People
Last but not least is people. The big question for any security operations center is how many people are needed to operate the security function. The answer is that it depends, specifically on the ability of your staff to meet the established, agreed-upon service level agreement of your security operations center. It also depends on whether your security operations center operates on a 24x7 model or is required to use on-shore resources. Three very different staffing models include (but are not limited to):
- The simple pager-duty model: security staff operate during the day, and critical alerts are escalated to an on-call analyst after hours.
- A direct staffing model of analysts escalating to a manager. In this model, target five security analysts to one manager. If you operate 24x7, triple your staff counts to account for second and third shifts.
- An advanced staffing model of Tier 1 analysts, Tier 2 senior analysts/engineers, and Tier 3 security managers. In this model, it's common to see five Tier 1 analysts for every Tier 2 senior analyst, and three Tier 2 senior analysts for every Tier 3 security manager. I normally also see robust security investigation processes in this model to ensure clean and reliable hand-offs between teams.
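If it helps, the ratio arithmetic in the advanced model can be sketched as a quick back-of-the-envelope calculation. This is illustrative math only, not a staffing formula; every ratio here is an assumption you should replace with your own:

```python
import math

def tiered_headcount(tier1_per_shift, shifts=3, t1_per_t2=5, t2_per_t3=3):
    """Rough tiered-SOC headcount from the ratios above (illustrative only)."""
    tier2 = math.ceil(tier1_per_shift / t1_per_t2)   # 5 Tier 1 : 1 Tier 2
    tier3 = math.ceil(tier2 / t2_per_t3)             # 3 Tier 2 : 1 Tier 3
    per_shift = tier1_per_shift + tier2 + tier3
    return per_shift * shifts                        # 24x7 = three shifts

print(tiered_headcount(5))  # 5 T1 + 1 T2 + 1 T3 = 7 per shift, 21 for 24x7
```

Running the numbers like this early makes the fully burdened cost of a 24x7 operation visible before you commit to a model.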
When it comes to staffing your security operation center there is no firm rule to follow on how to best staff your operations. Every company is different and every company has to factor in the security risk, the complexity of doing business, and the cost of running a fully burdened security operation center.
If You Still Need Advice
If you need help with your security operations center, or assistance building out your security muscle with Splunk, give us a call or check out our industry-leading Splunk MSSP and managed services options. You can also find me on the tweet deck @justinfimlaid.