The concept of 'Big Data' is not new. The term was generally used to describe the vast amounts of information produced by processing in areas such as astronomy, weather forecasting and meteorological calculations. These processes produced petabytes of data, either in the form of calculated results or through the collection of raw observations. Approaches to storing, managing and querying the data vary, but many utilise concepts such as distributed and grid computing with vast amounts of physical storage, often directly attached (DAS) and based on solid state or high speed SATA devices to allow rapid query execution. Massively Parallel Processing (MPP) was then applied in front of the data to execute rapid relational queries.
In recent years, networking, consumer and social media applications have started to produce vast amounts of log, user and alerting information that needs to be stored and analysed. For example, Walmart handles more than 1 million customer transactions every hour, feeding databases estimated to hold around 2.5 petabytes of data. Those transactions not only need to be processed accurately, but also need to be stored for accounting, management reporting and compliance mandates. Social networking is another area requiring the storage of huge volumes of user data, such as news feeds, photo objects, pointers, tags and user relations such as followings, friendships and associations.
The main issues with storing such vast amounts of data generally concern the ability to index, query and analyse the underlying data. End users expect search results in near real time. Analytics are expected to be contextual, with detailed and flexible trend, history and projection capabilities that can be easily expanded and developed.
Security appliances, devices and software are another producer of big data. Intrusion Prevention Systems, firewalls and networking equipment produce huge amounts of verbose log data that needs to be interpreted and acted upon. Over the last 10 years, Security Information and Event Management (SIEM) solutions have developed to a level of maturity where log and alerting data is quickly centralised, correlated, normalised and indexed, providing a solid platform where queries can be quickly interpreted and results delivered with context and insight.
But as security data continues to increase, simply being able to execute a query with a rapid response is not enough. Doing so assumes that the query that needs to be run is actually known in advance. That is a signature based approach: a set of criteria is known (perhaps a control or threat scenario) and is wrapped within a policy engine that compares it against the underlying data.
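The signature based approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule names, event fields (`action`, `attempts`, `dst_port`) and thresholds are hypothetical and do not come from any particular SIEM product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A known criterion (signature) expressed as a predicate over one event."""
    name: str
    matches: Callable[[dict], bool]


# Hypothetical rules: both the field names and the thresholds are assumptions.
RULES = [
    Rule("repeated-failed-login",
         lambda e: e.get("action") == "login_failed" and e.get("attempts", 0) >= 5),
    Rule("denied-admin-port",
         lambda e: e.get("action") == "deny" and e.get("dst_port") == 22),
]


def evaluate(events, rules=RULES):
    """Return (rule name, event) pairs for every event matching a known rule."""
    hits = []
    for event in events:
        for rule in rules:
            if rule.matches(event):
                hits.append((rule.name, event))
    return hits


# Illustrative, made-up log events.
events = [
    {"action": "login_failed", "user": "alice", "attempts": 6},
    {"action": "allow", "dst_port": 443},
    {"action": "deny", "dst_port": 22},
]
hits = evaluate(events)
```

The limitation is visible in the sketch itself: an event is only ever flagged if somebody has already written a rule for it.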
As security data starts to develop further and include identity, business and threat intelligence data, a known query may not exist. The concept of the 'unknown unknowns' makes it difficult to traverse vast amounts of data without knowing which trends, threats, exceptions or incidents really need attention or more detailed analysis. It is the classic needle-in-a-haystack scenario, but this time the needle is of an unknown colour, size and style.
A simple example is analysing which entitlements a user should or should not have. If an organisation has 100,000 employees, each with access to twelve key applications, and each application contains 50 access control entries, that is up to 60 million entitlements, and the numbers alone require significant processing and interpretation. If a compliance mandate requires rapid reporting and approval of 'who has access to what' within the organisation, a more intelligent approach is required.
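To make the scale concrete, the figures above can be multiplied out. The sketch below uses only the numbers from the example, plus one clearly labelled assumption: a hypothetical five seconds of reviewer time per entry, chosen purely for illustration.

```python
# Figures taken from the example in the text.
employees = 100_000
apps_per_employee = 12
entries_per_app = 50

# One account per employee per application.
accounts = employees * apps_per_employee

# Upper bound on user-to-entry combinations that could need review.
entitlements = accounts * entries_per_app

# Assumption for illustration only: 5 seconds of manual review per entry.
SECONDS_PER_REVIEW = 5
review_hours = entitlements * SECONDS_PER_REVIEW / 3600

print(f"{accounts:,} accounts, {entitlements:,} entitlements")
print(f"about {review_hours:,.0f} hours of manual review")
```

Even under that generous assumption, a full manual review runs to tens of thousands of hours, which is why reviewing every entry by hand is not a realistic option.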
This intelligence takes the form of a more adaptable, context based approach to analysing large volumes of data. Static queries alone simply wouldn't be effective. A dynamic approach would automatically analyse just the exceptions held within a large data set, with the ability to 'learn' or adapt to new exceptions and deviations.
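One possible way to surface only the exceptions is peer group analysis: an entitlement is treated as an exception when few of a user's peers hold it. The sketch below is a minimal illustration of that idea, not a description of any specific product. The function name `find_exceptions`, the `min_share` threshold and the sample data are all hypothetical.

```python
from collections import defaultdict


def find_exceptions(assignments, min_share=0.5):
    """Flag entitlements held by less than min_share of a peer group.

    assignments: iterable of (user, peer_group, entitlement) tuples.
    The baseline is derived from the data itself, so it shifts as the
    assignments change rather than relying on a fixed, pre-written query.
    """
    members = defaultdict(set)   # peer_group -> users in that group
    holders = defaultdict(set)   # (peer_group, entitlement) -> users holding it
    for user, group, entitlement in assignments:
        members[group].add(user)
        holders[(group, entitlement)].add(user)

    exceptions = []
    for (group, entitlement), users in holders.items():
        share = len(users) / len(members[group])
        if share < min_share:
            exceptions.extend((u, group, entitlement) for u in users)
    return sorted(exceptions)


# Made-up sample data: four finance users, one with an unusual entitlement.
assignments = [
    ("alice", "finance", "ledger_read"),
    ("bob", "finance", "ledger_read"),
    ("carol", "finance", "ledger_read"),
    ("dave", "finance", "ledger_read"),
    ("dave", "finance", "payroll_admin"),
]
exceptions = find_exceptions(assignments)
```

Here only the one entitlement that deviates from the peer group is returned for review, rather than all five assignments. A real deployment would need a far richer notion of context than a single peer group label.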
As attack vectors continue to increase, utilising both internal and external avenues, security intelligence will become a key component of the information assurance countermeasure toolkit, resulting in a more effective and pinpointed approach.