By Tobias Manolo
Following on from our article last week on Big Data strategies: the term Big Data analytics has become something of a ‘headache’ within the IT environment, brought about in part by the belief that the masses of data flowing into enterprises hold the key to getting ahead of the competition and improving the bottom line. People within the enterprise believe that if enough data points are compared and analysed, the insights gathered will let the business steam ahead. And that’s where IT has a problem, because gaining that sort of insight needs a lot of data, all of which has to be captured, stored, made accessible and analysed!
Data processing falls into two categories: synchronous, meaning in real time as the data arrives, and asynchronous, meaning the data is captured and recorded first and analysed later, i.e. as a batch process. Ok, let’s go into a bit more detail.
The best way to explain synchronous data analytics is the supermarket (or Amazon!). Take Tesco: you buy your groceries online, you register your likes and dislikes through your buying behaviour and use of coupons, and when you return, the supermarket offers you the products you are most likely to buy. Another example is social media profiling, which advertisers use to deliver pop-up adverts targeted at you, based on your online activity and preferences. Real-time analytics applications usually run on low-latency infrastructure such as NoSQL databases backed by solid-state storage. The Big Data storage infrastructure has to be flexible and – a critical element – fast, keeping latency to a minimum. Some organisations already use flash storage for this, deployed as network-attached storage, as a tier within a disk array, or inside the application server itself, with additional capacity added as data volumes grow.
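To make the synchronous case concrete, here is a minimal sketch in Python of a real-time recommendation lookup. A plain in-memory dictionary stands in for a low-latency NoSQL key-value store; the function names and sample data are illustrative, not any particular retailer’s system.

```python
# Sketch of synchronous (real-time) analytics: each purchase updates the
# customer's profile immediately, and recommendations are served from a
# fast key-value lookup at page-load time. A dict stands in for a NoSQL
# key-value store; names and data are illustrative.
from collections import Counter

# Key-value store: customer id -> counts of products bought
purchase_history: dict[str, Counter] = {}

def record_purchase(customer_id: str, product: str) -> None:
    """Update the customer's profile as the transaction happens."""
    purchase_history.setdefault(customer_id, Counter())[product] += 1

def recommend(customer_id: str, top_n: int = 2) -> list[str]:
    """Real-time lookup: return the products this customer buys most."""
    history = purchase_history.get(customer_id, Counter())
    return [product for product, _ in history.most_common(top_n)]

record_purchase("alice", "tea")
record_purchase("alice", "tea")
record_purchase("alice", "biscuits")
print(recommend("alice"))  # ['tea', 'biscuits']
```

The essential property is that the store is updated and queried inline with the customer’s session, which is why latency – and hence flash-backed storage – matters so much here.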
Asynchronous Big Data analytics is more traditional: data is captured, stored and then analysed. Data is gathered from a wide range of sources – web, social media, mobile devices, financial transactions, point-of-sale terminals, etc. – recorded in a storage system, and then loaded into an RDBMS (relational database management system), where it is transformed into a structured form consistent with other data sets and suitable for analysis. This process presents storage challenges in terms of scalability, capacity, performance and cost efficiency. Data warehousing can produce enormous data sets; add in the fact that scalable disk storage architectures haven’t traditionally been cost-effective, and before too long it’s no longer economical to analyse your Big Data!
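The capture-store-analyse pipeline above can be sketched as a small batch job. In this hedged example, SQLite (Python’s built-in relational database) stands in for a full RDBMS or data warehouse, and the event records and field names are invented for illustration.

```python
# Sketch of asynchronous (batch) analytics: raw events from several
# sources are captured first, normalised into one structured schema,
# and analysed later with SQL. SQLite stands in for a full RDBMS;
# the sample events and field names are illustrative.
import sqlite3

# Step 1: capture - heterogeneous raw events, recorded as they arrive
raw_events = [
    {"source": "web", "customer": "alice", "amount": 12.50},
    {"source": "pos", "customer": "bob",   "amount": 30.00},
    {"source": "web", "customer": "alice", "amount": 7.25},
]

# Step 2: load - convert everything into one structured, relational form
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (source TEXT, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (:source, :customer, :amount)", raw_events
)

# Step 3: analyse - run the query as a batch job, well after capture
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 19.75), ('bob', 30.0)]
```

Unlike the real-time case, nothing here is latency-sensitive; the pressure falls instead on the storage layer, which must hold every raw event cheaply until the batch job runs.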
Big Data analytics is the buzzword of the moment, and there’s no doubt there are many benefits to analysing and acting on the information gathered from the data. But data storage must be addressed with scalability, efficiency and cost-effectiveness in mind, which begs the question: are current storage solutions up to the task of handling Big Data analytics?