By Allie Philpin
Remember that little saying, “Look after the pennies and the pounds will look after themselves”? Well, when it comes to data, the same principle applies; yet too many organisations are ignoring their data ‘pennies’! And it’s not surprising when just about all we’ve heard is Big Data this and Big Data that, and how essential it is to take control of that data. Far less has been said about the ‘little’ data: the data that’s your bread and butter, the data that’s kept your organisation’s wheels turning, the data that’s already present in your existing systems.
Big data has grown exponentially over the past few years, generating complex combinations of data variations and of analytical and processing needs, and resulting in new data types – multi-structured data such as audio, images and webstreams, XML (extensible markup language), LOBs (large objects), and so on – and that’s without taking into consideration the increased speed and volume at which this data is being received. Big data often means that organisations have to invest in new architectures, new solutions or systems, and possibly new hardware in order to deal with it. But with too much focus on Big Data, an organisation loses sight of the ‘little’ data; and that brings with it the complications of ineffective best practices, problems integrating with the new architecture, and potentially non-compliance.
So, what can you do to ensure that you don’t miss the ‘little’ data, which is also important to your organisation? Start with your core data, the data that currently sits in your existing operational systems; this is your source data, and it’s quite possible this data has never been properly accessed and analysed! Source data is often managed by the IT department who, as they will be the first to admit, don’t have the analytical skills needed to assess the data. It is also quite possible that any analytics expert contracted to help with Big Data doesn’t understand the impact of source data… a bit of a conundrum, isn’t it? Before you go ahead and implement a Big Data project, look at your source data and work out the best way to integrate, merge, compare and join this data with the new big data, then design a system that is capable of doing this, for example by ensuring that the new big data is documented and defined within an enterprise data dictionary or data model.
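To make the merging idea concrete, here is a minimal sketch in plain Python of an outer join between existing source data and newly arrived data on a shared key. The field names (`name`, `web_visits`) and the key are hypothetical, purely for illustration; a real system would join within a database or data platform.

```python
# Existing 'little' data from an operational system, keyed by a
# hypothetical customer ID.
source = {
    1: {"name": "Ann"},
    2: {"name": "Bob"},
    3: {"name": "Cal"},
}

# Newly arrived big data for some of the same customers, plus a new one.
incoming = {
    2: {"web_visits": 10},
    3: {"web_visits": 4},
    4: {"web_visits": 7},
}

# An outer join takes the union of keys, so records that exist in only
# one system are kept - nothing from the 'little' data is silently dropped.
merged = {
    key: {**source.get(key, {}), **incoming.get(key, {})}
    for key in source.keys() | incoming.keys()
}
```

The design point is the choice of an outer rather than an inner join: an inner join would discard any source record with no match in the new data, which is exactly how ‘little’ data gets lost.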
Also look at how the data moves through the organisation. Data processes and warehouse systems access source systems to find the data, some of which may need ‘cleaning’; for example, data fields not containing the right data, or containing invalid data. Think about the data that comes from external sources, too; that can bring its own issues, such as numeric data fields containing non-numeric data. Whether the data is big data or ‘little’ data, and whether it is multi-structured or semi-structured, it will still require transformation, or ‘cleaning’, before it can be used effectively; and if ‘cleaning’ protocols already exist, any new protocol needs to incorporate the originals, although modifications may well be required.
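The numeric-field problem above can be sketched in a few lines of Python. The record layout and field names here are invented for illustration; the point is simply that invalid values are detected and flagged rather than allowed to break downstream analysis.

```python
# A sketch of 'cleaning' an external feed in which a numeric field
# sometimes contains non-numeric data (field names are illustrative).
raw_records = [
    {"order_id": "1001", "amount": "49.99"},
    {"order_id": "1002", "amount": "N/A"},      # invalid numeric value
    {"order_id": "1003", "amount": " 15.50 "},  # stray whitespace
]

def clean_amount(value):
    """Return the value as a float, or None when the field is invalid."""
    try:
        return float(value.strip())
    except (ValueError, AttributeError):
        return None

# Keep every record, but normalise the numeric field; invalid values
# become None so they can be reported or handled rather than crashing.
cleaned = [{**rec, "amount": clean_amount(rec["amount"])} for rec in raw_records]
```

Mapping bad values to `None` (rather than dropping the record) preserves the audit trail, which matters when existing cleaning protocols and compliance rules must be respected.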
Another key consideration is where this data is going to be stored. Remember that direct access will be needed in order to analyse the data, and with that come the query tools. You will also need the full support of the IT department to explore the early use options. If the loading of current data has to be carried out in parallel with loading the big data tables, this could not only delay data availability but also affect its accuracy.
Yes, Big Data is here to stay; yes, Big Data has to be dealt with; but don’t forget the ‘little’ data ‘pennies’!