This is a two-part blog on Big Data Analytics. Part One will look at Big Data analytics challenges for organizations of all sizes. In Part Two, Big Data Analytics Trends 2015, we will discuss the Big Data analytics trends going forward and how they impact the cloud and cloud service providers.
The Big Data Phenomenon
Big Data is an important social phenomenon and one that is becoming increasingly central to modern computing and commerce. It has prompted much discussion about how seriously it will impact the IT industry. Big Data refers to vast, often diverse data sets that cannot be handled by traditional data processing methods.
The Big Data market is expanding rapidly; IDC estimated that it was worth $16.1 billion worldwide in 2014. Spending on infrastructure accounts for a significant share of that growth.
Gartner predicts that Big Data will expand rapidly in 2015, creating massive employment opportunities. In the next 12 months there will be a significant shift in how we use Big Data as the technology associated with it develops.
The role of cloud computing service providers will grow as organizations implement Big Data analytics in real time.
Understanding Big Data
The term “Big Data” refers to a combination of approach, methodology, and enabling technologies that supports informed decision making based on analytical insight derived from fast-moving, large, and complex data sets. These data sets comprise structured, unstructured, and/or semi-structured data whose volume, variety, velocity, and veracity pose challenges that typical database software tools cannot address.
Big Data is a relatively recent phenomenon, but one that has grown exponentially in recent years as the number of sources creating vast data sets has expanded. Data sets now run to thousands of terabytes, and in 2010 the CEO of Google, Eric Schmidt, asserted that the human race now creates as much data every two days as it did from the dawn of civilization up to 2003.
Big Data comes from many sources, including business applications, social media, data storage systems, machine log data, and the public Internet. It can include sensor and machine data as well as streams of data from social networks, both of which are far larger than the log files and dumps traditionally fed into a database. Processing data at this scale was impractical before the arrival of dedicated Big Data tools, which handle these vast arrays of information today.
Six Important Big Data Attributes
- Volume – the sheer quantity of data produced.
- Variety – the range of data types and sources; coping with this diversity is of particular importance to data analysts who want to deal with Big Data effectively.
- Velocity – the speed of the generation of data and how fast it can be processed.
- Variability – the inconsistency of some of the data that is generated, which can hamper processing and analysis.
- Veracity – the quality of the data captured can vary greatly.
- Complexity – finally, managing Big Data can be incredibly complicated, especially when dealing with large volumes from multiple sources; collation and correlation are both critical.
The Big Data Market in 2014
IDC estimated that the market for Big Data reached $16.1 billion in 2014. This is a huge figure in itself, and it also represents a growth rate roughly six times that of the overall IT market. IDC includes in this figure infrastructure (servers, storage, etc., the largest and fastest-growing segment at 45 percent of the market), services (29 percent), and software (24 percent). Infrastructure has been the largest portion of this market for several years, and its share is still growing: IDC put the same proportion at just over 41 percent in 2011.
Cloud computing is also closely associated with Big Data, as the vast, elastic storage offered by this rapidly developing form of computing makes dealing with Big Data's enormous data sets far more feasible.
Before the rise of cloud IaaS (Infrastructure as a Service) providers, running a cluster of servers meant a large up-front investment in hardware. Today, any organization can quickly spin up a cluster of as many virtual nodes as it needs with any provider, keep that cluster alive for as long as there are tasks to run, and spin the nodes down when they are no longer needed. In fact, several of the largest IaaS providers now offer pre-configured Hadoop clusters, so users do not even have to worry about configuration.
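As a concrete illustration, the snippet below is a minimal sketch of launching and later terminating a pre-configured Hadoop cluster through one such provider's API, here AWS EMR via the boto3 library. The instance types, release label, and role names are illustrative assumptions rather than recommendations, and AWS credentials are assumed to be configured already.

```python
# Minimal sketch: spin up a small, pre-configured Hadoop cluster on AWS EMR
# and tear it down again. Instance types, release label and role names are
# illustrative assumptions; AWS credentials are assumed to be configured.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Launch a 3-node cluster with Hadoop pre-installed.
response = emr.run_job_flow(
    Name="analytics-poc",
    ReleaseLabel="emr-5.36.0",
    Applications=[{"Name": "Hadoop"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,   # keep it alive while tasks remain
        "TerminationProtected": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
cluster_id = response["JobFlowId"]
print("Launched cluster:", cluster_id)

# ... submit jobs and run analyses here ...

# Spin the nodes down once there is nothing left to run.
emr.terminate_job_flows(JobFlowIds=[cluster_id])
```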
Organizations can make use of benchmarking analyses and reports from Cloud Spectator to see which providers offer the most performance per unit of cost. Better-performing services mean the same workload can run on less infrastructure, which translates into potential cost savings.
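As a purely hypothetical illustration of that performance-per-cost comparison (the benchmark scores and hourly prices below are invented for the example, not taken from Cloud Spectator's reports):

```python
# Toy illustration of "performance value per unit of cost".
# All scores and prices below are made up for the example.
providers = {
    # provider: (benchmark score, price per hour in USD)
    "provider_a": (1200, 0.48),
    "provider_b": (1500, 0.72),
    "provider_c": (1100, 0.40),
}

# A higher score per dollar means more performance value per unit of cost.
value = {name: score / price for name, (score, price) in providers.items()}
for name, ratio in sorted(value.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:.0f} benchmark points per dollar-hour")
```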
As massively parallel architectures become more and more commonplace – both in individual computers, with multiple cores and processing nodes, and in distributed systems running across several servers – it becomes possible to perform much more complex tasks, and organizations of all sizes can access the computing power needed to process far more data. This generates a positive feedback loop: as end users see that they can process more data, they demand more data collection, which in turn creates a greater need for data processing, and so on.
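A toy sketch of what this looks like on a single multi-core machine: the data is split into chunks, each worker counts words in its own chunk in parallel, and the partial results are merged. The sample data is fabricated, and the same map-then-merge pattern is what distributed frameworks apply across many servers.

```python
# Toy illustration of data-parallel processing on a multi-core machine:
# each worker counts words in its own chunk, then the partial counts are merged.
from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    """Map step: count word occurrences in one chunk of text lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

if __name__ == "__main__":
    # Stand-in for a large data set split into chunks (e.g. file blocks).
    lines = ["big data needs big infrastructure"] * 100_000
    chunks = [lines[i::4] for i in range(4)]          # one chunk per worker

    with Pool(processes=4) as pool:
        partials = pool.map(count_words, chunks)      # run the map step in parallel

    # Reduce step: merge the partial counts into a single result.
    total = sum(partials, Counter())
    print(total.most_common(3))
```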
Big Data Analytics Challenges
Big Data is increasingly revolutionizing all aspects of our lives, from enterprises to consumers to government. But Big Data is still very much in its infancy, which means that the way organizations use the information it generates is evolving very rapidly. A recent Accenture and General Electric study, reported on by Forbes, indicated that enterprises still spend the majority of their time on analysis, with just 13 percent currently using data analytics to predict outcomes. Additionally, 16 percent are using analytics applications related to the data in order to optimize processes and strategies.
Creating value from Big Data has increasingly become a multi-step process involving acquisition, information extraction and cleaning, data integration, modeling and analysis, and finally interpretation and deployment. Most analysts advise that anyone wishing to work with Big Data effectively must take each step into consideration.
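The sketch below compresses those steps into a deliberately small, hypothetical pipeline; the file names and column names are illustrative assumptions, and a real deployment would use far larger data sets and proper modeling rather than a simple aggregate.

```python
# Hypothetical, compressed sketch of the multi-step pipeline described above.
# File names and column names are illustrative, not from the original post.
import pandas as pd

# 1. Acquisition: pull raw records from source systems (here, CSV exports).
orders = pd.read_csv("orders.csv")            # e.g. order_id, customer_id, amount
customers = pd.read_csv("customers.csv")      # e.g. customer_id, region

# 2. Extraction and cleaning: normalise types and drop malformed rows.
orders["amount"] = pd.to_numeric(orders["amount"], errors="coerce")
orders = orders.dropna(subset=["amount", "customer_id"])

# 3. Integration: join the cleaned data with a second source.
combined = orders.merge(customers, on="customer_id", how="left")

# 4. Modeling and analysis: a simple aggregate standing in for a real model.
revenue_by_region = combined.groupby("region")["amount"].sum().sort_values(ascending=False)

# 5. Interpretation and deployment: surface the result for decision makers.
print(revenue_by_region.head())
```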
However, as many opportunities as Big Data provides for organizations, there are also plenty of Big Data analytics challenges associated with the technology. Firstly, there are significant research challenges for any organization that wishes to work with Big Data. One of the most prominent is that the data produced by Big Data collection methods is heterogeneous: extremely diverse and difficult to deal with.
The timeliness and privacy of data are also issues, and presenting data and conclusions in a visual, digestible format can be a huge challenge in its own right. On top of this, many organizations have run into problems with the tools ecosystem used to work with Big Data sources.
Additionally, Gartner reports a significant skills gap in the IT industry, even though it is positive about the opportunities that Big Data presents. Gartner estimates that only one-third of IT jobs related to Big Data can currently be filled, due to failings in the public and private education systems, and it predicts that data experts will be a scarce and valuable commodity in the near future.
In Part Two of this blog, Big Data Analytics Trends 2015, we will discuss the Big Data analytics trends going forward and how they impact the cloud and cloud service providers.