Big Data Analytics: Outsourcing vs. In-House
According to IDC, the 1.8 zettabytes (that's 1.8 trillion gigabytes) of information created last year will grow by a factor of nine over the next five years.
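To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes decimal (SI) units and steady compound growth across the five years; neither assumption comes from IDC, and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the IDC figures quoted above.
# Assumptions (not from the source): decimal SI units, steady compound growth.
ZB_IN_GB = 10**12          # 1 zettabyte = 10^21 bytes = 10^12 gigabytes

created_zb = 1.8           # zettabytes created last year
growth_factor = 9          # projected growth over the period
years = 5

created_gb = created_zb * ZB_IN_GB                # about 1.8 trillion GB
projected_zb = created_zb * growth_factor         # volume after five years
annual_rate = growth_factor ** (1 / years) - 1    # implied yearly growth

print(f"{created_gb:.2e} GB created last year")
print(f"{projected_zb:.1f} ZB projected in {years} years")
print(f"{annual_rate:.0%} implied annual growth")
```

Under these assumptions, a ninefold increase over five years works out to a little over 16 zettabytes, or growth of roughly 55 percent a year.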
While storing massive amounts of data on big computers is not a new idea, what has changed is the need, and the expectation, to mine that data for decision support.
That's what we call big data analytics, and many experts expect the ability to analyze big data to be the difference between success and failure in almost every type of business in the coming years.
Big data analytics has been, and will continue to be, made possible by three significant trends:
- The growth of nonvolatile memory: replacing disks and DRAM
- The introduction of photonics and improvements in interconnects: replacing copper cabling, thereby reducing space and power requirements
- Advances in systems on a chip: also reducing power and footprint
Where to turn
So how do companies today make the leap to light speed and become big data analyzers? Do they go outside and hire data analysis consultants or try to develop the capability in-house? The fundamental question must be ‘how business-critical is the data?’
If the data is essential to the company's business survival, it should be kept in-house. Other analytics can be outsourced. But note that an outsourcing supplier shouldn't just warehouse the data. Whether it is business-critical or supporting, the data must be analyzed.
Sorting your data
We should take a step back and say that in analytics, there are two classes of data. There's the data that can be analyzed over time, and there's the data that must be analyzed and mined in near real time.
The first class is a backend operation, which allows for deep dives into long-term analysis and business processing. This class of analysis was facilitated by technological developments like Hadoop and MapReduce, which make it possible to split large data sets and distribute the work across a large number of commodity processors. These systems include built-in triple redundancy (Hadoop's file system stores three copies of each data block by default), which protects against data loss when hardware fails. They are suited for running in the cloud.
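The MapReduce idea can be illustrated with a minimal, single-machine sketch in plain Python. This is not Hadoop's API: the function names and the sample input are hypothetical, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over all input splits."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for a split held on a different node.
splits = ["big data analytics", "big data in the cloud"]
counts = word_count(splits)
print(counts)
```

Because each split is mapped independently and each key is reduced independently, the same logic can be spread across as many commodity processors as the data requires.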
Decision support, on the other hand, is a near-real-time operation similar to streaming. It manipulates far smaller amounts of data and is facilitated by such technologies as open source Storm, Spark and Flink, which support real-time and stream processing. With faster processing, analysis that used to take days to perform was reduced to hours, and is now expected in minutes.
Applying the process
So imagine a startup company that, let's say, plans to consolidate the most interesting news from around the world into a single newspaper. The masses of data they would scrape from news sources and social media would in fact be their 'product.' This data would need to be analyzed in near real time, with alerts for certain key words and topics, to create the news publication. This function should, indeed must, be performed in-house.
Such essential data must be close by, accessible, malleable and secure. At the same time, the data from the background IT that supports the website could be stored and analyzed in the cloud by an outsourced provider. Business criticality is the key.
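As a rough illustration of keyword alerting on a stream, the sketch below checks each incoming item against a watch list as it arrives. It is plain Python rather than Storm, Spark or Flink, and the function name, headlines and keywords are all hypothetical.

```python
import re


def keyword_alerts(stream, keywords):
    """Yield (keyword, item) for each incoming item that mentions a watched keyword."""
    watched = {keyword.lower() for keyword in keywords}
    for item in stream:
        # Tokenize on letters, digits and apostrophes so punctuation is ignored.
        words = set(re.findall(r"[a-z0-9']+", item.lower()))
        for hit in sorted(watched & words):
            yield hit, item


# Hypothetical feed: in practice this would be an unbounded stream of scraped items.
headlines = [
    "Earthquake strikes coastal region",
    "Local team wins final",
    "Election results expected tonight",
]

alerts = list(keyword_alerts(headlines, ["election", "earthquake"]))
for keyword, headline in alerts:
    print(f"ALERT [{keyword}]: {headline}")
```

The generator handles one item at a time and keeps no history, which is what lets this style of processing respond in near real time instead of waiting for a batch job to finish.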