Business Cloud News
Jorge Lopez, director of product marketing, Syncsort

Every time new technologies enter the mainstream there is a bedding-in period; big data, and more specifically Hadoop, is no different. The UK market is still very much in its infancy and many organisations are struggling to make a proper business case; is it a next-generation analytics platform? Is it an open-source storage platform? Either way, the bottom line is that if your organisation generates a lot of data, Hadoop is likely to be on your radar already, but understanding how to sell it to the business is a skill in itself.

Your organisation will have spent years building up its IT system, so if your Hadoop proposal involves ripping out the infrastructure that’s already in place, you will encounter resistance. In larger enterprises in particular, the “if it ain’t broke, don’t fix it” mentality prevails, pragmatism reigns supreme and it takes more than a promise of solid ROI to justify significant change. Instead, IT teams need to approach it from the business end, which means solving existing problems with Hadoop, with minimal disruption. But if Hadoop really is the answer, what’s the question?

Businesses produce massive amounts of data on a daily basis, and many are paying millions every year to store it in dedicated data warehouses. These organisations have grown accustomed to paying incrementally more each year for the same service, often with no real understanding of where that extra money goes.

That’s a problem. The cost of storing a terabyte of data in a data warehouse can range anywhere from $20,000 – $100,000; in Hadoop, that figure sits somewhere between $250 and $2,000. From those figures alone it’s clear the ability to offload data from expensive servers into Hadoop is one way to justify the expense of the cluster itself.
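
The scale of the gap is easy to sanity-check. The sketch below plugs in the low end of each per-terabyte range quoted above; the 50 TB workload size is an invented assumption for illustration, not a figure from the article.

```python
def annual_storage_cost(tb: float, cost_per_tb: float) -> float:
    """Yearly cost of storing `tb` terabytes at a given per-TB rate."""
    return tb * cost_per_tb

# Assume 50 TB of rarely-queried data, priced at the low end of each range:
# $20,000/TB in the warehouse vs $250/TB in Hadoop.
warehouse_cost = annual_storage_cost(50, 20_000)   # $1,000,000
hadoop_cost = annual_storage_cost(50, 250)         # $12,500
savings = warehouse_cost - hadoop_cost             # $987,500
```

Even at the cheap end of the warehouse range and the expensive end of the Hadoop range, the offload argument holds by an order of magnitude or more.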

A recent Gartner report found 80 per cent of all enterprise data warehouses are capacity-constrained. In our experience, most of that capacity is consumed by extract, load, and transform (ELT) operations; for example, joining a database of customer contact details with their respective purchase histories. Most organisations started out with ETL (extract, transform, load) – extracting the data and processing it outside the database – but underperforming ETL tools forced them to take a different approach.
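
The join described above – customer contact details merged with purchase histories – is the kind of transform that ETL performs outside the database. A toy sketch, with field names and records invented purely for illustration:

```python
customers = [
    {"id": 1, "name": "Acme Ltd", "email": "ops@acme.example"},
    {"id": 2, "name": "Globex", "email": "it@globex.example"},
]
purchases = [
    {"customer_id": 1, "item": "widget", "amount": 120.0},
    {"customer_id": 1, "item": "gadget", "amount": 80.0},
    {"customer_id": 2, "item": "widget", "amount": 60.0},
]

def join_purchases(customers, purchases):
    """Attach each customer's purchase history to their contact record."""
    by_id = {c["id"]: dict(c, purchases=[]) for c in customers}
    for p in purchases:
        by_id[p["customer_id"]]["purchases"].append(p)
    return list(by_id.values())

joined = join_purchases(customers, purchases)
```

In ETL this work happens on a separate processing tier before loading; in ELT the equivalent join runs as SQL inside the warehouse's staging area, which is exactly where the hidden capacity cost accrues.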

Where the software couldn’t cope we saw ETL evolve into ELT, whereby the transform stage takes place in a “staging area” within the data warehouse itself. This staging area can grow significantly over time, silently consuming ever-more resources. Considering the top 20 per cent of ETL and ELT operations consume as much as 80 per cent of total capacity, the essence of a business case is already evident.

The numbers just don’t match up; performing resource-intensive operations such as ELT in a data warehouse is a one-way ticket to a sky-rocketing IT budget.

Making the business case for Hadoop requires an understanding of where that top 20 per cent of ELT workloads sits in your organisation. Slowly changing dimensions, ranking functions and volatile tables are some of the worst offenders, and are therefore easy targets when it comes to your first steps with Hadoop. Realistically, offloading anything that has a high impact on resource utilisation will be your fastest route to ROI.
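
Identifying that top 20 per cent is, in essence, a ranking exercise. A minimal sketch, assuming you can pull per-workload resource figures from your warehouse's query logs (the workload names and CPU-hour numbers below are invented for illustration):

```python
# Hypothetical monthly resource consumption per ELT workload, in CPU-hours.
workloads = {
    "slowly_changing_dimensions": 4200,
    "ranking_functions": 3100,
    "volatile_tables": 2800,
    "nightly_reports": 600,
    "ad_hoc_queries": 300,
}

def top_fraction(usage: dict, fraction: float = 0.2) -> list:
    """Return the heaviest workloads, keeping roughly `fraction` of them."""
    ranked = sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]

offload_candidates = top_fraction(workloads)
```

The resulting shortlist is your first set of offload candidates; in practice the ranking would come from the warehouse's own workload-management statistics rather than a hand-built dictionary.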

By making data offload your first priority, it’s possible to amass a pot of operational savings and deferred database costs that can then be used to fund more strategic initiatives – analytics being one example. Not only that, but it’s important to remember that Hadoop has its own required skillset, and by picking your battles one at a time you’ll avoid needing to bring in too much new talent at once. Hadoop has a cascade effect; once you have a cluster in place you see more and more ways to utilise it, which in turn deliver greater savings, quicker, so it almost becomes its own circular economy.

It may be tempting to start your Hadoop journey with fancy analytics, but in reality it’s much more difficult to present a compelling cost argument by taking that route first. Predictive analysis and other types of advanced analytics are a longer-term project and are harder to quantify, particularly if the business is performing well.

To glean valuable insights from big data you need to be able to ask the right questions at the right time, which is a learning curve in itself, and requires that you have adequate skills in-house to operate the various required platforms. These sorts of hurdles are by no means insurmountable, but they are important factors to bear in mind during the planning process.


  • Tom Deutsch (IBM) May 9, 2014 at 2:28 pm

    Focusing on ETL as the primary driver is, frankly, not the best idea, as it removes business outcomes as the driver. Adoption of Hadoop, or any new-ish technology, needs to be driven by business outcomes – ROI – to be most effective. Oftentimes that is driven by insights from the data that were hard to get to before (mainly driven by flexibility, not sheer size).

  • Greg Deckler May 9, 2014 at 10:27 pm

    I think you are lumping all data warehouses together a little unfairly. There are technologies like Microsoft’s Parallel Data Warehouse (PDW) that offer a TB of storage in the $1,000 range and some amazing performance around ETL/ELT. I am seeing customers implement a PDW alongside their traditional data warehouse in order to offload significant workloads at a far lower price than upgrading their current data warehouse. It is essentially what you describe above, but it does not have to be Hadoop. The real story here is to justify better ways of doing things, analyse the cost involved in simply upgrading the current system to meet the needs, and the business case almost makes itself. The issue with Hadoop in particular is that it is so vaguely defined and such a relatively immature technology that it is a much tougher sell than something like a PDW.
