We’re sitting on a big data time bomb
$114 billion. That’s how much global organizations will spend on big data in 2018, an increase of more than 300 percent in just five years. But how much of that is money well spent?
Over the past 10 or so years, we’ve seen widespread adoption of new approaches to managing big data, such as MapReduce and schema-less databases for massive-scale storage, along with complementary storage and processing technologies like Hadoop, Storm, and Spark. But making use of big data means more than deploying a particular platform or paradigm: At its best, it means a total redesign of how companies structure and organize data.
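For readers unfamiliar with MapReduce, the paradigm can be sketched in a few lines. What follows is a minimal, single-process illustration in plain Python, not Hadoop's API. A map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The word-count task and all function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value by its key, as a framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # On a real cluster, each document's map step could run on a different machine.
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data platforms"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'ideas': 1, 'platforms': 1}
```

The appeal for big data is that the map and reduce steps are independent per key, so a framework can spread them across many machines. This sketch runs everything in one process purely to show the shape of the computation.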
Despite big data’s promising benefits, few organizations have begun taking the essential steps to prepare for adopting new capabilities and data platforms. An industry survey of global companies found that only 35 percent have “robust processes for data capture, curation, validation, and retention.” Equally troubling, 67 percent “do not have well-defined criteria in place to measure the success of their big data initiatives.” Instead, big data solutions are integrated reactively, department by department, or not at all.
According to a 2014 IDC report, the amount of available data in the world will explode to 44 zettabytes by 2020, 10 times what it was in 2013. Companies that fail to prepare for this next generation of massive data volumes and insights risk incurring operational and technical debt. In a form of corporate natural selection, those that fall behind are doomed to wither away.
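To put the IDC projection in perspective, here is a rough worked example. It assumes smooth exponential growth between 2013 and 2020, which is a simplification, since the report's year-by-year figures may differ. A tenfold increase over seven years then works out as follows:

```latex
\begin{aligned}
V_{2013} &= \frac{44\ \text{ZB}}{10} = 4.4\ \text{ZB} \\
r &= 10^{1/7} - 1 \approx 0.389 \quad \text{(roughly 39\% growth per year)} \\
T_{\text{double}} &= \frac{\ln 2}{\ln(1 + r)} = \frac{7 \ln 2}{\ln 10} \approx 2.1\ \text{years}
\end{aligned}
```

In other words, under this assumption the world's data roughly doubles every two years. Any process whose cost grows in step with data volume would have to double on the same schedule.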
Here’s what they can expect as this big data time bomb goes off.
Catastrophic loss of transparency. Few IT professionals have experience managing big data platforms at scale — a situation that has created a massive skills shortage in the industry. By 2018, U.S. companies will be short 1.5 million managers able to make data-based decisions. A recent McKinsey Quarterly report estimates that, in order to close this gap, companies would need to spend 50 percent of their data and analytics budget on training frontline managers; it also notes that few companies realize this need.
As data needs broaden, managers without a firm understanding of information management and best practices in data extensibility will face major challenges in managing data-driven systems. With poor operational transparency, businesses will struggle to tell whether their data is accurate and meaningful, and even whether key reports and metrics are running properly. Being able to grasp these intricacies and ask the right questions about data will become a mandatory skill. Anything less will mean a lack of visibility into how your business is run, inhibiting informed decision-making and diminishing your company’s competitive edge.
Skyrocketing personnel costs. In 2014, data scientists spent an estimated 50 to 80 percent of their working hours cleaning and processing datasets. In the near term, companies are often tempted to outsource data preparation tasks to offshore or nearshore data specialists. Demand for these services is already fueling an explosion of microwork platforms like CloudFactory, MobileWorks, and Samasource, which are expected to become a $5 billion industry by 2018.
However, the outsourcing approach doesn’t scale. Keeping pace with growth toward the predicted 44 zettabytes of data would require thousands of offshore and nearshore workers, which is not a viable long-term solution. Any sustainable solution will need to involve significant automation.
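The following sketch shows what “significant automation” can mean in practice. It covers the kind of repetitive cleaning step that otherwise consumes analysts' hours: trimming whitespace, normalizing case, dropping records that fail a validation rule, and removing duplicates. The record fields (`email`, `country`), the `clean_records` function, and the loose email rule are all hypothetical, chosen only for illustration. Production pipelines would typically use a dedicated data-quality or ETL tool.

```python
import re
from typing import Dict, List

# Hypothetical validation rule: a deliberately loose email shape check, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_records(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Normalize, validate, and deduplicate raw records without manual review."""
    seen = set()
    cleaned: List[Dict[str, str]] = []
    for raw in records:
        # Normalize: strip stray whitespace, lowercase the email, uppercase the country code.
        email = raw.get("email", "").strip().lower()
        country = raw.get("country", "").strip().upper()
        # Validate: drop records whose email does not match the expected shape.
        if not EMAIL_PATTERN.match(email):
            continue
        # Deduplicate on the normalized email, keeping the first occurrence.
        if email in seen:
            continue
        seen.add(email)
        cleaned.append({"email": email, "country": country})
    return cleaned


if __name__ == "__main__":
    raw_rows = [
        {"email": "  Ana@Example.com ", "country": "us"},
        {"email": "ana@example.com", "country": "US"},
        {"email": "not-an-email", "country": "de"},
        {"email": "li@example.org", "country": " cn"},
    ]
    print(clean_records(raw_rows))
    # [{'email': 'ana@example.com', 'country': 'US'}, {'email': 'li@example.org', 'country': 'CN'}]
```

Because the same rules run identically over a thousand rows or a billion, a step like this scales with compute rather than headcount. That is the contrast with the outsourcing approach described above.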
Communications blockage. Companies today interact with each other through curated data, but the effort to facilitate that process pales in comparison to what is coming within the next 20 years. A new standard of corporate data networking will emerge, in which organizations of all sizes trade, publish, and measure curated datasets along with the corresponding algorithms and metadata. A company that can’t participate in this global data marketplace will be unable to capitalize on the market intelligence it offers.
This evolution to commercial mass data-sharing is already underway in every sector of the global economy.
Under pressure to allow third-party verification of their research, pharmaceutical companies such as GlaxoSmithKline recently proposed plans to share clinical trial data more broadly. President Obama has called on tech companies to share data about potential hacking threats. A recent Forrester report predicts that data services will become “a mainstream aspect of product offerings” in 2015, citing examples ranging from John Deere’s FarmSight to LexisNexis’ analytics products. At this pace, by the next decade effective use of big data won’t just be key to winning in the marketplace; it will be a prerequisite for participation.
Despite these impending challenges, you can avoid the big data time bomb — if you take action now. Here are three steps that can defuse this oncoming explosion within your company.