Today, you are much less likely to face a scenario in which you cannot query data and get a response back quickly. Analytical processes that used to require months, days, or hours have been reduced to minutes, seconds, and fractions of a second. But shorter processing times have led to higher expectations. Two years ago, many data analysts thought that getting a query result in less than 40 minutes was nothing short of miraculous. Today, they expect results in under a minute. That's practically the speed of thought: you think of a query, you get a result, and you begin your experiment.
“It’s about moving with greater speed toward previously unknown questions, defining new insights, and reducing the time between when an event happens somewhere in the world and someone responds or reacts to that event,” says Erickson. A rapidly emerging universe of newer technologies has dramatically reduced data processing cycle time, making it possible to explore and experiment with data in ways that would not have been practical or even possible a few years ago. Yet despite the availability of new tools and systems for handling massive amounts of data at incredible speeds, the real promise of advanced data analytics lies beyond the realm of pure technology.
“Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse,” says Michael Minelli, co-author of Big Data, Big Analytics. “It’s about the ability to make better decisions and take meaningful actions at the right time. It’s about detecting fraud while someone is swiping a credit card, or triggering an offer while a shopper is standing on a checkout line, or placing an ad on a website while someone is reading a specific article. It’s about combining and analyzing data so you can take the right action, at the right time, and at the right place.” For some, real-time big data analytics (RTBDA) is a ticket to improved sales, higher profits and lower marketing costs. To others, it signals the dawn of a new era in which machines begin to think and respond more like humans.
“IBM execs told analysts at the company’s new Spark Technology Center [in San Francisco that] it’s an all-in bet to integrate nearly everything in the analytics portfolio with Spark. Other tech vendors betting on Spark range from Amazon to Zoomdata …”
IBM executives also described the features of Spark that they found most compelling:
1. The task of data conversion and loading is handled automatically, allowing the Spark user to concentrate on data analysis, not data movement.
2. Spark is flexible in its data processing capabilities. It's a platform on which tasks can be distributed, scheduled, and allocated adequate I/O capacity, while the data is filtered, reduced, and joined as needed.
3. Its in-memory processing gives it a dramatic speed advantage over classic Hadoop, which relies on MapReduce, a disk-based system that writes intermediate results to disk between processing stages.
4. It can run SQL queries, machine learning analytics, Spark Streaming data analysis, and analytics written in SparkR, the recently released R interface to Spark that came out of Berkeley.
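To make items 2 and 3 more concrete, here is a conceptual sketch in plain Python. It is not Spark code and does not use Spark's API; every name in it (`customer_spend_with_region`, `in_memory`, `make_disk_checkpoint`, and the sample data) is hypothetical. It runs the same filter, reduce, and join steps twice: once passing intermediate results directly from stage to stage in memory, and once writing each intermediate result to a file and reading it back, which loosely mimics how classic MapReduce materializes data on disk between stages. Real Spark also distributes these steps across a cluster, which the sketch does not attempt.

```python
import json
import os
import tempfile
from collections import defaultdict


def in_memory(data):
    """Hand intermediate results straight to the next stage (Spark-style)."""
    return data


def make_disk_checkpoint(workdir):
    """Return a checkpoint that writes each intermediate result to a file and
    reads it back, loosely mimicking MapReduce's on-disk intermediate data."""
    stage = 0

    def checkpoint(data):
        nonlocal stage
        stage += 1
        path = os.path.join(workdir, f"stage{stage}.json")
        with open(path, "w") as f:
            json.dump(data, f)
        with open(path) as f:
            return json.load(f)

    return checkpoint


def customer_spend_with_region(purchases, regions, min_amount, checkpoint):
    """Filter -> reduce -> join over hypothetical (customer_id, amount) pairs."""
    # Filter: drop purchases below the threshold.
    large = checkpoint([[cid, amt] for cid, amt in purchases if amt >= min_amount])
    # Reduce: total spend per customer.
    totals = defaultdict(float)
    for cid, amt in large:
        totals[cid] += amt
    totals = checkpoint(dict(totals))
    # Join: attach each customer's region (inner join on customer_id).
    return sorted(
        (cid, total, regions[cid]) for cid, total in totals.items() if cid in regions
    )


# Hypothetical sample data.
PURCHASES = [("c1", 20.0), ("c2", 5.0), ("c1", 35.0), ("c3", 12.5), ("c2", 40.0)]
REGIONS = {"c1": "west", "c2": "east", "c3": "west"}

if __name__ == "__main__":
    fast = customer_spend_with_region(PURCHASES, REGIONS, 10.0, in_memory)
    with tempfile.TemporaryDirectory() as tmp:
        slow = customer_spend_with_region(
            PURCHASES, REGIONS, 10.0, make_disk_checkpoint(tmp)
        )
    print(fast)  # [('c1', 55.0, 'west'), ('c2', 40.0, 'east'), ('c3', 12.5, 'west')]
    print(fast == slow)  # True
```

Both runs produce the same answer; the only difference is that the disk-based variant pays for a file write and read between every stage. At cluster scale, avoiding those round-trips is the kind of saving item 3 refers to.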
IBM said it would run its own analytics software on top of Spark, including SystemML for machine learning, SPSS, and IBM Streams.