In 1997, IBM’s “Deep Blue” was considered one of the world’s biggest and most powerful supercomputers. So big and powerful that it beat Garry Kasparov, the world’s biggest and most powerful chess player, at his own game. At a cost of $100 million, Deep Blue was big not only in performance but in cost.
Today’s commodity computer is over 30x more powerful and 100,000x lower in cost than Deep Blue was less than 20 years ago. Exponential technology advancement has taken the depth out of Deep Blue, which today looks rather, well… small.
Today, the term “big data” is all the rage. Countless big data companies have been born. Numerous big data tools have emerged. But will today’s definition of big data pass the test of time tomorrow? About as well as the Titanic did when it came to defining big, unsinkable ships.
Twitter is a textbook example of what defines big data today:
- Each active Twitter user sends an average of about 1.7 tweets per day.
- Collectively, that equates to about 500 million total tweets a day.
- With each tweet being about 200 bytes in size, the incremental data produced is about 100 GB per day (a quick check of the math follows below).
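For anyone who wants to sanity-check that figure, here is a quick back-of-envelope sketch in Python, using the round numbers from the list above:

```python
# Back-of-envelope check of the tweet-volume figures above
# (round numbers assumed: ~500 million tweets/day, ~200 bytes/tweet).
TWEETS_PER_DAY = 500_000_000
BYTES_PER_TWEET = 200

daily_bytes = TWEETS_PER_DAY * BYTES_PER_TWEET
daily_gb = daily_bytes / 1e9  # decimal gigabytes

print(f"New tweet data per day: {daily_gb:,.0f} GB")  # -> 100 GB
```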
Twitter’s current “big data” challenges revolve around ingestion and batch and streaming analytics of both new data streams and persistent data stores. As a result, Twitter has made notable contributions to today’s open-source data tools, including Storm and Hadoop.
Twitter has done well in addressing its data challenges of today, which are largely driven by the ingestion of human behavioral data. The three billion people using the internet today produce an amount of data that would have been nearly impossible to comprehend ten years ago.
Looking ten years ahead, the Internet of Things represents the most numerous and significant new “users” coming online. Not only will these new users outnumber people online by a wide margin, but they will also be much more active. Because they communicate via messaging, you could say that they “tweet,” just like we do. But differently…
- The IoT is projected to grow to over 50 billion devices producing over 500 billion data points. That’s over 150x more machine data points than there are people online.
- A typical IoT point can “tweet” data every second, and it does so continuously. That can mean over 86,000 tweets per user per day, over 50,000x the number of tweets a person sends today.
- Machine tweets can involve large media content, or tiny data values. To keep things simple, let’s assume 100 bytes per tweet.
- Total IoT big data flow can surpass 4.3 exabytes per day (that’s 4,320 million GB per day).
The IoT example shows that tomorrow’s “big data” flow can be upwards of 43 million times bigger than Twitter’s tweet flow today (the sketch below walks through the math). When it comes to scale, that’s like comparing yourself to something bigger than our entire planet.
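Here is the same kind of back-of-envelope sketch for the IoT figures, again assuming the round numbers from the list above (500 billion data points, each sending one 100-byte “tweet” per second):

```python
# Back-of-envelope check of the IoT figures above.
IOT_POINTS = 500_000_000_000
SECONDS_PER_DAY = 86_400           # one "tweet" per second, all day
BYTES_PER_IOT_TWEET = 100

iot_daily_bytes = IOT_POINTS * SECONDS_PER_DAY * BYTES_PER_IOT_TWEET
iot_daily_eb = iot_daily_bytes / 1e18   # exabytes
iot_daily_gb = iot_daily_bytes / 1e9    # gigabytes

twitter_daily_bytes = 500_000_000 * 200  # from the Twitter example above

print(f"IoT data per day: {iot_daily_eb:.2f} EB ({iot_daily_gb / 1e6:,.0f} million GB)")
print(f"IoT flow vs. Twitter flow: {iot_daily_bytes / twitter_daily_bytes:,.0f}x")
# -> IoT data per day: 4.32 EB (4,320 million GB)
# -> IoT flow vs. Twitter flow: 43,200,000x
```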
To put scale in clearer perspective: back in 1977, Charles and Ray Eames produced “Powers of Ten,” which remains a useful tool for understanding the power of exponential growth.
As the scale of big data keeps growing “to the 10th degree,” we will be presented with several new questions, challenges, and opportunities when it comes to managing data:
1. Today it’s all about storing everything. When do data hoarding and data pollution become a problem?
2. Today, “loss” is a four-letter word when it comes to data. When does shedding data become as important as shedding excess pounds?
3. Processing and analyzing data at the right spot will become very important. The cloud alone won’t cut it; the device layer, the on-premise layer, and the cloud layer will each have a role to play.
In the meantime, exponential growth will continue to quickly and continuously redefine what’s big and what’s small. It’s already getting harder for my small brain to comprehend, so more and more I’m relying on technology to help me understand it. That’s just begun, and it presents an entirely new set of problems and opportunities... :-)
Today’s big is tomorrow’s small.