Must-know: “Big data” gave rise to NoSQL and Hadoop
Google and Yahoo broke up large databases and distributed them across multiple servers. This led to distributed databases, which in turn laid the foundation for advanced databases like NoSQL and new processing platforms like Hadoop.
Aug. 1 2014, Updated 1:11 a.m. ET
What does “big data” include?
“Big data” includes structured and unstructured data. Structured data is relatively easy to analyze. Unstructured data, which includes voice, video, email, and documents, is usually difficult and expensive to analyze. Relational databases, with their limitations in handling “big data,” aren’t much help either. As a result, “big data” is sometimes considered to be the data that can’t be analyzed in a traditional database.
The previous chart shows the growth expected in the Hadoop and NoSQL markets. Combined, they’re expected to surpass $3.5 billion. As the volume of multi-structured data increases in the future, demand for NoSQL and Hadoop will increase.
Emergence of NoSQL and Hadoop
Google (GOOG) and Yahoo ran search engines. They were able to process search queries within milliseconds to produce search results and rankings. They were also dealing with huge volumes of data because millions of users were submitting different search queries at once. To handle these unstructured queries at high speed, they developed new database platforms. They broke up large databases and distributed them across multiple servers, which led to distributed databases. These distributed databases laid the foundation for advanced databases like NoSQL and new processing platforms like Hadoop.
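To make the idea of breaking up a large database concrete, here is a minimal sketch in Python of one common approach, hash-based partitioning. It is only an illustration, not a description of how Google or Yahoo actually built their systems. The server names and the helper functions (shard_for, distribute, lookup) are hypothetical.

```python
import hashlib

# Hypothetical server names, used only for illustration.
SERVERS = ["server-0", "server-1", "server-2"]

def shard_for(key, servers=SERVERS):
    """Pick the server that owns a record by hashing the record's key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

def distribute(records):
    """Split one large key-value table into smaller per-server pieces."""
    shards = {name: {} for name in SERVERS}
    for key, value in records.items():
        shards[shard_for(key)][key] = value
    return shards

def lookup(shards, key):
    """Consult only the single server that owns the key."""
    return shards[shard_for(key)].get(key)

# Example: 100 records spread over three servers.
records = {f"user{i}": i for i in range(100)}
shards = distribute(records)
```

Because the same key always hashes to the same server, a lookup touches one small piece of the data instead of the whole database.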
What are NoSQL and Hadoop?
Not Only SQL (or NoSQL) refers to a broad, cloud-friendly class of databases designed to handle semi-structured data. They generally don’t rely on SQL or on the fixed table structure of relational databases. Hadoop is an open source, distributed data storage and processing platform that is designed to store and analyze “big data” across several thousand nodes.
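As a rough illustration of what “semi-structured” means in practice, the Python sketch below stores records that each carry their own set of fields and filters them without declaring a table layout first. The sample documents and the find helper are made up for this example and don't correspond to any particular NoSQL product.

```python
import json

# Semi-structured records: each document carries its own fields,
# so no fixed table schema has to be declared up front.
documents = [
    {"id": 1, "type": "email", "sender": "a@example.com", "subject": "Q2 report"},
    {"id": 2, "type": "video", "duration_sec": 95, "tags": ["demo", "product"]},
    {"id": 3, "type": "email", "sender": "b@example.com", "attachments": 2},
]

def find(docs, **criteria):
    """Return the documents whose fields match every given criterion.

    A document that lacks one of the fields simply doesn't match.
    """
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

emails = find(documents, type="email")
print(json.dumps(emails, indent=2))
```

A relational table would need every row to share the same columns. Here, the video record and the email records sit side by side even though they have almost no fields in common.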
Hadoop and many NoSQL databases are open source, fast, and highly fault tolerant. They’re cost efficient because they store data in small chunks across several servers. They can process queries quickly by spreading the work across multiple machines that run at the same time. Due to these advantages, Microsoft (MSFT), Oracle, IBM Corp. (IBM), EMC (EMC), Teradata Corp. (TDC), and other players have incorporated them into their own platforms. This implies that the core relational database management system (or RDBMS) isn’t equipped to deal with “big data.”
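The chunk-and-parallelize idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Hadoop's actual implementation: threads stand in for separate machines, and the function names (split_into_chunks, count_matches, parallel_count) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(rows, chunk_size):
    """Break a large data set into small fixed-size chunks."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

def count_matches(chunk, term):
    """Work done by one 'machine': scan only its own chunk."""
    return sum(1 for row in chunk if term in row)

def parallel_count(rows, term, chunk_size=2, workers=4):
    """Run the same query on every chunk at once, then combine the partial counts."""
    chunks = split_into_chunks(rows, chunk_size)
    # Threads stand in for separate servers in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_matches, chunks, [term] * len(chunks))
        return sum(partials)

rows = ["big data", "small data", "hadoop", "nosql data", "sql"]
total = parallel_count(rows, "data")
```

Each worker only scans its own small chunk, so adding more machines shortens the time needed to answer a query over the full data set.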
Issues and concerns
NoSQL, Hadoop, and other advanced technologies do provide the basic infrastructure to deal with “big data.” However, they aren’t as reliable as relational databases. Software developers are still required to write the business intelligence code to support them. This gives industry players significant scope to offer differentiated products.