Big Data: Where Dreams Take Flight

Release Date: 2013-07-24 Author: Chengzhong Xu and Zhibin Yu

   From academia to industry, big data has become a buzzword in information technology. The US Federal Government is paying close attention to the big-data revolution: in 2012, fourteen US government departments allocated funds to 87 big-data projects [1]. Europe has the second largest amount of data [2], and most universities and research institutes have already established big-data research programs. In Asia, especially in China, central and local governments have been setting aside funds for their own big-data programs. The big-data-related 973 Projects in China are good examples of this. Industry players have been following in the footsteps of big-data pioneers such as Google, Facebook, Twitter, and Baidu, and more and more companies are rushing into the big-data business. Companies have been analyzing the purchasing behavior of huge numbers of customers in order to devise more attractive plans and policies. Big data is already an important part of the $64 billion database and data analytics market [3]. Indeed, big data will open up commercial opportunities comparable in scale to those created by enterprise software in the late 1980s, the internet in the 1990s, and the social media explosion of today.


    But what is big data? It has been defined in many different ways. We prefer to define big data as data sets that are too big for current information technologies to capture, transmit, store, process, or visualize. Although this definition is simple, it touches on computational complexity theory, computer architecture, operating systems, programming models, database technologies, algorithms, and applications. People from different fields have dramatically different understandings of big data, which is why there is so much excitement and conjecture surrounding it.


    In this special issue, we present papers that discuss big-data technology from different perspectives. These include not only high-level surveys but also reports on initial results from big-data projects. Communication infrastructure is one of the most important aspects of big data. Yi Zhu and Zhengkun Mi from Nanjing University of Posts and Telecommunications discuss content-centric networking, which is seen as a promising approach to big-data distribution. They propose a networking architecture for processing big data that is fundamentally different from TCP/IP. Shengmei Luo et al. from the Cloud Computing & IT Institute of ZTE Corporation present a survey of big-data analytics. They analyze challenges related to storage, data-mining algorithms, and programming models for big data. They also predict opportunities in the big-data era. Although there are many potential business opportunities in big data, security is of the utmost importance for users and cannot be overlooked. Ruixuan Li et al. from Huazhong University of Science and Technology provide an overview of data security and privacy preservation for cloud storage. They carefully investigate confidentiality, data integrity, and data availability. They also propose a feasible solution to current security problems. Shigang Chen et al. from the University of Florida delve more deeply into data integrity. They propose a novel authenticated data structure called the Cloud Merkle B+ tree (CMBT), which supports dynamic operations such as insertion, deletion, and modification. CMBT lowers overhead from O(n) to O(log n).
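
    As a rough illustration of why an authenticated tree can cut integrity-checking overhead from O(n) to O(log n), the sketch below builds a plain binary Merkle hash tree in Python. It is not the CMBT of Chen et al.: it has no B+ tree indexing and no support for dynamic updates, and the function names (build_levels, prove, verify) are hypothetical choices made for this example. The point is only that a verifier holding the root hash needs one sibling hash per tree level to check a block, rather than every block.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return all tree levels, leaf hashes first and the root level last."""
    if not blocks:
        raise ValueError("need at least one block")
    # Prefix leaves with 0x00 and inner nodes with 0x01 so the two
    # kinds of hash cannot be confused with each other.
    level = [_h(b"\x00" + b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Simplifying assumption: pad odd levels by repeating the last node.
            level = level + [level[-1]]
            levels[-1] = level
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect one sibling hash per level: the O(log n) integrity proof."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        # Record whether the sibling sits to the left of the current node.
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root, block, proof):
    """Recompute the path from a block up to the root using only the proof."""
    node = _h(b"\x00" + block)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root


blocks = [f"block-{i}".encode() for i in range(8)]
levels = build_levels(blocks)
root = levels[-1][0]
proof = prove(levels, 5)
print(len(proof), verify(root, blocks[5], proof))
```

    With eight blocks the proof holds three hashes, one per level below the root, so doubling the number of stored blocks adds only one hash to each proof.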


    Turning to big-data applications, algorithms designed for a single machine are not necessarily efficient on big-data platforms because many machines need to run concurrently on the same task. Weisong Shi et al. from Wayne State University design a mechanism called SPBD that reduces the response time of big-data systems. This mechanism is feasible in practice. Zhendong Bei et al. report their experiences with big-data applications that use MapReduce/Hadoop. They confirm that manually tuning up to 190 Hadoop configuration parameters is extremely time consuming, if it is possible at all. They then propose an automatic performance prediction scheme based on random forest to determine the best combinations of configuration parameters. Their experimental results show that their scheme can predict the performance of Hadoop systems very accurately.


    Challenges and opportunities exist together in the big-data era. We believe most of these challenges will be overcome and opportunities will be realized. Big data is a field where dreams will take flight.