Developments in Big-Data Hardware

Release Date: 2014-07-17 By Xiong Xiankui

 

 

There are three important aspects of big data: the huge scale of the data to be handled, multidimensional data analysis, and diverse data processing requirements. Big data is so large that it is usually measured in petabytes, whereas data in traditional online transaction systems is usually measured in terabytes. Big data is often analyzed comprehensively across a dozen or so dimensions. A traditional relational database, by contrast, places more emphasis on the relationships that couple data together, and a NoSQL database stores data as hashed key-value pairs. Big data also has diverse processing requirements. Non-real-time business intelligence processing runs in the background and is used for analysis and mining. Real-time decision-making, such as that in a high-frequency financial trading system, is needed in place of traditional online transaction processing. These factors have driven the development of bottom-layer hardware technologies.

Video-based smart city applications, such as vehicle license plate recognition, facial recognition, and vehicle tracking, create data at the petabyte or even exabyte level that needs to be stored economically. Large-capacity, high-density SATA hard disks are usually used to store such data for analysis, and open-source software, such as Hadoop, is used to construct a distributed file system for storage. A POSIX-style API is used to build the big-data processing platform. Recently, there has been much interest in servers that combine computing and storage. Such a storage server usually has:

●    two Intel Xeon processors (a two-socket configuration)

●    strong computing and storage capabilities

●    mid-range memory

●    GE and 10GE network adapters

●    12 to 24 local SATA hard disks.

Large internet companies, such as Tencent, use this type of server to build their own big-data processing platforms. ZTE provides the i8350 storage server, which meets the TS6 specifications and has passed rigorous tests. ZTE i8350 is now very popular in the market.

Database platforms, especially those for big-data processing, analysis, and mining, are moving from row-oriented relational databases to column-oriented databases. Sybase IQ is a high-performance, highly scalable, column-oriented database suitable for OLAP applications. The hardware architecture for such a database usually comprises high-performance disk arrays and high-density server arrays. ZTE provides the E9000 blade server, which has a height of 10U and supports 16 blades. The server has a built-in high-bandwidth FCoE gigabit switch, supports FCF (which provides FC ports), and can be upgraded smoothly to support 40GE FCoE with a single node. ZTE i8350 can be combined with ZTE’s high-performance FC SAN disk array in the KS3200 series to create a PB-level OLAP big-data processing system. Many of ZTE’s big-data products are based on a database platform and have been put into large-scale commercial use. ZTE’s UBAS user-behavior analysis system is one such product, used in the telecom industry.
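The difference between row-oriented and column-oriented storage can be illustrated with a minimal Python sketch. It is only an illustration of the general idea, not how Sybase IQ or any ZTE product is implemented; the record fields and function names are hypothetical.

```python
# Hypothetical sales records; field names are illustrative only.
rows = [
    {"region": "north", "units": 3, "price": 10.0},
    {"region": "south", "units": 5, "price": 12.0},
    {"region": "north", "units": 2, "price": 11.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_units_row_store(records):
    # Row layout: every whole record is visited to read a single field.
    return sum(record["units"] for record in records)


def total_units_column_store(columns):
    # Column layout: only the "units" column is scanned.
    return sum(columns["units"])


columns = to_columns(rows)
row_total = total_units_row_store(rows)
column_total = total_units_column_store(columns)
print(row_total, column_total)
```

Both functions return the same total, but the column layout touches only the one attribute that the aggregate needs, which is why OLAP-style queries over a few columns of very wide tables tend to favor it.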

Because of the wide variety of big-data application models, it is difficult for customers to choose among different hardware solutions. A big-data processing platform with integrated software and hardware is needed. Oracle’s Exadata and IBM’s Netezza are examples of all-in-one hardware platforms in which a minicomputer or high-end servers at the front end resolve and distribute SQL requests, and a server cluster at the back end does the processing. Both Oracle Exadata and IBM Netezza use solid-state storage, such as PCIe SSDs, for accelerated index processing. IBM Netezza also uses FPGAs on the blade server cluster to accelerate SQL processing. However, the two platforms have different storage mechanisms. Oracle Exadata uses shelf-mode storage servers for distributed storage, whereas IBM Netezza uses the blades to connect external optical fiber disk arrays. The combination of state-of-the-art technologies in these two products makes the all-in-one platforms and their service fees expensive. SAP’s HANA is another all-in-one system with customizable hardware. It has a hybrid row-column in-memory database and sells well. HANA supports multi-node extension and therefore has high requirements in terms of memory and bandwidth.
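The front-end/back-end split described here follows a general scatter-gather pattern, which the following Python sketch illustrates. It is a simplified model under stated assumptions, not the actual design of Exadata or Netezza: the partitions, the function names, and the use of threads to stand in for back-end nodes are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data partitions, one per back-end node.
partitions = [
    [4, 8, 15],
    [16, 23],
    [42],
]


def node_partial_sum_count(partition):
    """Work done on one back-end node: a partial SUM and COUNT."""
    return sum(partition), len(partition)


def front_end_average(all_partitions):
    """Front end: distribute the request, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(all_partitions)) as pool:
        partials = list(pool.map(node_partial_sum_count, all_partitions))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


average = front_end_average(partitions)
print(average)
```

An average is computed here from per-node sums and counts rather than per-node averages, because partial averages over partitions of different sizes cannot simply be averaged again at the front end.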

These all-in-one platforms can support various applications, including OLTP and OLAP applications, if the storage capacity is expanded. They are also suitable for both background non-real-time analysis and real-time decision-making. ZTE’s R8500 four-socket server is built on Intel’s Brickland platform, which ensures high reliability, availability, and serviceability. Its four CPU sockets support up to 96 hot-pluggable memory modules. If one LRDIMM has a capacity of 32 GB, then one R8500 has 3 TB of memory, which is ideal for in-memory databases. With its powerful processing capabilities, the R8500 can also serve as the head node of an all-in-one platform. The R8500 can be combined with i8350 storage servers to create a distributed storage system that is used as an all-in-one platform for OLAP applications.
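The 3 TB figure follows directly from the module count and module size given above. The short Python check below makes the arithmetic explicit; the constant names are illustrative, and it assumes binary units (1 TB = 1024 GB) and an even spread of modules across the four sockets.

```python
DIMM_SLOTS = 96          # from the text: up to 96 memory modules
LRDIMM_CAPACITY_GB = 32  # from the text: 32 GB per LRDIMM
CPU_SOCKETS = 4          # from the text: four CPU sockets
GB_PER_TB = 1024         # assumption: binary units

total_gb = DIMM_SLOTS * LRDIMM_CAPACITY_GB
total_tb = total_gb / GB_PER_TB
# Assumption: modules are divided evenly among the sockets.
slots_per_socket = DIMM_SLOTS // CPU_SOCKETS

print(total_gb, total_tb, slots_per_socket)
```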

Existing computing-centric technologies are insufficient for big-data applications. At present, big-data processing is usually optimized within the traditional memory-disk access model, so data I/O remains a bottleneck during processing. As new hardware and materials, such as phase-change materials and resistive RAM, are commercialized, processing speed and memory capacity are expected to increase greatly. Big-data processing technologies based on in-memory computing should then boom. ZTE is working hard to develop big-data hardware.