GBase 8a

Cloud-Native Analytical Database

Introduction

Market Positioning

GBase 8a MPP Cluster (abbreviation: GBase 8a), a cloud-native analytical database, primarily targets the data warehouse, business analytics, and business intelligence markets. Boasting excellent compatibility with the technology ecosystem, it supports multiple data sources and data integration tools. Ideal for cloud-native architectures, elastic scaling scenarios, and real-time analytics use cases, the product offers high flexibility and cost-effectiveness. GBase 8a has been widely adopted across various sectors, including government, statistics, auditing, banking regulation, and securities regulation. It also serves industries with massive business data such as telecommunications, finance, and power utilities.

Key Indicators

Columnar storage with a maximum data compression ratio of 1:30
Automatically provides coarse-grained intelligent indexes, enabling efficient filtering, minimal expansion, and zero maintenance
Cluster supports over 100PB of structured raw data, with 100TB of raw data per node
Supports parallel computing to fully utilize modern SMP multi-core CPU resources
Cluster loading speed exceeds 30TB per hour
Cluster supports Repeatable Read (RR) and Snapshot isolation levels for transactions, along with MVCC (Multi-Version Concurrency Control)
Cluster supports a scale of up to 4096 nodes
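
As a back-of-the-envelope reading of the figures above, the following sketch estimates node count, best-case on-disk size, and load time for a given raw data volume. It assumes decimal units (1 PB = 1000 TB) and treats the quoted limits as exact, which a real deployment will not match. The helper name `size_cluster` is hypothetical and not part of any GBase tool.

```python
import math

# Figures quoted in the Key Indicators list, treated as exact for illustration.
MAX_COMPRESSION_RATIO = 30   # best-case compression of 1:30
RAW_TB_PER_NODE = 100        # raw data supported per node
LOAD_TB_PER_HOUR = 30        # cluster-wide loading speed
MAX_NODES = 4096             # maximum cluster scale


def size_cluster(raw_tb: float) -> dict:
    """Hypothetical helper: estimate nodes, disk footprint, and load time."""
    nodes = math.ceil(raw_tb / RAW_TB_PER_NODE)
    return {
        "nodes": nodes,
        "fits_in_one_cluster": nodes <= MAX_NODES,
        "best_case_disk_tb": raw_tb / MAX_COMPRESSION_RATIO,
        "load_hours": raw_tb / LOAD_TB_PER_HOUR,
    }


estimate = size_cluster(3000)  # 3 PB of raw data, assuming 1 PB = 1000 TB
```

Under these assumptions, 3 PB of raw data needs at least 30 nodes and takes about 100 hours to load at the quoted rate. Actual compression depends heavily on the data, so the disk figure is a lower bound.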

Product Architecture · Technical Features

Supports mainstream Linux distributions: CentOS, Red Hat, SUSE, etc.
Supports standard PC servers based on x86-64 and ARM architectures
Supports local storage (SATA, SAS, SSD, etc.)
Supports array deployment (SAN, NAS)
Supports SSD and Flash storage media as Level-2 I/O cache
Supports standard SQL
Provides universal APIs: JDBC, ODBC, C API, ADO.NET
Cluster supports distributed transactions, featuring transaction high availability for primary-replica sharding and ensuring transaction atomicity

Product FAQ

Q: What can GBase 8a do?

A: GBase 8a enables storage, management, and efficient analysis of all types of big data (structured, semi-structured, and unstructured data), providing a complete database solution for industry-specific big data applications.

Q: What is the performance level of GBase 8a?

A: GBase 8a achieves sub-second response for data queries at the 100 TB to PB scale; helps customers save 50%-90% of storage space; reduces customer investment and operation & maintenance (O&M) costs by 50%-90%; enables unified processing of structured, semi-structured, and unstructured data; delivers sub-second response for full-text search across 100 billion-level text entries; and provides end-to-end visualized tools for data query, analysis, and presentation.

Q: What successful cases does GBase 8a have?

A: GBase 8a has achieved large-scale market adoption in sectors such as telecommunications, finance, and government services, with key customers including China Mobile, China Unicom, China Telecom, China Banking and Insurance Regulatory Commission (CBIRC), Ministry of Public Security, Ministry of State Security, Ministry of Industry and Information Technology (MIIT), State Taxation Administration, State Oceanic Administration, and PetroChina.

Q: How did GBase 8a perform in project tests?

A: GBase 8a has participated in over 150 on-site user tests. It ranked among the top 3 in China Mobile Group's next-generation data warehouse selection test and achieved excellent results in project tests conducted by organizations such as the Ministry of Public Security, China Merchants Bank, Xinjiang Mobile, Jilin Mobile, ZTE Corporation, and UFIDA Software, winning unanimous praise from users.

Q: To what extent does GBase 8a support transactions?

A: GBase 8a allows configuring tables to support row storage, column storage, and transaction logs via table attributes. Transactional tables in GBase 8a support row-level locking and concurrent DML operations, significantly improving DML performance and insert loading performance.

Commercial Value

GBase 8a MPP Cluster: A Pioneering Distributed Relational Database Cluster for Converged Data Processing

As China's first domestically developed distributed relational database cluster product supporting converged data processing, GBase 8a MPP Cluster competes head-on with leading international big data vendors such as EMC, HP, and IBM in sectors including finance and telecommunications, matching their technical capabilities. It has developed proprietary technologies such as cluster active-active, large-scale cluster management, and virtual clusters, achieving international leadership in certain features. The product has been widely adopted by hundreds of users across dozens of industries, including the People's Bank of China (PBC), China Banking and Insurance Regulatory Commission (CBIRC), China Securities Regulatory Commission (CSRC), Agricultural Bank of China (ABC), Bank of China (BOC), Industrial and Commercial Bank of China (ICBC), China Merchants Bank (CMB), China Mobile, China Unicom, China Telecom, General Administration of Customs, and a national defense department. To date, it has been deployed across over 45,000 nodes, managing more than 400PB of data.

Level-1 Values

Speed Enhancement: Query and analysis performance improved by 10–100x
Storage Optimization: 50%–90% reduction in storage space requirements
Cost Reduction: 50%–90% savings in hardware and software investment, plus 30%–50% lower power consumption
Cloud Enablement: Cloud computing architecture support with horizontal scalability

Level-2 Values

Full-Text Search: Integrated full-text retrieval for managing semi-structured data (cloud files)
Unstructured-to-Structured Conversion: Structured extraction and transformation of unstructured data
All-Data Processing: Unified processing of structured, semi-structured, and unstructured data
Visualization: Support for the GBase BI visual data analytics platform

Core Advantages

GBase 8a MPP Cluster boasts core advantages including federated architecture, massive data distribution, efficient compression, optimized storage structure, intelligent indexing, flexible data distribution, online high-performance scalability, high concurrency, high availability, robust security, ease of maintenance, and efficient data loading. Details are as follows:

Federated Architecture Cluster Deployment: Adopting a columnar-storage-based fully parallel MPP + Shared Nothing federated architecture, it features a two-tier deployment with multi-active Coordinator (Master) nodes and data nodes. This eliminates single-point performance bottlenecks and failures, provides a unified access address externally, and supports connection load balancing across nodes. The cluster allows up to 64 Coordinator nodes and over 4096 data nodes; each data node can handle more than 50TB of raw data with shared-nothing, peer-to-peer computing capabilities.
Massive Data Distributed Compressed Storage: Supporting massive data storage and querying for over 15PB of structured data, the cluster employs HASH or RANDOM distribution strategies for distributed data storage. Advanced compression algorithms reduce storage space requirements and improve I/O performance, with support for instance-level, table-level, and column-level compression. Leveraging columnar storage-based data encoding and high-efficiency compression technologies, it achieves a compression ratio of over 1:20 under ideal conditions.
Optimized Storage Structure: Utilizes a columnar-storage-based structure optimized for analytical workloads, complemented by maintenance-free intelligent indexing. It also supports hybrid row-column storage, effectively enhancing query performance for SELECT * scenarios in columnar databases.
Intelligent Indexing: Employs high-performance, maintenance-free coarse-grained intelligent indexing with an index expansion rate of less than 1%. Integrating column-level statistics, the index directly supports data retrieval and filtering, significantly reducing disk I/O and improving query performance for massive datasets.
Flexible Data Distribution: Users can customize data distribution strategies (HASH or RANDOM) based on business requirements, achieving optimal balance among performance, reliability, and flexibility.
Online High-Performance Scalability: Supports online scaling (expansion/shrinking) of cluster nodes with minimal business impact, delivering a scaling performance of over 20TB/hour.
High Concurrency: Enables concurrent read and write operations with support for query-while-loading. A 3-node cluster can handle over 1000 concurrent connections.
High Data Availability: Ensures high availability through redundancy mechanisms, with automatic data synchronization between primary and standby shards. Supporting 1 or 2 custom-configurable data replicas, it features automatic fault detection, metadata/business data synchronization, and fault recovery without manual intervention; replica failures do not affect cluster availability.
Comprehensive Resource Management: Achieves resource isolation among database users through flexible resource pool and resource plan configurations. It supports governance of key resources and metrics including CPU, memory, disk space, disk I/O, and concurrent tasks, providing full multi-tenancy capabilities.
Active-Standby Cluster High Availability: Supports active-standby cluster deployment with full/incremental data synchronization, synchronization rollback, error recovery, and same-city disaster recovery.
Security: Provides comprehensive user, role, and permission management; supports detailed audit logging with configurable policies and graphical audit tools. Features transparent data encryption (storage encryption, password encryption, encrypted compression), encryption functions (AES_ENCRYPT(), ENCRYPT(), MD5(), SHA1(), SHA(), etc.), in-database data masking, and Kerberos authentication for cluster and external data source access.
Ease of Maintenance: Offers graphical management and monitoring tools to simplify database administration.
Efficient Data Loading: Delivers parallel data loading capabilities with linear performance scaling as nodes increase. Employing policy-based loading modes, the cluster achieves an overall loading speed of over 30TB/hour.
Adaptive Load Management: Supports arbitrary concurrent jobs through adaptive load balancing. The database automatically adjusts the number of concurrent jobs based on system load, enabling parameter-free tuning.
Backup/Restore: Supports data backup/restore between the cluster and Hadoop, with a performance of over 100TB/hour.
Standardization: Complies with SQL 92, SQL 99, and SQL 2003 ANSI/ISO standards. Supports interfaces including ODBC, JDBC, ADO.NET, OLEDB, as well as C API, Python API, and TCL API. It also supports SQL 2003 OLAP functions.

Technical Features

Developed independently by GBase, GBase 8a MPP Cluster is a mature analytical MPP database tailored for the big data era. Its technical features are as follows:

Platform Compatibility:
- Compatible with mainstream middleware such as Kingdee and TongTech Software.
- Compatible with major servers including Sugon, Inspur, H3C, Great Wall, and Lenovo.
- Adapts to mainstream processors such as Hygon, Kunpeng, Phytium, Sunway, Loongson, and Zhaoxin.
- Runs on major domestic operating systems including Kylin OS (NeoKylin & Galaxy Kylin), China Standard Software, and UOS.
Encoding Formats:
- Supports multiple encoding formats (UTF-8, UTF8-MB4, GBK, GB18030, Unicode) and multi-language compatibility.
Efficient Massive Data Storage:
- A single cluster can process over 15PB of structured data, using HASH or RANDOM distribution strategies for distributed storage.
- Each data node supports more than 50TB of raw data with shared-nothing, peer-to-peer computing capabilities.
- A single table can handle up to 247 trillion rows of data.
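
The difference between the HASH and RANDOM distribution strategies mentioned above can be illustrated with a small sketch. It is a conceptual model only: CRC32 stands in for whatever hash function the product uses internally, and the function names `hash_node` and `random_node` are hypothetical.

```python
import random
import zlib
from collections import Counter


def hash_node(key: str, node_count: int) -> int:
    """Map a distribution-key value to a node; equal keys always co-locate."""
    # CRC32 is a deterministic stand-in, not GBase 8a's actual hash function.
    return zlib.crc32(key.encode("utf-8")) % node_count


def random_node(node_count: int, rng: random.Random) -> int:
    """Pick a node without looking at the row's content."""
    return rng.randrange(node_count)


NODES = 4
rows = [f"customer-{i % 50}" for i in range(1000)]  # 50 distinct keys, 20 rows each

hash_placement = Counter(hash_node(key, NODES) for key in rows)
rng = random.Random(0)
random_placement = Counter(random_node(NODES, rng) for _ in rows)

# Collect the set of nodes that each key value lands on under HASH distribution.
nodes_per_key = {}
for key in rows:
    nodes_per_key.setdefault(key, set()).add(hash_node(key, NODES))
```

Under HASH distribution every key value maps to exactly one node, so joins and aggregations on that key can run without moving rows between nodes. RANDOM distribution spreads rows evenly regardless of content, which avoids skew when no suitable key exists.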
Massively Parallel Processing (MPP):
- Implements automatic and efficient parallel processing for data loading and querying, fully leveraging SMP multi-core CPU resources to process massive datasets in parallel. Combining single-node parallelism with cross-node MPP cluster parallelism, it performs distributed parallel computing on operators, enabling ultra-large-scale distributed parallel processing for data query/analysis and parallel loading from multiple data sources. The product supports complex SQL execution to meet scheduled and ad-hoc analytical needs, and can run all SQL queries specified in the TPC-H and TPC-DS benchmarks even with cross-node data distribution.
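
The combination of per-node work and cross-node merging described above follows a general scatter-gather pattern, sketched below for a grouped sum. This is a simplified illustration, not GBase 8a's execution engine: threads stand in for data nodes, and the data and the names `local_aggregate` and `merge_partials` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Each inner list stands for the rows held by one data node (made-up data).
node_shards = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 1), ("north", 4)],
]


def local_aggregate(shard):
    """Runs on each data node: partial SUM(amount) grouped by region."""
    partial = {}
    for region, amount in shard:
        partial[region] = partial.get(region, 0) + amount
    return partial


def merge_partials(partials):
    """Runs on the coordinator: combine the per-node partial sums."""
    total = {}
    for partial in partials:
        for region, amount in partial.items():
            total[region] = total.get(region, 0) + amount
    return total


# Scatter: every shard is aggregated in parallel. Gather: results are merged.
with ThreadPoolExecutor(max_workers=len(node_shards)) as pool:
    partials = list(pool.map(local_aggregate, node_shards))

result = merge_partials(partials)
```

Because only small partial results travel to the coordinator, the amount of data exchanged stays proportional to the number of groups rather than the number of rows.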
High Data Availability:
- Ensures cluster high availability through redundancy mechanisms, with automatic data synchronization between primary and standby shards. It demonstrates parallel processing capabilities both within and across nodes in the cluster environment.
Hash Indexing:
- Utilizes Hash indexing to improve the positioning efficiency of equi-join queries, achieving sub-second response times for precise single-table queries within the cluster.
Intelligent Indexing:
- Employs high-performance, maintenance-free coarse-grained intelligent indexing with an index expansion rate of less than 1%. Integrating column-level statistics, the index directly supports data retrieval and filtering, significantly reducing disk I/O and improving query performance for massive datasets. For 100-million-level data volumes, it achieves sub-second response times for precise time-column-based queries on single tables within a cluster node.
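
One common way to build a coarse-grained index from column-level statistics is to keep only the minimum and maximum value of each data block and skip blocks that cannot match a filter. The sketch below illustrates that general idea. The source does not describe the internal layout of GBase 8a's intelligent index, so the block size and the names `BlockStats`, `build_stats`, and `blocks_to_scan` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Per-block column statistics kept instead of a per-row index."""
    block_id: int
    min_value: int
    max_value: int


def build_stats(column, block_size):
    """Record the min and max of each fixed-size block of a column."""
    stats = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        stats.append(BlockStats(start // block_size, min(block), max(block)))
    return stats


def blocks_to_scan(stats, low, high):
    """Return ids of blocks whose [min, max] range overlaps [low, high]."""
    return [s.block_id for s in stats if s.max_value >= low and s.min_value <= high]


# A time-like column that grows with insertion order, as event data often does.
timestamps = list(range(1000))
stats = build_stats(timestamps, block_size=100)
candidates = blocks_to_scan(stats, low=250, high=320)
```

Here only 2 of the 10 blocks need to be read for the range filter, and the index itself holds two values per block, which is why this style of index stays small and needs no manual maintenance.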
Backup and Recovery Management:
- Provides dedicated backup and recovery tools, supporting physical backup/recovery (full, incremental, differential) and logical backup/recovery (instance-level, user-level, table-level). Users can flexibly select backup and recovery strategies based on different application scenarios.
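
How full, incremental, and differential backups combine at restore time can be sketched as follows. The sketch uses the common definitions, which are an assumption here since the source does not define the terms: a differential backup holds all changes since the last full backup, and an incremental backup holds changes since the most recent backup of any type. The function `restore_chain` is hypothetical and unrelated to the GBase backup tools.

```python
def restore_chain(backups):
    """Return indexes of the backups needed, in order, to restore the latest state.

    `backups` is a chronological list of "full", "diff", or "incr" labels
    and must contain at least one "full" entry.
    """
    last_full = max(i for i, kind in enumerate(backups) if kind == "full")
    chain = [last_full]

    # The newest differential after the full supersedes everything before it.
    diffs = [i for i in range(last_full + 1, len(backups)) if backups[i] == "diff"]
    start = last_full
    if diffs:
        start = diffs[-1]
        chain.append(start)

    # Every incremental taken after that point must be replayed in order.
    chain.extend(i for i in range(start + 1, len(backups)) if backups[i] == "incr")
    return chain


history = ["full", "incr", "diff", "incr"]
needed = restore_chain(history)
```

For this history the restore needs the full backup, the differential, and the final incremental; the first incremental is already covered by the differential. Differentials shorten the restore chain at the cost of larger backup files.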
Data Encryption: Offers transparent data encryption with table-level or column-level granularity, supporting:
- Data storage encryption, database password encryption, and encrypted data compression.
- Encryption functions such as AES_ENCRYPT(), ENCRYPT(), MD5(), SHA1(), and SHA().
- Encryption of backup files via backup software.
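
Among the functions listed above, MD5() and SHA1() compute one-way digests rather than reversible ciphertext, unlike AES_ENCRYPT(). The sketch below reproduces such digests with Python's standard `hashlib`, assuming the SQL functions return lowercase hexadecimal strings as they do in MySQL-style dialects; that output format is an assumption, not something the source states.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Hex MD5 digest of the UTF-8 encoding of `text`."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha1_hex(text: str) -> str:
    """Hex SHA-1 digest of the UTF-8 encoding of `text`."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


digest_md5 = md5_hex("abc")    # 32 hex characters
digest_sha1 = sha1_hex("abc")  # 40 hex characters
```

A digest computed this way can be compared against a value stored in the database to check that the two sides agree on input encoding, but it cannot be decrypted back to the original text.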
Core Process-Level High Availability:
- Core processes (GNode, GCluster, GCware) are monitored in real-time with immediate fault recovery. It provides comprehensive physical database recovery capabilities, including system failure recovery, complete media failure recovery, network failure recovery, and tablespace/filegroup-based media failure recovery. Supporting full recovery mode and point-in-time recovery mode, it can restore data to the crash point or a specified point in time.

Applicable Scenarios

GBase 8a is a high-performance database product designed for big data analytics applications. It addresses the growing demands for data query, statistics, analysis, mining, and backup in data-intensive industries, and can serve as the underlying database for data warehouse systems, BI systems, and decision support systems.