Launch of StarRocks 2.0: A New-Gen Enterprise-Level MPP Database Unlocking 5X to 10X Analytics Performance Improvements over Competitors

BEIJING, Jan. 28, 2022 /PRNewswire/ — StarRocks, a new-generation massively parallel processing (MPP) database designed for all analytical scenarios, has launched version 2.0. The new version delivers a wide range of performance improvements in both single-table and multi-table query scenarios: single-table query performance is twice that of its competitors, and multi-table query performance is five to ten times that of other database systems. StarRocks 2.0 also introduces a new data model, the primary key model, which improves real-time update performance by three to ten times. In addition, the memory management scheme has been redesigned in 2.0 to meet customers' requirements for higher availability and stability.

Last September, StarRocks opened its source code to the global community, and the community has since become a key driving force behind the improvement of StarRocks. StarRocks received more than 2,000 GitHub stars within the first 135 days after its code was opened, and hundreds of large and medium-sized enterprises have been drawn to use it.

2X Single-Table Query Performance Compared to Competitors

StarRocks 2.0 is ideal for both single-table and multi-table queries. For single-table queries, StarRocks 2.0 uses global dictionaries to optimize queries on low-cardinality fields, delivering single-table query performance twice that of its earlier versions and of other leading database products. For multi-table queries, StarRocks 2.0 has redesigned the cost-based optimizer (CBO) to handle complex multi-table queries, doubling multi-table query performance and making StarRocks 2.0 five to ten times faster than other database systems.
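The idea behind a global dictionary can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not StarRocks source code: the functions `build_global_dict`, `partial_count`, and `global_count` are hypothetical, and the data is made up. One dictionary shared by every data segment maps each distinct value of a low-cardinality string column to a small integer, so aggregation runs on integers, and partial results from different segments can be merged code by code before being decoded back to strings.

```python
from collections import Counter

# Hypothetical illustration of global dictionary encoding; not StarRocks code.

def build_global_dict(segments):
    """Build ONE dictionary over all segments so every segment shares the same codes."""
    distinct = sorted({value for segment in segments for value in segment})
    return {value: code for code, value in enumerate(distinct)}

def partial_count(segment, gdict):
    """Per-segment GROUP BY ... COUNT(*) computed on integer codes, not strings."""
    return Counter(gdict[value] for value in segment)

def global_count(segments):
    gdict = build_global_dict(segments)
    total = Counter()
    for segment in segments:
        # Codes mean the same thing in every segment, so partial counts merge directly.
        total.update(partial_count(segment, gdict))
    decode = {code: value for value, code in gdict.items()}
    # Decode back to strings only once, at the very end.
    return {decode[code]: n for code, n in sorted(total.items())}

segments = [["CN", "US", "CN"], ["US", "DE", "CN"]]
print(global_count(segments))  # {'CN': 3, 'DE': 1, 'US': 2}
```

Because the dictionary is global rather than per segment, no re-mapping is needed when partial aggregates are combined; a per-segment dictionary would assign conflicting codes to the same string.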

In terms of data updates, traditional OLAP systems typically update data in merge-on-read mode, which is not the best solution because it trades query performance for data loading efficiency. As real-time update requirements keep rising in sectors such as finance and logistics, this approach no longer lives up to expectations. StarRocks 2.0 introduces a novel data model, the primary key model, which updates data in delete-and-insert mode. This improves query performance by three to ten times in real-time update scenarios.
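The trade-off between the two update modes can be illustrated with a toy in-memory model. This is a minimal Python sketch, assuming simple append-only row storage; the classes `MergeOnReadTable` and `DeleteAndInsertTable` are hypothetical and do not describe how StarRocks actually stores or indexes data. Merge-on-read makes each write a cheap append but forces every read to merge all versions of a key; delete-and-insert does a primary key lookup at write time to mark the old row deleted, so reads only need to skip deleted rows.

```python
# Hypothetical toy model contrasting two update strategies; not StarRocks internals.

class MergeOnReadTable:
    """Every upsert appends a new version; reads must merge versions per key."""

    def __init__(self):
        self.rows = []      # (key, version, value)
        self._version = 0

    def upsert(self, key, value):
        self._version += 1
        self.rows.append((key, self._version, value))  # cheap write: just append

    def scan(self):
        latest = {}
        for key, version, value in self.rows:  # read-time merge over ALL versions
            if key not in latest or version > latest[key][0]:
                latest[key] = (version, value)
        return {key: value for key, (_, value) in sorted(latest.items())}


class DeleteAndInsertTable:
    """A primary key index finds the old row at write time and marks it deleted."""

    def __init__(self):
        self.rows = []        # (key, value)
        self.deleted = set()  # positions of rows superseded by a later write
        self.pk_index = {}    # key -> position of the live row

    def upsert(self, key, value):
        old = self.pk_index.get(key)
        if old is not None:
            self.deleted.add(old)        # "delete": mark the previous row
        self.rows.append((key, value))   # "insert": append the new row
        self.pk_index[key] = len(self.rows) - 1

    def scan(self):
        # No merge step: simply skip rows marked as deleted.
        live = ((k, v) for i, (k, v) in enumerate(self.rows) if i not in self.deleted)
        return dict(sorted(live))


for table in (MergeOnReadTable(), DeleteAndInsertTable()):
    for key, value in [(1, "a"), (2, "b"), (1, "c")]:
        table.upsert(key, value)
    print(type(table).__name__, table.scan())  # both print {1: 'c', 2: 'b'}
```

Both tables return the same result; the difference is where the work happens. In this toy model, the merge-on-read scan cost grows with the number of stale versions, while the delete-and-insert scan only filters by the deleted set, which is why moving the work to write time can favor query-heavy, frequently updated workloads.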

In addition, the memory management scheme is redesigned in StarRocks 2.0 to improve system stability. A pipeline execution engine built for higher concurrency and faster complex queries on multi-core machines has been released for trial use. This engine will be officially released in StarRocks 2.1.

Five Technical Highlights and R&D Directions in 2022

StarRocks announced its five major R&D directions for 2022 to the community.

Resource Management

StarRocks will introduce a new resource management mechanism that provides separate resource groups for different businesses, guaranteeing each business sufficient resource quotas and isolated resources. This way, different services can run on the same cluster, which simplifies O&M and improves cluster resource utilization.
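The isolation idea behind resource groups can be sketched in a few lines. This is a hypothetical Python model, not the StarRocks mechanism or API: `ResourceGroup`, `Cluster`, `try_admit`, and the `max_concurrent_queries` quota are illustrative names, and a real implementation may limit CPU, memory, or other resources instead of (or in addition to) query concurrency. The point it shows is that exhausting one group's quota does not block queries from another group sharing the same cluster.

```python
from dataclasses import dataclass

# Hypothetical model of per-business resource groups; not the StarRocks API.

@dataclass
class ResourceGroup:
    name: str
    max_concurrent_queries: int  # hypothetical quota for this business
    running: int = 0

class Cluster:
    def __init__(self, groups):
        self.groups = {group.name: group for group in groups}

    def try_admit(self, group_name):
        """Admit a query only if its own group still has quota left."""
        group = self.groups[group_name]
        if group.running >= group.max_concurrent_queries:
            return False  # this group is saturated; other groups are unaffected
        group.running += 1
        return True

    def finish(self, group_name):
        """Release the quota held by a finished query."""
        self.groups[group_name].running -= 1

cluster = Cluster([ResourceGroup("reporting", 2), ResourceGroup("etl", 1)])
print(cluster.try_admit("etl"))        # True
print(cluster.try_admit("etl"))        # False: the etl quota is used up
print(cluster.try_admit("reporting"))  # True: reporting still has its own quota
```

Keeping the quota check per group, rather than against one shared pool, is what lets a burst from one business coexist with others on a single cluster.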

Materialized Views with JOINs

In most companies, data modeling requires complex development work from data engineers. Materialized views with JOINs let data engineers build data models by creating various types of materialized views, which significantly reduces their workload and simplifies data modeling.
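What a materialized view with a JOIN buys can be shown with a toy example. This is a minimal Python sketch under stated assumptions, not StarRocks syntax or behavior: the tables `orders` and `customers`, and the functions `build_join_view`, `append_orders`, and `revenue_by_city`, are all hypothetical. The join is computed once and stored, and later queries read the precomputed, denormalized rows instead of repeating the join.

```python
# Hypothetical illustration of a join materialized view; not StarRocks code.

orders = [(1, 101, 30.0), (2, 102, 12.5), (3, 101, 7.5)]  # (order_id, customer_id, amount)
customers = {101: "Beijing", 102: "Shanghai"}              # customer_id -> city

def build_join_view(orders, customers):
    """Run the orders-customers join once and keep the result (the 'materialized' part)."""
    return [(order_id, customers[cust_id], amount) for order_id, cust_id, amount in orders]

def append_orders(view, new_orders, customers):
    """Incrementally join only newly arrived orders instead of rebuilding the view."""
    view.extend(build_join_view(new_orders, customers))

def revenue_by_city(view):
    """Answer the query from the precomputed rows; no join happens here."""
    totals = {}
    for _, city, amount in view:
        totals[city] = totals.get(city, 0.0) + amount
    return totals

view = build_join_view(orders, customers)
print(revenue_by_city(view))  # {'Beijing': 37.5, 'Shanghai': 12.5}
```

In this sketch the view is refreshed by appending newly joined rows; how a real system keeps such views consistent with updates and deletes in the base tables is a separate design question.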

StarRocks will also introduce intelligent materialized views, a feature that recommends materialized views to users based on their query behavior to accelerate queries.

Separation of Storage and Compute

In earlier versions of StarRocks, compute and storage are tightly coupled to deliver excellent query performance. However, this architecture cannot allocate resources on demand and may incur unnecessary costs. In 2022, StarRocks will implement a new architecture that decouples storage and compute. This new architecture supports offline analytics in parallel with real-time analytics and can be deployed on public, private, and multiple clouds.

Lightning Fast Data Lake Analytics

Currently, StarRocks functions more like a data warehouse: customers import high-value data from data lakes into StarRocks for ultra-fast analytics. In 2022, StarRocks will continue to enhance its data lake analytics capabilities to provide customers with a unified, blazing fast analytics experience.

The StarRocks community has completed the first phase of development for querying data in Iceberg, in collaboration with renowned communities and developers at the world's leading cloud computing companies. Test results show that StarRocks is five times faster than Trino. Going forward, the StarRocks community will extend support to Hudi and offer more feature enhancements.

Unified Batch and Stream Processing

StarRocks plans to implement unified stream and batch processing across hundreds of nodes, so that customers can both process and analyze raw data entirely within StarRocks. This provides a one-stop, unified, and blazing fast data processing and analytics experience, taking the vision of unification to a new level.

About StarRocks

StarRocks is a new-generation MPP database designed for all analytical scenarios. It features a simple architecture, a vectorized engine, a redesigned CBO, and query speeds (especially for multi-table join queries) beyond the reach of other database products. StarRocks supports real-time data analytics and queries data that is updated in real time efficiently. StarRocks provides materialized views to further accelerate queries. Customers can use StarRocks to flexibly build various schemas, such as flat tables and star and snowflake schemas. StarRocks is compatible with the MySQL protocol and works with various MySQL clients and tools. StarRocks does not rely on any external systems, and its simple architecture makes it highly available, scalable, and easy to operate and maintain.

StarRocks meets requirements in various data analytics scenarios, such as multi-dimensional filtering and analytics, real-time data analytics, and ad hoc queries. It supports thousands of concurrent users. Typical use cases include business intelligence, real-time data warehousing, user profiling, reports and dashboards, order analysis, O&M and monitoring, anti-fraud analysis, and risk management. Hundreds of large and medium-sized enterprises across various sectors have deployed StarRocks in their production environments, where thousands of StarRocks servers run stably.

SOURCE StarRocks