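Author: Chen Liang (陈亮), Apache CarbonData PMC Chair and committer. Source: Huawei Cloud Community
CarbonData is a high-performance big data storage solution that has been deployed in production at more than 100 enterprises; the largest single cluster holds several trillion records. Differing analytics requirements across big data scenarios lead to redundant storage, while business needs demand ever more flexible data analysis. To address this, CarbonData provides a unified storage solution: a single copy of the data serves multiple application scenarios, with second-level query response on trillion-scale data.
CarbonData 2.0 is currently at the rc2 stage. The community has spent more than half a year building this 2.0 milestone release. Its new features all evolved from real business requirements, with extensive optimization of performance and ecosystem integration in particular.
Preview of 2.0 new features (note: the content below is not the official release notes, only an early preview of the features):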
1. Support pre-priming the cache in the index cache server
Use Case: CarbonData loads the min/max index cache into memory on the first query executed against a given table, which degrades the performance of that first query. To improve first-query performance, the user can enable the pre-priming feature, which loads the min/max cache into memory on each data load.
Usage:
carbon.indexserver.enable.prepriming=true
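To illustrate why pre-priming helps, here is a minimal, self-contained sketch of a min/max index cache. It is a toy model, not the CarbonData implementation: `BlockIndex`, `IndexCache`, `prime`, `prune` and `storeReads` are hypothetical names introduced only for this example.

```scala
import scala.collection.mutable

// Hypothetical model: one min/max entry per data block of a table.
final case class BlockIndex(blockId: String, min: Int, max: Int)

// Toy stand-in for an index cache; not the CarbonData implementation.
final class IndexCache(loadFromStore: String => Seq[BlockIndex]) {
  private val cache = mutable.Map.empty[String, Seq[BlockIndex]]
  var storeReads = 0 // counts how often the slow path (reading the store) was taken

  // Populate the cache for a table, as pre-priming would do right after a load.
  def prime(table: String): Unit = {
    storeReads += 1
    cache(table) = loadFromStore(table)
  }

  // Return the ids of blocks whose [min, max] range can contain `value`.
  def prune(table: String, value: Int): Seq[String] = {
    if (!cache.contains(table)) prime(table) // cold cache: the first query pays the cost
    cache(table).filter(b => b.min <= value && value <= b.max).map(_.blockId)
  }
}

object PrePrimingSketch {
  def main(args: Array[String]): Unit = {
    val store = Map("sales" -> Seq(BlockIndex("b0", 0, 99), BlockIndex("b1", 100, 199)))
    val cache = new IndexCache(store)
    cache.prime("sales")               // done at load time when pre-priming is enabled
    val readsAfterLoad = cache.storeReads
    println(cache.prune("sales", 150)) // answered from memory
    println(cache.storeReads == readsAfterLoad)
  }
}
```

Without the `prime` call at load time, the first `prune` would have to read the index from the store, which is the first-query penalty described above.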
2. Carbon Extension for Spark 2.4, without Carbon Session
Use Case: Because Carbon was tightly integrated with the Spark compute engine, it required a CarbonSession to be created instead of a SparkSession. To make the integration layer modular, CarbonData now supports the SparkSessionExtensions API, which lets Carbon plug its parser and optimizer into an existing SparkSession.
Example:
import org.apache.spark.sql.SparkSession

// `conf` is a SparkConf assumed to be defined elsewhere.
val spark = SparkSession
  .builder()
  .config(conf)
  .master("spark://localhost:7077")
  .appName("Test")
  .enableHiveSupport()
  .config("spark.sql.warehouse.dir", "./warehouse")
  // Register the Carbon parser and optimizer on a plain SparkSession.
  .config("spark.sql.extensions", "org.apache.spark.sql.CarbonExtensions")
  .getOrCreate()

spark.sql("""CREATE TABLE IF NOT EXISTS test_table (
  id string,
  name string,
  city string,
  age Int)
  STORED AS carbondata""")
3. MV time-series support with rollup across multiple granularities
Use Case: Analytics data such as application performance monitoring, network data, sensor data, events, clicks, banking, and server metrics has to be aggregated and analyzed or monitored over a period of time for business needs. CarbonData supports pre-computation of aggregations and joins through materialized views (MVs), which gives faster query results; many users also need time-series support on top of this.
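The idea behind time-series pre-aggregation with rollup can be sketched in plain Scala. This is a conceptual illustration only and does not use any CarbonData API: `Event`, `bucket`, `aggregate` and `rollup` are hypothetical names, timestamps are assumed to be non-negative epoch seconds, and the aggregate is assumed to be SUM.

```scala
// Hypothetical event: epoch seconds (assumed non-negative) plus a metric value.
final case class Event(epochSec: Long, value: Long)

object RollupSketch {
  // Truncate a timestamp to the start of its bucket (60 = minute, 3600 = hour).
  def bucket(epochSec: Long, granularitySec: Long): Long =
    epochSec - (epochSec % granularitySec)

  // Pre-aggregate raw events: sum of values per time bucket.
  def aggregate(events: Seq[Event], granularitySec: Long): Map[Long, Long] =
    events.groupBy(e => bucket(e.epochSec, granularitySec))
      .map { case (ts, es) => ts -> es.map(_.value).sum }

  // Roll up a finer aggregate into a coarser one without rescanning raw events.
  // This is valid for SUM because partial sums can simply be added.
  def rollup(fine: Map[Long, Long], coarseSec: Long): Map[Long, Long] =
    fine.groupBy { case (ts, _) => bucket(ts, coarseSec) }
      .map { case (ts, parts) => ts -> parts.values.sum }

  def main(args: Array[String]): Unit = {
    val events = Seq(Event(0, 1), Event(30, 2), Event(61, 3), Event(3600, 4))
    val perMinute = aggregate(events, 60)
    val perHour = rollup(perMinute, 3600)
    println(perMinute.toSeq.sorted)
    println(perHour.toSeq.sorted)
    // Rolling up yields the same result as aggregating the raw events directly.
    println(perHour == aggregate(events, 3600))
  }
}
```

The point of the sketch is that a coarser granularity (hour) can be derived from an existing finer aggregate (minute) instead of from the raw data, which is what makes rollup across granularities cheap.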