Apache CarbonData 2.0 Preview (An Early Look at Key Features)
Author: 华为云折扣网 | Published: 2020-12-20
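CarbonData is a high-performance big data storage solution. It has been deployed in the production environments of more than 100 enterprises, with the largest single cluster reaching a data scale in the trillions.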
Usage/Example:
To use pycarbon, refer to: https://github.com/apache/carbondata/blob/master/python/README.md
15) Materialized views on all tables, such as Parquet and ORC
Use Case: CarbonData’s datamap interface can be used to improve the query performance of other formats such as Parquet and ORC. One implementation of the datamap interface is the MV table, which precomputes aggregation results based on a query the user supplies. By creating an MV datamap on a Parquet or ORC table, the user gets the benefit of querying precomputed data instead of raw data, which results in better query performance.
This is possible because CarbonData redirects the query to the MV datamap instead of the Parquet table.
Example:
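// Create a Parquet-backed source table.
spark.sql("""
  create table source(empname String, designation String, deptno int, deptname String, salary int)
  using parquet
""")
// Create a materialized view that precomputes the average salary per employee and department.
spark.sql("""
  create materialized view mv_parquet as
  select empname, deptname, avg(salary) from source group by empname, deptname
""")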
For more usage examples, see: https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/view/MVTest.scala
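The precompute-and-redirect idea described in the use case can be illustrated with a small plain-Python sketch. It is only a conceptual model, not CarbonData's implementation: the sample rows, `avg_salary_from_source`, `MV_PARQUET`, and `query_avg_salary` are all hypothetical names made up for illustration. The aggregate is computed once when the "view" is created, and later queries read the stored result instead of rescanning the raw rows.

```python
from collections import defaultdict

# Hypothetical rows mirroring the `source` table columns:
# (empname, designation, deptno, deptname, salary)
SOURCE = [
    ("alice", "engineer", 10, "dev", 100),
    ("alice", "engineer", 10, "dev", 300),
    ("bob", "analyst", 20, "ops", 200),
]


def avg_salary_from_source(rows):
    """Scan every raw row, like:
    select empname, deptname, avg(salary) from source group by empname, deptname
    """
    sums = defaultdict(lambda: [0, 0])  # key -> [salary total, row count]
    for empname, _designation, _deptno, deptname, salary in rows:
        acc = sums[(empname, deptname)]
        acc[0] += salary
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# "Create" the materialized view once by storing the aggregate result.
MV_PARQUET = avg_salary_from_source(SOURCE)


def query_avg_salary(empname, deptname):
    """Answer the query from the precomputed view instead of rescanning SOURCE."""
    return MV_PARQUET[(empname, deptname)]
```

In this model, the lookup in `query_avg_salary` plays the role of the redirected query: its cost depends on the number of groups in the view rather than the number of raw rows.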