Example:
spark.sql("""CREATE TABLE sales ( order_time timestamp, user_id string, xingbie string, country string, quantity int, price bigint) STORED AS carbondata""") spark.sql(""" CREATE MATERIALIZED VIEW agg_sales SELECT timeseries(order_time, 'minute'),avg(price) FROM sales GROUP BY timeseries(order_time, 'minute') """) spark.sql(""" SELECT timeseries(order_time,'minute'), avg(price) FROM sales GROUP BY timeseries(order_time,'minute') """)
For more usage examples, see: https://github.com/apache/carbondata/blob/master/mv/core/src/test/scala/org/apache/carbondata/mv/timeseries/TestMVTimeSeriesLoadAndQuery.scala
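The `timeseries()` UDF buckets each timestamp down to the start of its granularity window, so all rows in the same minute aggregate together. A minimal sketch of that bucketing in plain Scala (`minuteBucket` is a hypothetical helper for illustration, not a CarbonData API):

```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

// Hypothetical helper mirroring what timeseries(order_time, 'minute') does
// to each row: truncate the timestamp to the start of its minute window.
def minuteBucket(ts: LocalDateTime): LocalDateTime =
  ts.truncatedTo(ChronoUnit.MINUTES)

val t1 = LocalDateTime.of(2020, 5, 20, 10, 15, 42)
val t2 = LocalDateTime.of(2020, 5, 20, 10, 15, 3)

// Both rows land in the same 10:15 bucket, so avg(price) combines them.
println(minuteBucket(t1) == minuteBucket(t2)) // prints "true"
```

Because the materialized view is pre-aggregated per bucket, a query grouping by the same granularity can be rewritten to read `agg_sales` instead of scanning `sales`.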
4. Supports the spatial index DataMap
UseCase: Queries that filter on a spatial object, such as a region on a 2D map, would otherwise be treated as full-scan queries, causing significant performance degradation. To remove this limitation, CarbonData implements 'spatial indexing', a technique commonly used by spatial databases that allows a spatial object to be accessed efficiently.
Example:
spark.sql("""create table source_index(id BIGINT, latitude long, longitude long) stored by 'carbondata' TBLPROPERTIES ( 'INDEX_HANDLER'='mygeohash', 'INDEX_HANDLER.mygeohash.type'='geohash', 'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude', 'INDEX_HANDLER.mygeohash.originLatitude'='19.832277', 'INDEX_HANDLER.mygeohash.gridSize'='50', 'INDEX_HANDLER.mygeohash.minLongitude'='1.811865', 'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233', 'INDEX_HANDLER.mygeohash.minLatitude'='19.832277', 'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281', 'INDEX_HANDLER.mygeohash.conversionRatio'='1000000') """) spark.sql(""" select * from source_index where IN_POLYGON('16.321011 4.123503,16.137676 5.947911,16.560993 5.935276,16.321011 4.123503') """)
For more usage examples, see: https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
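To see why a spatial index helps, here is a minimal sketch of the general geohash/Z-order idea (an illustration only, not CarbonData's actual implementation): the bits of the quantized longitude and latitude grid cells are interleaved into a single value, so points that are close on the 2D grid get numerically close 1D index values, and a polygon filter can be answered with a few range scans instead of a full table scan.

```scala
// Illustrative bit interleaving (Z-order / geohash style), not the
// CarbonData implementation. x and y are grid cell numbers, conceptually
// obtained by quantizing longitude and latitude with the grid size and
// min/max bounds that the INDEX_HANDLER table properties above configure.
def interleave(x: Long, y: Long, bits: Int): Long = {
  var code = 0L
  for (i <- (bits - 1) to 0 by -1) {
    code = (code << 1) | ((x >> i) & 1L) // next longitude bit
    code = (code << 1) | ((y >> i) & 1L) // next latitude bit
  }
  code
}

// Two adjacent grid cells get nearby index values...
println(interleave(10, 6, 4)) // prints "156"
println(interleave(11, 7, 4)) // prints "159"
// ...while a distant cell gets a distant value.
println(interleave(2, 12, 4)) // prints "88"
```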
5. Supports Secondary Index