
Apache CarbonData 2.0 Preview (An Early Look at Key Features)

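Author: 华为云折扣网 | Published: 2020-12-20 | 4250 views
CarbonData is a high-performance big data storage solution. It has been deployed in the production environments of more than 100 enterprises, and the largest single cluster holds data on the scale of several trillion.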

Example:

spark.sql("""
  CREATE TABLE sales (
    order_time timestamp,
    user_id string,
    xingbie string,
    country string,
    quantity int,
    price bigint)
  STORED AS carbondata
""")

spark.sql("""
  CREATE MATERIALIZED VIEW agg_sales AS
  SELECT timeseries(order_time, 'minute'), avg(price)
  FROM sales
  GROUP BY timeseries(order_time, 'minute')
""")

spark.sql("""
  SELECT timeseries(order_time, 'minute'), avg(price)
  FROM sales
  GROUP BY timeseries(order_time, 'minute')
""")

For more usage details, see: https://github.com/apache/carbondata/blob/master/mv/core/src/test/scala/org/apache/carbondata/mv/timeseries/TestMVTimeSeriesLoadAndQuery.scala
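To make the idea behind the minute-level rollup concrete, here is a minimal pure-Python sketch of what grouping by timeseries(order_time, 'minute') and averaging price amounts to. It only illustrates the concept; it is not CarbonData's implementation, and the helper names minute_bucket and avg_price_per_minute as well as the sample rows are made up for this example.

```python
from collections import defaultdict
from datetime import datetime


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute (hypothetical helper)."""
    return ts.replace(second=0, microsecond=0)


def avg_price_per_minute(rows):
    """Average price per minute bucket.

    rows: iterable of (order_time, price) pairs.
    Returns a dict mapping each minute bucket to the average price in it.
    """
    sums = defaultdict(lambda: [0, 0])  # minute -> [total price, row count]
    for order_time, price in rows:
        acc = sums[minute_bucket(order_time)]
        acc[0] += price
        acc[1] += 1
    return {minute: total / count for minute, (total, count) in sums.items()}


# Illustrative sample data, not taken from the article.
rows = [
    (datetime(2020, 1, 1, 10, 0, 5), 100),
    (datetime(2020, 1, 1, 10, 0, 40), 300),
    (datetime(2020, 1, 1, 10, 1, 10), 50),
]
result = avg_price_per_minute(rows)
```

A materialized view stores the pre-aggregated result of such a rollup, so a query with the same grouping can read the stored aggregates instead of rescanning the base table.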

4. Support Spatial Index DataMap

Use case: Queries that filter on a spatial object, such as a region on a 2D map, would otherwise be treated as full-scan queries, causing significant performance degradation. To address this limitation, CarbonData implements spatial indexing, which allows spatial objects to be accessed efficiently. It is a common technique in spatial databases.

Example:

spark.sql("""
  create table source_index(id BIGINT, latitude long, longitude long)
  stored by 'carbondata'
  TBLPROPERTIES (
    'INDEX_HANDLER'='mygeohash',
    'INDEX_HANDLER.mygeohash.type'='geohash',
    'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',
    'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',
    'INDEX_HANDLER.mygeohash.gridSize'='50',
    'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',
    'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',
    'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',
    'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',
    'INDEX_HANDLER.mygeohash.conversionRatio'='1000000')
""")

spark.sql("""
  select * from source_index
  where IN_POLYGON('16.321011 4.123503,16.137676 5.947911,16.560993 5.935276,16.321011 4.123503')
""")

For more usage details, see: https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
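The general idea of a grid-based spatial index and a polygon filter can be sketched in plain Python. This is an illustration under stated assumptions, not CarbonData's actual geohash algorithm: the functions interleave_bits, grid_cell_id and point_in_polygon are hypothetical helpers, the Z-order (Morton) encoding is a commonly used stand-in technique, and treating each polygon pair as (x, y) is an assumption made for the example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return code


def grid_cell_id(lon, lat, min_lon, min_lat, cell_size):
    """Map a point to the id of the grid cell that contains it."""
    col = int((lon - min_lon) / cell_size)
    row = int((lat - min_lat) / cell_size)
    return interleave_bits(col, row)


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon: list of (x, y) vertices; a repeated closing vertex is harmless.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Polygon vertices copied from the IN_POLYGON example above.
polygon = [
    (16.321011, 4.123503),
    (16.137676, 5.947911),
    (16.560993, 5.935276),
    (16.321011, 4.123503),
]
inside_hit = point_in_polygon(16.33, 5.5, polygon)
outside_hit = point_in_polygon(17.0, 5.0, polygon)
```

Because nearby cells tend to get nearby ids under a Z-order encoding, a polygon filter can be narrowed to a set of candidate cell ranges before any exact point-in-polygon test runs, which is what lets a spatial index avoid a full scan.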

5. Support Secondary Index