Example:
spark.sql("""CREATE TABLE sales ( order_time timestamp, user_id string, xingbie string, country string, quantity int, price bigint) STORED AS carbondata""") spark.sql(""" CREATE MATERIALIZED VIEW agg_sales SELECT timeseries(order_time, 'minute'),avg(price) FROM sales GROUP BY timeseries(order_time, 'minute') """) spark.sql(""" SELECT timeseries(order_time,'minute'), avg(price) FROM sales GROUP BY timeseries(order_time,'minute') """)
For more usage examples, see: https://github.com/apache/carbondata/blob/master/mv/core/src/test/scala/org/apache/carbondata/mv/timeseries/TestMVTimeSeriesLoadAndQuery.scala
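The `timeseries()` UDF buckets each timestamp down to the start of its granularity window, so all rows in the same minute aggregate together. A minimal sketch of that bucketing in plain Scala (`minuteBucket` is a hypothetical helper for illustration, not a CarbonData API):

```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

// Hypothetical helper mirroring what timeseries(order_time, 'minute') does
// to each row: truncate the timestamp to the start of its minute window.
def minuteBucket(ts: LocalDateTime): LocalDateTime =
  ts.truncatedTo(ChronoUnit.MINUTES)

val t1 = LocalDateTime.of(2020, 5, 20, 10, 15, 42)
val t2 = LocalDateTime.of(2020, 5, 20, 10, 15, 3)

// Both rows land in the same 10:15 bucket, so avg(price) combines them.
println(minuteBucket(t1) == minuteBucket(t2)) // prints "true"
```

Because the materialized view is pre-aggregated per bucket, a query grouping by the same granularity can be rewritten to read `agg_sales` instead of scanning `sales`.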
4. Supports the spatial index DataMap
UseCase: Queries that filter on a spatial object, such as a region on a 2D map, would otherwise be treated as full-scan queries, causing significant performance degradation. To remove this limitation, CarbonData implements 'spatial indexing', a technique commonly used by spatial databases that allows a spatial object to be accessed efficiently.
Example:
spark.sql("""create table source_index(id BIGINT, latitude long, longitude long) stored by 'carbondata' TBLPROPERTIES ( 'INDEX_HANDLER'='mygeohash', 'INDEX_HANDLER.mygeohash.type'='geohash', 'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude', 'INDEX_HANDLER.mygeohash.originLatitude'='19.832277', 'INDEX_HANDLER.mygeohash.gridSize'='50', 'INDEX_HANDLER.mygeohash.minLongitude'='1.811865', 'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233', 'INDEX_HANDLER.mygeohash.minLatitude'='19.832277', 'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281', 'INDEX_HANDLER.mygeohash.conversionRatio'='1000000') """) spark.sql(""" select * from source_index where IN_POLYGON('16.321011 4.123503,16.137676 5.947911,16.560993 5.935276,16.321011 4.123503') """)
For more usage examples, see: https://github.com/apache/carbondata/blob/master/integration/spark/src/test/scala/org/apache/carbondata/geo/GeoTest.scala
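To see why a spatial index helps, here is a minimal sketch of the general geohash/Z-order idea (an illustration only, not CarbonData's actual implementation): the bits of the quantized longitude and latitude grid cells are interleaved into a single value, so points that are close on the 2D grid get numerically close 1D index values, and a polygon filter can be answered with a few range scans instead of a full table scan.

```scala
// Illustrative bit interleaving (Z-order / geohash style), not the
// CarbonData implementation. x and y are grid cell numbers, conceptually
// obtained by quantizing longitude and latitude with the grid size and
// min/max bounds that the INDEX_HANDLER table properties above configure.
def interleave(x: Long, y: Long, bits: Int): Long = {
  var code = 0L
  for (i <- (bits - 1) to 0 by -1) {
    code = (code << 1) | ((x >> i) & 1L) // next longitude bit
    code = (code << 1) | ((y >> i) & 1L) // next latitude bit
  }
  code
}

// Two adjacent grid cells get nearby index values...
println(interleave(10, 6, 4)) // prints "156"
println(interleave(11, 7, 4)) // prints "159"
// ...while a distant cell gets a distant value.
println(interleave(2, 12, 4)) // prints "88"
```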
5. Supports Secondary Index