RoaringBitmap extension for PostgreSQL.
It's initial based on https://github.com/zeromax007/gpdb-roaringbitmap.
Roaring bitmaps are compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, roaring bitmaps can be hundreds of times faster and they often offer significantly better compression. They can even be faster than uncompressed bitmaps. More information https://github.com/RoaringBitmap/CRoaring .
- PostgreSQL 10+
- Other Requirements from https://github.com/RoaringBitmap/CRoaring
su - postgres
make
sudo make install
psql -c "create extension roaringbitmap"
Note:You can use make -f Makefile_native instead of make to let the compiler use the SIMD instructions if it's supported by your CPU. In some scenarios, it may double the performance. But if you copy the pg_roaringbitmap binary which builded on machine with SIMD support to other machine without SIMD and run, you could get a SIGILL crash.
make installcheck
Parallel execution is not supported in PostgreSQL 9.5 and earlier.
If you want to compile on these early PostgreSQL versions or Greenplum 6.0(based on PostgreSQL 9.4), you need to remove the PARALLEL keyword from these SQL files.
cd pg_roaringbitmap
sed 's/PARALLEL SAFE//g' -i roaringbitmap--*.sql
sed -z 's/,[ \r\n]*PARALLEL = SAFE//g' -i roaringbitmap--*.sql
Then refer to [Build] above for building, such as the steps to build on Greenplum 6.0:
su - gpadmin
make
make install
psql -c "create extension roaringbitmap"
sudo yum install 'perl(Data::Dumper)'
make installcheck
Since the expected output is based on PostgreSQL 10+, this test will not pass. Check the difference in the output file. If the execution results are the same, only the execution plan or other content that is not related to pg_roaringbitmap` is different, the test can be considered OK.
diff results/roaringbitmap.out expected/roaringbitmap_gpdb6.out
Logically, you could think of roaringbitmap data type as bit(4294967296), and it should be noted that
the integers added to bitmaps is considered to be unsigned. Within bitmaps, numbers are ordered according to uint32. We order the numbers like 0, 1, ..., 2147483647, -2147483648, -2147483647,..., -1. But we use bigint to
reference the range of these integers, that is [0 4294967296).
Support two kind of input/output syntax 'array' and 'bytea', The default output format is 'bytea'.
postgres=# select roaringbitmap('{1,100,10}');
                 roaringbitmap
------------------------------------------------
 \x3a30000001000000000002001000000001000a006400
(1 row)
or
postgres=# select '\x3a30000001000000000002001000000001000a006400'::roaringbitmap;
                 roaringbitmap
------------------------------------------------
 \x3a30000001000000000002001000000001000a006400
(1 row)
output format can changed by roaringbitmap.output_format
postgres=# set roaringbitmap.output_format='bytea';
SET
postgres=# select '{1}'::roaringbitmap;
             roaringbitmap
----------------------------------------
 \x3a3000000100000000000000100000000100
(1 row)
postgres=# set roaringbitmap.output_format='array';
SET
postgres=# select '{1}'::roaringbitmap;
 roaringbitmap
---------------
 {1}
(1 row)
CREATE TABLE t1 (id integer, bitmap roaringbitmap);
INSERT INTO t1 SELECT 1,rb_build(ARRAY[1,2,3,4,5,6,7,8,9,200]);
INSERT INTO t1 SELECT 2,rb_build_agg(e) FROM generate_series(1,100) e;
SELECT roaringbitmap('{1,2,3}') | roaringbitmap('{3,4,5}');
SELECT roaringbitmap('{1,2,3}') & roaringbitmap('{3,4,5}');
SELECT roaringbitmap('{1,2,3}') # roaringbitmap('{3,4,5}');
SELECT roaringbitmap('{1,2,3}') - roaringbitmap('{3,4,5}');
SELECT rb_or_agg(bitmap) FROM t1;
SELECT rb_and_agg(bitmap) FROM t1;
SELECT rb_xor_agg(bitmap) FROM t1;
SELECT rb_build_agg(e) FROM generate_series(1,100) e;
SELECT rb_cardinality('{1,2,3}');
SELECT rb_to_array(bitmap) FROM t1 WHERE id = 1;
SELECT unnest(rb_to_array('{1,2,3}'::roaringbitmap));
or
SELECT rb_iterate('{1,2,3}'::roaringbitmap);
| Opperator | Input | Output | Desc | Example | Result | 
|---|---|---|---|---|---|
| & | roaringbitmap,roaringbitmap | roaringbitmap | bitwise AND | roaringbitmap('{1,2,3}') & roaringbitmap('{3,4,5}') | {3} | 
| | | roaringbitmap,roaringbitmap | roaringbitmap | bitwise OR | roaringbitmap('{1,2,3}') | roaringbitmap('{3,4,5}') | {1,2,3,4,5} | 
| | | roaringbitmap,integer | roaringbitmap | add element to roaringbitmap | roaringbitmap('{1,2,3}') | 6 | {1,2,3,6} | 
| | | integer,roaringbitmap | roaringbitmap | add element to roaringbitmap | 6 | roaringbitmap('{1,2,3}') | {1,2,3,6} | 
| # | roaringbitmap,roaringbitmap | roaringbitmap | bitwise XOR | roaringbitmap('{1,2,3}') # roaringbitmap('{3,4,5}') | {1,2,4,5} | 
| << | roaringbitmap,bigint | roaringbitmap | bitwise shift left | roaringbitmap('{1,2,3}') << 2 | {0,1} | 
| >> | roaringbitmap,bigint | roaringbitmap | bitwise shift right | roaringbitmap('{1,2,3}') >> 3 | {4,5,6} | 
| - | roaringbitmap,roaringbitmap | roaringbitmap | difference(bitwise ANDNOT) | roaringbitmap('{1,2,3}') - roaringbitmap('{3,4,5}') | {1,2} | 
| - | roaringbitmap,integer | roaringbitmap | remove element from roaringbitmap | roaringbitmap('{1,2,3}') - 3 | {1,2} | 
| @> | roaringbitmap,roaringbitmap | bool | contains | roaringbitmap('{1,2,3}') @> roaringbitmap('{3,4,5}') | f | 
| @> | roaringbitmap,integer | bool | contains | roaringbitmap('{1,2,3,4,5}') @> 3 | t | 
| <@ | roaringbitmap,roaringbitmap | bool | is contained by | roaringbitmap('{1,2,3}') <@ roaringbitmap('{3,4,5}') | f | 
| <@ | integer,roaringbitmap | bool | is contained by | 3 <@ roaringbitmap('{3,4,5}') | t | 
| && | roaringbitmap,roaringbitmap | bool | overlap (have elements in common) | roaringbitmap('{1,2,3}') && roaringbitmap('{3,4,5}') | t | 
| = | roaringbitmap,roaringbitmap | bool | equal | roaringbitmap('{1,2,3}') = roaringbitmap('{3,4,5}') | f | 
| <> | roaringbitmap,roaringbitmap | bool | not equal | roaringbitmap('{1,2,3}') <> roaringbitmap('{3,4,5}') | t | 
| Function | Input | Output | Desc | Example | Result | 
|---|---|---|---|---|---|
| rb_build | integer[] | roaringbitmap | Create roaringbitmap from integer array | rb_build('{1,2,3,4,5}') | {1,2,3,4,5} | 
| rb_index | roaringbitmap,integer | bigint | Return the 0-based index of element in this roaringbitmap, or -1 if do not exsits | rb_index('{1,2,3}',3) | 2 | 
| rb_cardinality | roaringbitmap | bigint | Return cardinality of the roaringbitmap | rb_cardinality('{1,2,3,4,5}') | 5 | 
| rb_and_cardinality | roaringbitmap,roaringbitmap | bigint | Return cardinality of the AND of two roaringbitmaps | rb_and_cardinality('{1,2,3}','{3,4,5}') | 1 | 
| rb_or_cardinality | roaringbitmap,roaringbitmap | bigint | Return cardinality of the OR of two roaringbitmaps | rb_or_cardinality('{1,2,3}','{3,4,5}') | 5 | 
| rb_xor_cardinality | roaringbitmap,roaringbitmap | bigint | Return cardinality of the XOR of two roaringbitmaps | rb_xor_cardinality('{1,2,3}','{3,4,5}') | 4 | 
| rb_andnot_cardinality | roaringbitmap,roaringbitmap | bigint | Return cardinality of the ANDNOT of two roaringbitmaps | rb_andnot_cardinality('{1,2,3}','{3,4,5}') | 2 | 
| rb_is_empty | roaringbitmap | boolean | Check if roaringbitmap is empty. | rb_is_empty('{1,2,3,4,5}') | t | 
| rb_fill | roaringbitmap,range_start bigint,range_end bigint | roaringbitmap | Fill the specified range (not include the range_end) | rb_fill('{1,2,3}',5,7) | {1,2,3,5,6} | 
| rb_clear | roaringbitmap,range_start bigint,range_end bigint | roaringbitmap | Clear the specified range (not include the range_end) | rb_clear('{1,2,3}',2,3) | {1,3} | 
| rb_flip | roaringbitmap,range_start bigint,range_end bigint | roaringbitmap | Negative the specified range (not include the range_end) | rb_flip('{1,2,3}',2,10) | {1,4,5,6,7,8,9} | 
| rb_range | roaringbitmap,range_start bigint,range_end bigint | roaringbitmap | Return new set with specified range (not include the range_end) | rb_range('{1,2,3}',2,3) | {2} | 
| rb_range_cardinality | roaringbitmap,range_start bigint,range_end bigint | bigint | Return the cardinality of specified range (not include the range_end) | rb_range_cardinality('{1,2,3}',2,3) | 1 | 
| rb_min | roaringbitmap | integer | Return the smallest offset in roaringbitmap. Return NULL if the bitmap is empty | rb_min('{1,2,3}') | 1 | 
| rb_max | roaringbitmap | integer | Return the greatest offset in roaringbitmap. Return NULL if the bitmap is empty | rb_max('{1,2,3}') | 3 | 
| rb_rank | roaringbitmap,integer | bigint | Return the number of elements that are smaller or equal to the specified offset | rb_rank('{1,2,3}',3) | 3 | 
| rb_jaccard_dist | roaringbitmap,roaringbitmap | double precision | Return the jaccard distance(or the Jaccard similarity coefficient) of two bitmaps | rb_jaccard_dist('{1,2,3}','{3,4}') | 0.25 | 
| rb_select | roaringbitmap,bitset_limit bigint,bitset_offset bigint=0,reverse boolean=false,range_start bigint=0,range_end bigint=4294967296 | roaringbitmap | Return subset [bitset_offset,bitset_offset+bitset_limit) of bitmap between range [range_start,range_end) | rb_select('{1,2,3,4,5,6,7,8,9}',5,2) | {3,4,5,6,7} | 
| rb_to_array | roaringbitmap | integer[] | Convert roaringbitmap to integer array | rb_to_array(roaringbitmap('{1,2,3}')) | {1,2,3} | 
| rb_iterate | roaringbitmap | SET of integer | Return set of integer from a roaringbitmap data. | rb_iterate(roaringbitmap('{1,2,3}')) | 1 2 3 | 
| Function | Input | Output | Desc | Example | Result | 
|---|---|---|---|---|---|
| rb_build_agg | integer | roaringbitmap | Build a roaringbitmap from a integer set | select rb_build_agg(id)
    from (values (1),(2),(3)) t(id) | {1,2,3} | 
| rb_or_agg | roaringbitmap | roaringbitmap | AND Aggregate calculations from a roaringbitmap set | select rb_or_agg(bitmap)
    from (values (roaringbitmap('{1,2,3}')),
                 (roaringbitmap('{2,3,4}'))
          ) t(bitmap) | {1,2,3,4} | 
| rb_and_agg | roaringbitmap | roaringbitmap | AND Aggregate calculations from a roaringbitmap set | select rb_and_agg(bitmap)
    from (values (roaringbitmap('{1,2,3}')),
                 (roaringbitmap('{2,3,4}'))
          ) t(bitmap) | {2,3} | 
| rb_xor_agg | roaringbitmap | roaringbitmap | XOR Aggregate calculations from a roaringbitmap set | select rb_xor_agg(bitmap)
    from (values (roaringbitmap('{1,2,3}')),
                 (roaringbitmap('{2,3,4}'))
          ) t(bitmap) | {1,4} | 
| rb_or_cardinality_agg | roaringbitmap | bigint | OR Aggregate calculations from a roaringbitmap set, return cardinality. | select rb_or_cardinality_agg(bitmap)
    from (values (roaringbitmap('{1,2,3}')),
                 (roaringbitmap('{2,3,4}'))
          ) t(bitmap) | 4 | 
| rb_and_cardinality_agg | roaringbitmap | bigint | AND Aggregate calculations from a roaringbitmap set, return cardinality | select rb_and_cardinality_agg(bitmap)
    from (values (roaringbitmap('{1,2,3}')),
                 (roaringbitmap('{2,3,4}'))
          ) t(bitmap) | 2 | 
| rb_xor_cardinality_agg | roaringbitmap | bigint | XOR Aggregate calculations from a roaringbitmap set, return cardinality | select rb_xor_cardinality_agg(bitmap)
    from (values (roaringbitmap('{1,2,3}')),
                 (roaringbitmap('{2,3,4}'))
          ) t(bitmap) | 2 | 
pg_roaringbitmap is supported by the following cloud vendors
- Alibaba Cloud RDS PostgreSQL: https://www.alibabacloud.com/help/doc-detail/142340.htm
- Huawei Cloud RDS PostgreSQL: https://support.huaweicloud.com/usermanual-rds/rds_09_0045.html
- Tencent Cloud RDS PostgreSQL: https://cloud.tencent.com/document/product/409/67299
To request support for pg_roaringbitmap from other cloud vendors, please see the following:
- Amazon RDS: send an email to rds-postgres-extensions-request@amazon.com with the extension name and use case (docs)
- Google Cloud SQL: comment on this issue
- DigitalOcean Managed Databases: comment on this idea
- Azure Database for PostgreSQL: comment on this post