We want to share our test using token function on range queries based in tokens.
A partitioner determines how data is distributed across the nodes in the cluster (including replicas). Basically, a partitioner is a hash function for computing the token (it's hash) of a partition key. Each row of data is uniquely identified by a partition key and distributed across the cluster by the value of the token. Tested: Murmur3Partitioner ByteOrderedPartitioner More info about partitioners Cassandra version 2.0.7 - Schema
create keyspace ksptest
with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
use ksptest;
create table data1(
key1 int,
key2 int,
id int,
column1 timestamp,
column2 timestamp,
PRIMARY KEY((key1, key2), id));
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(1,2014,1,'2014-01-01 16:50:45 -0500','2014-01-01 18:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(2,2014,1,'2014-01-01 17:50:45 -0500','2014-01-01 19:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(3,2014,1,'2014-01-01 18:50:45 -0500','2014-01-01 20:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(4,2014,2,'2014-02-01 16:50:45 -0500','2014-02-01 18:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(5,2014,2,'2014-02-01 17:50:45 -0500','2014-02-01 19:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(6,2014,2,'2014-02-01 18:50:45 -0500','2014-02-01 20:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(7,2014,3,'2014-03-01 16:50:45 -0500','2014-03-01 17:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(8,2014,3,'2014-03-01 17:50:45 -0500','2014-03-01 18:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(9,2014,3,'2014-03-01 18:50:45 -0500','2014-03-01 19:50:45 -0500');
INSERT INTO ksptest.data1( id,key1,key2,column1,column2)
VALUES(10,2014,3,'2014-03-01 19:50:45 -0500','2014-03-01 20:50:45 -0500');
- Using Murmur3Partitioner
SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,1);
SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,2);
SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,3);
TOKENS token(key1, key2) | key1 | key2 ----------------------+------+------+---- -2662374876872028068 | 2014 | 1 (3 rows) token(key1, key2) | key1 | key2 ----------------------+------+------+---- -8469758598453416143 | 2014 | 2 (3 rows) token(key1, key2) | key1 | key2 ---------------------+------+------+---- 8957232040621060434 | 2014 | 3 (4 rows) Querying token ranges- Okey the first logic query should be get all from 1 to 3 row key and get 10 rows
SELECT token(key1,key2),key1,key2,id FROM ksptest.data1 where token(key1,key2) >= token(2014,1) and token(key1,key2) <= token(2014,3); token(key1, key2) | key1 | key2 | id ----------------------+------+------+---- -2662374876872028068 | 2014 | 1 | 1 -2662374876872028068 | 2014 | 1 | 2 -2662374876872028068 | 2014 | 1 | 3 8957232040621060434 | 2014 | 3 | 7 8957232040621060434 | 2014 | 3 | 8 8957232040621060434 | 2014 | 3 | 9 8957232040621060434 | 2014 | 3 | 10 (7 rows) Why we get 7 rows?Because we are querying tokens so we are searching a hash(token)
Sorted tokens -8469758598453416143 2 (3 rows) -2662374876872028068 1 (3 rows)<-- 8957232040621060434 3 (4 rows)<--
So if we query from 2 to 3 we will get all row keys and that is because our tokens are created randomly by the partitioner and the range query go through all nodes requested only
- Using ByteOrderedPartitionerSELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,1);
SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,2);
SELECT token(key1,key2),key1,key2 FROM ksptest.data1 where token(key1,key2) = token(2014,3);
TOKENS token(key1, key2) | key1 | key2 --------------------------------+------+------ 0x0004000007de0000040000000100 | 2014 | 1 (3 rows) token(key1, key2) | key1 | key2 --------------------------------+------+------ 0x0004000007de0000040000000200 | 2014 | 2 (3 rows) token(key1, key2) | key1 | key2 --------------------------------+------+------ 0x0004000007de0000040000000300 | 2014 | 3 (4 rows) Querying token rangesSame query to get all rows from 1 to 3 and should get 10 rows
SELECT token(key1,key2),key1,key2,id FROM ksptest.data1 where token(key1,key2) >= token(2014,1) and token(key1,key2) <= token(2014,3); token(key1, key2) | key1 | key2 | id --------------------------------+------+------+---- 0x0004000007de0000040000000100 | 2014 | 1 | 1 0x0004000007de0000040000000100 | 2014 | 1 | 2 0x0004000007de0000040000000100 | 2014 | 1 | 3 0x0004000007de0000040000000200 | 2014 | 2 | 4 0x0004000007de0000040000000200 | 2014 | 2 | 5 0x0004000007de0000040000000200 | 2014 | 2 | 6 0x0004000007de0000040000000300 | 2014 | 3 | 7 0x0004000007de0000040000000300 | 2014 | 3 | 8 0x0004000007de0000040000000300 | 2014 | 3 | 9 0x0004000007de0000040000000300 | 2014 | 3 | 10 (10 rows)Conclusion
We can now understand how token function works and how return data but choose ByteOrderedPartitioner means a complex work to manage a cluster and there are very few good reasons to use it.
Here you can find an analysis about partitioners:
Apache Cassandra: The Case Against The ByteOrderedPartitioner