~ What ? ~

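In Cassandra, clustering columns are the columns of the Primary Key that come after the partition key. The partition key decides which partition a row belongs to; the clustering columns identify the rows inside that partition and determine the order in which they are stored.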

Ordering

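Each clustering column can be ordered either ASC or DESC. If there is more than one clustering column, the ordering is applied from left to right: rows are sorted by the first clustering column, then by the second among rows that share the same value of the first, and so on.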

Example:

CREATE TABLE server_logs(
   log_hour timestamp,
   log_level text,
   message text,
   server text,
   PRIMARY KEY ((log_hour, server),log_level)
)WITH CLUSTERING ORDER BY (log_level DESC);

With data:

log_hour  log_level  message        server
12        info1      Some message1  Prod-1
12        info2      Some message2  Prod-1
12        info3      Some message3  Prod-2

The query:

SELECT * FROM server_logs WHERE log_hour = '12' AND server = 'Prod-1';

Will return the rows of that partition in descending log_level order:

log_hour  log_level  message        server
12        info2      Some message2  Prod-1
12        info1      Some message1  Prod-1
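
The example above uses a single clustering column. The left-to-right rule for several clustering columns can be sketched as follows. The table name server_logs_by_time and the column log_time are hypothetical and are used only for illustration.

-- Hypothetical table with two clustering columns: log_level, then log_time
CREATE TABLE server_logs_by_time(
   log_hour timestamp,
   server text,
   log_level text,
   log_time timestamp,
   message text,
   PRIMARY KEY ((log_hour, server), log_level, log_time)
)WITH CLUSTERING ORDER BY (log_level ASC, log_time DESC);

-- Inside each (log_hour, server) partition, rows are sorted by log_level
-- ascending first; rows with the same log_level are then sorted by
-- log_time descending.

The order of the columns in CLUSTERING ORDER BY follows the order in which the clustering columns appear in the PRIMARY KEY.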

Default Behaviour

By default, the Cassandra storage engine sorts the rows of a partition in ascending order of the clustering columns.
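
As a minimal sketch of the default, the definition below is the server_logs table from the Ordering section without the WITH CLUSTERING ORDER BY clause, so log_level is stored in ascending order. The timestamp value in the query is an illustrative assumption.

-- No clustering order given: log_level is sorted ASC within each partition
CREATE TABLE server_logs(
   log_hour timestamp,
   log_level text,
   message text,
   server text,
   PRIMARY KEY ((log_hour, server),log_level)
);

-- The stored order can still be reversed at read time for a single partition
SELECT * FROM server_logs
WHERE log_hour = '2024-01-01 12:00:00+0000' AND server = 'Prod-1'
ORDER BY log_level DESC;

Declaring the order in the table definition is usually preferable when most reads want the same direction, since the rows are then already stored that way.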

Examples

No clustering columns

CREATE TABLE server_logs(
   log_hour timestamp PRIMARY KEY,
   log_level text,
   message text,
   server text
);

partition key: log_hour 

clustering columns: none
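
Without clustering columns, a partition holds at most one row. A small sketch of the consequence, using the table above and illustrative values that are not from the original data:

-- Both statements target the same primary key (log_hour)
INSERT INTO server_logs (log_hour, log_level, message, server)
VALUES ('2024-01-01 12:00:00+0000', 'info', 'Some message1', 'Prod-1');

INSERT INTO server_logs (log_hour, log_level, message, server)
VALUES ('2024-01-01 12:00:00+0000', 'info', 'Some message2', 'Prod-1');

-- Assuming the statements run one after the other, the second insert
-- overwrites the first, so this returns a single row with 'Some message2'
SELECT * FROM server_logs WHERE log_hour = '2024-01-01 12:00:00+0000';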

One clustering column

CREATE TABLE server_logs(
   log_hour timestamp,
   log_level text,
   message text,
   server text,
   PRIMARY KEY (log_hour, log_level)
);

partition key: log_hour 

clustering columns: log_level
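
With one clustering column, a partition can hold several rows, one per distinct log_level. The statements below are a sketch against the table above; the values are illustrative assumptions.

-- Same partition (log_hour), different clustering values: two separate rows
INSERT INTO server_logs (log_hour, log_level, message, server)
VALUES ('2024-01-01 12:00:00+0000', 'info', 'Some message1', 'Prod-1');

INSERT INTO server_logs (log_hour, log_level, message, server)
VALUES ('2024-01-01 12:00:00+0000', 'error', 'Some message2', 'Prod-1');

-- Returns both rows, sorted by log_level ascending: 'error', then 'info'
SELECT * FROM server_logs WHERE log_hour = '2024-01-01 12:00:00+0000';

-- The clustering column can narrow the read inside the partition
SELECT * FROM server_logs
WHERE log_hour = '2024-01-01 12:00:00+0000' AND log_level = 'error';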

Composite partition key

CREATE TABLE server_logs(
   log_hour timestamp,
   log_level text,
   message text,
   server text,
   PRIMARY KEY ((log_hour, server))
);

partition key: log_hour, server

clustering columns: none
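
When the partition key has more than one column, a query locates a partition only if every partition key column is given. A sketch against the table above, with an illustrative timestamp value:

-- Both partition key columns are restricted: reads exactly one partition
SELECT * FROM server_logs
WHERE log_hour = '2024-01-01 12:00:00+0000' AND server = 'Prod-1';

-- Only part of the partition key is restricted. Cassandra is expected to
-- reject this as written, because it cannot work out which partition to read:
-- SELECT * FROM server_logs WHERE log_hour = '2024-01-01 12:00:00+0000';

Since this table has no clustering columns, each (log_hour, server) pair holds at most one row.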

Clustering column with order

CREATE TABLE server_logs(
   log_hour timestamp,
   log_level text,
   message text,
   server text,
   PRIMARY KEY ((log_hour, server),log_level)
)WITH CLUSTERING ORDER BY (log_level DESC);

partition key: log_hour, server

clustering columns: log_level