Rustyrazorblade

Nerding out on Open Source

Cassandra, CQL3, and Time Series Data With Timeuuid

Cassandra is a BigTable-inspired database created at Facebook. It was open sourced several years ago and is now an Apache project.

In Cassandra, a row can be very wide and is identified by a key. Think of it as more like a giant array. The data is stored on disk sorted by the key you pick, meaning if you pick the right sort option and key you can have some really fast queries. Here we’ll go over a time series.

A time series is a naturally sorted list, since things are happening over time. Sensor readings or live chat are good examples. In older versions of Cassandra, you’d use a timestamp as your column name, and the value would be the actual data. This would give you your list of data, sorted in order. The benefit of this is that your queries would likely be looking at slices of time, and with the data stored sequentially on disk you’ll get very fast reads, since there only needs to be one seek (if the data isn’t already in memory).
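To make the idea concrete, here is a minimal in-memory sketch in Python of a wide row kept sorted by timestamp. It is not Cassandra code: the `WideRow` class and its method names are hypothetical, made up for illustration. The point is that a time slice becomes two binary searches plus one contiguous read, which is roughly why a sorted, sequential layout makes these queries cheap.

```python
import bisect


class WideRow:
    """Toy model of one wide row: columns kept sorted by timestamp."""

    def __init__(self):
        self._timestamps = []  # sorted column names
        self._values = []      # values, parallel to _timestamps

    def insert(self, ts, value):
        # Find the sorted position so the row stays ordered by time.
        i = bisect.bisect_right(self._timestamps, ts)
        self._timestamps.insert(i, ts)
        self._values.insert(i, value)

    def slice(self, start, end):
        """Return (ts, value) pairs with start <= ts < end, in time order."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))


row = WideRow()
# Inserted out of order on purpose; the row sorts them by timestamp.
for ts, reading in [(30, "c"), (10, "a"), (20, "b"), (40, "d")]:
    row.insert(ts, reading)

print(row.slice(15, 40))  # [(20, 'b'), (30, 'c')]
```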

To make it insanely unlikely that 2 timestamps would ever conflict, the column name would actually be a uuid1, which has an embedded timestamp. DataStax gave a good example of a table definition back from Cassandra 0.8:

[default@demo] CREATE COLUMN FAMILY blog_entry
WITH comparator = TimeUUIDType
AND key_validation_class=UTF8Type
AND default_validation_class = UTF8Type;

As Cassandra has matured, it’s evolved really nice schema definition options giving you the choice of some additional structure if you want it. CQL is a SQL-ish language for defining tables, where you specify the column names beforehand. This makes using our time series data a little challenging since you can’t possibly know all the timestamps you’re going to be using. The upcoming version of the language is CQL3. Here’s a great DataStax blog post on some of the CQL3 features.

In particular, the Cassandra team has introduced 2 important items. One is the timeuuid type, and the other is the ability to specify compound primary keys with compact storage. This causes the data to be stored sequentially by the timeuuid column, exactly like a really wide row. Starting cqlsh with the -3 option gives us a CQL3 console. Here we define our schema:

haddad-pro:apache-cassandra-1.1.5  jhaddad$ bin/cqlsh -3
Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.1.5 | CQL spec 3.0.0
Use HELP for help.

cqlsh> CREATE KEYSPACE sensor WITH strategy_class = 'SimpleStrategy' 
    AND strategy_options:replication_factor = 1;
cqlsh> use sensor;
cqlsh:sensor> create table sensor_entries ( 
    sensorid uuid, 
    time_taken timeuuid, reading text,  
    primary key(sensorid, time_taken)) with compact storage;

Here’s a little Python script to put in some example data:

import uuid
import cql

conn = cql.connect('localhost')
conn.set_cql_version('3.0.0')
conn.set_initial_keyspace('sensor')
cur = conn.cursor()

# named parameters (:name) are filled in from the values dict
query = """INSERT INTO sensor_entries 
            (sensorid, time_taken, reading)
            VALUES (:sensorid, :time_taken, :reading)"""

# put in 5 sensor reads right now, all for the same sensor
key = str(uuid.uuid4())
for i in range(5):
    # uuid1 embeds the current time, so each read gets its own column
    ts = str(uuid.uuid1())
    values = {'sensorid': key,
              'time_taken': ts,
              'reading': "random sample {}".format(i)}
    cur.execute(query, values)

And the result (edited to fit on 1 line):

cqlsh:sensor> select * from sensor_entries;
 sensorid                | time_taken               | reading
-------------------------+--------------------------+-----------------
 060e1156-1d7d-46b7-87f3 | 2012-10-02 07:57:43-0700 | random sample 0
 060e1156-1d7d-46b7-87f3 | 2012-10-02 07:57:43-0700 | random sample 1
 060e1156-1d7d-46b7-87f3 | 2012-10-02 07:57:43-0700 | random sample 2
 060e1156-1d7d-46b7-87f3 | 2012-10-02 07:57:43-0700 | random sample 3
 060e1156-1d7d-46b7-87f3 | 2012-10-02 07:57:43-0700 | random sample 4

You can see how even though I was generating a uuid1, Cassandra is showing us a timestamp.
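The timestamp is recoverable because a version 1 UUID stores a count of 100-nanosecond intervals since 1582-10-15. Here is a small sketch of pulling it back out in Python; `uuid1_to_datetime` is a helper name I made up, and this is only meant to show the idea, not how cqlsh does it internally.

```python
import datetime
import uuid

# Number of 100-nanosecond intervals between the UUID epoch
# (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_to_datetime(u):
    """Return the (naive, UTC) datetime embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    unix_100ns = u.time - UUID_EPOCH_OFFSET
    # Integer division by 10 converts 100 ns ticks to microseconds.
    return datetime.datetime(1970, 1, 1) + datetime.timedelta(
        microseconds=unix_100ns // 10)


u = uuid.uuid1()
print("{} -> {}".format(u, uuid1_to_datetime(u)))
```

A uuid4, like the one used for sensorid above, is random and has no timestamp to extract, which is why the helper rejects it.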

Huge thanks to everyone who’s worked on Cassandra to get it to this point. It’s an absolutely amazing piece of software.

Setting Up RAID0 in Ubuntu 12.04 in AWS High I/O

Amazon announced high I/O instances today. This is huge for anyone with a database larger than available memory, as it’s been a complete nightmare dealing with EBS up till now. Now your Cassandra, MongoDB, MySQL, or whatever you’re using should be able to perform well without having to keep your entire dataset in memory.

With each instance you get 2x1TB of disk. In this tutorial I’ll be setting it up as a RAID0 to get a single 2TB disk which should deliver excellent performance.
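A rough way to picture what RAID0 does with the two disks: the array is cut into fixed-size chunks that alternate between the drives, so large sequential reads and writes hit both at once. The sketch below is a simplified model in Python, not mdadm's implementation; the 512 KiB chunk size is an assumption for illustration, and `locate` is a made-up helper.

```python
CHUNK_SIZE = 512 * 1024  # bytes per chunk (assumed value for illustration)
NUM_DISKS = 2


def locate(logical_offset, chunk_size=CHUNK_SIZE, num_disks=NUM_DISKS):
    """Map a byte offset on the array to (disk index, byte offset on that disk)."""
    chunk_index, within_chunk = divmod(logical_offset, chunk_size)
    # Chunks are dealt out round-robin: chunk 0 -> disk 0, chunk 1 -> disk 1, ...
    stripe_index, disk = divmod(chunk_index, num_disks)
    return disk, stripe_index * chunk_size + within_chunk


for offset in (0, CHUNK_SIZE, 2 * CHUNK_SIZE, 3 * CHUNK_SIZE + 100):
    print("{} -> {}".format(offset, locate(offset)))
```

The flip side of the same mapping is that every file is spread over both drives, so losing either disk loses the whole array.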

Before you get started, make sure you’ve got mdadm installed:

apt-get install mdadm

To begin, check fdisk and make sure your 1TB drives show up.

root@ip-10-140-128-232:~# fdisk -l

Disk /dev/xvda1: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders, total 16777216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/xvda1 doesn't contain a valid partition table

Disk /dev/xvdf: 1099.5 GB, 1099511627776 bytes
255 heads, 63 sectors/track, 133674 cylinders, total 2147483648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/xvdf doesn't contain a valid partition table

Disk /dev/xvdg: 1099.5 GB, 1099511627776 bytes
255 heads, 63 sectors/track, 133674 cylinders, total 2147483648 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/xvdg doesn't contain a valid partition table

Now you’ll want to partition each of the 1TB drives. Here’s what my console looks like, minus some extra help text:

root@ip-10-140-128-232:~# fdisk /dev/xvdf
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x2aabe5ed.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-2147483647, default 2048): 
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-2147483647, default 2147483647): 
Using default value 2147483647

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): L
.....
->      fd  Linux raid auto
.....
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Do this for both drives.

Now, tell mdadm to build the RAID:

root@ip-10-140-128-232:~# mdadm --create --verbose --auto=yes /dev/md0 --level=0 --raid-devices=2 /dev/xvdf1 /dev/xvdg1

We’re using XFS, so I needed to install the xfs tools for the next part.

root@ip-10-140-128-232:~# apt-get install xfsprogs

Now format your drive. I got some output about the log stripe unit being too large, but mkfs adjusted it on its own, so I think it’s OK.

root@ip-10-140-128-232:~# mkfs -t xfs /dev/md0
log stripe unit (524288 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/md0               isize=256    agcount=32, agsize=16777088 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=536866816, imaxpct=5
         =                       sunit=128    swidth=256 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=262144, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Mount it and check it out:

root@ip-10-140-128-232:~# mkdir /mnt/bigraid
root@ip-10-140-128-232:~# mount /dev/md0 /mnt/bigraid/
root@ip-10-140-128-232:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      8.0G  878M  6.8G  12% /
udev             30G   12K   30G   1% /dev
tmpfs            12G  192K   12G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none             30G     0   30G   0% /run/shm
/dev/md0        2.0T   34M  2.0T   1% /mnt/bigraid

As a final note, a few quick tests:

root@ip-10-140-128-232:~# dd if=/dev/zero of=/mnt/bigraid/somefile bs=512
^C2790615+0 records in
2790615+0 records out
1428794880 bytes (1.4 GB) copied, 11.1162 s, 129 MB/s

root@ip-10-140-128-232:~# dd if=/dev/zero of=/mnt/bigraid/somefile bs=2048
^C2441837+0 records in
2441837+0 records out
5000882176 bytes (5.0 GB) copied, 13.9482 s, 359 MB/s

root@ip-10-140-128-232:~# hdparm -t /dev/md0

/dev/md0:
 Timing buffered disk reads: 1188 MB in  3.00 seconds = 395.42 MB/sec

I based this post on the instructions found here, modified for AWS and a newer version of Ubuntu.

Weird Distutils Error When Running Python Scripts Within MacVim

I saw this today when trying to run a nosetest in MacVim:

DistutilsPlatformError: $MACOSX_DEPLOYMENT_TARGET mismatch: now "10.4" but "10.7" during configure

Add this to your .vimrc to fix this weird message.

let $MACOSX_DEPLOYMENT_TARGET = "10.7"
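If you want to see the two values the error is complaining about, here is a small diagnostic sketch in Python. It assumes you run it with the same interpreter Vim uses (for example via `:pyfile`); `targets_differ` is a helper I made up, and the real distutils check may be more lenient than a plain inequality.

```python
import os
import sysconfig


def targets_differ(configured, current):
    """True when both values are set and they disagree."""
    return (configured is not None
            and current is not None
            and configured != current)


# Value recorded when this Python was built (None on non-macOS builds).
configured = sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET")
# Value the current process sees (None when the variable is unset).
current = os.environ.get("MACOSX_DEPLOYMENT_TARGET")

print("configured: {!r}, environment: {!r}".format(configured, current))
if targets_differ(configured, current):
    print("mismatch: distutils may raise the error shown above")
```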

Installing Vim-ipython With MacVim

I got really excited at the notion of having IPython built into MacVim (vim-ipython), so over the last few days I’ve spent some time mucking around trying to get this whole thing to work. Unfortunately there’s not a lot of documentation on how to fix the issues that might pop up, so hopefully this will help some people. (Spoiler: the MacVim download is 32 bit and zeromq is 64 bit.)

First, your prerequisites. I’m assuming you’re using the awesome Homebrew. If you’re not, you’re on your own for some of these sections.

pip install pyzmq ipython
brew install zeromq

In a shell, type:

ipython console

and leave it running.

Go into Vim.

I’m using Vim Addon Manager (VAM).  It’s a fantastic tool and made working with vim 100x better.  Go ahead and install vim-ipython - if you’re using VAM it’s easy.  If you’re not, use whatever system you’re used to (or start using VAM).

:InstallAddons vim-ipython

If you’ve done this with VAM, the docs say you should be able to open a Python file and run the command below, but when I tried that, this is what I got:

:IPython
ImportError: IPython.zmq requires pyzmq >= 2.1.4

Weird, you say, because you know you have it installed. Let’s see what happens if we import it directly into Vim:

:py import zmq
Traceback (most recent call last):  
  File "<string>", line 1, in <module> 
  File "/Library/Python/2.7/site-packages/zmq/__init__.py", line 38, in <module>    
    from zmq import core, devices 
  File "/Library/Python/2.7/site-packages/zmq/core/__init__.py", line 26, in <module>    
    from zmq.core import (constants, error, message, context,
  ImportError: dlopen(/Library/Python/2.7/site-packages/zmq/core/error.so, 2): 
    Symbol not found: _zmq_errno  
  Referenced from: /Library/Python/2.7/site-packages/zmq/core/error.so  
Expected in: flat namespace in /Library/Python/2.7/site-packages/zmq/core/error.so

I won’t go into details here, but the short version of the story is MacVim (snapshot 64) was compiled as a 32 bit executable and it can’t read the 64 bit symbols.  Bummer.

We can tell that MacVim is a 32 bit compile because of this:

haddad-pro:vim  jhaddad$ file /Applications/MacVim.app/Contents/MacOS/MacVim
/Applications/MacVim.app/Contents/MacOS/MacVim: Mach-O executable i386

And zeromq shared library:

file /usr/local/lib/libzmq.dylib
/usr/local/lib/libzmq.dylib: Mach-O 64-bit dynamically linked shared library x86_64
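Another way to check, from the Python side, is to ask the interpreter for its pointer width. This is a minimal sketch; `pointer_bits` is a name I made up. The idea is to run it once from your shell's Python and once from inside Vim (for example with `:pyfile`) and compare, on the assumption that the Python embedded in a 32 bit Vim reports 32.

```python
import platform
import struct
import sys


def pointer_bits():
    """Width of a C pointer in the running interpreter: 32 or 64."""
    return struct.calcsize("P") * 8


print(sys.executable)
print("{} bit, machine: {}".format(pointer_bits(), platform.machine()))
```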

The solution: Use Homebrew to compile 64 bit MacVim.

brew install macvim

You’ll run into a snag if you don’t have the old /Developer directory (I didn’t), so go ahead and fix it like this:

haddad-pro:vim  jhaddad$ sudo /usr/bin/xcode-select -switch /Applications/Xcode.app/

Finally, open up your 64 bit MacVim and edit a Python file.   You should now be able to send lines to iPython using , and see the results if you s.

I’ll follow up with a later post on how to get the most benefit from vim-ipython.

Applescripting a Remote X-Windows Session for Virt-Manager

This isn’t just for virt-manager, but any X-Windows app you’d want to tightly integrate into your daily routine. Instead of firing up X11, then SSH’ing to your VM box and typing out virt-manager (insane!) you can script X11 to do everything with 1 mouse click. I have it in my Dock, and Launchbar also recognizes it as an app.

tell application "Finder"
    launch application "X11"
end tell

set results to do shell script "ssh -X haddad-vmserver 'virt-manager'"

Sweetness.

Also see my post about setting up a headless VM server using KVM.

Drizzle Differences From MySQL

I decided to take a look at Drizzle today and was encouraged by what I saw. Here’s my favorite part:

* There is no UNSIGNED (as per the standard).
* There are no spatial data types GEOMETRY, POINT, LINESTRING & POLYGON (go use Postgres).
* No YEAR field type.
* There are no FULLTEXT indexes for the MyISAM storage engine (the only engine FULLTEXT was supported in). Look at either Lucene, Sphinx, or Solr.
* No “dual” table.
* The “LOCAL” keyword in “LOAD DATA LOCAL INFILE” is not supported

GO USE POSTGRES. Awesome.

List of differences from MySQL.

Making Better Use of Your .ackrc File

Many command line utils have a dotfile that people rarely use. Ack is one of them.

For a project I’m working on, there’s a var folder (ignored in git) where all the logs go. When I perform an ack search, I have no interest in ack looking through the var folder every single time.

By default, ack only checks your ~/.ackrc file for its default switches. You can have per-directory ack settings if you add this to your .bash_profile:

export ACKRC=".ackrc"

Now you don’t have to worry about random log files being searched every time you try to find something.

new-host-3:dev  jhaddad$ cat .ackrc 
--ignore-dir=var/

Just add whatever switches you want, one per line.

Nginx Pub/sub Module

A coworker pointed me to this Nginx module today. You can write a chat server without actually writing a server. The message thread below indicates incredible performance. If you’ve got more than 50K users and 9000 messages / second you might be able to upgrade your hardware, or at least load balance your channels between 2 servers.

When I open 10,000 connections, it seems to behave quite nicely. Sending half a million messages, I am able to get a throughput of around 9,000 message per second. At this rate “top” shows the nginx process as high as 90% of cpu. If I push it harder, I start to receive SIGIO in the nginx main log and the writer/poster is throttled down meaning a lower throughput but all messages appear to get through to the clients on the other machine. However, when I perform the same tests but with 50,000 connections I see a similar pattern of throughput up to about 6,000 or 7,000 messages/second. As before, when I push faster I get the same SIGIO in the log but the difference is not all the messages get through to clients!

[…later down the page]

Many thanks for your explanation. Your suspicion was correct. I was using the default 30sec for that parameter. I tried upping it to 5m and I was able to receive messages more reliably with 50,000 clients connected. Sometimes, however, the rate at which messages were sent from nginx slowed right down. e.g. I could get 9,000/sec for a sustained minute or so and then when the poster stopped posting, the rate of messages would slow almost to a stop but not quite until all messages were successfully sent.

So awesome.

Nginx push stream module.