In this post I’ll be discussing the fundamentals of the Logical Volume Manager in Linux, usually simply referred to as LVM. I’ve used LVM occasionally over the years, but for the most part I would just create a single big partition on my disk, toss XFS on it and call it a day. That changed recently when I decided to replace my aging home media server with a new beast of a box that I wanted to use for a lot more than serving up content. I knew I would need lots of storage, but didn’t know ahead of time how I wanted to partition my disks. I also wanted to move away from btrfs; I never had a big problem with it, but I felt it would be better to use a more mainstream filesystem.
I’ve been messing with Apache Spark quite a bit lately. If you aren’t familiar, Spark is a general-purpose engine for large-scale data processing. At first it comes across as simply a replacement for Hadoop, but that would be selling it short. Big time. In addition to bulk processing (goodbye MapReduce!), Spark includes:
- SQL engine
- Stream processing via Kafka, Flume, ZeroMQ
- Machine learning
- Graph processing
Sounds awesome, right? That’s because it is, babaganoush. The next question is: where do we store our data? Spark works with a number of projects, but my database of choice these days is Apache Cassandra: easy to scale out and always up. It’s pretty epic.
In this tutorial I’ll be guiding you through setting up a headless Ubuntu 11.10 box that you’ll manage using virt-manager, accessed via X11. My main machine is a Mac running OS X Lion. You’ll need the Ubuntu CD and, for the first part of the tutorial, physical access to the box with a keyboard and monitor.
**Install Ubuntu on your server.**
I installed Ubuntu Desktop so I could mess with virt-manager before I disconnected everything. You won’t be using many of the desktop features, so it might not matter to you. Just make sure you install openssh-server so you can connect later on.
I need to compile something that uses the MySQL C++ library (mysql++). I already have MySQL and mysql++ compiled; I won’t go over how to do that now.
I added the following code to the top of my source:
#include <mysql++.h>
I got an error:
/Users/jhaddad/dev/search_engine/main.cpp:4:21: error: mysql++.h: No such file or directory
Not cool.
How to fix:
The compiler can’t find the header because it isn’t in one of its default include directories. To fix it in Xcode, open the project settings (under the Project menu) and click the Build tab. Scroll down to Search Paths and set Header Search Paths to the directories where you installed whatever you’re looking for. In my case, those were /usr/local/includes and /usr/local/mysql/.
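If you aren’t sure which directory to add, it helps to check which candidate actually contains the header, since `#include <mysql++.h>` needs a search path that holds `mysql++.h` directly. The sketch below is a hypothetical helper (`find_header_dir` is not part of Xcode or mysql++), and the candidate directories are assumptions; swap in wherever you installed the library.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_header_dir(header: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate directory containing `header`, or None.

    Hypothetical helper: the directory it returns is what you would enter
    in Xcode's Header Search Paths.
    """
    for directory in candidates:
        if (Path(directory) / header).is_file():
            return directory
    return None


if __name__ == "__main__":
    # Assumed install locations; adjust to match your own build.
    candidates = [
        "/usr/local/include",
        "/usr/local/include/mysql++",
        "/usr/local/mysql/include",
    ]
    print(find_header_dir("mysql++.h", candidates))
```

You may need to run it a second time for `mysql.h`: mysql++ wraps the MySQL C client API and its headers pull that file in, which is likely why a MySQL directory ends up in the search paths too.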
The most basic use of curl is very straightforward: just pass it a website’s URL:
curl http://twitter.com
If you run the command above, you’ll get the HTML output of Twitter’s home page.
To demo this, I created a Twitter account. You can sign up for one yourself.
Now, to hit their API and update your status, Twitter requires HTTP Basic Authentication. No sweat: curl’s -u flag handles that. The request must be sent as a POST, so use the -d (data) flag; passing data with -d makes curl send a POST automatically.
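To make it clearer what -u and -d actually put on the wire, here is a rough Python equivalent that builds (but does not send) the same kind of request. It is a sketch under assumptions: `build_status_update` is a hypothetical name, the URL is a placeholder because the post doesn’t show the endpoint, and `status` is assumed to be the form field name. One difference from curl: -d sends your data exactly as typed, while this sketch URL-encodes the value, which is closer to curl’s --data-urlencode.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_status_update(url: str, username: str, password: str, status: str) -> Request:
    """Build a request shaped like `curl -u user:pass -d status=... URL`."""
    # -u user:pass becomes "Authorization: Basic <base64 of 'user:pass'>".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    # -d becomes a form-encoded request body; having a body is what makes it a POST.
    body = urlencode({"status": status}).encode("utf-8")
    request = Request(url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    return request


if __name__ == "__main__":
    req = build_status_update("http://example.com/update", "user", "pass", "hello world")
    print(req.get_method(), req.get_header("Authorization"), req.data)
```

To actually send it you would pass the request to `urllib.request.urlopen`. Keep in mind that Basic auth only base64-encodes the credentials; it doesn’t encrypt them, so anyone watching plain HTTP traffic can read them.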