In the world of software engineering, especially within the realm of distributed systems, continuous learning and experimentation are not just beneficial; they’re essential. As a software engineer with a focus on distributed systems, particularly Apache Cassandra, I’ve taken this ethos to heart. My journey has led me to not only explore the intricacies of Cassandra’s distributed architecture but also to share my experiences and findings with a broader audience. This is why my YouTube channel has become an active platform where I stream at least once a week, engaging with viewers through coding sessions, trying new approaches, and benchmarking different Cassandra workloads.
As I promised in December, I redid my presentation from the Cassandra Summit 2023 on a live stream. You can check it out at the bottom of this post.
Going forward, I’ll be live-streaming on Tuesdays at 10AM Pacific on my YouTube channel.
Next week I’ll be taking a look at tlp-stress, which is used by the teams at some of the biggest Cassandra deployments in the world to benchmark their clusters. You can find that here.
The other reason is the eight blog posts I’ve got in the draft folder. One of the reasons why there are so many is the way I write. If the post is programming related, I usually start with the post, then start coding, pull snippets out, learn more, rework the post, then rework snippets. It’s an annoying, manual process. The posts sitting in my draft folder have incomplete code, and reworking the code is a tedious process that I get annoyed with, leading to abandoned posts.
In this post I’ll be discussing the fundamentals of the Logical Volume Manager in Linux, usually simply referred to as LVM. I’ve used LVM occasionally over the years, but for the most part I would just create a single big partition on my disk, toss XFS on it and call it a day. Recently that changed when I decided to replace my aging home media server with a new beast of a box that I wanted to do a lot more than simply serve up content. I knew I would need lots of storage, but didn’t necessarily know how I wanted to partition my disks ahead of time. I also wanted to move away from btrfs; I never had a big problem with it, but I felt it would be better to use a more mainstream filesystem.
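As a quick taste of what’s ahead, the basic flow from raw disk to mounted filesystem looks like this. The device, volume group, and logical volume names here are just examples, and the sizes are placeholders:

# turn the raw disk into a physical volume, then pool it into a volume group
pvcreate /dev/sdb
vgcreate media_vg /dev/sdb

# carve out a logical volume, format it with XFS, and mount it
lvcreate -n media -L 500G media_vg
mkfs.xfs /dev/media_vg/media
mount /dev/media_vg/media /mnt/media

# the payoff: grow it later without repartitioning anything
lvextend -L +100G /dev/media_vg/media
xfs_growfs /mnt/media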
I’ve finally gotten sick of having a terrible wireless signal in my room, and I have a server in my office (hard wired via gigabit), so I figured I’d set it up as a wireless access point. There’s a lot of information scattered across various places on how to set everything up, so I figured I’d try to wrangle everything into one spot.
Install the card.
I used a TP-Link WDN4800. Sadly I don’t have a list of all the compatible wireless cards, but it seems that having something based on the Atheros chipset is a good thing.
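The software side mostly revolves around hostapd, which turns the card into an access point. A minimal config is a sketch along these lines; the interface name, SSID, channel, and passphrase are placeholders for your own values:

# /etc/hostapd/hostapd.conf
interface=wlan0
driver=nl80211
ssid=MyHomeAP
hw_mode=g
channel=6
wpa=2
wpa_key_mgmt=WPA-PSK
wpa_passphrase=changeme
rsn_pairwise=CCMP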
Creating and testing new databases that require clustering can be a pain point when trying to do everything on a local machine. Simulating node or network failures can be difficult or impossible if everything you’re testing is running on the same machine. To better simulate your production environment you can try using LXC (Linux Containers). A Linux container is a lot like a virtual machine, but shares the host’s kernel and as a result has very little overhead. A limitation of this is that you can’t mix different environments - for example, you can’t run Windows in a container on a Linux host. Theoretically it’s possible to run different Linux distros, but so far it seems like there are a few hiccups doing this.
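As a rough sketch of what that looks like in practice (the template, container names, and flags are examples, and the exact syntax shifts a bit between LXC versions), spinning up a few containers to act as database nodes goes something like this:

# create three containers from the stock ubuntu template
for i in 1 2 3; do
  sudo lxc-create -t ubuntu -n node$i
done

sudo lxc-start -n node1 -d    # start a container in the background
sudo lxc-ls --fancy           # list containers along with their IPs
sudo lxc-attach -n node1      # get a shell inside a running container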
Many command line utils have a dotfile that people rarely use. Ack is one of them.
For a project I’m working on, there’s a var folder (ignored in git) where all the logs go. When I perform an ack search, I have no interest in ack looking through the var folder every single time.
By default, ack only checks your ~/.ackrc file for its default switches. You can have per-directory ack settings if you add this to your .bash_profile:
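The snippet itself isn’t reproduced here, but the idea is simple: drop a project-local .ackrc that tells ack to skip var, and have the shell point ack at it when one exists (ack honors the ACKRC environment variable). A sketch, with the function being my own invention:

# ./.ackrc in the project root
--ignore-dir=var

# in ~/.bash_profile
ack() {
  if [ -f .ackrc ]; then
    ACKRC=.ackrc command ack "$@"
  else
    command ack "$@"
  fi
}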
Excellent question on superuser.com with a fantastic answer. If anyone needs to jump through multiple servers via ssh (or any other protocol) take a look at this answer.
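For the common ssh case, the trick boils down to OpenSSH’s ProxyCommand. A minimal ~/.ssh/config entry (hostnames here are placeholders) looks like this:

# ~/.ssh/config
Host target
    ProxyCommand ssh -W %h:%p jumphost

With that in place, ssh target transparently hops through jumphost; older OpenSSH versions that lack -W can use ProxyCommand ssh jumphost nc %h %p instead.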
iWatch is a perl script that uses inotify to monitor files and directories. It’s similar to the watch tool, and it can do all sorts of stuff if the files or directories it’s watching are modified or affected in pretty much any way at all.
Install iWatch
apt-get install iwatch
I’ve got this one-liner in a file to quickly watch my directory and execute a PHP unit test; a sketch of it is below.
I run this with one argument (a unit test) and then sit there and code away. When I save, it detects the change and automatically runs my test. It’s pretty awesome.
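The one-liner itself isn’t shown above, but a sketch of the idea, assuming iwatch’s -r/-e/-c flags (check man iwatch on your version), looks like this:

#!/bin/bash
# watchtest.sh - rerun the given PHPUnit test whenever anything under the current directory changes
iwatch -r -e modify -c "phpunit $1" .

Run it as ./watchtest.sh MyTest.php (the script and test names are placeholders).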
I’ve run into a ton of issues working with crons, mostly with the $PATH variable screwing things up. Scripts work when run manually on the command line, but fail when run in cron. Very annoying.
I’ve asked a bunch of Linux sys admins how to fix this - and the answer is always “put the full path in your scripts”, which to me is unacceptable as it introduces the possibility of human error. Fixing the underlying problem is always preferred.
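One way to actually fix the underlying problem is to set PATH (and SHELL) at the top of the crontab itself, which cron honors for every job below it. A sketch, with the job itself being a placeholder:

# crontab -e
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# jobs below now run with a sane PATH
*/5 * * * * backup_logs.sh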
I’m using the PHP53 package from the IUS Community repository. I’ve been trying to get phpunit to install, but it gives an error that it needs the DOM extension installed. It took me a little bit to figure this out, but I finally got it working. What you need is the php53-xml package. You can install it using
yum install php53-xml
or if you’re using puppet
package { ["php53-xml"]:
  ensure => present
}
And finally, to get it to install, I used the below. I had to make it go to multiple lines to fit on the page but I have it all on 1 line in my puppet script:
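The exact resource isn’t reproduced here, but the general shape - a pear-based install wrapped in an exec, guarded so it only runs once - is something like the following. The channel URL, paths, and resource name are my assumptions, so double-check them:

exec { "install-phpunit":
  command => "/bin/sh -c '/usr/bin/pear channel-discover pear.phpunit.de && /usr/bin/pear install --alldeps phpunit/PHPUnit'",
  creates => "/usr/bin/phpunit",
  require => Package["php53-xml"],
}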
Copy apachectl (or symlink it) to /etc/init.d/httpd, and add these two lines at the end of the comment section:
# chkconfig: 2345 64 36
# description: script for starting and stopping httpd
Then register it with chkconfig:
chkconfig --add httpd
These commands will now work:
service httpd start
service httpd stop
Additionally, Apache will start when the system boots. You could accomplish this with rc.local too, if you prefer, but I think it’s more convenient to have everything be service-based, where you can use chkconfig to manage startup / shutdown.
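For example, once the init script is registered:

chkconfig httpd on          # enable apache at boot for the default runlevels
chkconfig --list httpd      # verify which runlevels it's enabled in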
To get a quick idea of what ports you have open on your local box, you can use nmap.
~ jhaddad$ nmap localhost
Starting Nmap 5.00 ( http://nmap.org ) at 2010-01-05 11:06 PST
Interesting ports on localhost (127.0.0.1):
Not shown: 499 closed ports, 492 filtered ports
PORT STATE SERVICE
80/tcp open http
88/tcp open kerberos-sec
548/tcp open afp
631/tcp open ipp
3306/tcp open mysql
3325/tcp open unknown
5900/tcp open vnc
9000/tcp open cslistener
10000/tcp open snet-sensor-mgmt
I’ve done this using CentOS, but I’m pretty sure it will work exactly the same way on RedHat.
Virtualization is now a commodity with several free tools available from Sun, VMWare and Xen. If you’re like me, you like to create a new, clean VM for each experiment. However, this comes with a drawback - the installation process, choosing your timezone, putting in the same password every time, etc… takes a while.
Note: The terminology used below also applies to VMWare. The screens are different, but the issue and the solution are the same.
When creating a new VM through VirtualBox, you might have a problem SSH’ing into the box. You also might notice you get a 10.0.x.xx type address, even though the rest of your network is a 192.168.xx.xxx deal. Yes, the two are related.
Essentially what’s happening is the VM is sitting on its own little private network, where it can get out but nothing can reach it. By default, the network setting is “NAT”. If we want the VM to be accessible to the outside, we want to use Bridged networking.
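If you’d rather flip it from the command line than dig through the GUI, VBoxManage can do it; the VM name and host adapter below are examples, and the flag spelling varies a little between VirtualBox versions:

VBoxManage modifyvm "centos5" --nic1 bridged --bridgeadapter1 eth0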
One thing that always bothered me about complex desktop applications like Adobe Photoshop or Eclipse, or even Desktop Linux, is finding out how to use the more advanced features (or, truthfully, some of the basic features). I’ve always liked getting answers from the community, so I’ve been on a number of mailing lists, and they’re usually really helpful.
What if these types of useful feedback were available within the application itself? You could literally just type a question into your help box, and a minute later you would get answers. This would be incredibly helpful for hundreds of new users.
Really interesting read about how to examine what’s stored in memcached.
Peep uses ptrace to freeze a running memcached server, dump the internal key metadata, and return the server to a running state. If you have a good host ejection mechanism in your client, such as in the Twitter libmemcached builds, you won’t even have to change the production server pool. The instance is not restarted, and no data is lost.
I ran into an issue just now compiling libjpeg on 64 bit CentOS. I found this very helpful post that gives a workaround using a config.guess file from libtool. For some reason, I didn’t have the folder he suggested, but I did have the alternative (automake).
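The gist of the workaround is to replace the ancient config.guess/config.sub bundled with libjpeg with the copies shipped by automake before configuring; the automake version in the path below will vary, so adjust it:

# run from inside the libjpeg source tree
cp /usr/share/automake-1.9/config.guess /usr/share/automake-1.9/config.sub .
./configure --enable-shared --enable-static
make && make install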
I was setting up a new server for someone, and encountered this error while I was trying to build svn
/usr/bin/ld: cannot find -lexpat
Now, while I can do some of the things that a sys admin can, I am by no means a sys admin. I have only installed svn a handful of times, and I didn’t know what this was.
First I installed expat from source. It didn’t help.
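For what it’s worth, the usual culprit behind a missing -lexpat at link time on CentOS is that only the runtime library is present; the devel package supplies the headers and the .so symlink the linker looks for:

yum install expat-devel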
KCacheGrind is a very useful tool to identify bottlenecks in your applications. This will explain the steps to using it to find issues with your PHP scripts. For me, the scripts are all web pages.
I’m already assuming you’re running a current version of PHP. I did this using PHP 5.2.1. These instructions are based on a Unix/Linux server; if you’re running Windows, I can’t help you.
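The profiling data that KCacheGrind reads comes from Xdebug, so the php.ini side is roughly the following (the extension path and output directory are examples, and the option names are the Xdebug 2.x ones):

; php.ini
zend_extension=/usr/lib/php/modules/xdebug.so
xdebug.profiler_enable=1
xdebug.profiler_output_dir=/tmp/cachegrind

Each request then drops a cachegrind.out.* file in that directory, which you open in KCacheGrind.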
I installed CentOS 5 on my VMWare a few days ago. I installed gcc via yum, compiled and installed libxml2. I then tried to install PHP 5.2.3 and received this error:
configure: error: installation or configuration problem: C++ compiler cannot create executables.
It took me forever to figure this out, but I had to install the g++ package, then it compiled fine.
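On CentOS that boils down to one yum package:

yum install gcc-c++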
The most basic use of curl is very straightforward: just put in a web site’s url:
curl http://twitter.com
If you copy and paste the above code, you’ll get the HTML output of twitter’s home page.
In order to demo this, I created a twitter account. You can sign up for one on your own.
Now, to hit their api and update your status, they require you to use HTTP Basic Authentication. No sweat, we can use the -u flag for that. The request must be sent as a POST, so you must use the -d flag (data).
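Against the old statuses/update endpoint (long since retired, and the credentials and status text below are placeholders), that looked roughly like this:

curl -u myuser:mypassword -d "status=testing out curl" http://twitter.com/statuses/update.xml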
When you set up public key authentication, make sure your authorized_keys2 file has the permissions set to 600. If you don’t, it’s likely that you will still be prompted for your password.
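For example:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys2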
Of course, I felt like an idiot after being completely confused by this for about half an hour.
Consider this:
awk "{print $1}" somefile.txt
This does not work as I had expected. The reason is that $1 is evaluated by the shell inside double quotes (unlike single quotes, which pass it through to awk untouched). Yes, it’s a rookie mistake, but I never claimed to be the awk master.
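The fix is just to use single quotes so that awk, not the shell, gets the $1:

awk '{print $1}' somefile.txt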
Using mysql’s -e feature, combined with awk and xargs, I was able to call an existing stored procedure repeatedly for a resultset. Yes, I could have written another stored procedure to do this, I realize. But I guess I like doing things the hard way. Either that, or this is just less code. Or I wanted the awk and xargs practice. Whatever.
mysql database -e "select id from category where foreign_key in (2771, 2769, 2766, 2772, 2767)" | awk -F\| '{print $1}' | xargs -ivar mysql database -e "call move_category(var, 5666)"
Every now and then, we find that we will have a sudden increase in the number of apache processes, load average will spike up, and then go back down to normal. In rare cases, we will see the same thing happen, and the load avg spike WAY up, all queries appear locked up, and the server must be rebooted. I am looking for ways of determining what caused this. I should note that it happens extremely rarely, and has never shown up in a load test.
I found a very good explanation of how to set up public key authentication over ssh. I’m always looking for it when I need it, and it always takes forever.
You can also use ssh-copy-id. I’m not sure what the specifics are behind it, but it seems to be available in some places (Fedora 6) and not others (OS X 10.4.8).
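For reference, a quick sketch of the two usual routes (user and host are placeholders):

# generate a key pair if you don't already have one
ssh-keygen -t rsa

# where ssh-copy-id exists:
ssh-copy-id user@remotehost

# everywhere else:
cat ~/.ssh/id_rsa.pub | ssh user@remotehost 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'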
Took me a while before I got on the ball and started doing this for syncing my web server’s docs to my local machine (for backing up). I had previously used Interarchy for this, but I really prefer to use the command line, where I can schedule it via cron (it should all be one line).
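Something along these lines does the job; the host and paths are placeholders, and it all goes on one line in the crontab:

rsync -avz -e ssh user@example.com:/var/www/ /home/me/backups/www/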