Mysql

Migrating from MySQL to Cassandra Using Spark

15 min read

MySQL is a popular choice for new projects. It’s a flexible database that’s easy to set up and start querying. There’s loads of documentation, examples and frameworks it works with, such as Wordpress, Pandas, Ruby on Rails, and Django.

From the above paragraph it reads like a pretty fantastic database, and at small scale it can be great. The problem arises when you need to scale past a single server or have high availability needs. MySQL’s solution to both of these needs is replication. Replication is ok at handling read heavy workloads in a single datacenter, but it falls on it’s face under heavy writes or if you need multiple datacenters. Fortunately Cassandra excels at scalability and high availability. It’s a common story for people to migrate from a relational database to Cassandra for one or both of these reasons. (For further reading on choosing Cassandra even with small datasets read Matt Kennedy’s Little Big Data article)

mysql cassandra python
Read more

The Myth of Schema-less

7 min read

I have grown increasingly frustrated with the world as people have become more and more convinced that “schema-less” is actually a feature to be proud of (or even exists). For over ten years I’ve worked with close to a dozen different databases in production and have not once seen “schemaless” truly manifest. What’s extremely frustrating is seeing this from vendors, who should really know better. At best, we should be using the description “provides little to no help in enforcing a schema” or “you’re on your own, good luck.”

databases nosql python
Read more

Setting up RAID0 in Ubuntu 12.04 in AWS High I/O

4 min read

Amazon announced high I/O instances today. This is huge for anyone with a database larger than available memory, as it’s been a complete nightmare dealing with EBS up till now. Now your Cassandra, MongoDB, MySQL, or whatever your using should be able to perform well without requiring keeping your entire dataset in memory.

With each instance you get 2x1TB of disk. In this tutorial I’ll be setting it up as a RAID0 to get a single 2TB disk which should deliver excellent performance.

amazon cassandra mongodb
Read more

Drizzle Differences from MySQL

1 min read

I decided to take a look at Drizzle today and was encouraged by what I saw. Here’s my favorite part:

There is no UNSIGNED (as per the standard). * There are no spatial data types GEOMETRY, POINT, LINESTRING & POLYGON (go use Postgres). * No YEAR field type. * There are no FULLTEXT indexes for the MyISAM storage engine (the only engine FULLTEXT was supported in). Look at either Lucene, Sphinx, or Solr. * No “dual” table. * The “LOCAL” keyword in “LOAD DATA LOCAL INFILE” is not supported

drizzle mysql
Read more

Splitmytab ready for the public!

1 min read

Splitmytab.net is finally for the public to check out. Splitmytab is a bill splitting and IOU system for friends. It uses facebook’s login, so you won’t need to put in anyone’s emails, names, or get people to sign up for an account.

It’ll automatically keep balances of who owes who, so you can keep a running tab with friends and always know who’s buying the next case of beer.

Please note: I’m not a designer, so there’s a few rough corners, but what’s there is simple and it works.

coffeescript mysql python
Read more

Installing MySQLdb on MacOS Lion

1 min read

I ran into an issue installing the MySQLdb module.

>>> import MySQLdb
/Library/Python/2.7/site-packages/MySQL_python-1.2.3-py2.7-macosx-10.7-intel.egg/_mysql.py:3: UserWarning: Module _mysql was already imported from /Library/Python/2.7/site-packages/MySQL_python-1.2.3-py2.7-macosx-10.7-intel.egg/_mysql.pyc, but /Users/jhaddad/Downloads/MySQL-python-1.2.3 is being added to sys.path
Traceback (most recent call last):
  File "", line 1, in
  File "MySQLdb/__init__.py", line 19, in
    import _mysql
  File "build/bdist.macosx-10.7-intel/egg/_mysql.py", line 7, in
  File "build/bdist.macosx-10.7-intel/egg/_mysql.py", line 6, in __bootstrap__
ImportError: dlopen(/var/root/.python-eggs/MySQL_python-1.2.3-py2.7-macosx-10.7-intel.egg-tmp/_mysql.so, 2): Library not loaded: libmysqlclient.18.dylib
  Referenced from: /var/root/.python-eggs/MySQL_python-1.2.3-py2.7-macosx-10.7-intel.egg-tmp/_mysql.so
  Reason: image not found

I fixed it by doing the following:

mac mysql python
Read more

The Lack of Flexibility of Stored Procedures in MySQL

5 min read

sakila.pngOver three years ago I wrote about how you cannot use a stored procedure in a subquery. Well, it’s 2010, and I’m still annoyed by this and a handful of other things.

I was just working today on a report consisting of a series of queries, taking about a minute to generate. Some of the data would be created in a temporary table and queried against multiple times for performance reasons, and ultimately spit out into a CSV file for someone to examine later. I also would like to be able to return the result set, and perform queries on it, which is much faster than querying a view.

mysql rant
Read more

Stored Procedure For Finding Columns In MySQL

1 min read

sakila.png

Looking for instances particular column in a large schema can be a pain. Fortunately the information schema makes this pretty easy, if your columns have a consistent naming convention.

SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE column_name LIKE '%some_name%';

Now, if we want to wrap this up into an easy to use stored procedure, we can do something like this:

drop procedure find_column;
delimiter //
CREATE PROCEDURE find_column(c varchar(255))
begin
	SET @a = CONCAT("%", c, "%");
	SELECT table_schema, table_name, column_name, column_type
		FROM information_schema.columns
		WHERE column_name LIKE @a;
end
//
delimiter ;

We need to use the concat statement in order to properly get the quotes in there without using the literal string “c” in the LIKE statement.

mysql stored procedure
Read more

MySQL 5.1.34 and Innodb

2 min read

I don’t remember having to do this before, but I had compiled MySQL 5.1.34 recently on my dev box (os x) and I saw a warning on a create table statement. It turns out InnoDB was not enabled (or even listed in the list of storage engines.

Before:

mysql> show engines; +————+———+———————————————————–+————–+——+————+ | Engine | Support | Comment | Transactions | XA | Savepoints | +————+———+———————————————————–+————–+——+————+ | CSV | YES | CSV storage engine | NO | NO | NO | | MRG_MYISAM | YES | Collection of identical MyISAM tables | NO | NO | NO | | MEMORY | YES | Hash based, stored in memory, useful for temporary tables | NO | NO | NO | | MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance | NO | NO | NO | +————+———+———————————————————–+————–+——+————+ 4 rows in set (0.00 sec)

innodb mysql
Read more

Interesting Programmer Links

2 min read

Really interesting read about how to examine what’s stored in memcached.

Peep uses ptrace to freeze a running memcached server, dump the internal key metadata, and return the server to a running state. If you have a good host ejection mechanism in your client, such as in the Twitter libmemcached builds, you won’t even have to change the production server pool. The instance is not restarted, and no data is lost.

linux mysql tips
Read more

'MySQL: Innodb Memory Usage Formula'

1 min read

I hate looking for this….

This will give you a rough idea of your innodb memory usage. I know it’s in a hundred spots, but i hate looking for it when i double check things.

innodb_buffer_pool_size

  • key_buffer_size
  • max_connections*(sort_buffer_size+read_buffer_size+binlog_cache_size +2MB)

mysql
Read more

MySQL: ERROR 1267 (HY000): Illegal mix of collations (utf8_unicode_ci,IMPLICIT)

1 min read

I have never, ever seen this before. I don’t even know how the table was created with a different collation. However, I had all my other tables created with the character set utf8, no collation specified.

I had to convert the second table to match the character set. This probably wouldn’t have been a problem if I wasn’t joining on a character field.

alter table exclusion CONVERT TO CHARACTER SET utf8;

mysql
Read more

Executing MySQL queries within VIM

1 min read

I haven’t been using vim for very long, but I’ve gotten over the initial learning curve of getting used to the different editing modes. With some help from the guys in #vim on irc.freenode.net, I managed to get this gem:

` map :call SwitchDB() :function SwitchDB() : let g:current_db = input(“Database > “) :endfunction

map :call Doquery() :function Doquery() : if !exists(“g:current_db”) : call SwitchDB() : endif : let query_string = input(g:current_db . " > " ) : if query_string != "” : exe “!mysql " . g:current_db . " -e "” . escape(query_string, ‘”’) . “"” : endif :endfunction `

mysql vim
Read more

MySQL ALTER table Progress Bar?

1 min read

Altering a big table sucks, and to make it worse you have no idea what’s happening or how long it will take. I’d like a progress bar, or some status output, or something that gives me the feeling like my server didn’t die.

mysql
Read more

MySQL and Materialized Views

2 min read

I was poking around the MySQL Worklog again over the weekend, and found a request for materialized views for MySQL. This feature has existed in Oracle for a while, in DB2 as a materialized query table, and appeared in MS SQL Server 2000 and 2005 as indexed views.

What is a materialized view?

A materialized view is a database object that contains the results of a query. The FROM clause of the query can name tables, views, and other materialized views. (from Oracle).

db2 mssql mysql
Read more

'MySQL: Time Delayed Replication'

2 min read

I was cruising the MySQL Forge Worklog when I came across the idea of Time Delayed Replication. I had never considered the benefits of deliberately keeping a slave server behind a master.

Kristian Koehntopp gives a good example:

Kristian Koehntopp writes:

TDS: Time delayed SQL_THREAD (Have a replication slave that is always lagging 30 minutes behind).

Currently, replication is a rolling recovery: To set up replication you restore from a full dump with a binlog position. You then continously download binlog and roll forward. In case of a master crash a slave is a readily recovered instance (as opposed to a backup, which still has to be restored).

mysql
Read more

MySQL Load Balancer

1 min read

I’m not sure how I didn’t see this earlier, but it looks like MySQL 5.1 is coming with a load balancer for replicated servers. I’m absolutely pumped about this - we’ve got a few sites running with multiple db slaves and it’s so annoying having to check if they’re behind the master.

It looks like the load balancer will automatically pull slaves out if they fall behind, and route connections to the ones that are the most up to date. It is based on MySQL Proxy, which is currently in Alpha.

mysql
Read more

Decimal vs Float in MySQL

1 min read

While I’ve known how floating points are stored, I didn’t know if decimal was stored any differently, or if it was more or less accurate. According to a post on the MySQL list:

Bruno Rodrigues Silva wrote: Dear all.

The MySQL Manual inform that Decimal Data Type is used for represent exact-number, but like Float Point Data Type, Decimal use rounding case the fractional part is not sufficient. Therefore, what the difference?

mysql
Read more

External Libraries in XCode

2 min read

  • c++
  • mysql
  • xcode

I need to compile something and use the MySQL C++ library. I have mysql and mysql++ already compiled, I won’t go over how to do that now.

I added the following code to the top of my source:

#include <mysql++.h>

I got an error

/Users/jhaddad/dev/search_engine/main.cpp:4:21: error: mysql++.h: No such file or directory

Not cool.

How to fix:

In XCode, open up the project settings (under the project menu). Click the build tab. Go down to search paths, and you can change your Header search paths to the correct locations where you installed whatever you’re looking for. In this case, mine was /usr/local/includes and /usr/local/mysql/

c++ mac mysql
Read more

Altering tables in MySQL Cluster 5.0.45

2 min read

I’m setting up my first mysql cluster, and just wanted some clarification on a few things.

In the manual, it says:

Online schema changes. It is not possible to make online schema changes such as those accomplished using ALTER TABLE or CREATE INDEX, as the NDB Cluster engine does not support autodiscovery of such changes. (However, you can import or create a table that uses a different storage engine, and then convert it to NDB using ALTER TABLE tbl_name ENGINE=NDBCLUSTER. In such a case, you must issue a FLUSH TABLES statement to force the cluster to pick up the change.)

cluster mysql
Read more

Replication Issues - Duplicate Key Errors (1062)

3 min read

We were using replication to deal with certain queries that were producing table scans. I realize this is not a great long term solution but we were migrating a web site that was set up this way, so it wasn’t really a choice.

We had a database that was a mix of InnoDB and a few MyISAM tables. The MyISAM tables were used for fulltext searches.

To get a database dump, we were using the command mysqldump --all-databases --single-transaction --master-data=1 > dumpfile.sql

mysql replication
Read more

Checking MySQL Query Cache

1 min read

MySQL query cache can be useful, if it works. Here’s how to check it’s effectiveness.

show status like 'qc%'; +-------------------------+----------+ | Variable_name | Value | +-------------------------+----------+ | Qcache_free_blocks | 6407 | | Qcache_free_memory | 24176544 | | Qcache_hits | 3075026 | | Qcache_inserts | 2435740 | | Qcache_lowmem_prunes | 363018 | | Qcache_not_cached | 157193 | | Qcache_queries_in_cache | 16022 | | Qcache_total_blocks | 39912 | +-------------------------+----------+

You can see here that we have 24Megs of unused query cache. This is essentially wasted memory. Make sure you’re not using a ridiculously high amount, since every time you perform an insert or update to a table it’ll kill any queries in the cache that reference those tables.

mysql
Read more

Calculating the queries per second average in Mysql

1 min read

I didn’t find any simple ways to determine how hard our mysql database was working, so I whipped this up. It uses a 10 second delay to figure out the queries per second average.

time=10 orig=mysql -e “show status” | awk ‘{if ($1 == “Questions”) print $2}’ sleep $time last=mysql -e “show status” | awk ‘{if ($1 == “Questions”) print $2}’ diff=expr $last - $orig avg=expr $diff / $timeecho "$avg"

mysql tips
Read more

MySQL does not support variables with LIMIT in stored procedures

1 min read

After banging my head against a wall for a while, I finally found this thread from June 2005 in the MySQL support forums regarding using variables in the LIMIT clause. They don’t support it yet. There’s no time table that indicates when it will be supported either. It’s very unfortunate, I was looking forward to benchmarking the difference between our ad hoc code and the stored proc version.

This code will not work: SELECT field from table LIMIT limit_var

mysql
Read more

'Database tip: Only Counting Certain Values in an Aggregate Query'

2 min read

Aggregates are awesome. But sometimes you want to do 2 counts in a query, and have one of them be more restrictive than the other. Lets say our database focuses on pictures, and rating them on a scale from 1-100. We want to know the average rating, the minimum rating, the max, the number of ratings, and the number of ratings over 75.

First let’s setup the tables. create table rating ( picture int not null, userid int not null, rating int not null, primary key (picture, userid) ) engine=innodb;

mysql
Read more

Delete from table using a join

1 min read

MySQL won’t let you delete from a table using a subquery that references itself. Fair enough. To get around this, up till now, I’ve used temporary tables. However, I’ve never really liked it, and I’ve always wanted a JOIN delete. Well, apparently it exists.

I’m cleaning out some data from some really old database dump. I needed to preserve the data in the article table, but remove the rows in the city table based on a column in the article table.

mysql
Read more

Using a result set to call stored procedures for ad hoc queries

1 min read

Using mysql -e’s feature, combined with awk and xargs, I was able to call an existing stored procedure repeatedly for a resultset. Yes, I could have written another stored procedure to do this, I realize. But I guess I like doing things the hard way. Either that, or this is just less code. Or I wanted the awk and xargs practice. Whatever.

mysql database -e "select id from category where foreign_key in (2771, 2769, 2766, 2772, 2767)" | awk -F\| '{print $1}' | xargs -ivar mysql database -e "call move_category(var, 5666)"

linux mysql
Read more

What caused that load spike?

1 min read

Every now and then, we find that we will have a sudden increase in the number of apache processes, load average will spike up, and then go back down to normal. In rare cases, we will see the same thing happen, and the load avg spike WAY up, all queries appear locked up, and the server must be rebooted. I am looking for ways of determining what caused this. I should note that it happens extremely rarely, and has never shown up in a load test.

apache linux mysql
Read more

SHOW CREATE TRIGGER - Nope

1 min read

As of MySQL 5.0 there is no SHOW CREATE TRIGGER which is pretty annoying. I don’t like using “show triggers like “, so I figured I’d write a wrapper around mysqldump. For the record, this was a huge pain the ass. I did this on Fedora, GNU sed version 4.1.5. You need to have your password stored in ~./my.cnf for this to work, I believe. I haven’t tried it without it.

mysql
Read more

count(1) vs count(*) - any difference?

2 min read

A while ago when I started my first job in LA, I was using MSSQL. I was told to never use count(), and rather to use count(1). Several people insisted this performs better than count(), and since I really didn’t care to argue or look into it, I just started doing it.

Fast forward to today, where I am still writing count(1), but on MySQL. I’m at a different company now, but working with a lot of the same people. The person who used to be my boss is now my equal, and while mytop was running, noticed the count(1) and was happy to see it. Which reminded me to wonder - does it even matter?

misconception mysql
Read more

Why have REPLACE INTO and INSERT ... ON DUPLICATE KEY UPDATE?

1 min read

REPLACE INTO will actually perform a delete and then an insert, while INSERT … ON DUPLCIATE KEY UPDATE will perform an update (as the name suggests). I would think the latter would be faster. I have not done any performance testing between the two, but it only seems logical the update would be faster than the delete/insert. Please correct me if I’m wrong.

Since the two statements both end up with the same result, I’m not sure yet what the benefit of REPLACE into it.

mysql
Read more

Paging Through Data 2.0

3 min read

For a long time, whenever I wanted to do paging to browse through a table, I used to run 2 queries. The first would get the results, and the second would be an almost identical query, with a count() instead of fields, and I’d use the result of the second query to figure out how many pages I would need.

Now there’s a better solution, and it’s called found_rows(). Say you have a simple select statement:

mysql
Read more

Can you use a stored procedure in a subquery? I don't think so. (MySQL)

2 min read

I do not think you can use the result of a stored procedure in an ad-hoc subquery.

On my social network, LetsGetNuts.com, I have a Friend table. This is the structure:

mysql> describe Friend; +—————-+—————+——+—–+———+——-+ | Field | Type | Null | Key | Default | Extra | +—————-+—————+——+—–+———+——-+ | fkUser | int(11) | YES | MUL | NULL | | | fkFriend | int(11) | YES | MUL | NULL | | | confirmed | enum(‘Y’,‘N’) | YES | MUL | NULL | | | confirmed_date | date | YES | | NULL | | +—————-+—————+——+—–+———+——-+ 4 rows in set (0.02 sec)

letsgetnuts.com mysql
Read more

'MySQL: Number + NULL

2 min read

Maybe not breaking news, but I think it’s interesting enough of a point, and I didn’t really find anything about the topic when I googled it. If you do any addition, subtraction, multiplication, or division (and probably a lot more mathematical functions for that matter) and NULL is one of your values, the entire expression will evaluate to NULL.

For example, this statement returns NULL:

mysql> select 4 + NULL;

+----------+
| 4 + NULL |
+----------+
|     NULL |
+----------+

Normally you wouldn’t do the above in such a simple way, for instance, you might do some addition in a subquery. For example,

mysql tips
Read more

MySQL Triggers Tutorial

3 min read

If you’ve worked with any of the commerical databases other than MySQL, you may have already had some experience with triggers.

I’m glad that the MySQL developers finally managed to squeeze in this extremely important feature. I no longer dread trying to defend MySQL from the Microsoft and Oracle developers who have had triggers for so many years.

I’ll use the example off the MySQL website (explained below in detail). I’ll assume you know how to create a table. If not, please read up on that first.

mysql
Read more