How to Install the Lucene Search Engine using Solr

March 15, 2008 – 10:01 am

I’m going to go through the steps necessary to install and start using Solr. I’ve always been interested in trying out Lucene, but I never felt like dealing with writing my own wrapper around the classes. Solr simplifies this by creating a fully working search engine as a web service.

Let’s get started. First, check that Java is up to date by running:

java -version

You need to be running at least Java 1.5. Next, check whether Ant is installed:

ant -version
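
If you’d rather check both in one go, here’s a rough sketch. The function name missing_tools is made up for this example, and it only checks that each tool is on the PATH. It does not check the version, so you still need to read the java -version output yourself.

```shell
# Print the name of every tool that is NOT found on the PATH.
# Assumption: the executables are named exactly "java" and "ant".
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools java ant)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "java and ant are both on the PATH"
fi
```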

I’m doing this on my Mac using Leopard. Here are some instructions on setting up Ant if you aren’t using Leopard.

Java was up to date, and Ant is built in. Sweet. However, I need to install JUnit. I’ll do that first.

Download JUnit. You’ll need to put it somewhere that’s accessible by the $CLASSPATH variable. There’s more information on the JUnit FAQ.

I threw it in /usr/share, left the name as junit-4.4.jar, and set my CLASSPATH to point to that file (not the directory):

export CLASSPATH=$CLASSPATH:/usr/share/junit-4.4.jar

I ran that, as well as put it in my /etc/bashrc file (which you must be root to edit) so I don’t have to deal with it again.
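
One thing to watch when the export lives in a bashrc: it runs for every new shell, so the same jar can end up appended more than once. Here’s a minimal sketch of a guard. The helper name add_to_classpath is my own invention, and the jar path is just the one from this post.

```shell
# Append a jar to CLASSPATH only if it is not already listed.
# add_to_classpath is a hypothetical helper, not part of JUnit or Ant.
add_to_classpath() {
    case ":${CLASSPATH:-}:" in
        *":$1:"*) ;;                                   # already present, do nothing
        *) CLASSPATH="${CLASSPATH:+$CLASSPATH:}$1" ;;  # append without a stray leading colon
    esac
    export CLASSPATH
}

add_to_classpath /usr/share/junit-4.4.jar
echo "$CLASSPATH"
```

Calling it a second time with the same jar leaves CLASSPATH unchanged.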

Compile Solr

Switch back to the directory containing the Solr files, and run:

ant compile

You should see something like this:

Buildfile: build.xml

init-forrest-entities:

checkJunitPresence:

compile:
[javac] Compiling 185 source files to /Users/jhaddad/src/apache-solr-1.2.0/build
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

BUILD SUCCESSFUL
Total time: 3 seconds

I then ran:

ant dist

That produced output like this:

Buildfile: build.xml

init-forrest-entities:

checkJunitPresence:

compile:

make-manifest:
[mkdir] Created dir: /Users/jhaddad/src/apache-solr-1.2.0/build/META-INF

dist-jar:
[jar] Building jar: /Users/jhaddad/src/apache-solr-1.2.0/dist/apache-solr-1.2.1-dev.jar

dist-war:
[war] Building war: /Users/jhaddad/src/apache-solr-1.2.0/dist/apache-solr-1.2.1-dev.war

dist:

BUILD SUCCESSFUL
Total time: 0 seconds

You can run the example by going to the example directory and running:

java -jar start.jar

Then go here: http://localhost:8983/solr/admin/ and check out your admin.

Load a few sample docs by going here:

/example/exampledocs

and running

java -jar post.jar solr.xml monitor.xml
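
To see whether the docs made it in, you can query the server from the command line. This is only a sketch: it assumes the example server is still running on the default port 8983 with the stock /solr/select handler, and the helper solr_select_url is something I made up for illustration.

```shell
# Build a query URL for the example server.
# Assumptions: default port 8983, stock /solr/select handler, and a query
# term that is already URL-safe (no spaces or special characters).
solr_select_url() {
    printf 'http://localhost:8983/solr/select?q=%s\n' "$1"
}

# Fetch the results as XML, or say so if nothing is listening.
curl -s --max-time 5 "$(solr_select_url solr)" \
    || echo "Solr is not reachable on localhost:8983"
```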

I will post a follow-up on how to get Solr running in Tomcat, as well as examples of how to use the server.

Western Digital Drive with Leopard - “File system formatter failed.”

February 26, 2008 – 11:59 pm

I just bought a 1TB Western Digital drive. I am stoked.

However, I tried to format the drive on my Mac (Leopard) and got the error “File system formatter failed” when I tried to format the disk as MacOS Extended (Journaled). Not cool.

I found this forum thread which suggested using multiple partitions to solve the problem. When I was trying this, I went into options and changed the partition scheme from Master Boot Record to GUID partition table. It formatted fine.

To test things further, I changed the scheme back to 1 partition, and left the format as GUID partition table. This time it worked flawlessly.

The GUID scheme says it will not work as a startup disk for Mac OS X versions older than 10.4. So if you’re still on Panther you can’t boot up off my backup drive. Try not to cry too much.

Executing multiple curl requests in parallel with PHP and curl_multi_exec

February 20, 2008 – 4:17 pm

Let’s get one thing out in the open. Curl is sweet. It does its job very well, and I’m absolutely thrilled it exists.

If you’re using curl in your PHP app to make web requests, you’ve probably realized that by doing them one after the other, the total time of your request is the sum of all the requests put together. That’s lame.

Unfortunately, curl_multi_exec is poorly documented in the PHP manual.

Let’s say that your app is hitting APIs from these servers:

Google: .1s
Microsoft: .3s
rustyrazorblade.com: .5s

Your total time will be .9s, just for API calls.

By using curl_multi_exec, you can execute those requests in parallel, and you’ll only be limited by the slowest request, which is about .5 sec to rustyrazorblade in this case, assuming your download bandwidth is not slowing you down.
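
You can see the same effect without touching PHP. Below is a rough shell sketch that uses background jobs instead of curl_multi_exec, with sleep standing in for network latency. fake_request is a made-up helper, and I’m using whole seconds (1, 1 and 2) rather than the .1/.3/.5 figures above so the difference is easy to measure.

```shell
# Stand-in for one API call: it just sleeps for the given number of seconds.
# (Hypothetical helper; a real script would call curl here.)
fake_request() {
    sleep "$1"
}

start=$(date +%s)
fake_request 1 &    # all three "requests" start at the same time
fake_request 1 &
fake_request 2 &
wait                # block until every background job has finished
elapsed=$(( $(date +%s) - start ))

echo "elapsed: ${elapsed}s (the sequential total would be 4s)"
```

The elapsed time tracks the slowest job, not the sum of all three.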

Sample code:

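$nodes = array('http://www.google.com', 'http://www.microsoft.com', 'http://www.rustyrazorblade.com');
$node_count = count($nodes);

$curl_arr = array();
$master = curl_multi_init();

// Create one handle per URL and attach it to the multi handle.
for ($i = 0; $i < $node_count; $i++)
{
	$curl_arr[$i] = curl_init($nodes[$i]);
	curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
	curl_multi_add_handle($master, $curl_arr[$i]);
}

// Run all transfers at once. curl_multi_select() waits for activity
// so the loop doesn't spin the CPU while the requests are in flight.
do {
	curl_multi_exec($master, $running);
	if ($running > 0 && curl_multi_select($master, 1.0) === -1) {
		usleep(10000); // select can fail immediately; back off briefly
	}
} while ($running > 0);

echo "results: ";
for ($i = 0; $i < $node_count; $i++)
{
	$results = curl_multi_getcontent($curl_arr[$i]);
	echo($i . "\n" . $results . "\n");
	curl_multi_remove_handle($master, $curl_arr[$i]);
	curl_close($curl_arr[$i]);
}
curl_multi_close($master);
echo 'done';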

It’s really not documented on php.net how to use curl_multi_getcontent, so hopefully this helps someone.

External Libraries in XCode

February 6, 2008 – 5:44 pm

I need to compile something that uses the MySQL C++ library. I have MySQL and MySQL++ already compiled; I won’t go over how to do that now.

I added the following code to the top of my source:


#include <mysql++.h>

I got an error:

/Users/jhaddad/dev/search_engine/main.cpp:4:21: error: mysql++.h: No such file or directory

Not cool.

How to fix:

In Xcode, open up the project settings (under the Project menu). Go down to the search paths, and change your Header Search Paths to the locations where you installed whatever you’re looking for. In this case, mine were /usr/local/includes and /usr/local/mysql/

Next time you try to recompile, you’ll get a different error, this time during linking. It might look something like this:

"mysqlpp::Query::store(mysqlpp::SQLQueryParms&)", referenced from:

Right click on your project in the left hand column (file listing), click “Add existing files”, then go to the prebuilt library (for me it was in /usr/local/lib), and add the file. You don’t have to copy it into the directory, you can just add it and it should work. Recompile and enjoy.

Edit: /usr/local/lib won’t be initially visible. Type command-shift-g and it’ll bring up a text field you can type a path into to go directly to a directory.

Tool for Testing APIs on a Mac

February 6, 2008 – 2:48 pm

I wasn’t able to find anything that I liked. I basically wanted a front end for curl with bookmarks.

Check out my cleverly named Api Tester. No docs yet, but I hope it’s self explanatory. Click the plus on the bottom left to add a bookmark. It’s freeware. Or, just download.

Unsubscribing from eBay emails is insane

January 23, 2008 – 11:22 pm

eBay deserves a lot of credit for building a massive system that never seems to be down. Great. But you know what? That doesn’t excuse them for creating some really stupid interfaces or being complete assholes.

At the bottom of an email they sent, I saw this gem next to unsubscribe:

Please note that it may take up to 10 days to process your request.

Ten days. Ten days? Wow.

Now let’s move on to the actual unsubscribe process. What a nightmare. Email preferences are grouped into sections, and only sections can be edited. All the options are hidden, so you must first expand them to see what you’re getting.

Seriously, check this thing out. I never use thumbnails either, so you know I’m pissed.

picture-1.png

This process makes me really start to hate these guys.

Getting phpsh to work on a mac

December 18, 2007 – 11:14 am

I had an issue getting phpsh to work on my mac - I kept getting the following error:

Traceback (most recent call last):
  File "./phpsh", line 20, in
    import readline

OK, seems easy enough. So I compiled python with readline support.

./configure --prefix=/usr/local/python --enable-readline

I changed the PATH variable in my .bash_profile to point to the /usr/local/python directory first, and source’d it to get the new PATH settings. I still got the same error.
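
If you end up in the same spot, it’s worth confirming which interpreter the shell actually picks up and whether that one can import readline. Here’s a quick sketch; has_readline is a name I made up, and it assumes the interpreter you care about is the one called python on your PATH.

```shell
# Succeeds only if the given interpreter can import the readline module.
# has_readline is a hypothetical helper for this check.
has_readline() {
    "$1" -c 'import readline' >/dev/null 2>&1
}

# Assumption: "python" is whatever your PATH resolves first.
echo "python on PATH: $(command -v python || echo none)"
if has_readline python; then
    echo "readline is available"
else
    echo "readline is missing for this interpreter"
fi
```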

Usually at this point I’d like to tell you what was wrong, and how I fixed it. But you know what - I never figured it out. I just installed the python binary from python.org. And it works.

Sometimes it’s just not worth fighting the battle.

ob_start() causes phpsh to hang, sort of

December 15, 2007 – 3:19 pm

If you manually call ob_start() at the beginning of your script, you might notice that you are unable to use phpsh. By commenting it out, I was able to fix the issue.

I don’t think it technically hangs, it just sits there with the data in a buffer waiting to be flushed.

Regex Coach Mac Substitute

December 2, 2007 – 12:02 pm

For a while I had to use a Windows box for my development. Fortunately, I was able to make a good case for my company to switch me to a MacBook Pro.

One of the tools that took me a while to track down was a regex testing app. On Windows, I was a big fan of Regex Coach, which is a great app.

I was happy to find QuRegExmm. It’s not nearly as feature rich as Regex Coach, but it’s good enough for me, since I really only want to test regular expressions and nothing else.

Screenshot

PHP: An Array of Months

November 16, 2007 – 2:11 am

I’ll be honest, this isn’t very useful. The goal was to have an array of months in the least amount of code.


for ($i = 1; $i <= 12; $i++)
	$months[$i] = date('F', strtotime("{$i}/01/2000"));
