Executing multiple curl requests in parallel with PHP and curl_multi_exec
Let’s get one thing out in the open. Curl is sweet. It does its job very well, and I’m absolutely thrilled it exists.
If you’re using curl in your PHP app to make web requests, you’ve probably realized that when you run them one after the other, the total time is the sum of all the individual request times. That’s lame.
Unfortunately, using curl_multi_exec is poorly documented in the PHP manual.
Let’s say that your app is hitting APIs from these servers:
Google: .1s
Microsoft: .3s
rustyrazorblade.com: .5s
Your total time will be .9s, just for API calls.
By using curl_multi_exec, you can execute those requests in parallel, and you’ll only be limited by the slowest request, which is about .5s to rustyrazorblade.com in this case, assuming your download bandwidth is not slowing you down.
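To make the arithmetic concrete, here is a minimal sketch using the latencies listed above, expressed in milliseconds to avoid floating-point noise. The variable names are illustrative only, and the numbers are the example figures, not measurements.

```php
<?php
// Latencies from the example above, in milliseconds.
$latencies_ms = array(
    'Google'              => 100,
    'Microsoft'           => 300,
    'rustyrazorblade.com' => 500,
);

// One after the other: the waits add up.
$sequential_ms = array_sum($latencies_ms);

// In parallel: bounded by the slowest request (ignoring bandwidth limits).
$parallel_ms = max($latencies_ms);

echo "sequential: {$sequential_ms} ms, parallel: {$parallel_ms} ms\n";
```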
Sample code:
$nodes = array('http://www.google.com', 'http://www.microsoft.com', 'http://www.rustyrazorblade.com');
$node_count = count($nodes);
$curl_arr = array();
$master = curl_multi_init();
for($i = 0; $i < $node_count; $i++)
{
    $url = $nodes[$i];
    $curl_arr[$i] = curl_init($url);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($master, $curl_arr[$i]);
}
do {
    curl_multi_exec($master, $running);
} while($running > 0);
echo "results: ";
for($i = 0; $i < $node_count; $i++)
{
    $results = curl_multi_getcontent($curl_arr[$i]);
    echo($i . "\n" . $results . "\n");
}
echo 'done';
It’s really not documented on php.net how to use curl_multi_getcontent, so hopefully this helps someone.
78 Responses to Executing multiple curl requests in parallel with PHP and curl_multi_exec
Stern87,
I’m pretty sure your issue is related to the browser, not curl. Try running all 10 commands at the same time from 10 command lines (just use wget or curl itself) and see if you’re still throttled. If not, it could be a server issue, but might just be a network thing.
However, it’s possible it’s an apache issue – try setting MaxSpareServers to something like 100, restart, and make sure if you do “ps aux | grep httpd” you see a ton of processes.
You’ll also want to determine if it’s actually curl that’s causing the problem, so you could throw in a sleep(1000) in your script, and see what happens when you load the 10 links.
Finally – and possibly more importantly, are you loading 10 asp pages on a different server? If so, everything we’ve discussed thus far has absolutely nothing to do with curl and everything to do with whatever server you’re trying to load the page of.
This will busy-loop, which is not good. Check out the CurlObjects implementation.
—
do {
curl_multi_exec($master,$running);
} while($running > 0);
—-
I don’t see how the curl objects implementation is any better, it still has to wait on all requests to finish, if I’m reading it correctly. Just because there’s more code in the while loop doesn’t make it any better.
By all means, correct me if I’m wrong, as I just glanced at the class.CurlBase.php file.
wow.. this helped me tons..
I was only getting 20 requests a minute because of latency, but with this it’s not as much of an issue.
I’m getting around 100 requests, give or take.
Hi,
I wrote a little function which helps me get a web page’s content and deal with it in my program:
[code]
function request($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $res = curl_exec($ch);
    curl_close($ch);
    return $res;
}
[/code]
But when I call this function a lot, e.g. in a loop, it gets slow, so I’m wondering if there is a trick to speed it up.
If it’s important to know what my program does, I can send it to your email.
Thank you very much.
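One possible trick for the question above, as a minimal sketch: create the curl handle once and reuse it for every request, so libcurl can keep connections open between calls instead of setting everything up again each time. This mostly helps when many of the URLs share a host; for unrelated hosts, the curl_multi approach from the article is the better fit. The name request_many() is hypothetical.

```php
<?php
/**
 * Fetch several URLs one after another over a single, reused curl handle.
 * request_many() is a hypothetical name used for this sketch.
 */
function request_many(array $urls)
{
    $ch = curl_init();                               // one handle for the whole batch
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return bodies instead of printing them

    $out = array();
    foreach ($urls as $key => $url) {
        curl_setopt($ch, CURLOPT_URL, $url);         // only the URL changes per request
        $out[$key] = curl_exec($ch);                 // body as a string, or false on failure
    }

    return $out;
}

// Usage: $pages = request_many(array('http://www.google.com', 'http://www.microsoft.com'));
```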
Hi,
It is very useful to run multiple URLs in parallel,
but I ran into a problem with the array.
For example, I have multiple URLs in an array and I ran array_filter on them, so the array keys no longer run 0, 1, 2, 3, ...
The array keys are like 3, 4, 7, 8, 10, 14.
And i would like to use your function to run them in parallel but array key not match if using for i=0,i $no as $url. At here i get many errors of variables n i unable to solve them.
Most error occur at here
$curl_arr = curl_init($url);
and here
$results = curl_multi_getcontent ( curl_init($url) );
sometimes i even get no value.
Can anyone give some example?
Sorry for my bothering.
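A hedged sketch of one way around the key problem described above: loop with foreach so the real keys are used everywhere, store each handle in the array under its key (rather than overwriting the array), and read the content back from that same stored handle instead of from a fresh curl_init(). The function name fetch_keyed() is hypothetical; the wait loop follows the pattern shown in the PHP manual.

```php
<?php
/**
 * Fetch several URLs in parallel, keeping whatever keys the input array has.
 * fetch_keyed() is a hypothetical name used for this sketch.
 */
function fetch_keyed(array $urls)
{
    $master   = curl_multi_init();
    $curl_arr = array();

    // foreach hands back the real keys (3, 4, 7, ...), so nothing assumes 0..n-1.
    foreach ($urls as $key => $url) {
        $curl_arr[$key] = curl_init($url);   // store the handle under its own key
        curl_setopt($curl_arr[$key], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($master, $curl_arr[$key]);
    }

    // Run all transfers, waiting for activity between rounds.
    do {
        $status = curl_multi_exec($master, $running);
        if ($running > 0) {
            curl_multi_select($master, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = array();
    foreach ($curl_arr as $key => $handle) {
        // Read from the SAME handle that was added, not from a new curl_init().
        $results[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($master, $handle);
    }
    curl_multi_close($master);

    return $results;
}

// Usage: $pages = fetch_keyed(array(3 => 'http://www.google.com', 7 => 'http://www.microsoft.com'));
```

Because the results come back under the original keys, each page can also be picked out into its own variable, for example $pages[3].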
How can I minimize CPU usage when using the curl functions? My server only allows under 10% CPU usage.
Hi, thanks for the tutorial, but how would I take the outputs from the URLs and put them into their own variables? I am making something that goes through a list of URLs and analyzes each one. It takes a long time, but with this it should be much faster; I just don’t know how to implement that. Cheers.
Hey Tom,
You could do this:
$results[] = curl_multi_getcontent ( $curl_arr[$i] );
Hi,
Thanks very much,
I use this function to process pieces of my webpages containing php and mysql code in parallel.
I created a main php page that will call client php pages on localhost (or even another server). This way I can work around the fact that php is not multitasking. I can even use multiple servers at once as in cloud computing.
Kind regards,
Thank you very much!
Lovely example on how to use the curl multi functions, really appreciated!
Regards
Great examples. I have the following problem -
A client sends 10 requests to a server and uses the same URL. The individual sub-requests are sent as part of the body, via POST, with delimiter characters to tell one URL from another (a kind of sub-request URL). The idea is that the server gets one request, grabs these sub-request URLs, fires off parallel requests to other servers to get the data, and then serves the client. The way I see it, we need just one curl handle, not 10 added with curl_multi_add_handle, right? Can we get all the responses on one handle? Is there an example of this kind? Thanks.
Thanks – this made it simple to integrate the changes with my web app which is running “much” quicker now!
First of all, very nice post jon.
Now my question is: does curl_multi_getcontent return the variables of the website that you are accessing?
For example, let’s say I have WEBSITE 1, which sends POST data to WEBSITE 2, which executes a function with that POST data and then returns a result. However, parts of this result are variables, like this:
echo "The " . $animal . " jumped over the fence.";
When I try to return this result on WEBSITE 1 using curl_multi_getcontent all it returns is The (blank) jumped over the fence.
Am I doing something wrong or is it just simply the curl_multi_getcontent function that can’t return the values of variables?
Thanks in advance.
Can someone help me rewrite all the links in a website so I can open any website through my own page? For example, I developed a page using curl: opening example.php displays http://www.yahoo.com, but the problem I have now is that the links still point to Yahoo. I hope you can understand.
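I will use this code for getting TITLE tags. I can run it with up to 100 URLs; beyond that I get timeouts. 80 URLs take about 20 seconds or so.
Here is my fetch TITLEs code:
//-----------------------
$nodes = array(
    'http://www.goole.com',
    'http://www.microsoft.com',
    // add more if needed, or feed from mysql db
);
$node_count = count($nodes);
$curl_arr = array();
$ch = curl_multi_init();
$cho = curl_init();
// The blog software ate everything between "$i <" and "> 0);" here.
// The two loops below are reconstructed from the article's sample (an assumption).
for ($i = 0; $i < $node_count; $i++) {
    $curl_arr[$i] = curl_init($nodes[$i]);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($ch, $curl_arr[$i]);
}
do {
    curl_multi_exec($ch, $running);
} while ($running > 0);
for ($i = 0; $i < $node_count; $i++) {
    $results = curl_multi_getcontent($curl_arr[$i]);
    $inhalt = $results;
    // The tag names were stripped too; <title> and </title> are assumed.
    $a = explode("</title>", $inhalt);
    $b = explode("<title>", $a[0]);
    $title = isset($b[1]) ? $b[1] : '';
    $title = str_replace("'", "", $title);
    $title = str_replace("\"", "", $title);
    // $title = ereg_replace("[^A-Za-z0-9]", " ", $title);
    $title = preg_replace("/[^A-Za-z0-9\s\s+]/", " ", $title);
    $title = trim($title);
    echo $i . "\n" . $title;
}
echo 'done';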
Seems something went wrong. Here it is again: get TITLE tags from websites (with MySQL feeding).
———————————————–
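// Note: the mysql_* functions were removed in PHP 7; mysqli or PDO are the current equivalents.
$result = mysql_query("SELECT * FROM websites LIMIT 10;") or die(mysql_error());
$node_count = mysql_num_rows($result);
while ($row = mysql_fetch_assoc($result)) {
    $dom[] = $row['cndomain'];
    $nodes[] = "http://www." . $row['cndomain'];
}
$curl_arr = array();
$master = curl_multi_init();
$ch = curl_init();
// The text between "$i <" and "> 0);" was lost when posting.
// The two loops below are reconstructed from the article's sample (an assumption).
for ($i = 0; $i < $node_count; $i++) {
    $curl_arr[$i] = curl_init($nodes[$i]);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($master, $curl_arr[$i]);
}
do {
    curl_multi_exec($master, $running);
} while ($running > 0);
for ($i = 0; $i < $node_count; $i++) {
    $results = curl_multi_getcontent($curl_arr[$i]);
    $inhalt = $results;
    // The tag names were stripped too; <title> and </title> are assumed.
    $a = explode("</title>", $inhalt);
    $b = explode("<title>", $a[0]);
    $title = isset($b[1]) ? $b[1] : '';
    $title = str_replace("'", "", $title);
    $title = str_replace("\"", "", $title);
    $title = preg_replace("/[^A-Za-z0-9\s\s+]/", " ", $title);
    $title = trim($title);
    echo $dom[$i] . " - " . $title;
}
curl_multi_close($master);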
———————————————–
Improvement suggest welcome!
BTW, cURL seems very CPU resources heavy.
Both my above samples are displayed wrong. There seem to be a problem here posting code.
Anyway, I found some great samples
http://code.google.com/p/rolling-curl/
http://code.google.com/p/rolling-curl/source/browse/#svn/trunk
The example.php worked out of the box for me, the other one did not though.
I grab 20 URLs in one go from mySQL. The processing takes 3-30 Seconds, depending how many dead URLs in that list I guess.
Interesting is that for that method the CPU load is very low and it should be no problem to run a few pages with 20 URLs each parallel (that’s on Xp + XAMPP)
Thanks. This is exactly what I needed.
thanks – this totally did the trick!
Hello, when you use this function you must know it will take a lot of CPU and memory, so be careful.
Thanks
@DarkCoder [..]Both my above samples are displayed wrong. There seem to be a problem here posting code.[..]
Code posted on WordPress blogs (like this one) is salted to prevent it from executing. When copying code from the examples, double and single quotes need to be replaced with their genuine ASCII counterparts. Your code has issues here too: for($i = 0; $i 0); needs to be fixed, and the usage of regular expressions for title matching is greedy on time and resources. Here’s a better way to do it (though just as crude, for example purposes only):
//===================================================
# PageLoad Timer: Part A (Top segment)
$starttime = microtime(true);
//===================================================
$nodes = array(
    "http://www.rustyrazorblade.com/",
    "http://www.iana.org/domains/example/",
    "http://www.php.net/",
    "http://www.search.com",
    "http://ac.com/",
    "http://www.goole.com/",
    "http://ad.com/",
    "http://www.phpcoders.com/",
    "http://www.ah.com/",
    "http://www.javacoders.com/",
    "http://www.al.com/",
    "http://www.cars.com/");
$node_count = count($nodes);
$curl_arr = array();
$master = curl_multi_init();
// The text between "$i<" and ">0);" was lost when posting.
// The two loops below are reconstructed from the article's sample (an assumption).
for ($i = 0; $i < $node_count; $i++) {
    $curl_arr[$i] = curl_init($nodes[$i]);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($master, $curl_arr[$i]);
}
do {
    curl_multi_exec($master, $running);
} while ($running > 0);
echo "RESULTS:";
for ($i = 0; $i < $node_count; $i++) {
    $results = curl_multi_getcontent($curl_arr[$i]);
    // The tag names were stripped too; <title> and </title> are assumed.
    $start = strpos($results, "<title>");
    $end = strpos($results, "</title>", $start);
    $titles = substr($results, $start, $end - $start);
    echo "Titles: |", trim(strip_tags(str_replace("\n", "", $titles))), "|";
}
//===================================================
# PageLoad Timer: Part B (Bottom segment - output):
echo "============================================================";
$totaltime = round(microtime(true) - $starttime, 6);
echo "Pageload time took $totaltime seconds." . PHP_EOL;
echo "============================================================";
enjoy
This article was a pleasure to read, thank you all so much for the additional comments, examples and links.
I have been using the Simple HTML DOM Parser requesting pages with file_get_html. I am finding it to be good, but slower than I would like it to be, so when I finally found this article about processing multiple requests in parallel I was excited.
My experience with the extracting the information I want from each page took me to using the Simple HTML DOM Parser as it seemed a great deal more tolerant than other methods.
Can I mix the usage of curl_multi_getcontent and the Simple HTML DOM Parser, or is that just insane? I am new to scraping with PHP (can you tell?)
Stu, I don’t see why not. Looking at the docs I saw this:
// Create a DOM object from a string
$html = str_get_html('<html><body>Hello!</body></html>');
That seems like it’ll do the trick.
I was wondering what kind of techniques you guys are using for mining the data retrieved? I have built an array of URLs, then retrieve the pages using curl_multi_getcontent, writing the content to an array and then mining the array, appending new lines with the extracted data I want, BUT I have started to run into memory issues.
Am I approaching this all wrong?
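One way to keep memory down, offered as a sketch rather than a definitive answer: instead of storing every full page in an array and mining it afterwards, extract what you need from each page as soon as its transfer finishes and keep only that. The example below uses curl_multi_info_read() to pick up finished transfers and CURLOPT_PRIVATE to remember which key each handle belongs to. The names fetch_and_extract() and page_title() are hypothetical, and page_title() is a deliberately crude extractor that only matches a plain <title> tag.

```php
<?php
/**
 * Crude title extractor: returns the text between <title> and </title>,
 * or an empty string when either tag is missing.
 */
function page_title($html)
{
    $start = stripos($html, '<title>');
    if ($start === false) {
        return '';
    }
    $start += strlen('<title>');
    $end = stripos($html, '</title>', $start);
    if ($end === false) {
        return '';
    }
    return trim(substr($html, $start, $end - $start));
}

/**
 * Fetch all $urls in parallel and pass each body to $extract as soon as its
 * transfer finishes, keeping only what $extract returns.
 */
function fetch_and_extract(array $urls, $extract)
{
    $master = curl_multi_init();
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PRIVATE, (string) $key);  // tag the handle with its key
        curl_multi_add_handle($master, $ch);
    }
    unset($ch);

    $kept = array();
    do {
        $status = curl_multi_exec($master, $running);

        // Pick up every transfer that has finished since the last round.
        while ($done = curl_multi_info_read($master)) {
            $ch  = $done['handle'];
            $key = curl_getinfo($ch, CURLINFO_PRIVATE);
            $kept[$key] = call_user_func($extract, (string) curl_multi_getcontent($ch));
            curl_multi_remove_handle($master, $ch);
            unset($ch, $done);  // drop our references so the page body can be freed
        }

        if ($running > 0) {
            curl_multi_select($master, 1.0);  // wait for activity instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    curl_multi_close($master);
    return $kept;
}

// Usage: $titles = fetch_and_extract(array('php' => 'http://www.php.net/'), 'page_title');
```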
You have a problem in the loop calling curl_multi_exec(). You need to introduce a call to either usleep() or curl_multi_select() to prevent PHP from endlessly calling curl_multi_exec() and eating up all available CPU time. If you make a request that takes a long time to complete, you can see this happening just by watching the output of top. Preferably, you should use curl_multi_select() once you stop receiving CURLM_CALL_MULTI_PERFORM from curl_multi_exec(). This keeps PHP from calling curl_multi_exec() before any data has come back from one of the requests: instead of calling it 1000x a second while waiting on network I/O, the script waits for network activity, because nothing could have changed before curl_multi_select() returns. I hope I made this clear enough. Right now your script just sits around using 100% CPU time while waiting for a response. There’s no need for that when you can sleep until the status of the multi handle changes.
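The loop described in the comment above might look like the following minimal sketch. The helper name run_multi() is hypothetical; the curl_multi_* calls and constants are the ones the comment mentions. The short usleep() fallback is an added assumption for the case where curl_multi_select() returns -1 without waiting.

```php
<?php
/**
 * Drive every transfer on $master to completion without spinning the CPU.
 * run_multi() is a hypothetical helper name for this sketch.
 */
function run_multi($master)
{
    $running = 0;
    do {
        // Older libcurl can answer CURLM_CALL_MULTI_PERFORM, meaning
        // "call me again right away"; newer versions do not return it.
        do {
            $status = curl_multi_exec($master, $running);
        } while ($status === CURLM_CALL_MULTI_PERFORM);

        if ($status !== CURLM_OK) {
            break;  // give up on a multi-level error
        }

        // Block until a socket has activity or one second passes.
        if ($running > 0 && curl_multi_select($master, 1.0) === -1) {
            // -1 means select could not wait on anything;
            // pause briefly rather than looping flat out.
            usleep(1000);
        }
    } while ($running > 0);

    return $status;
}

// Usage with the article's variables:
//   run_multi($master);  // replaces: do { curl_multi_exec($master, $running); } while ($running > 0);
```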