Executing multiple curl requests in parallel with PHP and curl_multi_exec
Let’s get one thing out in the open. Curl is sweet. It does its job very well, and I’m absolutely thrilled it exists.
If you’re using curl in your PHP app to make web requests, you’ve probably realized that by doing them one after the other, the total time of your request is the sum of all the requests put together. That’s lame.
Unfortunately, using curl_multi_exec is poorly documented in the PHP manual.
Let’s say that your app is hitting APIs from these servers:
Google: .1s
Microsoft: .3s
rustyrazorblade.com: .5s
Your total time will be .9s, just for API calls.
By using curl_multi_exec, you can execute those requests in parallel, and you’ll only be limited by the slowest request, which is about .5 sec to rustyrazorblade in this case, assuming your download bandwidth is not slowing you down.
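The arithmetic above can be checked with a few lines of PHP. This is only a sketch of the timing model: the latencies are the illustrative figures from the list above, not measurements, and the variable names are made up for the example.

```php
<?php
// Response times from the example above, in seconds (illustrative, not measured).
$latencies = array(
    'Google'              => 0.1,
    'Microsoft'           => 0.3,
    'rustyrazorblade.com' => 0.5,
);

// One after the other: the waits add up.
$sequential = array_sum($latencies);

// In parallel: the slowest request sets the total (ignoring bandwidth limits).
$parallel = max($latencies);

printf("sequential: %.1fs, parallel: %.1fs\n", $sequential, $parallel);
```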
Sample code:
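// URLs to fetch in parallel
$nodes = array('http://www.google.com', 'http://www.microsoft.com', 'http://www.rustyrazorblade.com');
$node_count = count($nodes);

$curl_arr = array();
$master = curl_multi_init();

// Create one easy handle per URL and attach it to the multi handle
for($i = 0; $i < $node_count; $i++)
{
    $url = $nodes[$i];
    $curl_arr[$i] = curl_init($url);
    // Return the body as a string instead of printing it
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($master, $curl_arr[$i]);
}

// Run all transfers until none are still active
do {
    curl_multi_exec($master, $running);
    // Wait for network activity rather than spinning in a tight loop
    if($running > 0) {
        curl_multi_select($master);
    }
} while($running > 0);

echo "results: ";
for($i = 0; $i < $node_count; $i++)
{
    // Fetch the body collected for each handle
    $results = curl_multi_getcontent($curl_arr[$i]);
    echo( $i . "\n" . $results . "\n");
}
echo 'done';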
It’s really not documented on php.net how to use curl_multi_getcontent, so hopefully this helps someone.
78 Responses to Executing multiple curl requests in parallel with PHP and curl_multi_exec
You are right, the lack of documentation about this on the php.net website is sad. I’m glad you put a link to this guide as a comment.
You forgot to close your handles I think….
//close the handles
inside the last for loop:
curl_multi_remove_handle($curl_arr[$i]);
after the last for loop:
curl_multi_close($master);
also see the PHP manual:
http://cn.php.net/manual/en/function.curl-multi-init.php
small mistake, it should read:
curl_multi_remove_handle($master, $curl_arr[$i]);
Hey Thijs,
I’m pretty sure it’s right as is. The for loop is to add each of the curl handles to the multihandle. If I removed them, it wouldn’t work. Try my above code, it’ll work.
Hi Jon,
I mean it’s better to close the handles after you use them (of course not before).
Something like this:
[code]
for($i = 0; $i < $node_count; $i++)
{
$results = curl_multi_getcontent ( $curl_arr[$i] );
echo( $i . "\n" . $results . "\n");
url_multi_remove_handle($curl_arr[$i]);
}
url_multi_remove_handle($master, $curl_arr[$i]);
echo 'done';
[/code]
Hi Jon,
I mean it’s better to close the handles after you use them (of course not before).
Something like this:
[code]
for($i = 0; $i < $node_count; $i++)
{
$results = curl_multi_getcontent ( $curl_arr[$i] );
echo( $i . "\n" . $results . "\n");
curl_multi_remove_handle($master, $curl_arr[$i]);
}
curl_multi_close($master);
echo 'done';
[/code]
ps. please disregard the previous comment from me. Difficult to copy/paste code here
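Combining the sample from the post with the cleanup suggested in these comments, a minimal sketch might look like the following. The function name fetch_all and the one-second select timeout are assumptions made for illustration, not something from the post. The handle-related calls are the ones already discussed in this thread, plus curl_multi_select() to avoid a busy loop.

```php
<?php
// Hypothetical helper (not from the original post): fetch several URLs in
// parallel and release every handle afterwards.
function fetch_all(array $urls): array
{
    $master  = curl_multi_init();
    $handles = array();

    foreach ($urls as $key => $url) {
        $handles[$key] = curl_init($url);
        curl_setopt($handles[$key], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($master, $handles[$key]);
    }

    // Drive the transfers. curl_multi_select() waits for activity (or the
    // assumed 1 second timeout) so the loop does not spin at full CPU.
    do {
        $status = curl_multi_exec($master, $running);
        if ($running > 0) {
            curl_multi_select($master, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect each body, then detach the handle from the multi handle.
    $results = array();
    foreach ($handles as $key => $handle) {
        $results[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($master, $handle);
    }
    curl_multi_close($master);

    return $results;
}
```

Called as fetch_all($nodes), this would return the response bodies keyed the same way as the input array.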
Hi, quick question: what would be the most efficient way to use cURL to grab a page AND the header, but only display the page in the user’s browser? I would ideally like to do this by opening only one cURL session. Currently I have the following code, which, as you can see, is intended to pass along all $_GET and $_POST info but has no mechanism to keep track of cookies:
$target_domain = 'http://targetdomain.com/fowlder/page.cfm';
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, "{$target_domain}?{$_SERVER['QUERY_STRING']}");
curl_setopt ($ch, CURLOPT_HEADER, true); // this displays header info in the user's browser, but I really want to just load it into variables
curl_setopt ($ch, CURLOPT_POST, true);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $_POST);
curl_exec ($ch);
curl_close ($ch);
Hey Sean,
I poked around the PHP site, and there’s actually someone who wrote a function to do this.
Give it a try, and please let me know if it works.
http://us3.php.net/manual/en/function.curl-setopt.php#42009
All you need is this part, btw:
list($response_headers,$response_body) = explode("\r\n\r\n",$r,2);
Enjoy.
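A minimal sketch of that split is shown below. The helper name split_response and the sample response string are made up for illustration. In practice $raw would be the return value of curl_exec() with both CURLOPT_HEADER and CURLOPT_RETURNTRANSFER set to true. Note the assumption that the response holds a single header block: redirects or a "100 Continue" reply can prepend extra header blocks, which this simple split does not handle.

```php
<?php
// Hypothetical helper: split a raw HTTP response into header text and body.
function split_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    // If no blank line is found, treat everything as headers.
    return array($parts[0], $parts[1] ?? '');
}

// Made-up response standing in for the return value of curl_exec() when
// CURLOPT_HEADER and CURLOPT_RETURNTRANSFER are both true.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>ok</html>";

list($response_headers, $response_body) = split_response($raw);

// Only the page is sent on; the headers stay in a variable.
echo $response_body;
```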
Hello,
I need some advice, please:
I am trying to connect to a website using curl, but it requires JavaScript and I am not able to see what I need. Is there another solution for this?
Regards
You’ll have to figure out which JavaScript file you’ll need, then run your code through a JavaScript interpreter. To be honest, I’ve never done it, and I have no idea how good they are.
Here’s 1 project:
http://j4p5.sourceforge.net/
@george:
In the past, when faced with that problem, I ended up writing a few regular expressions to get the data I needed directly out of the javascript files.
Depending on what you’re trying to achieve, you may find this type of approach simplest.
My problem is:
the first request is the login => http://192.168.1.100:8088/asterisk/manager?action=login&username=admin&secret=123
and the second request is => http://192.168.1.100:8088/asterisk/rawman?action=sipeers
but the second request does not succeed.
The browser says: Response: Error Message: Authentication Required
WHY???????????????
I am already logged in.
Fantastic! I came here from the PHP documentation and this is exactly what I was looking for. And, I agree with Thijs–you should close your handles after you are done using them.
Not nice with pages that have moved
(301, 302 HTTP status codes)
Hi all
I need some quick advice: I have a program crawling many pages from the same server with many file_get_contents() calls. Do you think I would get some improvement by replacing them with a single curl_multi_exec() call to retrieve them all at once, or will it probably take the same time since they all come from the same server?
If you have the bandwidth, it’ll be faster to use curl_multi_exec.
Does anyone know how to catch redirected pages?
I have a problem like this:
URL1 -> URL2 -> URL3
-> means redirect
So I want to get the content of URL3 by entering only URL1. Does anyone know how to solve this problem?
Thx..
Coliq, try this:
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
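In context, that option might be used as in the sketch below. The helper name make_redirect_handle, the cap of 5 redirects, and the placeholder URL are assumptions for illustration. CURLOPT_MAXREDIRS and CURLINFO_EFFECTIVE_URL are standard cURL options that are not mentioned in the thread.

```php
<?php
// Hypothetical helper: build a handle that follows redirects to the final URL.
function make_redirect_handle(string $url)
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true); // follow Location: headers
    curl_setopt($curl, CURLOPT_MAXREDIRS, 5);         // assumed cap, guards against redirect loops
    return $curl;
}

// Usage (needs network access; http://example.com/url1 is a placeholder):
// $curl      = make_redirect_handle('http://example.com/url1');
// $content   = curl_exec($curl);                            // body of the last page in the chain
// $final_url = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL); // the URL that was finally fetched
```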
Just thinking out loud….
Wouldn’t it be an idea to use some ajax to load each API’s data on its own, which would mean in this case the first response would arrive at .1s from Google? I mean, put the cURL part in a function in a separate PHP file, then call that PHP file from a div with the proper variables using an onload function.
just my thoughts and nothing concrete (yet)
Yes, you can execute multiple requests through ajax, but what if you’re not using PHP through the web, or the calls you’re making require a secret api key?
If it’s only for your own use (as I assume that’s what you mean by not through the web), then I do not see much need to worry about performance or neat-looking ajax GIFs telling the user something is going on. I would use your way then. The ajax thought came to me as I was reading your post and thinking about ways to handle this.
Hi Jon,
I’m trying to fill in the documentation gaps in cURL – I hope it’s okay to include something similar to this example?
Thanks,
Ross
Sure. Feel free to link to this page as well.
Damn, that saved me a lot of time doing nasty API calls and writing them in a temporary file instead of polling the API directly.
Thanks for the good work!
Thank you very much for this. Do you know how many URLs curl_multi_exec() can handle at once and not crash my application? Is it just dependent on my Memory?
Hey Jeff,
I haven’t had a chance to try to max out curl_multi_exec. I suspect you’ll hit a bottleneck on your network before you’ll run out of memory.
Thanks for your reply Jon. Just like you said, I tried it with 20 URLs at once and it crashed Apache.
Jeff, there must be something whacky with your Apache installation. I’m doing something fairly similar to the above code, and when I clear the entire cache, it calls out to well over 20 URLs. I’ve yet to see Apache (2.2.6 – standard Slackware binary package) crash on it. (2.2.8 and later crash on me even without calling out to 20+ URLs, curiously. I’m pretty sure it’s just a packaging screwup, though, as I’ve tested a locally compiled 2.2.10 without hitting any crashes on cURL.)
Cheers,
– Dave
I thought this was a great piece of code, and it was very useful. However, I think I can improve on it.
In the original example, you showed how to reduce the wait from 0.9 to 0.5 seconds. The question arises: if one website is very slow, you could hold up your app for a long time. It would be great if we could work on each returned file as soon as it arrives, rather than wait for ALL files in the “multi handle”. The function (and its call) should echo out as soon as a site is downloaded, then, after all pages are downloaded, return an array in the same order as the input $nodes.
function getMultipleDocuments($nodes, $referer){
    set_time_limit(90);
    if(!$referer){
        $referer = $nodes[0];
    }
    $node_count = count($nodes);
    $curl_arr = array();
    $master = curl_multi_init();
    for($i = 0; $i < $node_count; $i++)
    {
        $curl_arr[$i] = curl_init($nodes[$i]);
        curl_setopt($curl_arr[$i], CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($curl_arr[$i], CURLOPT_FRESH_CONNECT, true);
        curl_setopt($curl_arr[$i], CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl_arr[$i], CURLOPT_REFERER, $referer);
        curl_setopt($curl_arr[$i], CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($master, $curl_arr[$i]);
    }
    $finalresult = array();
    $returnedOrder = array();
    do{
        curl_multi_exec($master, $running);
        // Drain every finished transfer; several can complete in the same pass
        while(($info = curl_multi_info_read($master)) !== false){
            $index = array_search($info['handle'], $curl_arr, true);
            $finalresult[] = curl_multi_getcontent($info['handle']);
            $returnedOrder[] = $index;
            curl_multi_remove_handle($master, $info['handle']);
            curl_close($curl_arr[$index]);
            echo 'downloaded '.$nodes[$index].'. We can process it further straight away, but for this example, we will not.';
            ob_flush();flush();
        }
    }while($running > 0);
    curl_multi_close($master);
    set_time_limit(30);
    // Key each document by its position in $nodes, then sort so the result follows the input order
    $documents = array_combine($returnedOrder, $finalresult);
    ksort($documents);
    return $documents;
}
$nodes = array('http://mediumSpeedSite.org', 'http://fastSpeedSite.com', 'http://quiteSlowSite.com');
$returnedDocs = getMultipleDocuments($nodes, null);
thanks for posting this, there is little to no documentation on the php.net website, this cleared things up
Thanks for sharing. I made some modifications so that each request is processed as soon as it completes. I’ve found that makes things a lot faster, particularly when you’re dealing with a large number of requests:
http://onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/
How does one reduce the time it takes for a cURL script to execute? Currently I am trying to execute 1 URL. I am told it is only taking .2 seconds on the other end for the server to respond, but it is taking roughly 20 seconds for the cURL script to fully execute and return a response to me. Any ideas?
What else are you doing on the page?
The simplest way to find out what’s taking a long time is to time the various parts of the page using microtime(true). If you want to be more advanced, look into xdebug and kcachegrind (or webgrind)
Check it out here: http://www.rustyrazorblade.com/2007/07/26/php-setting-up-xdebug-with-kcachegrind/
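A minimal sketch of the microtime(true) approach follows. The helper name timed and the 50 ms sleep are made up for illustration. In a real page the callable would wrap curl_exec() or whichever step is under suspicion.

```php
<?php
// Hypothetical helper: run a callable and report how long it took.
function timed(string $label, callable $step)
{
    $start   = microtime(true);
    $result  = $step();
    $elapsed = microtime(true) - $start;
    printf("%s took %.3f seconds\n", $label, $elapsed);
    return $result;
}

// Stand-in for a slow step; in practice wrap curl_exec() and other page work.
$value = timed('sleep', function () {
    usleep(50000); // 50 ms
    return 'done';
});
```

Timing each part of the page this way shows whether the 20 seconds is spent inside the cURL call or somewhere else.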
At this point I am just running the curl execution without anything else going on. It is a test page at the moment. I am trying to work out a solution for another problem. Would you happen to know how I can capture the response from the other server? I am not sure how to load it into a variable. Currently I am getting the response, but I haven’t figured out what I am supposed to use to capture the other server’s response.
thanks.
To have the results returned by curl_exec, you need to use this:
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
I’ve got the response already I just am not sure how to load it into a variable that I can then use to manipulate the page. Are you saying $curl_handle will be my variable?
So that if I want to create an if() statement I would use if($curl_handle = "response text") { do this}
???
You’d do this to get the response into a variable:
$curl_handle = curl_init("http://whatever.com");
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($curl_handle);
If you’re worried about timeout, you can also do:
curl_setopt($curl_handle, CURLOPT_TIMEOUT, 2);
Last param is seconds.
You should get familiar with the curl options here: http://us2.php.net/manual/en/function.curl-setopt.php
Good luck,
Jon
I have tried this but I’m not sure I am getting what I need. Here is what i have so far. Maybe you can see if I messed up somewhere:
[code]
$myVin = $_GET['vin'];
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_URL, "http://socket.somewebsite.com:8080/?UID=C412012&REQUEST=INV&VIN=$myVin&INV_DATE=N&ONE_OWNER=Y");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result1 = curl_exec($ch);
if($result1['curlopt_returntransfer'] == "Yes 1") {
echo "We have a Winner!";
} else {
echo "Not a One Owner";
}
curl_close($ch);
[/code]
P.S. I’m not worried about timeout. Thanks. I am more concerned with the amount of time it is taking to send and receive the response though.
Ok. So I am confused now. I’ve got the code I mentioned above working. The problem seems to be that the variable $result1 does not want to work properly in my if statement.
I believe it should be written so but please let me know if I am wrong.
if($result1 == "Yes 1") {
echo "We have a Winner!";
}
if(result1 == "Yes N") {
echo "Not a One Owner";
}
For some reason my if statement does not appear to be working. Am I missing something on this?
Oh yeah, I’ve been poring over the manual on php.net as you mentioned earlier, but I still cannot figure out for the life of me why my variable is not being accepted. From what I gather, my variable $result1 should read either “Yes N” or “Yes 1”, but when I pass it to my if statement, the if statement does not seem to recognize the variable at all. I can echo $result1 onto the page, but something seems amiss for the if statement not to recognize it.
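One possible cause, offered only as a guess since the real response is not shown, is that the server’s reply carries trailing whitespace such as "\r\n", so an exact comparison against "Yes 1" fails even though the echoed text looks right. Another is the missing $ in front of result1 in the second if statement of the snippet. The sketch below trims the response before comparing. The helper name is_one_owner is hypothetical.

```php
<?php
// Hypothetical helper; the expected text "Yes 1" comes from the comment above.
function is_one_owner($response): bool
{
    // curl_exec() returns false on failure, so check the type first.
    if (!is_string($response)) {
        return false;
    }
    // Strip surrounding whitespace such as a trailing "\r\n" before comparing.
    return trim($response) === 'Yes 1';
}

var_dump(is_one_owner("Yes 1\r\n")); // bool(true)
var_dump(is_one_owner("Yes N\r\n")); // bool(false)
var_dump(is_one_owner(false));       // bool(false)
```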
I tried to hit that URL, it never gave a result. The problem is the remote site. Try it in your browser first.
Hi Jon, I just put a dummy URL in this forum for security purposes. I think I’ve got it figured out. I just wish I could find something that would execute quicker. The total time it takes to send, receive, and act on the response is 20 seconds. Way too long at the moment. I need it to be 1 or 2 seconds at most.
You’re being pretty vague… I’ve suggested ways to find out what’s taking up so much time.
I can’t really help you for free anymore – you might want to look to a user group or consultant to help you with this. Good luck.
oh. sorry to take up your time. I thought this was a free forum. my apologies.
It is, but you aren’t taking the advice I’ve given to isolate the code that’s taking 20 seconds.
I use curl in my scripts. Your sample is great. But I have a problem: for example, on an HTML page I have 10 links, and each link triggers a curl download:
http://www.example.com/?page=1 in first link
http://www.example.com/?page=2 in second link
…
http://www.example.com/?page=10 in tenth link
If you quickly open each link (of the HTML page) in new tabs, you’ll see the funny thing: until one tab (a copy of the running script) completes, the next copy of the script will not run.
This problem occurs with both “curl” and “multi-curl” :(
Please help. How can I fix that?
Are you sure it’s not just your browser being throttled? Try loading the URL from a second machine and see if you still get that delay.
I’m sure. From the second machine – it’s ok, but if you open next tab, next…. you’ll get the same funny problem.
This is very sad.
Here is simple example for an experiment.
test.php:
===============================================
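<?
if (isset($_GET["link"])) {
$link = "http://nehe.gamedev.net/lesson.asp?index=0".$_GET["link"];
$nodes = array();
array_push($nodes, $link);
$node_count = count($nodes);
$curl_arr = array();
$master = curl_multi_init();
for($i = 0; $i < $node_count; $i++)
{
$url = $nodes[$i];
$curl_arr[$i] = curl_init($url);
curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
curl_multi_add_handle($master, $curl_arr[$i]);
}
do {
curl_multi_exec($master,$running);
} while($running > 0);
echo "results: ";
for($i = 0; $i < $node_count; $i++)
{
$results = curl_multi_getcontent($curl_arr[$i]);
echo( $i . "\n" . $results . "\n");
}
}
?>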
Link 1
Link 2
Link 3
Link 4
Link 5
Link 6
Link 7
Link 8
Link 9
===============================================
I’m using the Firefox browser. If you open all the links in new tabs (middle click) rapidly, you will see what I mean.
Until the curl code block of the first running script finishes, the next running script will not start downloading. Why? How can I fix it?
Thanks again!
I appreciate you, Jon.