Let’s get one thing out in the open. Curl is sweet. It does its job very well, and I’m absolutely thrilled it exists.

If you’re using curl in your PHP app to make web requests, you’ve probably realized that by doing them one after the other, the total time of your request is the sum of all the requests put together. That’s lame.

Unfortunately, curl_multi_exec is poorly documented in the PHP manual.

Let’s say that your app is hitting APIs from these servers:

Google: .1s
Microsoft: .3s
rustyrazorblade.com: .5s

Your total time will be .9s, just for API calls.

By using curl_multi_exec, you can execute those requests in parallel, and you’ll only be limited by the slowest request, which is about .5 sec to rustyrazorblade in this case, assuming your download bandwidth is not slowing you down.
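
To make the arithmetic concrete, here is a tiny sketch using the example latencies above. The variable names are only for illustration, and real numbers will of course vary from request to request.

```php
<?php
// Example latencies from the list above, in seconds.
$latencies = array(
    'Google'              => 0.1,
    'Microsoft'           => 0.3,
    'rustyrazorblade.com' => 0.5,
);

// One after the other: the waits add up.
$sequential = array_sum($latencies);

// In parallel: roughly the slowest single request.
$parallel = max($latencies);

printf("sequential: %.1fs, parallel: about %.1fs\n", $sequential, $parallel);
```

This prints "sequential: 0.9s, parallel: about 0.5s", which is the whole argument for the multi interface in one line.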

Sample code:

$nodes = array('http://www.google.com', 'http://www.microsoft.com', 'http://www.rustyrazorblade.com');
$node_count = count($nodes);

$curl_arr = array();
$master = curl_multi_init();

// Create one curl handle per URL and attach it to the multi handle.
for ($i = 0; $i < $node_count; $i++)
{
	$url = $nodes[$i];
	$curl_arr[$i] = curl_init($url);
	// Return the response body as a string instead of printing it.
	curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
	curl_multi_add_handle($master, $curl_arr[$i]);
}

// Run all the requests; $running is the number still in progress.
do {
	curl_multi_exec($master, $running);
} while ($running > 0);

echo "results: ";
for ($i = 0; $i < $node_count; $i++)
{
	// Fetch the body that each finished handle collected.
	$results = curl_multi_getcontent($curl_arr[$i]);
	echo($i . "\n" . $results . "\n");
}
echo 'done';

It’s really not documented on php.net how to use curl_multi_getcontent, so hopefully this helps someone.


78 Responses to Executing multiple curl requests in parallel with PHP and curl_multi_exec

  1. jon says:

    Stern87,
    I’m pretty sure your issue is related to the browser, not curl. Try running all 10 commands at the same time from 10 command lines (just use wget or curl itself) and see if you’re still throttled. If not, it could be a server issue, but might just be a network thing.

    However, it’s possible it’s an apache issue – try setting MaxSpareServers to something like 100, restart, and make sure if you do “ps aux | grep httpd” you see a ton of processes.

    You’ll also want to determine if it’s actually curl that’s causing the problem, so you could throw in a sleep(1000) in your script, and see what happens when you load the 10 links.

    Finally – and possibly more importantly, are you loading 10 asp pages on a different server? If so, everything we’ve discussed thus far has absolutely nothing to do with curl and everything to do with whatever server you’re trying to load the page of.

  2. curlobjects says:

    This will busy-loop, which is not good. Check out the CurlObjects implementation.


    do {
        curl_multi_exec($master,$running);
    } while($running > 0);

  3. jon says:

    I don’t see how the curl objects implementation is any better, it still has to wait on all requests to finish, if I’m reading it correctly. Just because there’s more code in the while loop doesn’t make it any better.

    By all means, correct me if I’m wrong, as I just glanced at the class.CurlBase.php file.

  4. Cody says:

    wow.. this helped me tons..

    i was only getting 20 requests a minute because of latency.. but with this it’s not as much of an issue.

    I’m getting around 100 requests, give or take.

  5. Tariq says:

    hi,

    I wrote a little function which helps me get webpage content and deal with it in my program:
    [code]
    function request ($url) {
        $ch = curl_init ($url) ;
        curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true) ;
        $res = curl_exec ($ch) ;
        curl_close ($ch) ;
        return ($res) ;
    }
    [/code]
    But when I call this function a lot, e.g. in a loop, it gets slow, so I’m wondering if there is a trick to speed it up.

    if it’s important to talk about my program’s job, let me send it to your email

    thank you very much

  6. Alex says:

    Hi,

    It is very useful to run multiple URLs in parallel.

    But I ran into a problem with the array.
    For example, I get multiple URLs in an array and I ran array_filter on them, so the array keys do not run 0,1,2,3…..

    The array keys are like 3,4,7,8,10,14.
    And I would like to use your function to run them in parallel, but the array keys do not match if using for i=0,i $no as $url. Here I get many variable errors and I am unable to solve them.

    Most errors occur here
    $curl_arr = curl_init($url);
    and here
    $results = curl_multi_getcontent ( curl_init($url) );

    sometimes i even get no value.

    Can anyone give some example?

    Sorry for my bothering.
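
One way around the key problem, as a rough sketch: loop with foreach and store each handle under the same key as its URL, then read the results back through those same handles. Calling curl_init($url) a second time creates a brand new handle that was never executed, which would explain getting no value back from curl_multi_getcontent. The function name fetch_all_keyed is made up for this example, and the wait inside the loop is an addition to the loop from the post.

```php
<?php
// Hypothetical helper: fetch every URL in parallel and return the bodies
// under the same keys as the input array (3, 7, 14 stays 3, 7, 14).
function fetch_all_keyed(array $urls)
{
    $master   = curl_multi_init();
    $curl_arr = array();

    foreach ($urls as $key => $url) {
        $curl_arr[$key] = curl_init($url);
        curl_setopt($curl_arr[$key], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($master, $curl_arr[$key]);
    }

    do {
        curl_multi_exec($master, $running);
        if ($running > 0 && curl_multi_select($master, 1.0) === -1) {
            // Nothing to wait on yet; pause briefly instead of spinning.
            usleep(1000);
        }
    } while ($running > 0);

    $results = array();
    foreach ($curl_arr as $key => $handle) {
        // Reuse the handle that was executed; do not call curl_init() again.
        $results[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($master, $handle);
    }
    curl_multi_close($master);

    return $results;
}

// Keys left over from array_filter() are preserved: this array has keys 3 and 7.
$urls = array_filter(array(3 => 'http://www.google.com', 4 => '', 7 => 'http://www.microsoft.com'));
// $pages = fetch_all_keyed($urls); // $pages[3] and $pages[7] hold the bodies
```

The same idea covers putting each page into its own variable: index $pages by whatever key you gave the URL.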

  7. John Kerry says:

    How can I minimize CPU usage when using the curl functions? My server only allows under 10% CPU usage.

  8. Tom says:

    Hi, thanks for the tutorial, but how would I be able to take the outputs from the URLs and put them into their own variables? I am making something that goes through a list of URLs and analyzes each one. It takes a long time, but with this it should be much faster; I just don’t know how to implement that. Cheers.

  9. jon says:

    Hey Tom,

    You could do this:

    $results[] = curl_multi_getcontent ( $curl_arr[$i] );

  10. anonymous says:

    Hi,

    Thanks very much,

    I use this function to process pieces of my webpages containing php and mysql code in parallel.

    I created a main php page that will call client php pages on localhost (or even another server). This way I can work around the fact that php is not multitasking. I can even use multiple servers at once as in cloud computing.

    Kind regards,

  11. Thank you very much!

    Lovely example on how to use the curl multi functions, really appreciated!

    Regards

  12. yogi says:

    Great examples. I have the following problem -
    A client sends 10 requests to a server and uses the same URL. The individual sub-requests are sent as part of the body, via POST, with some delimiter characters to tell one URL from another (a kind of sub-request URL). The idea is that the server gets one request, grabs these sub-request URLs, fires off parallel requests to other servers to get the data, and then serves the client. The way I see it, we just need one curl handle, and not 10 added with curl_multi_add_handle, right? Can we get all the responses on one handle? Is there an example of this kind? Thanks.

  13. Thanks – this made it simple to integrate the changes with my web app which is running “much” quicker now!

  14. Phorty says:

    First of all, very nice post jon.
    Now my question is: does curl_multi_getcontent return the variables of the website that you are accessing?

    For example, let’s say I have WEBSITE 1, which sends POST data to WEBSITE 2, which executes a function with that POST data and then returns a result. However, parts of this result are variables like this:

    echo "The " . $animal . " jumped over the fence.";

    When I try to return this result on WEBSITE 1 using curl_multi_getcontent all it returns is The (blank) jumped over the fence.

    Am I doing something wrong or is it just simply the curl_multi_getcontent function that can’t return the values of variables?

    Thanks in advance.

  15. samat says:

    Can someone help me remove all the links in a website so I can open any website using my own page? For example, I developed a page using curl: opening example.php shows http://www.yahoo.com, but the problem I have now is that the links still point to Yahoo. I hope you can understand. :)

  16. DarkCoder says:

    I will use this code for getting TITLE tags. I can do this code with 100 URLs, after that I get timeouts. 80 URLs will take about 20 Seconds or so.

    Here is my fetch TITLEs code:

    //———————–

    $nodes = array(
    'http://www.goole.com',
    'http://www.microsoft.com',
    // add more if needed, or feed from mysql db

    );

    $node_count = count($nodes);

    $curl_arr = array();
    $ch = curl_multi_init();
    $cho = curl_init();

    for($i = 0; $i 0);

    for($i = 0; $i < $node_count; $i++)
    {
    $results = curl_multi_getcontent ( $curl_arr[$i] );

    $inhalt = $results;

    $a = explode("</title>", $inhalt);
    $b = explode("<title>", $a[0]);

    $title = $b[1];

    $title = str_replace("\'", "", $title);
    $title = str_replace("\"", "", $title);
    // $title = ereg_replace("[^A-Za-z0-9]", " ", $title);
    $title = preg_replace("/[^A-Za-z0-9\s\s+]/", " ", $title);
    $title = trim($title);

    echo( $i . "\n" . $title . "");
    }
    echo 'done';

  17. DarkCoder says:

    Seems something went wrong. Here it is again: get TITLE tags from websites (with MySQL feeding)

    ———————————————–

    $result = mysql_query("SELECT * FROM websites LIMIT 10;") or die(mysql_error());

    $node_count = mysql_num_rows($result);

    while($row = mysql_fetch_assoc( $result ))

    { $dom[] = $row['cndomain'];
    $nodes[] = "http://www.".$row['cndomain']; }

    $curl_arr = array();
    $master = curl_multi_init();
    $ch = curl_init();

    for($i = 0; $i 0);

    for($i = 0; $i < $node_count; $i++)
    {
    $results = curl_multi_getcontent ( $curl_arr[$i] );

    $inhalt = $results;

    $a = explode("</title>", $inhalt);
    $b = explode("<title>", $a[0]);

    $title = $b[1];

    $title = str_replace("\'", "", $title);
    $title = str_replace("\"", "", $title);
    $title = preg_replace("/[^A-Za-z0-9\s\s+]/", " ", $title);
    $title = trim($title);

    echo($dom[$i]." – ".$title."");

    }

    curl_multi_close($master);

    ———————————————–

    Improvement suggestions welcome!

    BTW, cURL seems very CPU resources heavy.

  18. DarkCoder says:

    Both my samples above are displayed wrong. There seems to be a problem posting code here.

    Anyway, I found some great samples

    http://code.google.com/p/rolling-curl/
    http://code.google.com/p/rolling-curl/source/browse/#svn/trunk

    The example.php worked out of the box for me, the other one did not though.

    I grab 20 URLs in one go from MySQL. The processing takes 3-30 seconds, depending on how many dead URLs are in that list, I guess.

    Interestingly, with that method the CPU load is very low, and it should be no problem to run a few pages with 20 URLs each in parallel (that’s on XP + XAMPP).

  19. David says:

    Thanks. This is exactly what I needed.

  20. jay johnston says:

    thanks – this totally did the trick!

  21. astaza says:

    Hello, when you use this function you should know it will take a lot of CPU and memory, so be careful.
    thanks

  22. jayjay says:

    @DarkCoder [..]Both my above samples are displayed wrong. There seem to be a problem here posting code.[..]
    Code posted on WordPress blogs (like this one ;) ) is salted to prevent it from executing. When copying code from the examples, double and single quotes need to be replaced with their genuine ASCII counterparts. Your code has issues too: for($i = 0; $i 0); needs to be fixed, and using regular expressions for title matching is greedy on time and resources. Here’s a better way to do it (though just as crude; for example purposes only ;) )

    //===================================================
    #PageLoad Timer: Part A (Top segment)
    $starttime = microtime();
    $startarray = explode(" ", $starttime);
    $starttime = $startarray[1]+$startarray[0];
    //===================================================

    $nodes = array(
    "http://www.rustyrazorblade.com/",
    "http://www.iana.org/domains/example/",
    "http://www.php.net/",
    "http://www.search.com",
    "http://ac.com/",
    "http://www.goole.com/",
    "http://ad.com/",
    "http://www.phpcoders.com/",
    "http://www.ah.com/",
    "http://www.javacoders.com/",
    "http://www.al.com/",
    "http://www.cars.com/");

    $node_count = count($nodes);
    $curl_arr = array();
    $master = curl_multi_init();
    for($i=0; $i0);
    echo "RESULTS:";
    for($i=0; $i<$node_count; $i++) {
    $results = curl_multi_getcontent($curl_arr[$i]);
    $start = strpos($results, "<title>");
    $end = strpos($results, "</title>", $start);
    $titles = substr($results, $start, $end-$start);
    echo "Titles: |", trim(strip_tags(str_replace("\n", "", $titles))), "|";
    }

    //===================================================
    #PageLoad Timer: Part B (Bottom segment – output):

    echo "============================================================";
    $endtime = microtime();
    $endarray = explode(" ", $endtime);
    $endtime = $endarray[1]+$endarray[0];
    $totaltime = $endtime-$starttime;
    $totaltime = round($totaltime,6);
    echo "Pageload time took $totaltime seconds.".PHP_EOL;
    echo "============================================================";

    enjoy :)

  23. jayjay says:

    $node_count = count($nodes);
    $curl_arr = array();
    $master = curl_multi_init();
    for($i=0; $i0);
    echo "RESULTS:";
    for($i=0; $i<$node_count; $i++) {
    $results = curl_multi_getcontent($curl_arr[$i]);
    $start = strpos($results, "<title>");
    $end = strpos($results, "</title>", $start);
    $titles = substr($results, $start, $end-$start);
    echo "Titles: |", trim(strip_tags(str_replace("\n", "", $titles))), "|";
    }

  24. jayjay says:

    Maybe third time lucky ;) ~

    //===================================================
    #PageLoad Timer: Part A (Top segment)
    $starttime = microtime();
    $startarray = explode(" ", $starttime);
    $starttime = $startarray[1]+$startarray[0];
    //===================================================

    $nodes = array(
    "http://www.iana.org/domains/example/",
    "http://www.php.net/",
    "http://www.search.com",
    "http://ac.com/",
    "http://www.goole.com/",
    "http://ad.com/",
    "http://www.phpcoders.com/",
    "http://www.ah.com/",
    "http://www.javacoders.com/",
    "http://www.rustyrazorblade.com/",
    "http://www.al.com/",
    "http://www.cars.com/");

    $node_count = count($nodes);
    $curl_arr = array();
    $master = curl_multi_init();
    for($i=0; $i0);
    echo "RESULTS:";
    for($i=0; $i<$node_count; $i++) {
    $results = curl_multi_getcontent($curl_arr[$i]);
    $start = strpos($results, "<title>");
    $end = strpos($results, "</title>", $start);
    $titles = substr($results, $start, $end-$start);
    echo "Titles: |", trim(strip_tags(str_replace("\n", "", $titles))), "|";
    }

    //===================================================
    #PageLoad Timer: Part B (Bottom segment – output):

    echo "============================================================";
    $endtime = microtime();
    $endarray = explode(" ", $endtime);
    $endtime = $endarray[1]+$endarray[0];
    $totaltime = $endtime-$starttime;
    $totaltime = round($totaltime,6);
    echo "Pageload time took $totaltime seconds.".PHP_EOL;
    echo "============================================================";

    enjoy~ :)
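
Since the comment form keeps eating the tags, here is the title-extraction step on its own, as a small sketch that runs without any network access. It follows the same strpos/substr idea as the code above. The helper name extract_title and the sample HTML string are invented for this example; in practice the input would be whatever curl_multi_getcontent returned for a handle. The timer uses microtime(true), which returns a float directly, so the explode step is not needed.

```php
<?php
// Hypothetical helper: return the text between <title> and </title>,
// or null when the markup has no plain <title> element.
function extract_title($html)
{
    $start = stripos($html, '<title>');
    if ($start === false) {
        return null;
    }
    $start += strlen('<title>');

    $end = stripos($html, '</title>', $start);
    if ($end === false) {
        return null;
    }

    // Drop line breaks and surrounding whitespace, as in the comment above.
    $title = substr($html, $start, $end - $start);
    return trim(str_replace(array("\r", "\n"), '', $title));
}

$started = microtime(true); // float seconds

// Stand-in for a body returned by curl_multi_getcontent().
$sample = "<html><head><title>\n Example Domain </title></head></html>";
echo 'Title: |', extract_title($sample), "|\n";

printf("Took %.6f seconds.\n", microtime(true) - $started);
```

Note that this only matches a bare <title> tag; a title element with attributes would need a slightly smarter search.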

  25. Stu says:

    This article was a pleasure to read, thank you all so much for the additional comments, examples and links.

    I have been using the Simple HTML DOM Parser requesting pages with file_get_html. I am finding it to be good, but slower than I would like it to be, so when I finally found this article about processing multiple requests in parallel I was excited.

    My experience with extracting the information I want from each page led me to the Simple HTML DOM Parser, as it seemed a great deal more tolerant than other methods.

    Can I mix the usage of curl_multi_getcontent and the Simple HTML DOM Parser, or is that just insane? I am new to scraping with PHP (Can you tell?)

  26. jon says:

    Stu, I don’t see why not. Looking at the docs I saw this:

    // Create a DOM object from a string
    $html = str_get_html('<html><body>Hello!</body></html>');

    That seems like it’ll do the trick.

  27. Stu says:

    I was wondering what kind of techniques you guys are using for mining the data retrieved? I have built an array of URLs, then retrieve the pages using curl_multi_getcontent, write the content to an array, and then mine the array, appending new lines with the extracted data I want, BUT I have started to run into memory issues.

    Am I approaching this all wrong?
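
One possible approach to the memory problem, sketched under some assumptions: instead of collecting every page into an array first, handle each response as soon as it finishes and then let go of its handle. curl_multi_info_read reports finished transfers one at a time. The function name fetch_and_process and the callback signature are invented for this example, and CURLOPT_PRIVATE is used here only to remember which URL a handle belongs to.

```php
<?php
// Hypothetical helper: fetch URLs in parallel and pass each body to
// $on_done as soon as its transfer finishes, so the bodies are not all
// held in memory at the same time.
function fetch_and_process(array $urls, $on_done)
{
    $master  = curl_multi_init();
    $handles = array(); // URL => handle, so each handle stays referenced

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_PRIVATE, $url); // tag the handle with its URL
        curl_multi_add_handle($master, $ch);
        $handles[$url] = $ch;
    }

    do {
        curl_multi_exec($master, $running);

        // Drain the queue of transfers that have finished so far.
        while ($done = curl_multi_info_read($master)) {
            $ch   = $done['handle'];
            $url  = curl_getinfo($ch, CURLINFO_PRIVATE);
            $body = ($done['result'] === CURLE_OK) ? curl_multi_getcontent($ch) : null;

            call_user_func($on_done, $url, $body);

            // Drop the handle so its buffered body can be freed.
            curl_multi_remove_handle($master, $ch);
            unset($handles[$url]);
        }

        if ($running > 0 && curl_multi_select($master, 1.0) === -1) {
            usleep(1000); // nothing to wait on yet; pause briefly
        }
    } while ($running > 0);

    curl_multi_close($master);
}

// Usage sketch: keep only the small piece you need from each page.
// fetch_and_process($urls, function ($url, $body) use (&$lengths) {
//     $lengths[$url] = ($body === null) ? 0 : strlen($body);
// });
```

The callback receives null for a transfer that failed, so the mining code can skip dead URLs. For very long URL lists this still opens every request at once; adding handles a few at a time would be the next refinement.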

  28. Andrew says:

    You have a problem in the loop calling curl_multi_exec(). You need to introduce a call to either usleep() or curl_multi_select() to prevent PHP from endlessly calling curl_multi_exec() and eating up all available CPU time. If you make a request that takes a long time to complete, you can see this happening just by watching the output of top. Preferably you should use curl_multi_select() after you stop receiving CURLM_CALL_MULTI_PERFORM from curl_multi_exec(). Basically, this prevents PHP from calling curl_multi_exec() before you’ve gotten data back from one of the requests. Instead of calling it 1000x a second while you’re waiting on network I/O, it just waits for the network I/O to complete before calling curl_multi_exec() again, because nothing could have changed before curl_multi_select() returns. I hope I made this clear enough. Right now your script just sits around using 100% CPU time while waiting for a response. There’s no need for that when you can just sleep until there’s some change to the status of the multi handle.
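
The loop described in this comment could look roughly like the sketch below. The function name run_multi is made up for the example; it takes a multi handle that already has its curl handles attached, such as $master from the post. Newer libcurl versions no longer return CURLM_CALL_MULTI_PERFORM, so the inner loop mostly matters on old installs.

```php
<?php
// Hypothetical helper: drive a multi handle to completion without
// busy-looping. Returns the last status from curl_multi_exec().
function run_multi($master)
{
    $running = 0;

    do {
        // Older libcurl versions ask to be called again straight away.
        do {
            $status = curl_multi_exec($master, $running);
        } while ($status === CURLM_CALL_MULTI_PERFORM);

        if ($status !== CURLM_OK) {
            break; // unexpected multi-level error; stop driving the handle
        }

        if ($running > 0) {
            // Sleep until a socket has activity or one second passes.
            if (curl_multi_select($master, 1.0) === -1) {
                // select had nothing to wait on; back off briefly instead.
                usleep(10000);
            }
        }
    } while ($running > 0);

    return $status;
}

// Usage sketch with the variables from the post:
// run_multi($master);
// $results = curl_multi_getcontent($curl_arr[0]);
```

With this in place of the bare do/while, the script should sit idle while waiting on the network rather than pinning a CPU core.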
