<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Working with DOM in PHP &#8211; Looking at a PHP HTML Parser</title>
	<atom:link href="http://www.rustyrazorblade.com/2010/06/working-with-dom-in-php-looking-at-an-html-parser/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rustyrazorblade.com/2010/06/working-with-dom-in-php-looking-at-an-html-parser/</link>
	<description>Tech Thoughts, Mostly on LAMP - by Jon Haddad</description>
	<lastBuildDate>Mon, 23 Jan 2012 09:03:48 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
	<item>
		<title>By: Lyddy</title>
		<link>http://www.rustyrazorblade.com/2010/06/working-with-dom-in-php-looking-at-an-html-parser/comment-page-1/#comment-66546</link>
		<dc:creator>Lyddy</dc:creator>
		<pubDate>Tue, 10 May 2011 08:05:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.rustyrazorblade.com/?p=1095#comment-66546</guid>
		<description>I feel so much happier now that I understand all this. Thanks!</description>
		<content:encoded><![CDATA[<p>I feel so much happier now that I understand all this. Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mang</title>
		<link>http://www.rustyrazorblade.com/2010/06/working-with-dom-in-php-looking-at-an-html-parser/comment-page-1/#comment-53988</link>
		<dc:creator>Mang</dc:creator>
		<pubDate>Sun, 24 Oct 2010 14:12:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.rustyrazorblade.com/?p=1095#comment-53988</guid>
		<description>Look up SimpleHTMLDOM.  I use that all the time.  It is a bit sluggish and occasionally awkward to use but I&#039;ve rarely seen it fail to parse even the most evil HTML - such as Microsoft Word&#039;s idea of &quot;HTML&quot;.  Selecting DOM elements is flawless.  It has difficulties with modifying the DOM though - I end up having to call save() and then load() every time I make any change before making more modifications or it gets confused.

phpQuery looks interesting for those scenarios where I&#039;m modifying the DOM frequently.</description>
		<content:encoded><![CDATA[<p>Look up SimpleHTMLDOM.  I use that all the time.  It is a bit sluggish and occasionally awkward to use but I&#8217;ve rarely seen it fail to parse even the most evil HTML &#8211; such as Microsoft Word&#8217;s idea of &#8220;HTML&#8221;.  Selecting DOM elements is flawless.  It has difficulties with modifying the DOM though &#8211; I end up having to call save() and then load() every time I make any change before making more modifications or it gets confused.</p>
<p>phpQuery looks interesting for those scenarios where I&#8217;m modifying the DOM frequently.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ming</title>
		<link>http://www.rustyrazorblade.com/2010/06/working-with-dom-in-php-looking-at-an-html-parser/comment-page-1/#comment-50725</link>
		<dc:creator>Ming</dc:creator>
		<pubDate>Thu, 10 Jun 2010 18:35:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.rustyrazorblade.com/?p=1095#comment-50725</guid>
		<description>Jon,

Yes, if you are pulling the entire element and contents then you are right.  Or even if you are going to do some basic split or string match, etc.

I&#039;m speaking more about the case where you would grab the contents of an element and then regex a specific value out of it, e.g. grabbing the text within a div and then extracting a specific value.  In that case, where you would end up using regex anyway to get that value, I would probably just skip the DOM and go straight to a regex over the entire doc.

Keep in mind that this does not address performance or elegance, just my personal preference and style.</description>
		<content:encoded><![CDATA[<p>Jon,</p>
<p>Yes, if you are pulling the entire element and contents then you are right.  Or even if you are going to do some basic split or string match, etc.</p>
<p>I&#8217;m speaking more about the case where you would grab the contents of an element and then regex a specific value out of it, e.g. grabbing the text within a div and then extracting a specific value.  In that case, where you would end up using regex anyway to get that value, I would probably just skip the DOM and go straight to a regex over the entire doc.</p>
<p>Keep in mind that this does not address performance or elegance, just my personal preference and style.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jon</title>
		<link>http://www.rustyrazorblade.com/2010/06/working-with-dom-in-php-looking-at-an-html-parser/comment-page-1/#comment-50710</link>
		<dc:creator>jon</dc:creator>
		<pubDate>Thu, 10 Jun 2010 05:36:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.rustyrazorblade.com/?p=1095#comment-50710</guid>
		<description>Ming - I&#039;m not sure I follow though.  If you check the example, you can easily grab any of the attributes or values.  (note the href pulled out using getAttribute(&#039;href&#039;) )

You can also do things like grab elements and their contents by ID (getElementById), then pull what you want out of that using the same techniques listed above.

I&#039;d like to see an example where the regex is easier than the DOM parser.</description>
		<content:encoded><![CDATA[<p>Ming &#8211; I&#8217;m not sure I follow though.  If you check the example, you can easily grab any of the attributes or values.  (note the href pulled out using getAttribute(&#039;href&#039;) )</p>
<p>You can also do things like grab elements and their contents by ID (getElementById), then pull what you want out of that using the same techniques listed above.</p>
<p>I&#8217;d like to see an example where the regex is easier than the DOM parser.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ming</title>
		<link>http://www.rustyrazorblade.com/2010/06/working-with-dom-in-php-looking-at-an-html-parser/comment-page-1/#comment-50702</link>
		<dc:creator>Ming</dc:creator>
		<pubDate>Wed, 09 Jun 2010 23:13:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.rustyrazorblade.com/?p=1095#comment-50702</guid>
		<description>Usually, I need specific pieces of data from within each DOM element or attribute, so I would end up with regex anyway.  In those cases I just go straight to regex and drop any additional extensions and parsers.  Also, maybe it&#039;s my background in Perl, but regex has never been a long, painstaking task.

Don&#039;t get me wrong.  For things like grabbing all links or images from an HTML source, using what you suggest is a great way to do it and very easy.</description>
		<content:encoded><![CDATA[<p>Usually, I need specific pieces of data from within each DOM element or attribute, so I would end up with regex anyway.  In those cases I just go straight to regex and drop any additional extensions and parsers.  Also, maybe it&#8217;s my background in Perl, but regex has never been a long, painstaking task.</p>
<p>Don&#8217;t get me wrong.  For things like grabbing all links or images from an HTML source, using what you suggest is a great way to do it and very easy.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jon</title>
		<link>http://www.rustyrazorblade.com/2010/06/working-with-dom-in-php-looking-at-an-html-parser/comment-page-1/#comment-50701</link>
		<dc:creator>jon</dc:creator>
		<pubDate>Wed, 09 Jun 2010 23:07:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.rustyrazorblade.com/?p=1095#comment-50701</guid>
		<description>The nice part of the DOM parser is that it can handle invalid HTML, as well as correctly parse out the attributes from a tag.  I wrote that script in about 5 minutes vs the trial and error of using a bunch of regexes.</description>
		<content:encoded><![CDATA[<p>The nice part of the DOM parser is that it can handle invalid HTML, as well as correctly parse out the attributes from a tag.  I wrote that script in about 5 minutes vs the trial and error of using a bunch of regexes.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ming</title>
		<link>http://www.rustyrazorblade.com/2010/06/working-with-dom-in-php-looking-at-an-html-parser/comment-page-1/#comment-50700</link>
		<dc:creator>Ming</dc:creator>
		<pubDate>Wed, 09 Jun 2010 23:04:17 +0000</pubDate>
		<guid isPermaLink="false">http://www.rustyrazorblade.com/?p=1095#comment-50700</guid>
		<description>Yes, there are &quot;easier&quot; ways to parse HTML in PHP. However, I&#039;ve found that I almost always end up using regex.  Maybe it&#039;s just me, the type of projects or the source, but DOM traversal was never good enough.</description>
		<content:encoded><![CDATA[<p>Yes, there are &#8220;easier&#8221; ways to parse HTML in PHP. However, I&#8217;ve found that I almost always end up using regex.  Maybe it&#8217;s just me, the type of projects or the source, but DOM traversal was never good enough.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

