Parsing HTML in a nutshell

Ever wanted to get data from a particular webpage or service that doesn’t offer a public API, and you still want to use that data for whatever reason floats your boat? If so, what you want to do is scrape the webpage and extract the data you need.

For this tutorial we’re going to parse HTML with the Simple HTML DOM Parser PHP script. It fetches the contents of a webpage and makes them searchable with CSS-like selectors. You can find the source for this gem on sourceforge.net. All we need is the simple_html_dom.php script; everything else is example data.
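To give a feel for what "searchable with CSS-like selectors" means, here is a minimal sketch. It assumes simple_html_dom.php sits in the same folder as the script. The markup, the class name post and the URLs are made up for illustration and are not taken from any real site; the sketch parses a string with str_get_html() so it runs without a network connection.

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical markup standing in for a fetched page.
$markup = '
<div class="post">
    <a href="/first-post"><img src="/thumbs/first.jpg" alt=""></a>
    <h2><a href="/first-post">First post</a></h2>
</div>
<div class="post">
    <a href="/second-post"><img src="/thumbs/second.jpg" alt=""></a>
    <h2><a href="/second-post">Second post</a></h2>
</div>';

// Build a searchable DOM from the string.
// For a live page you would call file_get_html($url) instead.
$html = str_get_html($markup);

$posts = [];
foreach ($html->find('div.post') as $post) {
    // find($selector, 0) returns the first match, or null if there is none.
    $link  = $post->find('h2 a', 0);
    $image = $post->find('img', 0);

    if ($link === null) {
        continue; // skip blocks without a title link
    }

    $posts[] = [
        'title'     => trim($link->plaintext),
        'url'       => $link->href,
        'thumbnail' => $image !== null ? $image->src : null,
    ];
}

// Free the memory held by the parsed DOM.
$html->clear();

print_r($posts);
```

Each element returned by find() exposes its HTML attributes as properties (href, src) and its text content as plaintext, which is all this kind of job needs.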

We’ll work with the following scenario: I want to fetch all the articles from, in my case, the homepage of NetTuts+ so I can email them to myself every morning (via a cron job). When I open my email I want to see the title, the permalink and the thumbnail of each post. I will not cover the email part in this tutorial because it is out of scope.
