Parsing Craigslist for an Item Across Multiple Cities

A friend of mine wanted something better than the curl-and-pipe awesomeness he wrote to pull motorcycle listings for our surrounding area (see his blog at https://www.zacharyfouts.com/):

Zach’s Code

curl 'http://austin.craigslist.org/search/mcy…' \
     'http://collegestation.craigslist.org/search/mcy…' \
     'http://houston.craigslist.org/search/mcy…' \
     'http://killeen.craigslist.org/search/mcy…' \
     'http://sanantonio.craigslist.org/search/mcy…' \
     'http://sanmarcos.craigslist.org/search/mcy…' \
     'http://waco.craigslist.org/search/mcy…' \
     --silent | grep 'dc:title' | sed -e 's/<.*\[//g' -e 's/\&#.*$//g' | grep -v 'by owner search'

So I wrote this little snippet to give him a more maintainable, if quick and dirty (no exception handling), Python script that he can run from a cron job or whatnot. Take a look and fork it for your own Craigslist shenanigans, even though scraping is against their TOS.

My Code

#!/usr/bin/env python3

import re
from io import BytesIO

import pycurl


def parse_listing(text):
    '''
    Print the title of every listing in a Craigslist RSS feed
    '''
    # Listings follow the closing </channel> tag; drop the chunk before the first item
    listings = text.split("</channel>")[1].split("<item rdf:")
    listings.pop(0)
    # Capture the title text between the CDATA opener and the last "&"
    pattern = re.compile(r"<!\[CDATA\[(.*)&")
    for item in listings:
        messy_name = item.split("<title>")[1].split("</title>")[0]
        name = pattern.match(messy_name)
        # Skip titles that do not match instead of crashing on None
        if name:
            print(name.group(1))


# All the listings we want to pull
cities = ['austin', 'collegestation', 'houston', 'killeen', 'sanantonio', 'sanmarcos', 'waco']
for city in cities:
    url = 'http://' + city + '.craigslist.org/search/mcy?hasPic=1&postedToday=1&max_price=2200&auto_title_status=1&format=rss'
    buffer = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEDATA, buffer)
    c.perform()
    c.close()
    # pycurl writes bytes; decode assuming the feed is UTF-8
    body = buffer.getvalue().decode('utf-8')
    parse_listing(body)
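If the string splitting in parse_listing ever gets too brittle, one possible alternative is to let an XML parser pull the titles out. The following is only a rough sketch, not what the script above does: it assumes the feed is RSS 1.0 (RDF) using the http://purl.org/rss/1.0/ namespace, and the listing_titles function and SAMPLE_FEED data are made up for illustration. Like Zach's sed expression, it drops everything from the first "&#" onward.

```python
import xml.etree.ElementTree as ET

# RSS 1.0 namespace; assumed (not verified) to be the one the feed declares
RSS_NS = "{http://purl.org/rss/1.0/}"


def listing_titles(feed_text):
    """Return the cleaned title of every <item> in an RSS 1.0 feed."""
    root = ET.fromstring(feed_text)
    titles = []
    for item in root.iter(RSS_NS + "item"):
        title = item.findtext(RSS_NS + "title", default="")
        # CDATA keeps "&#..." as literal text, so cut it off by hand
        titles.append(title.split("&#", 1)[0].strip())
    return titles


# Hypothetical sample feed, used only to exercise the function
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed">
    <title>sample search</title>
  </channel>
  <item rdf:about="http://example.org/1">
    <title><![CDATA[2005 Example Bike (Austin) &#x0024;2000]]></title>
  </item>
  <item rdf:about="http://example.org/2">
    <title><![CDATA[1999 Sample Scooter]]></title>
  </item>
</rdf:RDF>
"""

if __name__ == "__main__":
    for title in listing_titles(SAMPLE_FEED):
        print(title)
```

The body fetched in the loop above could be passed to listing_titles in place of SAMPLE_FEED. Unlike the split-based version, this raises a parse error on malformed XML rather than an IndexError, so it still has no exception handling of its own.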

 
