Parsing a bookmark export with Hpricot

The trickiest part is detecting whether a bookmark has a corresponding description. The export is in the same format that Netscape used for its bookmarks export, which means it is a simple HTML file with a definition list (dl) containing a series of definition terms (dt). Each term is a bookmark, and it may be followed by a description (dd).
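For orientation, a Netscape-style export looks roughly like the fragment below. The two bookmarks are invented for illustration (they are not taken from a real export), and the regex counts are only a quick sanity check, not a substitute for the Hpricot parse.

```ruby
# Made-up sample in the Netscape bookmark format (hypothetical data).
SAMPLE_EXPORT = <<-HTML
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
<DT><A HREF="http://example.com/" LAST_VISIT="1160000000" TAGS="ruby,parsing">First bookmark</A>
<DD>This one has a description
<DT><A HREF="http://example.org/" LAST_VISIT="1160000001" TAGS="misc">Second bookmark</A>
</DL><p>
HTML

# Every bookmark is a dt; only some of them are followed by a dd.
term_count = SAMPLE_EXPORT.scan(/<DT>/i).size
desc_count = SAMPLE_EXPORT.scan(/<DD>/i).size
puts "#{term_count} terms, #{desc_count} descriptions"
```

Note that the dd is not nested inside the dt: it simply follows it, which is why the description has to be looked up as a neighbour of the term.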

But how do you detect whether there's a description? The answer turned out to be rather simple: look at the element that follows the term, and if that element's name is dd, we're in luck and have a description. The only problem was that I didn't know how to access the name of an element, until I thought: what if I simply call name on it? And guess what... it worked! So that was exactly what I was looking for :-)

require 'rubygems'
require 'hpricot'

doc = open("bookmarks.html") {|f| Hpricot(f) }

bookmarks = []

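(doc/"dl/dt").each do |term|
    link = (term/"a")

    # Assumption: the sibling lookup missing from this listing was
    # Hpricot's next_sibling; the description text is read with inner_text.
    sibling = term.next_sibling
    if sibling and sibling.name == 'dd'
        desc = sibling.inner_text
    else
        desc = nil
    end

    # Tags are stored as a comma-separated attribute on the link.
    if link.attr('tags')
        tags = link.attr('tags').split(",")
    else
        tags = nil
    end

    bookmarks << {
        :address        =>    link.attr('href'),
        :created_at     =>    link.attr('last_visit'),
        :tags           =>    tags,
        :description    =>    desc,
        :title          =>    link.inner_text
    }
end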


Source at supersnippets.

I also extended this a bit to save the results into a database, using ActiveRecord, but since each db schema is a different world, I didn't post that version here. If anybody thinks it might be useful just let me know.

Also, this code is not very rubyesque yet, so suggestions for improving it will be really appreciated. I'm especially thinking about the if ... else parts; I'm pretty sure there's a way to shorten those lines :-)
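One possible way to shorten the if ... else parts, as a rough sketch rather than a definitive answer: a modifier if already evaluates to nil when the condition fails, and && short-circuits on a missing attribute. The Node struct and the two helper names below are hypothetical stand-ins so the snippet runs without Hpricot; a real element would only need to respond to name and inner_text.

```ruby
# Stand-in for a parsed element, so the sketch runs without Hpricot (hypothetical).
Node = Struct.new(:name, :inner_text)

# Returns the description text when the following element is a dd, else nil.
def description_for(sibling)
    sibling.inner_text if sibling && sibling.name == 'dd'
end

# Returns the tags as an array, or nil when the attribute is missing.
def tags_for(raw)
    raw && raw.split(",")
end

p description_for(Node.new('dd', 'A description'))
p description_for(Node.new('dt', 'Another bookmark'))
p description_for(nil)
p tags_for("ruby,parsing")
p tags_for(nil)
```

Inside the loop, desc would then be description_for applied to the element that follows the term, and tags would be tags_for(link.attr('tags')), so each if ... else block collapses into a single line.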