soledad penadés
repeat 4[fd 100 rt 90]

Parsing a del.icio.us export with Hpricot

The trickiest part is detecting whether a bookmark has a corresponding description. The export uses the same format that Netscape used for its bookmarks export: a simple HTML file with a definition list (dl) containing a series of definition terms (dt). Each term (that is, each bookmark) may be followed by a description (dd).
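To make that layout concrete, here is a small sketch with a made-up sample. The URLs, dates, tags and the description_flags helper are all invented for illustration, and it uses a plain regular expression instead of Hpricot only so that it runs without any gem installed.

```ruby
# Made-up sample in the Netscape bookmark layout (every value is invented).
SAMPLE_EXPORT = <<HTML
<DL><p>
<DT><A HREF="http://example.com/" LAST_VISIT="1206662400" TAGS="ruby,parsing">Example one</A>
<DD>A bookmark that has a description
<DT><A HREF="http://example.org/" LAST_VISIT="1206662400" TAGS="misc">Example two</A>
</DL><p>
HTML

# Hypothetical helper: for each term, tell whether a <DD> follows it.
def description_flags(html)
  # Splitting on <DT> leaves one chunk per term, plus its optional <DD>.
  entries = html.split(/<DT>/i).drop(1)
  entries.map { |entry| entry =~ /<DD>/i ? true : false }
end

flags = description_flags(SAMPLE_EXPORT)
```

In this sample only the first bookmark carries a description, so the second term is followed directly by the end of the list.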

But how do you detect whether there's a description? The answer turned out to be rather simple: use term.next, and if the next element's name is dd, we're lucky and have a description. The only problem was that I didn't know how to access the name of an element, until I thought: what if I simply use name? And guess what… it worked! So term.next.name was exactly what I was looking for :-)

require 'rubygems'
require 'hpricot'

# Parse the exported bookmarks file
doc = File.open("delicious.html") { |f| Hpricot(f) }

bookmarks = []

# Every dt inside the dl holds one bookmark
(doc/"dl/dt").each do |term|
  link = (term/"a")

  # The description, if any, is the dd right after the term
  if term.next and term.next.name == 'dd'
    desc = term.next.inner_text
  else
    desc = nil
  end

  # Tags are stored as a comma-separated attribute on the link
  if link.attr('tags')
    tags = link.attr('tags').split(",")
  else
    tags = nil
  end

  bookmarks << {
    :address     => link.attr('href'),
    :created_at  => link.attr('last_visit'),
    :tags        => tags,
    :description => desc,
    :title       => link.inner_text
  }
end

Source at supersnippets.

I also extended this a bit to save the results into a database, using ActiveRecord, but since each db schema is a different world, I didn't post that version here. If anybody thinks it might be useful just let me know.

Also, this code is not very rubyesque yet, so suggestions for improving it will be really appreciated. I'm especially thinking about the if … else parts; I'm pretty sure there's a way to shorten those lines :-)
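One possible way to shorten those branches, as a minimal sketch rather than a definitive answer: a trailing if already evaluates to nil when the condition fails, and && returns nil when its left side is nil. The Node struct and the description_for and tags_for helpers are hypothetical names made up here; Node only stands in for an Hpricot element so the sketch runs without the gem.

```ruby
# Hypothetical stand-in for an Hpricot element: it only needs
# the two methods the original loop calls on term.next.
Node = Struct.new(:name, :inner_text)

# Returns the description text when the sibling is a dd, nil otherwise.
# A trailing `if` evaluates to nil when the condition is false,
# so the explicit else branch is not needed.
def description_for(sibling)
  sibling.inner_text if sibling && sibling.name == 'dd'
end

# Returns the tags as an array, or nil when the attribute is missing.
# `&&` short-circuits and hands back nil if raw_tags is nil.
def tags_for(raw_tags)
  raw_tags && raw_tags.split(',')
end

with_description = description_for(Node.new('dd', 'A short note'))
next_is_a_term   = description_for(Node.new('dt', 'Another bookmark'))
no_sibling       = description_for(nil)
tags             = tags_for('ruby,hpricot')
no_tags          = tags_for(nil)
```

Inside the original loop, the same idea would read as description_for(term.next) and tags_for(link.attr('tags')), assuming those Hpricot calls behave as they do in the code above.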

// 2 responses to Parsing a del.icio.us export with Hpricot

Wayne
20080328

Soledad,

Thank you for sharing your examples of using Hpricot. Your examples saved me a lot of time learning Hpricot's "funky" syntax.

–Wayne

sole
20080328

Hey thanks! :-)

although it's not that funky, it's just a mix of everything, and it's funny!
