The trickiest part is detecting whether a bookmark has a corresponding description. The export uses the same format that Netscape used for its bookmarks export, which means it is a simple HTML file with a definition list (dl) containing a series of definition terms (dt). Each term (= a bookmark) may be followed by a description (dd).
But how do you detect if there’s a description? It turns out the answer was rather simple: use term.next, and if the next element’s name is dd, we’re lucky and have a description. The only problem was that I didn’t know how to access the name of an element, until I thought: what if I simply use name? And guess what… it worked! So term.next.name was exactly what I was looking for :-)
require 'rubygems'
require 'hpricot'

doc = open("bookmarks.html") {|f| Hpricot(f) }
bookmarks = []

(doc/"dl/dt").each do |term|
  link = (term/"a")

  if term.next and term.next.name == 'dd'
    desc = term.next.inner_text
  else
    desc = nil
  end

  if link.attr('tags')
    tags = link.attr('tags').split(",")
  else
    tags = nil
  end

  bookmarks << {
    :address => link.attr('href'),
    :created_at => link.attr('last_visit'),
    :tags => tags,
    :description => desc,
    :title => link.inner_text
  }
end

Source at supersnippets.
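For anyone who wants to play with the "look at the next element" idea without an export file at hand, here is a rough sketch in plain Ruby. The Node struct and the pair_terms helper are made up for illustration and only stand in for parsed elements; they are not part of Hpricot.

```ruby
# Hypothetical stand-in for a parsed element: just a tag name and its text.
Node = Struct.new(:name, :text)

# Pair every dt with the dd right after it, or with nil if no dd follows.
def pair_terms(nodes)
  pairs = []
  nodes.each_with_index do |node, i|
    next unless node.name == "dt"
    following = nodes[i + 1]
    desc = (following && following.name == "dd") ? following.text : nil
    pairs << [node.text, desc]
  end
  pairs
end

# Made-up sample: the first bookmark has a description, the second does not.
nodes = [
  Node.new("dt", "Ruby home page"),
  Node.new("dd", "The language itself"),
  Node.new("dt", "Hpricot")
]

pairs = pair_terms(nodes)
```

The check on following guards against the last term in the list, which has nothing after it, in the same way the term.next test does in the script.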
I also extended this a bit to save the results into a database, using ActiveRecord, but since each db schema is a different world, I didn’t post that version here. If anybody thinks it might be useful just let me know.
Also, this code is not very rubyesque yet, so suggestions for improving it would be really appreciated. I’m especially thinking about the if … else parts; I’m pretty sure there’s a way to shorten those lines :-)

Wayne
Soledad,
Thank you for sharing your examples of using Hpricot. Your examples saved me a lot of time learning Hpricot’s “funky” syntax.
–Wayne
sole
Hey thanks! :-)
although it’s not that funky, it’s just a mix of everything, and it’s funny!
Marq
Hey Soledad,
Thanks for pointing out the next.name thingy with Hpricot… I didn’t want to waste more than an hour to parse my delicious bookmarks, and I almost ended up taking sed to the bookmarks HTML file so that I could add the title as an attribute within the tag instead of the next DD thing. I’m pretty sure Yahoo isn’t giving us a nice XML document because it fears people will move away to other services such as Google bookmarks, which is exactly what I’m planning to do.
Anyway, if you don’t like the if/else blocks, you can just do something like the following for the same result:
(doc/"dl/dt").each do |bkmk|
  link = (bkmk/"a")
  b = Hash.new
  b[:title] = link.inner_text
  b[:url] = link.attr('HREF')
  b[:tags] = link.attr('TAGS').split(",") if link.attr('TAGS')
  b[:created_at] = link.attr('ADD_DATE')
  # Check that there is a next element at all before asking for its name
  b[:description] = bkmk.next.inner_text if bkmk.next && bkmk.next.name == 'dd'
  bookmarks << b
end
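One small difference between the two styles may be worth knowing about. It is sketched here with a plain hash standing in for the link's attributes; the attrs hash and its contents are invented for the example and Hpricot is not involved.

```ruby
# Hypothetical attributes of a link that has no TAGS attribute.
attrs = { "HREF" => "http://example.com/" }

# Modifier-if style: the key is only set when the attribute exists.
short = {}
short[:tags] = attrs["TAGS"].split(",") if attrs["TAGS"]

# if/else style: the key is always set, possibly to nil.
long = { :tags => (attrs["TAGS"] ? attrs["TAGS"].split(",") : nil) }

same_lookup = (short[:tags] == long[:tags])  # both lookups give nil
short_has_key = short.key?(:tags)            # key was never set
long_has_key = long.key?(:tags)              # key is present with a nil value
```

Reading b[:tags] gives nil either way, so for most uses the result is the same. It only matters if later code checks whether the key exists, or writes the hash somewhere that treats a missing column differently from an empty one.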