mozilla bookmarks

Dick Davies

long shot but what the hell - don't suppose any of you
good people are sitting on a parser for Mozilla/Firefox bookmarks.html
files, by any chance?


nah, didn't think so :)



Ah well, never mind. I found that squirting it into REXML::Document.new()
by way of 'tidy -asxml' at least stops the constructor choking to death, so I'll
have to take it from there.
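For anyone wanting to try the tidy-then-REXML route, here is a minimal sketch. It assumes the `tidy` binary is on the PATH; the helper name `extract_links` is made up for illustration. It walks the element tree instead of using XPath so that the XHTML default namespace tidy adds does not matter.

```ruby
require 'rexml/document'

# Collect [href, link text] pairs from well-formed XHTML.
# extract_links is a hypothetical helper, not part of REXML.
def extract_links(xhtml)
  doc = REXML::Document.new(xhtml)
  links = []
  # each_recursive visits every descendant element of the root
  doc.root.each_recursive do |el|
    next unless el.name.downcase == 'a' && el.attributes['href']
    links << [el.attributes['href'], el.text.to_s.strip]
  end
  links
end

# Only runs when a bookmarks file is given on the command line.
# Assumption: `tidy -asxml FILE` writes the converted XHTML to stdout.
if __FILE__ == $0 && ARGV[0]
  xhtml = IO.popen(['tidy', '-asxml', ARGV[0]], err: File::NULL, &:read)
  extract_links(xhtml).each { |href, text| puts "#{text} => #{href}" }
end
```

This only recovers the links themselves; folder names would still need to be picked out of the surrounding elements.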
 
Jamis Buck


Dick said:
long shot but what the hell - don't suppose any of you
good people are sitting on a parser for Mozilla/Firefox bookmarks.html
files, by any chance?

Funny you should ask. :) I've had this for a while, and I can't even
remember why I wrote it. It's pretty hacked together, and it's not a
true "parser" (I just search for certain patterns in the bookmark file)
and it is hardcoded (currently) for my own (obsolete) Phoenix bookmarks
file, but it should be fairly straightforward to modify for your own
purposes.

Hope this is at least close to what you are looking for... :)

- Jamis

--
Jamis Buck
(e-mail address removed)
http://www.jamisbuck.org/jamis

"I use octal until I get to 8, and then I switch to decimal."

[attachment: bookmarks.rb]

#!/usr/bin/ruby

class Item
  attr_accessor :last_modified
  attr_accessor :id
  attr_accessor :title
  attr_accessor :remarks

  def to_html_attr_list
    s = ""
    s << " LAST_MODIFIED=\"#{@last_modified}\"" if @last_modified
    s << " ID=\"#{@id}\"" if @id
    return s
  end
end

class Folder < Item
  attr_reader :items

  def initialize
    @items = Array.new
  end

  def dump( level = 0 )
    puts "#{' ' * level * 2}#{title}" if @title
    @items.each do |i|
      i.dump( level+1 )
    end
  end

  # folders first, then bookmarks; each group sorted by title, ignoring case
  def sort!
    @items.sort! do |a,b|
      # Object#type is gone from current Ruby, so compare classes instead
      if a.class == b.class
        a.title.downcase <=> b.title.downcase
      elsif a.is_a? Folder
        -1
      elsif b.is_a? Folder
        1
      else
        raise "wrong type in folder"
      end
    end

    @items.each { |i| i.sort! if i.is_a? Folder }
  end

  def to_html( level, file )
    indent = " " * level * 4
    file.puts indent + "<DT><H3#{to_html_attr_list}>#{@title}</H3>" if @title
    file.puts indent + "<DD>#{@remarks}" if @remarks
    file.puts indent + "<HR>" if !@title # hack for top-level folder
    file.puts indent + "<DL><p>"

    @items.each do |i|
      i.to_html( level+1, file )
    end

    file.puts indent + "</DL><p>"
  end
end

class Bookmark < Item
  attr_accessor :last_visit
  attr_accessor :icon
  attr_accessor :last_charset
  attr_accessor :href

  def dump( level )
    print " " * level * 2
    print "'" + @title + "' => "
    puts @href
  end

  def to_html_attr_list
    s = super
    s << " LAST_VISIT=\"#{@last_visit}\"" if @last_visit
    s << " ICON=\"#{@icon}\"" if @icon
    s << " LAST_CHARSET=\"#{@last_charset}\"" if @last_charset
    s << " HREF=\"#{@href}\"" if @href
    return s
  end

  def to_html( level, file )
    indent = " " * level * 4
    file.puts indent + "<DT><A#{to_html_attr_list}>#{@title}</A>"
    file.puts indent + "<DD>#{@remarks}" if @remarks
  end
end

class BookmarkManager
  def initialize
    @top_folder = Folder.new
  end

  # turn a tag's attribute text (NAME="value" ...) into a hash
  def build_attribute_hash( str )
    list = str.scan( /[_A-Z]+="[^"]*"/ )
    hash = Hash.new
    list.each do |item|
      item =~ /([_A-Z]+)="(.*)"/
      hash[ $1 ] = $2
    end
    hash
  end

  def append( bookmarks_file )
    folder_stack = [ @top_folder ]

    File.open( bookmarks_file, "r" ) do |file|
      # skip to the start of the bookmark data (gets returns nil at EOF)
      while ( line = file.gets )
        break if line.strip == "<DL><p>"
      end

      last_item = nil
      # stop when the top-level list is closed or the file runs out
      while folder_stack.length > 0 && ( line = file.gets )
        line = line.strip

        case line
        when /<HR>/ then
          # separator...
          last_item = nil

        when /<DT><H3 (.*)>(.*)<\/H3>/
          last_item = folder = Folder.new
          attr_list = $1
          folder.title = $2
          attrs = build_attribute_hash( attr_list )
          folder.last_modified = attrs[ "LAST_MODIFIED" ]
          folder.id = attrs[ "ID" ]
          folder_stack.last.items.push folder
          folder_stack.push folder

        when /<DT><A (.*)>(.*)<\/A>/
          last_item = bookmark = Bookmark.new
          attr_list = $1
          bookmark.title = $2
          attrs = build_attribute_hash( attr_list )
          bookmark.last_modified = attrs[ "LAST_MODIFIED" ]
          bookmark.id = attrs[ "ID" ]
          bookmark.last_visit = attrs[ "LAST_VISIT" ]
          bookmark.icon = attrs[ "ICON" ]
          bookmark.last_charset = attrs[ "LAST_CHARSET" ]
          bookmark.href = attrs[ "HREF" ]
          folder_stack.last.items.push bookmark

        when /<\/DL><p>/
          folder_stack.pop
          last_item = nil

        when /<DD>(.*)/
          # a description with no preceding item is ignored
          last_item.remarks = $1 if last_item

        when /<DL><p>/
          # start of a list
        end
      end
    end

    @top_folder.sort!
  end

  def dump
    puts "Bookmarks:"
    @top_folder.dump
  end

  def to_html( file )
    file.puts "<!DOCTYPE NETSCAPE-Bookmark-file-1>"
    file.puts "<!-- This is an automatically generated file."
    file.puts " It will be read and overwritten."
    file.puts " DO NOT EDIT! -->"
    file.puts "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=UTF-8\">"
    file.puts "<TITLE>Bookmarks</TITLE>"
    file.puts "<H1>Bookmarks</H1>"
    file.puts

    @top_folder.to_html( 0, file )
  end
end


mgr = BookmarkManager.new
mgr.append "/home/jgb3/.phoenix/default/d2isamzz.slt/bookmarks.html"
mgr.to_html( $stdout )
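
The pattern-search approach in the attachment comes down to two steps: match the whole `<DT><A ...>` line, then scan the attribute text for `NAME="value"` pairs. A small standalone sketch of that idea follows. The sample line and the helper name `attribute_hash` are made up for illustration, and the helper uses capture groups in a single scan rather than the two-pass version above.

```ruby
# Made-up sample line in the Netscape bookmark format.
line = '<DT><A HREF="http://www.ruby-lang.org/" ADD_DATE="1061000000" LAST_CHARSET="UTF-8">Ruby Home</A>'

# Split a tag's attribute text into a Hash, one NAME="value" pair at a time.
# attribute_hash is a hypothetical stand-in for build_attribute_hash.
def attribute_hash(str)
  hash = {}
  str.scan(/([_A-Z]+)="([^"]*)"/) { |name, value| hash[name] = value }
  hash
end

if line =~ /<DT><A (.*)>(.*)<\/A>/
  # Copy the captures before calling anything else that runs a regex.
  attr_text, title = $1, $2
  attrs = attribute_hash(attr_text)
  puts "#{title} => #{attrs['HREF']}"
end
```

Attribute values containing a double quote would defeat this, which is one reason it is not a true parser.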

 
Ben Giddings

Dick said:
long shot but what the hell - don't suppose any of you
good people are sitting on a parser for Mozilla/Firefox bookmarks.html
files, by any chance?

I have successfully used my htmltokenizer module to parse them. It
depends what you're looking for though. It's not specific to mozilla
bookmarks, but I have (for example) written a little script to compare
bookmark files and find links that only exist in one of them.

Ben
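
The comparison script itself was not posted. As a rough sketch of the idea of finding links that exist in only one file, the following uses plain `String#scan` and `Set` rather than htmltokenizer, so it is a swapped-in technique and not Ben's code. The helper names `hrefs` and `only_in_first` are invented for illustration.

```ruby
require 'set'

# Return the set of HREF values found in bookmark-file text.
def hrefs(html)
  html.scan(/<a\s[^>]*?href="([^"]*)"/i).flatten.to_set
end

# Links present in the first text but missing from the second.
def only_in_first(a, b)
  hrefs(a) - hrefs(b)
end

# Usage: ruby compare.rb old-bookmarks.html new-bookmarks.html
if __FILE__ == $0 && ARGV.length == 2
  first, second = ARGV.map { |path| File.read(path) }
  only_in_first(first, second).each { |url| puts "< #{url}" }
  only_in_first(second, first).each { |url| puts "> #{url}" }
end
```

Links are compared as exact strings, so two URLs that differ only by a trailing slash count as different.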
 
James Britt

Dick said:
long shot but what the hell - don't suppose any of you
good people are sitting on a parser for Mozilla/Firefox bookmarks.html
files, by any chance?


nah, didn't think so :)



Ah well, never mind. I found that squirting it into REXML::Document.new()
by way of 'tidy -asxml' at least stops the constructor choking to death, so I'll
have to take it from there.

Do you use the Ruby/Tidy wrapper?

http://www.rubyxml.com/index.rb/Applications@Ruby_Wrapper_for_XML_Tidy.txt


James
 
Dick Davies

* Jamis Buck said:
Funny you should ask. :) I've had this for a while, and I can't even
remember why I wrote it. It's pretty hacked together, and it's not a
true "parser" (I just search for certain patterns in the bookmark file)
and it is hardcoded (currently) for my own (obsolete) Phoenix bookmarks
file, but it should be fairly straightforward to modify for your own
purposes.

Hope this is at least close to what you are looking for... :)

Thanks a lot, it was handy to get a feel for it. I gave up on a parser too (I'd prefer not to require extra libs), and wrote a
cut-down homegrown version in the end (I only need the URL, folder info and description myself):

-----------------------------------------------------------------
rasputin@lb:lib$ cat mozbooks.rb
#!/usr/bin/env ruby

# quick and dirty bookmarks.html parser - thanks to Jamis Buck for the 'folder state machine' idea

class MozBooks

  # pull urls, descriptions and folder hierarchy info from mozilla/firefox bookmarks.html
  def self.parse(bm)
    folders = []
    bm.each_line{ |l|
      folders.pop if l =~ /<\/dl><p>/i # we just left a folder
      folders << $1 if l =~ /\s*<dt><h3[^>]+>(.*)<\/h3>/i # we just entered a folder
      puts "url = #{$1}, desc = #{$2}, folder = #{folders.join('/')}" if l =~ /a href="([^"]*)"[^>]+>([^<]+)</i
    }
  end
end

mb = MozBooks.parse($stdin)
-----------------------------------------------------------------

and that seems to work (enough info for my purposes anyway, I can feed this lot into del.icio.us).... thanks!

rasputin@lb:booty$ cat ~/bookmarks.html | ruby lib/mozbooks.rb |grep -i ruby|head
url = http://raa.ruby-lang.org/, desc = RAA - Ruby Application Archive, folder = toolbar/search
url = http://www.rubygarden.org/ruby?UsingRubyFastCGI, desc = Ruby: UsingRubyFastCGI, folder = toolbar/proj/FastCGI
url = http://dev.faeriemud.org/changes-1.8.0.html, desc = New Features in Ruby 1.8.0, folder = toolbar/ruby/1.8
url = http://www.rubygarden.org/ruby?RIOnePointEight, desc = Ruby: RIOnePointEight, folder = toolbar/ruby/1.8
url = ftp://ftp.ruby-lang.org/pub/ruby/1.8/changes.1.8.0, desc = ftp://ftp.ruby-lang.org/pub/ruby/1.8/changes.1.8.0, folder = toolbar/ruby/1.8
url = http://www.rubyist.net/~matz/slides/rc2003/mgp00003.html, desc = MagicPoint presentation foils, folder = toolbar/ruby/1.8
url = http://whytheluckystiff.net/articles/2003/08/04/rubyOneEightOh, desc = whyTHEluckySTIFF ;,. What's Shiny and New in Ruby 1.8.0? .,;, folder = toolbar/ruby/1.8
url = http://images-jp.amazon.com/images/P/4894714531.09.LZZZZZZZ.jpg, desc = 4894714531.09.LZZZZZZZ.jpg (JPEG Image, 375x475 pixels), folder = toolbar/ruby/community
url = http://www2a.biglobe.ne.jp/~seki/ruby/, desc = I like Ruby., folder = toolbar/ruby/community
url = http://www.excite.co.jp/world/url/b...C%96%F3&wb_lp=JAEN&wb_dis=2&wb_co=excitejapan, desc = Matz' Blog, folder = toolbar/ruby/community
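
If the records are going to be fed into another tool rather than grepped, a variant that returns data instead of printing may be handier. The sketch below is a hypothetical rework of the same folder state machine, not the script above: `parse_bookmarks` and the sample text are made up, and the patterns use `[^>]*` so that tags with no extra attributes also match.

```ruby
# Hypothetical variant: returns an array of hashes instead of printing.
# Accepts anything that responds to each_line (an IO or a String).
def parse_bookmarks(io)
  folders = []
  records = []
  io.each_line do |l|
    folders.pop if l =~ /<\/dl><p>/i                   # we just left a folder
    folders << $1 if l =~ /<dt><h3[^>]*>(.*)<\/h3>/i   # we just entered a folder
    if l =~ /<a href="([^"]*)"[^>]*>([^<]*)</i
      records << { url: $1, desc: $2, folder: folders.join('/') }
    end
  end
  records
end

# Made-up sample input in the bookmarks.html layout.
sample = <<EOS
<DL><p>
    <DT><H3 ADD_DATE="1">ruby</H3>
    <DL><p>
        <DT><A HREF="http://raa.ruby-lang.org/" ADD_DATE="2">RAA</A>
    </DL><p>
</DL><p>
EOS

parse_bookmarks(sample).each do |r|
  puts "url = #{r[:url]}, desc = #{r[:desc]}, folder = #{r[:folder]}"
end
```

Like the original, this relies on each tag sitting on its own line, which is how Mozilla and Firefox write the file.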
 
