mozilla bookmarks

Discussion in 'Ruby' started by Dick Davies, Sep 22, 2004.

  1. Dick Davies

    Dick Davies Guest

    long shot but what the hell - don't suppose any of you
    good people are sitting on a parser for Mozilla/Firefox bookmarks.html
    files, by any chance?


    nah, didn't think so :)



    Ah well, never mind. I found that squirting it into REXML::Document.new()
    by way of 'tidy -asxml' at least stops the constructor choking to death, I'll
    have to take it from there.....
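
    For readers following along, here is a rough sketch of that second step: pulling links out of the XHTML once tidy has cleaned it up. The TIDIED sample and the links_from_xhtml helper are made up for illustration, and real 'tidy -asxml' output will differ (it normally adds a DOCTYPE and more wrapper markup), so treat this as a starting point rather than a finished script.

```ruby
require 'rexml/document'

# Hypothetical sample: roughly what a tiny bookmarks file could look like
# after tidy has turned it into well-formed XHTML. Not real tidy output.
TIDIED = <<~XML
  <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
      <dl>
        <dt><h3>ruby</h3></dt>
        <dd>
          <dl>
            <dt><a href="http://raa.ruby-lang.org/">RAA</a></dt>
          </dl>
        </dd>
      </dl>
    </body>
  </html>
XML

# Walk every element below the root and collect [href, link text] pairs.
# Comparing el.name (the local name) avoids having to deal with the
# XHTML default namespace in an XPath expression.
def links_from_xhtml(xml)
  doc = REXML::Document.new(xml)
  links = []
  doc.root.each_recursive do |el|
    next unless el.name == 'a' && el.attributes['href']
    links << [el.attributes['href'], el.text.to_s.strip]
  end
  links
end

links_from_xhtml(TIDIED).each { |url, title| puts "#{title}: #{url}" }
```

    Folder names could be recovered the same way by watching for h3 elements, at the cost of tracking nesting depth by hand.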

    --
    Forms follow function, and often obliterate it.
    Rasputin :: Jack of All Trades - Master of Nuns
     
    Dick Davies, Sep 22, 2004
    #1

  2. Jamis Buck

    Jamis Buck Guest


    Dick Davies wrote:
    > long shot but what the hell - don't suppose any of you good
    > good people are sitting on a parser for Mozilla/Firefox bookmarks.html
    > files, by any chance?


    Funny you should ask. :) I've had this for a while, and I can't even
    remember why I wrote it. It's pretty hacked together, and it's not a
    true "parser" (I just search for certain patterns in the bookmark file),
    and it is (currently) hardcoded for my own (obsolete) Phoenix bookmarks
    file, but it should be fairly straightforward to modify for your own
    purposes.

    Hope this is at least close to what you are looking for... :)

    - Jamis

    --
    Jamis Buck

    http://www.jamisbuck.org/jamis

    "I use octal until I get to 8, and then I switch to decimal."

    Attachment: bookmarks.rb

    #!/usr/bin/ruby

    class Item
      attr_accessor :last_modified
      attr_accessor :id
      attr_accessor :title
      attr_accessor :remarks

      def to_html_attr_list
        s = ""
        s << " LAST_MODIFIED=\"#{@last_modified}\"" if @last_modified
        s << " ID=\"#{@id}\"" if @id
        return s
      end
    end

    class Folder < Item
      attr_reader :items

      def initialize
        @items = Array.new
      end

      def dump( level = 0 )
        puts "#{' ' * level * 2}#{title}" if @title
        @items.each do |i|
          i.dump( level+1 )
        end
      end

      # folders sort ahead of bookmarks; within each kind, sort by title
      def sort!
        @items.sort! do |a,b|
          if a.class == b.class # Object#type was removed in Ruby 1.9
            a.title.downcase <=> b.title.downcase
          elsif a.is_a? Folder
            -1
          elsif b.is_a? Folder
            1
          else
            raise "wrong type in folder"
          end
        end

        @items.each { |i| i.sort! if i.is_a? Folder }
      end

      def to_html( level, file )
        indent = " " * level * 4
        file.puts indent + "<DT><H3#{to_html_attr_list}>#{@title}</H3>" if @title
        file.puts indent + "<DD>#{@remarks}" if @remarks
        file.puts indent + "<HR>" if !@title # hack for top-level folder
        file.puts indent + "<DL><p>"

        @items.each do |i|
          i.to_html( level+1, file )
        end

        file.puts indent + "</DL><p>"
      end
    end

    class Bookmark < Item
      attr_accessor :last_visit
      attr_accessor :icon
      attr_accessor :last_charset
      attr_accessor :href

      def dump( level )
        print " " * level * 2
        print "'" + @title + "' => "
        puts @href
      end

      def to_html_attr_list
        s = super
        s << " LAST_VISIT=\"#{@last_visit}\"" if @last_visit
        s << " ICON=\"#{@icon}\"" if @icon
        s << " LAST_CHARSET=\"#{@last_charset}\"" if @last_charset
        s << " HREF=\"#{@href}\"" if @href
        return s
      end

      def to_html( level, file )
        indent = " " * level * 4
        file.puts indent + "<DT><A#{to_html_attr_list}>#{@title}</A>"
        file.puts indent + "<DD>#{@remarks}" if @remarks
      end
    end

    class BookmarkManager
      def initialize
        @top_folder = Folder.new
      end

      # turn 'NAME="value" OTHER="value"' into a hash of name => value
      def build_attribute_hash( str )
        list = str.scan( /[_A-Z]+="[^"]*"/ )
        hash = Hash.new
        list.each do |item|
          item =~ /([_A-Z]+)="(.*)"/
          hash[ $1 ] = $2
        end
        hash
      end

      def append( bookmarks_file )
        folder_stack = [ @top_folder ]

        File.open( bookmarks_file, "r" ) do |file|
          # skip to the start of the bookmark data (or stop at end of file)
          while ( line = file.gets ) && line.strip != "<DL><p>"; end

          last_item = nil
          while folder_stack.length > 0
            line = file.gets
            break unless line # end of file before every folder was closed
            line = line.strip

            case line
            when /<HR>/ then
              # separator...
              last_item = nil

            when /<DT><H3 (.*)>(.*)<\/H3>/
              last_item = folder = Folder.new
              attr_list = $1
              folder.title = $2
              attrs = build_attribute_hash( attr_list )
              folder.last_modified = attrs[ "LAST_MODIFIED" ]
              folder.id = attrs[ "ID" ]
              folder_stack.last.items.push folder
              folder_stack.push folder

            when /<DT><A (.*)>(.*)<\/A>/
              last_item = bookmark = Bookmark.new
              attr_list = $1
              bookmark.title = $2
              attrs = build_attribute_hash( attr_list )
              bookmark.last_modified = attrs[ "LAST_MODIFIED" ]
              bookmark.id = attrs[ "ID" ]
              bookmark.last_visit = attrs[ "LAST_VISIT" ]
              bookmark.icon = attrs[ "ICON" ]
              bookmark.last_charset = attrs[ "LAST_CHARSET" ]
              bookmark.href = attrs[ "HREF" ]
              folder_stack.last.items.push bookmark

            when /<\/DL><p>/
              folder_stack.pop
              last_item = nil

            when /<DD>(.*)/
              # ignore a description that has no item to attach to
              last_item.remarks = $1 if last_item

            when /<DL><p>/
              # start of a list
            end
          end
        end

        @top_folder.sort!
      end

      def dump
        puts "Bookmarks:"
        @top_folder.dump
      end

      def to_html( file )
        file.puts "<!DOCTYPE NETSCAPE-Bookmark-file-1>"
        file.puts "<!-- This is an automatically generated file."
        file.puts " It will be read and overwritten."
        file.puts " DO NOT EDIT! -->"
        file.puts "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=UTF-8\">"
        file.puts "<TITLE>Bookmarks</TITLE>"
        file.puts "<H1>Bookmarks</H1>"
        file.puts

        @top_folder.to_html( 0, file )
      end
    end


    mgr = BookmarkManager.new
    mgr.append "/home/jgb3/.phoenix/default/d2isamzz.slt/bookmarks.html"
    mgr.to_html( $stdout )

     
    Jamis Buck, Sep 22, 2004
    #2

  3. Ben Giddings

    Ben Giddings Guest

    Dick Davies wrote:
    > long shot but what the hell - don't suppose any of you good
    > good people are sitting on a parser for Mozilla/Firefox bookmarks.html
    > files, by any chance?


    I have successfully used my htmltokenizer module to parse them. It
    depends what you're looking for though. It's not specific to mozilla
    bookmarks, but I have (for example) written a little script to compare
    bookmark files and find links that only exist in one of them.
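
    As an illustration of that kind of comparison, here is a minimal sketch that finds links present in only one of two bookmark files. It does not use htmltokenizer; it swaps in a plain regex scan instead, and the names hrefs, only_in_one, OLD_BOOKMARKS and NEW_BOOKMARKS are invented for the example. It assumes every link is written as an <A ... HREF="..."> tag with a double-quoted URL.

```ruby
require 'set'

# Collect every double-quoted HREF value found in <A ...> tags.
def hrefs(html)
  html.scan(/<a\s[^>]*?href="([^"]*)"/i).flatten.to_set
end

# Report the links that appear in exactly one of the two files.
def only_in_one(html_a, html_b)
  a = hrefs(html_a)
  b = hrefs(html_b)
  { only_in_first: (a - b).to_a.sort, only_in_second: (b - a).to_a.sort }
end

# Hypothetical inputs; in practice these would come from File.read.
OLD_BOOKMARKS = <<~'HTML'
  <DT><A HREF="http://raa.ruby-lang.org/" ADD_DATE="1">RAA</A>
  <DT><A HREF="http://www.rubygarden.org/">RubyGarden</A>
HTML

NEW_BOOKMARKS = <<~'HTML'
  <DT><A HREF="http://raa.ruby-lang.org/" ADD_DATE="1">RAA</A>
  <DT><A HREF="http://www.ruby-lang.org/">Ruby</A>
HTML

diff = only_in_one(OLD_BOOKMARKS, NEW_BOOKMARKS)
puts "only in first:  #{diff[:only_in_first].join(', ')}"
puts "only in second: #{diff[:only_in_second].join(', ')}"
```

    Using sets keeps the comparison independent of the order and folder placement of the bookmarks.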

    Ben
     
    Ben Giddings, Sep 22, 2004
    #3
  4. James Britt

    James Britt Guest

    Dick Davies wrote:

    > long shot but what the hell - don't suppose any of you good
    > good people are sitting on a parser for Mozilla/Firefox bookmarks.html
    > files, by any chance?
    >
    >
    > nah, didn't think so :)
    >
    >
    >
    > Ah well, never mind. I found that squirting it into REXML::Document.new()
    > by way of 'tidy -asxml' at least stops the constructor choking to death, I'll
    > have to take it from there.....


    Do you use the Ruby/Tidy wrapper?

    http://www.rubyxml.com/index.rb/Applications@Ruby_Wrapper_for_XML_Tidy.txt


    James
     
    James Britt, Sep 22, 2004
    #4
  5. Dick Davies

    Dick Davies Guest

    * James Britt <> [0921 17:21]:
    > Dick Davies wrote:
    >
    > >long shot but what the hell - don't suppose any of you good
    > >good people are sitting on a parser for Mozilla/Firefox bookmarks.html
    > >files, by any chance?
    > >
    > >
    > >nah, didn't think so :)
    > >
    > >
    > >
    > >Ah well, never mind. I found that squirting it into REXML::Document.new()
    > >by way of 'tidy -asxml' at least stops the constructor choking to death,
    > >I'll have to take it from there.....

    >
    > Do you use the Ruby/Tidy wrapper?
    >
    > http://www.rubyxml.com/index.rb/Applications@Ruby_Wrapper_for_XML_Tidy.txt


    I will eventually, I think - though to be honest even on my monster bookmark
    file the tidy warning/error output is longer than the generated XML :)

    --
    Census Taker to Housewife: Did you ever have the measles, and, if so,
    how many?
    Rasputin :: Jack of All Trades - Master of Nuns
     
    Dick Davies, Sep 23, 2004
    #5
  6. Dick Davies

    Dick Davies Guest

    * Jamis Buck <> [0926 16:26]:
    > Dick Davies wrote:
    > >long shot but what the hell - don't suppose any of you good
    > >good people are sitting on a parser for Mozilla/Firefox bookmarks.html
    > >files, by any chance?

    >
    > Funny you should ask. :) I've had this for awhile, and I can't even
    > remember why I wrote it. It's pretty hacked together, and it's not a
    > true "parser" (I just search for certain patterns in the bookmark file)
    > and it is hardcoded (currently) for my own (obsolete) Phoenix bookmarks
    > file, but it should be fairly straightforward to modify for your own
    > purposes.
    >
    > Hope this is at least close to what you are looking for... :)


    Thanks a lot, it was handy to get a feel for it. I gave up on a parser too (I'd prefer not to require extra libs), and did a
    cut-down homegrown version in the end (I only need url, folder info and description myself):

    -----------------------------------------------------------------
    rasputin@lb:lib$ cat mozbooks.rb
    #!/usr/bin/env ruby

    # quick and dirty bookmarks.html parser - thanks to Jamis Buck for the 'folder state machine' idea

    class MozBooks

      # pull urls, descriptions and folder hierarchy info from mozilla/firefox bookmarks.html
      def self.parse(bm)
        folders = []
        bm.each_line{ |l|
          folders.pop if l =~ /<\/dl><p>/i # we just left a folder
          folders << $1 if l =~ /\s*<dt><h3[^>]+>(.*)<\/h3>/i # we just entered a folder
          puts "url = #{$1}, desc = #{$2}, folder = #{folders.join('/')}" if l =~ /a href="([^"]*)"[^>]+>([^<]+)</i
        }
      end
    end

    mb = MozBooks.parse($stdin)
    -----------------------------------------------------------------

    and that seems to work (enough info for my purposes anyway, I can feed this lot into del.icio.us).... thanks!

    rasputin@lb:booty$ cat ~/bookmarks.html | ruby lib/mozbooks.rb |grep -i ruby|head
    url = http://raa.ruby-lang.org/, desc = RAA - Ruby Application Archive, folder = toolbar/search
    url = http://www.rubygarden.org/ruby?UsingRubyFastCGI, desc = Ruby: UsingRubyFastCGI, folder = toolbar/proj/FastCGI
    url = http://dev.faeriemud.org/changes-1.8.0.html, desc = New Features in Ruby 1.8.0, folder = toolbar/ruby/1.8
    url = http://www.rubygarden.org/ruby?RIOnePointEight, desc = Ruby: RIOnePointEight, folder = toolbar/ruby/1.8
    url = ftp://ftp.ruby-lang.org/pub/ruby/1.8/changes.1.8.0, desc = ftp://ftp.ruby-lang.org/pub/ruby/1.8/changes.1.8.0, folder = toolbar/ruby/1.8
    url = http://www.rubyist.net/~matz/slides/rc2003/mgp00003.html, desc = MagicPoint presentation foils, folder = toolbar/ruby/1.8
    url = http://whytheluckystiff.net/articles/2003/08/04/rubyOneEightOh, desc = whyTHEluckySTIFF ;,. What's Shiny and New in Ruby 1.8.0? .,;, folder = toolbar/ruby/1.8
    url = http://images-jp.amazon.com/images/P/4894714531.09.LZZZZZZZ.jpg, desc = 4894714531.09.LZZZZZZZ.jpg (JPEG Image, 375x475 pixels), folder = toolbar/ruby/community
    url = http://www2a.biglobe.ne.jp/~seki/ruby/, desc = I like Ruby., folder = toolbar/ruby/community
    url = http://www.excite.co.jp/world/url/b...C%96%F3&wb_lp=JAEN&wb_dis=2&wb_co=excitejapan, desc = Matz' Blog, folder = toolbar/ruby/community
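
    If the results are going to be fed into another tool rather than grepped, one possible variation is to collect records instead of printing them. The sketch below is a hypothetical rework, not the script above: parse_bookmarks and SAMPLE are made-up names, and the patterns are loosened slightly ([^>]* instead of [^>]+) so that tags with no extra attributes also match.

```ruby
# Hypothetical variant of the folder state machine: returns an array of
# hashes instead of printing one line per bookmark.
def parse_bookmarks(io)
  folders = []
  records = []
  io.each_line do |l|
    folders.pop if l =~ /<\/dl><p>/i                   # we just left a folder
    folders << $1 if l =~ /<dt><h3[^>]*>(.*)<\/h3>/i   # we just entered a folder
    if l =~ /<a href="([^"]*)"[^>]*>([^<]+)</i
      records << { url: $1, desc: $2, folder: folders.join('/') }
    end
  end
  records
end

# Made-up input in the usual bookmarks.html layout.
SAMPLE = <<~'HTML'
  <DL><p>
      <DT><H3 ADD_DATE="1">toolbar</H3>
      <DL><p>
          <DT><A HREF="http://raa.ruby-lang.org/" ADD_DATE="2">RAA</A>
      </DL><p>
      <DT><A HREF="http://example.com/">Top</A>
  </DL><p>
HTML

parse_bookmarks(SAMPLE).each { |r| puts r.inspect }
```

    Anything that responds to each_line works as input, so a String, a File or $stdin can all be passed in.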





    --
    It's always darkest just before it gets pitch black.
    Rasputin :: Jack of All Trades - Master of Nuns
     
    Dick Davies, Sep 23, 2004
    #6
