TagTreeScanner help

Patrick Gundlach

Hello out there,


I am experimenting with TagTreeScanner from Gavin Kistner. Has anybody
created a Wiki-markup parser for the MediaWiki software?

I have succeeded with a few tags, but I am currently stuck with
enumerations:

* one
*two
* three
** threeone
** threetwo
*four


should be rendered as something like

<ol>
<li>one</li>
<li>two</li>
<li>three
<ol>
<li>threeone</li>
<li>threetwo</li>
</ol>
</li>
<li>four</li>
</ol>

This is what I have tried, but I have the feeling that this is not the
way to proceed. All I get is <ol> </ol> around the string.

--------------------------------------------------
#!/opt/ruby/1.8.2/bin/ruby

require 'TagTreeScanner'

class SimpleMarkup < TagTreeScanner
  @root_factory.allows_text = false
  @tag_genres[ :root ] = [ ]
  @tag_genres[ :root ] <<
    TagFactory.new( :ol,
      :open_match => /(\*.+?)\n+(?=[^*])/m,
      :open_requires_bol => true,
      :setup => lambda{ |tag, scanner, tagtree|
        # Throw the contents I found into the tag
        # but remove leading whitespace
        tag << scanner[1] # [1].gsub( /\*/, '<li>' )
      },
      :allows_text => true,
      :autoclose => true,
      # no effect: (??)
      :allowed_genre => :list
    )

  @tag_genres[ :list ] = [ ]
  @tag_genres[ :list ] <<
    TagFactory.new( :li,
      :open_match => /\*/,
      :close_match => /\n/,
      :open_requires_bol => true,
      :allows_text => true
    )

end

sample = <<EOS
* one
*two
* three
** threeone
** threetwo
*four

EOS


markup = SimpleMarkup.new(sample)
puts markup.to_xml
--------------------------------------------------
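
For comparison, here is a minimal sketch of the nesting logic in plain Ruby, without TagTreeScanner. It counts the leading `*` characters of each line, keeps track of how many lists are currently open, and emits closing tags whenever the depth drops. The method name `stars_to_html` and its `list_tag` parameter are hypothetical; the tag defaults to `ol` to match the target output above, although MediaWiki itself renders `*` items as `<ul>`. Blank lines are skipped, and a line without a leading `*` ends any open list and is copied through unchanged.

```ruby
# Hypothetical sketch: nest MediaWiki-style "*" lists by counting leading
# stars. Not TagTreeScanner; plain string processing only.
def stars_to_html(text, list_tag = "ol")
  out   = []
  depth = 0 # number of <list_tag> elements currently open

  # Close the current item and every list nested deeper than +level+.
  close_to = lambda do |level|
    return if depth.zero?
    out << "</li>"
    while depth > level
      out << "</#{list_tag}>"
      depth -= 1
      out << "</li>" if depth > 0 # close the item that held the inner list
    end
  end

  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?           # blank lines are ignored
    level = line[/\A\*+/].to_s.length   # nesting depth = number of stars
    if level.zero?
      close_to.call(0)                  # a non-list line ends any open list
      out << line
      next
    end
    if level > depth
      # Open deeper lists; the enclosing <li>, if any, stays open around them.
      while depth < level
        out << "<#{list_tag}>"
        depth += 1
        out << "<li>" if depth < level  # placeholder item when a level is skipped
      end
    else
      close_to.call(level)
    end
    out << "<li>" + line[level..-1].strip
  end
  close_to.call(0)
  out.join
end

sample = "* one\n*two\n* three\n** threeone\n** threetwo\n*four\n"
puts stars_to_html(sample)
# prints <ol><li>one</li><li>two</li><li>three<ol><li>threeone</li><li>threetwo</li></ol></li><li>four</li></ol>
```

The key design choice is that an `<li>` stays open until the next item at the same or a shallower level appears, so a deeper list automatically ends up inside it.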
 
Damphyr

Patrick said:
> Hello out there,
>
> I am experimenting with TagTreeScanner from Gavin Kistner. Has anybody
> created a Wiki-markup parser for the mediawiki software?

Have you had any luck with this?
I find myself having to migrate a sizeable MediaWiki installation to
TracWiki. I must admit I haven't looked at the differences between the
two yet (they seem trivial at first glance).
I would love to help out with any problems you run into (and I have a
large data set to test the code with :) ), as it would give me a real
head start in my own work.
Cheers,
V.-
--
http://www.braveworld.net/riva

Patrick Gundlach

> Have you had any luck with this?

Actually, no. I have tried different approaches: one with
TagTreeScanner, one with a racc grammar, and one with regular
expressions and the help of StringScanner. But none of them gave
satisfying results (= nice-looking code). For example, I still have no
idea how to parse (a variation on the example from my first post):

* item 1
* item 2
** subitem 2/1
** subitem 2/2
# subitem 2/3, but numbered

and so on. Perhaps I will give a racc grammar another try. I would be
very, very happy about a mediawiki.to_html method.

Patrick
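
One way to handle mixed `*` and `#` markers with StringScanner is to treat the whole marker prefix of a line as a path, with one open list per character, and compare it with the lists that are currently open: lists beyond the shared part are closed and new ones are opened. This is a hypothetical sketch (`wiki_lists_to_html`, `close_lists` and `LIST_TAGS` are made-up names), not a complete MediaWiki parser; item text is not HTML-escaped and inline markup is ignored. Under these prefix rules, which follow MediaWiki's, a numbered subitem of item 2 is written `*#`, while a bare `#` at the start of a line closes the bullet list and starts a new top-level numbered list.

```ruby
require 'strscan'

# Hypothetical sketch: MediaWiki-style list prefixes ("*", "#", "*#", ...)
# to nested HTML lists, one open list per prefix character.
LIST_TAGS = { "*" => "ul", "#" => "ol" }

def wiki_lists_to_html(text)
  ss    = StringScanner.new(text)
  out   = []
  stack = [] # "*" or "#" per open list; each open list has an open <li>

  until ss.eos?
    prefix  = ss.scan(/[*#]*/)        # list markers at line start ("" if none)
    content = ss.scan(/[^\n]*/).strip # rest of the line (not HTML-escaped)
    ss.scan(/\n/)                     # consume the line break, if present

    if prefix.empty?                  # blank or plain line: end all lists
      close_lists(out, stack, 0)
      out << content unless content.empty?
      next
    end

    # Length of the prefix shared with the currently open lists.
    common = 0
    common += 1 while common < stack.length && common < prefix.length &&
                      stack[common] == prefix[common, 1]

    close_lists(out, stack, common)
    if prefix.length == common
      out << "</li><li>"              # next item in the same list
    else
      prefix[common..-1].each_char do |ch| # open the new, deeper lists
        out << "<#{LIST_TAGS[ch]}><li>"
        stack.push(ch)
      end
    end
    out << content
  end
  close_lists(out, stack, 0)
  out.join
end

# Pop open lists until only +depth+ remain, closing each list's <li> first.
def close_lists(out, stack, depth)
  out << "</li></#{LIST_TAGS[stack.pop]}>" while stack.length > depth
end

puts wiki_lists_to_html("* item 1\n* item 2\n** subitem 2/1\n** subitem 2/2\n*# subitem 2/3\n")
# prints <ul><li>item 1</li><li>item 2<ul><li>subitem 2/1</li><li>subitem 2/2</li></ul><ol><li>subitem 2/3</li></ol></li></ul>
```

Comparing whole prefixes rather than counting depth is what lets a `*` list and a `#` list sit side by side inside the same item.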
 
Patrick Gundlach

> You could look at the ruwiki parsing code to see if it helps. The
> scanning and token replacement are factored into separate components;
> see token.rb and the token directory in the ruwiki dist.
>
> http://rubyforge.org/projects/ruwiki

Thanks. I fear that if I use that source code, I will only get 95% of
the way because of subtle differences between the two markup languages.
ruwiki has a lot of parsing code that isn't easy for a human to parse. I
am making progress on my mediawiki class, and I also 'fear' that my code
will grow to that size and complexity. I'll release my code as a library
once I can parse more complex MediaWiki pages.

Patrick
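
The suggestion to factor scanning and token replacement into separate components could look roughly like this: each token class owns a pattern and a replacement, and a small driver applies them in order. This is a hypothetical illustration of the design only, not ruwiki's actual token.rb API; the class names `Token`, `BoldToken`, `ItalicToken` and the `render_inline` helper are invented, and only two MediaWiki inline tokens (`'''bold'''` and `''italic''`) are covered.

```ruby
# Hypothetical sketch of "one component per token": each token knows its
# own pattern and replacement; the driver just applies them in order.
# Names are illustrative only, not ruwiki's API.
class Token
  def self.pattern; raise NotImplementedError; end
  def self.replace(match); raise NotImplementedError; end
end

class BoldToken < Token
  def self.pattern; /'''(.+?)'''/; end
  def self.replace(m); "<b>#{m[1]}</b>"; end
end

class ItalicToken < Token
  def self.pattern; /''(.+?)''/; end
  def self.replace(m); "<i>#{m[1]}</i>"; end
end

# Order matters: ''' must be replaced before '' or bold would be eaten.
TOKENS = [BoldToken, ItalicToken]

def render_inline(text)
  TOKENS.inject(text) do |s, token|
    s.gsub(token.pattern) { token.replace($~) } # $~ is the current MatchData
  end
end

puts render_inline("a '''b''' and ''c''") # prints a <b>b</b> and <i>c</i>
```

Adding a new piece of markup then means adding one class to `TOKENS`. Plain regexp replacement has limits, though: combined bold-italic markup such as `'''''x'''''` produces mis-nested tags with these two patterns, which is exactly the kind of subtle difference that makes a real parser grow.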
 
