How to convert a string like this into a tree object?

Possum

Hi, guys,

recently I've been working with the Stanford Parser (written in Java), which has an
incomplete Ruby wrapper. I can get the result as a string like the one below,
but I still want to manipulate the result as a tree. What is a
convenient way to convert the following string into a tree
object? Can someone give me some suggestions?

(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP-TMP (NN today)))
    (. .)))
 
Tony Arcieri

With JRuby you can work directly with the Java objects coming out of your
parser. This is probably the best way to write a Ruby wrapper.
 
Robert Klemme

2009/9/15 Tony Arcieri said:
With JRuby you can work directly with the Java objects coming out of your
parser. This is probably the best way to write a Ruby wrapper.

If that does not work, the OP needs to create a parser, either manually or
using a parser generator.
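
For the manual route, a small recursive-descent parser is probably enough for this bracketed format. What follows is only a sketch under a few assumptions: the input always looks like "(LABEL child ...)", words never contain whitespace or parentheses, and the names Node, tokenize, parse_node and parse_tree are made up for illustration (they are not part of the Stanford parser or of any wrapper).

```ruby
# Hypothetical Node class: a label (e.g. "NP") plus children, which are
# either Node objects or plain strings for the words.
Node = Struct.new(:label, :children) do
  # Words under this node, left to right.
  def leaves
    children.flat_map { |c| c.is_a?(Node) ? c.leaves : [c] }
  end
end

# Split the input into "(", ")" and bare atoms such as "NP" or "rain".
def tokenize(text)
  text.scan(/[()]|[^\s()]+/)
end

# Parse one "(LABEL child ...)" group from the front of the token list.
def parse_node(tokens)
  raise ArgumentError, "expected '('" unless tokens.shift == "("
  label = tokens.shift
  raise ArgumentError, "missing label" if label.nil? || label == "(" || label == ")"
  children = []
  until tokens.first == ")"
    raise ArgumentError, "unbalanced parentheses" if tokens.empty?
    children << (tokens.first == "(" ? parse_node(tokens) : tokens.shift)
  end
  tokens.shift # consume the closing ")"
  Node.new(label, children)
end

# Parse a whole string and insist that nothing is left over.
def parse_tree(text)
  tokens = tokenize(text)
  tree = parse_node(tokens)
  raise ArgumentError, "trailing input" unless tokens.empty?
  tree
end

tree = parse_tree("(NP (DT The) (JJS strongest) (NN rain))")
puts tree.label                          # NP
puts tree.children.map(&:label).inspect  # ["DT", "JJS", "NN"]
puts tree.leaves.join(" ")               # The strongest rain
```

Punctuation groups such as "(, ,)" come out as a node labelled "," with the single child ",", so they need no special casing here.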

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Aldric Giacomoni

Possum said:
Hi, guys,

recently I've been working with the Stanford Parser (written in Java), which has an
incomplete Ruby wrapper. I can get the result as a string like the one below,
but I still want to manipulate the result as a tree. What is a
convenient way to convert the following string into a tree
object? Can someone give me some suggestions?

Look up linked lists?
 
Pascal J. Bourguignon

Possum said:
recently I've been working with the Stanford Parser (written in Java), which has an
incomplete Ruby wrapper. I can get the result as a string like the one below,
but I still want to manipulate the result as a tree. What is a
convenient way to convert the following string into a tree
object? Can someone give me some suggestions?

EBADLNG ?

Perhaps not...


----(parse-sentence.rb)------------------------------------------------------------

(sexp = "(ROOT
(S
(S
(NP
(NP (DT The) (JJS strongest) (NN rain))
(VP
(ADVP (RB ever))
(VBN recorded)
(PP (IN in)
(NP (NNP India)))))
(VP
(VP (VBD shut)
(PRT (RP down))
(NP
(NP (DT the) (JJ financial) (NN hub))
(PP (IN of)
(NP (NNP Mumbai)))))
(, ,)
(VP (VBD snapped)
(NP (NN communication) (NNS lines)))
(, ,)
(VP (VBD closed)
(NP (NNS airports)))
(CC and)
(VP (VBD forced)
(NP
(NP (NNS thousands))
(PP (IN of)
(NP (NNS people))))
(S
(VP (TO to)
(VP
(VP (VB sleep)
(PP (IN in)
(NP (PRP$ their) (NNS offices))))
(CC or)
(VP (VB walk)
(NP (NN home))
(PP (IN during)
(NP (DT the) (NN night))))))))))
(, ,)
(NP (NNS officials))
(VP (VBD said)
(NP-TMP (NN today)))
(. .)))")



(code = <<ENDCODE

(defparameter *rt* (let ((rt (copy-readtable nil)))
(setf (readtable-case rt) :preserve)
rt))


(defun translate-parsed-sentence-string (string)
(translate-parsed-sentence-sexp (let ((*readtable* *rt*)) (read-from-string string))))


(defun collect-classes (sexp)
(let ((classes '()))
(labels ((scan (sexp)
(unless (atom sexp)
(push (car sexp) classes)
(mapcar (function scan) (cdr sexp)))))
(scan sexp)
(remove-duplicates classes))))


(defun concatenate-strings (list-of-strings)
"
LIST-OF-STRINGS: Each element may be either a string,
or a list containing a string, and a start and end position
denoting a substring.
RETURN: A string containing the concatenation of the strings
of the LIST-OF-STRINGS.
"
(flet ((slength (string)
(if (stringp string)
(length string)
(- (or (third string) (length (first string)))
(second string)))))
(loop
:with result = (make-string (loop :for s :in list-of-strings
:sum (slength s)))
:for pos = 0
:then (+ pos (slength string))
:for string :in list-of-strings
:do (if (stringp string)
(replace result string :start1 pos)
(replace result (first string) :start1 pos
:start2 (second string) :end2 (third string)))
:finally (return result))))


(defun string-replace (string pattern replace &key (test (function char=)))
"
RETURN: A string built from STRING where all occurrences of PATTERN
are replaced by the REPLACE string.
TEST: The function used to compare the elements of the PATTERN
with the elements of the STRING.
"
(concatenate-strings
(loop
:with pattern-length = (length pattern)
:for start = 0 :then (+ pos pattern-length)
:for pos = (search pattern string :start2 start :test test)
:if pos :collect (list string start pos)
:and :collect replace
:else :collect (list string start)
:while pos)))


(defun rubify (name)
(format nil \"~{~:(~A~)~}\"
(loop
:with string = (string-replace (string name) \"$\" \"DOLLAR\")
:with start = 0
:for end = (position #\\- string :start start)
:collect (subseq string start end)
:while end
:do (setf start (1+ end)))))


(defun generate-sexp-instance-building (sexp)
(format nil "(~A.new(~{~A~^,~%~}))"
(rubify (car sexp))
(mapcar (lambda (item)
(etypecase item
(cons (generate-sexp-instance-building item))
(symbol (format nil "\\\"~A\\\"" item))
(t (format nil "~S" item))))
(cdr sexp))))


(defun translate-parsed-sentence-sexp (sexp)
(format t "(begin~%")
(princ "(class Node
attr_accessor :children
(def initialize(*args)
(@children=args)
end)
end)
")
(dolist (class (collect-classes sexp))
(format t "(class ~A < Node~%end)~%" (rubify class)))
(princ (generate-sexp-instance-building sexp))
(format t "~%end)~%"))

ENDCODE
)

(begin
file=File.open("/tmp/parse-sentence.lisp","w")
file.write(code)
file.write("(princ (translate-parsed-sentence-string \"#{(sexp . gsub("(. .)","(DOT \".\")") . gsub("(, ,)","(COMMA \",\")") . gsub("(; ;)","(SEMICOLON \";\")") . gsub("(: :)","(COLON \":\")") . gsub("\"","\\\""))}\"))\n")
file.write("(finish-output)\n")
file.close
end)

(parseTree=(begin
(expression=IO.popen("clisp /tmp/parse-sentence.lisp","w+"))
(parseTree=(eval(((expression . readlines) [0..-2]).join)))
(expression.close)
parseTree
end))

puts parseTree
# prints: #<Root:0x7f033ce5ef70>

parseTree
# returns: #<Root:0x7f9219695d68 @children=[#<S:0x7f9219695db8 @children=[#<S:0x7f9219696150 @children=[#<Np:0x7f9219697a78 @children=[#<Np:0x7f9219697e10 @children=[#<Dt:0x7f9219697fc8 @children=["The"]>, #<Jjs:0x7f9219697f50 @children=["strongest"]>, #<Nn:0x7f9219697e88 @children=["rain"]>]>, #<Vp:0x7f9219697af0 @children=[#<Advp:0x7f9219697d48 @children=[#<Rb:0x7f9219697d70 @children=["ever"]>]>, #<Vbn:0x7f9219697ca8 @children=["recorded"]>, #<Pp:0x7f9219697b40 @children=[#<In:0x7f9219697c30 @children=["in"]>, #<Np:0x7f9219697b90 @children=[#<Nnp:0x7f9219697bb8 @children=["India"]>]>]>]>]>, #<Vp:0x7f92196961a0 @children=[#<Vp:0x7f9219697438 @children=[#<Vbd:0x7f92196979b0 @children=["shut"]>, #<Prt:0x7f9219697910 @children=[#<Rp:0x7f9219697938 @children=["down"]>]>, #<Np:0x7f92196974b0 @children=[#<Np:0x7f9219697758 @children=[#<Dt:0x7f9219697870 @children=["the"]>, #<Jj:0x7f92196977f8 @children=["financial"]>, #<Nn:0x7f9219697780 @children=["hub"]>]>, #<Pp:0x7f9219697550 @children=[#<In:0x7f92196976b8 @children=["of"]>, #<Np:0x7f92196975c8 @children=[#<Nnp:0x7f92196975f0 @children=["Mumbai"]>]>]>]>]>, #<Comma:0x7f9219697398 @children=[","]>, #<Vp:0x7f92196971b8 @children=[#<Vbd:0x7f9219697320 @children=["snapped"]>, #<Np:0x7f9219697208 @children=[#<Nn:0x7f92196972a8 @children=["communication"]>, #<Nns:0x7f9219697230 @children=["lines"]>]>]>, #<Comma:0x7f9219697118 @children=[","]>, #<Vp:0x7f9219696f88 @children=[#<Vbd:0x7f92196970a0 @children=["closed"]>, #<Np:0x7f9219697000 @children=[#<Nns:0x7f9219697028 @children=["airports"]>]>]>, #<Cc:0x7f9219696ec0 @children=["and"]>, #<Vp:0x7f92196961f0 @children=[#<Vbd:0x7f9219696e48 @children=["forced"]>, #<Np:0x7f9219696bc8 @children=[#<Np:0x7f9219696da8 @children=[#<Nns:0x7f9219696dd0 @children=["thousands"]>]>, #<Pp:0x7f9219696c18 @children=[#<In:0x7f9219696d08 @children=["of"]>, #<Np:0x7f9219696c68 @children=[#<Nns:0x7f9219696c90 @children=["people"]>]>]>]>, #<S:0x7f9219696240 @children=[#<Vp:0x7f9219696290 
@children=[#<To:0x7f9219696b28 @children=["to"]>, #<Vp:0x7f9219696330 @children=[#<Vp:0x7f9219696830 @children=[#<Vb:0x7f9219696a60 @children=["sleep"]>, #<Pp:0x7f9219696880 @children=[#<In:0x7f92196969e8 @children=["in"]>, #<Np:0x7f92196968d0 @children=[#<Prpdollar:0x7f9219696970 @children=["their"]>, #<Nns:0x7f92196968f8 @children=["offices"]>]>]>]>,#<Cc:0x7f9219696740 @children=["or"]>, #<Vp:0x7f92196963d0 @children=[#<Vb:0x7f92196966c8 @children=["walk"]>, #<Np:0x7f9219696628 @children=[#<Nn:0x7f9219696650 @children=["home"]>]>, #<Pp:0x7f9219696420 @children=[#<In:0x7f9219696588 @children=["during"]>, #<Np:0x7f9219696470 @children=[#<Dt:0x7f9219696510 @children=["the"]>, #<Nn:0x7f9219696498 @children=["night"]>]>]>]>]>]>]>]>]>]>, #<Comma:0x7f92196960b0 @children=[","]>, #<Np:0x7f9219696010 @children=[#<Nns:0x7f9219696038 @children=["officials"]>]>, #<Vp:0x7f9219695e80 @children=[#<Vbd:0x7f9219695f70 @children=["said"]>, #<NpTmp:0x7f9219695ed0 @children=[#<Nn:0x7f9219695ef8 @children=["today"]>]>]>, #<Dot:0x7f9219695de0 @children=["."]>]>]>
 
Pascal J. Bourguignon

Possum said:
Hi, guys,

recently I've been working with the Stanford Parser (written in Java), which has an
incomplete Ruby wrapper. I can get the result as a string like the one below,
but I still want to manipulate the result as a tree. What is a
convenient way to convert the following string into a tree
object? Can someone give me some suggestions?

(NP (DT The) (JJS strongest) (NN rain))

Otherwise, you may google for "ruby sexp parse" and pick one of the
sexp-parsing Ruby libraries to convert the sexp string into a "rexp", a
Ruby expression made of Array, Symbol, and other data atoms. Then you
can process this rexp in Ruby like I did in Lisp.
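
To make the "rexp" idea concrete without committing to any particular library, here is a rough sketch using only StringScanner from Ruby's standard library. The helper names read_sexp, parse_sexp and collect_labels are hypothetical, and atoms are kept as Strings rather than Symbols, which is just a choice made here because tags like "," and "PRP$" make awkward symbols.

```ruby
require "strscan"

# Read one expression: either a parenthesized list or a bare atom.
def read_sexp(scanner)
  scanner.skip(/\s+/)
  if scanner.scan(/\(/)
    list = []
    loop do
      scanner.skip(/\s+/)
      break if scanner.scan(/\)/)
      raise ArgumentError, "unbalanced parentheses" if scanner.eos?
      list << read_sexp(scanner)
    end
    list
  else
    atom = scanner.scan(/[^\s()]+/)
    raise ArgumentError, "unexpected input at #{scanner.pos}" if atom.nil?
    atom
  end
end

# Convert the first expression in a sexp string into nested Arrays of Strings.
def parse_sexp(string)
  read_sexp(StringScanner.new(string))
end

# Same idea as collect-classes in the Lisp version: every list head, once.
def collect_labels(rexp)
  return [] unless rexp.is_a?(Array)
  ([rexp.first] + rexp.drop(1).flat_map { |child| collect_labels(child) }).uniq
end

rexp = parse_sexp("(NP (DT The) (JJS strongest) (NN rain))")
p rexp                  # ["NP", ["DT", "The"], ["JJS", "strongest"], ["NN", "rain"]]
p collect_labels(rexp)  # ["NP", "DT", "JJS", "NN"]
```

With the tree in this shape, the usual Array methods (first, drop, flat_map, select) are enough to walk it.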
 
Louis-Philippe

As always, Pascal is the best! That is, at showing everybody he can answer
everything with Lisp...
 
Harry

Hi Pascal, Tony, and everyone,

Thank you all for your replies; they were really helpful for sorting this
out.

JRuby is the ultimate way to use any Java library from Ruby,
thanks. The only thing is that I am not as comfortable with Java as I am
with Ruby.

Pascal, this sexp idea led me in the right direction. I found a Ruby library
called sexp_path, but then I had a new idea: sexp_path
made me think of the result as an HTML tree queried with XPath selectors. So in
the end I did it this way:

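1. Convert the string to an HTML tree with a few simple regexp replacements:
# Every "(" opens a div whose class attribute is still unterminated.
s.gsub!(/\(/, "<div class=\"")
# Every ")" closes a div.
s.gsub!(/\)/, "</div>")
# Terminate the class attribute after the tag name. This only matches tags
# made of the letters A-Z, so tags such as "," or "PRP$" are not handled.
s.gsub!(/class="[A-Z]+/) { |match| match + "\">" }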

The string is converted to this:

<div class="ROOT">
<div class="S">
<div class="S">
<div class="NP">
<div class="NP"> <div class="DT"> The</div> <div class="JJS"> strongest</div> <div class="NN"> rain</div></div>
<div class="VP">
<div class="ADVP"> <div class="RB"> ever</div></div>
<div class="VBN"> recorded</div>
<div class="PP"> <div class="IN"> in</div>
<div class="NP"> <div class="NNP"> India</div></div></div></div></div>
<div class="VP">
<div class="VP"> <div class="VBD"> shut</div>
<div class="PRT"> <div class="RP"> down</div></div>
<div class="NP">
<div class="NP"> <div class="DT"> the</div> <div class="JJ"> financial</div> <div class="NN"> hub</div></div>
<div class="PP"> <div class="IN"> of</div>
<div class="NP"> <div class="NNP"> Mumbai</div></div></div></div></div>
<div class=", ,</div>
<div class="VP"> <div class="VBD"> snapped</div>
<div class="NP"> <div class="NN"> communication</div> <div class="NNS"> lines</div></div></div>
<div class=", ,</div>
<div class="VP"> <div class="VBD"> closed</div>
<div class="NP"> <div class="NNS"> airports</div></div></div>
<div class="CC"> and</div>
<div class="VP"> <div class="VBD"> forced</div>
<div class="NP">
<div class="NP"> <div class="NNS"> thousands</div></div>
<div class="PP"> <div class="IN"> of</div>
<div class="NP"> <div class="NNS"> people</div></div></div></div>
<div class="S">
<div class="VP"> <div class="TO"> to</div>
<div class="VP">
<div class="VP"> <div class="VB"> sleep</div>
<div class="PP"> <div class="IN"> in</div>
<div class="NP"> <div class="PRP">$ their</div> <div class="NNS"> offices</div></div></div></div>
<div class="CC"> or</div>
<div class="VP"> <div class="VB"> walk</div>
<div class="NP"> <div class="NN"> home</div></div>
<div class="PP"> <div class="IN"> during</div>
<div class="NP"> <div class="DT"> the</div> <div class="NN"> night</div></div></div></div></div></div></div></div></div></div>


2. Then manipulate the tree with Hpricot using XPath selectors; very
easy and simple.


Thanks again!
 