Automatic question generator libs in Ruby Language


Chris Kottom

[Note: parts of this message were removed to make it a legal post.]

class QuestionGenerator
  def self.generate
    "Huh?"
  end
end

On Thu, Feb 24, 2011 at 6:04 PM, Sniper Abandon <
 

Abinoam Jr.


Sniper,

Could you be a little more specific?
What is a "question generator"?
Perhaps we could help you.

Best regards,
Abinoam Jr.


 

Sniper Abandon

Suppose I have a paragraph (around 250 words).

I want to get all the possible questions from that paragraph.


Thanks
 

Hassan Schroeder

Suppose I have a paragraph (around 250 words).

I want to get all the possible questions from that paragraph.

?! *all possible questions* ?

"Why are there so many 3-character words in this language?"
"Was this bozo drunk when he wrote this?"
"Why doesn't the outside part of an 'e' go all the way around?"
"This is really -- wait, where did I put my car keys?"

def all_possible_questions
?
end
 

susan hall

i don't know were hit the sauce, bozo you type the letter with an e awersome cookie

Date: Sun, 27 Feb 2011 12:18:14 +0900
From: (e-mail address removed)
Subject: Re: Automatic question generator libs in Ruby Language
To: (e-mail address removed)

?! *all possible questions* ?

"Why are there so many 3-character words in this language?"
"Was this bozo drunk when he wrote this?"
"Why doesn't the outside part of an 'e' go all the way around?"
"This is really -- wait, where did I put my car keys?"

def all_possible_questions
?
end

--
Hassan Schroeder ------------------------ (e-mail address removed)
twitter: @hassan
 

susan hall

Date: Tue, 1 Mar 2011 03:17:34 +0900
From: (e-mail address removed)
Subject: Re: Automatic question generator libs in Ruby Language
To: (e-mail address removed)

i don't know were you hitting the sauce, bozo, you type the letter with an e awersome cookie
 

Shadowfirebird

I want to get all the possible questions from that paragraph.

Do you mean that you want to extract all the sentences that end in a question mark?

Untested (so don't laugh, all you Ruby grownups):

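def questions(para)
  ans = []
  # Split after sentence-ending punctuation; the lookbehind keeps the
  # punctuation attached to its sentence so the check below can see it.
  para.split(/(?<=[.?!])\s+/).each do |sn|
    ans << sn if sn =~ /\?\s*\z/
  end

  return ans
end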

(That won't cope with brackets.) Said the Vicar: "Or quotes." But it's a starting point -- assuming that that was what you meant.

Presumably someone smarter than me could do better with a single regex.
 

Peter Zotov

def questions(para)
  para.scan(/([^!.]+?\?|.+?[!.])\w*/m).flatten.select { |m| m[-1] == ?? }.map(&:strip)
end

ruby-1.9.2-p136 :063 > str
=> "This is a sentence? Yes. And this too? Definitely!\n"
ruby-1.9.2-p136 :064 > questions(str)
=> ["This is a sentence?", "And this too?"]
 

Adam Prescott

This could be a long rabbit hole.


questions("What does '?' mean?") #=> ["What does '?", "' mean?"]
 

Peter Zotov

This could be a long rabbit hole.

Well, you've started this. Now I shall show you what a rabbit hole is ;)

#encoding:utf-8
# requires 1.9

QUESTION_REGEXP = Regexp.compile(<<'END'.strip, Regexp::EXTENDED | Regexp::MULTILINE)
(?<mstc>\g<inf>[!?.]?\s*){0}(?<stc>\g<inf>[!?.]\s*){0}(?<text>[^!.?']+?){0}
(?<inf>(?<pq>["'])\g<mstc>\k<pq+0>\g<inf>|«\g<mstc>»\g<inf>|\g<text>\g<inf>|){0}
\g<stc>
END

def questions(para)
  questions = []
  while para =~ QUESTION_REGEXP
    para = $'
    questions << $& if $&.strip[-1] == ??
  end
  questions
end

TEXT = <<'END'
Nested 'questions?' are supported.
This is a text. A question?
This «kind?» is supported too, really?
The quotes «"shall 'be" matching?'»?
END

p questions(TEXT)
 

Adam Prescott


This could be a long rabbit hole.
Well, you've started this. Now I shall show you what a rabbit hole is ;)

[impressiveness]
Hm!

opine = "You think ' is an excellent grapheme?" # arguably this is a valid fully-formed question...

questions(opine) # endless loop!
 

Peter Zotov

A debug modification has accidentally slipped through. The regexp
should have been defined this way:

QUESTION_REGEXP = Regexp.compile(<<'END'.strip, Regexp::EXTENDED | Regexp::MULTILINE)
(?<mstc>\g<inf>[!?.]?\s*){0}(?<stc>\g<inf>[!?.]\s*){0}(?<text>[^!.?]+?){0}
(?<inf>(?<pq>["'])\g<mstc>\k<pq+0>\g<inf>|«\g<mstc>»\g<inf>|\g<text>\g<inf>|){0}
\g<stc>
END

(Look at the last [] in the first line: the ' has been removed.)

I am aware of this problem. (I hadn't written about it for several
reasons: I don't know what exactly causes it, and I wanted to see if
someone would trigger it. Congratulations. Given how long this regexp
takes to execute even on simple strings, the code is already unsuited
for production, so this small piece of misinformation would not hurt
the OP.)

I've triggered it several times while trying different forms of the
regexp. Sometimes it occurs with one of two seemingly equivalent
constructs but not with the other. A quick look at some backtraces
taken at random times during execution suggests that it really loops
(rather than just running for a long time), but Oniguruma is really
huge and complex, and I don't have enough time to debug this.

Also, as I am not a regular expression guru, it may contain blatant
errors. (I once saw a book on optimizing regexps. It was scary.)
I've tried to use my experience with LALR parsers here, but sometimes
Oniguruma behaves in a completely different way. E.g. at one point I
tried to replace the empty token at the end of the (?<inf>...) group
with a plain \g<text>, which of course includes the empty string; it
would not compile due to _indefinite recursion_. It works perfectly
with the current variant. The (?<text>...) token was always a terminal,
of course.

The ideal way of accomplishing this task would be to use a lexer and a
parser; it is easy enough to write them manually in this case (as
opposed to using a tool like RACC). Thus, one would create a finite
state machine, which is what a regular expression compiler does too,
but the hand-written one can be optimized properly.

I'll leave this as homework for someone else :)
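
One possible take on that homework is sketched below. It is not code from this thread and not a full lexer and parser: it is a small hand-written scanner that walks the text one character at a time and keeps a stack of the closing quotes it is still waiting for. The names QUOTE_PAIRS, closers and the two apostrophe rules (an apostrophe between word characters is literal; a quote only opens if a matching closer appears later) are assumptions made for illustration, and it needs Ruby 2.4 or later for match?.

```ruby
# Hypothetical sketch, not code from the thread.
# Maps each opening quote character to the character that closes it
# (the last pair is the guillemets, written as escapes to stay ASCII-only).
QUOTE_PAIRS = { '"' => '"', "'" => "'", "\u00AB" => "\u00BB" }.freeze

def questions(para)
  found    = []
  sentence = +""   # unfrozen buffer for the sentence being read
  closers  = []    # stack of closing quote characters still expected
  chars    = para.chars

  chars.each_with_index do |ch, i|
    sentence << ch

    # An apostrophe between two word characters ("don't") is not a quote.
    next if ch == "'" && i > 0 &&
            chars[i - 1].match?(/\w/) && chars[i + 1].to_s.match?(/\w/)

    if (depth = closers.rindex(ch))
      # Closes a pending quote, dropping any mis-nested quotes inside it.
      closers = closers[0...depth]
    elsif QUOTE_PAIRS.key?(ch) && para.index(QUOTE_PAIRS[ch], i + 1)
      # Opens a quote, but only if a matching closer exists later on.
      closers.push(QUOTE_PAIRS[ch])
    elsif closers.empty? && ".!?".include?(ch)
      # Sentence boundary outside any quote.
      found << sentence.strip if ch == "?"
      sentence = +""
    end
  end

  found
end

p questions("What does '?' mean?")                   # => ["What does '?' mean?"]
p questions("You think ' is an excellent grapheme?") # => ["You think ' is an excellent grapheme?"]
```

Each character is visited once and nothing backtracks, so it cannot loop, although the lookahead for a closing quote makes it quadratic in the worst case. Text left over after the last terminator is silently dropped, and the apostrophe rules are heuristics that will misfire on some inputs.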

--
WBR, Peter Zotov.

P.S. Given that an Oniguruma RE can compile a full LALR parser, I wonder
what a regular expression matching a regular expression would look like,
and how much time and memory it would use.
 

Shadowfirebird

opine = "You think ' is an excellent grapheme?" # arguably this is a valid fully-formed question...

questions(opine) # endless loop!

So is: "What is the capital of Tunisia!"

Unless you can get Ruby to understand English, the parsing will always be imperfect.
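
To make that point concrete, here is a rough heuristic, offered only as a sketch and not as anything proposed in the thread: treat a sentence as a question if it ends in "?" or opens with a common interrogative word. The word list and the name looks_like_question? are assumptions made for illustration.

```ruby
# Hypothetical heuristic, not code from the thread.
INTERROGATIVES = %w[who whom whose what which when where why how].freeze

def looks_like_question?(sentence)
  text = sentence.strip
  return true if text.end_with?("?")

  # First run of word characters, ignoring leading quotes or brackets.
  first_word = text[/\A\W*(\w+)/, 1].to_s.downcase
  INTERROGATIVES.include?(first_word)
end

p looks_like_question?("What is the capital of Tunisia!") # => true
p looks_like_question?("What a lovely day!")              # => true (a false positive)
```

The second call shows the limit of the approach: without understanding the sentence, a word list cannot tell a question from an exclamation.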
 

Andrew Wagner


So is: "What is the capital of Tunisia!"

Unless you can get Ruby to understand English, the parsing will always be
imperfect.
But that's not a question! It's an exuberant and wrong statement about a
fictitious city named "What"! (cue Abbott and Costello)
 

Adam Prescott


So is: "What is the capital of Tunisia!"

Unless you can get Ruby to understand English, the parsing will always be
imperfect.

Sure. Although my opine-variable question is more syntactic, I think, in
question-ness than the semantic-focused Tunisia one. But yes, I agree; it's
really just about coverage of needs.
 

Shadowfirebird

But that's not a question! It's an exuberant and wrong statement about a
fictitious city named "What"! (cue Abbott and Costello)

Imagine a very angry teacher. And remember that you can't end a sentence with '?!'.
 
