Regular Expressions: large amount of or's

  • Thread starter André Søreng

Daniel Yoo

: tree.search("I went to alpha beta the other day to pick up some spam")

: could use a startpos (default=0) argument for efficiently restarting
: the search after finding the first match

Ok, that's easy to fix. I'll do that tonight.
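The 'startpos' idea can be sketched with a small self-contained Aho-Corasick automaton. This is not the ahocorasick package's code. The MiniKeywordTree class, its search(text, startpos=0) method returning (start, end) or None, the find_all helper, and the keywords 'alpha', 'beta' and 'spam' are all hypothetical, chosen for illustration. Restarting each search at the end of the previous match walks through every non-overlapping match without rescanning text that has already been consumed. Matches that overlap an earlier one are skipped.

```python
from collections import deque


class MiniKeywordTree:
    """Tiny Aho-Corasick automaton (illustrative; not the ahocorasick package)."""

    def __init__(self):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # lengths of the keywords recognised at each state

    def add(self, word):
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(len(word))

    def make(self):
        # Breadth-first pass: a state's failure link is always shallower,
        # so its output list is complete before it is merged in.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text, startpos=0):
        """Return (start, end) of the earliest-ending match at or after startpos, or None."""
        state = 0
        for i in range(startpos, len(text)):
            ch = text[i]
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            if self.out[state]:
                length = max(self.out[state])  # longest keyword ending here
                return (i + 1 - length, i + 1)
        return None


def find_all(tree, text):
    """Collect non-overlapping matches by restarting at the end of each one."""
    matches, pos = [], 0
    while (match := tree.search(text, pos)) is not None:
        matches.append(match)
        pos = match[1]
    return matches


tree = MiniKeywordTree()
for word in ("alpha", "beta", "spam"):  # hypothetical keyword set
    tree.add(word)
tree.make()
text = "I went to alpha beta the other day to pick up some spam"
print(find_all(tree, text))  # [(10, 15), (16, 20), (51, 55)]
```

Restarting at match[1] gives non-overlapping results. Restarting at match[0] + 1 instead would also report matches that overlap the previous one.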
 

Scott David Daniels

Daniel said:
: : tree.search("I went to alpha beta the other day to pick up some spam")

: : could use a startpos (default=0) argument for efficiently restarting
: : the search after finding the first match

: Ok, that's easy to fix. I'll do that tonight.

I have a (very high speed) modified Aho-Corasick machine that I sell.
The calling model that I found works well is:

def chases(self, sourcestream, ...):
    '''A generator taking a generator of source blocks,
    yielding (matches, position) pairs where position is an
    offset within the "current" block.
    '''

You might consider taking a look at providing that form.
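The docstring leaves the details open. One possible reading is sketched below, and it is an assumption rather than Scott's actual design. Here 'matches' is the tuple of keywords ending at a given point, and 'position' is the offset just past that point within the current block. The automaton state is carried from one block to the next, so a keyword that straddles a block boundary is still found. build_automaton and the two-argument chases(automaton, sourcestream) are hypothetical names used only for this sketch.

```python
from collections import deque


def build_automaton(keywords):
    """Build Aho-Corasick goto/fail/output tables (hypothetical helper)."""
    goto, fail, out = [{}], [0], [[]]
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first computation of failure links and merged outputs.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def chases(automaton, sourcestream):
    """Yield (matches, position) wherever one or more keywords end.

    'position' is the offset just past the match inside the block being
    scanned. The automaton state survives from block to block, so a
    keyword split across two blocks is still reported.
    """
    goto, fail, out = automaton
    state = 0
    for block in sourcestream:
        for i, ch in enumerate(block):
            while state and ch not in goto[state]:
                state = fail[state]
            state = goto[state].get(ch, 0)
            if out[state]:
                yield tuple(out[state]), i + 1


automaton = build_automaton(["python", "is"])
blocks = iter(["python progr", "amming is fu", "n; pyt", "hon"])
for matches, position in chases(automaton, blocks):
    print(matches, position)
```

Reporting end offsets is what makes the streaming form work. The start of a match may lie in an earlier block that the caller has already discarded, but the end is always in the block currently being scanned.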

-Scott David Daniels
 

Daniel Yoo

: : tree.search("I went to alpha beta the other day to pick up some spam")

: : could use a startpos (default=0) argument for efficiently restarting
: : the search after finding the first match

: Ok, that's easy to fix. I'll do that tonight.

Done. 'startpos' and other bug fixes are in Release 0.7:

http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ahocorasick-0.7.tar.gz

But I think I'd better hold off adding the ahocorasick package to PyPI
until it stabilizes for longer than a day... *grin*
 

Daniel Yoo

: I have a (very high speed) modified Aho-Corasick machine that I sell.
: The calling model that I found works well is:

: def chases(self, sourcestream, ...):
:     '''A generator taking a generator of source blocks,
:     yielding (matches, position) pairs where position is an
:     offset within the "current" block.
:     '''

: You might consider taking a look at providing that form.


Hi Scott,

No problem, I'll be happy to do this.

I need some clarification on the calling model though. Would this be
an accurate test case?

######
def testChasesInterface(self):
    self.tree.add("python")
    self.tree.add("is")
    self.tree.make()
    # Keep a reference to the blocks so the expected results can name them.
    sourceBlocks = ("python programming is fun",
                    "how much is that python in the window")
    sourceStream = iter(sourceBlocks)
    self.assertEqual([
            (sourceBlocks[0], (0, 6)),
            (sourceBlocks[0], (19, 21)),
            (sourceBlocks[1], (9, 11)),
            (sourceBlocks[1], (17, 23)),
        ],
        list(self.tree.chases(sourceStream)))
######

Here, I'm assuming that chases() takes in a 'sourceStream', which is
an iterator of text blocks, and that the return value is itself an
iterator.
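Under the reading in the test case above, each block is searched on its own, and every match is reported together with the block it came from. The minimal sketch below shows just that iterator contract. The module-level chases(keywords, source_stream) function is hypothetical, and it uses plain str.find as a stand-in for the automaton. It illustrates the expected output, not an efficient search.

```python
def chases(keywords, source_stream):
    """Yield (block, (start, end)) for every keyword occurrence, block by block.

    Naive stand-in for the automaton: each keyword is located with str.find,
    and the matches within a block are reported in order of start position.
    """
    for block in source_stream:
        spans = []
        for word in keywords:
            pos = block.find(word)
            while pos != -1:
                spans.append((pos, pos + len(word)))
                pos = block.find(word, pos + 1)  # allow overlapping hits
        for span in sorted(spans):
            yield block, span


blocks = ("python programming is fun",
          "how much is that python in the window")
for block, span in chases(["python", "is"], iter(blocks)):
    print(span, block[span[0]:span[1]])
```

Nothing is carried between blocks in this design, so a keyword split across two blocks is never reported. That is the main behavioural difference from a streaming form that keeps the automaton state alive between blocks.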


Best of wishes!
 

Daniel Yoo

: Done. 'startpos' and other bug fixes are in Release 0.7:

: http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ahocorasick-0.7.tar.gz

Ok, I stopped working on the Aho-Corasick module for a while, so I've
just bumped the version number to 0.8 and posted it up on PyPI.

I did add some preliminary code to emit graphviz DOT files, but it's
largely untested. I also added an undocumented API for
inspecting the states and their transitions.
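For the DOT output mentioned above, one way to render an Aho-Corasick automaton uses three conventions. Goto transitions become labelled solid edges, failure links become dashed edges, and states that recognise a keyword become double circles. The sketch below is not the module's code. build_automaton and to_dot are hypothetical helpers, and failure links that point back to the root are left out to reduce clutter.

```python
from collections import deque


def build_automaton(keywords):
    """Build Aho-Corasick goto/fail/output tables (hypothetical helper)."""
    goto, fail, out = [{}], [0], [[]]
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def to_dot(automaton):
    """Render goto edges (solid) and non-root failure links (dashed) as DOT text."""
    goto, fail, out = automaton
    lines = ["digraph ahocorasick {", "    rankdir=LR;"]
    for state in range(len(goto)):
        shape = "doublecircle" if out[state] else "circle"
        lines.append(f"    {state} [shape={shape}];")
        for ch, nxt in sorted(goto[state].items()):
            # Escape characters that are special inside a quoted DOT string.
            label = ch.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'    {state} -> {nxt} [label="{label}"];')
        if state and fail[state]:
            lines.append(f"    {state} -> {fail[state]} [style=dashed];")
    lines.append("}")
    return "\n".join(lines)


print(to_dot(build_automaton(["he", "she"])))
```

Piping the printed text into Graphviz, e.g. dot -Tpng -o automaton.png, draws the states. For "he" and "she", the dashed edge from the "she" state to the "he" state is the failure link that lets both keywords be reported at once.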

I hope that the original poster finds it useful, even though it's
probably a bit late.


Hope this helps!
 
