Method improvement request .--


Charles Hixson

James said:
Does that work inside character class definitions( [] delimited groups)?

Contrary to what you expect (judging by your later message), it sure
does.
[\saeiou]
Will match a whitespace or vowel character.
James Edward Gray II
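
A minimal sketch of the point above, showing \s used inside a bracketed character class. The constant and method names are hypothetical, chosen only for illustration.

```ruby
# Hypothetical names; \s is allowed inside a bracketed character class.
SPACE_OR_VOWEL = /[\saeiou]/

# Count the characters in text that are whitespace or a lowercase vowel.
def space_or_vowel_count(text)
  text.scan(SPACE_OR_VOWEL).length
end

puts space_or_vowel_count("a b\tc")                   # counts "a", the space, and the tab
puts(("xyz" =~ SPACE_OR_VOWEL) ? "match" : "no match")
```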


Whuuf! That *is* a surprise! Thanks. That may make some of my regexps
much more readable.
 

Bill Guindon

Bill Guindon wrote:
OK, I've copied it over to codepaste here:
http://www.codepaste.org/paste/comment/218
http://www.codepaste.org/view/paste/163?show_comments=1
etc.
(How long does code stay up here? I never knew the site existed.)

New site, written in Ruby. For now, there's no time limit.

I was bored with my own stuff, so trimmed down the 'run' a bit...
http://www.codepaste.org/view/paste/163?show_comments=1

Hope it still runs, couldn't test it for lack of the 'word' file you required.
There are links on the right to see the originals, and a diff between the two.
But I think that I put the relevant pieces in the first e-mail.
However, I don't intend that ellipses and double-dashes be deleted.
They merely need to be parsed separately from the words that they appear
with. They do contain significant meaning, so merely deleting them
would be counterproductive.
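
One way this requirement might be sketched, assuming a simple scan-based tokenizer (this is not the code from the paste, and the TOKEN pattern and tokenize name are hypothetical): the ellipsis and the double-dash are listed first in the alternation so they come out as tokens of their own instead of being dropped or glued to a word.

```ruby
# Hypothetical tokenizer: "..." and "--" are tried before words so that
# they are kept as separate tokens rather than deleted.
TOKEN = /\.\.\.|--|[\w']+|[^\w\s]/

def tokenize(text)
  text.scan(TOKEN)
end

p tokenize("Well...I don't know--maybe.")
```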

Right, much easier to understand what you seek now. Sounds like
you're looking at a text formatter, I'm guessing for printing.
Also: Is

chunk = [] << chunk
chunk.flatten!
return chunk
better in some way than
return "" unless chunk.respond_to?("[]")

Depends on whether you want to stay, or return. The first forces the
Array so you'll stick around and process whatever you have -- useful
for processing when you're not sure if you were given an Array or a
String.

The second bails out, so is the better choice if you don't want to
look at it unless it's an Array.
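
A small sketch contrasting the two approaches, with hypothetical method names. One caveat worth hedging: a String also responds to [], so the second test does not by itself prove that an Array was passed.

```ruby
# Approach 1: force the argument into a flat Array and keep processing.
def force_array(chunk)
  chunk = [] << chunk
  chunk.flatten!   # returns nil when nothing changed, so return chunk itself
  chunk
end

# Approach 2: bail out with "" when the argument does not respond to [].
def first_or_blank(chunk)
  return "" unless chunk.respond_to?("[]")
  chunk[0]
end

p force_array("word")          # a lone String is wrapped in an Array
p force_array(["a", ["b"]])    # nested Arrays are flattened
p first_or_blank(nil)          # nil does not respond to [], so "" comes back
```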
I could see, perhaps,
return "" if not chunk or chunk.empty?
but I'd been reading that it was more Ruby-esque to use duck typing and the respond_to? test. That's one reason I didn't do
return "" if chunk.nil?

Actually, checking for nil would fall into duck typing (as far as I
understand it). Anything can be nil, so it's not quite the same as
explicit type checking.
And I'm still not certain what I should really be doing in such a case. What I really want to do is avoid returning nil. I want to skip over the processing of this case without aborting the calling process (likely an each). If this were a loop, then the command would be next rather than break or retry.

I'd go with ... return '' if not chunk
or... return unless chunk

but that's just a personal preference.

With a bit of time, the 'nil' bugs will go away, or at least have
semi-obvious solutions.
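
A rough sketch of both ideas, using hypothetical names: a guard clause that returns an empty string instead of nil, and next inside an each block to skip one element without stopping the iteration.

```ruby
# Guard clause: nil (or false) gives back "" rather than nil.
def describe(chunk)
  return "" unless chunk
  chunk.to_s.upcase
end

# Inside a block, next skips the current element and the loop carries on.
def describe_all(chunks)
  results = []
  chunks.each do |chunk|
    next if chunk.nil?
    results << describe(chunk)
  end
  results
end

p describe(nil)
p describe_all(["alice", nil, "rabbit"])
```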
 

David A. Black

Hi --

There are many Ruby idioms that I'm less than totally fluent with, and
that's one of them. (I just recently figured out that if a routine has
yields, then you probably shouldn't have returns.)

There are lots of cases where one uses both the yielded values and the
return value of the method:

new_array = [1,2,3].map {|x| x * 10 }

etc.


David
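
To illustrate the same point with a user-defined method, here is a minimal sketch (each_doubled is a hypothetical name) in which the block receives the yielded values and the caller also uses the method's return value.

```ruby
# Yields each value doubled, then returns how many values were yielded.
def each_doubled(values)
  count = 0
  values.each do |v|
    yield v * 2
    count += 1
  end
  count
end

seen = []
total = each_doubled([1, 2, 3]) { |x| seen << x }
p seen    # the yielded values, collected by the block
p total   # the method's own return value
```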
 

David A. Black

Hi --

Charles said:
James said:
On Sep 19, 2004, at 4:03 PM, Charles Hixson wrote:

Sorry. It took me a bit of digging to find the /.../x documentation
even after you explicitly pointed it out to me. (This won't work
for me, because my actual pre- and post- patterns also exclude
spaces, but it can certainly clarify the example, if one understands
it!)

You can match space characters in an /.../x regex. The easiest way
is to use the whitespace character class escape \s.
Hope that helps.
James Edward Gray II

Does that work inside character class definitions( [] delimited groups)?

Silly of me, of course not. \s *IS* a character class definition.

But this does mean that I won't be able to use /.../x in the code.
Still, it's great for clarifying the examples, now that I understand it.

There's no regex without /x that cannot be expressed with /x :)


David
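
A short sketch of that claim, with hypothetical constant names: the same pattern written without /x and then with /x, where the space is spelled as an escape (\s, or \x20 for exactly one space character) because unescaped whitespace is ignored in extended mode.

```ruby
# The plain form, with a literal space in the pattern.
PLAIN = /Dr\. \w+/

# The extended form: layout whitespace and comments are ignored,
# so the space has to be written as an escape.
EXTENDED = /
  Dr\.   # the title
  \s     # any whitespace character
  \w+    # the name that follows
/x

# \x20 is the hex escape for the space character itself.
EXACT_SPACE = /Dr\.\x20\w+/x

["Dr. Smith", "Dr.Smith"].each do |s|
  puts "#{s.inspect}: #{!!(s =~ PLAIN)} #{!!(s =~ EXTENDED)} #{!!(s =~ EXACT_SPACE)}"
end
```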
 

Felipe Malta de Oliveira

In spite of the recent comments stating that newbies should not be afraid to
post silly questions, I now ask...

Could anybody give me a little knowledge about symbols? Like what they are,
why and where they're used and such...Or give me a pointer to somewhere I
can find that information?

Thanks a lot,

Felipe
 

Markus

I'll give it a shot:

* Symbols are an idea borrowed from Lisp. They are immutable,
  atomic, named, globally unique values with an efficient internal
  storage.
    * immutable, like (say) nil or an integer, in that you
      can't change them, give them new values, update them,
      etc.
    * atomic in that you can't "take apart" a symbol or "peek
      inside it" like you can with a string
    * named, in that each symbol has a human-readable form
      (unlike, say, pointers)
    * globally unique in that if a symbol is referenced
      anywhere in the program it is the same object that the
      same reference would get you anywhere else. This is the
      same way integers work (7 is 7, no matter where it
      occurs in the program), but unlike how arrays and
      strings work (you can, for example, have the string
      "seven" in several places in your program, and they are
      NOT the same object).
    * efficient in that they are usually implemented as
      something like an integer or a pointer, and thus are
      quick to compare, small to store, etc.
* Symbols are used wherever they are useful.
    * Symbols fill a role in Ruby (and in Lisp) something like
      enumerated types in Pascal--in fact, if you imagined a
      single pre-existing enumerated type containing all
      possible values, that would work sort of like symbols.
    * Symbols can be used for arbitrary state or condition
      labels (e.g. :male/:female, :jan, :feb, :mar...,
      :on, :off, :standby, :out_of_service, ... :reverse, :neutral,
      :first, :second, :third, :overdrive, etc.)
    * Symbols can be used as "exceptional" values (e.g.
      :not_a_number, :to_be_determined, etc.) much as nil or
      -1 often are, but in a way that is much easier to read.
      They are much more efficient than strings, which are
      often also used in such contexts

If that doesn't help, let me know and I'll try to dredge up some online
references--or you can always google.

-- MarkusQ
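
A brief sketch of the uniqueness and labelling points above. The gear_label method is hypothetical, and the strings are built with dup so that the comparison does not depend on how a particular Ruby version treats string literals.

```ruby
# Two strings with the same contents are equal but are separate objects.
a = "seven".dup
b = "seven".dup
puts a == b                  # same contents
puts a.equal?(b)             # not the same object

# A symbol with a given name is the same object wherever it appears.
puts :seven.equal?(:seven)

# Symbols used as readable state labels, much like an enumerated type.
def gear_label(gear)
  case gear
  when :reverse then "R"
  when :neutral then "N"
  else gear.to_s
  end
end

puts gear_label(:reverse)
puts gear_label(:overdrive)
```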
 

Charles Hixson

Bill said:
New site, written in Ruby. For now, there's no time limit.

I was bored with my own stuff, so trimmed down the 'run' a bit...
http://www.codepaste.org/view/paste/163?show_comments=1

Hope it still runs, couldn't test it for lack of the 'word' file you required.
There are links on the right to see the originals, and a diff between the two.
Actually, after several comments, and thinking about things, I've
started trying to use rewrite rules. Things like replacing
/(\w)n't/ with '\1n\xB9t'. This means that I need to reserve the first
255 word IDs for character names, but by replacing internal
apostrophes with a Right Single Quotation Mark I don't need to worry
about matches with ' (since the replacement character isn't defined in
std. ANSI-7). It does mean that the ellipsis (...) had to be translated
to \x91 (Private Use 1) since no ellipsis character was defined, but
that's OK. But with these translations, a lot of the special cases
disappear, and I was going to need to do that anyway as there's no other
way to determine things like whether or not the period of Dr. denotes
the end of a sentence. I'm still dithering about what the period should
be rewritten as, but making it something other than a period simplifies
things a lot, at the cost of misidentifying cases where a sentence ends
with Mr.
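
A rough sketch of what such rewrite rules might look like with gsub. It assumes UTF-8 strings and swaps in the Unicode code points U+2019 (right single quotation mark) and U+2026 (horizontal ellipsis) for the single-byte codes mentioned above; the constant and method names are hypothetical.

```ruby
# Assumed replacement characters (Unicode, not the byte codes in the post).
RSQUO    = "\u2019"   # right single quotation mark
ELLIPSIS = "\u2026"   # horizontal ellipsis

# Rewrite internal apostrophes in n't contractions, then collapse "..."
# into a single character so later rules never see a bare ' or a period run.
def rewrite(text)
  text.gsub(/(\w)n't/) { "#{$1}n#{RSQUO}t" }.gsub("...", ELLIPSIS)
end

puts rewrite("I don't know...")
```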
Right, much easier to understand what you seek now. Sounds like
you're looking at a text formatter, I'm guessing for printing.
Actually, I'm reading text files, assigning word IDs to various word &
wordish fragments, sticking them in a database for memory. The next
step is to build real phrases. (That word routine, and its backfill, is
just the part that builds the database, sticks the words into it, and
retrieves them when a previously encountered word is again
encountered.) I'm not yet to the interesting stuff. Right now I'm just
trying to create phrases of a reasonable length that break in
grammatically reasonable places. I'd prefer that no phrase be longer
than 5 or 6 words.

At a later stage it will start babbling using reasonable phrases as
chunks, and transitioning from phrase to phrase based on some kind of
statistical relationship. Still later...well, I don't yet know just how
far this can go. I'm hoping it will become interesting. I intend to
feed it a bunch of books from Gutenberg as background, but I'm starting
with Alice30.txt (Alice in Wonderland).
 

Markus

At a later stage it will start babbling using reasonable phrases as
chunks, and transitioning from phrase to phrase based on some kind of
statistical relationship. Still later...well, I don't yet know just how
far this can go. I'm hoping it will become interesting. I intend to
feed it a bunch of books from Gutenberg as background, but I'm starting
with Alice30.txt (Alice in Wonderland).

If it turns out you're using this to try to get past spam filters I
think a lot of us will be very disappointed.

-- Markus
 

Charles Hixson

Markus said:
If it turns out you're using this to try to get past spam filters I
think a lot of us will be very disappointed.

-- Markus
Well, since I plan to eventually release full sources...that may well
happen if it's successful. Then again, it could probably also be used
to sort spam from ham.

I basically think of this as a part of an AI project, and as such will
have multiple uses.
E.g., one test of ham is that most of what it contains consists of
reasonable phrases. If it doesn't have reasonable phrases, it's
probably something else. Which, unfortunately, includes programs. So
you'd need a separate recognizer to decide that it was or wasn't a
program. And possibly others.

But the spam/ham problem is an arms race. I suspect that a final answer
is impossible this side of individually tailored filters. Bayes is
already a start at this, but it's just a start. To be really effective
the filter will need to dip into the semantic level. (So far I'm pretty
much staying at the syntactic level, because it's more tractable...but
semantics will need to be added.)
 

Austin Ziegler

Well, since I plan to eventually release full sources...that may well
happen if it's successful. Then again, it could probably also be used
to sort spam from ham.

I basically think of this as a part of an AI project, and as such will
have multiple uses.
E.g., one test of ham is that most of what it contains consists of
reasonable phrases. If it doesn't have reasonable phrases, it's
probably something else. Which, unfortunately, includes programs. So
you'd need a separate recognizer to decide that it was or wasn't a
program. And possibly others.

But the spam/ham problem is an arms race. I suspect that a final answer
is impossible this side of individually tailored filters. Bayes is
already a start at this, but it's just a start. To be really effective
the filter will need to dip into the semantic level. (So far I'm pretty
much staying at the syntactic level, because it's more tractable...but
semantics will need to be added.)

I wouldn't mind seeing a more portable (Ruby?) implementation of a
Dolby noise or Markov chain spam analysis routine.

http://yro.slashdot.org/yro/04/02/24/0025219.shtml

It seems that you're basically doing Markov chain analysis here.

-austin
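
For readers unfamiliar with the term, here is a minimal word-level Markov chain sketch. It is an assumption-laden illustration, not Charles's phrase-based design and not the routine from the linked article; build_transitions and babble are hypothetical names.

```ruby
# Record, for every word, the words that were seen to follow it.
def build_transitions(words)
  table = Hash.new { |hash, key| hash[key] = [] }
  words.each_cons(2) { |current, following| table[current] << following }
  table
end

# Walk the table from a starting word, picking a random follower each step.
# Stops early when the current word has no recorded follower.
def babble(table, start, length, rng = Random.new)
  output = [start]
  (length - 1).times do
    choices = table.fetch(output.last, [])
    break if choices.empty?
    output << choices.sample(random: rng)
  end
  output.join(" ")
end

table = build_transitions(%w[the cat saw the dog])
p table
puts babble(table, "the", 4)
```

Followers that occur more often appear more often in each list, so sampling from the list already weights the choice by observed frequency.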
 

Charles Hixson

Austin said:
I wouldn't mind seeing a more portable (Ruby?) implementation of a
Dolby noise or Markov chain spam analysis routine.

http://yro.slashdot.org/yro/04/02/24/0025219.shtml

It seems that you're basically doing Markov chain analysis here.

-austin
Well, certainly not formally. But then I haven't gotten well started.
Still, there does seem to be a lot of overlap in the "state space".
I'll have to remember that for when I get hung up on how to proceed.
 
