excessively verbose request for help with regex and arrays

Simon Schuster

text = "(20:29:55) awhilewhileaway: I also need to assemble the
cover/back, and figure out the innards of the aimlog
formatting/keyword searches"

what I want is essentially 3 fields, the text itself, the speaker
(stripped of the ":") and the date information, however as far as the
date information goes, it will be part of a short-stepped process,
which will only need to reference the previous one, so all data can
keep overwriting within two variables, as in: time_since_last -
time_current = time_it_took ... I think. I'm new to programming and
left math in highschool, so it's a weird (but very fun) place for my
mind to be. :) still working it out.

so I would like two of the fields of this array to be hashes..
array[0] being a hash and having a numerical value, array[1] being a
hash and having personA or personB value, and then array[2] being a
string. that works, I think...?

wow! I had no idea I knew this much when I started the e-mail. :) any
hints/solutions for me to play around with? it's the parentheses of
the regex that kind of has me stuck, mostly, as well as how to deal
with clock arithmetic when it rolls over at midnight, I foresee that
being confusing.
 
Eric Hodel

Well, you haven't explained what you really want to do with your data
yet, so that all sounds quite a bit complicated. Why not start out
with just a simple split on space:

time, speaker, content = text.split ' ', 3

Then you can parse the time:

require 'time'
time = Time.parse time

Cut off the ':' on the speaker:

speaker = speaker.sub(/:$/, '')

and you'll be left with:

p time, speaker, content
Wed Aug 15 20:29:55 -0700 2007
"awhilewhileaway"
"I also need to assemble the cover/back, and figure out the innards
of the aimlog formatting/keyword searches"
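
For experimenting, the three steps above could be collected into one method. This is only a sketch: the name parse_line is made up for illustration, and it strips the parentheses explicitly before calling Time.parse rather than relying on the parser to ignore them. Note that Time.parse fills in today's date when it is given only a clock time.

```ruby
require 'time'

# parse_line is a hypothetical helper name, not something from the thread.
# Assumes the speaker name contains no spaces.
def parse_line(text)
  stamp, speaker, content = text.split(' ', 3)
  # "(20:29:55)" -> "20:29:55"; Time.parse supplies today's date.
  time = Time.parse(stamp.delete('()'))
  [time, speaker.sub(/:\z/, ''), content]
end

time, speaker, content = parse_line("(20:29:55) awhilewhileaway: I also need to assemble")
p time.hour   # => 20
p speaker     # => "awhilewhileaway"
p content     # => "I also need to assemble"
```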
 
Alex Gutteridge

I'm not sure I fully understand what you want to do; perhaps you
could post a more complete set of data. The part where you describe
Arrays of Hashes is a bit confusing as well. Can you describe your
data structure using code rather than English? The first regexp part
is easy enough, though:

text = "(20:29:55) awhilewhileaway: I also need to assemble"
text.scan(/(\(.+?\)) (.+?): (.+)/) { |time, name, data|
  p time
  p name
  p data
}

This doesn't handle multi-line messages, names containing ':', or
other oddities, though, so be careful with real-world data.
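
Since the parentheses were the sticking point: an escaped \( or \) matches a literal bracket in the text, while an unescaped ( ) pair captures whatever it surrounds. A somewhat stricter pattern might look like the sketch below. LINE_RE and the group names are illustrative only, named groups need Ruby 1.9 or later, and the pattern assumes a speaker name never contains ':'.

```ruby
# LINE_RE is a hypothetical constant name, not something from the thread.
#   \A ... \z    anchor the match to the whole string
#   \( and \)    match the literal brackets around the timestamp
#   (?<x>...)    captures into a group named x
#   /m           lets . match newlines, so multi-line messages are kept
LINE_RE = /\A\((?<time>\d{1,2}:\d{2}:\d{2})\) (?<name>[^:]+): (?<data>.*)\z/m

text = "(20:29:55) awhilewhileaway: I also need to assemble"
if (m = LINE_RE.match(text))
  p m[:time]   # => "20:29:55"
  p m[:name]   # => "awhilewhileaway"
  p m[:data]   # => "I also need to assemble"
end
```

Here the brackets sit outside the time group, so the captured timestamp comes back without them.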

If you have an Array of text lines then you can just iterate through
(I use map below), scan each one and store the data in another Array
(no need for Hashes unless I misunderstand you):

irb(main):031:0> text_a = ["(20:29:55) awhilewhileaway: I also need to assemble", "(20:39:55) away: I also need embl"]
=> ["(20:29:55) awhilewhileaway: I also need to assemble", "(20:39:55) away: I also need embl"]
irb(main):032:0> res = text_a.map{|l| l.scan(/(\(.+?\)) (.+?): (.+)/)[0]}
=> [["(20:29:55)", "awhilewhileaway", "I also need to assemble"], ["(20:39:55)", "away", "I also need embl"]]
irb(main):033:0> res[0]
=> ["(20:29:55)", "awhilewhileaway", "I also need to assemble"]
irb(main):034:0> res[0][0]
=> "(20:29:55)"
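
If named fields turn out to be easier to work with than positions like res[0][0], one option is a single Hash per line, collected in an Array. This is a rough sketch only: to_entries and the keys :time, :speaker and :text are made-up names, the brackets are moved outside the first capture group so the time comes back bare, and lines that don't match are skipped.

```ruby
# ENTRY_RE and to_entries are hypothetical names used for illustration.
ENTRY_RE = /\((.+?)\) (.+?): (.+)/

# Turns an Array of log lines into an Array of Hashes, one per matching line.
def to_entries(lines)
  lines.map { |l| ENTRY_RE.match(l) }.compact.map do |m|
    { :time => m[1], :speaker => m[2], :text => m[3] }
  end
end

text_a = ["(20:29:55) awhilewhileaway: I also need to assemble",
          "(20:39:55) away: I also need embl"]
entries = to_entries(text_a)
p entries[0][:speaker]   # => "awhilewhileaway"
p entries[1][:time]      # => "20:39:55"
```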

Hope that helps.

Alex Gutteridge

Bioinformatics Center
Kyoto University
 
Simon Schuster

thanks, but since the time is only going to be used for arithmetic,
parsing it for additional information isn't helpful, and the roll-over
will be problematic.

(23:54:45) - (00:03:45) != 00:09:00

as for the use of the data, basically, at this stage, I'm working on
formatting aimlogs into "bookish" dialogue, with an eventual goal of
utilizing lulu.com's API to generate books behind my back. :D maybe
thinking about making a gaim plugin if it turns out, with many more
ideas for what else I could do, but not nearly the ruby rigors I need
to actualize them (yet!!) :p
 
Alex Gutteridge

Does this help?

Parse the two times as Eric suggested (Time.parse comes from
require 'time'). If the second (later) time is less than the first,
add a day to it (60*60*24 seconds). Then subtract the first from the
second to get the difference in seconds.

irb(main):023:0> t1 = Time.parse('23:54:45')
=> Thu Aug 16 23:54:45 +0900 2007
irb(main):024:0> t2 = Time.parse('00:03:45')
=> Thu Aug 16 00:03:45 +0900 2007
irb(main):025:0> t2 += (60 * 60 * 24) if t2 < t1
=> Fri Aug 17 00:03:45 +0900 2007
irb(main):026:0> diff = t2 - t1
=> 540.0
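
The same roll-over can also be handled with plain integers, if that is easier to picture: turn each timestamp into seconds since midnight and take the difference modulo one day. This is a sketch only. The method names are made up, and it assumes consecutive messages are always less than 24 hours apart, since the log carries no date.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: "(23:54:45)" or "23:54:45" -> seconds since midnight.
def seconds_since_midnight(stamp)
  # Integer(part, 10) forces base 10, so "08" is read as 8.
  h, m, s = stamp.delete('()').split(':').map { |part| Integer(part, 10) }
  h * 3600 + m * 60 + s
end

# Hypothetical helper: gap between two clock times, wrapping past midnight.
# Ruby's % returns a non-negative result for a positive divisor.
def elapsed_seconds(earlier, later)
  (seconds_since_midnight(later) - seconds_since_midnight(earlier)) % SECONDS_PER_DAY
end

p elapsed_seconds("(23:54:45)", "(00:03:45)")   # => 540

stamps = ["(23:54:45)", "(00:03:45)", "(00:04:00)"]
p stamps.each_cons(2).map { |a, b| elapsed_seconds(a, b) }   # => [540, 15]
```

each_cons(2) walks the list in overlapping pairs, which fits the "only reference the previous one" step described earlier.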

Alex Gutteridge

Bioinformatics Center
Kyoto University
 
Simon Schuster

yes, this helps a lot! I should have assumed that parsing the time
would enable arithmetic like Fri - Thurs, instead I assumed I'd have
to put it all to integers.

I will have to think over more of exactly what I'm trying to do with
the arrays/hashes, after I read more about hashes, I think. thanks!
 
