A method to search and cut in Ruby.

scomboni

Hello all,
I'm trying to parse some EXIF data and return certain fields of text.
If I want to parse a document, locate a piece of text, and then take
everything after a colon (:) delimiter, what is the best way to do it
in Ruby, as opposed to a shell command?

For example, if I have a file called filename.txt and I want to search
for this ---> "Compression : JPEG (old-style)"
but only return everything to the right of the delimiter?

Using command-line tools such as grep to locate the line and then cut to
grab everything after " : " works, but I'm trying to learn how to do
things like this in Ruby and am having no luck grabbing everything to
the right of the colon. Some of the text I'm grabbing can contain
additional colons, and I want everything after the first one.


Thanks.

Sc--
 
Phrogz

For example, if I have a file called filename.txt and I want to search
for this ---> "Compression : JPEG (old-style)"
but only return everything to the right of the delimiter?

C:\>irb
irb(main):001:0> s = "Foo : Bar"
=> "Foo : Bar"
irb(main):002:0> s.split ":"
=> ["Foo ", " Bar"]
irb(main):003:0> s.split(":").last
=> " Bar"

...or did you want the leading whitespace chomped?

irb(main):004:0> s.split( /\s*:\s*/ )
=> ["Foo", "Bar"]
irb(main):005:0> s.split( /\s*:\s*/ ).last
=> "Bar"
 
scomboni

I was looking at this method as well. I was having trouble, though: I
know the left-hand column (in this example, the Compression portion),
then comes the colon, and the remainder could be anything, for instance
a description with lots of text that could also contain an additional
colon. At the shell prompt I could grep "Compression" | cut -d: -f2-
and that would grab everything...
So it sounds like I'm looking in the right area, but I'm just not
pulling it all together. I will keep at it. If you have additional info
to point to, that would be great and appreciated.
Thanks again.
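
The grep-then-cut pipeline described above can be sketched in Ruby roughly as follows. This is only one possible approach, using String#partition to split at the first colon. The helper name value_for and the sample lines are made up for illustration. Unlike the shell pipeline, it returns only the first matching line and strips the padding around the value.

```ruby
# Hypothetical helper: a rough Ruby counterpart of
#   grep "Compression" filename.txt | cut -d: -f2-
# It returns the value from the first matching line, or nil.
def value_for(lines, key)
  lines.each do |line|
    next unless line.include?(key)             # the "grep" step
    _before, sep, after = line.partition(":")  # split at the FIRST colon only
    return after.strip unless sep.empty?       # the "cut -d: -f2-" step, minus padding
  end
  nil
end

# Made-up sample lines standing in for the contents of filename.txt.
sample = ["Make : ExampleCam\n", "Compression : JPEG (old-style)\n"]
value_for(sample, "Compression")  # => "JPEG (old-style)"
```

To run it against a real file, something like value_for(File.foreach("filename.txt"), "Compression") should work, since File.foreach without a block returns an enumerator over the lines.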


Sc-
 
bbiker

Some of the text I'm grabbing can contain additional
colons and I want everything after the first one.

SC,
I am assuming you want everything after the first colon but not after
the second:

irb(main):001:0> line = "Compression : JPEG (old_style) : whatever"
=> "Compression : JPEG (old_style) : whatever"
irb(main):002:0> if line =~ /^Compression\s*:\s*(.*)$/
irb(main):003:1>   end_str = $1
irb(main):004:1>   first_el = end_str.split(/\s*:\s*/).first
irb(main):005:1>   puts first_el
irb(main):006:1> end
JPEG (old_style)
=> nil

If you want everything after each colon:

el = end_str.split(/\s*:\s*/)

p el   # prints ["JPEG (old_style)", "whatever"]
 
Phil Meier

On Jul 25, 4:59 pm, (e-mail address removed) wrote: ....
...or did you want the leading whitespace chomped?

irb(main):004:0> s.split( /\s*:\s*/ )
=> ["Foo", "Bar"]
irb(main):005:0> s.split( /\s*:\s*/ ).last
=> "Bar"

I was looking at this method as well. I was having trouble, though: I
know the left-hand column (in this example, the Compression portion),
then comes the colon, and the remainder could be anything, for instance
a description with lots of text that could also contain an additional
colon. At the shell prompt I could grep "Compression" | cut -d: -f2-
and that would grab everything...

To get everything after the first colon you can use:
s =~ /^.*?:/
textAfterColon = $'.dup

To also get rid of the spaces directly after the first colon use this Regex:
s =~ /^.*?:\s*/
textAfterColon = $'.dup

The trick is to anchor the regex (i.e. using ^) and keep the .*? non-greedy,
so the match stops at the first colon.
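
For anyone who prefers to avoid the $' global, the same regex idea can be sketched with MatchData#post_match, which returns the text following the match. The helper name text_after_colon is hypothetical, and \A is used here in place of ^ on the assumption that each string holds a single line.

```ruby
# Hypothetical helper: text after the first colon (and any spaces that
# follow it), or nil when the string contains no colon.
def text_after_colon(s)
  m = /\A.*?:\s*/.match(s)   # non-greedy, so it stops at the first colon
  m && m.post_match          # same text that $' would hold after =~
end

text_after_colon("Compression : JPEG (old-style)")  # => "JPEG (old-style)"
text_after_colon("Comment : taken at 10:30")        # => "taken at 10:30"
text_after_colon("no delimiter here")               # => nil
```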

BR Phil
 
scomboni

Thanks so much for everyone's answers, much appreciated. The space after
the colon, I thought I was going to have to live with that. Thanks
again for the help.
Scott
 
Thomas Gantner

split takes an optional second parameter <limit>, defining how many elements
the resulting array shall have at most:

'foo : bar : baz'.split(/\s*:\s*/, 2)
=> [ "foo", "bar : baz" ]
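
Building on the limit parameter, here is a minimal sketch of collecting every "Key : Value" line into a Hash in one pass. The helper name parse_fields and the sample lines are invented for illustration. Real EXIF dumps may be laid out differently.

```ruby
# Hypothetical helper: turn "Key : Value" lines into a Hash.
# A limit of 2 keeps any further colons inside the value.
def parse_fields(lines)
  fields = {}
  lines.each do |line|
    key, value = line.strip.split(/\s*:\s*/, 2)
    next if value.nil?   # skip blank lines and lines without a colon
    fields[key] = value
  end
  fields
end

# Made-up sample input.
sample = [
  "Compression : JPEG (old-style)",
  "Date/Time : 2007:07:25 16:59:00",
  "a line without a delimiter"
]
fields = parse_fields(sample)
fields["Compression"]  # => "JPEG (old-style)"
fields["Date/Time"]    # => "2007:07:25 16:59:00"
```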

-Thomas
 
