Help: Efficient regular expression


Divya Badrinath

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

I need to fetch 14051 and /bin/bash from the string.

Can someone help me write an efficient regular expression for that?

I am a beginner. I wrote:
string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s/

I know this is not an efficient way of doing it.

Please help.
 

Divya Badrinath

Divya said:
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

i need to fetch 14051 and /bin/bash from the string

I mean I need the 2nd column and the last column.
 

James Edward Gray II

I mean I need the 2nd column and the last column.

cols = string.split
sec, last = cols.values_at(1, -1)

Hope that helps.

James Edward Gray II
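
For reference, a runnable sketch of the split approach above. It assumes the columns are separated by whitespace and that the command carries no arguments; the variable names are only illustrative.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

# String#split with no arguments splits on runs of whitespace.
cols = string.split

# Array#values_at accepts negative indices, so -1 picks the last column.
pid, cmd = cols.values_at(1, -1)

puts pid  # 14051
puts cmd  # /bin/bash
```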
 

Florian Aßmann

Florian said:
Hi Divya, use

string[/\s(\d+)/, 1]

see String.[]

Regards
Florian


pid = string[/\s(\d+)/, 1]
cmd = string[/\s(\S+)$/, 1] # is missing
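
A small sketch of how the two-argument form of String#[] behaves, assuming the sample line has no trailing whitespace. The `none` variable and the `zsh` pattern are made up to show the no-match case.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

# string[regexp, n] returns capture group n of the first match, or nil.
pid  = string[/\s(\d+)/, 1]    # first run of digits that follows whitespace
cmd  = string[/\s(\S+)$/, 1]   # last run of non-whitespace before end of line
none = string[/\s(zsh)$/, 1]   # pattern does not match, so this is nil

p pid   # => "14051"
p cmd   # => "/bin/bash"
p none  # => nil
```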
 

Divya Badrinath

Florian said:
Hi Divya, use

string[/\s(\d+)/, 1]

see String.[]

Regards
Florian

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
with this,
string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s/

$1 gives me 14051
and
$3 gives me /bin/bash

What I am trying to do is get $1 and $3 into a hash.
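
One way this could look, as a sketch only: the `processes` hash and the simplified pattern are assumptions, not code from the thread. It keys the hash by pid and anchors on the HH:MM:SS time column.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

processes = {}  # hypothetical hash: pid => command

# Skip the user column, capture the pid, then lazily skip ahead to the
# first HH:MM:SS field and capture everything after it.
if (m = string.match(/\A\S+\s+(\d+)\s.*?\s\d+:\d+:\d+\s+(.*\S)/))
  processes[m[1]] = m[2]
end

p processes
```

Using MatchData (`m[1]`, `m[2]`) instead of `$1` and `$3` keeps the captures local to the match that produced them.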
 

Kyle Schmitt

I love regex, so it hurts me to say it, but there are other ways of solving this ;)

for instance:

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
number = string.split[1]
program = string.split.last


now regexes!

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
number = string[/[0-9]+/]
program = string[/[a-z\/]+$/]

You know you can get values out of an array with the [] operator.
Well, you can get substrings out of strings the same way, and it works
with regexes!

string[/[0-9]+/] will return the first match of 1 or more digits.

Here's the magic: use [ ] inside of a regular expression to create your
own character classes. Individual characters in there are included in the
class, and ranges may be included using the -, so a-z is
abcdefghijklmnopqrstuvwxyz.
The + afterwards means 1 or more times.
What if you want exactly 5 consecutive digits? Use the {}:
string[/[0-9]{5}/]
Ranges also work here, written with a comma:
string[/[0-9]{3,5}/] would match a run of 3, 4 or 5 digits

and
string[/[a-z\/]+$/] will match a run of lowercase letters and forward
slashes at the end. The $ is a special char that represents the end of a
line, and since / delimits the regex literal, it needs to be escaped
with a \.

BUT it could be even easier.
The [] classes can be negated!
/[^a]*/ would match a run of characters that are not a
/[^ ]*/ would match a run of characters that are not a space... soo
string[/[^ ]+$/] would be a good way to get the last bit.
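
The patterns discussed above can be tried side by side. This is just a sketch against the sample line from the thread; the variable names are illustrative.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

first_digits = string[/[0-9]+/]      # first run of digits
five_digits  = string[/[0-9]{5}/]    # exactly five consecutive digits
three_to_5   = string[/[0-9]{3,5}/]  # three to five digits (comma, not dash)
tail_class   = string[/[a-z\/]+$/]   # lowercase letters and slashes at end of line
tail_negated = string[/[^ ]+$/]      # negated class: non-space characters at end

p first_digits  # => "14051"
p five_digits   # => "14051"
p three_to_5    # => "14051"
p tail_class    # => "/bin/bash"
p tail_negated  # => "/bin/bash"
```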
 

Florian Aßmann

Divya said:
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"

i need to fetch 14051 and /bin/bash from the string

can someone help me to write an efficient regular expression for that.

i am a beginner, i wrote
string =~ /(\d+)\s+(\d+)\s+\d+\s+\d+:\d+\s+.*\s+\d+:\d+:\d+\s+(.*)\s/

i know this is not the efficient way of doing it.

Please help.

talking about efficient, I was just curious...

#!/usr/bin/env ruby -w
#
# Created by Florian Aßmann on 2007-07-10.
# Copyright (c) 2007. All rights reserved.

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
require 'profiler'

puts <<-EOS

pid = string[/\s(\d+)/, 1]
cmd = string[/\s(\S+)$/, 1]

EOS
Profiler__::start_profile

10000.times do
  pid = string[/\s(\d+)/, 1]
  cmd = string[/\s(\S+)$/, 1]
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

cols = string.split
sec, last = cols.values_at(1, -1)

EOS
Profiler__::start_profile

10000.times do
  cols = string.split
  sec, last = cols.values_at(1, -1)
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

number = string.split[1]
program = string.split.last

EOS
Profiler__::start_profile

10000.times do
  number = string.split[1]
  program = string.split.last
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

*grin*

Florian
 

Robert Dober

James said:
cols = string.split
sec, last = cols.values_at(1, -1)

Very interesting James, I seem to be rather extreme and

sec, last = string.split.values_at(1, -1)
might be a tad too long for one line in your style; however, Ruby
supports this marvelous syntax :)

sec, last = string.split.
  values_at(1, -1)

Robert
 

Florian Aßmann

Ok, it was hard to beat Edward, but at least building the simplest
regular expression to do something like a String#split seems to be faster:

#!/usr/bin/env ruby -w
#
# Created by Florian Aßmann on 2007-07-10.
# Copyright (c) 2007. All rights reserved.

string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
require 'profiler'

puts <<-EOS

pid_rx = /\s(\d+)/
cmd_rx = /\s(\S+)$/
pid, cmd = string[pid_rx, 1], string[cmd_rx, 1]

EOS
Profiler__::start_profile

# Build the regexps once, outside the loop.
pid_rx = /\s(\d+)/
cmd_rx = /\s(\S+)$/
100000.times do
  pid, cmd = string[pid_rx, 1], string[cmd_rx, 1]
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

pid, cmd = string.split.values_at(1, -1)

EOS
Profiler__::start_profile

100000.times do
  pid, cmd = string.split.values_at(1, -1)
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

rx = Regexp.new('\S+\s(\d+).*\s(\S+$)')
pid, cmd = rx.match(string).values_at( 1, -1 )

EOS
Profiler__::start_profile

# One regexp with two captures; MatchData#values_at picks groups 1 and last.
rx = Regexp.new('\S+\s(\d+).*\s(\S+$)')
100000.times do
  pid, cmd = rx.match(string).values_at( 1, -1 )
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT

puts <<-EOS

rx = Regexp.new('(\S+)')
pid, cmd = rx.match(string).values_at( 1, -1 )

EOS
Profiler__::start_profile

# NOTE: /(\S+)/ matches only the first field ("root"), so this variant
# returns ["root", "root"]; it is timed here but does not extract pid and cmd.
rx = Regexp.new('(\S+)')
100000.times do
  pid, cmd = rx.match(string).values_at( 1, -1 )
end

Profiler__::stop_profile
Profiler__::print_profile STDOUT


Sincerely
Florian
 

Gregory Brown

Ooooh fun.
so are you going to announce the winner ;)

It's just profiler code, you can run it yourself... But on my machine:

pid = string[/ (d+)/, 1]
cmd = string[/ (S+)$/, 1]

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 58.26     0.67      0.67        1   670.00  1150.00  Integer#times
 41.74     1.15      0.48    20000     0.02     0.02  String#[]
  0.00     1.15      0.00        1     0.00  1150.00  #toplevel

cols = string.split
sec, last = cols.values_at(1, -1)

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 66.67     0.70      0.70        1   700.00  1050.00  Integer#times
 18.10     0.89      0.19    10000     0.02     0.02  String#split
 15.24     1.05      0.16    10000     0.02     0.02  Array#values_at
  0.00     1.05      0.00        1     0.00  1050.00  #toplevel

number = string.split[1]
program = string.split.last

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 61.70     1.16      1.16        1  1160.00  1880.00  Integer#times
 23.94     1.61      0.45    20000     0.02     0.02  String#split
  8.51     1.77      0.16    10000     0.02     0.02  Array#last
  5.85     1.88      0.11    10000     0.01     0.01  Array#[]
  0.00     1.88      0.00        1     0.00  1880.00  #toplevel
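
On newer Rubies, where the profiler library may not be installed, the same three variants can be compared with plain wall-clock timing. This is a rough sketch, not a replacement for the profile above: `time_runs` is a hypothetical helper, and the numbers will differ from machine to machine.

```ruby
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash"
N = 10_000  # same iteration count as the first profiler script

# Hypothetical helper: seconds of wall-clock time taken by n runs of the block.
def time_runs(n)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  n.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

timings = {
  "String#[] twice" => time_runs(N) { [string[/\s(\d+)/, 1], string[/\s(\S+)$/, 1]] },
  "split.values_at" => time_runs(N) { string.split.values_at(1, -1) },
  "split twice"     => time_runs(N) { [string.split[1], string.split.last] },
}

timings.each { |label, secs| puts format("%-18s %.4fs", label, secs) }
```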
 

Gregory Brown

Robert said:
Very interesting James, I seem to be rather extreme and

sec, last = string.split.values_at(1, -1)
might be a tad too long for one line in your style; however, Ruby
supports this marvelous syntax :)

sec, last = string.split.
  values_at(1, -1)

What is your terminal width, 30?
 

Divya Badrinath


cmd = string[/\s(\S+)$/, 1]
doesn't fetch me anything :)

program=string.split.last
What if
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash -x -s"
It fetches only -s for me.
sec, last = string.split.values_at(1, -1)
doesn't work for the same reason.
I need everything after 00:00:00 till the end,
i.e., /bin/bash -x -s

program=string[/[a-z\/]+$/]
The command column may start with any character. I don't want to limit it in
my regexp; it has to be generic.

With all your comments, I tried
pid = run_process[/\s(\d+)/, 1]
cmd = run_process[/:\d+:\d+\s(\S.*)\s$/, 1]

Is there any other way?
 

Paolo Negri

sorry for being OT since I'm not going to talk about ruby or regexp

If the string you're parsing is an output from the ps command you can
simplify your life using the -o option that prints only the fields you
need.

E.g. on GNU/Linux:

ps -ao pid,command

just outputs pid and command columns.
Be careful since the command column can contain spaces.


Paolo
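
If the -o route is available, parsing gets simpler because only two columns remain. The sketch below works on made-up sample text standing in for `ps -ao pid,command` output (the header row and the process lines are assumptions), and limits the split to 2 so that spaces in the command survive.

```ruby
# Sample text standing in for `ps -ao pid,command` output (made up).
output = <<~PS
  PID COMMAND
  14051 /bin/bash -x -s
  14060 ps -ao pid,command
PS

processes = {}
output.each_line.drop(1).each do |line|    # skip the header row
  pid, cmd = line.strip.split(/\s+/, 2)    # limit 2 keeps the command intact
  processes[pid] = cmd if pid && cmd
end

p processes
```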
 

Divya Badrinath

Paolo said:
sorry for being OT since I'm not going to talk about ruby or regexp

If the string you're parsing is an output from the ps command you can
simplify your life using the -o option that prints only the fields you
need.

E.g. on GNU/Linux:

ps -ao pid,command

just outputs pid and command columns.
Be careful since the command column can contain spaces.


Paolo

I saw that too, but I cannot use all the ps options where I am running it.
I am limited to using ps -aef.

I need to fetch what I need from that output.
 

Todd Benson

Divya said:
cmd = string[/\s(\S+)$/, 1]
doesn't fetch me anything :)

program=string.split.last
What if
string = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash -x -s"
It fetches only -s for me.
sec, last = string.split.values_at(1, -1)
doesn't work for the same reason.
I need everything after 00:00:00 till the end,
i.e., /bin/bash -x -s

program=string[/[a-z\/]+$/]
The command column may start with any character. I don't want to limit it in
my regexp; it has to be generic.

With all your comments, I tried
pid = run_process[/\s(\d+)/, 1]
cmd = run_process[/:\d+:\d+\s(\S.*)\s$/, 1]

Is there any other way?

It's not fancy, but I'll throw it in:

s = 'root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash -x -s'
_, pid, _, cmd = *s.match(/(\d+)\s.*(:\d+){2}\s(.*?)$/)

so, if you're using a hash like I think you might be:

s = <output of ps command>
h = {}
s.each_line do |line|
  _, pid, _, cmd = *line.match(/(\d+)\s.*(:\d+){2}\s(.*?)$/)
  h[pid] = cmd
end

I think that should work.

Todd

 

Rob Biedenharn

Just limit the split and you should go with command arguments...

...unless the process start time is one of the output columns and it
goes from 'HH:MM' to 'Mon dd' for a process that runs long enough.

If you really can't change the ps options, suck it up, count columns,
forget the regexp, and be done.

-Rob
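
A sketch of the limited split, assuming the eight-column layout seen in the sample line (UID PID PPID C STIME TTY TIME CMD) and a start time in 'HH:MM' form. As noted above, a 'Mon dd' start time would add a token and shift the columns. The `processes` hash is hypothetical.

```ruby
line = "root 14051 14033 3 08:39 pts/2 00:00:00 /bin/bash -x -s"

# A limit of 8 stops splitting after seven fields, so the eighth element
# holds the whole command, arguments included.
fields = line.split(' ', 8)
pid, cmd = fields.values_at(1, 7)

processes = { pid => cmd }  # hypothetical hash keyed by pid

puts pid  # 14051
puts cmd  # /bin/bash -x -s
```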

Rob Biedenharn http://agileconsultingllc.com
 
