Newbie regexp question

James Calivar

Hello,

I'm trying to split a formatted text file into four separate columns.
The data is comprised of lines of text that are bundled into four
distinct columns, corresponding to a "Required versus Optional"
variable, a requirement number, a requirement classification (R1=Rev 1,
F=Future, I=Internal), and a textual description of the requirement.

My raw data looks like this in the input text file:

R [01] R1 The system shall support "emergency call processing"
R [02] R1 The system shall support "local call processing"
R [08] F The system shall provide a command-line user interface
R [723] F The system shall provide 6 10/100/1000 Ethernet interfaces
R [11] F The system shall support VoIP networks
R [398] R1 The system shall contain 2 control boards
O [327] I The system should support hotswapping of all internal boards
R [19] I The system shall be able to detect transmission errors
R [631] F The system shall continue processing data as long as a call is active.

I've set up a loop to process each line in the input file, and what I'd
like to get is four separate variables containing on a line-by-line
basis the data corresponding to the four distinct columns. The problem
is my regexp experience is next to nothing, and I can't figure out how
to extract the data I want since my fourth column contains whitespace
(I'd have used that as my column separator otherwise).

Here's my loop:

File.open(textfile, "r") do |input_file|
  while line = input_file.gets
    output_file << line
  end
end

What can I replace the simple copy statement (output_file << line) with
in order to get what I want?

Thanks in advance, I hope this question makes some sense.

James
 
Marcin Mielżyński

Try this one:

open("file").read.scan(/(\w)\s+(.+?)\s+(\w+)\s+(.*?)\n?$/) { |req, num, cls, dsc| ... }

lopex
 
Marcin Mielżyński

Oops, the newline in the regexp is not needed. Try this one instead:

open("file").read.scan(/(\w)\s+(.+?)\s+(\w+)\s+(.*?)$/) { |req, num, cls, dsc| ... }
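
As a rough illustration of how the scan call above behaves, here is a small sketch that runs the same pattern over a few of the sample lines held in a string. The `text` and `rows` names are made up for this example, and the heredoc stands in for the contents of the real file.

```ruby
# Sample lines copied from the original post; a real script would read the file instead.
text = <<~DATA
  R [01] R1 The system shall support "emergency call processing"
  R [723] F The system shall provide 6 10/100/1000 Ethernet interfaces
  O [327] I The system should support hotswapping of all internal boards
DATA

rows = []
# scan yields the four captured groups once per match; $ stops each match at the end of a line.
text.scan(/(\w)\s+(.+?)\s+(\w+)\s+(.*?)$/) do |req, num, cls, dsc|
  rows << [req, num, cls, dsc]
end

rows.each { |row| p row }
```

Because `\s+` also matches newlines, a malformed line could let a match run over into the next line, so this approach probably suits input that is known to be clean.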

lopex
 
James Edward Gray II

What can I replace the simple copy statement (output_file << line) with in order to get what I want?

My wife, Dana Gray, is still learning Ruby so I gave her this problem
as a test. ;) She suggests the code below.

James Edward Gray II

DATA.each do |line|
  line =~ /^(\w)\s+(\S+)\s+(\S+)\s+(.+)/
  p [$1, $2, $3, $4]
end

__END__
R [01] R1 The system shall support "emergency call processing"
R [02] R1 The system shall support "local call processing"
R [08] F The system shall provide a command-line user interface
R [723] F The system shall provide 6 10/100/1000 Ethernet interfaces
R [11] F The system shall support VoIP networks
R [398] R1 The system shall contain 2 control boards
O [327] I The system should support hotswapping of all internal boards
R [19] I The system shall be able to detect transmission errors
R [631] F The system shall continue processing data as long as a call is active.
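
To connect this back to the loop in the original question, here is one possible way the same pattern could replace the plain copy statement. It is only a sketch: the StringIO objects stand in for the real input and output files, the tab-separated output format is an assumption, and the variable names are invented for the example.

```ruby
require "stringio"

# StringIO objects stand in for the real files (an assumption so the sketch runs on its own).
input_file = StringIO.new(<<~DATA)
  R [01] R1 The system shall support "emergency call processing"
  R [08] F The system shall provide a command-line user interface
  this line does not follow the format
DATA
output_file = StringIO.new

while line = input_file.gets
  if line =~ /^(\w)\s+(\S+)\s+(\S+)\s+(.+)/
    required, number, classification, description = $1, $2, $3, $4
    # Write the four columns tab-separated (the output format is an assumption).
    output_file << [required, number, classification, description].join("\t") << "\n"
  else
    # Lines that do not match leave $1..$4 as nil, so report and skip them.
    warn "skipping unparsable line: #{line.chomp}"
  end
end

puts output_file.string
```

Checking the result of `=~` matters because a line that does not fit the format would otherwise produce four nil values.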
 
Mike Stok

You have a number of options. If your data is tab delimited (i.e.
the first "two" columns are really one):

s = "R [01]\tR1\tThe system shall support \"emergency call processing\""
p s.split(/\t/)

=> ["R [01]", "R1", "The system shall support \"emergency call processing\""]

or you can just split on whitespace and specify a limit on the number
of fields:

s = 'R [01] R1 The system shall support "emergency call processing"'
p s.split(/\s+/, 4)

=> ["R", "[01]", "R1", "The system shall support \"emergency call processing\""]

Or you can use a regex (ick ;-)
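
For the limited split, a short sketch of how the four fields might be assigned to separate variables is below. The variable names are only illustrative. One detail worth noting: a line read with gets still ends in a newline, and with a limit the last field keeps everything that is left, so the line is chomped first.

```ruby
line = "R [723] F The system shall provide 6 10/100/1000 Ethernet interfaces\n"

# chomp first: with a limit, the last field keeps whatever remains, newline included.
required, number, classification, description = line.chomp.split(/\s+/, 4)

p required
p number
p classification
p description

# Optional extra step (not asked for in the thread): pull the digits out of "[723]".
p number[/\d+/].to_i
```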

Hope this helps,

Mike




--

Mike Stok
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.
 
Steven Hansen

I suck at regex too, I tried this as an exercise and came up with the
below. It's less concise than previous solutions, but it works as far
as I can tell:


Row = Struct.new(:col1, :col2, :col3, :col4)
rows = Array.new()
regex = /([A-Z])\s(\[[0-9]+\])\s([A-Z1-9]+)\s(.+)/

File.open("file.txt") do |file|
  while (line = file.gets)
    m = line.match(regex)
    rows << Row.new(m[1], m[2], m[3], m[4])
  end
end

puts rows.flatten

#output =>

#<struct Row col1="R", col2="[01]", col3="R1", col4="The system shall
support \"emergency call processing\"">
#<struct Row col1="R", col2="[02]", col3="R1", col4="The system shall
support \"local call processing\"">
#<struct Row col1="R", col2="[08]", col3="F", col4="The system shall
provide a command-line user interface">
#<struct Row col1="R", col2="[723]", col3="F", col4="The system shall
provide 6 10/100/1000 Ethernet interfaces">
#<struct Row col1="R", col2="[11]", col3="F", col4="The system shall
support VoIP networks">
#<struct Row col1="R", col2="[398]", col3="R1", col4="The system shall
contain 2 control boards">
#<struct Row col1="O", col2="[327]", col3="I", col4="The system should
support hotswapping of all internal boards">
#<struct Row col1="R", col2="[19]", col3="I", col4="The system shall be
able to detect transmission errors">
#<struct Row col1="R", col2="[631]", col3="F", col4="The system shall
continue processing data as long as a call is active.">
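
A possible variation on the Struct approach, sketched under a few assumptions: the named groups, the `Requirement` and `PATTERN` names, and the in-memory `lines` array are all invented for this example, and the pattern is otherwise the same as the one above. It also skips lines that do not match, such as a wrapped continuation line, since `match` returns nil for those.

```ruby
Requirement = Struct.new(:required, :number, :classification, :description)

# Named groups are an addition for readability; the pattern itself is unchanged.
PATTERN = /(?<required>[A-Z])\s(?<number>\[[0-9]+\])\s(?<classification>[A-Z1-9]+)\s(?<description>.+)/

# An in-memory array stands in for the lines of the real file.
lines = [
  %(R [01] R1 The system shall support "emergency call processing"\n),
  "R [631] F The system shall continue processing data as long as a call is\n",
  "active.\n"
]

rows = []
lines.each do |line|
  m = PATTERN.match(line)
  next unless m # a wrapped continuation line has no columns to extract
  rows << Requirement.new(m[:required], m[:number], m[:classification], m[:description])
end

rows.each { |row| puts "#{row.number} (#{row.classification}): #{row.description}" }
```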


-Steven

 
