Counting Tabs and splitting by that number

Nick Bo

Basically, I have a document that I open and read line by line. I have
to split each line into two arrays and then into a hash, and produce
output like this:

application/activemessage has no extensions
application/andrew-inset has extensions ez
application/applefile has no extensions
application/atom has extensions atom
application/atomcat+xml has extensions atomcat
application/atomicmail has no extensions
application/atomserv+xml has extensions atomsrv
application/batch-SMTP has no extensions
application/beep+xml has no extensions
application/cals-1840 has no extensions

I have determined that if a line contains no tabs, then that type has no
extensions. So I put an if statement at the start to check whether the
line contains a tab; if not, it saves false to the current position in
the array inside the each loop.

file.each_line do |line|
  next if line[0] == ?#
  next if line == "\n"
  string = line
  if string.include?("\t") == false
    mimeValue = false
    mimeKey = string.split
  else
    # THIS IS WHERE MY ISSUE IS NOW
    mimeKey, mimeValue = string.split("\t\t\t")
  end
end

My problem now is that the number of tabs separating the two parts
changes from line to line: one line may have 3 tabs, another 5, and
another just 1. So I am in a rut. How do I determine how many tabs are
in the line (the string variable) so I can split the two parts into
their appropriate arrays? I was thinking I could do some kind of
recursion that tests for a tab and, if it finds one, adds 1 to a count,
so that I could then do something like

mimeKey, mimeValue = string.split("\t" * tabCount)

I know there is a lot in my message, so here is a summary:

HOW TO COUNT \t IN A STRING THEN SPLIT BY THAT NUMBER OF \t
 
Siep Korteling

Nick said:
Basically i have a document which I am opening and then i am reading (...)

I know there is a lot in my message, so here is a summary:

HOW TO COUNT \t IN A STRING THEN SPLIT BY THAT NUMBER OF \t

Split on \t anyway and dump all empty results, like this:

str = "beep+xml\t\t\t atom"
res = str.split("\t").reject { |item| item.empty? }
p res
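
For the literal question (count the tabs, then split on that many), a small sketch follows. The sample line and the variable names are made up for illustration. It assumes all the tabs in a line sit in one consecutive run; if they do not, the count-based split will not find its delimiter.

```ruby
# Hypothetical sample line with three tabs between type and extension.
line = "application/atom\t\t\tatom"

# String#count returns how many tab characters the line contains.
tab_count = line.count("\t")

# Split on a delimiter made of exactly that many tabs.
by_count = line.split("\t" * tab_count)

# Split on every single tab and discard the empty pieces instead.
by_reject = line.split("\t").reject(&:empty?)

p tab_count
p by_count
p by_reject
```

Both approaches give the same two pieces here, but the reject version does not need to know the count in advance.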

hth,

Siep
 
brabuhr

#THIS IS WHERE MY ISSUE IS NOW
mimeKey, mimeValue = string.split("\t\t\t")

My problem now is that the number of tabs separating the two parts
changes from line to line: one line may have 3 tabs, another 5, and
another just 1.

mimeKey, mimeValue = string.split("\t" * tabCount)

I know there is a lot in my message, so here is a summary:

HOW TO COUNT \t IN A STRING THEN SPLIT BY THAT NUMBER OF \t


Your tabs are consecutive and you don't actually care how many there are?
string.split(/\t+/)
?
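
A minimal sketch of how that split could slot into the original loop and produce the desired output. The sample lines are an invented stand-in for the real file, the names `sample_lines` and `mime_types` are hypothetical, and it stores nil (rather than false) for a type with no extensions, since that is what the destructuring assignment leaves behind when there is no tab.

```ruby
# Invented stand-in for the lines of the real file.
sample_lines = [
  "# MIME type\t\t\tExtensions\n",
  "\n",
  "application/activemessage\n",
  "application/andrew-inset\t\t\tez\n",
  "application/atom\t\t\t\t\tatom\n",
  "application/atomcat+xml\tatomcat\n"
]

mime_types = {}
sample_lines.each do |line|
  # Skip comment lines and blank lines.
  next if line.start_with?("#") || line.strip.empty?

  # One or more consecutive tabs act as a single separator.
  # With no tab in the line, value is simply nil.
  key, value = line.chomp.split(/\t+/)
  mime_types[key] = value
end

mime_types.each do |type, ext|
  if ext
    puts "#{type} has extensions #{ext}"
  else
    puts "#{type} has no extensions"
  end
end
```

Because the regexp absorbs any run of tabs, no separate check for "does this line contain a tab" is needed.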
 
Nick Bo

Incorrect. If I do it that way and there are 5 tabs between the two
parts I want to separate, I get 4 blank elements, giving me a total of
6 elements.
eg = "abcdefg \t\t\t\t\t hi"
eg.split("\t") --> ["abcdefg ", "", "", "", "", " hi"]
eg.split("/\t+/") just gives me ["abcdefg \t\t\t\t\t hi"] because it
doesn't match the pattern given to split at all, so the whole thing
becomes a single element of the array.
 
Bill Kelly

Nick Bo said:
eg = "abcdefg \t\t\t\t\t hi"
eg.split("\t") --> ["abcdefg ", "", "", "", "", " hi"]
eg.split("/\t+/") just gives me ["abcdefg \t\t\t\t\t hi"] because it
doesn't match the pattern given to split at all, so the whole thing
becomes a single element of the array.

Huh?
eg = "abcdefg \t\t\t\t\t hi"
=> "abcdefg \t\t\t\t\t hi"
eg.split(/\t+/)
=> ["abcdefg ", " hi"]


Regards,

Bill
 
Nick Bo

Bill said:
Nick Bo said:
eg = "abcdefg \t\t\t\t\t hi"
eg.split("\t") --> ["abcdefg ", "", "", "", "", " hi"]
eg.split("/\t+/") just gives me ["abcdefg \t\t\t\t\t hi"] because it
doesn't match the pattern given to split at all, so the whole thing
becomes a single element of the array.

Huh?
eg = "abcdefg \t\t\t\t\t hi"
=> "abcdefg \t\t\t\t\t hi"
eg.split(/\t+/)
=> ["abcdefg ", " hi"]


Regards,

Bill

It wouldn't give me the two. I so wish it did, but I found a way around
it. This is my solution and it works perfectly:
eg = "abcdefg \t\t\t\t\t\t hi"
splitArray = eg.split("\t")
splitArray.delete("")

# then, inside the loop:
arrayKey = splitArray[0]
arrayValue = splitArray[1]
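
A side note on Array#delete, sketched with made-up variable names: it removes the matching items from the array in place and returns the removed item (or nil if nothing matched), not the array. Assigning its result back to the array variable would therefore replace the array with an empty string.

```ruby
parts = "abcdefg \t\t\t\t\t\t hi".split("\t")

# delete mutates parts and returns the removed item, here "".
removed = parts.delete("")

p removed
p parts

# When nothing matches, delete returns nil and leaves the array alone.
untouched = ["a", "b"]
nothing = untouched.delete("")
p nothing
```

If a new array is wanted instead of in-place mutation, `parts.reject(&:empty?)` returns one.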

Thanks for everyone's help.
 
Mark Thomas

It wouldn't give me the two. I so wish it did, but I found a way around
it. This is my solution and it works perfectly:
eg = "abcdefg \t\t\t\t\t\t hi"
splitArray = eg.split("\t")
splitArray.delete("")

IMO, the regex solution is better

splitArray = eg.split(/\t+/)

I think you put it in quotes. Leave the quotes out.
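
To illustrate the difference, here is a short sketch using the same example string. The variable names are invented for the example, and the last line, which also trims the spaces around the tabs, is an optional extra that nobody in the thread asked for.

```ruby
eg = "abcdefg \t\t\t\t\t hi"

# A Regexp literal: one or more tabs count as a single separator.
with_regexp = eg.split(/\t+/)

# A String: split looks for the literal text slash, tab, plus, slash,
# which never occurs in eg, so the whole string comes back as one element.
with_string = eg.split("/\t+/")

# Optional: also swallow whitespace on either side of the tab run.
trimmed = eg.split(/\s*\t+\s*/)

p with_regexp
p with_string
p trimmed
```

The quoted version is not an error as far as Ruby is concerned; it is just a different, literal delimiter, which is why it fails silently.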

-- Mark.
 
