Counting Tabs and splitting by that number

Nick Bo · Sep 28, 2008

Basically, I have a document which I am opening. I read each line of the
file and have to split it up into two arrays and then into a hash, from
which I have to get output like this:

application/activemessage has no extensions
application/andrew-inset has extensions ez
application/applefile has no extensions
application/atom has extensions atom
application/atomcat+xml has extensions atomcat
application/atomicmail has no extensions
application/atomserv+xml has extensions atomsrv
application/batch-SMTP has no extensions
application/beep+xml has no extensions
application/cals-1840 has no extensions

I have determined that if there are no tabs in a line then that type
has no extensions, so I put an if statement at the beginning to check
whether the line contains a tab. If not, it saves false as the value for
the position I am at in the each loop.

file.each_line do |line|
  next if line[0] == ?#
  next if line == "\n"
  string = line
  if string.include?("\t") == false
    mimeValue = false
    mimeKey = string.split
  else
    # THIS IS WHERE MY ISSUE IS NOW
    mimeKey, mimeValue = string.split("\t\t\t")
  end
end

My problem now is that the fields in the document are separated by a
varying number of tabs: one line may have 3 tabs, another may have 5, and
another just 1. So I am in a rut now. How do I determine how many tabs
are in the line (the string variable) so that I can split the two parts
into their appropriate arrays? I was thinking I could do some kind of
recursion which would test for a tab and, if found, add 1 to a count, and
then be able to do something like

mimeKey, mimeValue = string.split("\t" * tabCount)

I know there is a lot in my message so here is a summary:

HOW TO COUNT \t IN A STRING THEN SPLIT BY THAT NUMBER OF \t

Siep Korteling · Sep 28, 2008

Nick said:
Basically i have a document which I am opening and then i am reading (...)

I know there is a lot in my message so here is a summary:

HOW TO COUNT \t IN A STRING THEN SPLIT BY THAT NUMBER OF \t

Split on \t anyway and dump all empty results, like this:

str = "beep+xml\t\t\t atom"
res = str.split("\t").reject { |item| item.empty? }
p res

hth,

Siep

brabuhr · Sep 28, 2008

#THIS IS WHERE MY ISSUE IS NOW
mimeKey, mimeValue = string.split("\t\t\t")

My problem now is that the fields in the document are separated by a
varying number of tabs: one line may have 3 tabs, another may have 5, and
another just 1.

mimeKey, mimeValue = string.split("\t" * tabCount)

I know there is a lot in my message so here is a summary:

HOW TO COUNT \t IN A STRING THEN SPLIT BY THAT NUMBER OF \t

Your tabs are consecutive and you don't actually care how many there are?
string.split(/\t+/)
?
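
To illustrate the suggestion above, here is a minimal sketch. The sample lines are made up for the demonstration, with one, three and five tabs between the two fields. String#count is included only because the original question asked how to count the tabs; the split itself does not need the count.

```ruby
# Sample lines (hypothetical data) with 1, 3 and 5 tabs between the fields.
lines = [
  "application/atom\tatom",
  "application/andrew-inset\t\t\tez",
  "application/atomcat+xml\t\t\t\t\tatomcat"
]

# String#count answers the literal question: how many tabs are in each line?
tab_counts = lines.map { |line| line.count("\t") }
p tab_counts  # => [1, 3, 5]

# A regex matching one or more tabs splits every line into exactly two parts,
# no matter how many tabs separate them.
pairs = lines.map { |line| line.split(/\t+/) }
p pairs
```

Each element of pairs here is a two-element array of type and extension, so the tab count never has to be known in advance.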

Nick Bo · Sep 29, 2008

Incorrect. If I do it that way, then if I have 5 tabs in between the two
parts I want to separate, I get 4 blank strings, giving me a total of
6 elements.
eg = "abcdefg \t\t\t\t\t hi"
eg.split("\t") --> ["abcdefg ", "", "", "", "", " hi"]
eg.split("/\t+/") just gives me ["abcdefg \t\t\t\t\t hi"] because it
doesn't match the pattern given to split at all, so it makes the whole
thing one element of the array.

Bill Kelly · Sep 29, 2008

Nick Bo said:
eg = "abcdefg \t\t\t\t\t hi"
eg.split("\t") --> ["abcdefg ", "", "", "", "", " hi"]
eg.split("/\t+/") just gives me ["abcdefg \t\t\t\t\t hi"] because it
doesn't match the pattern given to split at all, so it makes the whole
thing one element of the array.

Huh?

eg = "abcdefg \t\t\t\t\t hi"
=> "abcdefg \t\t\t\t\t hi"
eg.split(/\t+/)
=> ["abcdefg ", " hi"]
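
One possible reason for the two different results, shown here as a sketch rather than a diagnosis of the exact code that was run: String#split treats a regex literal and a quoted string very differently. The variable names with_regex and with_string are only for illustration.

```ruby
eg = "abcdefg \t\t\t\t\t hi"

# Regex literal: matches one or more consecutive tabs.
with_regex = eg.split(/\t+/)
p with_regex   # => ["abcdefg ", " hi"]

# String pattern: looks for the literal text slash, tab, plus, slash,
# which never occurs in eg, so nothing is split.
with_string = eg.split("/\t+/")
p with_string  # => ["abcdefg \t\t\t\t\t hi"]
```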

Regards,

Bill

Nick Bo · Sep 29, 2008

Bill said:
eg = "abcdefg \t\t\t\t\t hi"
eg.split(/\t+/)
=> ["abcdefg ", " hi"]

It wouldn't give me the two. I so wish it did, but I found a way around
it. This is my solution and it works perfectly:
eg = "abcdefg \t\t\t\t\t\t hi"
splitArray = eg.split("\t")
splitArray.delete("")  # removes the empty strings in place

# then, inside the loop:
arrayKey = splitArray[0]
arrayValue = splitArray[1]
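
One thing to watch with this approach is the return value of Array#delete: it returns the deleted object (or nil), not the array, so assigning its result back to the array variable would lose the array. The sketch below uses a shorter made-up string and illustrative variable names to show the in-place form next to Array#reject, which returns a new array.

```ruby
parts = "abcdefg \t\t\t hi".split("\t")
p parts                    # => ["abcdefg ", "", "", " hi"]

# Array#delete removes every matching element in place and returns the
# deleted value (or nil if nothing matched), not the array.
returned = parts.delete("")
p returned                 # => ""
p parts                    # => ["abcdefg ", " hi"]

# Array#reject returns a new array and leaves the original untouched.
cleaned = "abcdefg \t\t\t hi".split("\t").reject(&:empty?)
p cleaned                  # => ["abcdefg ", " hi"]
```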

Thanks for everyone's help.

Mark Thomas · Sep 29, 2008

It wouldn't give me the two. I so wish it did, but I found a way around
it. This is my solution and it works perfectly:
eg = "abcdefg \t\t\t\t\t\t hi"
splitArray = eg.split("\t")
splitArray.delete("")

IMO, the regex solution is better

splitArray = eg.split(/\t+/)

I think you put the regex in quotes. Leave the quotes out.

-- Mark.
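
Putting the thread together, the whole task could look roughly like the sketch below. The sample data and the names sample, mime_types and report are assumptions for illustration, not the original file or code; a real script would read lines from the opened file instead.

```ruby
# Hypothetical sample in the style of the file described in the question.
sample = [
  "# a comment line",
  "",
  "application/activemessage",
  "application/andrew-inset\t\t\tez",
  "application/atom\t\t\t\t\tatom",
  "application/atomcat+xml\tatomcat"
].join("\n")

mime_types = {}
sample.each_line do |line|
  line = line.chomp
  # Skip blank lines and comments.
  next if line.empty? || line.start_with?("#")
  # Split on the first run of tabs; value is nil when the line has no tab.
  key, value = line.split(/\t+/, 2)
  mime_types[key] = value
end

report = mime_types.map do |type, extensions|
  if extensions
    "#{type} has extensions #{extensions}"
  else
    "#{type} has no extensions"
  end
end
puts report
```

Using nil for a missing value, instead of false as in the original post, falls out of the destructuring assignment and still works in the conditional.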
