Jeff
Hi,
I've got a text file whose columns are delimited by two or more spaces ('\s{2,}'). The
columns themselves may contain single spaces; for example, one column
holds city names, which may be several words long, e.g., 'San Jose'.
Here's a sample from the file:
31-Jan-2006 11:43:50 PM 649504 1.189 Public Website Frankfurt DTAG Deutsche Telekom Frankfurt 1 http://www.joedog.org/
31-Jan-2006 11:42:57 PM 649504 .5 Public Website Dallas UUNET UUNET Dallas 1 http://www.joedog.org/
31-Jan-2006 11:42:08 PM 649504 .652 Public Website Houston UUNET UUNET Houston 1 http://www.joedog.org/
31-Jan-2006 11:39:46 PM 649504 .435 Public Website San Jose XO XO San Jose 1 http://www.joedog.org/
31-Jan-2006 11:37:46 PM 649504 6.573 Public Website Sydney Optus Optus Sydney 1 http://www.joedog.org/
31-Jan-2006 11:26:43 PM 649504 .666 Public Website New York UUNET UUNET New York 1 http://www.joedog.org/
31-Jan-2006 11:25:49 PM 649504 1.241 Public Website Stockholm Telia Telia Stockholm 1 http://www.joedog.org/
31-Jan-2006 11:22:44 PM 649504 .722 Public Website Boston Sprint Sprint Boston 1 http://www.joedog.org/
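Since the delimiter is "two or more spaces" and single spaces only occur inside a field, one possible approach is to let Perl's built-in split do the work with the delimiter pattern itself, instead of describing every column in a regex. This is a minimal sketch, assuming each record sits on a single line; the helper name parse_record and the exact two-space layout of the test record are hypothetical, chosen for illustration rather than taken from the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: split one record on runs of two or more whitespace
# characters. Single spaces (as in "San Jose") stay inside a field.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+//;    # drop leading whitespace so split yields no empty first field
    return split /\s{2,}/, $line;
}

# Hypothetical record: the two-space column layout is assumed for illustration.
my $record = "31-Jan-2006 11:39:46 PM  649504  .435  Public Website  San Jose  XO  XO  San Jose  1  http://www.joedog.org/\n";

my @fields = parse_record($record);
for my $i (0 .. $#fields) {
    printf "%d: |%s|\n", $i + 1, $fields[$i];
}
```

Because split consumes only the delimiter, a record with more or fewer columns still splits cleanly, and multi-word values such as city names come out as single fields.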
And here is my best attempt at a matching regex so far:
use strict;
use warnings;

open(my $fh, '<', 'haha.dat') or die "can't open haha.dat: $!";
while (my $line = <$fh>) {
    # try to capture the first five columns of each record
    if ($line =~ m/^(.+[AM|PM]+)\s{2,}([0-9]+)\s{2,}([0-9]*\.*[0-9]*)\s{2,}([a-zA-Z\s]+)\s{2,}([a-zA-Z\s]+)\s{2,}/) {
        print "1: |$1|\n";
        print "2: |$2|\n";
        print "3: |$3|\n";
        print "4: |$4|\n";
        print "5: |$5|\n";
    }
}
close($fh);
That effort is pretty crappy; here are the results:
1: |31-Jan-2006 11:43:50 PM|
2: |649504|
3: |1.189|
4: |Public Website Frankfurt DTAG |
5: | |
1: |31-Jan-2006 11:42:57 PM|
2: |649504|
3: |.5|
4: |Public Website Dallas UUNET UUNET Dallas|
5: | |
1: |31-Jan-2006 11:42:08 PM|
2: |649504|
3: |.652|
4: |Public Website Houston UUNET UUNET |
5: |Houston |
1: |31-Jan-2006 11:39:46 PM|
2: |649504|
3: |.435|
4: |Public Website San Jose XO XO |
5: |San Jose|
1: |31-Jan-2006 11:37:46 PM|
2: |649504|
3: |6.573|
4: |Public Website Sydney Optus Optus Sydney|
5: | |
1: |31-Jan-2006 11:26:43 PM|
2: |649504|
3: |.666|
4: |Public Website New York UUNET UUNET |
5: |New York |
1: |31-Jan-2006 11:25:49 PM|
2: |649504|
3: |1.241|
4: |Public Website Stockholm Telia Telia Stockholm |
5: | |
1: |31-Jan-2006 11:22:44 PM|
2: |649504|
3: |.722|
4: |Public Website Boston Sprint Sprint Boston |
5: | |
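If a single regex is preferred, the two problems visible in these results can be addressed directly. First, [AM|PM]+ is a character class that matches any run of the characters A, M, P and |, not the alternation "AM or PM"; (?:AM|PM) expresses the alternation. Second, [a-zA-Z\s]+ allows whitespace, including the two-space delimiter, so the greedy group 4 runs across several columns and backtracks only far enough for a later \s{2,}, which is why column 4 absorbs the city and ISP names while column 5 is left with spaces. A field pattern of non-space runs joined by single spaces, \S+(?: \S+)*, cannot cross a delimiter. Likewise, (\d*\.?\d+) accepts values such as .5 and 1.189, whereas [0-9]*\.*[0-9]* also matches an empty string or several dots. This is a sketch under the same one-record-per-line, two-or-more-spaces assumption; the names $field and $re, the single spaces inside the timestamp, and the test record layout are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical field pattern: non-space runs joined by single spaces, so a
# field can contain "San Jose" but can never swallow a two-space delimiter.
my $field = qr/\S+(?: \S+)*/;

# Timestamp (assumed to be date, time, AM/PM separated by single spaces),
# then an integer, then a decimal such as ".5" or "1.189", then two text columns.
my $re = qr/^(\S+ \S+ (?:AM|PM))\s{2,}(\d+)\s{2,}(\d*\.?\d+)\s{2,}($field)\s{2,}($field)/;

# Hypothetical record: the two-space column layout is assumed for illustration.
my $record = "31-Jan-2006 11:39:46 PM  649504  .435  Public Website  San Jose  XO  XO  San Jose  1  http://www.joedog.org/\n";

my @captures;
if ($record =~ $re) {
    @captures = ($1, $2, $3, $4, $5);
    printf "%d: |%s|\n", $_ + 1, $captures[$_] for 0 .. $#captures;
}
```

Extending this to the remaining columns means appending further \s{2,}($field) groups; the split-based approach avoids that bookkeeping, while the regex lets each column's format be validated as it is captured.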
Any thoughts?
Jeff