Extract variable-length numbers (tab-delimited) from a string?

Thomas Andersson · Aug 3, 2010

As the topic says, I have a settings file where each line contains two numbers
of varying length, and I want to extract each number and assign it to a
variable. How would I go about that?

Justin C · Aug 3, 2010

Thomas said:
> As the topic says, I have a settings file where each line contains two numbers
> of varying length, and I want to extract each number and assign it to a
> variable. How would I go about that?

TMTOWTDI; here's one, though it may not be a good one.

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    chomp;
    # The two columns in __DATA__ must be separated by a real tab character.
    my ($x, $y) = split /\t/, $_;
    printf("%s + %s = %s\n", $x, $y, $x + $y);
}

__DATA__
1 9999999999
20 999999999
300 99999999
4004 99999999999
50505 9
660066 0.00000001
7070707 999999.999
88088088 .99999999
999999999 1
9999999991 9999999999

How long are your numbers? Are they formatted?

Justin.

wolf · Aug 3, 2010

Thomas said:
> As the topic says, I have a settings file where each line contains two numbers
> of varying length, and I want to extract each number and assign it to a
> variable. How would I go about that?

Use split: split( /\s+/, $input) splits on any run of whitespace, including
tabs; split( /\t/, $input) splits on every single tab.

open (my $infile, '<', 'mynumbers.txt') or die;
my ($input, $number1, $number2);

while ($input = <$infile>) {
    chomp $input;
    ($number1, $number2) = split( /\s+/, $input);
    print $number1,'--',$number2,"\n";
}
close $infile;
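A short sketch of the difference between the two patterns, using made-up sample strings. It also shows one subtlety the explanation above does not mention: split /\s+/ still returns an empty first field when the line begins with whitespace, while Perl's special single-space pattern ' ' skips leading whitespace (awk-style).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines; "\t" is a real tab character.
my $two_tabs = "12\t\t345";    # two adjacent tabs between the numbers
my $indented = "  12\t345";    # leading spaces before the first number

# /\t/ splits on every single tab, so two tabs give an empty middle field.
my @by_tab = split /\t/, $two_tabs;

# /\s+/ treats any run of whitespace (tabs or spaces) as one separator.
my @by_ws = split /\s+/, $two_tabs;

# But leading whitespace still produces an empty first field...
my @ws_lead = split /\s+/, $indented;

# ...whereas the special ' ' pattern skips leading whitespace.
my @awk = split ' ', $indented;

# Join with '|' so empty fields are visible as '||' or a leading '|'.
for my $result ( \@by_tab, \@by_ws, \@ws_lead, \@awk ) {
    print join( '|', @$result ), "\n";
}
```

For a file that is strictly one tab between two numbers, either pattern works; the differences only show up with doubled separators or leading whitespace.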

J. Gleixner · Aug 3, 2010

wolf said:
> use SPLIT: split( /\s+/, $input) splits on any whitespace(s) including
> tab(s). split( /\t/, $input) splits on every tab.
>
> open (my $infile, '<', 'mynumbers.txt') or die;

Always include the error message:

or die $!;

Better would be to include the file name and more helpful details:

or die "Can't open mynumbers.txt for read: $!";

> my ($input, $number1, $number2);

Declare the variables in the smallest scope.

> while ($input = <$infile>) {

while( my $input = <$infile> ) {

> chomp $input;
> ($number1, $number2) = split( /\s+/, $input);

my ( $number1, $number2) = split( /\s+/, $input);
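Putting those suggestions together, a minimal sketch of wolf's loop with a descriptive die message and loop-scoped variables. The read_pairs helper and the temporary demo file (created with the core File::Temp module) are hypothetical additions so the example runs on its own; in real use you would pass 'mynumbers.txt'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read two whitespace-separated numbers per line
# from $path, dying with the file name and $! if it can't be opened.
sub read_pairs {
    my ($path) = @_;
    open( my $infile, '<', $path )
        or die "Can't open $path for read: $!";

    my @pairs;
    while ( my $input = <$infile> ) {    # $input is scoped to the loop
        chomp $input;
        my ( $number1, $number2 ) = split( /\s+/, $input );
        push @pairs, [ $number1, $number2 ];
    }
    close $infile;
    return @pairs;
}

# Demo only: write a throwaway file so the sketch runs anywhere.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "1\t9999999999\n20\t999999999\n";
close $tmp_fh;

my @rows = read_pairs($tmp_path);
print "$_->[0]--$_->[1]\n" for @rows;
```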

sln · Aug 3, 2010

wolf said:
> use SPLIT: split( /\s+/, $input) splits on any whitespace(s) including
> tab(s). split( /\t/, $input) splits on every tab.
>
> open (my $infile, '<', 'mynumbers.txt') or die;
> my ($input, $number1, $number2);
>
> while ($input = <$infile>) {
> chomp $input;
> ($number1, $number2) = split( /\s+/, $input);

Don't forget to validate the $number variables, or you could run
into "isn't numeric" warnings and wrong results when doing stuff like
if ($number1 == $number2)

So after the split() each one could be validated with something like
$number1 =~ s/^\s+//;
$number1 =~ s/\s+$//;
if ( $number1 =~ /^[+-]?\d*?\.?\d+$/ ) { ... }   # for non-exponent
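For comparison, a small sketch of the non-exponent pattern above next to looks_like_number from the core Scalar::Util module. This is a swapped-in alternative, not what sln proposed: looks_like_number uses Perl's own notion of a number, so it accepts exponent notation such as 1e5, which the regex rejects by design. The is_plain_number helper is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# sln's non-exponent pattern: optional sign, optional digits,
# optional decimal point, then at least one digit.
my $plain_number = qr/^[+-]?\d*?\.?\d+$/;

# Hypothetical helper: 1 for plain (non-exponent) numbers, else 0.
sub is_plain_number {
    my ($candidate) = @_;
    return $candidate =~ $plain_number ? 1 : 0;
}

for my $candidate ( '42', '-12.5', '.5', '1.', '1e5', 'abc' ) {
    printf "%-6s regex: %d  looks_like_number: %d\n",
        $candidate, is_plain_number($candidate),
        looks_like_number($candidate) ? 1 : 0;
}
```

Note that the regex also rejects a trailing decimal point such as "1.", since it requires at least one digit at the end.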

Or it can all be done in one line
($number1, $number2) = $input =~ /\s*([+-]?\d*?\.?\d+)\s+([+-]?\d*?\.?\d+)/;

-sln
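A sketch of the one-line capture used as a condition, so a line that doesn't match is reported instead of silently leaving both variables undefined. Unlike the one-liner above, this version anchors the pattern with ^ and $ so trailing junk is rejected; parse_line is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sln's number pattern, reused for both captures.
my $num = qr/[+-]?\d*?\.?\d+/;

# Hypothetical helper: returns the two numbers, or an empty list
# (with a warning) if the line is not exactly two numbers.
sub parse_line {
    my ($input) = @_;
    # A list assignment in boolean context yields the number of
    # elements on the right-hand side, i.e. 0 when the match fails.
    if ( my ( $number1, $number2 ) = $input =~ /^\s*($num)\s+($num)\s*$/ ) {
        return ( $number1, $number2 );
    }
    warn "Skipping malformed line: $input\n";
    return;
}

my @ok  = parse_line("300\t99999999");
my @bad = parse_line("300\tabc");
```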

Thomas Andersson · Aug 4, 2010

Justin said:
> TMTOWTDI; here's one, though it may not be a good one.
>
> How long are your numbers? Are they formatted?

Two variable-length numbers that are tab-delimited. Someone suggested the
split function, which worked perfectly:

my ($cpid, $lproc) = split (/\t/, $pidlist);
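If $pidlist is a line read straight from a file, it still carries its newline, and split leaves that newline attached to the last field, so string comparisons against $lproc would fail. A small sketch with a made-up value (the sample data is an assumption, not from the post):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A line as it would come back from <$fh>: two tab-separated numbers
# plus the trailing newline.
my $pidlist = "1234\t5678\n";

# Without chomp, the newline stays attached to the last field.
my ( $cpid_raw, $lproc_raw ) = split( /\t/, $pidlist );

# With chomp first, both fields are clean.
my $line = $pidlist;
chomp $line;
my ( $cpid, $lproc ) = split( /\t/, $line );

printf "raw: [%s] [%s]\n", $cpid_raw, $lproc_raw;
printf "chomped: [%s] [%s]\n", $cpid, $lproc;
```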

wolf · Aug 4, 2010

Dear gents,

These are all valid arguments for improving the code.

However, since the original poster didn't even know how to wield split,
I wanted to keep it as simple and unconfusing as possible,
demonstrating just split as the important thing to handle.
As such, the code works well enough.

Cheers, wolf

Thomas Andersson · Aug 4, 2010

wolf said:
> These are all valid arguments for improving the code.
>
> However, since the original poster didn't even know how to wield
> split, I wanted to keep it as simple and unconfusing as possible,
> demonstrating just split as the important thing to handle.
> As such, the code works well enough.

Thanks. I'm not just looking for solutions but also want to learn, so the
simple solutions are best; the fancy stuff I can do later.

Ted Zlatanov · Aug 4, 2010

SP> Also keep in mind that "fancy" solutions are rarely the best. I've heard
SP> it said that debugging is twice as difficult as writing code, and that
SP> being the case, code that's written to the limit of one's abilities is
SP> by definition impossible to debug. In all but the most extreme cases,
SP> clarity and ease of maintenance are *far* more valuable over the long
SP> run than clever tricks.

OTOH a programmer that doesn't explore new techniques is not growing (I
mean technically; physically they grow with or without new techniques).
So there's value in exploration and cleverness.

Ted

Ted Zlatanov · Aug 5, 2010

SP> Certainly, exploring edge cases is part of the learning process. When
SP> it comes to production code, though, I find more value in simplicity.

I mean in production too. Today's clever code is tomorrow's standard.
For example, the Guttman-Rosler Transform at first was clever and
confusing (I've had to explain it to beginners more than once);
gradually it became established and today Perl 6 does it internally by
default.

The threshold IMO is not "is it clever" but "will the maintainer 5 years
from now hunt me down and kill me if he goes crazy." I thus tend to
avoid Quantum::Superpositions, on-the-fly parser generators, and "ha ha"
comments in production code.

Ted
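A minimal sketch of the Guttman-Rosler Transform Ted mentions, sorting tab-separated lines by their first column. It assumes that column holds non-negative integers below 2**32, because pack 'N' encodes a 32-bit unsigned big-endian value whose byte order matches numeric order. The steps are written out separately for clarity; the usual idiom chains them into one map/sort/map expression. The sample lines are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: tab-separated lines to sort by the first column.
my @lines = ( "300\tc", "20\tb", "1\ta", "4004\td" );

# 1. Prefix each line with its key packed as a 4-byte big-endian
#    unsigned integer, so that string order equals numeric order.
my @keyed = map { pack( 'N', ( split /\t/ )[0] ) . $_ } @lines;

# 2. Plain default string sort with no comparison block; avoiding a
#    Perl-level comparison callback is what makes the GRT fast.
my @sorted_keyed = sort @keyed;

# 3. Strip the 4-byte key back off to recover the original lines.
my @sorted = map { substr( $_, 4 ) } @sorted_keyed;

print "$_\n" for @sorted;
```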

Martijn Lievaart · Aug 6, 2010

Agreed - I *very* rarely need to resort to single-stepping, breakpoints,
and the like. It's nice to know that they're available though, for those
rare occasions when I do need them.

It also depends on the language. In Perl, I agree completely. But I also
have to code in VBA6 (shudder) and use the debugger all the time there.

Some reasons:
- Bugs in Perl itself are rare, and bugs in Perl modules are rarer than in
VBA6, where they are plentiful.
- Documentation of Perl and Perl modules is vastly better; in VBA6 I
often have to resort to try-it-out-and-figure-out-what-happens. And in
Perl you can at least look at the source of the modules, which serves as
documentation as well (and I do that a lot).
- The VBA6 debugger is vastly better than the Perl debugger, making it
attractive to single-step through new code to see if it behaves the
way you thought it would.
- Perl is so much more powerful, you have to write a lot less code to
achieve the same goal. Less LOC, less bugs.

When I code in C or C++, I use the debugger more often than in Perl, but
still not very often.

M4

Uri Guttman · Aug 6, 2010

ML> - Perl is so much more powerful, you have to write a lot less code
ML> to achieve the same goal. Less LOC, less bugs.

that is also a good reason to promote perl. i mention it when i
can. fewer lines is always fewer bugs. some metrics have shown there is
usually 1 bug per hundred lines of code regardless of the language. so
a good perl hacker will have fewer bugs. and well designed perl has even
fewer.

uri

Uri Guttman · Aug 6, 2010

ML> - Perl is so much more powerful, you have to write a lot less code
ML> to achieve the same goal. Less LOC, less bugs.
SP> That's certainly a good guideline, but it's another one of those things
SP> that can bite you if you apply it blindly. Taken too far, it results in
SP> code that looks like JAPHs or golf - the bug count may be low, but it's
SP> still going to be a maintenance nightmare.

i did mention good design! of course any coding philosophy taken to an
extreme is bad. look at all the OO forever langs like java! can't get
any real work done since most of the time you are designing classes to
get around the restrictions in the language!

uri

Marc Girod · Aug 6, 2010

That Kernighan guy has a bit of a reputation amongst programmer types.

With all respect due, my own hubris would play Hall against Kernighan.
IMHO K's quote is a 'modern' one: top-down thinking.
The debugger is a very good way to learn, and to expand one's
understanding.
It is a postmodern tool.

Marc

wolf · Aug 6, 2010

Ted said:
> SP> Certainly, exploring edge cases is part of the learning process. When
> SP> it comes to production code, though, I find more value in simplicity.
>
> I mean in production too. Today's clever code is tomorrow's standard.
> For example, the Guttman-Rosler Transform at first was clever and
> confusing (I've had to explain it to beginners more than once);
> gradually it became established and today Perl 6 does it internally by
> default.
>
> The threshold IMO is not "is it clever" but "will the maintainer 5 years
> from now hunt me down and kill me if he goes crazy." I thus tend to
> avoid Quantum::Superpositions, on-the-fly parser generators, and "ha ha"
> comments in production code.
>
> Ted

... and now I find myself learning about sorting algorithms to
understand the answer <G>.

love your comment, ted

cheers, wolf

Marc Girod · Aug 6, 2010

> With all respect due, my own hubris would play Hall against Kernighan.

My name is Wall, Larry Wall...

Marc

Uri Guttman · Aug 6, 2010

SP> Certainly, exploring edge cases is part of the learning process. When
SP> it comes to production code, though, I find more value in simplicity.
w> ... and now I find myself learning about sorting algorithms to
w> understand the answer <G>.

it is well documented in Sort::Maker. in fact that module can generate
sorts in four (count'em four!) different styles so you can learn them
all.

uri

ccc31807 · Aug 6, 2010

> my head), 40% coding and 20% debugging. and much of the debugging is
> very easy stuff (at least to me). my take on the general coder
> population is like 10% design (if that much), 40% coding and 50%
> debugging. well it seems like that from what i see and hear. debugging
> should be easy IMO if you design the code right. a given bug should be
> quickly isolated to the area that handles that part of data. this brings
> up the design philosophy of high isolation of modules. again, few adhere
> to that idea so they have many places which could cause a given bug
> thereby making debugging harder. i don't use a debugger, IDE or anything
> but print and i get working code without pain. brains over typing!

At least part of this depends on the problem domain. The vast majority
of my work depends on user input, and almost all the 'bugs' that make
it past release occur with bad input. For example, an automated
process stopped working one day, and after quite a while debugging, I
took a look at the input data and discovered that a user had set a
password beginning with '#'. The code skipped all lines beginning with #
as comments, so it treated the password as a comment and couldn't
continue because the password field was blank.

It's very difficult to foresee all the ways that users can enter data.
Even if the user is a program you can't necessarily depend on it. I
sometimes get data generated by a database that consolidates blank
fields, which is a problem if you are using split() or a RE to
destructure input data.
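A related pitfall with blank fields: by default split discards trailing empty fields, so a record whose last columns are blank comes back shorter than expected. Passing a negative LIMIT keeps them. A minimal sketch with a made-up record:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical record whose last two fields are blank.
my $record = "Joe\t42\t\t";

# By default split drops trailing empty fields...
my @default = split /\t/, $record;

# ...while a negative LIMIT keeps them, so the field count stays fixed.
my @all = split /\t/, $record, -1;

printf "default: %d fields, with -1: %d fields\n",
    scalar @default, scalar @all;
```

This does not help if the producer collapses blank fields before writing the record, but it does stop split itself from hiding them.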

Then again, some people don't classify user errors as bugs, but user
errors act just like bugs.

CC.

Peter J. Holzer · Aug 7, 2010

> *Please* dont tell me you're doing eval($password)...?!?!?

How did you get that idea? "#" is a very common comment character in
config files and code like

while (<$config_h>) {
    chomp;
    s/#.*//;
    my ($key, $value) = split(/\s*=\s*/, $_, 2);
    next unless $key;
    ...
}

is a common way to ignore the comments. And now you have a config file
like this:

# this is a config file for the foo application
#
# first define the database connection:
dsn = dbi:Oracle:ORCL
user = scott
password = tiger#4
# then some ui preferences
background = green
...

the password will be truncated from "tiger#4" to "tiger". Oops.

Obviously that's not exactly what happened in Carter's case, because he
mentioned only "lines beginning with #", so he probably has an even simpler
file format where the password stands alone in a line. Maybe something
like "first line is the user name, second line is the password".

hp
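One way around the truncation described above, assuming the file format only allows '#' at the start of a line to mark a comment (no trailing comments after values), is to skip whole-line comments instead of stripping everything after '#'. The trade-off is that inline comments are no longer supported. A minimal sketch that reads the sample config through an in-memory filehandle so no file is needed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample config from the post, held in a string.
my $config_text = <<'END';
# this is a config file for the foo application
dsn = dbi:Oracle:ORCL
user = scott
password = tiger#4
END

open( my $config_h, '<', \$config_text ) or die "Can't open config: $!";

my %config;
while (<$config_h>) {
    chomp;
    next if /^\s*#/;      # skip whole-line comments only
    next unless /\S/;     # skip blank lines
    my ( $key, $value ) = split( /\s*=\s*/, $_, 2 );
    $config{$key} = $value;
}
close $config_h;

print "password: $config{password}\n";
```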

ccc31807 · Aug 7, 2010

> How did you get that idea? "#" is a very common comment character in
> config files and code like
>
> while (<$config_h>) {
> chomp;
> s/#.*//;
> my ($key, $value) = split(/\s*=\s*/, $_, 2);
> next unless $key;
> ...
> }
>
> is a common way to ignore the comments. And now you have a config file
> Obviously that's not exactly what happened in Carter's case, because he
> mentioned only "lines beginning with #", so he probably has an even simpler

The line of code was:

next if /^#/;

The input file looks like this:

#User's name
username=Joe
#User's password
password=secret

The input file is built from user input, read by a script
like this:

print "Enter your user name: ";
chomp($username = <STDIN>);
print "Enter your password: ";
chomp($password = <STDIN>);

My 'fix' was to make a double sharp (##) the comment character. Some
day, a user will enter a value beginning with a double sharp, and that
will be another 'bug' that testing didn't uncover.

CC.
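Assuming every data line has the key=value shape shown above (with keys made of word characters), one alternative to choosing ever-rarer comment markers is to accept only lines that look like data, rather than reject lines that look like comments. This is a swapped-in technique, not the code from the post. A minimal sketch using an in-memory filehandle and a password that starts with '##':

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The format from the post, with a password that starts with '##'.
my $input_file = <<'END';
#User's name
username=Joe
#User's password
password=##secret
END

open( my $in, '<', \$input_file ) or die "Can't open input: $!";

my %settings;
while ( my $line = <$in> ) {
    chomp $line;
    # Accept only key=value lines; everything else, comments included,
    # is ignored. A '#' inside a value is harmless because a value never
    # starts the line.
    next unless $line =~ /^(\w+)=(.*)$/;
    $settings{$1} = $2;
}
close $in;

print "$_ => $settings{$_}\n" for sort keys %settings;
```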
