MS SQL geek wants to jump ship, plz help on first Perl script

C

clone.of.snake

I've struggled a few days on whether or not to shamefully send this
message to beg for help. But time is limited, so here goes.

I'm a MS SQL Server DB Analyst working on backend CRM and financial
systems, our company was bought out by an internet company which is
mainly a FreeBSD & OSS. Now, we're slowly but steadily being migrated
to MySQL and Oracle. So, as u can imagine, if I don't jump ship and
aquire new skillset, I'm sure to be laid off.

We now have a data quality project to cleans the data before we migrate
to Oracle, what I need to do, is to import multiple excel xls into SQL
Svr. I want to take this chance to write my first Perl script, but I
need a little pointers from you experts.

Basically, the script just need to look into a directory, find all the
filenames and put them in an array. Then it loops through each
filename, substitute it in the "source filepath" value below:

[Source Filepath]
value=C:\$perl_xls_filename_variable

[Destination Server]
value=WINSQL2K

[Destination Database]
value=Pubs

[Destination Table]
value=pubs.dbo.BcpText

This will be an ini file that will be used by a DTS package for
importing data. The perl script will call this DTS package. So 1)
change the source filepath, 2) call DTS package, LOOP 1) change to next
source filepath, 2) call DTS package... etc.

Now, I'm not asking you to write the script for me. I have access to
O'Reilly's Learning Perl one Safari online. I'm reading through it
right now, but since this project is due next week, I need to jump
through the chapters and get it done ASAP. Can someone please point me
to the right chapters or right functions to read / use? (Mainly the
"open this direction" and "get all the filenames" part.)

Thanks for any help...
Nick
 
P

Paul Lalli

Basically, the script just need to look into a directory, find all the
filenames and put them in an array. Then it loops through each
filename, substitute it in the "source filepath" value below:

[Source Filepath]
value=C:\$perl_xls_filename_variable

[Destination Server]
value=WINSQL2K

[Destination Database]
value=Pubs

[Destination Table]
value=pubs.dbo.BcpText

This will be an ini file that will be used by a DTS package for
importing data. The perl script will call this DTS package. So 1)
change the source filepath, 2) call DTS package, LOOP 1) change to next
source filepath, 2) call DTS package... etc.

Now, I'm not asking you to write the script for me. I have access to
O'Reilly's Learning Perl one Safari online. I'm reading through it
right now, but since this project is due next week, I need to jump
through the chapters and get it done ASAP. Can someone please point me
to the right chapters or right functions to read / use? (Mainly the
"open this direction" and "get all the filenames" part.)

Open directory:
perldoc -f opendir
Get filenames:
perldoc -f readdir
Open each file:
perldoc -f open
Read each line of the file:
perldoc -f readline
Make the appropriate substitution:
perldoc perlretut
Print results:
perldoc -f print

Paul Lalli
 
M

Mark Clements

Basically, the script just need to look into a directory, find all the
filenames and put them in an array. Then it loops through each
filename, substitute it in the "source filepath" value below:

[Source Filepath]
value=C:\$perl_xls_filename_variable

[Destination Server]
value=WINSQL2K

[Destination Database]
value=Pubs

[Destination Table]
value=pubs.dbo.BcpText

This will be an ini file that will be used by a DTS package for
importing data. The perl script will call this DTS package. So 1)
change the source filepath, 2) call DTS package, LOOP 1) change to next
source filepath, 2) call DTS package... etc.

Now, I'm not asking you to write the script for me. I have access to

There are modules available on CPAN that will read and write ini files.

http://search.cpan.org

Config::Simple
Config::INI::Simple
Config::Auto

may be worth a look. Note that I haven't used any of these.

Mark
 
B

Bart Lateur

Can someone please point me
to the right chapters or right functions to read / use? (Mainly the
"open this direction" and "get all the filenames" part.)

The easiest way to achieve that task is just to use a fileglob. See

<http://perldoc.perl.org/functions/glob.html>

for a minimal description. Hmm, an, example might be clearer:

@files = glob "c:/*.bat";

That'll list all the batch files in the C:\ directory -- I hope there
are some. It returns paths in the same format you supplied, thus "./*pl"
will return things like "./myscript.pl".
 
H

Henry Law

This will be an ini file that will be used by a DTS package for
importing data. The perl script will call this DTS package. So 1)
change the source filepath, 2) call DTS package, LOOP 1) change to next
source filepath, 2) call DTS package... etc.

Now, I'm not asking you to write the script for me.

OK, noted. And I'm not doing so. But working from something that's
almost what you want can help you learn, so I'll risk the masters' ire
by writing a complete program, which contains comments I hope will help
you. It runs clean but you'll need to add lots of bits to it. Comments
on my style always accepted gratefully.

#! /usr/bin/perl
# Yes, convention has us put it in even though you're running on
# Windows. Win would ignore "C:\Perl\perl.exe" in any case.

use strict;
use warnings;

# The most straightforward way is to pass the directory name
# in on the command line
#
# nick.pl C:\Foo

my $dir_name = shift; # Look up "shift"!

exit 0 unless defined $dir_name;

# Let's make sure the user didn't try to fool us
unless (-d $dir_name) { # Look up -d, -f etc
print STDERR "Ha ha, try again ... '$dir_name' isn't a directory\n";
exit 0;
}

# I like to open the directory specifically and read elements out of it.
# Look up "glob" and the use of an array as an alternative.

opendir DIRECT,$dir_name
or die "Bad things happened trying to open '$dir_name':$!";

# Now read the entries one by one and do your stuff
my $file_count = 0; # Let's count them, why not?
while (my $file= readdir DIRECT) {
next if $file =~ /^\.{1,2}/; # Ignore "." and ".."

$file_count++;

# Open a handle for writing the ini file
open INI,">dts.ini" or die "Broke opening 'dts.ini': $!";
# If the file name has to vary each time then you'll need
# to construct the name before issuing "open".

# Now write the INI file. A "heredoc" will be most
# convenient
print INI <<ENDINI;
[Source Filepath]
value=C:\$file

[Destination Server]
value=WINSQL2K

[Destination Database]
value=Pubs

[Destination Table]
value=pubs.dbo.BcpText
ENDINI

# Close the INI file so it can be used
close INI;

# Now invoke DTS; I've no idea what it does so you'll have to
# validate this bit and probably re-write it
print STDERR "Processing $file\n";
my $DTS_return = `dts -foo -bar`;

# $DTS_return will contain any console output for DTS, which
# you need to check, maybe along these lines
if ($DTS_return =~ /some error text/) {
print STDERR "Bad return from dts\nDTS_return\n";
}
# Or maybe "nothing" is the success criterion, in which case
if ($DTS_return) {
print STDERR "Bad stuff from dts:\n$DTS_return\n";
} else {
print STDERR "dts worked, hooray\n";
}
}

# Clean up and sign off
closedir DIRECT;
print STDERR "Finished: $file_count files processed\n";
 
P

Paul Lalli

Henry said:
Comments on my style always accepted gratefully.

Just remember, you asked for it. :p
#! /usr/bin/perl
# Yes, convention has us put it in even though you're running on
# Windows. Win would ignore "C:\Perl\perl.exe" in any case.

use strict;
use warnings;

# The most straightforward way is to pass the directory name
# in on the command line
#
# nick.pl C:\Foo

my $dir_name = shift; # Look up "shift"!

exit 0 unless defined $dir_name;

Wouldn't you rather tell the user *why* the program suddenly stopped
without doing anything?

die "Usage: $0 dir_name\n" unless defined $dir_name;
# Let's make sure the user didn't try to fool us
unless (-d $dir_name) { # Look up -d, -f etc
print STDERR "Ha ha, try again ... '$dir_name' isn't a directory\n";
exit 0;
}

1) an exit code of 0 traditionally means "exited with success", which
is clearly not what's happening here.
2) An exit message printed to STDERR is more compactly and
traditionally printed with die()

die "Ha, ha, try again ... '$dir_name' isn't a directory\n" unless -d
$dir_name;
# I like to open the directory specifically and read elements out of it.
# Look up "glob" and the use of an array as an alternative.

opendir DIRECT,$dir_name

Use lexical filehandles rather than global barewords. They're subject
to 'use strict'; they're lexically defined so you can't accidentally
use the same handle in two bits of code 5000 lines apart; they're
automatically closed when they go out of scope.

opendir my $DIRECT, $dir_name
or die "Bad things happened trying to open '$dir_name':$!";

# Now read the entries one by one and do your stuff
my $file_count = 0; # Let's count them, why not?
while (my $file= readdir DIRECT) {
next if $file =~ /^\.{1,2}/; # Ignore "." and ".."

This ignores *any* file that starts with a period.

next if $file eq '.' or $file eq '..';
#or...
next if $file =~ /^\..?$/;
$file_count++;

# Open a handle for writing the ini file
open INI,">dts.ini" or die "Broke opening 'dts.ini': $!";

In addition to the lexical directory handle above, when opening file
handles, you should use the three argument form of open. It won't make
any difference in this case, but it's a good habbit to get into.
Eventually, you'll write a program in which you ask the user for a
filename, and then try to open that file. Saying:

open my $fh, '>', $file or die $!;
as opposed to
open my $fh, ">$file" or die $!;

prevents the user from causing massive damage by entering a filename of
foo ; rm -rf *
# If the file name has to vary each time then you'll need
# to construct the name before issuing "open".

# Now write the INI file. A "heredoc" will be most
# convenient
print INI <<ENDINI;
[Source Filepath]
value=C:\$file

When you don't put quotes around the Heredoc terminator, the heredoc is
treated as a double-quoted string. That means you need to take the
same precautions as you would in any other double-quoted string - such
as escaping a backslash. The above would litterally print 'value =
C:\$file' rather than 'value = C:\dts.ini'.

value = C:\\$file
[Destination Server]
value=WINSQL2K

[Destination Database]
value=Pubs

[Destination Table]
value=pubs.dbo.BcpText
ENDINI

# Close the INI file so it can be used
close INI;

# Now invoke DTS; I've no idea what it does so you'll have to
# validate this bit and probably re-write it
print STDERR "Processing $file\n";

'print STDERR' is more compactly written 'warn'

Constructive Criticism has now been given. Comments welcome.

Paul Lalli
 
H

Henry Law

Paul said:
Wouldn't you rather tell the user *why* the program suddenly stopped
without doing anything?
Yes, indeed. Laziness, but ...
die "Usage: $0 dir_name\n" unless defined $dir_name;

I don't like "die" because it emits all sorts of stuff about line
numbers and so forth, which for a non-technical user is untidy. Same
for "warn". If it's something that should show at the first run of the
program - something that the programmer will see immediately - I'll use
"die". Or for shorthand, as elsewhere in this program.
1) an exit code of 0 traditionally means "exited with success", which
is clearly not what's happening here.

Yes. Been writing too many subroutines, which tend to emit FALSE (which
I usually code as zero unless there's some reason) when they fail.
2) An exit message printed to STDERR is more compactly and
traditionally printed with die()
But see my personal dislike of "die" as above.
Use lexical filehandles rather than global barewords. They're subject
to 'use strict'; they're lexically defined so you can't accidentally
use the same handle in two bits of code 5000 lines apart; they're
automatically closed when they go out of scope.

opendir my $DIRECT, $dir_name

Aha; that one's news to me and I can see the benefits. I'll do that in
future.
This ignores *any* file that starts with a period.

Oops, yes.
In addition to the lexical directory handle above, when opening file
handles, you should use the three argument form of open. It won't make
any difference in this case, but it's a good habbit to get into.
Eventually, you'll write a program in which you ask the user for a
filename, and then try to open that file. Saying:

open my $fh, '>', $file or die $!;
as opposed to
open my $fh, ">$file" or die $!;

prevents the user from causing massive damage by entering a filename of
foo ; rm -rf *

Aaaagh. I never in a million years would have thought of that. Noted.
print INI <<ENDINI;
[Source Filepath]
value=C:\$file


When you don't put quotes around the Heredoc terminator, the heredoc is
treated as a double-quoted string.

Yes indeed; otherwise the "$file" variable wouldn't have been replaced.
That means you need to take the
same precautions as you would in any other double-quoted string - such
as escaping a backslash. The above would litterally print 'value =
C:\$file' rather than 'value = C:\dts.ini'.

So it does! Sloppy code. When working with Windows filenames I
frequently find myself doing things like $file =~ s/\\/\\\\/g;
'print STDERR' is more compactly written 'warn'

But see above. OK, maybe I'm the only one in the world that gets
offended when Perl prints the full path and the line number, but there
it is.
Constructive Criticism has now been given. Comments welcome.

The only important comment already appears above: Thank you; I've
learned some things.
 
P

Paul Lalli

Henry said:
I don't like "die" because it emits all sorts of stuff about line
numbers and so forth, which for a non-technical user is untidy. Same
for "warn".

Run the above code exactly as I've typed it. You will see no further
information about line numbers.

perldoc -f die
If the last element of LIST does not end in a newline, the current
script line number and input line number (if any) are also printed, and
a newline is supplied.
When working with Windows filenames I
frequently find myself doing things like $file =~ s/\\/\\\\/g;

For what it's worth, about 90% of the time, that's not necessary.
Windows understands "normal" slashes just as well as it understands
backslashes. It is only the `cmd` command line interpreter that does
not like / as a directory separator (and even that can be overcome by
using quotes)
But see above. OK, maybe I'm the only one in the world that gets
offended when Perl prints the full path and the line number, but there
it is.

Same thing for warn() as for die() above. The line numbers are only
printed if you omit a terminating newline in your error/warning
message.

Paul Lalli
 
H

Henry Law

Paul said:
Same thing for warn() as for die() above. The line numbers are only
printed if you omit a terminating newline in your error/warning
message.

My total failure to know that is an interesting comment on how languages
are learnt. I don't think I've ever read up on "die"; early in the time
I was learning I saw it used, grasped the basic concept, and have coded
it the same way ever since; same for "warn". That the presence of a
terminating new line has an effect on how "die" and "warn" actually
behave isn't intuitive and I've never have thought to find out.

I'll use them in future ...
 
B

Bart Lateur

Henry said:
I don't like "die" because it emits all sorts of stuff about line
numbers and so forth, which for a non-technical user is untidy. Same
for "warn".

Append a newline at hte end of your error message, and it won't do that.
I promise.

warn "Something happened...\n";
die "Nice clean exit!\n";
If it's something that should show at the first run of the
program - something that the programmer will see immediately - I'll use
"die".

If it's something unexpected that is so bad the program has to end, then
you do need an error message and a nonzero exit status (for any scripts
that use your program). The user will curse you the first time the
program just stops without any warning.
 
A

A. Sinan Unur

Yes, indeed. Laziness, but ...

I don't like "die" because it emits all sorts of stuff about line
numbers and so forth, which for a non-technical user is untidy. Same
for "warn". If it's something that should show at the first run of
the program - something that the programmer will see immediately -
I'll use "die". Or for shorthand, as elsewhere in this program.

die will suppress those messages if the last argument you supply to it
ends in a newline:

perldoc -f die

....
If the last element of LIST does not end in a newline, the
current script line number and input line number (if any) are
also printed, and a newline is supplied.
....

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
 
A

A. Sinan Unur

I quite often use die in the sort of places
I would have used Assert in 'C'. To check
things that really, REALLY should be true.

For example, when re-opening a file for read that
was written in another part of the script.

I am sure this means something, but I am not sure what it means.
So I believe that generating a message (including stack trace)
that is helpful to the support engineer is a good thing.

In which case, you should read:

perldoc Carp

die in general is not very useful.

Sinan

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
 
P

Peter J. Holzer

Paul said:
In addition to the lexical directory handle above, when opening file
handles, you should use the three argument form of open. It won't
make any difference in this case, but it's a good habbit to get into.
Eventually, you'll write a program in which you ask the user for a
filename, and then try to open that file. Saying:

open my $fh, '>', $file or die $!;
as opposed to
open my $fh, ">$file" or die $!;

prevents the user from causing massive damage by entering a filename
of foo ; rm -rf *

Nothing bad will happen, at least on Unixish systems. The leading '>' on
the filename will cause perl to open a file for writing and nothing in
the filename can change that. The only problem I know of is that perl
will strip leading and trailing spaces from the filename.

What is really dangerous is using

open my $fh, $file or die $!;

with a user-supplied filename. In this case the user can supply a
filename starting or ending with a '|' which causes perl to execute it
as a command (with a pipe connected to the appropriate end).

OTOH, sometimes this can be really useful, too.

hp
 
P

Paul Lalli

Peter said:
Nothing bad will happen, at least on Unixish systems. The leading '>' on
the filename will cause perl to open a file for writing and nothing in
the filename can change that. The only problem I know of is that perl
will strip leading and trailing spaces from the filename.

What is really dangerous is using

open my $fh, $file or die $!;

with a user-supplied filename. In this case the user can supply a
filename starting or ending with a '|' which causes perl to execute it
as a command (with a pipe connected to the appropriate end).

Thank you for the corrections. My advice would still seem to stand,
albeit for not quite the reason I gave.

Thank you,
Paul Lalli
 
P

Peter J. Holzer

Paul said:
Peter said:
Paul said:
Henry Law wrote:
In addition to the lexical directory handle above, when opening
file handles, you should use the three argument form of open. [...]
prevents the user from causing massive damage by entering a
filename of foo ; rm -rf *

Nothing bad will happen, at least on Unixish systems. The leading '>'
on the filename will cause perl to open a file for writing and
nothing in the filename can change that. The only problem I know of
is that perl will strip leading and trailing spaces from the
filename.

What is really dangerous is using

open my $fh, $file or die $!;

with a user-supplied filename. In this case the user can supply a
filename starting or ending with a '|' which causes perl to execute
it as a command (with a pipe connected to the appropriate end).

Thank you for the corrections. My advice would still seem to stand,
albeit for not quite the reason I gave.

Right. I would generalize that into:

If you want your script to be in control of what is opened, use the
three-argument form.

If you want the user to be in control of what is opened, use the two
argument form with a bare variable. Do this only if you are *absolutely
certain* that the user supplying the value of the variable and the user
effectively running the script are the same or that the latter trusts
the former.

Off the top of my head I cannot think of a reason to use the two
argument form with the "magic characters" supplied by the script.
But somebody will post such a reason shortly (it always happens when I
make sweeping generalizations :)).

hp
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,482
Members
44,900
Latest member
Nell636132

Latest Threads

Top