ifiaz
I have data that looks like this, all on a single line:
"01 17060757 EG 6880232 N 0131020321 17 060712 l 8828 TR6322
00030070 01 20030317060807749544 060645 244 PA1"
There are about 280,000 such lines.
The fields are fixed-width. They can't be extracted using delimiters because
some of the fields may be blank.
I originally wrote an awk script that used substr to extract the fields
from $0, and it took 25.26 seconds to calculate the summary.
Field splitting in awk, for your info:
F1 = substr($0, 1, 2)
TiltTime = substr($0, 4, 8)
....
....
Using an awk-to-perl converter, the same thing in perl took only 11.03
seconds (the converted script used substr as well).
Field splitting in the converted perl script, for your info:
$F1 = substr($_, 1, 2);
$TiltTime = substr($_, 4, 8);
....
....
I then wrote a perl script that replaced only the field-splitting part
with unpack. That script takes 21.5 seconds.
Field splitting in perl using unpack, for your info:
($F1, $TiltTime, ...) =
unpack("a2xa8xa2xa3a5xa1xa10xa2xa6xa1xa4xa8xa6xa8xa2xa20xa6xa3xa3",
$_);
....
....
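For reference, here is a minimal self-contained sketch of the two extraction styles on the first two fields only. It assumes Perl's default 0-based substr offsets; the converted script above uses 1-based offsets, which would only be equivalent if the converter changed the array base (an assumption, not something shown in the post). The sample string is just the first part of the record quoted above.

```perl
use strict;
use warnings;

# First part of the sample record from the post (illustrative only).
my $line = "01 17060757 EG 6880232 N 0131020321 17 060712 l 8828 TR6322";

# substr with Perl's default 0-based offsets:
# F1 is characters 0-1, TiltTime is characters 3-10.
my $f1_substr   = substr($line, 0, 2);
my $tilt_substr = substr($line, 3, 8);

# unpack: "a2" takes 2 characters, "x" skips one, "a8" takes 8 characters.
my ($f1_unpack, $tilt_unpack) = unpack("a2xa8", $line);

print "$f1_substr $tilt_substr\n";
print "$f1_unpack $tilt_unpack\n";
```

Both print statements should show the same two fields, which is a quick way to confirm that a template and a set of substr offsets agree before timing anything.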
Why is unpack less efficient here? Am I doing anything wrong?
Should I stick to substr for this kind of field splitting in the future?
Is there any other way to write it to make it more efficient?
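One way to check whether the field splitting itself accounts for the difference, rather than the rest of the script, is to time the two approaches in isolation. The sketch below uses the standard Benchmark module on the first two fields only; the helper names via_substr and via_unpack and the iteration count are hypothetical choices for illustration, and the results will depend on the Perl version and the number of fields extracted.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# First part of the sample record from the post (illustrative only).
my $line = "01 17060757 EG 6880232 N 0131020321 17 060712 l 8828 TR6322";

# Hypothetical helper: extract F1 and TiltTime with substr (0-based offsets).
sub via_substr {
    return (substr($line, 0, 2), substr($line, 3, 8));
}

# Hypothetical helper: extract the same two fields with unpack.
sub via_unpack {
    return unpack("a2xa8", $line);
}

# Run each helper 500,000 times and print a comparison table.
cmpthese(500_000, {
    substr => \&via_substr,
    unpack => \&via_unpack,
});
```

If the two come out close in isolation, the extra time in the full script is more likely to come from something other than the splitting call.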
- Fiaz Idris