Date in CSV/TSV question

Discussion in 'Perl Misc' started by Dr Eberhard Lisse, Jan 1, 2013.

  1. I have a Tab Separated File of roughly 1000 lines with the first fields like

    "07 Jan 2011" "TFR"
    "05 Jan 2011" "DR"

    I need to change the first field to look like

    2011-01-07 "TFR"
    2011-01-05 "DR"

    for all lines, of course :)-O

    Can someone point me to where I can read this up? Or send me a code
    fragment?

    Thanks, el
    --
    if you want to reply, replace nospam with my initials
    Dr Eberhard Lisse, Jan 1, 2013
    #1

  2. On Tue, 1 Jan 2013 23:56:14 UTC, Dr Eberhard Lisse <>
    wrote:

    > I have a Tab Separated File of roughly 1000 likes with the first fields like
    >
    > "07 Jan 2011" "TFR"
    > "05 Jan 2011" "DR"
    >
    > I need change the first field to look like
    >
    > 2011-01-07 "TFR"
    > 2011-01-05 "DR"
    >
    > for all lines, of course :)-O
    >
    > Can someone point me to where I can read this up? Or send me a code
    > fragment?


    It is not clear whether the file contains the quotes or you are only
    using them to show the fields. Assuming you have extracted the first
    field, split it on spaces into day, month and year. Set up an array of
    month names and find the index of the given month. Regenerate the field
    with sprintf: $new = sprintf("$year-%2.2d-$day", $index); For simplicity
    put a dummy entry at the front of the list, since Perl arrays index
    from 0, so @months = qw(crap Jan Feb ..........
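
    The steps above could be sketched roughly as follows. This is only a
    minimal illustration, not tested against the real file: the helper name
    iso_date() is made up for the example, and it assumes the field may or
    may not still carry its surrounding quotes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Dummy first entry so that Jan sits at index 1, as suggested above.
my @months = qw(crap Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# iso_date() is a hypothetical helper name, not something from the thread.
sub iso_date {
    my ($field) = @_;
    $field =~ s/^"|"$//g;                         # drop surrounding quotes, if any
    my ($day, $mon, $year) = split ' ', $field;   # "07 Jan 2011" -> 07, Jan, 2011
    # Find the index of the month name, skipping the dummy entry at 0.
    my ($index) = grep { $months[$_] eq $mon } 1 .. $#months;
    die "Unknown month '$mon'\n" unless defined $index;
    return sprintf '%04d-%02d-%02d', $year, $index, $day;
}

print iso_date('"07 Jan 2011"'), "\n";    # prints 2011-01-07
```

    Using %02d for the day as well means a single-digit day such as "7"
    would still come out zero-padded.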

    HTH
    --
    Regards
    Dave Saville
    Dave Saville, Jan 2, 2013
    #2

  3. Thanks.

    el

    On 2013-01-02 15:01 , Henry Law wrote:
    > On 01/01/13 23:56, Dr Eberhard Lisse wrote:
    >> I have a Tab Separated File of roughly 1000 likes with the first
    >> fields like
    >>
    >> "07 Jan 2011" "TFR"
    >> "05 Jan 2011" "DR"
    >>
    >> I need change the first field to look like
    >>
    >> 2011-01-07 "TFR"
    >> 2011-01-05 "DR"

    >
    > OK, couldn't resist having a bash at this. Didn't spend a lot of time
    > on it but this does what you want.
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    > use 5.010;
    >
    > use Date::Calc qw( Decode_Date_EU );
    > use Text::CSV;
    >
    > my $csv = Text::CSV->new( { sep_char=>"\t", quote_char=>'"' } )
    > or die "Failed to create CSV object: $!\n";
    > while ( 1 ) {
    > my $row = $csv->getline( \*DATA );
    > last unless $row->[0]; # getline returns zero-length arrayref; irritating
    > my ( $year, $month, $day ) = Decode_Date_EU( $row->[0] );
    > die "Bad date" unless $year;
    > printf "%04d-%02d-%02d\t%s\n", $year, $month, $day, $row->[1];
    > }
    >
    > __DATA__
    > "07 Jan 2011" "TFR"
    > "05 Jan 2011" "DR"
    >
    >> henry@eris:~/Perl/tryout$ ./tryout
    >> 2011-01-07 TFR
    >> 2011-01-05 DR

    >
    > It could be improved, and made more Perlish (I write code in isolation,
    > rather, which isn't a good idea). In particular I was maddened by the
    > need to check the EOF condition explicitly. "while my $row =
    > getline..." returns a one-element array containing a null value when it
    > hits EOF; you'd think it would return undef. (And yes I did try
    > "defined" as suggested in perldoc IO::Handle but the arrayref is
    > actually defined, despite not containing anything useful).
    >



    --
    If you want to email me, replace nospam with el
    Dr Eberhard W Lisse, Jan 2, 2013
    #3
  4. Dr Eberhard Lisse <> writes:
    > I have a Tab Separated File of roughly 1000 likes with the first fields like
    >
    > "07 Jan 2011" "TFR"
    > "05 Jan 2011" "DR"
    >
    > I need change the first field to look like
    >
    > 2011-01-07 "TFR"
    > 2011-01-05 "DR"
    >
    > for all lines, of course :)-O
    >
    > Can someone point me to where I can read this up? Or send me a code
    > fragment?


    -----------
    %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

    while (<>) {
    s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    print;
    }
    -----------
    Rainer Weikusat, Jan 2, 2013
    #4
  5. Henry Law <> writes:
    [...]
    > You could use Date::Calc, particularly the Decode_Date_EU function; it's
    > overkill if what you've described is really all there is, but it saves
    > programming. A truly lazy^H^H^H^Hcreative programmer would look for
    > something to decode the tab-separated file too; maybe Text::CSV would do
    > that? I've only ever used it for comma separated data, (which, er, is
    > what it's for).


    Yes, quoting "perldoc Text::CSV":

    The module accepts either strings or files as input and
    can utilize any user-specified characters as delimiters,
    separators, and escapes so it is perhaps better called ASV
    (anything separated values) rather than just CSV.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Working, but not speaking, for JetHead Development, Inc.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jan 4, 2013
    #5
  6. Henry Law <> writes:

    > On 02/01/13 10:22, Dave Saville wrote:
    >> On Tue, 1 Jan 2013 23:56:14 UTC, Dr Eberhard Lisse <>
    >> wrote:
    >>
    >>> I have a Tab Separated File of roughly 1000 likes with the first fields like
    >>>
    >>> "07 Jan 2011" "TFR"
    >>> "05 Jan 2011" "DR"

    >>
    >> Not clear if the file has the quotes or you are using them to show the
    >> fields. Assuming you have extracted the first field then split on
    >> space to day month year. Set up an array of month names. Find the
    >> index of the given month. Regenerate the field with sprintf. $new =
    >> sprintf($year-%2.2d-$day, $index); For simplicity put a dummy month on
    >> the front of the list, perl arrays index from 0, so @months = qw(crap
    >> Jan Feb ..........

    >
    > You could use Date::Calc, particularly the Decode_Date_EU function;
    > it's overkill if what you've described is really all there is, but it
    > saves programming. A truly lazy^H^H^H^Hcreative programmer would look
    > for something to decode the tab-separated file too; maybe Text::CSV
    > would do that?


    Nice example how it 'saves programming':

    ,----
    | #!/usr/bin/perl
    | use strict;
    | use warnings;
    | use 5.010;
    |
    | use Date::Calc qw( Decode_Date_EU );
    | use Text::CSV;
    |
    | my $csv = Text::CSV->new( { sep_char=>"\t", quote_char=>'"' } )
    | or die "Failed to create CSV object: $!\n";
    | while ( 1 ) {
    | my $row = $csv->getline( \*DATA );
    | last unless $row->[0]; # getline returns zero-length arrayref; irritating
    | my ( $year, $month, $day ) = Decode_Date_EU( $row->[0] );
    | die "Bad date" unless $year;
    | printf "%04d-%02d-%02d\t%s\n", $year, $month, $day, $row->[1];
    | }
    `----

    That's 14 lines of code. Alternate version without Date::Calc and
    Text::CSV

    ,----
    | %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    |
    | while (<>) {
    | s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    | print;
    | }
    `----

    That's good enough for the problem which was described and it's four
    lines of code. "Truly creative", -10 lines of code were saved here
    and a comment explaining an 'ugly' workaround for deficiency in the
    downloaded code had to be added as well[*],

    while (1) {
    Rainer Weikusat, Jan 4, 2013
    #6
  7. On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
    > Dr Eberhard Lisse <> writes:
    > > I have a Tab Separated File of roughly 1000 likes with the first fields like
    > >
    > > "07 Jan 2011" "TFR"
    > > "05 Jan 2011" "DR"
    > >
    > > I need change the first field to look like
    > >
    > > 2011-01-07 "TFR"
    > > 2011-01-05 "DR"
    > >
    > > for all lines, of course :)-O
    > >
    > > Can someone point me to where I can read this up? Or send me a code
    > > fragment?
    >
    > -----------
    > %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    >
    > while (<>) {
    > s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    > print;
    > }
    > -----------


    Maybe even shrink it to a long one-liner:

    perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
    {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile

    --
    Charles DeRykus
    C.DeRykus, Jan 5, 2013
    #7
  8. "C.DeRykus" <> writes:
    > On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
    >> Dr Eberhard Lisse <> writes:
    >> > I have a Tab Separated File of roughly 1000 likes with the first

    >> fields like
    >>
    >> > "07 Jan 2011" "TFR"
    >> > "05 Jan 2011" "DR"

    >>
    >>> 2011-01-07 "TFR"
    >>> 2011-01-05 "DR"


    [...]

    >> -----------
    >> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    >>
    >> while (<>) {
    >> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    >> print;
    >> }
    >> -----------

    >
    > Maybe even shrink it to a long one-liner:
    >
    > perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
    > {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile


    Considering the situation of the OP, he has a 'zero line' solution
    because all code was written by someone else. I don't know how this is
    for other people, however, I can type

    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)

    much faster than I can download anything from the net, especially
    considering that I'd have to read the documentation for this anything,
    too, making this a very bad tradeoff. And if I had to rely on someone
    else's code for totally trivial stuff such as splitting a text file
    with n 'somehow separated' data columns into an array, I would have a
    very hard time solving the much more complicated problems I usually
    need to deal with. Actually, I regularly search CPAN whenever I have a
    reasonably complex and self-contained subtask of something for which
    'using a module', if one existed, would be a good idea. The most common
    result of these searches, however, is 'nada', the second most common is
    some totally bizarre implementation of 25% of the features I actually
    need and the third 'implementation is total crap' aka 'IO::poll' (and
    the original author abandoned the code in question in 1975 in order to
    become a missionary in Gabon or something like that).

    CPAN is mostly a load of tripe resulting from fifteen years of bored
    'hobbyists' (here supposed to mean people whose actual job isn't
    programming) trying whatever weirdo-approach for solving fifty
    different but vaguely related _trivial_ problems with the help of a
    steam-engine powered motor umbrella constructed out of yellow,
    magenta and purple lego bricks happened to come to their mind. And
    downloading all these 'incredible machines' is - except in case of
    500 SLOC throw-away 'oneliners' - not the end of the story: I have to
    maintain the code because the people who use the software I'm
    responsible for come to me with any problems resulting from that.

    The rule of thumb I usually follow is that 'using a library' (or -
    something I very much prefer - an already written program somebody
    actually used to solve a real problem) is only worth the effort if it
    saves a significant amount of work, at least something like 500 lines
    of code and preferably, a few thousands. And even then, I end up
    'maintaining' seriously byzantine workarounds for all the problems in
    the 'free' code until I grow tired of that and replace it with
    something which actually works (in the sense that it reliably does
    what is needed to solve the problem I have to solve and nothing else)
    more often than not.
    Rainer Weikusat, Jan 5, 2013
    #8
  9. The OP is an elderly Obstetrician & Gynecologist, who occasionally needs
    to Practically Extract and Report stuff.

    el

    On 2013-01-05 21:56 , Rainer Weikusat wrote:
    > "C.DeRykus" <> writes:
    >> On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
    >>> Dr Eberhard Lisse <> writes:
    >>>> I have a Tab Separated File of roughly 1000 likes with the first
    >>> fields like
    >>>
    >>>> "07 Jan 2011" "TFR"
    >>>> "05 Jan 2011" "DR"
    >>>
    >>>> 2011-01-07 "TFR"
    >>>> 2011-01-05 "DR"

    >
    > [...]
    >
    >>> -----------
    >>> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    >>>
    >>> while (<>) {
    >>> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    >>> print;
    >>> }
    >>> -----------

    >>
    >> Maybe even shrink it to a long one-liner:
    >>
    >> perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
    >> {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile

    >
    > Considering the situation of the OP, he has a 'zero line' solution
    > because all code was written by someone else. I don't know how his is
    > for other people, however, I can type
    >
    > qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
    >
    > much faster than I can download anything from the net, especially
    > considering that I'd have to read to documentation for this anything,
    > too, making this a very bad tradeoff. And if I had to rely one someone
    > else's code for totally trivial stuff such as splitting a text file
    > with n 'somehow separated' data columns into an array, I would have a
    > very hard time solving the much more complicated problems I usually
    > need to deal with. Actually, I regularly search CPAN whenever I have a
    > reasonably complex and self-contained subtask of something that 'using
    > a module' if one existed would be a good idea. The most common result
    > of this searches, however, is 'nada', the second most common is some
    > totally bizarre implementation of 25% of the features I actually need
    > and the third 'implementation is total crap' aka 'IO::poll' (and the
    > original author abandoned the code in question in 1975 in order to
    > become a missionary in Gabun or something like that).
    >
    > CPAN is mostly a load of tripe resulting from fifteen years of bored
    > 'hobbyists' (here supposed to mean people whose actual job isn't
    > programming) trying whatever weirdo-approach for solving fifty
    > different but vaguely related _trivial_ problems with the help of a
    > steam-engine powered motor umbrella constructed out of yellow,
    > magenta and purple lego bricks happened to come to their mind. And
    > downloading all these 'incredible machines' is - except in case of
    > 500 SLOC throw-away 'oneliners' - not the end of the story: I have to
    > maintain the code because the people who use the software I'm
    > responsible for come to me with any problems resulting from that.
    >
    > The rule of thumb I usually follow is that 'using a library' (or -
    > something I very much prefer - an already written program somebody
    > actually used to solve a real problem) is only worth the effort if it
    > saves a significant amount of work, at least something like 500 lines
    > of code and preferably, a few thousands. And even then, I end up
    > 'maintaining' seriously byzantine workarounds for all the problems in
    > the 'free' code until I grow tired of that and replace it with
    > something which actually works (in the sense that it reliably does
    > what is needed to solve the problem I have to solve and nothing else)
    > more often than not.
    >



    --
    if you want to reply, replace nospam with my initials
    Dr Eberhard Lisse, Jan 5, 2013
    #9
  10. On Saturday, January 5, 2013 11:56:18 AM UTC-8, Rainer Weikusat wrote:
    > "C.DeRykus" <> writes:
    > > On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
    > >> Dr Eberhard Lisse <> writes:
    > >> > I have a Tab Separated File of roughly 1000 likes with the first
    > >> fields like
    > >>
    > >> > "07 Jan 2011" "TFR"
    > >> > "05 Jan 2011" "DR"
    > >>
    > >>> 2011-01-07 "TFR"
    > >>> 2011-01-05 "DR"
    >
    > [...]
    >
    > >> -----------
    > >> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    > >>
    > >> while (<>) {
    > >> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    > >> print;
    > >> }
    > >> -----------
    > >
    > > Maybe even shrink it to a long one-liner:
    > >
    > > perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
    > > {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile
    >
    > Considering the situation of the OP, he has a
    > 'zero line' solution because all code was written
    > by someone else.


    Hm, it sounded like he just had a separate tab-delimited
    file he needed in a different format (ideal for a
    one-liner). The -i switch is especially useful for just
    this if the scenario allows it.

    > I don't know how his
    > for other people, however, I can type
    >
    > qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
    >
    > much faster than I can download anything from the net, especially
    > considering that I'd have to read to documentation for this anything,
    > too, making this a very bad tradeoff. And if I had to rely one someone
    > else's code for totally trivial stuff such as splitting a text file
    > with n 'somehow separated' data columns into an array, I would have a
    > very hard time solving the much more complicated problems I usually
    > need to deal with. Actually, I regularly search CPAN whenever I have a
    > reasonably complex and self-contained subtask of something that 'using
    > a module' if one existed would be a good idea. The most common result
    > of this searches, however, is 'nada', the second most common is some
    > totally bizarre implementation of 25% of the features I actually need
    > and the third 'implementation is total crap' aka 'IO::poll' (and the
    > original author abandoned the code in question in 1975 in order to
    > become a missionary in Gabun or something like that).
    >
    > CPAN is mostly a load of tripe resulting from fifteen years of bored
    > 'hobbyists' (here supposed to mean people whose actual job isn't
    > programming) trying whatever weirdo-approach for solving fifty
    > different but vaguely related _trivial_ problems with the help of a
    > steam-engine powered motor umbrella constructed out of yellow,
    > magenta and purple lego bricks happened to come to their mind. And
    > downloading all these 'incredible machines' is - except in case of
    > 500 SLOC throw-away 'oneliners' - not the end of the story: I have to
    > maintain the code because the people who use the software I'm
    > responsible for come to me with any problems resulting from that.
    >
    > The rule of thumb I usually follow is that 'using a library' (or -
    > something I very much prefer - an already written program somebody
    > actually used to solve a real problem) is only worth the effort if it
    > saves a significant amount of work, at least something like 500 lines
    > of code and preferably, a few thousands. And even then, I end up
    > 'maintaining' seriously byzantine workarounds for all the problems in
    > the 'free' code until I grow tired of that and replace it with
    > something which actually works (in the sense that it reliably does
    > what is needed to solve the problem I have to solve and nothing else)
    > more often than not.


    I can appreciate your viewpoint. Date::Manip though
    is well-maintained and extraordinarily useful. There
    are several other very good Date modules as well.

    Leveraging a small bit of module code for a tedious,
    surprisingly frequent little chore appeals to the
    very lazy. So, it's worth it IMO :)

    --
    Charles DeRykus
    C.DeRykus, Jan 5, 2013
    #10
  11. "C.DeRykus" <> writes:
    > On Saturday, January 5, 2013 11:56:18 AM UTC-8, Rainer Weikusat wrote:
    >> "C.DeRykus" <> writes:
    >>> On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
    >>>> Dr Eberhard Lisse <> writes:
    >>>>> I have a Tab Separated File of roughly 1000 likes with the first
    >>>> fields like
    >>>>
    >>>> "05 Jan 2011" "DR"
    >>>


    [and need to translate that to]

    >>
    >> >>> 2011-01-07 "TFR"
    >> >>> 2011-01-05 "DR"


    [...]

    >>> Maybe even shrink it to a long one-liner:
    >>>
    >>> perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
    >>> {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile

    >> Considering the situation of the OP, he has a
    >> 'zero line' solution because all code was written
    >> by someone else.

    >
    > Hm, it sounded like he just a separate tab-delimited
    > file he needed in a different format (ideal for a 1-
    > liner.) The -i switch is especially useful for just
    > this if the scenario allows it.


    If you weren't using -i, it wasn't necessary to worry about creating a
    backup file since the modified content would end up in a new file.

    >
    >> I don't know how his
    >> for other people, however, I can type
    >>
    >> qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
    >>

    >
    >> much faster than I can download anything from the net,


    [...]

    > Date::Manip though is well-maintained and extraordinarily
    > useful. There are several other very good Date modules as well.
    >
    > Leveraging a small bit of module code for a tedious,
    > surprisingly frequent little chore appeals to the
    > very lazy. So, it's worth it IMO :)


    I would call this a case of 'false laziness': You happen to be
    familiar with a certain 'date munging' module. The OP wanted to modify
    some 'structured text field' which happened to be a date. Ergo:
    Clearly, a case for using the date manipulation code. But nothing in
    the described problem is related to dates. A sequence of text of the
    form

    "number0 string number1"

    is supposed to be changed such that it becomes

    number1-number2-number0

    that is, the quotes are supposed to be deleted (I didn't realize
    that), the first and the last subfield should be transposed and the
    middle string replaced by a two-digit number using a simple,
    "well-known" static mapping from twelve three character strings to
    numbers. This is exactly the kind of stuff which can be done very
    easily with perl, ie

    -------------
    %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

    s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/, print while (<>);
    -------------

    and telling the OP that he should instead download a couple of
    thousands (probably, I've only counted the DM6 file which figures at
    691 LOC) of lines of code consisting of 972(!) different files, most
    of which are documented(!) as broken and are totally useless for the
    problem at hand is not something I'd call a sound piece of technical
    advice. It is probably possible to use a combine harvester instead of
    a lawnmower but nobody in his right mind would ever do that or suggest
    that others do it.
    Rainer Weikusat, Jan 6, 2013
    #11
  12. On Sunday, January 6, 2013 9:12:35 AM UTC-8, Rainer Weikusat wrote:
    > "C.DeRykus" <> writes:
    > > On Saturday, January 5, 2013 11:56:18 AM UTC-8, Rainer Weikusat wrote:
    > >> "C.DeRykus" <> writes:
    > >>> On Wednesday, January 2, 2013 7:37:02 AM UTC-8, Rainer Weikusat wrote:
    > >>>> Dr Eberhard Lisse <> writes:
    > >>>>> I have a Tab Separated File of roughly 1000 likes with the first
    > >>>> fields like
    > >>>>
    > >>>> "05 Jan 2011" "DR"
    > >>>
    >
    > [and need to translate that to]
    >
    > >>
    > >> >>> 2011-01-07 "TFR"
    > >> >>> 2011-01-05 "DR"
    >
    > [...]
    >
    > >>> Maybe even shrink it to a long one-liner:
    > >>>
    > >>> perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
    > >>> {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile
    > >> Considering the situation of the OP, he has a
    > >> 'zero line' solution because all code was written
    > >> by someone else.
    > >
    > > Hm, it sounded like he just a separate tab-delimited
    > > file he needed in a different format (ideal for a 1-
    > > liner.) The -i switch is especially useful for just
    > > this if the scenario allows it.
    >
    > If you weren't using -i, it wasn't necessary to worry about creating a
    > backup file since the modified content would end up in a new file.


    -i is useful in case you're one of those whose
    code never works the first time though...
    And you can always remove -i later.

    > >> I don't know how his
    > >> for other people, however, I can type
    > >>
    > >> qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
    > >>
    > >
    > >> much faster than I can download anything from the net,
    >
    > [...]
    >
    > > Date::Manip though is well-maintained and extraordinarily
    > > useful. There are several other very good Date modules as well.
    > >
    > > Leveraging a small bit of module code for a tedious,
    > > surprisingly frequent little chore appeals to the
    > > very lazy. So, it's worth it IMO :)
    >
    > It would call this a case of 'false laziness': You happen to be
    > familiar with a certain 'date munging' module. The OP wanted to modify
    > some 'structured text field' which happened to be a data. Ergo:
    > Clearly, a case for using the date manipulation code. But nothing in
    > the described problem is related to dates. A sequence of text of the
    > form
    >
    > "number0 string number1"
    >
    > is supposed to be changed such that it becomes
    >
    > number1-number2-number0
    >
    > that is, the quotes are supposed to be deleted (I didn't realize
    > that), the first and the last subfield should be transposed and the
    > middle string replaced by a two-digit number using a simple,
    > "well-known" static mapping from twelve three character strings to
    > numbers. This is exactly the kind of stuff which can be done very
    > easily with perl, ie
    >
    > -------------
    > %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    >
    > s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/, print while (<>);
    > ...



    Sure, if you don't deal with this kind of
    transform often, yet another incantation is
    no big deal. And a simple regex can remain
    blissfully ignorant of the fact that it's
    dealing with dates. But then, if tweaks are
    needed, it's "deja vu all over again". Can't
    remember where to cut'n paste your old tweak..
    No problem. Just wade in and watch out for typo's.


    > and telling the OP that he should instead download a couple of
    > thousands (probably, I've only counted the DM6 file which figures at
    > 691 LOC) of lines of code consisting of 972(!) different files, most
    > of which are documented(!) as broken and are totally useless for the
    > problem at hand is not something I'd call a sound piece of technical
    > advice.



    I'd agree there are probably better solutions
    than pulling in the bloat of Date::Manip. But
    there are several good Date modules and it's
    all about leveraging code already written and
    working. Concern with "pulling in a big module"
    is almost always FUD - especially speed concerns.
    Additionally, if the input format changes, and
    those are dates after all, a good Date module
    probably has a method to cinch the code tweaks.
    One that's already written...


    > It is probably possible to use a combine harvester instead of a
    > lawnmower but nobody in his right mind would ever do that or
    > suggest that others do it.


    Then why do we use a simple module function to
    escape HTML for instance.. rather than rolling
    our own? Sometimes a Swiss army knife - rather
    than scrounging around for a small pen knife -
    is worth the extra weight in a knapsack.

    --
    Charles DeRykus
    C.DeRykus, Jan 6, 2013
    #12
  13. Ah, the Plonkers.

    el


    On 2013-01-05 23:49 , Henry Law wrote:
    > On 05/01/13 21:33, Dr Eberhard Lisse wrote:
    >> The OP is an elderly Obstetrician & Gynecologist, who occasionally needs
    >> to Practically Extract and Report stuff.

    >
    > By the way, Meinheer Doctor, you might be interested to know that quite
    > a lot of people who frequent this group won't have seen the article
    > which you followed up here, having decided some time ago to block posts
    > from its author at source.
    >
    > I leave it to you to determine the significance of this.
    >
    > PS I bet you're no more elderly than I am :)
    >



    --
    If you want to email me, replace nospam with el
    Dr Eberhard W Lisse, Jan 7, 2013
    #13
  14. "C.DeRykus" <> writes:

    Leading remark: I'm going to cut this somewhat short. I don't agree
    with your opinion on this, however, essentially repeating myself
    doesn't seem very useful to me, so I'm just going to address a few
    isolated points.

    >> On Sunday, January 6, 2013 9:12:35 AM UTC-8, Rainer Weikusat wrote:
    >> "C.DeRykus" <> writes:


    [...]


    >> If you weren't using -i, it wasn't necessary to worry about creating a
    >> backup file since the modified content would end up in a new file.

    >
    > -i is useful in case you're one of those whose
    > code never works the first time time though...
    > And you can always remove -i later.


    What I was trying to get at was that it wouldn't be necessary to use
    the 'automatic backup' feature of -i if 'overwriting' (aka
    'destroying') the input file hadn't been requested to begin with: In this
    case, the processed data would go to stdout, immediately available for
    interactive inspection, and could be redirected to some other file if
    so desired at the user's discretion.
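
    To make the distinction concrete, here is a rough sketch of both
    styles inside one script. The sub names convert(), filter() and
    rewrite_in_place() are invented for the illustration, and the demo
    runs on a scratch file so nothing real is overwritten. Setting $^I
    is the in-script counterpart of the -i switch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %months;
@months{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)}
    = map { sprintf '%02d', $_ } 1 .. 12;

# Hypothetical helper: applies the substitution quoted earlier in the thread.
sub convert {
    my ($line) = @_;
    $line =~ s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/;
    return $line;
}

# Filter style: read one handle, write another, never touch the input file.
sub filter {
    my ($in, $out) = @_;
    print {$out} convert($_) while <$in>;
}

# -i.bak style: overwrite $file and keep the original as "$file.bak".
sub rewrite_in_place {
    my ($file) = @_;
    local ($^I, @ARGV) = ('.bak', $file);
    print convert($_) while <>;    # print goes to the rewritten file here
}

# Demo on a scratch file.
my ($fh, $file) = tempfile();
print {$fh} qq{"07 Jan 2011"\t"TFR"\n};
close $fh;

open my $in, '<', $file or die "Cannot read $file: $!";
filter($in, \*STDOUT);             # result on stdout, input unchanged
close $in;

rewrite_in_place($file);           # now $file itself holds the converted line
unlink $file, "$file.bak";
```

    With the filter style a mistake costs nothing, because the input is
    still there; with the in-place style the .bak copy is the only
    safety net.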

    [...]

    >> -------------
    >>
    >> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    >>
    >>
    >>
    >> s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/, print while (<>);
    >> ...

    >
    > Sure, if you don't deal with this kind of
    > transform often, yet another incantation is
    > no big deal.


    'Incantation' is IMO a very unfortunate choice for describing this. It
    is a sequence of instructions with exactly defined meaning which
    causes a machine to perform a specific function. That's a completely
    mundane thing with absolutely no 'magic' of any kind involved (except
    insofar as 'any sufficiently advanced technology is indistinguishable
    from magic' [as seen by someone who doesn't understand any of it]).

    [...]

    >> It is probably possible to use a combine harvester instead of a
    >> lawnmower but nobody in his right mind would ever do that or
    >> suggest that others do it.

    >
    > Then why do we use a simple module function to
    > escape HTML for instance.. rather than rolling
    > our own?


    Hmm ... why would I?

    $text =~ s/([<>"'&])/'&#'.ord($1).';'/ge;
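
    Wrapped in a sub for reuse, that one-liner might look like the sketch below. The name escape_html is made up for illustration; the substitution itself is the one above, which replaces each of the five characters with its decimal numeric character reference.

```perl
use strict;
use warnings;

# escape_html is a hypothetical name, not a library function.
# Each of < > " ' & becomes a numeric character reference such as &#60;.
sub escape_html {
    my ($text) = @_;
    $text =~ s/([<>"'&])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print escape_html(q{<a href="x">Tom & Jerry</a>}), "\n";
# prints: &#60;a href=&#34;x&#34;&#62;Tom &#38; Jerry&#60;/a&#62;
```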
    Rainer Weikusat, Jan 7, 2013
    #14
  15. Dr Eberhard Lisse

    ccc31807 Guest

    On Tuesday, January 1, 2013 6:56:14 PM UTC-5, Dr Eberhard Lisse wrote:
    > "07 Jan 2011" "TFR"
    > "05 Jan 2011" "DR"
    >
    > I need change the first field to look like
    >
    > 2011-01-07 "TFR"
    > 2011-01-05 "DR"


    For each line in the file, do something like this, assuming that $date contains a string that matches the date you want to change:
    1. my ($day, $month, $year) = split(/ /, $date);
    2. $date = sprintf("%04d-%02d-%02d", $year, $mo2num{$month}, $day);

    Line 1 splits your date string into the three components: day, month, year.
    Line 2 reassembles those three components and assigns the result back to $date.
    The hash table %mo2num looks like this (keys spelled exactly as the month names appear in your data):
    my %mo2num = (
    Jan => 1,
    Feb => 2,
    Mar => 3,
    # ... and so on through Dec => 12
    );
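
    Put together as a runnable sketch, assuming the surrounding quotes have already been stripped from $date and that the month names appear exactly as in the sample data (Jan, Feb, ...). The sub name iso_date is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;

# Month abbreviation => month number, keyed exactly as in the sample data.
my %mo2num;
@mo2num{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (1 .. 12);

# iso_date is a hypothetical helper: "07 Jan 2011" => "2011-01-07".
# Returns undef if a field is missing or the month is not recognised.
sub iso_date {
    my ($date) = @_;
    my ($day, $month, $year) = split ' ', $date;
    return unless defined $year && exists $mo2num{$month};
    return sprintf('%04d-%02d-%02d', $year, $mo2num{$month}, $day);
}

print iso_date('07 Jan 2011'), "\n";   # prints 2011-01-07
```

    The exists check is there because a misspelled or differently cased month would otherwise silently turn into month 00.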

    CC.
    ccc31807, Jan 8, 2013
    #15
  16. Thanks,

    el

    on 2013-01-08 18:35 ccc31807 said the following:
    > On Tuesday, January 1, 2013 6:56:14 PM UTC-5, Dr Eberhard Lisse wrote:
    >> "07 Jan 2011" "TFR"
    >> "05 Jan 2011" "DR"
    >>
    >> I need change the first field to look like
    >>
    >> 2011-01-07 "TFR"
    >> 2011-01-05 "DR"

    >
    > For each line in the file, do something like this, assuming that $date contains a string that matches the date you want to change:
    > 1. my ($day, $month, $year) = split(/ /, $date);
    > 2. $date = sprintf("%04d-%02d-%02d", $year, $mo2num{$month}, $day);
    >
    > Line 1 splits your date string into the three components: day, month, year.
    > Line 2 reassembles those three components and assigns the result back to $date.
    > The hash table %mo2num looks like this:
    > my %mo2num = (
    > Jan => 1,
    > Feb => 2,
    > Mar => 3,
    > # ... and so on through Dec => 12
    > );
    >
    > CC.
    >
    Dr Eberhard Lisse, Jan 9, 2013
    #16
  17. Dr Eberhard Lisse <> writes:

    > Thanks,
    >
    > el
    >
    > on 2013-01-08 18:35 ccc31807 said the following:
    >> On Tuesday, January 1, 2013 6:56:14 PM UTC-5, Dr Eberhard Lisse wrote:
    >>> "07 Jan 2011" "TFR"
    >>> "05 Jan 2011" "DR"
    >>>
    >>> I need change the first field to look like
    >>>
    >>> 2011-01-07 "TFR"
    >>> 2011-01-05 "DR"

    >>
    >> For each line in the file, do something like this, assuming that $date contains a string that matches the date you want to change:
    >> 1. my ($day, $month, $year) = split(/ /, $date);
    >> 2. $date = sprintf("%04d-%02d-%02d", $year, $mo2num{$month}, $day);
    >>
    >> Line 1 splits your date string into the three components: day, month, year.
    >> Line 2 reassembles those three components and assigns the result back to $date.
    >> The hash table %mo2num looks like this:
    >> my %mo2num = (
    >> Jan => 1,
    >> Feb => 2,
    >> Mar => 3,
    >> # ... and so on through Dec => 12
    >> );


    And assuming the hash exists (I posted a command generating it two
    times), the format can be transformed with a substitution expression
    (which I also posted two times), namely

    s/"(\d+)\s+(\S+)\s+(\d+)"/$3-$mo2num{$2}-$1/
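
    One thing to watch if the hash is built as in the quoted post: its values are plain integers, so interpolating $mo2num{$2} directly would give 2011-1-07 rather than 2011-01-07. A sketch that pads inside the substitution via /e and sprintf, assuming keys spelled as in the data (Jan, Feb, ...):

```perl
use strict;
use warnings;

# Plain integer month numbers, as in the quoted hash.
my %mo2num;
@mo2num{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (1 .. 12);

my $line = qq{"07 Jan 2011"\t"TFR"\n};

# /e evaluates the replacement as Perl code, so sprintf can zero-pad
# the month (and normalise day and year) during the substitution.
$line =~ s/"(\d+)\s+(\S+)\s+(\d+)"/sprintf('%04d-%02d-%02d', $3, $mo2num{$2}, $1)/e;

print $line;   # 2011-01-07, a tab, then "TFR"
```

    With a hash whose values are already zero-padded strings (as produced by the sprintf('%02d', ++$n) map posted earlier), the plain substitution above is enough.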
    Rainer Weikusat, Jan 9, 2013
    #17
  18. Dr Eberhard Lisse

    Ben Goldberg Guest

    On Wednesday, January 2, 2013 10:37:02 AM UTC-5, Rainer Weikusat wrote:
    > Dr Eberhard Lisse <> writes:
    >
    > > I have a Tab Separated File of roughly 1000 likes with the first
    > > fields like
    > >
    > > "07 Jan 2011" "TFR"
    > > "05 Jan 2011" "DR"
    > >
    > > I need change the first field to look like
    > >
    > > 2011-01-07 "TFR"
    > > 2011-01-05 "DR"
    > >
    > > for all lines, of course :)-O
    > >
    > > Can someone point me to where I can read this up? Or send me a code
    > > fragment?

    >
    > -----------
    >
    > %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    >
    >
    >
    > while (<>) {
    >
    > s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    >
    > print;
    >
    > }
    >
    > -----------


    Don't forget that you can use perl's "command line" switches even when you put your program in a file.
    #!/usr/bin/perl -pi.bak
    BEGIN {
    %months = map {;$_, sprintf('%02d', ++$n)}
    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    }
    s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    __END__
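
    For anyone wondering what -p does with that file: as described in perlrun, the body is wrapped in a while (<>) loop that prints $_ after each pass. As far as I can tell that is also why the table sits in a BEGIN block here; without it the map would run again for every input line, and since $n keeps counting, the month numbers would drift past 12 from the second line on. A sketch of the equivalent explicit program (minus the -i.bak handling), run over an in-memory sample so it is self-contained:

```perl
use strict;
use warnings;

# Built once, before any line is read; this is the job BEGIN does
# in the -p version of the script.
my $n = 0;
my %months = map { ($_ => sprintf('%02d', ++$n)) }
    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Sample input held in memory so the sketch runs without a data file.
my $sample = qq{"07 Jan 2011"\t"TFR"\n"05 Feb 2011"\t"DR"\n};
open my $in, '<', \$sample or die "cannot open in-memory input: $!";

my $out = '';
while (<$in>) {                       # -p supplies: while (<>) {
    s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
} continue {
    $out .= $_;                       # -p supplies: print or die ...
}
close $in;

print $out;
```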
    Ben Goldberg, Feb 12, 2013
    #18
  19. Ben Goldberg <> writes:
    > On Wednesday, January 2, 2013 10:37:02 AM UTC-5, Rainer Weikusat wrote:


    [...]

    >> -----------
    >>
    >> %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    >>
    >> while (<>) {
    >> s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    >> print;
    >> }
    >>
    >> -----------

    >
    > Don't forget that you can use perl's "command line" switches even when you put your program in a file.
    > #!/usr/bin/perl -pi.bak
    > BEGIN {
    > %months = map {;$_, sprintf('%02d', ++$n)}
    > qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    > }
    > s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    > __END__


    The 'BEGIN' serves no useful purpose here: %months needs to be
    initialized before the while-loop uses it. Since statements in a file
    are executed consecutively (anything else would probably be 'a little
    confusing' :), this will be the case with either variant.

    As I wrote in another posting: If perl hadn't been told to destroy the
    input file, telling it to make a backup first wouldn't have been
    necessary either. While this probably doesn't matter much for a
    trivial example like this, 'not using -i' also means that the code can
    be debugged and fixed without constantly renaming files or losing the
    original input file altogether in case the 'backup request' was
    accidentally forgotten. This also enables use of the script(let) as
    'another filter' in a more complicated pipeline.
    Rainer Weikusat, Feb 12, 2013
    #19
  20. Dr Eberhard Lisse

    Uri Guttman Guest

    >>>>> "BM" == Ben Morrow <> writes:

    BM> There's no need to muck about with the #! line and BEGIN blocks, both of
    BM> which would make it impossible to turn this into a subroutine later:

    BM> my %months = ...;

    BM> local $^I = ".bak";
    BM> while (<>) { ... }

    BM> The edit-in-place handling, including renaming the old file and opening
    BM> and selecting ARGVOUT, is done by the no-filehandle <> operator (or an
    BM> explicit <ARGV> or readline(ARGV)) whenever $^I is set. If you want to
    BM> in-place edit a custom list of files, you can also localise @ARGV.

    and File::Slurp has edit_file and edit_file_lines which are even easier
    to use.

    i do need to add a backup file option to those.
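
    for reference, a sketch of the $^I approach from the quoted text wrapped in a subroutine, using only core modules. the sub name and the sample file are illustrative assumptions; the mechanism (localised $^I and @ARGV, then a plain while (<>) loop) is the one described above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $n = 0;
my %months = map { ($_ => sprintf('%02d', ++$n)) }
    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# fix_dates_in_place is a hypothetical name. It edits the given files in
# place, keeping each original as FILE.bak, much like perl -i.bak would.
sub fix_dates_in_place {
    return unless @_;      # with no files, <> would fall back to stdin
    local @ARGV = @_;      # files for <> to iterate over
    local $^I   = '.bak';  # switch on in-place editing with a backup suffix
    while (<>) {
        s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/;
        print;             # goes to the rewritten file, not the terminal
    }
    return;
}

# Demonstration on a throwaway file in a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.tsv";
open my $fh, '>', $file or die "cannot create $file: $!";
print {$fh} qq{"07 Jan 2011"\t"TFR"\n};
close $fh or die "close failed: $!";

fix_dates_in_place($file);
```

    because both variables are localised, the in-place behaviour ends when the sub returns and the caller's @ARGV is left alone.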

    uri
    Uri Guttman, Feb 14, 2013
    #20