Date in CSV/TSV question

Discussion in 'Perl Misc' started by Dr Eberhard Lisse, Jan 1, 2013.

  1. I have a Tab Separated File of roughly 1000 lines with the first fields like

    "07 Jan 2011" "TFR"
    "05 Jan 2011" "DR"

    I need to change the first field to look like

    2011-01-07 "TFR"
    2011-01-05 "DR"

    for all lines, of course :)-O

    Can someone point me to where I can read up on this? Or send me a code
    fragment?

    Thanks, el
     
    Dr Eberhard Lisse, Jan 1, 2013
    #1

  2. Not clear if the file has the quotes or you are using them to show the
    fields. Assuming you have extracted the first field, split it on
    space into day, month and year. Set up an array of month names. Find the
    index of the given month. Regenerate the field with sprintf: $new =
    sprintf("$year-%2.2d-$day", $index); For simplicity put a dummy month on
    the front of the list, since perl arrays index from 0, so @months = qw(crap
    Jan Feb ..........
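
    A minimal sketch of the steps described above, assuming the first field
    has already been extracted and stripped of its quotes. The helper name
    iso_date and the 'dummy' placeholder are illustrative only and not part
    of the original post.

```perl
use strict;
use warnings;

# Dummy entry at index 0 so that Jan is 1, Feb is 2, and so on.
my @months = qw(dummy Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: turn "07 Jan 2011" into "2011-01-07".
# Returns undef if the month name is not in the list.
sub iso_date {
    my ($field) = @_;
    my ($day, $month, $year) = split ' ', $field;

    # Find the index of the given month by a linear search.
    my ($index) = grep { $months[$_] eq $month } 1 .. $#months;
    return unless defined $index;

    return sprintf '%04d-%02d-%02d', $year, $index, $day;
}

print iso_date('07 Jan 2011'), "\n";    # prints 2011-01-07
```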

    HTH
     
    Dave Saville, Jan 2, 2013
    #2

  3. Thanks.

    el

     
    Dr Eberhard W Lisse, Jan 2, 2013
    #3
  4. -----------
    %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

    while (<>) {
    s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    print;
    }
    -----------
     
    Rainer Weikusat, Jan 2, 2013
    #4
  5. Yes, quoting "perldoc Text::CSV":

    The module accepts either strings or files as input and
    can utilize any user-specified characters as delimiters,
    separators, and escapes so it is perhaps better called ASV
    (anything separated values) rather than just CSV.
     
    Keith Thompson, Jan 4, 2013
    #5
  6. Nice example how it 'saves programming':

    ,----
    | #!/usr/bin/perl
    | use strict;
    | use warnings;
    | use 5.010;
    |
    | use Date::Calc qw( Decode_Date_EU );
    | use Text::CSV;
    |
    | my $csv = Text::CSV->new( { sep_char=>"\t", quote_char=>'"' } )
    | or die "Failed to create CSV object: $!\n";
    | while ( 1 ) {
    | my $row = $csv->getline( \*DATA );
    | last unless $row->[0]; # getline returns zero-length arrayref; irritating
    | my ( $year, $month, $day ) = Decode_Date_EU( $row->[0] );
    | die "Bad date" unless $year;
    | printf "%04d-%02d-%02d\t%s\n", $year, $month, $day, $row->[1];
    | }
    `----

    That's 14 lines of code. Alternate version without Date::Calc and
    Text::CSV

    ,----
    | %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    |
    | while (<>) {
    | s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    | print;
    | }
    `----

    That's good enough for the problem which was described and it's four
    lines of code. "Truly creative", -10 lines of code were saved here
    and a comment explaining an 'ugly' workaround for deficiency in the
    downloaded code had to be added as well[*],

    while (1) {
     
    Rainer Weikusat, Jan 4, 2013
    #6
  7. Maybe even shrink it to a long one-liner:

    perl -MDate::Manip -pi.bak -le 's{^"(\d+)\s+(\S+)\s+(\d+)"}
    {"$3-" . UnixDate("$1 $2 $3","%m") . "-$1"}e' infile
     
    C.DeRykus, Jan 5, 2013
    #7
  8. Considering the situation of the OP, he has a 'zero line' solution
    because all code was written by someone else. I don't know how this is
    for other people, however, I can type

    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)

    much faster than I can download anything from the net, especially
    considering that I'd have to read the documentation for this anything,
    too, making this a very bad tradeoff. And if I had to rely on someone
    else's code for totally trivial stuff such as splitting a text file
    with n 'somehow separated' data columns into an array, I would have a
    very hard time solving the much more complicated problems I usually
    need to deal with. Actually, I regularly search CPAN whenever I have a
    reasonably complex and self-contained subtask for which 'using
    a module', if one existed, would be a good idea. The most common result
    of these searches, however, is 'nada', the second most common is some
    totally bizarre implementation of 25% of the features I actually need
    and the third 'implementation is total crap' aka 'IO::poll' (and the
    original author abandoned the code in question in 1975 in order to
    become a missionary in Gabon or something like that).

    CPAN is mostly a load of tripe resulting from fifteen years of bored
    'hobbyists' (here supposed to mean people whose actual job isn't
    programming) trying whatever weirdo-approach for solving fifty
    different but vaguely related _trivial_ problems with the help of a
    steam-engine powered motor umbrella constructed out of yellow,
    magenta and purple lego bricks happened to come to their mind. And
    downloading all these 'incredible machines' is - except in case of
    500 SLOC throw-away 'oneliners' - not the end of the story: I have to
    maintain the code because the people who use the software I'm
    responsible for come to me with any problems resulting from that.

    The rule of thumb I usually follow is that 'using a library' (or -
    something I very much prefer - an already written program somebody
    actually used to solve a real problem) is only worth the effort if it
    saves a significant amount of work, at least something like 500 lines
    of code and preferably, a few thousands. And even then, I end up
    'maintaining' seriously byzantine workarounds for all the problems in
    the 'free' code until I grow tired of that and replace it with
    something which actually works (in the sense that it reliably does
    what is needed to solve the problem I have to solve and nothing else)
    more often than not.
     
    Rainer Weikusat, Jan 5, 2013
    #8
  9. The OP is an elderly Obstetrician & Gynecologist, who occasionally needs
    to Practically Extract and Report stuff.

    el

     
    Dr Eberhard Lisse, Jan 5, 2013
    #9
  10. Hm, it sounded like he just had a separate tab-delimited
    file he needed in a different format (ideal for a
    1-liner). The -i switch is especially useful for just
    this if the scenario allows it.
    I can appreciate your viewpoint. Date::Manip though
    is well-maintained and extraordinarily useful. There
    are several other very good Date modules as well.

    Leveraging a small bit of module code for a tedious,
    surprisingly frequent little chore appeals to the
    very lazy. So, it's worth it IMO :)
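
    As one illustration of the "leverage a date module" route, here is a
    hedged sketch using Time::Piece, which ships with perl since 5.10. It is
    a swapped-in alternative, not the Date::Manip code from the one-liner,
    and the helper name iso_from_dmy is made up for this example.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: parse "DD Mon YYYY" and return "YYYY-MM-DD".
# Returns undef if strptime rejects the string.
sub iso_from_dmy {
    my ($text) = @_;
    my $t = eval { Time::Piece->strptime($text, '%d %b %Y') };
    return defined $t ? $t->ymd : undef;
}

print iso_from_dmy('07 Jan 2011'), "\n";    # prints 2011-01-07
```

    The parse step knows it is dealing with a date, so a change in the input
    format would only need a different strptime format string.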
     
    C.DeRykus, Jan 5, 2013
    #10
  11. [and need to translate that to]
    If you weren't using -i, it wasn't necessary to worry about creating a
    backup file since the modified content would end up in a new file.
    I would call this a case of 'false laziness': You happen to be
    familiar with a certain 'date munging' module. The OP wanted to modify
    some 'structured text field' which happened to be a date. Ergo:
    Clearly, a case for using the date manipulation code. But nothing in
    the described problem is related to dates. A sequence of text of the
    form

    "number0 string number1"

    is supposed to be changed such that it becomes

    number1-number2-number0

    that is, the quotes are supposed to be deleted (I didn't realize
    that), the first and the last subfield should be transposed and the
    middle string replaced by a two-digit number using a simple,
    "well-known" static mapping from twelve three character strings to
    numbers. This is exactly the kind of stuff which can be done very
    easily with perl, ie

    -------------
    %months = map { $_, sprintf('%02d', ++$n); } qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

    s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/, print while (<>);
    -------------

    and telling the OP that he should instead download a couple of
    thousand lines of code (probably; I've only counted the DM6 file, which
    comes to 691 LOC) spread over 972(!) different files, most
    of which are documented(!) as broken and are totally useless for the
    problem at hand is not something I'd call a sound piece of technical
    advice. It is probably possible to use a combine harvester instead of
    a lawnmower but nobody in his right mind would ever do that or suggest
    that others do it.
     
    Rainer Weikusat, Jan 6, 2013
    #11
  12. -i is useful in case you're one of those whose
    code never works the first time though...
    And you can always remove -i later.

    Sure, if you don't deal with this kind of
    transform often, yet another incantation is
    no big deal. And a simple regex can remain
    blissfully ignorant of the fact that it's
    dealing with dates. But then, if tweaks are
    needed, it's "deja vu all over again". Can't
    remember where to cut'n paste your old tweak..
    No problem. Just wade in and watch out for typos.


    I'd agree there are probably better solutions
    than pulling in the bloat of Date::Manip. But
    there are several good Date modules and it's
    all about leveraging code already written and
    working. Concern with "pulling in a big module"
    is almost always FUD - especially speed concerns.
    Additionally, if the input format changes, and
    those are dates after all, a good Date module
    probably has a method to cinch the code tweaks.
    One that's already written...

    Then why do we use a simple module function to
    escape HTML for instance.. rather than rolling
    our own? Sometimes a Swiss army knife - rather
    than scrounging around for a small pen knife -
    is worth the extra weight in a knapsack.
     
    C.DeRykus, Jan 6, 2013
    #12
  13. Ah, the Plonkers.

    el


     
    Dr Eberhard W Lisse, Jan 7, 2013
    #13
  14. Leading remark: I'm going to cut this somewhat short. I don't agree
    with your opinion on this, however, essentially repeating myself
    doesn't seem very useful to me, so I'm just going to address a few
    isolated points.
    What I was trying to get at was that it wouldn't be necessary to use
    the 'automatic backup' feature of -i if 'overwriting' (aka
    'destroying') the input file hadn't been requested to begin with: In this
    case, the processed data would go to stdout, immediately available for
    interactive inspection, and could be redirected to some other file if
    so desired at the user's discretion.

    [...]
    'Incantation' is IMO a very unfortunate choice for describing this. It
    is a sequence of instructions with exactly defined meaning which
    causes a machine to perform a specific function. That's a completely
    mundane thing with absolutely no 'magic' of any kind involved (except
    insofar as 'any sufficiently advanced technology is indistinguishable
    from magic' [as seen by someone who doesn't understand any of it]).

    [...]
    Hmm ... why would I?

    $text =~ s/([<>"'&])/'&#'.ord($1).';'/ge;
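
    To show what that substitution produces, here is a small sketch that
    wraps it in a sub. The name escape_html and the sample string are
    assumptions for illustration; the regex itself is the one above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above: replace the five
# characters < > " ' & with decimal numeric character references.
sub escape_html {
    my ($text) = @_;
    $text =~ s/([<>"'&])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print escape_html(q{<b>Tom & "Jerry"</b>}), "\n";
# prints &#60;b&#62;Tom &#38; &#34;Jerry&#34;&#60;/b&#62;
```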
     
    Rainer Weikusat, Jan 7, 2013
    #14
  15. For each line in the file, do something like this, assuming that $date contains a string that matches the date you want to change:
    1. my ($day, $month, $year) = split(/ /, $date);
    2. $date = sprintf("%04d-%02d-%02d", $year, $mo2num{$month}, $day);

    Line 1 splits your date string into the three components: day, month, year.
    Line 2 reassembles those three components and assigns the result back to $date.
    The hash table %mo2num looks like this:
    my %mo2num = (
        Jan => 1,
        Feb => 2,
        Mar => 3,
        # etc.
    );
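
    Put together, the two numbered lines and the hash might look like the
    following sketch. It assumes the month abbreviations are capitalised
    exactly as in the sample data ("Jan", not "JAN"), that the date has no
    surrounding quotes, and the sub name reformat_date is hypothetical.

```perl
use strict;
use warnings;

# Month abbreviations as they appear in the sample data, mapped to numbers.
my %mo2num = (
    Jan => 1, Feb => 2,  Mar => 3,  Apr => 4,
    May => 5, Jun => 6,  Jul => 7,  Aug => 8,
    Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper wrapping the two numbered lines above.
sub reformat_date {
    my ($date) = @_;
    my ($day, $month, $year) = split / /, $date;
    # %02d pads the month and day, so the hash values can stay plain numbers.
    return sprintf '%04d-%02d-%02d', $year, $mo2num{$month}, $day;
}

print reformat_date('05 Jan 2011'), "\n";    # prints 2011-01-05
```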

    CC.
     
    ccc31807, Jan 8, 2013
    #15
  16. Thanks,

    el

     
    Dr Eberhard Lisse, Jan 9, 2013
    #16
  17. And assuming the hash exists (I posted a command generating it two
    times), the format can be transformed with a substitution expression
    (which I also posted two times), namely

    s/"(\d+)\s+(\S+)\s+(\d+)"/$3-$mo2num{$2}-$1/
     
    Rainer Weikusat, Jan 9, 2013
    #17
  18. Don't forget that you can use perl's "command line" switches even when you put your program in a file.
    #!/usr/bin/perl -pi.bak
    BEGIN {
    %months = map {;$_, sprintf('%02d', ++$n)}
    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    }
    s/^"(\d+)\s+(\S+)\s+(\d+)"/"$3-$months{$2}-$1"/;
    __END__
     
    Ben Goldberg, Feb 12, 2013
    #18
  19. The 'BEGIN' serves no useful purpose here: %months needs to be
    initialized before the while-loop uses it. Since statements in a file
    are executed consecutively (anything else would probably be 'a little
    confusing' :), this will be the case with either variant.

    As I wrote in another posting: If perl hadn't been told to destroy the
    input file, also telling it to make a backup of that before doing so
    wasn't necessary. While this probably doesn't matter much for a
    trivial example like this, 'not using -i' also means that the code can
    be debugged and fixed without constantly renaming files or losing the
    original input file altogether in case the 'backup request' was
    accidentally forgotten. This also enables use of the script(let) as
    'another filter' in a more complicated pipeline.
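
    A rough sketch of the filter idea, assuming the quoted "DD Mon YYYY"
    field from the original question. The sub name filter_dates is invented
    for this example; the substitution is the one posted earlier in the
    thread. The input is only read, never overwritten.

```perl
use strict;
use warnings;

my $n = 0;
my %months = map { $_ => sprintf('%02d', ++$n) }
    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: copy lines from $in to $out, rewriting a leading
# "DD Mon YYYY" field to YYYY-MM-DD. Lines that do not match pass through
# unchanged; an unknown month name triggers an 'uninitialized' warning.
sub filter_dates {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/;
        print {$out} $line;
    }
    return;
}
```

    Called as filter_dates(\*STDIN, \*STDOUT), this reads standard input and
    writes standard output, so the result can be inspected, redirected to a
    new file, or piped into another command.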
     
    Rainer Weikusat, Feb 12, 2013
    #19
  20. BM> There's no need to muck about with the #! line and BEGIN blocks, both of
    BM> which would make it impossible to turn this into a subroutine later:

    BM> my %months = ...;

    BM> local $^I = ".bak";
    BM> while (<>) { ... }

    BM> The edit-in-place handling, including renaming the old file and opening
    BM> and selecting ARGVOUT, is done by the no-filehandle <> operator (or an
    BM> explicit <ARGV> or readline(ARGV)) whenever $^I is set. If you want to
    BM> in-place edit a custom list of files, you can also localise @ARGV.
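
    The quoted suggestion could be fleshed out roughly as follows. This is a
    sketch under assumptions: the sub name fix_dates_in_place is invented,
    and the substitution is the one from earlier in the thread. Localising
    $^I and @ARGV together keeps the in-place behaviour confined to the sub.

```perl
use strict;
use warnings;

my $n = 0;
my %months = map { $_ => sprintf('%02d', ++$n) }
    qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: rewrite the leading date field of each named file
# in place, keeping the original content as "<name>.bak".
sub fix_dates_in_place {
    return unless @_;    # an empty @ARGV would make <> read STDIN instead

    # With $^I set, <> renames each file and selects ARGVOUT for output.
    local ($^I, @ARGV) = ('.bak', @_);
    while (my $line = <>) {
        $line =~ s/^"(\d+)\s+(\S+)\s+(\d+)"/$3-$months{$2}-$1/;
        print $line;     # goes to the currently selected handle, ARGVOUT
    }
    return;
}

# Only act when file names were given on the command line.
fix_dates_in_place(@ARGV) if @ARGV;
```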

    and File::Slurp has edit_file and edit_file_lines which are even easier
    to use.

    i do need to add a backup file option to those.

    uri
     
    Uri Guttman, Feb 14, 2013
    #20