Working with Duplicates in Perl to generate Unique ID

Discussion in 'Perl Misc' started by esimbo@gmail.com, Jun 17, 2005.

  1. Guest

    Hi

    I have been tasked with producing a new input file which requires some
    manipulation of a file to generate a unique ID. I have been advised
    that Perl will be the simplest course of action here but in all
    honesty, I'm not sure where to start.

    My input file contains the following snippets of data.

    Date, Amount, Refno
    2005/01/07, 00000.096532030000,#0000015511
    2005/06/07, 00006.963788280000,#0000015511
    2005/06/13, 00002.243425000000,#0000030502
    2006/06/16, 00002.243425000000,#0000030502
    2006/06/16, 00047.230000000000,#0000030502
    2005/02/18, 00002.243425000000,#0000040505
    2005/02/13, 00001.738765000000,#0000030627

    Based on this file, I need to generate a new file containing the same
    fields but with an added column for the Unique id.

    The premise is simple. Check the Refno column and compare its value
    against the corresponding value in the next row. If they match, append
    both "I" and the Date to the Refno to generate the ID. Repeat the same
    step for each row until the last occurrence of the Refno is reached.
    At the last occurrence of a Refno, i.e. where the next row starts a
    new Refno sequence, append a "P" instead of the "I".

    Therefore, using the sample above, the result I would expect is as
    follows

    ID,Date,Amount, Refno
    0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
    0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
    0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
    0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
    0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
    0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
    0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627

    If anyone can provide any assistance here, I'd really be grateful.

    Regards.
     
    , Jun 17, 2005
    #1

  2. wrote in news:1119019737.746603.282920
    @o13g2000cwo.googlegroups.com:

    > I have been tasked with producing a new input file which requires some
    > manipulation of a file to generate a unique ID. I have been advised
    > that Perl will be the simplest course of action here but in all
    > honesty, I'm not sure where to start.
    >
    > My input file contains the following snippets of data.
    >
    > Date, Amount, Refno
    > 2005/01/07, 00000.096532030000,#0000015511
    > 2005/06/07, 00006.963788280000,#0000015511
    > 2005/06/13, 00002.243425000000,#0000030502
    > 2006/06/16, 00002.243425000000,#0000030502
    > 2006/06/16, 00047.230000000000,#0000030502
    > 2005/02/18, 00002.243425000000,#0000040505
    > 2005/02/13, 00001.738765000000,#0000030627


    ....

    > ID,Date,Amount, Refno
    > 0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
    > 0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
    > 0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
    > 0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
    > 0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
    > 0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
    > 0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627


    I would use a hash where each Refno is a key, and values are references
    to arrays of hash references, assuming that the file is a reasonable size.
    You will probably need

    perldoc -f split

    Given this information, you can write some code now. Then, if you have
    problems with your code, please post again.

    In the meantime, you might benefit from reading

    perldoc perlreftut

    as well as the posting guidelines for this group.
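
    For readers following along, here is a minimal sketch of the data
    structure described above. It is not code from this post: the sub name
    group_by_refno and the hash keys date and amount are made up for
    illustration. It also assumes that all rows sharing a Refno should be
    treated as one group, which matches the sample data, where equal Refnos
    are always adjacent.

```perl
use strict;
use warnings;

# Group rows by Refno: each hash value is a reference to an array of
# hash references, one per input row. All names here are illustrative.
sub group_by_refno {
    my @lines = @_;
    my ( %rows_for, @order );
    for my $line (@lines) {
        chomp( my $copy = $line );
        my ( $date, $amount, $refno ) = split /\s*,\s*/, $copy;
        next unless defined $refno && $refno =~ /^#\d+$/;    # skips the header
        push @order, $refno unless exists $rows_for{$refno}; # remember first-seen order
        push @{ $rows_for{$refno} }, { date => $date, amount => $amount };
    }
    return ( \%rows_for, \@order );
}

my ( $rows_for, $order ) = group_by_refno(
    "Date, Amount, Refno\n",
    "2005/01/07, 00000.096532030000,#0000015511\n",
    "2005/06/07, 00006.963788280000,#0000015511\n",
    "2005/02/18, 00002.243425000000,#0000040505\n",
);

for my $refno (@$order) {
    my @rows = @{ $rows_for->{$refno} };
    ( my $digits = $refno ) =~ s/^#//;          # the ID uses the Refno without '#'
    for my $i ( 0 .. $#rows ) {
        my $flag = $i == $#rows ? 'P' : 'I';    # last row of the group gets P
        print join( '_', $digits, $flag, $rows[$i]{date} ),
            ", $rows[$i]{date}, $rows[$i]{amount},$refno\n";
    }
}
```

    The @order array is only there because a hash does not remember
    insertion order; without it the groups would come out in arbitrary order.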

    Sinan


    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Jun 17, 2005
    #2

  3. kingpin2502 Guest

    Sinan

    Thanks for your response. I've got a start, which is what I needed. I
    must admit I wasn't aware of the rules prior to posting but I'll read
    them before I post again.

    Thanks.

    Emmon
     
    kingpin2502, Jun 17, 2005
    #3
  4. wrote:
    >
    > I have been tasked with producing a new input file which requires some
    > manipulation of a file to generate a unique ID. I have been advised
    > that Perl will be the simplest course of action here but in all
    > honesty, I'm not sure where to start.
    >
    > My input file contains the following snippets of data.
    >
    > Date, Amount, Refno
    > 2005/01/07, 00000.096532030000,#0000015511
    > 2005/06/07, 00006.963788280000,#0000015511
    > 2005/06/13, 00002.243425000000,#0000030502
    > 2006/06/16, 00002.243425000000,#0000030502
    > 2006/06/16, 00047.230000000000,#0000030502
    > 2005/02/18, 00002.243425000000,#0000040505
    > 2005/02/13, 00001.738765000000,#0000030627
    >
    > Based on this file, I need to generate a new file containing the same
    > fields but with an added column for the Unique id.
    >
    > The premise is simple. Check the Refno column and compare its value
    > against the corresponding value in the next row. If they match, append
    > both "I" and the Date to the Refno to generate the ID. Repeat the same
    > step for each row until the last occurrence of the Refno is reached.
    > At the last occurrence of a Refno, i.e. where the next row starts a
    > new Refno sequence, append a "P" instead of the "I".
    >
    > Therefore, using the sample above, the result I would expect is as
    > follows
    >
    > ID,Date,Amount, Refno
    > 0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
    > 0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
    > 0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
    > 0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
    > 0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
    > 0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
    > 0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627
    >
    > If anyone can provide any assistance here, I'd really be grateful.


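    use warnings;
    use strict;

    my %seen;

    # Process the lines bottom-up so the last row of each Refno is met first
    # and gets 'P'; every earlier row with the same Refno gets 'I'.
    # Lines that do not parse (the header) are passed through unchanged.
    print
        reverse
        map $_->[2] ? "$_->[2]_" . ( $seen{ $_->[2] }++ ? 'I' : 'P' ) .
                      "_$_->[1], $_->[0]" : $_->[0],
        map [ $_, m!^([\d/]+)[^#]+#(\d+)$! ],   # [ line, date, refno ]
        reverse
        <DATA>;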


    __DATA__
    Date, Amount, Refno
    2005/01/07, 00000.096532030000,#0000015511
    2005/06/07, 00006.963788280000,#0000015511
    2005/06/13, 00002.243425000000,#0000030502
    2006/06/16, 00002.243425000000,#0000030502
    2006/06/16, 00047.230000000000,#0000030502
    2005/02/18, 00002.243425000000,#0000040505
    2005/02/13, 00001.738765000000,#0000030627
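
    For anyone who finds the chained map/reverse version above dense, the
    same idea can be spelled out as an explicit loop. This is only a sketch
    of the technique, not code from this post; the sub name tag_rows is
    invented here. As in the version above, %seen is keyed on the Refno for
    the whole input, so it assumes equal Refnos sit on adjacent rows.

```perl
use strict;
use warnings;

# Walk the rows from last to first. The first time a Refno is met in that
# direction it is the final row of its run, so it gets 'P'; later sightings
# (earlier rows in file order) get 'I'.
sub tag_rows {
    my @lines = @_;
    my ( %seen, @out );
    for my $line ( reverse @lines ) {
        if ( my ( $date, $refno ) = $line =~ m!^([\d/]+)[^#]+#(\d+)$! ) {
            my $flag = $seen{$refno}++ ? 'I' : 'P';
            unshift @out, "${refno}_${flag}_$date, $line";
        }
        else {
            unshift @out, $line;    # header or unparsable line: pass through
        }
    }
    return @out;                    # back in original file order
}

print tag_rows(
    "Date, Amount, Refno\n",
    "2005/06/13, 00002.243425000000,#0000030502\n",
    "2006/06/16, 00002.243425000000,#0000030502\n",
    "2006/06/16, 00047.230000000000,#0000030502\n",
    "2005/02/18, 00002.243425000000,#0000040505\n",
);
```

    Using unshift while iterating in reverse puts the output back in file
    order, which is what the outer reverse does in the chained version.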



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Jun 18, 2005
    #4
  5. <> kirjoitti 17.06.2005:
    >
    > My input file contains the following snippets of data.
    >
    > Date, Amount, Refno
    > 2005/01/07, 00000.096532030000,#0000015511
    > 2005/06/07, 00006.963788280000,#0000015511
    > 2005/06/13, 00002.243425000000,#0000030502
    > 2006/06/16, 00002.243425000000,#0000030502
    > 2006/06/16, 00047.230000000000,#0000030502
    > 2005/02/18, 00002.243425000000,#0000040505
    > 2005/02/13, 00001.738765000000,#0000030627
    >
    > The premise is simple. Check the refno column and match against that
    > value against the corresponding value in the next row. If they both
    > match, then append both "I" and the Date to the Refno to generate
    > the ID. It then iterates through the rows repeating the same step until
    > it reaches the last occurrence of the Refno. When we reach the last
    > occurrence of the Refno, i.e. when the next row starts a new Refno
    > sequence, we append a "P" instead.


    Okay, since you need to look ahead to the next line, it would probably
    be easiest to first slurp all the data and then iterate over it. We
    can split each line into an array, which will make manipulating the
    fields easier, and then reassemble the lines afterwards. So:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my @lines = <>;   # slurp all lines from input
    chomp @lines;     # remove newlines
    shift @lines;     # remove first line (column names)

    # split the lines on commas followed by a space or a number sign (#):
    my @data = map [split /,[# ]/], @lines;

    print "ID, Date, Amount, Refno\n";   # print new header line

    foreach my $i (0 .. $#data) {
        my ($date, $amount, $refno) = @{ $data[$i] };   # columns of this row
        my $next = $data[$i+1][-1] || "";               # last col of next row
        my $char = ($refno eq $next ? "I" : "P");       # I if equal, else P
        my $id = join "_", $refno, $char, $date;        # construct id
        print "$id, $date, $amount,#$refno\n";          # print rebuilt line
    }

    There, that should do it. Hopefully the comments are clear enough
    that you can see how it works. In fact, this turned out to be quite a
    nice little example of several common Perl idioms.
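
    The script above slurps the whole file, which is fine for input of a
    reasonable size. If the file were too large for that, one possible
    variation (a sketch only, with the sub names tag_stream and format_row
    invented for this example) is to hold back a single row and label it
    once the following row, or the end of input, is known:

```perl
use strict;
use warnings;

# Build one output line; a row gets 'I' when the next row has the same Refno.
sub format_row {
    my ( $row, $next_refno ) = @_;
    my ( $date, $amount, $refno ) = @$row;
    my $char = $refno eq $next_refno ? 'I' : 'P';
    return join( '_', $refno, $char, $date ) . ", $date, $amount,#$refno\n";
}

# Read from $in line by line, keeping only the previous row in memory.
sub tag_stream {
    my ( $in, $out ) = @_;
    my $header = <$in>;                       # skip the column names
    print {$out} "ID, Date, Amount, Refno\n";
    my $prev;
    while ( my $line = <$in> ) {
        chomp $line;
        my @cols = split /,[# ]/, $line;
        next unless @cols == 3;               # ignore blank or malformed lines
        print {$out} format_row( $prev, $cols[2] ) if $prev;
        $prev = \@cols;
    }
    print {$out} format_row( $prev, '' ) if $prev;   # final row always ends a run
    return;
}

# Demonstration on an in-memory copy of part of the sample data.
my $input = <<'END';
Date, Amount, Refno
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
END
open my $in, '<', \$input or die "cannot open in-memory input: $!";
tag_stream( $in, \*STDOUT );
```

    For a real file you would pass an opened filehandle (or \*STDIN)
    instead of the in-memory one used for the demonstration.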

    One idiom that may not be immediately obvious is $data[$i+1][-1] || "".
    The array indexing works just as the comment says, but the "logical
    or" with an empty string may be puzzling. In fact, all it does is
    eliminate an unnecessary warning. When we reach the last line, and
    try to access the last column of the line after that, we get an
    undefined value. The "logical or" replaces it with an empty string.
    It won't affect the values on other lines, because those are all
    considered by perl to be logically true.
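
    A side note for readers on newer perls: the || trick relies on every
    real value being logically true. A last column that was exactly "0" or
    empty would also be replaced. On Perl 5.10 or later (which is newer
    than this thread) the defined-or operator // only replaces undefined
    values. The helper names below are made up for this small comparison:

```perl
use strict;
use warnings;
use 5.010;    # the defined-or operator // needs Perl 5.10 or later

# Replace any false value (undef, "", "0") with an empty string.
sub default_or  { my ($v) = @_; return $v || "" }

# Replace only an undefined value with an empty string.
sub default_dor { my ($v) = @_; return $v // "" }

for my $v ( "0000015511", "0", undef ) {
    printf "%-14s ||: '%s'   //: '%s'\n",
        defined $v ? "'$v'" : 'undef', default_or($v), default_dor($v);
}
```

    For the sample data both forms behave the same, since a Refno such as
    "0000015511" is a true string; the difference only shows for "0".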

    --
    Ilmari Karonen
    To reply by e-mail, please replace ".invalid" with ".net" in address.
     
    Ilmari Karonen, Jun 18, 2005
    #5
  6. kingpin2502 Guest

    Jim

    I am very grateful for this.

    Thank you
    Emmon
     
    kingpin2502, Jun 20, 2005
    #6
  7. kingpin2502 Guest

    Hi Ilmari

    That was very clear thank you. I appreciate that very much.

    Thanks
    Emmon
     
    kingpin2502, Jun 20, 2005
    #7
  8. "kingpin2502" <> writes:

    > I am very grateful for this.


    For what?

    sherm--
     
    Sherm Pendley, Jun 20, 2005
    #8
  9. kingpin2502 Guest

    John

    Thanks for your help with this. I really appreciated the help

    Thanks
    Emmon
     
    kingpin2502, Jun 20, 2005
    #9
  10. "kingpin2502" <> writes:

    > That was very clear thank you. I appreciate that very much.


    *What* was very clear? Please quote enough of the message you're replying
    to to provide sufficient context.

    sherm--
     
    Sherm Pendley, Jun 20, 2005
    #10
  11. "kingpin2502" <> writes:

    > Thanks for your help with this. I really appreciated the help


    Whose help, with what?

    sherm--
     
    Sherm Pendley, Jun 20, 2005
    #11
  12. "kingpin2502" <> writes:

    > I was replying to Ilmari's comments, he wanted to know whether his
    > comments were clear.


    What are you talking about? Ilmari's comments may have been clear, but
    yours aren't. Please quote the relevant parts of the message you're
    replying to, so that your own comments make sense.

    sherm--
     
    Sherm Pendley, Jun 20, 2005
    #12
  13. kingpin2502 Guest

    Sherm

    I was replying to Ilmari's comments, he wanted to know whether his
    comments were clear. The other responses were all individual thank yous
    to the responses I got. I wasn't aware at the time, that it didn't
    quote the original text in the reply
     
    kingpin2502, Jun 20, 2005
    #13
  14. John Bokma Guest

    John Bokma, Jun 20, 2005
    #14
  15. "kingpin2502" <> writes:

    > I'm really not sure where you're going with this. Can you state the
    > relevance here?


    Where I'm going with what? The relevance of what?

    Please quote the relevant parts of the messages you're replying to - the rest
    of us aren't mind-readers.

    > I don't see the need to copy
    > and paste the whole mail I was responding to when all I want to do is
    > say Thank You.


    If you're responding to an email, why would you post the response here in a
    usenet group?

    > You can quite clearly see it in the thread


    No, I can't. I'm not using Google Groups, I'm using a news reader. I'm not
    looking at a thread, I'm looking at a message. A message that makes no sense
    to me because you're making invalid assumptions about what I can see along
    with your message.

    sherm--
     
    Sherm Pendley, Jun 21, 2005
    #15
  16. kingpin2502 Guest

    Sherm

    I'm really not sure where you're going with this. Can you state the
    relevance here? As I have already stated (I'm not quite sure how much
    clearer you'd like me to be), I was simply saying thank you to the people
    who took time to respond to my query. If you look at the thread, you'll
    find they are all replies to the authors. I don't see the need to copy
    and paste the whole mail I was responding to when all I want to do is
    say Thank You. You can quite clearly see it in the thread who I have
    replied to.
     
    kingpin2502, Jun 21, 2005
    #16
  17. "kingpin2502" <> wrote in news:1119357774.789912.164810
    @o13g2000cwo.googlegroups.com:

    > I'm really not sure where you're going with this. Can you state the
    > relevance here?


    Who knows?

    Please quote an appropriate amount of context when replying.

    Sinan
    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Jun 21, 2005
    #17
  18. kingpin2502 wrote:
    > Sherm
    >
    > I'm really not sure where you're going with this.


    What is "this"? Please quote some context such that people have a chance to
    know what you are talking about.

    > Can you state the
    > relevance here? As I have already stated (I'm not quite sure how much
    > clearer you'd like me to be), I was simply saying thank you to the people
    > who took time to respond to my query.


    That is very commendable; most people forget that step.

    > If you look at the thread,
    > you'll find they are all replies to the authors.


    You don't seem to know much about Usenet. Because of its asynchronous,
    distributed implementation there is no guarantee that articles
    - arrive on a server in a specific order
    - arrive on a server at all
    - are available on a server at any specific moment in time
    - are visible to a user now
    - have been visible to a user in the past
    - will ever be visible to a user
    To make a long story short: you can never assume that Joe Reader can see or
    has seen the same set of articles as you.

    Therefore, and to make reading more efficient (no need to scroll back to a
    previous article and, most importantly, knowing exactly which part of a
    preceding article someone is commenting on), it has been a proven Usenet
    custom for the last two decades to quote just enough context from the
    preceding article that your posting is understandable without someone
    reading the preceding article. They may not have had a chance to read it.

    Now, for a general thank you it is quite customary to follow-up to your own
    posting and just to say "Thanks to all who replied, I will try your
    suggestions" or something to that effect.

    > I don't see the need
    > to copy and paste the whole mail I was responding to when all I want
    > to do is say Thank You.


    That would be quite stupid and frowned upon indeed. You should quote enough
    context that your reply is understandable on its own without someone
    reading the preceding posting.

    BTW: this is Usenet and there are no mails in Usenet.

    > You can quite clearly see it in the thread
    > who I have replied to.


    Probably not. _You_ can probably see it, but other people will not because
    their view of the thread is different.

    jue
     
    Jürgen Exner, Jun 21, 2005
    #18
  19. kingpin2502 <> wrote:

    > If you look at the thread, you'll
    > find



    How do you know what articles have reached _my_ news server?

    How do you know how articles are displayed to me?


    > You can quite clearly see it in the thread who I have
    > replied to.



    That is just the point. We *cannot* see that quite clearly.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Jun 21, 2005
    #19
  20. David Combs Guest

    In article <>,
    Tad McClellan <> wrote:
    >kingpin2502 <> wrote:
    >
    >> If you look at the thread, you'll
    >> find

    >
    >
    >How do you know what articles have reached _my_ news server?
    >
    >How do you know how articles are displayed to me?
    >
    >
    >> You can quite clearly see it in the thread who I have
    >> replied to.

    >
    >
    >That is just the point. We *cannot* see that quite clearly.
    >
    >
    >--
    > Tad McClellan SGML consulting
    > Perl programming
    > Fort Worth, Texas




    I don't know what you guys are using for newsreaders,
    but I'm using trn aka trn4, which has the wonderful
    feature of drawing a wee tree (root at left, grows to
    the right) of the surrounding part of the current thread, eg for
    *this* thread:



    | Comp.lang.perl.misc #553640 (45 + 1952 more)                   --(1)--(1)
    | From: Tad McClellan <>                                         --(1)--(1)--(1)
    | [1] Re: Working with Duplicates in Perl to generate Unique ID  --(1)--(1)--(1)--(1)--(1)+-(1)
    | Reply-To:                                                                               |-(1)
    | Date: Tue Jun 21 12:24:29 EDT 2005                                                      |-(1)
    | Lines: 22                                                                               \-(1)



    (any post not yet read is shown in square-brackets;
    the digit within is for the sub-thread, eg where
    someone changes the subject but continues on
    with the same thread.)

    Also shows where you currently are in the thread.

    And you can use the arrow-keys to traverse the thing.

    So, having this tree-thing, it's pretty obvious what
    a post is replying to.

    And here's the entire tree:

    | [1] Working with Duplicates in Perl to generate Unique ID
    |
    | (1)+-(1)--(1)
    |    |-(1)--(1)--(1)
    |    |-(1)--(1)--(1)--(1)
    |    \-(1)--(1)--(1)--(1)--(1)--(1)+-(1)
    |                                  |-(1)
    |                                  |-(1)
    |                                  \-(1)
    |
    | End of article 553640 (of 555115) -- what next? [npq]
    |

    (they all show round-parens because I'm replying to the
    final post in the thread.)


    So, maybe you're giving that guy a needlessly-hard time,
    when all he's doing is saying "thanks" (for the prior
    post's solution).

    Suggestion: maybe switch to trn4 -- or if not that,
    then look at its source and lift the code it
    uses to draw the tree.

    Man, without the tree, I'd be totally lost, reading
    newsgroups!


    David
     
    David Combs, Jul 13, 2005
    #20