Working with Duplicates in Perl to generate Unique ID

esimbo · Jun 17, 2005

Hi

I have been tasked with producing a new input file which requires some
manipulation of a file to generate a unique ID. I have been advised
that Perl will be the simplest course of action here but in all
honesty, I'm not sure where to start.

My input file contains the following snippets of data.

Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627

Based on this file, I need to generate a new file containing the same
fields but with an added column for the Unique id.

The premise is simple. Check the Refno column and compare its value
with the corresponding value in the next row. If they match, append
both "I" and the Date to the Refno to generate the ID. It then iterates
through the rows, repeating the same step until it reaches the last
occurrence of the Refno. At the last occurrence of a Refno, i.e. just
before a new Refno sequence starts, we append a "P" instead.

Therefore, using the sample above, the result I would expect is as
follows

ID,Date,Amount, Refno
0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627

If anyone can provide any assistance here, I'd really be grateful.

Regards.

A. Sinan Unur · Jun 17, 2005

esimbo said:

I have been tasked with producing a new input file which requires some
manipulation of a file to generate a unique ID. I have been advised
that Perl will be the simplest course of action here but in all
honesty, I'm not sure where to start.

My input file contains the following snippets of data.

Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627
....

ID,Date,Amount, Refno
0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627

I would use a hash where each Refno is a key and the values are references
to arrays of hash references, assuming that the file is a reasonable size.
You will probably need

perldoc -f split
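
The data structure suggested above could look roughly like the following.
This is only a minimal sketch, not code from the thread: the names
%rows_for, @order and @out are made up for illustration, the sample rows
are hard-coded instead of being read from a file, and all rows sharing a
Refno are grouped together even when they are not adjacent in the input.

```perl
use strict;
use warnings;

# A few sample rows from the original post (header line omitted).
my @lines = (
    '2005/01/07, 00000.096532030000,#0000015511',
    '2005/06/07, 00006.963788280000,#0000015511',
    '2005/06/13, 00002.243425000000,#0000030502',
    '2005/02/18, 00002.243425000000,#0000040505',
);

my %rows_for;    # Refno => reference to an array of { date, amount } hashes
my @order;       # Refnos in the order they first appear

for my $line (@lines) {
    # Split on a comma plus optional whitespace, into at most three fields.
    my ( $date, $amount, $refno ) = split /,\s*/, $line, 3;
    $refno =~ s/^#//;    # drop the leading number sign
    push @order, $refno unless exists $rows_for{$refno};
    push @{ $rows_for{$refno} }, { date => $date, amount => $amount };
}

my @out;
for my $refno (@order) {
    my @rows = @{ $rows_for{$refno} };
    for my $i ( 0 .. $#rows ) {
        my $row  = $rows[$i];
        my $flag = $i == $#rows ? 'P' : 'I';    # last row of a group gets P
        push @out,
            "${refno}_${flag}_$row->{date}, $row->{date}, $row->{amount},#$refno";
    }
}

print "$_\n" for @out;
```

Keeping @order alongside the hash is one way to preserve the input order,
since a Perl hash does not remember the order in which keys were added.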

Given this information, you can write some code now. Then, if you have
problems with your code, please post again.

In the meantime, you might benefit from reading

perldoc perlreftut

as well as the posting guidelines for this group.

Sinan

kingpin2502 · Jun 17, 2005

Sinan

Thanks for your response. I've got a start, which is what I needed. I
must admit I wasn't aware of the rules prior to posting, but I'll read
them before I post again.

Thanks.

Emmon

John W. Krahn · Jun 18, 2005

I have been tasked with producing a new input file which requires some
manipulation of a file to generate a unique ID. I have been advised
that Perl will be the simplest course of action here but in all
honesty, I'm not sure where to start.

My input file contains the following snippets of data.

Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627

Based on this file, I need to generate a new file containing the same
fields but with an added column for the Unique id.

The premise is simple. Check the Refno column and compare its value
with the corresponding value in the next row. If they match, append
both "I" and the Date to the Refno to generate the ID. It then iterates
through the rows, repeating the same step until it reaches the last
occurrence of the Refno. At the last occurrence of a Refno, i.e. just
before a new Refno sequence starts, we append a "P" instead.

Therefore, using the sample above, the result I would expect is as
follows

ID,Date,Amount, Refno
0000015511_I_2005/01/07, 2005/01/07, 00000.096532030000,#0000015511
0000015511_P_2005/06/07, 2005/06/07, 00006.963788280000,#0000015511
0000030502_I_2005/06/13, 2005/06/13, 00002.243425000000,#0000030502
0000030502_I_2006/06/16, 2006/06/16, 00002.243425000000,#0000030502
0000030502_P_2006/06/16, 2006/06/16, 00047.230000000000,#0000030502
0000040505_P_2005/02/18, 2005/02/18, 00002.243425000000,#0000040505
0000030627_P_2005/02/13, 2005/02/13, 00001.738765000000,#0000030627

If anyone can provide any assistance here, I'd really be grateful.

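use warnings;
use strict;

my %seen;    # counts how often each Refno has been seen so far

# Walk the lines bottom-up: the first time a Refno turns up it is the last
# row of its group and gets 'P'; every earlier row of that Refno gets 'I'.
print
    reverse                                   # back to the original order
    map $_->[2]
        ? "$_->[2]_" . ( $seen{ $_->[2] }++ ? 'I' : 'P' ) . "_$_->[1], $_->[0]"
        : $_->[0],                            # no Refno (header): unchanged
    map [ $_, m!^([\d/]+)[^#]+#(\d+)$! ],     # [ line, date, refno ]
    reverse
    <DATA>;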

__DATA__
Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627

John

Ilmari Karonen · Jun 18, 2005

My input file contains the following snippets of data.

Date, Amount, Refno
2005/01/07, 00000.096532030000,#0000015511
2005/06/07, 00006.963788280000,#0000015511
2005/06/13, 00002.243425000000,#0000030502
2006/06/16, 00002.243425000000,#0000030502
2006/06/16, 00047.230000000000,#0000030502
2005/02/18, 00002.243425000000,#0000040505
2005/02/13, 00001.738765000000,#0000030627

The premise is simple. Check the Refno column and compare its value
with the corresponding value in the next row. If they match, append
both "I" and the Date to the Refno to generate the ID. It then iterates
through the rows, repeating the same step until it reaches the last
occurrence of the Refno. At the last occurrence of a Refno, i.e. just
before a new Refno sequence starts, we append a "P" instead.

Okay, since you need to look ahead to the next line, it would probably
be easiest to first slurp all the data and then iterate over it. We
can split each line into an array, which will make manipulating the
fields easier, and then reassemble the lines afterwards. So:

#!/usr/bin/perl
use warnings;
use strict;

my @lines = <>;    # slurp all lines from input
chomp @lines;      # remove newlines
shift @lines;      # remove first line (column names)

# split the lines on commas followed by a space or a number sign (#):
my @data = map [split /,[# ]/], @lines;

print "ID, Date, Amount,#Refno\n";    # print new header line

foreach my $i (0 .. $#data) {
    my ($date, $amount, $refno) = @{ $data[$i] };    # columns of this row
    my $next = $data[$i+1][-1] || "";                # last col of next row
    my $char = ($refno eq $next ? "I" : "P");        # I if equal, else P
    my $id   = join "_", $refno, $char, $date;       # construct id
    print "$id, $date, $amount,#$refno\n";           # print rebuilt line
}

There, that should do it. Hopefully the comments are clear enough
that you can see how it works. In fact, this turned out to be quite a
nice little example of several common Perl idioms.

One idiom that may not be immediately obvious is $data[$i+1][-1] || "".
The array indexing works just as the comment says, but the "logical
or" with an empty string may be puzzling. In fact, all it does is
eliminate an unnecessary warning. When we reach the last line, and
try to access the last column of the line after that, we get an
undefined value. The "logical or" replaces it with an empty string.
It won't affect the values on other lines, because those are all
considered by perl to be logically true.
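
The effect of the || "" idiom can be seen in isolation with a small
sketch like the one below. The two-row @data array and the @next array
are made up for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Two rows that have already been split into columns; Refno is last.
my @data = (
    [ '2005/01/07', '0000015511' ],
    [ '2005/06/07', '0000015511' ],
);

my @next;    # the "next Refno" seen from each row
for my $i ( 0 .. $#data ) {
    # On the last pass there is no row $i+1, so the lookup yields undef;
    # || "" replaces it with an empty string before any eq comparison.
    my $next = $data[ $i + 1 ][-1] || "";
    push @next, $next;
}

print join( '|', map { "[$_]" } @next ), "\n";    # prints [0000015511]|[]
```

On Perl 5.10 or later, the defined-or operator // could be used instead of
||. It only replaces undefined values, so a field that is legitimately 0
would be kept, although that makes no difference for Refno values like
the ones here.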

kingpin2502 · Jun 20, 2005

Jim

I am very grateful for this.

Thank you
Emmon

kingpin2502 · Jun 20, 2005

Hi Ilmari

That was very clear, thank you. I appreciate that very much.

Thanks
Emmon

Sherm Pendley · Jun 20, 2005

kingpin2502 said:
I am very grateful for this.

For what?

sherm--

kingpin2502 · Jun 20, 2005

John

Thanks for your help with this. I really appreciated the help

Thanks
Emmon

Sherm Pendley · Jun 20, 2005

kingpin2502 said:
That was very clear thank you. I appreciate that very much.

*What* was very clear? Please quote enough of the message you're replying
to to provide sufficient context.

sherm--

Sherm Pendley · Jun 20, 2005

kingpin2502 said:
Thanks for your help with this. I really appreciated the help

Whose help, with what?

sherm--

Sherm Pendley · Jun 20, 2005

kingpin2502 said:
I was replying to Ilmari's comments, he wanted to know whether his
comments were clear.

What are you talking about? Ilmari's comments may have been clear, but
yours aren't. Please quote the relevant parts of the message you're
replying to, so that your own comments make sense.

sherm--

kingpin2502 · Jun 20, 2005

Sherm

I was replying to Ilmari's comments; he wanted to know whether his
comments were clear. The other responses were all individual thank-yous
for the responses I got. I wasn't aware at the time that it didn't
quote the original text in the reply.

John Bokma · Jun 20, 2005

Sherm Pendley said:
Whose help, with what?

John's. Oh, and once is enough btw.

Sherm Pendley · Jun 21, 2005

kingpin2502 said:
I'm really not sure where you're going with this. Can you state the
relevance here?

Where I'm going with what? The relevance of what?

Please quote the relevant parts of the messages you're replying to - the rest
of us aren't mind-readers.

I don't see the need to copy
and paste the whole mail I was responding to when all I want to do is
say Thank You.

If you're responding to an email, why would you post the response here in a
usenet group?

You can quite clearly see it in the thread

No, I can't. I'm not using Google Groups, I'm using a news reader. I'm not
looking at a thread, I'm looking at a message. A message that makes no sense
to me because you're making invalid assumptions about what I can see along
with your message.

sherm--

kingpin2502 · Jun 21, 2005

Sherm

I'm really not sure where you're going with this. Can you state the
relevance here? As I have already stated (I'm not quite sure how much
clearer you'd like me to be), I was simply saying thank you to the people
who took time to respond to my query. If you look at the thread, you'll
find they are all replies to the authors. I don't see the need to copy
and paste the whole mail I was responding to when all I want to do is
say Thank You. You can quite clearly see it in the thread who I have
replied to.

A. Sinan Unur · Jun 21, 2005

I'm really not sure where you're going with this. Can you state the
relevance here?

Who knows?

Please quote an appropriate amount of context when replying.

Sinan

Jürgen Exner · Jun 21, 2005

kingpin2502 said:
Sherm

I'm really not sure where you're going with this.

What is "this"? Please quote some context such that people have a chance to
know what you are talking about.

Can you state the
relevance here? As I have already stated (I'm not quite sure how much
clearer you'd like me to be), I was simply saying thank you to the people
who took time to respond to my query.

That is very commendable; most people forget that step.

If you look at the thread,
you'll find they are all replies to the authors.

You don't seem to know much about Usenet. Because of its asynchronous,
distributed implementation there is no guarantee that articles
- arrive on a server in a specific order
- arrive on a server at all
- are available on a server at any specific moment in time
- are visible to a user now
- have been visible to a user in the past
- will ever be visible to a user
To make a long story short: you can never assume that Joe Reader can see or
has seen the same set of articles as you.

Therefore, and to make reading more efficient (no need to scroll back to a
previous article and, most important, knowing exactly which part of a
preceding article someone is commenting on), it has been a proven Usenet
custom for the last two decades to quote just enough context from the
preceding article that your posting is understandable without someone
reading the preceding article. He may not have had a chance to read it.

Now, for a general thank you it is quite customary to follow-up to your own
posting and just to say "Thanks to all who replied, I will try your
suggestions" or something to that effect.

I don't see the need
to copy and paste the whole mail I was responding to when all I want
to do is say Thank You.

That would be quite stupid and frowned upon indeed. You should quote enough
context that your reply is understandable on its own without someone
reading the preceding posting.

BTW: this is Usenet and there are no mails in Usenet.

You can quite clearly see it in the thread
who I have replied to.

Probably not. _You_ can probably see it, but other people will not because
their view of the thread is different.

jue

Tad McClellan · Jun 21, 2005

kingpin2502 said:
If you look at the thread, you'll
find

How do you know what articles have reached _my_ newserver?

How do you know how articles are displayed to me?

You can quite clearly see it in the thread who I have
replied to.

That is just the point. We *cannot* see that quite clearly.

David Combs · Jul 13, 2005

How do you know what articles have reached _my_ newserver?

How do you know how articles are displayed to me?

That is just the point. We *cannot* see that quite clearly.

I don't know what you guys are using for newsreaders,
but I'm using trn aka trn4, which has the wonderful
feature of drawing a wee tree (root at left, grows to
the right) of the surrounding part of the current thread, eg for
*this* thread:

| Comp.lang.perl.misc #553640 (45 + 1952 more)                   --(1)--(1)
| From: Tad McClellan (e-mail address removed)                   --(1)--(1)--(1)
| [1] Re: Working with Duplicates in Perl to generate Unique ID  --(1)--(1)--(1)--(1)--(1)+-(1)
| Reply-To: (e-mail address removed)                                                      |-(1)
| Date: Tue Jun 21 12:24:29 EDT 2005                                                      |-(1)
| Lines: 22                                                                               \-(1)

(any post not yet read is shown in square-brackets;
the digit within is for the sub-thread, eg where
someone changes the subject but continues on
with the same thread.)

Also shows where you currently are in the thread.

And you can use the arrow-keys to traverse the thing.

So, having this tree-thing, it's pretty obvious what
a post is replying to.

And here's the entire tree:

| [1] Working with Duplicates in Perl to generate Unique ID
|
| (1)+-(1)--(1)
|    |-(1)--(1)--(1)
|    |-(1)--(1)--(1)--(1)
|    \-(1)--(1)--(1)--(1)--(1)--(1)+-(1)
|                                  |-(1)
|                                  |-(1)
|                                  \-(1)
|
| End of article 553640 (of 555115) -- what next? [npq]
|

(they all show round-parens because I'm replying to the
final post in the thread.)

So, maybe you're giving that guy a needlessly-hard time,
when all he's doing is saying "thanks" (for the prior
post's solution).

Suggestion: maybe switch to trn4 -- or if not that,
then look at its source and lift the code it
uses to draw the tree.

Man, without the tree, I'd be totally lost, reading
newsgroups!

David
