A Perl parsing question..

clearguy02 · Jun 12, 2005

Hi experts,

Below is my scenario:

I have the C:\groups.txt file below, which lists city names, each followed
by the people belonging to that city. A portion of the file follows:
===============================
1. Pleasanton
rob
smith
james
nick

2. Livermore
sarah
mary
linda

3. CA
rob
jose
smith
james
nick
sarah
maria
mary
linda

4. Chicago
james
bob
vivian

5. IL
james
bob
simon
vivian
ann
bruce
===================================

I am working on a script that should produce the following output:
=======================
1. Parent of Pleasanton is CA.
2. Parent of Livermore is CA.
3. Parent of Chicago is IL.
========================

I am currently using hashes, grep, etc. in the script, but I have had
no success. Can someone kindly help me with the right algorithm here?

Thanks,
Jim.

Heini · Jun 12, 2005

(e-mail address removed) wrote:

Hi experts,

This guy keeps sending these questions all over, apparently to collect
names or email addresses for some purpose. The sender's name and email
address may vary, but the pattern and style of the messages are always
the same. It's always about using Perl to rearrange a text file.

clearguy02 · Jun 12, 2005

I am not collecting any names or email addresses here.

I am a manager, not a full-time programmer. Once in a while I need to
parse the text files I receive...

Please don't jump to conclusions without thinking about what the truth
is. I have never posted any junk mail. I don't work on Perl scripts
full-time, and I sometimes struggle to come up with the right script.

Thanks,
Jim.

Heini · Jun 12, 2005

(e-mail address removed) wrote:

Please don't jump to conclusions without thinking about what the truth
is. I have never posted any junk mail. I don't work on Perl scripts
full-time, and I sometimes struggle to come up with the right script.

Right away? I didn't, the first time. But I have seen the same question
pop up too many times, in several different forums, over a long time
span. There's no way you could be struggling with the same simple
problem for so long. No way.

clearguy02 · Jun 12, 2005

Can you go ahead and show me where I posted this same question earlier
in the other groups?

Did you read my whole message? Yes, I usually post all my text-parsing
questions, because the answers help with my reports.

The question I posted today is a new issue, and I have never posted it
anywhere else.

Instead of wasting time arguing, why don't you read my message and
suggest a solution?

Thanks,
Jim.

xhoster · Jun 12, 2005

Hi experts,

Below is my scenario:

I have the C:\groups.txt file below, which lists city names, each followed
by the people belonging to that city. A portion of the file follows:
===============================
1. Pleasanton
rob
smith
james
nick

2. Livermore
sarah
mary
linda

3. CA

I'm assuming that, contrary to your description, "CA" is not a city.

I am working on a script that should produce the following output:
=======================
1. Parent of Pleasanton is CA.
2. Parent of Livermore is CA.
3. Parent of Chicago is IL.
========================

Contrary to your subject, this doesn't seem to be a parsing question.
It is perhaps an inference question, or maybe an implementation of an
inference algorithm.

What criteria, exactly, do you wish to be used to determine that the above
output is the appropriate output? 100% of the people in Pleasanton are
also in CA? 50% of them? 50% are in CA and none are anywhere else?

I am currently using hashes, grep, etc. in the script, but I have had
no success. Can someone kindly help me with the right algorithm here?

Without knowing what the algorithm is to do, it is hard to help you.

To get the number of elements that overlap between two hashes, you can do:

    my $overlap = grep exists $h1{$_}, keys %h2;

Obviously, if $overlap == keys %h2, then %h2 is completely contained in %h1.
Does that help?
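A small sketch of the overlap test above, applied to two groups from the sample file. `%h1` (CA) and `%h2` (Pleasanton) are hypothetical hashes built from the sample names, with the names as keys; the block form `grep { ... }` used here is equivalent to the expression form above. The `$fraction` line is only an illustration of how a looser criterion (such as the 50% case) could be measured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sets built from the sample data; only the keys matter.
my %h1 = map { $_ => 1 } qw(rob jose smith james nick sarah maria mary linda);  # CA
my %h2 = map { $_ => 1 } qw(rob smith james nick);                               # Pleasanton

# grep in scalar context returns the number of matching elements.
my $overlap = grep { exists $h1{$_} } keys %h2;

# %h2 is completely contained in %h1 when every key of %h2 was found.
my $contained = $overlap == scalar( keys %h2 );

# Fraction of %h2 found in %h1, for a looser criterion such as 50%.
my $fraction = $overlap / scalar( keys %h2 );

printf "overlap=%d contained=%s fraction=%.2f\n",
    $overlap, ( $contained ? 'yes' : 'no' ), $fraction;
```

With the sample data every Pleasanton name is also in CA, so the count equals the number of keys in `%h2`.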

Xho

clearguy02 · Jun 13, 2005

Thanks, Xho.

We know that all (100%) of the Pleasanton folks are also in CA, so we
don't need to check that in the code. I just want to store the
cities/states in a hash or array, and then the people belonging to each
one in another array (or as the values of the hash).

Yes, CA is not a city; I just used it as an example.

I am confused about how to break the whole input file into two hashes.

Thanks,
Jim.

Fabian Pilkowski · Jun 13, 2005

* (e-mail address removed) said:
I have the C:\groups.txt file below, which lists city names, each followed
by the people belonging to that city. A portion of the file follows:

Read this file in first and save all the data in a hash. The cities are
the hash keys, each referring to an array of its inhabitants. We can read
the file in paragraph mode to avoid regular expressions in this case. A
split() with no arguments splits on whitespace and discards the newlines
implicitly. Nice ;-)

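#!/usr/bin/perl -w
use strict;

my %hash;    # group name => reference to an array of its people
open my $fh, '<', 'c:/groups.txt' or die "Cannot open c:/groups.txt: $!";
{
    local $/ = "";    # paragraph mode: each read returns one blank-line-separated block
    while ( <$fh> ) {
        # split with no arguments splits $_ on whitespace:
        # the "1." counter (discarded), the city name, then the names
        my( undef, $city, @names ) = split;
        $hash{ $city } = \@names;
    }
}
close $fh or die $!;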
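To try the paragraph-mode parse above without c:/groups.txt on disk, the sketch below reads the same kind of data from an in-memory filehandle (`open` on a scalar reference, available since Perl 5.8). It assumes the file looks exactly like the sample: a numbered header line per group, one name per line, and a blank line between groups. The explicit `split ' ', $record` is the bare `split` above with its defaults spelled out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data inlined so the parse can be tried without the real file.
my $data = <<'END';
1. Pleasanton
rob
smith
james
nick

2. Livermore
sarah
mary
linda

3. CA
rob
jose
smith
james
nick
sarah
maria
mary
linda
END

my %hash;
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
{
    local $/ = "";    # paragraph mode: one record per blank-line-separated block
    while ( my $record = <$fh> ) {
        # split ' ' splits on runs of whitespace (newlines included);
        # the first field is the "1." counter, which is discarded.
        my ( undef, $city, @names ) = split ' ', $record;
        $hash{$city} = \@names;
    }
}
close $fh;

for my $city ( sort keys %hash ) {
    printf "%-10s %s\n", $city, join( ', ', @{ $hash{$city} } );
}
```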

Now we need to compare the inner arrays. I suggest writing a sub that
tests whether one array contains another. The following sub expects two
arrayrefs and returns true if the first is a subset of the second.
It looks a little more complicated than necessary, but as a reward it
also handles multisets (if a name appears more than once in one city).

sub isSubset {
    my( $u, $v ) = @_;
    my %v; $v{$_}++ for @$v;            # count how often each name occurs in @$v
    --$v{$_} >= 0 or return for @$u;    # use up one count per name in @$u; fail if none is left
    return 1;
}
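The one-line `isSubset` above packs the multiset logic into statement modifiers. The sketch below spells out the same counting idea with an explicit loop and checks it on the sample groups. The name `is_subset` is hypothetical, and returning 0 (rather than an empty list) on failure is a choice of this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Multiset containment: every element of @$u must be matched
# against a distinct element of @$v.
sub is_subset {
    my ( $u, $v ) = @_;
    my %count;
    $count{$_}++ for @$v;                # how often each name occurs in @$v
    for my $name (@$u) {
        return 0 unless $count{$name};   # no unused copy of $name left in @$v
        $count{$name}--;                 # use up one copy
    }
    return 1;
}

my @pleasanton = qw(rob smith james nick);
my @ca         = qw(rob jose smith james nick sarah maria mary linda);

print is_subset( \@pleasanton, \@ca ) ? "yes\n" : "no\n";    # yes
print is_subset( \@ca, \@pleasanton ) ? "yes\n" : "no\n";    # no
# Multiset case: "rob" twice is not contained in a list with one "rob".
print is_subset( [qw(rob rob)], \@ca ) ? "yes\n" : "no\n";   # no
```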

With this we can compare every array with every other one. Since your
file gives no information about which groups are subsets of which, we
have to go the brute-force way.

for my $x ( keys %hash ) {
    for my $y ( keys %hash ) {
        # @hash{$x,$y} is a hash slice: the two arrayrefs for $x and $y
        print "$x is subset of $y\n"
            if $x ne $y and isSubset(@hash{$x,$y});
    }
}
__END__

I am working on a script that should produce the following output:
=======================
1. Parent of Pleasanton is CA.
2. Parent of Livermore is CA.
3. Parent of Chicago is IL.
========================

I am currently using hashes, grep, etc. in the script, but I have had
no success. Can someone kindly help me with the right algorithm here?

If I run this script with your sample data in the text file, I get the
following output. The order of the lines may vary, since Perl returns
hash keys in no particular order.

Chicago is subset of IL
Pleasanton is subset of CA
Livermore is subset of CA

I think this is the information you want.
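Jim's original post asked for lines of the form "1. Parent of Pleasanton is CA." in file order. A minimal sketch that combines the paragraph-mode parse with a containment test and prints in that format might look like this. It assumes, as Jim stated, that a group's parent is any other group containing all of its people; if a group had several such supersets, all of them would be listed. The `@order` array, the `contained_in` helper, and the set-based (not multiset) check are choices of this sketch, not part of Fabian's answer. The sample data is inlined for testing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data inlined for testing; to read the real file instead, use
#   open my $fh, '<', 'c:/groups.txt' or die "Cannot open c:/groups.txt: $!";
my $data = join "\n\n",
    "1. Pleasanton\nrob\nsmith\njames\nnick",
    "2. Livermore\nsarah\nmary\nlinda",
    "3. CA\nrob\njose\nsmith\njames\nnick\nsarah\nmaria\nmary\nlinda",
    "4. Chicago\njames\nbob\nvivian",
    "5. IL\njames\nbob\nsimon\nvivian\nann\nbruce";

my ( %members, @order );
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
{
    local $/ = "";    # paragraph mode
    while ( my $record = <$fh> ) {
        my ( undef, $group, @names ) = split ' ', $record;
        push @order, $group;                             # remember file order
        $members{$group} = { map { $_ => 1 } @names };   # set of names
    }
}
close $fh;

# True if every name in group $small also appears in group $big.
sub contained_in {
    my ( $small, $big ) = @_;
    return !grep { !exists $members{$big}{$_} } keys %{ $members{$small} };
}

my @lines;
for my $x (@order) {
    for my $y (@order) {
        next if $x eq $y;
        push @lines, "Parent of $x is $y." if contained_in( $x, $y );
    }
}
printf "%d. %s\n", $_ + 1, $lines[$_] for 0 .. $#lines;
```

Walking `@order` instead of `keys %hash` keeps the output in the same order as the groups appear in the file, which a hash alone cannot provide.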

regards,
fabian

Heini · Jun 13, 2005

(e-mail address removed) wrote:

Can you go ahead and show me where I posted this same question earlier
in the other groups?

Did you really think I couldn't give any examples? Poor you: the account
name "clearguy02" is not common enough to get buried in the noise.

On 10th Jan 2003, you posted a message
- to the ClearCase International User Group mailing list
- with display name: "John Smith"
- and message subject: "A clearCase interview question.."

On 17th Jan 2003, you posted another message
- to the ClearCase International User Group mailing list
- with display name: "Bob Smith"
- and message subject: "Parsing a text file with perl.."

In either case, the "problem" you posted was very close or almost
identical to the one here, and had nothing to do with ClearCase.

You also posted the latter one to comp.lang.perl on 18th Jan 2003, this
time with the subject "Extracting a portion of a text file.... " but
exactly the same problem. And again to comp.lang.perl.misc, on 22nd Jan
2003. What was wrong with the answers you received to the previous
postings?

On 17th May 2004, you posted
- to this forum (comp.lang.perl.misc)
- with display name "John Smith"
- and the subject "Parsing a text file..... "

On 21st Feb, you posted another question with exactly the same
subject line to the same forum, so that it actually shows up in the same
thread as the original question. This time the content was different,
but the idea was closely related anyway.

Oh, and there are plenty of other examples, and other forums too.
Just Google for them.

All of the threads you have started are similar enough to raise
suspicions. For one, you certainly do not seem interested in learning
anything from the answers you get; they keep repeating certain basic
points that you keep ignoring year after year.

I cannot believe you post these questions for the reason you claim.
It smells more like fishing to me.

Heini
