How to test if a string is already in an array

Discussion in 'Perl Misc' started by Marc Eggenberger, Apr 21, 2005.

  1. Hi there.

    I haven't done any Perl coding for some years now and have forgotten a
    lot, so bear with me ...

    I open a text file, read it line by line and do some splitting and
    substr to get an email address out of each line. Now the text file has
    some 10k lines and a lot of dublicate mail addresses. I only need each
    email address once .. a bit like a select distinct on the email address
    would be in SQL.

    My thought now was to create an array and test if the address is already
    in the array and, if not, push it into the array. I don't need to have
    the position of the address in the array ... so I thought of something
    like

    if(! exists $address_array[$address]=
    {
    push(@address_array,$address);
    }

    this of course does not work ...

    How would I achieve my goal?

    Thanks for any help

    Marc
     
    Marc Eggenberger, Apr 21, 2005
    #1

  2. Marc Eggenberger

    John Bokma Guest

    Marc Eggenberger wrote:

    > Hi there.
    >
    > I havent done perl coding for some years now and forgot a lot so bear
    > with me ...


    > I open a text file, read it line by line and do some splitting and
    > substr to get an emailadress of the line. Now the textfile has some
    > 10k lines and a lot of dublicate mail addresses. I only need each
    > emailaddress once .. a bit like a select distinct emailadress would be
    > in SQL.
    >
    > My though now was to create a array and test if the address is already
    > in the array and if not push it into the array. I dont need to have
    > the position of the address in the array ... so I thought of something
    > like
    >
    > if(! exists $address_array[$address]=
    > {
    > push(@address_array,$address);
    > }
    >
    > this of course does not work ...
    >
    > how would I achive my goal?


    my %address_hash;


    and in your loop:

    $address_hash{ $address } = 1;

    ( no need for the test thing )


    keys %address_hash gives the unique addresses.

    BTW: put use strict; use warnings; at the top of your script.


    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Apr 21, 2005
    #2
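A minimal sketch of the hash approach described in this reply. The sample addresses are hypothetical stand-ins for whatever the real script extracts from the file; only the `%address_hash` idea comes from the post.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for addresses extracted from the file.
my @extracted = (
    'anna@domain.ch',
    'bob@domain.ch',
    'anna@domain.ch',
    'carl@domain.ch',
    'bob@domain.ch',
);

my %address_hash;
for my $address (@extracted) {
    # Assigning to an existing key just overwrites it, so no test is needed.
    $address_hash{$address} = 1;
}

# keys returns each distinct address once, in no particular order,
# so sort if a stable order is wanted.
my @unique = sort keys %address_hash;
print "$_\n" for @unique;
```

The hash is used purely as a set here: the value 1 is a placeholder and only the keys matter.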

  3. Marc Eggenberger

    Maxim Guest

    >
    > if(! exists $address_array[$address])
    > {
    > push(@address_array,$address);
    > }


    Checking the whole array for the string every time would yield O(n^2)
    complexity. The easiest way (I guess) is to do the following, which is
    O(n) on average because each hash lookup takes roughly constant time:

    my %address_hash;

    if( ! exists $address_hash{$address} )
    {
    $address_hash{$address} = 1;
    }

    my @address_array = keys %address_hash;

    Hope this helps

    --
    Maxim Sloyko
     
    Maxim, Apr 21, 2005
    #3
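To make the comparison in this reply concrete, here is a rough sketch of both variants side by side. The input list is hypothetical. Note that the original `exists $address_array[$address]` cannot work as a membership test, because an array subscript is numeric and a string such as an email address is treated as index 0.

```perl
use strict;
use warnings;

# Hypothetical input; in the real script these come from the text file.
my @input = ('a@domain.ch', 'b@domain.ch', 'a@domain.ch', 'c@domain.ch', 'b@domain.ch');

# Variant 1: keep an array and scan it for every new address.
# Each scan is linear, so n addresses cost O(n^2) comparisons in the worst case.
my @by_scan;
for my $address (@input) {
    push @by_scan, $address unless grep { $_ eq $address } @by_scan;
}

# Variant 2: use a hash. exists is a constant-time lookup on average,
# so the whole loop is roughly O(n).
my %address_hash;
for my $address (@input) {
    $address_hash{$address} = 1 unless exists $address_hash{$address};
}
my @by_hash = sort keys %address_hash;

print "scan: @by_scan\n";
print "hash: @by_hash\n";
```

For some 10k lines either variant would probably finish quickly, but the hash version is both shorter and scales better.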
  4. Marc Eggenberger

    Eric Bohlman Guest

    Marc Eggenberger wrote:

    > I havent done perl coding for some years now and forgot a lot so bear
    > with me ...
    >
    > I open a text file, read it line by line and do some splitting and
    > substr to get an emailadress of the line. Now the textfile has some
    > 10k lines and a lot of dublicate mail addresses. I only need each
    > emailaddress once .. a bit like a select distinct emailadress would be
    > in SQL.


    When you start saying words like "duplicate" or "distinct," you should be
    immediately thinking "hash."

    > My though now was to create a array and test if the address is already
    > in the array and if not push it into the array. I dont need to have
    > the position of the address in the array ... so I thought of something
    > like
    >
    > if(! exists $address_array[$address]=
    > {
    > push(@address_array,$address);
    > }
    >
    > this of course does not work ...
    >
    > how would I achive my goal?


    my %addresses;
    ....
    $addresses{$address}=1;
    ....
    foreach my $address (keys %addresses) {
    #do something with the address
    }
     
    Eric Bohlman, Apr 21, 2005
    #4
  5. Marc Eggenberger

    Guest

    ok ... I changed it .. but when I run my new script it prints addresses
    more than once .. why is that?

    Here's my code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $textfile = 'empfaenger.txt';

    open(EMPFAENGER, $textfile) || die("Could not open file $textfile");

    my @raw_data = <EMPFAENGER>;
    my %ad_hash;

    foreach my $line (@raw_data)
    {
    my @fields = split(/ /, $line);
    my @fields2 = split(/=/, $fields[6]);
    my $address = $fields2[1];
    $address = substr($address,1,length($address) - 3);

    if(index($address,"domain.ch") > 0)
    {
    $ad_hash{$address} = 1;
    }

    foreach my $key(keys(%ad_hash))
    {
    print $key . "\n";
    }
    }
    close(EMPFAENGER);
     
    , Apr 21, 2005
    #5
  6. Marc Eggenberger

    Guest

    Argl ....
    my last foreach shouldn't be in the foreach loop ...

    stupid me ;)
     
    , Apr 21, 2005
    #6
  7. Marc Eggenberger

    John Bokma Guest

    wrote:

    > ok ... I changed it .. but when I run my new script it prints adresses
    > more than once .. why is that?


    because you print the keys inside the loop

    > Here's my code:
    >
    > #!/usr/bin/perl
    > use strict;
    > use warnings;
    >
    > my $textfile = 'empfaenger.txt';
    >
    > open(EMPFAENGER, $textfile) || die("Could not open file $textfile");


    open my $fh, $textfile or die "Can't open '$textfile': $!";

    $! holds the reason the open failed; if you don't print it, you get quite a
    meaningless error

    > my @raw_data = <EMPFAENGER>;


    if you do this, you can close now, not after the loop, or:

    my %ad_hash;

    while ( my $line = <$fh> ) {

    > my %ad_hash;
    >
    > foreach my $line (@raw_data)
    > {
    > my @fields = split(/ /, $line);
    > my @fields2 = split(/=/, $fields[6]);
    > my $address = $fields2[1];
    > $address = substr($address,1,length($address) - 3);


    this probably could be done in a shorter way :-D

    > if(index($address,"domain.ch") > 0)
    > {
    > $ad_hash{$address} = 1;
    > }


    $ad_hash{ $address } = 1 if index( $address, "domain.ch") > 0;
    }

    or:

    index( $address, "domain.ch" ) > 0 and $ad_hash{ $address } = 1;
    }

    or:

    index( $address, "domain.ch" ) > 0 or next;
    $ad_hash{ $address } = 1;
    }

    then close:

    close $fh or die "Can't close '$textfile': $!";


    > foreach my $key(keys(%ad_hash))
    > {
    > print $key . "\n";
    > }


    You can write the print as:

    print "$key\n";

    a shorter way to print them all:

    print "$_\n" for keys %ad_hash;

    or

    print map { "$_\n" } keys %ad_hash;


    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Apr 21, 2005
    #7
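Pulling the suggestions from this reply together, the script might look roughly like the sketch below. The input line layout (a `to=<address>,` field) and the sample lines are assumptions made so the example can run on its own; the thread never shows the real contents of 'empfaenger.txt'. The single regex capture is one possible "shorter way" and is likewise an assumption, not the original poster's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input; the real script reads 'empfaenger.txt'.
# The line layout (a "to=<address>," field) is an assumption.
my $sample = <<'END';
Apr 21 10:00:01 mx postfix/smtp[101]: A1: to=<anna@domain.ch>, status=sent
Apr 21 10:00:02 mx postfix/smtp[102]: A2: to=<bob@example.org>, status=sent
Apr 21 10:00:03 mx postfix/smtp[103]: A3: to=<anna@domain.ch>, status=sent
Apr 21 10:00:04 mx postfix/smtp[104]: A4: to=<carl@domain.ch>, status=sent
END

# Open the string as an in-memory file so the example is self-contained.
open my $fh, '<', \$sample or die "Can't open sample data: $!";

my %ad_hash;
while ( my $line = <$fh> ) {
    # One capture replaces the two splits and the substr (assumed format).
    my ($address) = $line =~ /to=<([^>]+)>/ or next;
    index( $address, 'domain.ch' ) > 0 or next;
    $ad_hash{$address} = 1;
}
close $fh or die "Can't close sample data: $!";

# Print once, after the loop has seen every line.
print "$_\n" for sort keys %ad_hash;
```

To run it against the real file, the `open` line would take the file name instead of `\$sample`, and the regex would need adjusting to the actual line format.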
  8. Marc Eggenberger wrote:

    > I havent done perl coding for some years now and forgot a lot so bear
    > with me ...



    You are still expected to check the Perl FAQ *before* posting.


    > a lot of dublicate mail addresses. I only need each
    > emailaddress once



    The answer is easy to find once you spell the search term correctly:

    perldoc -q duplicate

    How can I remove duplicate elements from a list or array?


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Apr 21, 2005
    #8
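For readers without perldoc at hand, a commonly cited idiom for this FAQ question is sketched below. It is written from memory rather than quoted from the FAQ, and the sample list is hypothetical. Unlike taking `keys` of a hash, it keeps the addresses in their first-seen order.

```perl
use strict;
use warnings;

# Hypothetical sample list with duplicates.
my @addresses = ('b@domain.ch', 'a@domain.ch', 'b@domain.ch', 'c@domain.ch', 'a@domain.ch');

# %seen counts how often each address has appeared so far.
# !$seen{$_}++ is true only the first time an address shows up,
# so grep keeps first occurrences and preserves the original order.
my %seen;
my @unique = grep { !$seen{$_}++ } @addresses;

print "$_\n" for @unique;
```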
  9. <> wrote:


    > if(index($address,"domain.ch") > 0)



    What would index() return if

    $address = 'domain.ch';

    ??


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Apr 21, 2005
    #9
  10. Tad McClellan wrote:

    > Marc Eggenberger wrote:
    >
    >>a lot of dublicate mail addresses.


    > The answer is easy to find once you spell the search term correctly:
    >
    > perldoc -q duplicate


    There's a "dubya" joke in there somewhere... :)

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Apr 21, 2005
    #10
  11. Marc Eggenberger wrote:
    [...]
    > substr to get an emailadress of the line. Now the textfile has some
    > 10k lines and a lot of dublicate mail addresses. I only need each
    > emailaddress once ..


    See the very last sentence in "perldoc -q duplicate"

    jue
     
    Jürgen Exner, Apr 21, 2005
    #11
  12. * Tad McClellan wrote:
    > wrote:
    >
    > > if(index($address,"domain.ch") > 0)

    >
    > What would index() return if
    >
    > $address = 'domain.ch';
    >
    > ??


    Sure, index() returns the position where the string 'domain.ch' starts,
    and -1 if it is not found. So when the address is 'domain.ch' it will
    return 0. But be aware that 'domain.ch' is not a valid mail address, so I
    see no need to add it to a hash containing mail addresses. Hence, I
    suggest testing whether index() returns *two or more*. That is the minimum
    a mail address must have in front of the domain part: at least one
    character for the local part, followed by »@«.

    Sure, mail addresses like '' will bypass this
    *filter*, but index() is not made for complicated things ;-)

    regards,
    fabian
     
    Fabian Pilkowski, Apr 21, 2005
    #12
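The return values discussed in this reply can be checked with a short sketch. The three test strings are hypothetical and chosen only to cover the cases mentioned: the bare domain, a minimal address, and an address at another domain.

```perl
use strict;
use warnings;

# Hypothetical test strings, not taken from the original file.
my @candidates = ('domain.ch', 'a@domain.ch', 'someone@example.org');

my %position;
for my $address (@candidates) {
    # index returns the zero-based offset of the substring, or -1 if absent.
    $position{$address} = index( $address, 'domain.ch' );
    printf "%-20s index = %2d\n", $address, $position{$address};
}

# Requiring an offset of at least 2 leaves room for one local-part
# character plus the '@' in front of the domain.
my @accepted = grep { index( $_, 'domain.ch' ) >= 2 } @candidates;
print "accepted: @accepted\n";
```

With `> 0` the bare string 'domain.ch' is rejected as well, since its offset is 0; the `>= 2` test only makes the minimum-length reasoning explicit.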
  13. Fabian Pilkowski wrote:
    > * Tad McClellan wrote:
    >> wrote:
    >>
    >> > if(index($address,"domain.ch") > 0)

    >>
    >> What would index() return if
    >>
    >> $address = 'domain.ch';
    >>
    >> ??

    >


    > So when address is 'domain.ch' it will return 0.



    So the test above should be >= rather than >


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Apr 21, 2005
    #13
  14. * Tad McClellan wrote:
    > Fabian Pilkowski wrote:
    >>>
    >>> > if(index($address,"domain.ch") > 0)
    >>>
    >>> What would index() return if
    >>>
    >>> $address = 'domain.ch';
    >>>
    >>> ??

    >
    >> So when address is 'domain.ch' it will return 0.

    >
    > So the test above should be >= rather than >


    Or

    if ( index($address,'domain.ch') >= 2 )

    as I mentioned in my previous posting.

    regards,
    fabian
     
    Fabian Pilkowski, Apr 21, 2005
    #14