Matching strings with index – getting extra matches.

Discussion in 'Perl Misc' started by G, Feb 9, 2004.

  1. G

    G Guest

    I’m looping through a sales_file looking for matches. The file
    has a number of entries such as the following:

    sales item aaa | m423a
    sales item bbb | m423
    sales item ccc | m423b
    sales item ddd | 423

    These refer to sales_item and code respectively.

    Here is the code segment:

    open FILE, "<$sales_file";
    while (<FILE>) {
    ($sales_item, $code) = split /\|/;
    if (index($code, $entered_code) != -1) {
    $list .= "<br>" if ($list);
    $list .= $sales_item;
    }
    } # while
    close FILE;

    The problem is, if the $entered_code is 423 I get matches for all 4
    when I would only want matches for the fourth sales item “sales
    item ddd” line. Similarly, an $entered_code of m423 would match
    the first 3. Any suggestions on how I can get the right matches,
    keeping in mind that I would prefer to do it in code, and not alter
    the sales_file.

    Thanks,

    C
    G, Feb 9, 2004
    #1

  2. G

    Paul Lalli Guest

    On Mon, 9 Feb 2004, G wrote:

    > I’m looping through a sales_file looking for matches. The file
    > has a number of entries such as the following:
    >
    > sales item aaa | m423a
    > sales item bbb | m423
    > sales item ccc | m423b
    > sales item ddd | 423
    >
    > These refer to sales_item and code respectively.
    >
    > Here is the code segment:
    >
    > open FILE, "<$sales_file";
    > while (<FILE>) {
    > ($sales_item, $code) = split /\|/;
    > if (index($code, $entered_code) != -1) {
    > $list .= "<br>" if ($list);
    > $list .= $sales_item;
    > }
    > } # while
    > close FILE;
    >
    > The problem is, if the $entered_code is 423 I get matches for all 4
    > when I would only want matches for the fourth sales item “sales
    > item ddd” line. Similarly, an $entered_code of m423 would match
    > the first 3. Any suggestions on how I can get the right matches,
    > keeping in mind that I would prefer to do it in code, and not alter
    > the sales_file.


    Replace the index() line with
    if ($code =~ /^\s*$entered_code\s*$/) {

    This will search the $code line for 'beginning of string, possible white
    space, the code, possible white space, end of string', rather than just
    "the code anywhere within the string" as you're doing now.

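    A minimal sketch of the anchored match described above, tried against
    the code fields from the four sample lines. The helper name
    code_matches is made up for illustration, and the \Q...\E quoting is an
    addition here (it keeps regex metacharacters in the entered code from
    being interpreted); it is not part of the line as posted.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $code equals $entered_code,
# ignoring leading/trailing whitespace (including a trailing newline).
sub code_matches {
    my ($code, $entered_code) = @_;
    return $code =~ /^\s*\Q$entered_code\E\s*$/ ? 1 : 0;
}

# The code fields roughly as split /\|/ would return them for the sample lines.
my @fields = (" m423a\n", " m423\n", " m423b\n", " 423\n");

for my $entered ('423', 'm423') {
    my @hits = grep { code_matches($_, $entered) } @fields;
    s/^\s+|\s+$//g for @hits;    # trim for display only
    print "$entered: @hits\n";
}
```

    With the anchors in place, each entered code should only hit the field
    that is exactly equal to it, rather than every field that contains it.
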
    Paul Lalli
    Paul Lalli, Feb 9, 2004
    #2

  3. G

    gnari Guest

    "G" <> wrote in message
    news:...

    [problem with index not matching string exactly]

    > sales item aaa | m423a
    > sales item bbb | m423
    > sales item ccc | m423b
    > sales item ddd | 423


    if there is always space around the '|' you should
    have them in your split

    > ...
    > ($sales_item, $code) = split /\|/;

    ($sales_item, $code) = split / \| /;

    > if (index($code, $entered_code) != -1) {



    if there is no trailing space after the code then
    if ($code eq $entered_code) {

    if on the other hand, your data is dirty with
    whitespace, you are better off with a
    regexp match as someone else suggested
    or even replace the split with a match:

    ($sales_item, $code) = /^\s*(.+?)\s*\|\s*(.+?)\s*$/;
    if ($code eq $entered_code) {
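
    A small sketch of the match-instead-of-split idea, assuming the pattern
    is anchored with $ at the end so that the lazy second capture has to
    take the whole code rather than just its first character. The helper
    name parse_line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "item | code" line into its two fields,
# tolerating any amount of whitespace around either field.
sub parse_line {
    my ($line) = @_;
    my ($sales_item, $code) = $line =~ /^\s*(.+?)\s*\|\s*(.+?)\s*$/;
    return ($sales_item, $code);    # both undef if the line did not match
}

my ($item, $code) = parse_line("sales item ddd | 423\n");
print "$item\n" if defined $code && $code eq '423';
```

    Because the captures are already trimmed, a plain eq comparison is
    enough afterwards.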

    gnari
    gnari, Feb 9, 2004
    #3
  4. G <> wrote:

    > open FILE, "<$sales_file";



    You should always, yes *always*, check the return value from open():

    open FILE, "<$sales_file" or die "could not open '$sales_file': $!";
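
    For comparison, a sketch of the same check written with the
    three-argument form of open and a lexical filehandle. That form is a
    swapped-in idiom, not what was posted in this thread, and the helper
    name read_lines and the file name below are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of a file, dying with the OS error
# message in $! if the file cannot be opened.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "could not open '$path': $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# eval traps the die so the failure can be reported instead of aborting.
my @lines = eval { read_lines('no_such_sales_file.txt') };
print "open failed: $@" if $@;
```

    Either way, the point is the same: without the check, a failed open
    just makes the while loop read nothing, with no hint as to why.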


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Feb 9, 2004
    #4
  5. G

    G Guest

    "gnari" <> wrote in message news:<c08oh6$3b9$>...
    > "G" <> wrote in message
    > news:...
    >
    > [problem with index not matching string exactly]
    >
    > > sales item aaa | m423a
    > > sales item bbb | m423
    > > sales item ccc | m423b
    > > sales item ddd | 423

    >
    > if there is always space around the '|' you should
    > have them in your split
    >
    > > ...
    > > ($sales_item, $code) = split /\|/;

    > ($sales_item, $code) = split / \| /;
    >
    > > if (index($code, $entered_code) != -1) {

    >
    >
    > if there is no trailing space after the code then
    > if ($code eq $entered_code) {
    >
    > if on the other hand, your data is dirty with
    > whitespace, you are better off with a
    > regexp match as someone else suggested
    > or even replace the split with a match:
    >
    > ($sales_item, $code) = /^\s*(.+?)\s*\|\s*(.+?)\s*$/;
    > if ($code eq $entered_code) {
    >
    > gnari


    Thanks for the suggestions so far, but I now realize the sample text
    file was flawed. For one, there is never white space around the '|'.
    Secondly, a line could have multiple codes but no duplicates (on that
    line only). The sample file should have looked as follows:

    sales item aaa|543,m423a
    sales item bbb|m423,543
    sales item ccc|m423b
    sales item ddd|423,423b,m523,652

    (Note how code 543 is on both the 1st and 2nd lines.)

    Given that the above has changed, how could I get a match? E.g. a code
    of 423 should return the description in line 4, "sales item ddd", whereas
    m423 should match only the 2nd line, etc.

    Thanks,

    C
    G, Feb 10, 2004
    #5
  6. G

    Ben Morrow Guest

    (G) wrote:
    > Thanks for the suggestions so far, but I now realize the sample text
    > file was flawed. For one, there is never white space around the '|'.
    > Secondly, a line could have multiple codes but no duplicates (on that
    > line only). The sample file should have looked as follows:
    >
    > sales item aaa|543,m423a
    > sales item bbb|m423,543
    > sales item ccc|m423b
    > sales item ddd|423,423b,m523,652
    >
    > (Note how code 543 is on both the 1st and 2nd lines.)
    >
    > Given that the above has changed, how could I get a match? E.g. a code
    > of 423 should return the description in line 4, "sales item ddd", whereas
    > m423 should match only the 2nd line, etc.


    my $code = 'm423';
    while (<>) {
    my ($item, $codes) = split /\|/;
    my @codes = split /,/, $codes;
    print $item if grep { $_ eq $code } @codes;
    }

    alternatively:

    /(.*) \| (?:.*,|) \Q$code\E (?:,|$)/x and print $1 while <>;
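
    A sketch of the single-regex variant wrapped in a sub so it can be
    tried on individual lines. The helper name item_by_regex is invented
    for illustration; the pattern itself is the one above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the item name when $code appears as a whole
# comma-separated entry after the '|', otherwise undef.
sub item_by_regex {
    my ($line, $code) = @_;
    # (?:.*,|)  allows other codes before ours, or nothing at all;
    # (?:,|$)   requires ours to end at a comma or at end of line.
    return $line =~ /(.*) \| (?:.*,|) \Q$code\E (?:,|$)/x ? $1 : undef;
}

for my $line ("sales item bbb|m423,543\n", "sales item ccc|m423b\n") {
    my $item = item_by_regex($line, 'm423');
    print "$item\n" if defined $item;
}
```

    The two delimiters on either side of the code are what stop m423 from
    matching inside m423a or m423b.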

    Ben

    --
    $.=1;*g=sub{print@_};sub r($$\$){my($w,$x,$y)=@_;for(keys%$x){/main/&&next;*p=$
    $x{$_};/(\w)::$/&&(r($w.$1,$x.$_,$y),next);$y eq\$p&&&g("$w$_")}};sub t{for(@_)
    {$f&&($_||&g(" "));$f=1;r"","::",$_;$_&&&g(chr(0012))}};t #
    $J::u::s::t, $a::n::o::t::h::e::r, $P::e::r::l, $h::a::c::k::e::r, $.
    Ben Morrow, Feb 10, 2004
    #6
  7. G

    G Guest

    Ben Morrow <> wrote in message news:<c0arqv$9u4$>...
    > (G) wrote:
    > > Thanks for the suggestions so far, but I now realize the sample text
    > > file was flawed. For one, there is never white space around the '|'.
    > > Secondly, a line could have multiple codes but no duplicates (on that
    > > line only). The sample file should have looked as follows:
    > >
    > > sales item aaa|543,m423a
    > > sales item bbb|m423,543
    > > sales item ccc|m423b
    > > sales item ddd|423,423b,m523,652
    > >
    > > (Note how code 543 is on both the 1st and 2nd lines.)
    > >
    > > Given that the above has changed, how could I get a match? E.g. a code
    > > of 423 should return the description in line 4, "sales item ddd", whereas
    > > m423 should match only the 2nd line, etc.

    >
    > my $code = 'm423';
    > while (<>) {
    > my ($item, $codes) = split /\|/;
    > my @codes = split /,/, $codes;
    > print $item if grep { $_ eq $code } @codes;
    > }
    >

    I finally gave this code a try, but it only partially works. For
    instance a code of m423b will not pull up any results. Neither will
    m523. My guess is we are not splitting things out right: my
    @codes = split /,/, $codes;

    Thanks,

    C
    G, Feb 12, 2004
    #7
  8. G

    Ben Morrow Guest

    (G) wrote:
    > Ben Morrow <> wrote in message news:<c0arqv$9u4$>...
    > > my $code = 'm423';
    > > while (<>) {


    chomp;

    > > my ($item, $codes) = split /\|/;
    > > my @codes = split /,/, $codes;
    > > print $item if grep { $_ eq $code } @codes;
    > > }

    >
    > I finally gave this code a try, but it only partially works. For
    > instance a code of m423b will not pull up any results. Neither will
    > m523. My guess is we are not splitting things out right: my
    > @codes = split /,/, $codes;
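
    The added chomp matters because each line read from the file still
    ends in a newline, so the last code on a line is really "m423b\n" and
    never compares equal to "m423b". A small sketch of the loop body with
    the chomp in place, using the revised sample data; the helper name
    item_for_code is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the item name if $wanted is one of the
# comma-separated codes on $line, otherwise return nothing.
sub item_for_code {
    my ($line, $wanted) = @_;
    chomp $line;                          # drop the trailing newline first
    my ($item, $codes) = split /\|/, $line;
    return unless defined $codes;         # skip lines without a '|'
    my $found = grep { $_ eq $wanted } split /,/, $codes;
    return $found ? $item : ();
}

my @sample = (
    "sales item aaa|543,m423a\n",
    "sales item bbb|m423,543\n",
    "sales item ccc|m423b\n",
    "sales item ddd|423,423b,m523,652\n",
);

for my $wanted ('m423b', '423', '543') {
    my @items = map { item_for_code($_, $wanted) } @sample;
    print "$wanted: ", join(', ', @items), "\n";
}
```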


    Ben

    --
    The cosmos, at best, is like a rubbish heap scattered at random.
    - Heraclitus
    Ben Morrow, Feb 12, 2004
    #8