regex: listing all textstrings to be found more than 2 times in a file

Discussion in 'Perl Misc' started by phaylon, Jan 17, 2005.

  1. phaylon

    phaylon Guest

    tools55 wrote:

    > is there a way to have(regex) listed all text strings that are found more
    > than 2/two times in a file? How could such a regex look like ?


    What do you mean by "text string"?

    What should "get me me some beer" give as result? ('e', ' ', 'm')?

    g,
    Robert

    --
    http://www.dunkelheit.at/
    That is not dead, which can eternal lie,
    and with strange aeons even death may die.
    -- H.P. Lovecraft
     
    phaylon, Jan 17, 2005
    #1

  2. phaylon

    Guest

    Hi,

    is there a way to have(regex) listed all text strings that
    are found more than 2/two times in a file?
    How could such a regex look like ?
    ..
    Any tip is appreciated very much. Thanks, Bill.
     
    , Jan 17, 2005
    #2

  3. Re: regex: listing all textstrings to be found more than 2 times in a file

    wrote:
    > is there a way to have(regex) listed all text strings that
    > are found more than 2/two times in a file?


    As Robert pointed out, you need to make clear what you mean by "text
    string".

    > How could such a regex look like ?


    Not just a regex, but maybe something like this:

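    my $string = qr(\b[a-z]+\b)i;    # what counts as a "text string"; here, a word
    my %seen;
    # <> reads the file(s) named on the command line (or STDIN)
    while (<>) { $seen{$1}++ while /($string)/g }
    print "$_\n" for grep $seen{$_} > 2, keys %seen;    # strings seen more than twice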

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Jan 17, 2005
    #3
  4. Re: regex: listing all textstrings to be found more than 2 times in a file

    wrote:

    > is there a way to have(regex) listed all text strings that
    > are found more than 2/two times in a file?


    A regex works on a string not a file. So I shall assume you first slurp
    the file into a string (with File::Slurp or suchlike).

    /(.+)(?=.*\1.*\1)/gs

    Note this is quite inefficient and almost certainly not what you wanted.

    Note also this only finds the longest non-overlapping match at each
    starting position, i.e. in 'foofoodfool' it will find 'foo' but not 'oo',
    'fo' or 'f', all of which also appear at least 3 times.
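
To see what the pattern actually reports, here is a minimal sketch that runs it in list context against the sample string from this post. The variable names ($text, @found) are illustrative only; in practice $text would hold the slurped file contents.

```perl
use strict;
use warnings;

# Sample string from the post; any slurped file contents would work here.
my $text = 'foofoodfool';

# In list context with /g, the match returns what (.+) captured at each
# starting position where the lookahead found two further occurrences.
my @found = $text =~ /(.+)(?=.*\1.*\1)/gs;

print "found: '$_'\n" for @found;
```

On this input 'foo' is reported, while 'oo', 'fo' and 'f' never are, even though each of them occurs three times.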

    To get _all_ matches for a pattern you'd want to use the trick I
    described[1] in my Usenet Gems talk at YAPC::Europe::2004[2].

    [1] I described it but if you find the original thread you'll see most
    of the credit goes to Abigail.

    [2] http://birmingham.pm.org/talks/YAPC-Europe-2003-Gems.pdf
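
For comparison, here is a brute-force sketch that lists every substring occurring more than twice. It is not the trick from the talk referenced above, which is not reproduced in this thread. It simply counts all substrings in a hash, overlapping occurrences included. The helper name count_substrings is hypothetical, and the quadratic number of substrings makes this suitable for short inputs only.

```perl
use strict;
use warnings;

# count_substrings is a hypothetical helper, not something from the thread.
# It returns a hash reference mapping every substring of $text to the
# number of positions at which it starts (overlaps are counted).
sub count_substrings {
    my ($text) = @_;
    my %count;
    my $len = length $text;
    for my $start (0 .. $len - 1) {
        for my $size (1 .. $len - $start) {
            $count{ substr($text, $start, $size) }++;
        }
    }
    return \%count;
}

my $count    = count_substrings('foofoodfool');
my @repeated = sort grep { $count->{$_} > 2 } keys %$count;

print "'$_' occurs $count->{$_} times\n" for @repeated;
```

Unlike the single regex, this reports 'f', 'fo', 'foo', 'o' and 'oo' for the sample string, at the cost of building a hash entry for every distinct substring.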
     
    Brian McCauley, Jan 17, 2005
    #4