extracting strings from a text file

Discussion in 'Perl Misc' started by Andry, Sep 30, 2008.

  1. Andry

    Andry Guest

    Hi,
    I have a text file captured from an SSH session.
    Each line of the text looks like this (opened with VI editor):
    ***********************************************************************************
    -rw-r--r-- 1 root root 2389787 Sep 30 10:45 ^[[00mfilename.pl^[[00m
    **********************************************************************************
    As you can see a lot of spurious/control/special characters are shown
    (in VI editor).
    I need to extract just the filenames at the end of each line (getting
    rid of spurious characters).
    The result should be like this:
    ***********************************************************************************
    filename.pl
    ***********************************************************************************
    Of course, I don't know in advance the value of the string to extract
    (nor its length to pass to a "substring" function).
    Can you suggest any method to extract the single file name at the end
    of each line?

    Thank you,
    Andrea
    Andry, Sep 30, 2008
    #1

  2. Josef Moellers

    Josef Moellers Guest

    Andry wrote:
    > Hi,
    > I have a text file captured from an SSH session.
    > Each line of the text looks like this (opened with VI editor):
    > ***********************************************************************************
    > -rw-r--r-- 1 root root 2389787 Sep 30 10:45 ^[[00mfilename.pl^[[00m
    > **********************************************************************************
    > As you can see a lot of spurious/control/special characters are shown
    > (in VI editor).
    > I need to extract just the filenames at the end of each line (getting
    > rid of spurious characters).
    > The result should be like this:
    > ***********************************************************************************
    > filename.pl
    > ***********************************************************************************
    > Of course, I don't know in advance the value of the string to extract
    > (nor its length to pass to a "substring" function).
    > Can you suggest any method to extract the single file name at the end
    > of each line?


    The control characters are ANSI console control sequences.
    AFAIK they consist of an ESC character followed by a left square
    bracket, followed by numbers separated by semicolons, followed by a
    letter, so you might try to weed out "\033.*?[[:alpha:]]".
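
    As a rough sketch of that idea (not a complete ANSI parser), the
    substitution below removes only sequences of the form ESC, "[",
    digits and semicolons, then one letter. The sub name strip_ansi and
    the sample line are made up for illustration, and the sample assumes
    the captured file holds real ESC bytes, which vi displays as ^[.

```perl
use strict;
use warnings;

# Remove ANSI CSI sequences: ESC, '[', optional digits/semicolons, one letter.
# strip_ansi is a hypothetical helper name, not something from the thread.
sub strip_ansi {
    my ($text) = @_;
    $text =~ s/\e\[[0-9;]*[A-Za-z]//g;
    return $text;
}

# Sample line with real ESC bytes (\e) around the file name.
my $raw = "-rw-r--r-- 1 root root 2389787 Sep 30 10:45 \e[00mfilename.pl\e[00m";
print strip_ansi($raw), "\n";
```

    Matching the bracket and the digits explicitly is a bit stricter than
    ".*?" and avoids eating ordinary text if a stray ESC byte shows up.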

    Another possibility would be to use the command "/bin/ls" rather than
    "ls", which on many systems is an alias for "ls --color=auto".

    HTH,

    Josef
    --
    These are my personal views and not those of Fujitsu Siemens Computers!
    Josef Möllers (Pinguinpfleger bei FSC)
    If failure had no penalty success would not be a prize (T. Pratchett)
    Company Details: http://www.fujitsu-siemens.com/imprint.html
    Josef Moellers, Sep 30, 2008
    #2

  3. Andry

    Andry Guest

    On 30 Sep, 15:43, Josef Moellers <> wrote:
    > Andry wrote:
    > > Hi,
    > > I have a text file captured from an SSH session.
    > > Each line of the text looks like this (opened with VI editor):
    > > ***********************************************************************************
    > > -rw-r--r-- 1 root root 2389787 Sep 30 10:45 ^[[00mfilename.pl^[[00m
    > > **********************************************************************************
    > > As you can see a lot of spurious/control/special characters are shown
    > > (in VI editor).
    > > I need to extract just the filenames at the end of each line (getting
    > > rid of spurious characters).
    > > The result should be like this:
    > > ***********************************************************************************
    > > filename.pl
    > > ***********************************************************************************
    > > Of course, I don't know in advance the value of the string to extract
    > > (nor its length to pass to a "substring" function).
    > > Can you suggest any method to extract the single file name at the end
    > > of each line?

    >
    > The control characters are ANSI console control sequences.
    > AFAIK they consist of an ESC character followed by a left square
    > bracket, followed by numbers separated by semicolons, followed by a
    > letter, so you might try to weed out "\033.*?[[:alpha:]]".
    >
    > Another possibility would be to use the command "/bin/ls" rather than
    > "ls", the latter is an alias for "ls --color=auto".
    >
    > HTH,
    >
    > Josef


    Thanks Josef!
    The /bin/ls option works great!

    Now, I can't get the filename out of the string.
    I tried with:
    $extract =~ s/^.*?(\w+)\s*$/$1/;
    and I got:
    *******************
    pl
    *******************
    Then, I tried with:
    $extract =~ s/^.*?(\w+)\.(\w+)\s*$/$1/;
    and I got:
    *******************
    filename
    *******************
    While what I want is:
    *******************
    filename.pl
    *******************

    Could you help with that, please?

    Thanks,
    Andrea
    Andry, Sep 30, 2008
    #3
  4. Tim Greer

    Tim Greer Guest

    Andry wrote:

    > $extract =~ s/^.*?(\w+)\.(\w+)\s*$/$1/;


    filename is $1 and pl is $2. The literal dot (\.) sits outside both
    captures, so replacing with just $1 drops the dot and the extension.

    So:

    $extract =~ s/^.*?(\w+)\.(\w+)\s*$/$1.$2/;

    or:

    $extract =~ s/^.*?(\w+\.\w+)\s*$/$1/;

    You also probably want to check with some type of word boundary so you
    get all of "filename" in filename.pl, depending on how the file is
    formatted (or could be).
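
    One possible variation, sketched here under the assumption that file
    names contain no spaces: capture the last run of non-whitespace
    characters instead of \w+\.\w+. With \w+\.\w+ a name such as
    my-archive.tar.gz would come back as only tar.gz. The sub name
    last_field and the second sample line are invented for illustration.

```perl
use strict;
use warnings;

# Return the last whitespace-delimited token of a line, or undef if the
# line holds nothing but whitespace. last_field is a hypothetical name.
sub last_field {
    my ($line) = @_;
    return $line =~ /(\S+)\s*$/ ? $1 : undef;
}

print last_field("-rw-r--r-- 1 root root 2389787 Sep 30 10:45 filename.pl\n"), "\n";
print last_field("-rw-r--r-- 1 root root 12 Sep 30 10:45 my-archive.tar.gz\n"), "\n";
```

    Using a match in place of s/// also leaves the original line
    untouched, which can help if the other columns are needed later.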
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
    Tim Greer, Sep 30, 2008
    #4
  5. cartercc

    cartercc Guest

    On Sep 30, 11:38 am, Andry <> wrote:
    > > > -rw-r--r-- 1 root root 2389787 Sep 30 10:45 ^[[00mfilename.pl^[[00m


    > Then, I tried with:
    > $extract =~ s/^.*?(\w+)\.(\w+)\s*$/$1/;
    > and I got:
    > *******************
    > filename
    > *******************
    > While what I want is:
    > *******************
    > filename.pl
    > *******************
    >
    > Could you help with that, please?


    UNTESTED

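    while(<DATA>)    # the handle is DATA; __DATA__ only marks where the data starts
    {
    @line = split;
    $filename = $line[9]; #if it IS 9
    $filename =~ s/^.*?(\w+)\.(\w+)\s*$/$1.$2/;    # =~ binds the substitution to $filename
    print $filename, "\n";
    }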

    CC
    cartercc, Sep 30, 2008
    #5
  6. Tim Greer

    Tim Greer Guest

    cartercc wrote:

    > On Sep 30, 11:38 am, Andry <> wrote:
    >> > > -rw-r--r-- 1 root root 2389787 Sep 30 10:45
    >> > > ^[[00mfilename.pl^[[00m

    >
    >> Then, I tried with:
    >> $extract =~ s/^.*?(\w+)\.(\w+)\s*$/$1/;
    >> and I got:
    >> *******************
    >> filename
    >> *******************
    >> While what I want is:
    >> *******************
    >> filename.pl
    >> *******************
    >>
    >> Could you help with that, please?

    >
    > UNTESTED
    >
    > while(<DATA>)
    > {
    > @line = split;
    > $filename = $line[9]; #if it IS 9
    > $filename =~ s/^.*?(\w+)\.(\w+)\s*$/$1.$2/;
    > print $filename, "\n";
    > }
    >
    > CC


    Remember, array indexes start from 0 rather than 1. If you use split,
    the filename is the ninth field, so it's [8].
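
    A small sketch of the index counting, assuming the "ls -l" layout
    shown in this thread (eight columns before the name). The sub name
    ls_name and the second sample line are made up for illustration. The
    limit of 9 passed to split is an extra precaution so that a name
    containing spaces stays in one piece.

```perl
use strict;
use warnings;

# Indexes: 0 perms, 1 links, 2 owner, 3 group, 4 size,
#          5 month, 6 day,   7 time,  8 name
# ls_name is a hypothetical helper; it returns undef for short lines.
sub ls_name {
    my ($row) = @_;
    chomp $row;
    my @fields = split ' ', $row, 9;   # at most 9 fields; the 9th keeps embedded spaces
    return $fields[8];
}

print ls_name("-rw-r--r-- 1 root root 2389787 Sep 30 10:45 filename.pl\n"), "\n";
print ls_name("-rw-r--r-- 1 root root 12 Sep 30 10:45 my file.txt\n"), "\n";
```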
    Tim Greer, Sep 30, 2008
    #6
  7. Tim Greer

    Tim Greer Guest

    Andry wrote:

    > Hi,
    > I have a text file captured from an SSH session.
    > Each line of the text looks like this (opened with VI editor):
    > ***********************************************************************************
    > -rw-r--r-- 1 root root 2389787 Sep 30 10:45 ^[[00mfilename.pl^[[00m
    > **********************************************************************************
    > As you can see a lot of spurious/control/special characters are shown
    > (in VI editor).
    > I need to extract just the filenames at the end of each line (getting
    > rid of spurious characters).
    > The result should be like this:
    > ***********************************************************************************
    > filename.pl
    > ***********************************************************************************
    > Of course, I don't know in advance the value of the string to extract
    > (nor its length to pass to a "substring" function).
    > Can you suggest any method to extract the single file name at the end
    > of each line?
    >
    > Thank you,
    > Andrea


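    my $line = '-rw-r--r-- 1 root root 2389787 Sep 30 10:45 ^[[00mfilename.pl^[[00m';

    # ninth whitespace-separated field, i.e. index 8
    $line = (split /\s+/, $line)[8];
    # strips the literal text "^[[00m" as typed above; in the real file
    # ^[ is a single ESC byte, so match \e\[00m there instead
    $line =~ s/\^\[\[00m//g;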

    One way to do it.
    Tim Greer, Sep 30, 2008
    #7
  8. Andry

    Andry Guest

    On 30 Sep, 18:57, Tim Greer <> wrote:
    > Andry wrote:
    > > Hi,
    > > I have a text file captured from an SSH session.
    > > Each line of the text looks like this (opened with VI editor):
    > > ***********************************************************************************
    > > -rw-r--r-- 1 root root 2389787 Sep 30 10:45 ^[[00mfilename.pl^[[00m
    > > **********************************************************************************
    > > As you can see a lot of spurious/control/special characters are shown
    > > (in VI editor).
    > > I need to extract just the filenames at the end of each line (getting
    > > rid of spurious characters).
    > > The result should be like this:
    > > ***********************************************************************************
    > > filename.pl
    > > ***********************************************************************************
    > > Of course, I don't know in advance the value of the string to extract
    > > (nor its length to pass to a "substring" function).
    > > Can you suggest any method to extract the single file name at the end
    > > of each line?

    >
    > > Thank you,
    > > Andrea

    >
    > my $line = '-rw-r--r-- 1 root root 2389787 Sep 30 10:45
    > ^[[00mfilename.pl^[[00m';
    >
    > $line = (split /\s+/, $line)[8];
    > $line =~ s/\^\[\[00m//g;
    >
    > One way to do it.


    Thanks guys!
    All your suggestions were very helpful to me.

    Andrea
    Andry, Oct 1, 2008
    #8
  9. J. Gleixner

    J. Gleixner Guest

    Andry wrote:
    >> Andry wrote:
    >>> Hi,
    >>> I have a text file captured from an SSH session.
    >>> Each line of the text looks like this (opened with VI editor):
    >>> ***********************************************************************************
    >>> -rw-r--r-- 1 root root 2389787 Sep 30 10:45 ^[[00mfilename.pl^[[00m
    >>> **********************************************************************************
    >>> As you can see a lot of spurious/control/special characters are shown
    >>> (in VI editor).
    >>> I need to extract just the filenames at the end of each line (getting

    [...]

    If you don't need any of the 'long' output, why not use
    the correct option to 'ls' in the first place (for example 'ls -1',
    which prints one name per line)?
    J. Gleixner, Oct 2, 2008
    #9