Re: Regex replacement via external command

Discussion in 'Perl Misc' started by Rainer Weikusat, Jan 22, 2014.

  1. Tim Landscheidt writes:
    > (anonymous) wrote:
    >
    >>> Don't do `...` when there may be a lot of output. Please see the
    >>> perlopentut man page, specifically "pipe open", and don't try to pass
    >>> long values on the command line.

    >
    >> The problem I was dealing with is, I need to pick out a big chunk of
    >> input string (>200K, by regex), feed it to external program (which is
    >> pipe after pipe after pipe), then replace the matching string with the
    >> processed result. what's the proper way to do it (for big matching
    >> chunks)?

    >
    >> A thousand time over-simplified version is:

    >
    >> perl -e 'print("aa". "x" x 238565 . "bb", "\n")' > HttpBody
    >> <HttpBody perl -n000e 's,(x+),`echo $1 | wc -c`,eg; print'

    >
    >> The problem is that I not only need to process this big chunk of
    >> matching string via the external program, but I also need to replace the
    >> matching string with the result of the external process. Putting two
    >> together is where the problem for me.

    >
    > You could replace (in this example) the call of `echo $1 |
    > wc -c` with a double-sided pipe where you feed $1 on stdin
    > to wc and collect wc's stdout. You need to look at
    > IPC::Open2 & Co. on how to achieve that; see "perldoc -q
    > 'How can I open a pipe both to and from a command?'" for
    > pointers.


    That's almost certainly a recipe for disaster for 'large amounts of
    data' because the process writing to the input pipe will block once the
    'input' pipe buffer is full and the external command will block once the
    'output' pipe buffer is full, ie, the whole thing will deadlock.
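One way around the deadlock described above is to let a separate process do the writing, so that the reader never stops draining the output pipe. The following is only a minimal sketch, assuming a Unix-like system with `wc` on the PATH; `filter_through` is a hypothetical helper name, not something from this thread or from IPC::Open2 itself.

```perl
use strict;
use warnings;
use IPC::Open2 qw(open2);
use POSIX ();

# Hypothetical helper (the name is an assumption): send $input to @cmd on
# its stdin and return everything @cmd writes to its stdout.
sub filter_through {
    my ($input, @cmd) = @_;

    # open2 takes the read handle first, then the write handle.
    my $cmd_pid = open2(my $from_cmd, my $to_cmd, @cmd);

    # A forked child does the writing, so the parent can read concurrently
    # and neither pipe buffer stays full for long.
    my $writer = fork();
    die "fork failed: $!" unless defined $writer;

    if ($writer == 0) {
        close($from_cmd);
        print {$to_cmd} $input;
        close($to_cmd);        # flushes the data and signals EOF
        POSIX::_exit(0);       # leave without running END blocks or destructors
    }

    # The parent must close its copy of the write end, or the command
    # would never see EOF on its stdin.
    close($to_cmd);
    my $output = do { local $/; <$from_cmd> };
    close($from_cmd);

    waitpid($writer, 0);
    waitpid($cmd_pid, 0);
    return $output // '';
}

# Usage with the example from the thread: replace the run of x's with
# whatever "wc -c" prints for it.
my $s = "aa" . "x" x 238565 . "bb\n";
$s =~ s/(x+)/filter_through($1, 'wc', '-c')/eg;
print $s;
```

The extra fork costs one more process per chunk; a select() loop in a single process would be an alternative, at the price of more code.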

    > Another approach (as always) would be temporary files.
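The temporary-file variant could look roughly like the sketch below: the chunk is written to a file first, so the command reads its input from disk and only one pipe (the output) is in play. This assumes a Unix-like system; `filter_via_tempfile` is a hypothetical name chosen for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# Hypothetical helper (the name is an assumption): write $input to a
# temporary file, run @cmd with that file as stdin, return @cmd's stdout.
sub filter_via_tempfile {
    my ($input, @cmd) = @_;

    my ($tmp, $tmp_name) = tempfile(UNLINK => 1);
    print {$tmp} $input;
    close($tmp) or die "cannot write $tmp_name: $!";

    # Forking pipe open: the parent reads what the child writes to stdout.
    my $pid = open(my $from_cmd, '-|');
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: take stdin from the temporary file, then become the command.
        open(STDIN, '<', $tmp_name) or POSIX::_exit(126);
        exec(@cmd) or POSIX::_exit(127);
    }

    my $output = do { local $/; <$from_cmd> };
    close($from_cmd);          # also waits for the child
    unlink($tmp_name);         # remove the file now rather than at exit
    return $output // '';
}

# Usage with the example from the thread.
my $s = "aa" . "x" x 238565 . "bb\n";
$s =~ s/(x+)/filter_via_tempfile($1, 'wc', '-c')/eg;
print $s;
```

Because the parent has finished writing before the command starts, there is nothing left that could block on the input side.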


    If the program is really supposed to work as a filter, another possible
    approach would be to use a 'Perl lexer', e.g., for the example above,
    assuming the input is in $s (untested):

    for ($s) {
        /\G(x+)/gc and do {
            my $fh;

            open($fh, '|command');
            print $fh ($1);
            $fh = undef;

            redo;
        };

        /\G([^x]+)/gc and print($1), redo;
    }

    and simply let the output of the external command appear 'in the right
    place' of the stdout output of the perl script (since they'll share the
    same stdout).
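A self-contained variant of the lexer idea, for experimenting, might look like this. It is a sketch only: `print_filtered` is a hypothetical name, the `x+` pattern is hard-coded as in the example, `wc -c` stands in for the real command, and a Unix-like system is assumed. The command's output is never captured; it lands on the shared stdout between the pieces that Perl prints itself.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption): walk through $s, print
# the non-matching parts directly and pipe each run of x's to @cmd, whose
# output goes straight to the stdout it shares with this script.
sub print_filtered {
    my ($s, @cmd) = @_;

    # Unbuffered stdout, so our own output cannot end up behind the
    # command's output.
    local $| = 1;

    for ($s) {
        if (/\G(x+)/gc) {
            open(my $fh, '|-', @cmd) or die "cannot start @cmd: $!";
            print {$fh} $1;
            # Closing a pipe handle waits for the command to finish, so
            # its output is complete before we continue.
            close($fh) or die "@cmd failed, status $?";
            redo;
        }
        if (/\G([^x]+)/gc) {
            print $1;
            redo;
        }
    }
    return;
}

# The sample input from the thread.
print_filtered("aa" . "x" x 238565 . "bb\n", 'wc', '-c');
```

Since nothing is read back from the command, there is only one pipe per chunk and the deadlock scenario cannot arise; the price is that the result is only available on stdout, not as a Perl string.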
    Rainer Weikusat, Jan 22, 2014
    #1

  2. Rainer Weikusat writes:

    [...]


    > for ($s) {
    >     /\G(x+)/gc and do {
    >         my $fh;
    >
    >         open($fh, '|command');
    >         print $fh ($1);
    >         $fh = undef;


    The last line isn't really needed: $fh is a lexical declared inside
    the do block, so Perl closes the handle when the block is left.
    Rainer Weikusat, Jan 22, 2014
    #2

  3. Tim Landscheidt writes:
    > Rainer Weikusat wrote:
    >
    >>>>> Don't do `...` when there may be a lot of output. Please see the
    >>>>> perlopentut man page, specifically "pipe open", and don't try to pass
    >>>>> long values on the command line.

    >
    >>>> The problem I was dealing with is, I need to pick out a big chunk of
    >>>> input string (>200K, by regex), feed it to external program (which is
    >>>> pipe after pipe after pipe), then replace the matching string with the
    >>>> processed result. what's the proper way to do it (for big matching
    >>>> chunks)?

    >
    >>>> A thousand time over-simplified version is:

    >
    >>>> perl -e 'print("aa". "x" x 238565 . "bb", "\n")' > HttpBody
    >>>> <HttpBody perl -n000e 's,(x+),`echo $1 | wc -c`,eg; print'

    >
    >>>> The problem is that I not only need to process this big chunk of
    >>>> matching string via the external program, but I also need to replace the
    >>>> matching string with the result of the external process. Putting two
    >>>> together is where the problem for me.

    >
    >>> You could replace (in this example) the call of `echo $1 |
    >>> wc -c` with a double-sided pipe where you feed $1 on stdin
    >>> to wc and collect wc's stdout. You need to look at
    >>> IPC::Open2 & Co. on how to achieve that; see "perldoc -q
    >>> 'How can I open a pipe both to and from a command?'" for
    >>> pointers.

    >
    >> That's almost certainly a recipe for disaster for 'large amounts of
    >> data' because the process writing to the input pipe will block once the
    >> 'input' pipe buffer is full and the external command will block once the
    >> 'output' pipe buffer is full, ie, the whole thing will deadlock.

    >
    >> [...]

    >
    > "& Co.". Personally, I prefer IPC::Run and:


    In that case, you should refer the OP to it. You can even laugh as he
    falls into the pit nevertheless, as soon as he gets hit by the difference
    between _exit (correct) and exit (Documentation? Only newbies read
    documentation!).
    Rainer Weikusat, Jan 23, 2014
    #3
