STDOUT beginner problem

Discussion in 'Perl Misc' started by mat.krawczyk@gmail.com, Nov 26, 2013.

  1. Guest

    Hello,

    I would like to write a simple script for decoding emails. My problem is with the input and output of an external program. I would like to use the html2text converter:

    open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
    $text = print HTML2TEXT $html;
    close HTML2TEXT;
    print $text;

    but $text is empty and output is directed to STDOUT.

    I will be grateful for any help..

    Mateusz Krawczyk
     
    , Nov 26, 2013
    #1

  2. writes:
    > I would like to write simple script for emails decoding. My problem is connected with input and output of an external program. I would like to use html2text converter:
    >
    > open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
    > $text = print HTML2TEXT $html;
    > close HTML2TEXT;
    > print $text;
    >
    > but $text is empty and output is directed to STDOUT.


    Output is to stdout because you didn't redirect it somewhere
    else. Generally, the built-in 'pipe open' can't do what you want (write
    data to some process and read its output back). IPC::Open2 can do that,
    although using that is not as straight-forward as it seems (there's a
    chance that both processes deadlock because both wait for data written
    by the other). One way to deal with that is to use select and switch
    between reading and writing as required. Another reasonably easy way
    would be to use three processes, one which reads the output from the
    external command, a 2nd which runs it and a 3rd which feeds input to it.

    Example
    -------
    my ($in, $proc, $line, $rc);

    $rc = open($proc, '-|');
    if ($rc == 0) {
        $rc = open($proc, '-|');
        if ($rc == 0) {
            #
            # 3rd process: reads from input file, stdout connected
            # to 2nd pipe
            #
            open($in, '<', '/var/log/syslog');
            print $line while $line = <$in>;
            exit(0);
        }

        #
        # 2nd process: stdin redirected from 2nd pipe, stdout
        # connected to 1st
        #
        open(STDIN, '<&', $proc);
        exec('tr', '6', '^');
    }

    #
    # original process: reads processed data from 1st pipe
    #
    print $line while $line = <$proc>;
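
The IPC::Open2 route mentioned above can be sketched in a similar way. This is only an illustration, not code from the thread: the helper name filter_through is made up, and a forked child does the writing so that the parent stays free to read, which is one way around the deadlock described above (a select loop would be another).

```perl
use strict;
use warnings;
use IPC::Open2;
use POSIX ();

# Hypothetical helper: send $input to the command's stdin and return
# everything the command writes to its stdout.
sub filter_through {
    my ($cmd, $input) = @_;    # $cmd is an array ref: program plus arguments

    # First handle reads the command's output, second writes to its input.
    my $pid = open2(my $from_cmd, my $to_cmd, @$cmd);

    # A separate writer process feeds the command, so the parent can
    # drain the output while the input is still being written.
    my $writer = fork();
    die "fork failed: $!" unless defined $writer;
    if ($writer == 0) {
        close $from_cmd;
        print {$to_cmd} $input;
        close $to_cmd;
        POSIX::_exit(0);       # leave without running END blocks
    }

    # The parent must close its copy of the write end, otherwise the
    # command never sees end-of-file on its stdin.
    close $to_cmd;
    my $output = do { local $/; <$from_cmd> };
    close $from_cmd;

    waitpid($writer, 0);
    waitpid($pid, 0);
    return $output;
}
```

For the original question this would presumably be called as filter_through(['/usr/bin/html2text'], $html), with the result assigned to $text.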
     
    Rainer Weikusat, Nov 26, 2013
    #2

  3. gamo Guest

    On 26/11/13 18:38, wrote:
    > Hello,
    >
    > I would like to write simple script for emails decoding. My problem is connected with input and output of an external program. I would like to use html2text converter:
    >
    > open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
    > $text = print HTML2TEXT $html;
    > close HTML2TEXT;
    > print $text;
    >
    > but $text is empty and output is directed to STDOUT.
    >
    > I will be grateful for any help..
    >
    > Mateusz Krawczyk
    >


    Maybe simple backticks are what you are searching for

    ~$ cat test.backticks
    #!/usr/bin/perl -W

    use strict;

    my $html = '<p>hi</p>';
    my $text = `echo "$html" | /usr/bin/html2text`;
    print $text, "\n";

    ~$ perl test.backticks
    hi

    ~$ man perlop

    (pay attention to the different ticks)

    Good luck
     
    gamo, Nov 26, 2013
    #3
  4. gamo Guest

    On 26/11/13 23:21, Ben Morrow wrote:
    > Careful with your quoting. It would probably be better to write the HTML
    > to a file.


    ....probably, but the OP seems to not have problems with the input

    > You can use qx// instead of backticks, and it's usually clearer.
    >
    > Ben
    >


    Then, he must use qx!! or some other separators

    Thanks
     
    gamo, Nov 26, 2013
    #4
  5. Ben Morrow <> writes:
    > Quoth gamo <>:


    [...]

    >> my $html = '<p>hi</p>';
    >> my $text = `echo "$html" | /usr/bin/html2text`;

    >
    > Careful with your quoting. It would probably be better to write the HTML
    > to a file.


    Not necessary. When starting to make "Gee that looks *complicated*,
    can't I sell him something else instead?" assumptions, the simple way to
    do this would be to create a small shell script,

    ------
    #!/bin/sh
    printf '%s' "$1" | html2text
    ------

    and use that like this:

    ------
    my $html = '<html><body><em>$(echo 3)</em></body></html>';
    open($h2t, '-|', '/tmp/h2t', $html);
    print(<$h2t>);
    ------

    (the reason for using printf is that echo may interpret \-escapes in its
    argument).
     
    Rainer Weikusat, Nov 26, 2013
    #5
  6. gamo Guest

    On 26/11/13 23:55, Rainer Weikusat wrote:
    > Ben Morrow <> writes:
    >> Quoth gamo <>:

    ....
    >> Careful with your quoting. It would probably be better to write the HTML
    >> to a file.

    >
    > Not necessary. When starting to make "Gee that looks *complicated*,
    > can't I sell him something else instead?" assumptions, the simple way to
    > do this would be to create a small shell script,
    >
    > ------
    > #!/bin/sh
    > printf '%s' "$1" | html2text
    > ------
    >
    > and use that like this:
    >
    > ------

    ....
    > open($h2t, '-|', '/tmp/h2t', $html);
    > print(<$h2t>);
    > ------
    >
    > (the reason for using printf is that echo may interpret \-escapes in its
    > argument).
    >
    >


    The simplest way is to pass a file name as an argument to html2text,
    but if he wants to use cat file | html2text or echo too, he could,
    because the interpretation of escapes is disabled by default:

    DESCRIPTION
    Echo the STRING(s) to standard output.

    -n do not output the trailing newline

    -e enable interpretation of backslash escapes

    -E disable interpretation of backslash escapes (default)
     
    gamo, Nov 26, 2013
    #6
  7. gamo Guest

    On 27/11/13 01:50, Ben Morrow wrote:
    >> -E disable interpretation of backslash escapes (default)


    > *My* echo(1), OTOH, recognises neither -e nor -E, and the manpage says:
    >
    > | The newline may also be suppressed by appending '\c' to the end of the
    > | string, as is done by iBCS2 compatible systems. Note that the -n option
    > | as well as the effect of '\c' are implementation-defined in IEEE Std
    > | 1003.1-2001 ("POSIX.1") as amended by Cor. 1-2002. For portability, echo
    > | should only be used if the first argument does not start with a hyphen
    > | ('-') and does not contain any backslashes ('\'). If this is not suffi-
    > | cient, printf(1) should be used.
    >
    > and also this:
    >
    > | Most shells provide a builtin echo command which tends to differ from
    > | this utility in the treatment of options and backslashes. Consult the
    > | builtin(1) manual page.
    >
    > so using echo to pass arbitrary text is not reliable.
    >
    > Ben
    >



    I'm afraid it is common to have two echo utilities available: one
    built into bash that does accept escapes, and one in /bin/echo
    that does not. Compare "help echo" with "man echo". My
    response to the OP would be to replace "echo" with "/bin/echo",
    which, as I remember, is always recommended to enhance security when
    invoking commands.

    Thanks
     
    gamo, Nov 27, 2013
    #7
  8. Ben Morrow <> writes:
    > Quoth gamo <>:
    >> On 26/11/13 23:55, Rainer Weikusat wrote:
    >> > Ben Morrow <> writes:
    >> >> Quoth gamo <>:

    >> ...
    >> >> Careful with your quoting. It would probably be better to write the HTML
    >> >> to a file.
    >> >
    >> > Not necessary. When starting to make "Gee that looks *complicated*,
    >> > can't I sell him something else instead?" assumptions, the simple way to
    >> > do this would be to create a small shell script,
    >> >
    >> > ------
    >> > #!/bin/sh
    >> > printf '%s' "$1" | html2text
    >> > ------
    >> >
    >> > and use that like this:
    >> >
    >> > ------

    >> ...
    >> > open($h2t, '-|', '/tmp/h2t', $html);
    >> > print(<$h2t>);
    >> > ------

    >
    > 'Oh, but there's no need to put that script in a file either...':
    >
    > # "sh" fills $0, so that $html arrives in the script as $1
    > open my $h2t, "-|", "/bin/sh", "-c",
    > q/printf %s "$1" | html2text/, "sh", $html;
    >
    > ...and we end up with the sort of mess shell always turns into.
    > Sometimes a temporary file is the cleanest and simplest solution.


    In this particular case, the main complication is that html2text doesn't
    support passing the text-to-be-processed literally as command-line
    argument. And the simplest way to remedy that while avoiding issues with
    'inappropriate data interpretation/ execution' is to create a shell
    script which takes such an argument and passes it to html2text in the
    appropriate way. This yields a new and possibly generally useful command
    with more reasonable semantics. Actually, the replacement command could
    be written in any programming language including Perl but for these
    kinds of task, the shell is IMO the most appropriate tool.

    Inline use of such a different programming language instead of creating
    a new command is both messy and short-sighted.
     
    Rainer Weikusat, Nov 27, 2013
    #8
  9. gamo <> writes:
    > On 26/11/13 23:55, Rainer Weikusat wrote:
    >> Ben Morrow <> writes:
    >>> Quoth gamo <>:

    > ...
    >>> Careful with your quoting. It would probably be better to write the HTML
    >>> to a file.

    >>
    >> Not necessary. When starting to make "Gee that looks *complicated*,
    >> can't I sell him something else instead?" assumptions, the simple way to
    >> do this would be to create a small shell script,
    >>
    >> ------
    >> #!/bin/sh
    >> printf '%s' "$1" | html2text
    >> ------
    >>
    >> and use that like this:
    >>
    >> ------

    > ...
    >> open($h2t, '-|', '/tmp/h2t', $html);
    >> print(<$h2t>);
    >> ------
    >>
    >> (the reason for using printf is that echo may interpret \-escapes in its
    >> argument).

    >
    > Simplest is to read a file from html2text argument but if he wants to
    > use cat file | html2text or echo to, he could, because the
    > interpretation of escapes is disabled by default:


    That's not the problem with the backticks idea. For a
    live-demonstration, create a file with the following content:

    -----------
    #!/usr/bin/perl
    $output = `echo "$ARGV[0]" | html2text`;
    print($output);
    -----------

    and execute that with '$(ls /)' as first argument. And the ls / could as
    well have been cd; rm -rf *. And the OP wrote about processing e-mail
    which is not exactly a trusted data source ...
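
A harmless way to see the same effect, next to list-form open, which hands the data to the program as a plain argument without involving a shell. This is a sketch added for illustration: echo and printf stand in for the real command, and the variable names are made up.

```perl
use strict;
use warnings;

# Stand-in for untrusted e-mail content.
my $data = '$(echo injected)';

# Backticks: Perl interpolates $data into a command string, and the
# shell then performs the command substitution hidden in the data.
my $via_shell = `echo "$data"`;

# List-form pipe open: no shell is involved, so printf receives the
# data as an ordinary argument and prints it unchanged.
open(my $fh, '-|', 'printf', '%s\n', $data)
    or die "cannot run printf: $!";
my $via_list = do { local $/; <$fh> };
close $fh;

print "backticks: $via_shell";
print "list form: $via_list";
```

The first line of output shows the result of the embedded command, the second shows the data exactly as it was passed in.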
     
    Rainer Weikusat, Nov 27, 2013
    #9
  10. gamo Guest

    On 27/11/13 16:54, Rainer Weikusat wrote:
    >> # "sh" fills $0, so that $html arrives in the script as $1
    >> open my $h2t, "-|", "/bin/sh", "-c",
    >> q/printf %s "$1" | html2text/, "sh", $html;
    >>
    >> ...and we end up with the sort of mess shell always turns into.
    >> Sometimes a temporary file is the cleanest and simplest solution.

    > In this particular case, the main complication is that html2text doesn't
    > support passing the text-to-be-processed literally as command-line
    > argument. And the simplest way to remedy that while avoiding issues with


    At least my version of html2text supports an input file as an argument.
    If that's a problem anyway, it could be swapped for "lynx -dump file",
    which is a more mature program from ISC, I think.
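
If the converter accepts a file name, as described above, the temporary-file route suggested earlier in the thread might look roughly like this. It is a sketch under assumptions: the helper name convert_via_tempfile is invented, and the command is assumed to take the input file as its last argument.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $data to a temporary file, run the command
# with that file name as its last argument, and return its output.
sub convert_via_tempfile {
    my ($cmd, $data) = @_;    # $cmd is an array ref: program plus arguments

    # UNLINK => 1 removes the file when the program exits.
    my ($fh, $filename) = tempfile(UNLINK => 1);
    print {$fh} $data;
    close $fh or die "cannot write $filename: $!";

    # List-form open: the command is run directly, not through a shell,
    # so neither the data nor the file name is interpreted.
    open(my $out, '-|', @$cmd, $filename)
        or die "cannot run $cmd->[0]: $!";
    my $text = do { local $/; <$out> };
    close $out or die "$cmd->[0] exited with status $?";

    return $text;
}
```

With html2text this would presumably be called as convert_via_tempfile(['/usr/bin/html2text'], $html). The data only ever travels through a file, so no quoting is needed.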
     
    gamo, Nov 28, 2013
    #10
  11. On 11/26/2013 10:40 AM, Rainer Weikusat wrote:
    > writes
    >> ...
    >> open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
    >> $text = print HTML2TEXT $html;
    >> close HTML2TEXT;
    >> print $text;
    >>
    >> but $text is empty and output is directed to STDOUT.

    >
    > Output is to stdout because you didn't redirect it somewhere
    > else. Generally, the built-in 'pipe open' can't do what you want (write
    > data to some process and read its output back). IPC::Open2 can do that,
    > although using that is not as straight-forward as it seems (there's a
    > chance that both processes deadlock because both wait for data written
    > by the other). One way to deal with that is to use select and switch
    > between reading and writing as required. Another reasonably easy way
    > would be to use three processes, one which reads the output from the
    > external command, a 2nd which runs it and a 3rd which feeds input to it.
    > ...


    The IPC::Open3 docs (Open2 is just a wrapper) now mention IPC::Run as
    "having better error handling and facilities than Open3". Even though
    deadlock is still a danger, the following seemed to work well even with
    large html strings:

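    use feature 'say';      # needed for say
    use IPC::Run qw/run/;

    my @cmd = ('html2text');
    my $html = ....;
    # run() feeds $html to the command's stdin and collects stdout in $text
    run( \@cmd, \$html, \my $text ) or die "html2text failed: $?";
    say $text;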


    --
    Charles DeRykus
     
    Charles DeRykus, Nov 29, 2013
    #11
  12. Charles DeRykus <> writes:
    > On 11/26/2013 10:40 AM, Rainer Weikusat wrote:
    >> writes
    >>> ...
    >>> open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
    >>> $text = print HTML2TEXT $html;
    >>> close HTML2TEXT;
    >>> print $text;
    >>>
    >>> but $text is empty and output is directed to STDOUT.

    >>
    >> Output is to stdout because you didn't redirect it somewhere
    >> else. Generally, the built-in 'pipe open' can't do what you want (write
    >> data to some process and read its output back). IPC::Open2 can do that,
    >> although using that is not as straight-forward as it seems (there's a
    >> chance that both processes deadlock because both wait for data written
    >> by the other). One way to deal with that is to use select and switch
    >> between reading and writing as required. Another reasonably easy way
    >> would be to use three processes, one which reads the output from the
    >> external command, a 2nd which runs it and a 3rd which feeds input to it.
    >> ...

    >
    > The IPC::Open3 docs (Open2 is just a wrapper) now mention IPC::Run as
    > "having better error handling and facilities than Open3".


    Judging from the documentation, this is a particularly ghastly example of
    feeping creatureism. There are 56 open bug reports, among them one where
    the author inadvertently used a wrong function because he apparently
    didn't know the name of the correct one and couldn't be bothered to
    read the documentation,

    https://rt.cpan.org/Public/Bug/Display.html?id=42885

    That's not particularly confidence-inspiring, especially considering
    that the 'benefit' is not having to write an odd dozen lines of code.
     
    Rainer Weikusat, Nov 29, 2013
    #12
