STDOUT beginner problem

mat.krawczyk

Hello,

I would like to write a simple script for decoding emails. My problem concerns the input and output of an external program. I would like to use the html2text converter:

open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
$text = print HTML2TEXT $html;
close HTML2TEXT;
print $text;

but $text is empty and output is directed to STDOUT.

I would be grateful for any help.

Mateusz Krawczyk
 
Rainer Weikusat

I would like to write a simple script for decoding emails. My problem concerns the input and output of an external program. I would like to use the html2text converter:

open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
$text = print HTML2TEXT $html;
close HTML2TEXT;
print $text;

but $text is empty and output is directed to STDOUT.

Output is to stdout because you didn't redirect it somewhere
else. Generally, the built-in 'pipe open' can't do what you want (write
data to some process and read its output back). IPC::Open2 can do that,
although using that is not as straightforward as it seems (there's a
chance that both processes deadlock because both wait for data written
by the other). One way to deal with that is to use select and switch
between reading and writing as required. Another reasonably easy way
would be to use three processes, one which reads the output from the
external command, a 2nd which runs it and a 3rd which feeds input to it.

Example
-------
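my ($in, $proc, $line, $rc);

# first fork: the child's stdout becomes a pipe the parent reads via $proc
$rc = open($proc, '-|');
die "fork failed: $!\n" unless defined $rc;
if ($rc == 0) {
    # second fork, done inside the first child
    $rc = open($proc, '-|');
    die "fork failed: $!\n" unless defined $rc;
    if ($rc == 0) {
        #
        # 3rd process: reads from input file, stdout connected
        # to 2nd pipe
        #
        open($in, '<', '/var/log/syslog') or die "open failed: $!\n";
        print $line while $line = <$in>;
        exit(0);
    }

    #
    # 2nd process: stdin redirected from 2nd pipe, stdout
    # connected to 1st
    #
    open(STDIN, '<&', $proc) or die "dup failed: $!\n";
    exec('tr', '6', '^') or die "exec failed: $!\n";
}

#
# original process: reads processed data from 1st pipe
#
print $line while $line = <$proc>;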
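
For comparison, here is a minimal sketch of the IPC::Open2 route mentioned above. It avoids the deadlock by forking a separate writer process, so the parent only ever reads. The helper name filter_through is made up for illustration, and tr stands in for html2text so the sketch runs without it installed.

```perl
use strict;
use warnings;
use IPC::Open2;
use POSIX ();

# Hypothetical helper (not from the thread): feed $input to the stdin of
# @cmd and return everything the command writes to its stdout.
sub filter_through {
    my ($input, @cmd) = @_;

    # $out reads from the command, $in writes to it
    my $pid = open2(my $out, my $in, @cmd);

    # Fork a writer so the parent can read while input is still being
    # written; neither side then blocks waiting for the other.
    my $writer = fork();
    die "fork failed: $!\n" unless defined $writer;
    if ($writer == 0) {
        close $out;
        print {$in} $input;
        close $in;
        POSIX::_exit(0);    # leave without flushing inherited buffers
    }

    close $in;              # the parent's copy must be closed so the command sees EOF
    local $/;               # slurp mode
    my $output = <$out>;
    close $out;

    waitpid($writer, 0);
    waitpid($pid, 0);
    return $output;
}

print filter_through("abc 666\n", 'tr', '6', '^');    # prints "abc ^^^"
```

Replacing ('tr', '6', '^') with ('/usr/bin/html2text') should give the conversion the original post asks for, assuming html2text reads HTML from stdin as it does in the examples in this thread.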
 
gamo

On 26/11/13 18:38, (e-mail address removed) wrote:
Hello,

I would like to write a simple script for decoding emails. My problem concerns the input and output of an external program. I would like to use the html2text converter:

open(HTML2TEXT, "| /usr/bin/html2text ") || die "html2text failed: $!\n";
$text = print HTML2TEXT $html;
close HTML2TEXT;
print $text;

but $text is empty and output is directed to STDOUT.

I will be grateful for any help..

Mateusz Krawczyk

Maybe simple backticks are what you are looking for:

~$ cat test.backticks
#!/usr/bin/perl -W

use strict;

my $html = '<p>hi</p>';
my $text = `echo "$html" | /usr/bin/html2text`;
print $text, "\n";

~$ perl test.backticks
hi

~$ man perlop

(pay attention to the different ticks)
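
One thing to keep in mind with backticks: the interpolated string goes through the shell, so shell syntax inside the data gets interpreted. The following sketch illustrates this with a made-up $html value and compares it with a list-form pipe open, which runs the program directly without a shell. It assumes a POSIX shell and an external printf utility.

```perl
use strict;
use warnings;

# Made-up input containing shell command substitution syntax
my $html = '<p>$(echo injected)</p>';

# Backticks: Perl inserts $html into the command line, then the shell
# expands $(...) even inside the double quotes.
my $via_shell = `echo "$html"`;

# List-form pipe open: no shell is involved, the data arrives untouched.
open(my $fh, '-|', 'printf', '%s\n', $html)
    or die "cannot run printf: $!\n";
my $direct = <$fh>;
close $fh;

print $via_shell;    # <p>injected</p>
print $direct;       # <p>$(echo injected)</p>
```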

Good luck
 
gamo

On 26/11/13 23:21, Ben Morrow wrote:
Careful with your quoting. It would probably be better to write the HTML
to a file.

...probably, but the OP does not seem to have a problem with the input.
You can use qx// instead of backticks, and it's usually clearer.

Ben

Then he must use qx!! or some other delimiters.

Thanks
 
Rainer Weikusat

Ben Morrow said:
Quoth gamo <(e-mail address removed)>:
[...]
my $html = '<p>hi</p>';
my $text = `echo "$html" | /usr/bin/html2text`;

Careful with your quoting. It would probably be better to write the HTML
to a file.

Not necessary. When starting to make "Gee that looks *complicated*,
can't I sell him something else instead?" assumptions, the simple way to
do this would be to create a small shell script,
 
gamo

On 26/11/13 23:55, Rainer Weikusat wrote:
Not necessary. When starting to make "Gee that looks *complicated*,
can't I sell him something else instead?" assumptions, the simple way to
do this would be to create a small shell script,

------
#!/bin/sh
printf '%s' "$1" | html2text
------

and use that like this:

------ ....
open($h2t, '-|', '/tmp/h2t', $html);
print(<$h2t>);

The simplest option is to pass the file to html2text as an argument, but
if he wants to use cat file | html2text or echo too, he could, because
echo's interpretation of backslash escapes is disabled by default:

DESCRIPTION
Echo the STRING(s) to standard output.

-n do not output the trailing newline

-e enable interpretation of backslash escapes

-E disable interpretation of backslash escapes (default)
 
gamo

On 27/11/13 01:50, Ben Morrow wrote:
*My* echo(1), OTOH, recognises neither -e nor -E, and the manpage says:

| The newline may also be suppressed by appending '\c' to the end of the
| string, as is done by iBCS2 compatible systems. Note that the -n option
| as well as the effect of '\c' are implementation-defined in IEEE Std
| 1003.1-2001 ("POSIX.1") as amended by Cor. 1-2002. For portability, echo
| should only be used if the first argument does not start with a hyphen
| ('-') and does not contain any backslashes ('\'). If this is not suffi-
| cient, printf(1) should be used.

and also this:

| Most shells provide a builtin echo command which tends to differ from
| this utility in the treatment of options and backslashes. Consult the
| builtin(1) manual page.

so using echo to pass arbitrary text is not reliable.

Ben


I'm afraid it is common to have two echo utilities available: one built
into bash, which does accept escapes, and one in /bin/echo, which does
not. Compare "help echo" with "man echo". My advice to the OP would be
to replace "echo" with "/bin/echo"; as I recall, using the full path is
always recommended for security when invoking commands.

Thanks
 
Rainer Weikusat

Ben Morrow said:
'Oh, but there's no need to put that script in a file either...':

open my $h2t, "-|", "/bin/sh", "-c",
q/printf %s "$1" | html2text/, $html;

...and we end up with the sort of mess shell always turns into.
Sometimes a temporary file is the cleanest and simplest solution.

In this particular case, the main complication is that html2text doesn't
support passing the text-to-be-processed literally as command-line
argument. And the simplest way to remedy that while avoiding issues with
'inappropriate data interpretation/ execution' is to create a shell
script which takes such an argument and passes it to html2text in the
appropriate way. This yields a new and possibly generally useful command
with more reasonable semantics. Actually, the replacement command could
be written in any programming language including Perl but for these
kinds of task, the shell is IMO the most appropriate tool.

Inline use of such a different programming language instead of creating
a new command is both messy and short-sighted.
 
Rainer Weikusat

gamo said:
On 26/11/13 23:55, Rainer Weikusat wrote:

The simplest option is to pass the file to html2text as an argument, but
if he wants to use cat file | html2text or echo too, he could, because
echo's interpretation of backslash escapes is disabled by default:

That's not the problem with the backticks idea. For a
live-demonstration, create a file with the following content:
 
gamo

On 27/11/13 16:54, Rainer Weikusat wrote:
In this particular case, the main complication is that html2text doesn't
support passing the text-to-be-processed literally as command-line
argument. And the simplest way to remedy that while avoiding issues with

At least my version of html2text accepts an input file as an argument.
If that is a problem anyway, it could be replaced with "lynx -dump file";
lynx is a more mature program, from ISC, I think.
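
A rough sketch of that file-based route, combined with the temporary file suggested earlier in the thread: write the HTML to a temp file and pass the file name as the last argument through a list-form pipe open, so no shell quoting is involved. The helper name run_on_tempfile is made up, and cat stands in for the converter so the sketch runs anywhere; whether a given html2text takes a file argument depends on the installed version.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the thread): store $data in a temporary
# file, run @cmd with that file name appended, and return the output.
sub run_on_tempfile {
    my ($data, @cmd) = @_;

    my ($fh, $path) = tempfile(UNLINK => 1);    # removed at program exit
    print {$fh} $data;
    close $fh or die "cannot write $path: $!\n";

    # List form: the command is executed directly, without a shell
    open(my $out, '-|', @cmd, $path)
        or die "cannot run $cmd[0]: $!\n";
    local $/;                                   # slurp mode
    my $text = <$out>;
    close $out;
    return $text;
}

# 'cat' is a stand-in; use ('/usr/bin/html2text') for the real conversion
my @cmd  = ('cat');
my $html = q{<p>it's "quoted" and $(odd)</p>};
print run_on_tempfile($html, @cmd), "\n";
```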
 
Charles DeRykus

(e-mail address removed) writes

Output is to stdout because you didn't redirect it somewhere
else. Generally, the built-in 'pipe open' can't do what you want (write
data to some process and read its output back). IPC::Open2 can do that,
although using that is not as straightforward as it seems (there's a
chance that both processes deadlock because both wait for data written
by the other). One way to deal with that is to use select and switch
between reading and writing as required. Another reasonably easy way
would be to use three processes, one which reads the output from the
external command, a 2nd which runs it and a 3rd which feeds input to it.
...

The IPC::Open3 docs (Open2 is just a wrapper) now mention IPC::Run as
"having better error handling and facilities than Open3". Even though
deadlock is still a danger, the following seemed to work well even with
large html strings:

use feature 'say';    # needed for say() below
use IPC::Run qw/run/;

my @cmd = ('html2text');
my $html = ....;
run( \@cmd, \$html, \my $text);
say $text;
 
Rainer Weikusat

Charles DeRykus said:
The IPC::Open3 docs (Open2 is just a wrapper) now mention IPC::Run as
"having better error handling and facilities than Open3".

Judging from the documentation, this is a particularly ghastly example of
feeping creatureism. There are 56 open bug reports, among them one where
the author inadvertently used a wrong function because he apparently
didn't know the name of the correct one and couldn't be bothered to
read the documentation,

https://rt.cpan.org/Public/Bug/Display.html?id=42885

That's not particularly confidence-inspiring, especially considering
that the 'benefit' is not having to write an odd dozen lines of code.
 
