Need ideas on how to make this code faster than a speeding turtle

U

Uri Guttman

IZ> [A complimentary Cc of this posting was sent to
IZ> Uri Guttman

IZ> This may depend on many parameters, but the overhead of
IZ> system()ing may be quite low. The overhead of opening a new HTTP
IZ> connection for each line may be larger. LWP will have a chance to
IZ> use persistent connections...

i highly doubt forking lynx and it doing a fetch with passing the page
back via a pipe would be faster than a direct call to lwp and getting
the page in ram. it would have to be a very odd system for the lynx
solution to be faster.

and lynx would have to always open a new connection as forked procs have
no memory.
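
a rough way to see the connection-reuse point, sketched in Python rather than Perl/LWP so it runs with only the standard library. The local test server, `CountingHandler`, and `count_connections` are hypothetical names made up for this illustration, and it only counts TCP connections; it does not measure fork/exec or pipe overhead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical local test handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    connections = 0

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        type(self).connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the output quiet


def count_connections(n_requests, persistent):
    """Fetch n_requests pages and return how many connections the server saw."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = http.client.HTTPConnection(host, port)
        for _ in range(n_requests):
            if not persistent:
                # Mimic a freshly started process: no connection to reuse.
                conn.close()
                conn = http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connections


if __name__ == "__main__":
    print("persistent:", count_connections(5, persistent=True))
    print("one connection per fetch:", count_connections(5, persistent=False))
```

with five fetches the persistent client should need a single connection, while the per-fetch client needs five. A new process per URL is stuck in the second case.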

uri
 
A

A. Sinan Unur

I have not changed my identity. My name is Gordon Etly. I have not
changed that part, nor made any attempt to hide it, so your statement
is false.

I happen to be a sys op for the company I work for, including our mail
server, so I am able to add entries to /etc/aliases (which I commonly
use to publish variants of my main email address so that any unwanted
mailings can be easily stopped). I've never seen any rule saying
"never change your email field", as that is anyone's right.

Noting from the Anti-Troll FAQ:

Subject: 7.6 Morphed Identity

A morphed identity is when a poster has one usenet identity,
which changes in detail, to outwit killfiles. For instance the
name may remain the same and the email address change, or the
name and/or email address may contain accented characters which
are changed for different versions of the same letter.

Here are all of your selves as recorded by my killfile:

gordon etly (e-mail address removed) 0
gordon etly (e-mail address removed) 0
gordon etly (e-mail address removed) 0

I have added another one with this post.

Now, I don't know about Individual.NET's policies regarding morphing,
but their terms of use seem to explicitly prohibit using domains you do
not own as your from address:

http://www.individual.net/rules.php

Sender Address
The e-mail addresses given in "From:", "Reply-To:", and "Sender:" should
be valid (= should not bounce because of invalidity). Using addresses
and name space of other people without their permission is prohibited.

It does not look like you own bent-INVALID-sys.com or invalidbentsys.com
or bentsys-invalid.com.

Don't morph. Pick one identity and stick with it.

Sinan

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 
G

Gordon Etly

Changing your identity again because everyone filtered you?

1) My identity has never changed. It has always been Gordon Etly, which
is my name.

2) Why are you trying to speak for everyone? While certain people may
share your view (and vice versa), it doesn't mean you speak for the
whole of the group.
 
G

Gordon Etly

A. Sinan Unur said:
"Gordon Etly" wrote in
Noting from the Anti-Troll FAQ:

Subject: 7.6 Morphed Identity

A morphed identity is when a poster has one usenet identity,

An email address is not an identity. It's an email address. The "Name"
field is your identity, and I have not changed that. I am free to
change my email address field however I wish, as are you and anyone
else.

Sender Address
The e-mail addresses given in "From:", "Reply-To:", and "Sender:"
should
be valid (= should not bounce because of invalidity). Using addresses
and name space of other people without their permission is prohibited.

Being in control of your mail server actually allows you to fulfill the
"should not bounce because of invalidity" if you want to get down to
that. How a poster writes their email address is completely up to that
person. A rather large amount of people munge their email addresses, so
this isn't even an issue.

Lastly, attempting to pose that "identity" on a medium like UseNet
actually meaning something is idiotic at best. There is no guarantee
that a name you see is a real name, and in many cases it is not. Many
people use a "nick" name of sorts, and it is quite common to use a false
or munged email address to thwart spammer email harvesting.
 
I

Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Uri Guttman
IZ> This may depend on many parameters, but the overhead of
IZ> system()ing may be quite low. The overhead of opening a new HTTP
IZ> connection for each line may be larger. LWP will have a chance to
IZ> use persistent connections...

i highly doubt forking lynx and it doing a fetch with passing the page
back via a pipe would be faster than a direct call to lwp and getting
the page in ram. it would have to be a very odd system for the lynx
solution to be faster.

and lynx would have to always open a new connection as forked procs have
no memory.

I do not think you understood what I wrote.

I'm not claiming that *this* overhead is small. What I say is that
*other* overheads may be not negligible.

Anyway, all overheads I know are in favor of LWP.

Hope this helps,
Ilya
 
G

Gordon Etly

There is no "Name" field. The From: header often includes both a name
and an email address.

Many readers separate the "name" and "email" fields. I never changed my
name. The email address part of the From: line is not a static entity;
one can always change their email address. It's anyone's right to do so,
as it's their info. You're not suggesting an email address is a reliable
way of tracking someone, are you?
Changing one's From: header as often as you have is a strong
indicator of a troll.

Or someone who does not wish to satisfy someone's false notion that they
can force the last word using that tired old method. If they are going to
reply and then inform you that you're killfiled, as if the public really
needs to know (#1), then it is no less wrong to circumvent their
killfile; it's attack and counter, something that's existed as long as
man.

If one really wants to ignore me, they can either not read my posts or
block my name, as that remains constant.

It is not common to alter the From: header

This is untrue. I see many people post one day with one name and/or
email and the next time I see a variant of their Name (or a nick name)
and/or a differing email address.

no matter whether your name is Gordon Etly, Gordon Gekko, or Trolly
McTroll.

My name has always been Gordon Etly. That is my identity; my name. If
one wishes to killfile me using that, then they are welcome to do so. If
they killfile me by email address then



(#1)
If you truly need to ignore someone, you don't need to announce the fact,
or for that matter, one doesn't need a killfile either, though it can be
nice.
 
A

A. Sinan Unur

(e-mail address removed) wrote in @k10g2000prm.googlegroups.com:
I 'll eventually have the input file filled with 350 million items.

Incidentally, if you could do three pages in a second, this corresponds to
about 3.7 years of continuous scraping.
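
The estimate can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch in Python; `scrape_years` is a hypothetical helper name, and it assumes a constant rate with no retries, pauses, or downtime.

```python
# Average year length in seconds, using 365.25 days to account for leap years.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def scrape_years(items, pages_per_second):
    """Years needed to fetch `items` pages at a constant rate."""
    return items / pages_per_second / SECONDS_PER_YEAR


print(f"{scrape_years(350_000_000, 3):.1f} years")  # prints "3.7 years"
```

Doubling the rate only halves the figure, so even 30 pages a second would still take over four months.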

If you try to do this in massively parallel way, then it might be
considered a denial of service attack.

Of course, if you could do that, then the performance constraints of the
web server on the other end of the connection kick in.

I am not sure if it is a good idea for you to invest any more time &
resources into improving the performance of your script.

Sinan
--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 
A

A. Sinan Unur

An email address is not an identity. It's an email address. The
"Name" field is your identity, and I have not changed that.
I am free to change my email address field however I wish,
as are you and anyone else.

In newsgroups, your identity is your full handle. It does not matter if
that does not correspond to your real life identity. So, so long as you
pick one, and stick with it, no one has a problem with it.

Except,

You snipped the source of that rule. That is a rule stated by the
service provider you chose.
Being in control of your mail server actually allows you to fulfill
the "should not bounce because of invalidity" if you want to get down
to that.

That's funny because most of the domain names you use are not
registered. I am not sure how you are running mail servers for
non-existent domains.

Second, some of the domains you use are registered but do not seem to be
owned by someone named Gordon Etly.
How a poster writes their email address is completely up to
that person. A rather large amount of people munge their email
addresses, so this isn't even an issue.

From other users' perspective, what matters is that you pick one and
stick with it. It seems like your service provider has explicit policies
prohibiting you from using non-existent domains or domains owned by
others. So, you should argue this point with them.
Lastly, attempting to pose that "identity" on a medium like UseNet
actually meaning something is idiotic at best. There is no guarantee
that a name you see is a real name, and in many cases it is not. Many
people use a "nick" name of sorts, and it is quite common to use a
false or munged email address to thwart spammer email harvesting.

And that is completely irrelevant.

Sinan

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 
J

Jürgen Exner

Gordon Etly said:
1) My identity has never changed.

Oh really? So
Author: Gordon Etly <[email protected]>
Author: Gordon Etly <[email protected]>
Author: Gordon Etly <[email protected]>
was not you? How come that I don't believe you?

And now using identity number 4:
Author: Gordon Etly <[email protected]>?

You must have a _REALLY_ bad reputation that you feel the need to change
your ID every other day.
2) Why are you trying to speak for everyone? While certain people may
share your view (and vice versa), it doesn't mean you speak for the
whole of the group.

I never claimed to speak for anyone but myself.

jue
 
J

Jürgen Exner

Gordon Etly said:
Many readers separate the "name" and "email" fields.

Nonsense. There is a From header. And maybe a ReplyTo header. And maybe
a FollowupTo header. But there is no such thing as a "Name" or an
"Email" header field in the first place.
I never changed my
name. The email address part of the From: line is not a static entity;
one can always change their email address. It's anyone's right to do so,
as it's their info. You're not suggesting an email address is a reliable
way of tracking someone, are you?

If someone has to change it frequently then it is a very good indication
that that person has something to hide in their past. Why else would
they change their ID frequently?

Back you go to where you crawled out from

jue
 
G

Gordon Etly

Oh really? So
Author: Gordon Etly <[email protected]>
Author: Gordon Etly <[email protected]>
Author: Gordon Etly <[email protected]>
was not you?

My name never changed. An email address is not an identity, it's an email
address. It is a variable field. One can always change it, so stop
trying to use that as an argument here. I said before my name never
changed and you just proved that for me.
I never claimed to speak for anyone but myself.

Not true:

( from above )
You clearly implied you knew -everyone- had done it. Stop trying to
misrepresent things in order to formulate your arguments.
 
G

Gordon Etly

Nonsense. There is a From header. And maybe a ReplyTo header. And
maybe a FollowupTo header. But there is no such thing as a "Name" or
an "Email" header field in the first place.

No, most readers that I've used give separate fields for Name and Email.
They write the From: header behind the scenes. Either way, it doesn't
change the fact that the Email part is a variable field that can change at
any time. Whether it's from changing email providers, or any number of
reasons (which one is not required to disclose), it is a person's own
choice what they want to display to the public as an email address.

Hell, some providers don't even require an email address. I once had one
when I was in Europe for a few months that allowed "Name < >" (a space
for an email), which I realized when I forgot to enter an email. Granted,
most don't allow it, but the point is that whatever it is, it's up to the
poster.
If someone has to change it frequently then it is a very good
indication that that person has something to hide in their past.

Err... I never changed my name, so how could I possibly be trying to
hide? Actually quite the opposite, I change the way my email appears in
the From: line so I am -NOT- hidden :)
 
G

Gordon Etly

A. Sinan Unur said:
In newsgroups, your identity is your full handle. It does not matter
if that does not correspond to your real life identity. So, so long
as you pick one, and stick with it, no one has a problem with it.

Except,
You snipped the source of that rule. That is a rule stated by the
service provider you chose.

So what? I am not violating it.
That's funny because most of the domain names you use are not
registered.

Please stop playing stupid. I am not the first to add "invalid" or
"nospam" or so to my email address. It's a common practice and it has
never been prohibited by any provider I've come across. Bottom line: the
email address you enter is for public display and that's what many
harvesters look for.
Second, some of the domains you use are registered

I only use one domain. You know very well about munging practices so
please stop feigning ignorance so suddenly.
but do not seem to be owned by someone named Gordon Etly.

Come on, really. How many @aol, @yahoo, etc etc etc own those domains?
You know better than to make such an argument. Most people -don't- own
the domain their email is in.
From other users' perspective, what matters is that you pick one and
stick with it.

One does not have to use the same email address. One is free to change
that to whatever they wish.
 
B

Ben Morrow

Quoth Jürgen Exner said:
Nonsense. There is a From header. And maybe a ReplyTo header. And maybe
a FollowupTo header. But there is no such thing as a "Name" or an
"Email" header field in the first place.


+-------------------+ .:\:\:/:/:.
| PLEASE DO NOT | :.:\:\:/:/:.:
| FEED THE TROLLS | :=.' - - '.=:
| | '=(\ 9 9 /)='
| Thank you, | ( (_) )
| Management | /`-vvv-'\
+-------------------+ / \
| | @@@ / /|,,,,,|\ \
| | @@@ /_// /^\ \\_\
@x@@x@ | | |/ WW( ( ) )WW
\||||/ | | \| __\,,\ /,,/__
\||/ | | | (______Y______)
/\/\/\/\/\/\/\/\//\/\\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
==================================================================

Ben
 
G

Gordon Etly

One is also free to fart in an elevator whenever one wishes.

Not that I would recommend it.
The point being, not everything that's allowed is polite.

Agreed.

But let's go back a second: I have not changed my name. I have not
attempted to hide myself. I only slightly altered the appearance of my
email address field, which is a variable field that can
change at any time, such as when one moves from one ISP to another. It
has nothing to do with politeness. It's my info that I set and it's my
choice. No one else's.

Please don't be so naive as to think that all of UseNet uses real email addresses in
their From line. It's not uncommon to see munged or fake emails, especially
if you've been spammed to high hell in the past when using a valid email
address. If you really want to get in direct contact with me, reply
saying so and I'll provide you with my real address (and how to decode
it.)
 
U

Uri Guttman

IZ> [A complimentary Cc of this posting was sent to
IZ> Uri Guttman
IZ> This may depend on many parameters, but the overhead of
IZ> system()ing may be quite low. The overhead of opening a new HTTP
IZ> connection for each line may be larger. LWP will have a chance to
IZ> use persistent connections...
IZ> I do not think you understood what I wrote.

so make it clearer the next time you write.

IZ> I'm not claiming that *this* overhead is small. What I say is that
IZ> *other* overheads may be not negligible.

IZ> Anyway, all overheads I know are in favor of LWP.

my point too.

uri
 
