python docs search for 'print'

David Hoese

A friend made me aware of this:
When a python beginner (2.x) quick searches for "print" on
docs.python.org, the print function doesn't even come up in the top 20
results. The print statement isn't even listed as far as I can tell.
Is there something that can be done about this to make it easier for
beginners?

I understand that this is a very basic search and "print" is a very
common word and a very basic python statement, but it's pretty difficult
for a beginner to learn when the first 5 results are about the
disassembler and the next 5 are C functions.

-Dave
 
Thomas 'PointedEars' Lahn

David said:
A friend made me aware of this:
When a python beginner (2.x) quick searches for "print" on
docs.python.org, the print function doesn't even come up in the top 20
results. The print statement isn't even listed as far as I can tell.
Is there something that can be done about this to make it easier for
beginners?

I understand that this is a very basic search and "print" is a very
common word and a very basic python statement, but it's pretty difficult
for a beginner to learn when the first 5 results are about the
disassembler and the next 5 are C functions.

If they scroll down they will find, among other entries, "1. Introduction"
(to the "Python library"), which they should have read in the first place.

The main problem, as I see it, is that the first search results are Unicode-
sorted by document title, where uppercase letters come first.
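
To illustrate the sorting effect described above, here is a minimal Python sketch. It is not the actual docs.python.org search code. It assumes results are ordered by plain code-point comparison of their titles, and the titles are hypothetical examples:

```python
# Hypothetical result titles. Plain sorted() compares Unicode code points,
# and every uppercase ASCII letter (U+0041..U+005A) has a smaller code point
# than every lowercase one (U+0061..U+007A).
titles = ["print", "Disassembler", "built-in functions", "C API"]

code_point_order = sorted(titles)
# str.casefold gives a case-insensitive sort key instead.
case_insensitive_order = sorted(titles, key=str.casefold)

print(code_point_order)        # ['C API', 'Disassembler', 'built-in functions', 'print']
print(case_insensitive_order)  # ['built-in functions', 'C API', 'Disassembler', 'print']
```

With code-point order, both capitalized titles land ahead of "print" regardless of relevance.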

However, I do not think that posting to this newsgroup will change anything
there. You should take it to the python.org people instead who are, I am
sorry to say so, responsible for this mess as well¹ (I have seldom, if
ever, found anything useful using that search; usually I go by TOC and
index). There is a "Found a bug?" link at the bottom that appears to be of
use.

_____
¹ The other mess they created (or allowed to be created) is this mashup of
newsgroup and mailing list, neither of which works properly, because the
underlying protocols are not compatible. Add to that the abomination
that Google Groups has become.
 
Steven D'Aprano

A friend made me aware of this:
When a python beginner (2.x) quick searches for "print" on
docs.python.org, the print function doesn't even come up in the top 20
results. The print statement isn't even listed as far as I can tell. Is
there something that can be done about this to make it easier for
beginners?

I understand that this is a very basic search and "print" is a very
common word and a very basic python statement, but it's pretty difficult
for a beginner to learn when the first 5 results are about the
disassembler and the next 5 are C functions.

I sympathise. The search functionality on docs.python.org is frankly
crap, and the best thing for your friend to do is to learn to use google,
duckduckgo or some other search engine:

https://www.google.com.au/search?q=python+print
http://duckduckgo.com/html/?q=python+print

In this case, google hits the right Python documentation on the first
link. Duckduckgo doesn't do nearly so well, but it comes up with a bunch
of useful third-party links. It does better for searches with fewer

The second best thing for your friend to do is to learn to read the index
to the docs, where the print statement is listed:

http://docs.python.org/reference/index.html

You can use your browser's Find command to search the page for "print".
 
Steven D'Aprano

¹ The other mess they created (or allowed to be created) is this mashup
of newsgroup and mailing list, neither of which works properly,


In what way do they not work properly?

because the underlying protocols are not compatible.

What?

That is rather like saying that you can't read email via a web interface
because the http protocol is not compatible with the smtp protocol.

Add to that the abomination that Google Groups has become.

It's always been an abomination, although I understand it is much, much
worse now. Blame Google for that.
 
Steven D'Aprano

https://www.google.com.au/search?q=python+print
http://duckduckgo.com/html/?q=python+print

In this case, google hits the right Python documentation on the first
link. Duckduckgo doesn't do nearly so well, but it comes up with a bunch
of useful third-party links. It does better for searches with fewer


Gah! Brain meltdown! DDG does better on searches for Python terms with
fewer extraneous meanings, e.g. "python print" finds many links about
fashion, but https://duckduckgo.com/html/?q=python+tuple is all about
Python tuples :)
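
The search links quoted above all share one pattern: the search terms go into a `q` query parameter, with spaces encoded as `+`. Here is a minimal Python sketch that rebuilds them with the standard library's `urllib.parse.urlencode`. The `search_url` helper and the `SEARCH_ENGINES` table are hypothetical names used only for illustration:

```python
from urllib.parse import urlencode

# Base URLs taken from the links quoted above; only the query string varies.
SEARCH_ENGINES = {
    "google": "https://www.google.com.au/search",
    "duckduckgo": "https://duckduckgo.com/html/",
}

def search_url(engine, *words):
    """Build a search URL; urlencode() encodes spaces as '+'."""
    query = urlencode({"q": " ".join(words)})
    return f"{SEARCH_ENGINES[engine]}?{query}"

print(search_url("duckduckgo", "python", "tuple"))
# https://duckduckgo.com/html/?q=python+tuple
```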
 
Ramchandra Apte

A friend made me aware of this:

When a python beginner (2.x) quick searches for "print" on

docs.python.org, the print function doesn't even come up in the top 20

results. The print statement isn't even listed as far as I can tell.

Is there something that can be done about this to make it easier for

beginners?



I understand that this is a very basic search and "print" is a very

common word and a very basic python statement, but it's pretty difficult

for a beginner to learn when the first 5 results are about the

disassembler and the next 5 are C functions.



-Dave

I was actually planning to write a bug on this.
 

Terry Reedy

I was actually planning to write a bug on this.

If you do, find the right place to submit it.
bugs.python.org is for issues relating to the cpython repository.
I am fairly sure that the website search code is not there.

If you do find the right place, you should contribute something to an
improvement. The current search performance is not a secret, so mere
complaints are useless.
 
Ramchandra Apte

If you do, find the right place to submit it.

bugs.python.org is for issues relating to the cpython repository.

I am fairly sure that the website search code is not there.



If you do find the right place, you should contribute something to an

improvement. The current search performance is not a secret, so mere

complaints are useless.

I was thinking we could just use Google Site Search (it's fast, easy to set up and gives good results).
 

Terry Reedy

I was thinking we could just use Google Site Search (it's fast, easy to
set up and gives good results)

I have the impression that that is what we once did, but maybe not. Or
maybe that is or was for python.org but not docs.python.org, etc. Each
version of the docs needs the search restricted to that version. If you
can give the way to do the easy setup, with that constraint, that would
be a positive suggestion, accepted or not.
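
The constraint that each version of the docs searches only that version can be expressed as a URL-prefix filter on results. A minimal sketch, assuming versioned docs live under paths like `/release/3.2/`, as in the `site:docs.python.org/release/3.2` query used later in this thread; `restrict_to_version` and the sample URLs are hypothetical:

```python
from urllib.parse import urlsplit

def restrict_to_version(urls, version, host="docs.python.org"):
    """Keep only URLs whose path lies under /release/<version>/ on host."""
    prefix = f"/release/{version}/"
    kept = []
    for url in urls:
        parts = urlsplit(url)
        # Exact host match plus a path prefix that ends in "/", so that
        # /release/3.2.3/ is not mistaken for version "3.2".
        if parts.netloc == host and parts.path.startswith(prefix):
            kept.append(url)
    return kept

results = [
    "http://docs.python.org/release/3.2/library/functions.html",
    "http://docs.python.org/release/2.7/reference/simple_stmts.html",
    "http://docs.python.org/release/3.2.3/library/functions.html",
]
print(restrict_to_version(results, "3.2"))
# ['http://docs.python.org/release/3.2/library/functions.html']
```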
 
Ramchandra Apte

I have the impression that that is what we once did, but maybe not. Or

maybe that is or was for python.org but not docs.python.org, etc. Each

version of the docs needs the search restricted to that version. If you

can give the way to do the easy setup, with that constraint, that would

be a positive suggestion, accepted or not.

Google Site Search costs $2000 for 500,000 searches per year and $750 for 150,000 searches, so it's quite expensive.
Also, the print function only comes up as the third result (Python 3.2).
If you search for "site:docs.python.org/release/3.2 print", the print function is not found at all.
I think a specialized algorithm would work better.
I'm going to code a program for this.
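
A quick per-search comparison of the two price points quoted above. The figures are taken as stated in the post and have not been checked against Google's pricing:

```python
# Price points as quoted in the post above: (price in USD, number of searches).
PLANS = {"500k": (2000, 500_000), "150k": (750, 150_000)}

# Multiply before dividing; for these particular numbers the results are exact.
cost_per_1000 = {name: price * 1000 / searches for name, (price, searches) in PLANS.items()}

for name, cost in cost_per_1000.items():
    print(f"{name}: ${cost:.2f} per 1000 searches")
# 500k: $4.00 per 1000 searches
# 150k: $5.00 per 1000 searches
```

So the larger plan is somewhat cheaper per search, but both are fixed costs regardless of how many searches are actually made.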
 

Grant Edwards

If you do, find the right place to submit it.
bugs.python.org is for issues relating to the cpython repository.
I am fairly sure that the website search code is not there.

If you do find the right place, you should contribute something to an
improvement. The current search performance is not a secret, so mere
complaints are useless.

Making the site's "search" box use Google or somesuch is probably the
simplest solution. I'm not enough of a web guy to know how to do
that, but I do know that some sites do handle site search that way.
 
Dave Angel

Making the site's "search" box use Google or somesuch is probably the
simplest solution. I'm not enough of a web guy to know how to do
that, but I do know that some sites do handle site search that way.
And Google has some APIs to make it relatively painless. And a license
form to fill in and send, along with your check.
 
Grant Edwards

And Google has some APIs to make it relatively painless. And a
license form to fill in and send, along with your check.

I just saw the posting mentioning the pricing. So it is a simple
solution, but it's probably not cheap enough...
 
Thomas 'PointedEars' Lahn

Steven said:
In what way do they not work properly?

Most prominently, threads are completely and utterly borken.

What?

That is rather like saying that you can't read email via a web interface
because the http protocol is not compatible with the smtp protocol.

Apples and oranges. The problem is gating messages from a mail server to a
news server and vice-versa without regard to the differences between the
underlying protocols.

Netnews User Agents (NUAs, newsreaders) are currently based on [RFC3977]
and [RFC5536].

In a Netnews article, a References header field is mandatory for a posting
that is a follow-up. (Threading by Subject and Date works poorly, if at
all, so the Specification does not suggest that.) The last element of the
References header field value has to be a Message-ID specifying the
article's precursor. That Message-ID has to match the Message-ID header
field value of an existing posting, unless it has expired on the target
newsserver or was canceled (with Supersedes being a special case). The
In-Reply-To header field (see below) is not allowed there, but it is set by
some hybrid MUA/NUAs like Mozilla Thunderbird anyway¹.
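
A minimal sketch of the Netnews rule described above, using Python's standard `email` package to read the header: the precursor of a follow-up is the last Message-ID in References. The sample article and the `precursor` helper are hypothetical, and a real newsreader validates far more than this:

```python
import re
from email import message_from_string

# One <...> msg-id token inside a header field value.
MSG_ID = re.compile(r"<[^<>\s]+>")

def precursor(article_text):
    """Return the last Message-ID in References (the precursor), or None."""
    article = message_from_string(article_text)
    ids = MSG_ID.findall(article.get("References", ""))
    return ids[-1] if ids else None

sample = """\
From: poster@example.invalid
Newsgroups: comp.lang.python
Subject: Re: example
References: <root@example.invalid>
 <reply1@example.invalid>
Message-ID: <reply2@example.invalid>

Body text.
"""
print(precursor(sample))  # <reply1@example.invalid>
```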

Mail User Agents (MUAs, mailreaders), on the other hand, are currently based
on [RFC5321], [RFC1939], IMAP4 (various RFCs, starting with [RFC1730]), and
last but not least [RFC5322].

There are two possible header fields to build a thread of e-mail messages:
In-Reply-To and References. The first header field's value is supposed to
be a Message-ID, while the second one's is as described in [RFC5536].
Few MUAs set both, some set the first one, and many set none of them at all,
because there is no absolute requirement to set any of them (see [RFC5322],
section 3.6.4.)
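
Because neither header is mandatory for mail, a threading client has to fall back somehow. The following sketch shows one common order, which is an assumption here rather than something [RFC5322] prescribes: prefer the last References entry, then the first In-Reply-To entry, and otherwise treat the message as the start of a new thread. `mail_parent` is a hypothetical helper:

```python
import re
from email import message_from_string

MSG_ID = re.compile(r"<[^<>\s]+>")

def mail_parent(message_text):
    """Guess an e-mail message's parent Message-ID.

    Prefer the last References entry, fall back to the first In-Reply-To
    entry, and return None (a new thread) if neither header is usable.
    """
    msg = message_from_string(message_text)
    refs = MSG_ID.findall(msg.get("References", ""))
    if refs:
        return refs[-1]
    in_reply_to = MSG_ID.findall(msg.get("In-Reply-To", ""))
    return in_reply_to[0] if in_reply_to else None

only_irt = "In-Reply-To: <orig@example.invalid>\nSubject: Re: x\n\nbody\n"
neither = "Subject: Re: x\n\nbody\n"
print(mail_parent(only_irt))  # <orig@example.invalid>
print(mail_parent(neither))   # None
```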

And then there is utterly borken software – or shall we say utterly borken
approaches? Consider for example the recent thread with Subject "simple
client data base" started by Mark R Rivet. The original posting has:

| User-Agent: ForteAgent/7.00.32.1200

(posted using a newsreader)

| […]
| Message-ID: <[email protected]>

Chris Angelico's follow-up to that has

| In-Reply-To: <[email protected]>
| References: <[email protected]>
| […]
| Message-ID: <[email protected]>
| […]
| X-Mailman-Version: 2.1.15

(apparently posted using a mailreader, gated by python.org's mail software)

So far, so good. But Peter Otten's follow-up to Chris Angelico's posting
has

| References: <[email protected]>
| <CAPTjJmpHPE=SdE_XJtdi4DMFVeWa8Exo3Arsu13Hd8fgSuZ5bw@mail.gmail.com>
| […]
| User-Agent: KNode/4.7.3

(posted using a newsreader)

| […]
| Message-ID: <[email protected]>

As you can see, the Message-ID of Chris' posting does not occur in the
References header field value of Peter's posting, which is caused by
python.org's SMTP-to-NNTP gating program setting its own Message-ID, ignoring
the Message-ID of the server where the message was injected. Therefore,
although it is a followup to Chris' posting, Peter's posting has no
*technical* (metadata) relation to Chris' posting.

Instead, it should have

| References: <[email protected]>
| <[email protected]>
| […]

or, better: Chris' posting should have had the original

| […]
| Message-ID:
| <CAPTjJmpHPE=SdE_XJtdi4DMFVeWa8Exo3Arsu13Hd8fgSuZ5bw@mail.gmail.com>
| […]

(no word-wrap), in which case the header fields of Peter's posting could
stay as they are.
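
The second fix amounts to the gateway keeping the mail's own Message-ID. Here is a hypothetical sketch of such a gating step, not python.org's actual Mailman gateway code, using the standard `email` package. It adds a Newsgroups header and only mints a new Message-ID with `email.utils.make_msgid()` when the mail has none. The `mail_to_news` name and the domain passed to `make_msgid` are placeholders:

```python
from email import message_from_string
from email.utils import make_msgid

def mail_to_news(mail_text, newsgroup="comp.lang.python"):
    """Gate a mail message to news, keeping the mail's own Message-ID."""
    article = message_from_string(mail_text)
    if article.get("Newsgroups") is None:
        article["Newsgroups"] = newsgroup
    if article.get("Message-ID") is None:
        # Only mint a new ID when the mail never had one; overwriting an
        # existing ID is what breaks the References chains described above.
        article["Message-ID"] = make_msgid(domain="gateway.example.invalid")
    return article

mail = "From: chris@example.invalid\nMessage-ID: <orig@mail.example.invalid>\nSubject: Re: x\n\nbody\n"
print(mail_to_news(mail)["Message-ID"])  # <orig@mail.example.invalid>
```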

My newsreader (KNode/4.4.11) tries its best to resolve this (short of
threading by Subject and Date, which does not work; see above) which causes
Peter's posting to end up as a follow-up to *Mark's* posting instead
(specified by the only valid Message-ID in the References header). Only
when you read Peter's posting do you realize that it is not a follow-up to
Mark's at all. Confusion ensues.
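
The newsreader behaviour described above can be modelled as walking References from the newest entry back to the oldest and attaching the article to the first one the server actually has. This is a hypothetical sketch, not KNode's code. A `known_ids` set stands in for the server's spool, and the IDs mirror the example rather than reproduce it:

```python
import re

MSG_ID = re.compile(r"<[^<>\s]+>")

def attach_point(references, known_ids):
    """Return the nearest known ancestor, or None (shown as a new thread)."""
    for msg_id in reversed(MSG_ID.findall(references)):
        if msg_id in known_ids:
            return msg_id
    return None

# The gateway stored Chris's reply under a rewritten ID, so the ID his
# mail client assigned is unknown to the news server.
spool = {"<mark@example.invalid>", "<chris-rewritten@example.invalid>"}
peter_refs = "<mark@example.invalid> <chris-original@mail.example.invalid>"

print(attach_point(peter_refs, spool))                # <mark@example.invalid>
print(attach_point("<gone@example.invalid>", spool))  # None
```

The `None` case corresponds to follow-ups whose only referenced ID was overwritten by the gateway: they show up as if they were original postings.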

There are a lot of similar examples here. As a result of the Message-ID
rewriting, in several cases a follow-up even appears as if it was an
original posting, without any technical (and therefore without any obvious
visual) relation to the thread it actually belongs to at all, even though
the precursor has not expired. For example,

| […]
| X-Original-To: (e-mail address removed)
| Delivered-To: (e-mail address removed)
| […]
| In-Reply-To: <[email protected]>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| References: <[email protected]>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| Date: Tue, 4 Sep 2012 14:27:35 -0400
| Subject: Re: python docs search for 'print'
| From: Joel Goldstick <[email protected]>
| To: David Hoese <[email protected]>
| Content-Type: text/plain; charset=UTF-8
| Cc: (e-mail address removed)
| […]
| Newsgroups: comp.lang.python
| Message-ID: <[email protected]>
| […]
|
| > […]

There is no message with Message-ID <[email protected]> (at least
not on the newsserver that I use), because that header field value was
overwritten by the borken gating software that python.org uses. The actual
message posted by that software is:

| […]
| X-Original-To: (e-mail address removed)
| Delivered-To: (e-mail address removed)
| […]
| Date: Tue, 04 Sep 2012 13:58:43 -0400
| From: David Hoese <[email protected]>
| User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7;
| rv:15.0) Gecko/20120824 Thunderbird/15.0
| […]
| To: (e-mail address removed)
| Subject: python docs search for 'print'
| […]
| Newsgroups: comp.lang.python
| Message-ID: <[email protected]>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To further show that this is not a coincidence, and that I am not imagining
things here, the same problems started to occur when some people of the
German-speaking Python mailing list at python.org thought it would be a good
idea to merge that mailing list and the German-speaking newsgroup
de.comp.lang.python not so long ago, using the same software. As a result,
that Python newsgroup is a complete mess now, too.

It's always been an abomination,

After they took over the Dejanews archive it was rather OK. You could use
it with the keyboard, lines were at least automatically wrapped at 80
columns (but unfortunately, only when sending and there was no preview
[AFAIK it still isn't]), they removed postings reported as spam, and so
forth.

although I understand it is much, much worse now.

Now you cannot even use it with the keyboard, and the postings are not properly
word-wrapped when typing or submitting (resulting in lines of 200 characters
and more). The spam is not removed at all, but only hidden from *Google*
*Groups* users, which causes it to be distributed on Usenet unchecked unless
the closest peers of the Google Groups servers happen to employ a suitable
spam filter, or have at least one dedicated user who runs a killbot.

Blame Google for that.

I do, and I have UDP'd Google Groups since April for that (except follow-ups
to my postings). However, I am also blaming the people still using it
without complaining sufficiently, because if they stopped using it, or
complained more often and more loudly, Google would have to fix it. Unfortunately,
most people do not even know where they are posting to when they access
Usenet via Google Groups, so there is little hope for improvement of the
situation.

But that is another can of worms entirely.

__________
¹ Recent example: <
References:

[RFC1730] Crispin, M. "INTERNET MESSAGE ACCESS PROTOCOL - VERSION 4"
(IMAP4). December 1994. <http://tools.ietf.org/html/rfc1730>
[RFC1939] Myers, J. and Rose, M. "Post Office Protocol - Version 3".
May 1996. <http://tools.ietf.org/html/rfc1939>
[RFC3977] Feather, C. "Network News Transfer Protocol (NNTP)".
October 2006. <http://tools.ietf.org/html/rfc3977>
[RFC5321] Klensin, J. "Simple Mail Transfer Protocol" (SMTP).
October 2008. <http://tools.ietf.org/html/rfc5321>
[RFC5322] Resnick, P. (ed.) "Internet Message Format".
October 2008. <http://tools.ietf.org/html/rfc5322>
[RFC5536] Murchison, K., Lindsey, C., and Kohn, D.
"Netnews Article Format". November 2009.
<http://tools.ietf.org/html/rfc5536>
 
Terry Reedy

These ever-increasing extra blank lines with each quote are obnoxious.
Consider using a news reader with news.gmane.org instead of google crap.
Or snip heavily.
Google Site Search costs $2000 for 500,000 searches per year and $750 for 150,000 searches, so it's quite expensive.
Also, the print function only comes up as the third result (Python 3.2).
If you search for "site:docs.python.org/release/3.2 print", the print function is not found at all.
I think a specialized algorithm would work better.
I'm going to code a program for this.

A simple algorithm would be to present index search results first, if
there are any, and then page search results.

Then searching print would return
"Index entries for print:"
Builtin-functions page
a couple of others...

Pages containing print:
<list of about 150 pages>

I would not worry about duplication.

Labeling index results as such would clue people in to the fact that
they could have looked for the object name in the index. Names of people,
like 'Lundh', that are not indexed but which appear on several pages
would give the same result as before.
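
The proposal above (index hits first, then full-text hits) can be sketched in a few lines of Python. This is a toy model, not the Sphinx search code that docs.python.org actually runs. `index` maps lowercase index terms to page titles, `pages` maps page titles to their text, and both sample tables, as well as `search` and `render`, are hypothetical:

```python
def search(term, index, pages):
    """Return (index_hits, page_hits): index entries first, then full text."""
    key = term.lower()
    index_hits = index.get(key, [])
    # A plain substring test stands in for real full-text search here.
    page_hits = [title for title, text in pages.items() if key in text.lower()]
    return index_hits, page_hits

def render(term, index_hits, page_hits):
    """Format results with the two labelled sections proposed above."""
    lines = []
    if index_hits:
        lines.append(f"Index entries for {term}:")
        lines.extend(index_hits)
        lines.append("")
    lines.append(f"Pages containing {term}:")
    lines.extend(page_hits)
    return "\n".join(lines)

# Hypothetical sample data.
index = {"print": ["Built-in Functions", "The print statement"]}
pages = {
    "dis: Disassembler for Python bytecode": "PRINT_ITEM prints TOS to the file-like object",
    "Built-in Functions": "print() writes its arguments to a stream",
    "Glossary": "A tuple is an immutable sequence",
}
print(render("print", *search("print", index, pages)))
```

"Built-in Functions" appears in both sections, which is the duplication the proposal says not to worry about. A name that is not indexed, such as "Lundh", simply produces the "Pages containing" section alone.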

Looking at the web page (which I do not normally use), I see that the
problem is deeper. The left margin of every page has an inviting "Quick
search" box with text "Enter search terms or a module, class or function
name." But it does not currently work very well for such object names.
The index is only available from the main contents page.

This contrasts with the Windows docs which has an index tab, making the
index directly available from *anywhere*. (There is also a separate text
search tab.) I think an index search box should be added above the text
search box. I will ask on pydev where the suggestion should go.
 
Walter Hurry

On 9/5/2012 8:45 AM, Ramchandra Apte wrote:
These ever-increasing extra blank lines with each quote are obnoxious.
Consider using a news reader with news.gmane.org instead of google crap.
Or snip heavily.

+1. And the duplicated posts. Enough of him. Bozo bin it is.
 
