Readability (HTML purifier) in Python

  • Thread starter Дамјан Георгиевски

Дамјан Георгиевски

http://lab.arc90.com/experiments/readability/

Readability is a javascript bookmarklet that "makes reading on the Web
more enjoyable by removing the clutter around what you're reading."


Does anyone know of something similar in Python?




--
дамјан ((( http://damjan.softver.org.mk/ )))

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
 

Stefan Behnel

Дамјан Георгиевски, 15.06.2010 17:44:
> http://lab.arc90.com/experiments/readability/
>
> Readability is a javascript bookmarklet that "makes reading on the Web
> more enjoyable by removing the clutter around what you're reading."
>
> Does anyone know of something similar in Python?

Well, that sounds like a browser tool. Could you be a bit more specific
about what kind of "similar" functionality you would expect from a
"similar" Python tool? How would you tell it "what you're reading", for
example?

Stefan
 

Дамјан Георгиевски

>> http://lab.arc90.com/experiments/readability/
> Well, that sounds like a browser tool.

Yes, it's a bookmarklet: a tiny piece of JavaScript that, when clicked,
runs on the current document in the browser.

> Could you be a bit more specific about what kind of "similar"
> functionality you would expect from a "similar" Python tool?
> How would you tell it "what you're reading", for example?

I'm not sure I understand your question correctly, but anyway.

What I need is a package that, given a random HTML document (a page from
any random website), would extract the meaningful content and filter out
the junk (advertisements, non-content elements, any other UI, etc.).
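Without porting Readability, the "filter out the junk" part can be roughly approximated by discarding whole elements that rarely hold article text. This is a minimal sketch that assumes only Python's standard html.parser module; the JUNK_TAGS set is a hypothetical, illustrative choice rather than Readability's rules, and junk wrapped in plain <div>s slips straight through it.

```python
from html.parser import HTMLParser

# Hypothetical "junk" list: the whole subtree of these elements is dropped.
JUNK_TAGS = {"script", "style", "noscript", "nav", "header", "footer",
             "aside", "form", "iframe"}
# Void elements never get an end tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class VisibleText(HTMLParser):
    """Collects text that is not inside any JUNK_TAGS element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open (non-void) elements
        self.skip = 0     # number of open JUNK_TAGS elements enclosing us
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag in JUNK_TAGS:
            self.skip += 1

    def handle_endtag(self, tag):
        if tag not in self.stack:   # stray end tag: ignore it
            return
        # Pop up to and including the matching element, implicitly closing
        # anything the page author left open in between.
        while self.stack:
            open_tag = self.stack.pop()
            if open_tag in JUNK_TAGS:
                self.skip -= 1
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.skip == 0 and data.strip():
            self.parts.append(data.strip())


def strip_junk(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


print(strip_junk('<nav><a href="/">Home</a></nav><p>Hello <b>world</b>.</p>'))
# prints: Hello world .
```

Text pieces are joined with single spaces, which is why the output reads "world ."; that keeps the sketch short at the cost of exact spacing.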


Readability seems to do some heuristic manipulation of the DOM, but I'm
not that good at reading/understanding its source code. Of course it
can't be 100% correct, but it's good enough in many cases.

http://code.google.com/p/arc90labs-readability/source/browse/trunk/js/readability.js
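A much smaller heuristic in the same spirit can be sketched in Python: give each element points for the paragraphs it directly contains (one for the paragraph, one per comma, up to three more for length), give the grandparent half credit, discount candidates whose text is mostly link text, and keep the best one. This is a sketch under stated assumptions, not a port of readability.js: the thresholds are illustrative values, only the standard library is used, and the tiny tree builder does not implement HTML's implicit end-tag rules, so it works best on reasonably well-formed markup.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """Minimal element node; children are Node objects or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def text(self):
        if self.tag in ("script", "style"):
            return ""
        # Space-separate children so adjacent blocks don't run together.
        return " ".join(c if isinstance(c, str) else c.text()
                        for c in self.children)

    def link_text(self):
        """Text that sits inside <a> elements within this node."""
        if self.tag == "a":
            return self.text()
        return " ".join(c.link_text() for c in self.children
                        if isinstance(c, Node))


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore strays.
        n = self.cur
        while n is not self.root and n.tag != tag:
            n = n.parent
        if n is not self.root:
            self.cur = n.parent

    def handle_data(self, data):
        self.cur.children.append(data)


def iter_nodes(node):
    for c in node.children:
        if isinstance(c, Node):
            yield c
            yield from iter_nodes(c)


def link_density(node):
    text = node.text()
    return len(node.link_text()) / len(text) if text else 0.0


def extract_main_text(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    scores = {}
    for node in iter_nodes(builder.root):
        if node.tag != "p":
            continue
        text = " ".join(node.text().split())
        if len(text) < 25:              # ignore very short paragraphs
            continue
        points = 1 + text.count(",") + min(len(text) // 100, 3)
        parent = node.parent
        scores[parent] = scores.get(parent, 0) + points
        if parent.parent is not None:   # grandparent gets half credit
            scores[parent.parent] = scores.get(parent.parent, 0) + points / 2
    if not scores:
        return ""
    best = max(scores, key=lambda n: scores[n] * (1 - link_density(n)))
    return " ".join(best.text().split())
```

Crediting both parent and grandparent lets a container holding many medium-sized paragraphs beat a single long paragraph elsewhere, while the link-density discount pushes menus and footers down even when they wrap their links in <p> tags.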



--
дамјан ((( http://damjan.softver.org.mk/ )))

war is peace
freedom is slavery
restrictions are enablement
 
