Strip out CSS

M

When saving web pages, I'd like to strip out all CSS and just leave the raw
HTML intact. Some web developer toolbars will strip out the CSS, but for
some reason they won't let you save the page this way. Any tools that can do
this?

(PS: Yes, I know I could manually delete any style sheets but would like to
automate this process. Bonus points if it can strip out inline styles as
well.)

M
 
dorayme

M said:
When saving web pages, I'd like to strip out all CSS and just leave the raw
HTML intact. Some web developer toolbars will strip out the CSS, but for
some reason they won't let you save the page this way. Any tools that can do
this?

(PS: Yes, I know I could manually delete any style sheets but would like to
automate this process. Bonus points if it can strip out inline styles as
well.)

M

Give an example of one url you would like to do this to.
 
M

dorayme said:
Give an example of one url you would like to do this to.

Not sure why this is relevant but, hey, if it leads to something. . . As an
example:

http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/

Essentially, I just want to save barebones articles with any relevant
images. I don't want Google ads, sidebars, irrelevant banner images, forms,
search boxes, background images, scripts, etc.

Sometimes the website is gracious enough to offer a print version which gets
rid of most of this stuff.

I have a Notetab script which does most of what I want but wanted to see if
something else out there is better at it.

M
 
jmatt

M said:
When saving web pages, I'd like to strip out all CSS and just leave the raw
HTML intact. Some web developer toolbars will strip out the CSS, but for
some reason they won't let you save the page this way. Any tools that can do
this?

What browser are you using?
 
dorayme

M said:
Not sure why this is relevant but, hey, if it leads to something. . . As an
example:

http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/

Essentially, I just want to save barebones articles with any relevant
images. I don't want Google ads, sidebars, irrelevant banner images, forms,
search boxes, background images, scripts, etc.

Sometimes the website is gracious enough to offer a print version which gets
rid of most of this stuff.

I have a Notetab script which does most of what I want but wanted to see if
something else out there is better at it.

M

It is tricky to fashion a general facility to distinguish between
relevant and irrelevant images, as you can imagine. The best I can
suggest is this: open the page in FF (equipped with free developer
tools) and turn off all CSS and probably JavaScript too. Save as a
web page. Open the saved page in a browser. If it is still too rich
for you, just delete the associated folder which contains all the
images and other stuff (or inspect the folder and get rid of things
selectively, but this is not what you want to do). I am afraid
there is nothing as intelligent as you for this job.
 
jmatt

This may give you what you want.

At the top of the Firefox browser select VIEW, PAGE STYLE, then NO
STYLE.

This will strip any web page that you're viewing of all CSS styling.
 
M

jmatt said:
This may give you what you want.

At the top of the Firefox browser select VIEW, PAGE STYLE, then NO
STYLE.

This will strip any web page that you're viewing of all CSS styling.

Yes, I know. However when you save the de-"css"-esified page, all the CSS is
still saved with it. When you open the saved page again, all the CSS shows
up again. It's the same with the web developer toolbars -- they let you
turn off the CSS to view the page but they don't let you save the modified
page. :(

M
 
jmatt

Have a look at this one.

Stylish
https://addons.mozilla.org/en-US/firefox/addon/2108
https://addons.mozilla.org/en-US/firefox/search?q=style&status=4
http://dev.upian.com/hotlinks/tag/greasemonkey?tag=greasemonkey&n=4
Firefox extension for managing user styles. Stylish is to CSS what
Greasemonkey is to JavaScript. Stylish allows you to easily manage
user styles for the application UI, all websites, or only certain
websites. Stylish is better than using userChrome.css/userContent.css
because styles are applied immediately instead of requiring a restart.
 
Susan Bugher

M said:
When saving web pages, I'd like to strip out all CSS and just leave the raw
HTML intact. Some web developer toolbars will strip out the CSS, but for
some reason they won't let you save the page this way. Any tools that can do
this?

I have a hunch those toolbars don't "strip out" anything. ISTM more
likely they just ignore it.

copied from another post:

"Essentially, I just want to save barebones articles with any relevant
images. I don't want Google ads, sidebars, irrelevant banner images,
forms, search boxes, background images, scripts, etc."

Have you looked at Net Picker?

Program: Net Picker
Company: 100share.com
Ware: (Freeware)
http://www.netpicker.net/
http://www.netpicker.net/netpicker.html

"NetPicker allows you to select and save a portion of the web page by
dragging it from your browser. NetPicker can save all the useful format
like image, table or font style, and organize your collection in a vivid
tree structure. You can even write down your comments in the original
page at any time. you can drag each item node on the tree view to a
new position for a better arrangement. Select NEW to insert a new item;
NEW SUBITEM to add a subitem; Press F2 to edit the item title."

Susan
--
Posted to alt.comp.freeware
Search alt.comp.freeware (or read it online):
http://www.google.com/advanced_group_search?q=+group:alt.comp.freeware
Pricelessware & ACF: http://www.pricelesswarehome.org
Pricelessware: http://www.pricelessware.org (not maintained)
 
dorayme

M said:
Yes, I know. However when you save the de-"css"-esified page, all the CSS is
still saved with it. When you open the saved page again, all the CSS shows
up again. It's the same with the web developer toolbars -- they let you
turn off the CSS to view the page but they don't let you save the modified
page. :(

M

See my post: the CSS did not activate in the saved HTML using the
technique I outlined.
 
Jim Moe

M said:
Not sure why this is relevant but, hey, if it leads to something. . . As an
example:

http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/

Essentially, I just want to save barebones articles with any relevant
images. I don't want Google ads, sidebars, irrelevant banner images, forms,
search boxes, background images, scripts, etc.

CSS is the least of the problem, then. In most cases you can ignore
anything between <style> and </style>, or any style="inline_styling"
attribute. Poof! No CSS!

But the rest of the stuff? I doubt you'll find anything that can
distinguish between a "desirable" image and an "undesirable" one.

You can reduce the amount of crud received by the browser by using a
filtering proxy like Squid.
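
The "ignore anything between <style> and </style>, or style=..." idea can be sketched in a few lines of Python. This is only an illustration using the standard library's html.parser, not a tool anyone in the thread named; the class and function names (CssStripper, strip_css) are made up for the example, and dropping <link rel="stylesheet"> tags is an added assumption about what "no CSS" should cover.

```python
from html import escape
from html.parser import HTMLParser


class CssStripper(HTMLParser):
    """Re-emit a page while dropping <style> blocks, stylesheet links
    and style="..." attributes. Hypothetical helper for illustration."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. as written
        self.out = []
        self.in_style = False  # True while inside <style> ... </style>

    def _is_css(self, tag, attrs):
        rel = dict(attrs).get("rel") or ""
        return tag == "style" or (tag == "link" and "stylesheet" in rel.lower())

    def _emit(self, tag, attrs, end=">"):
        parts = [tag]
        for name, value in attrs:
            if name == "style":  # drop inline styles
                continue
            parts.append(name if value is None else '%s="%s"' % (name, escape(value)))
        self.out.append("<" + " ".join(parts) + end)

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True
        if not self._is_css(tag, attrs):
            self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        if not self._is_css(tag, attrs):
            self._emit(tag, attrs, " />")

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        else:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.in_style:  # skip the CSS rules themselves
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def strip_css(html_text):
    """Return html_text with embedded, linked and inline CSS removed."""
    parser = CssStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


sample = ('<html><head><style>p {color: red}</style>'
          '<link rel="stylesheet" href="a.css"></head>'
          '<body><p style="color: blue" class="x">Hi &amp; bye</p></body></html>')
print(strip_css(sample))
```

As Jim Moe says, this only deals with the CSS. Ads, sidebars and scripts pass through untouched, and class and id attributes are deliberately left in place.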
 
Ben C

M said:
Not sure why this is relevant but, hey, if it leads to something. . . As an
example:

http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/

Essentially, I just want to save barebones articles with any relevant
images. I don't want Google ads, sidebars, irrelevant banner images, forms,
search boxes, background images, scripts, etc.

Sometimes the website is gracious enough to offer a print version which gets
rid of most of this stuff.

I have a Notetab script which does most of what I want but wanted to see if
something else out there is better at it.

If you want to get a lot of stuff out of one particular site a script
using curl and BeautifulSoup (which is a Python module) may be the way
to go, especially if the content has class or id attributes in it that
you can use to latch onto the bits you want.

I use this method for TV listings and traffic news.
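
A rough sketch of the "latch onto an id" idea follows. Ben C names curl and BeautifulSoup; to stay dependency-free this example swaps in the standard library's html.parser and works on a string that is assumed to have been fetched already (for example with curl). The id "content", the sample markup and the names IdExtractor and extract_by_id are all hypothetical. Real pages with unclosed tags would confuse the simple depth counter, which is one reason a forgiving parser such as BeautifulSoup is the better choice in practice.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class IdExtractor(HTMLParser):
    """Collect text and image URLs found inside the element with a given id."""

    def __init__(self, wanted_id):
        super().__init__(convert_charrefs=True)
        self.wanted_id = wanted_id
        self.depth = 0         # 0 means we are outside the wanted element
        self.skipping = False  # True inside <script> or <style>
        self.text = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            if attrs.get("id") == self.wanted_id and tag not in VOID:
                self.depth = 1
            return
        if tag in ("script", "style"):
            self.skipping = True
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skipping = False
        if self.depth > 0 and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and not self.skipping and data.strip():
            self.text.append(data.strip())


def extract_by_id(html_text, wanted_id):
    """Return (list of text chunks, list of image URLs) for one element."""
    parser = IdExtractor(wanted_id)
    parser.feed(html_text)
    parser.close()
    return parser.text, parser.images


page = ('<div id="sidebar"><img src="ad.png">Buy now</div>'
        '<div id="content"><h1>Mosaic</h1><p>Step one<br>continues</p>'
        '<img src="mosaic.jpg"><script>track()</script></div>'
        '<div id="footer">bye</div>')
text, images = extract_by_id(page, "content")
print(text, images)
```

Only the article container is kept, so the sidebar image, the footer and the script never reach the output. That is the selective saving the original question asks for, provided the site marks its main content with a stable id or class.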
 
M

dorayme said:
Best I can
suggest is this, open in FF (equipped with free developer tools)
and turn off all css and probably javascript too. Save as
webpage.

I did this (via the View | Page Style | No style) but FF still saves with
the CSS intact. When you open the saved page, there is all the CSS again. Am
I doing this wrong?

Open the saved in a browser. If too rich for you still,
just delete the associated folder which contains all the images
and other stuff,

That is what I have been doing, combined with Notetab text editing and
Scrapbook's DOM editor. It would be nice to have one easy-to-use tool to do
all this. (I sometimes use Amaya for very busy pages. . .)

M
 
M

Susan Bugher said:
Have you looked at Net Picker?

I have used it. IIRC it converts everything to HTML 3.2. Also not sure that
it would be any quicker than what I'm doing now, what with all the selective
dragging and dropping.

M
 
M

I thank all for some of your suggestions but most of them deal with CSS and
not the bigger issue of scripts, ads, irrelevant sidebars (tables or divs),
etc. Maybe I'm coming at this the wrong way.

As I mentioned, Notetab's script language does most stuff for me. In order
to strip out CSS though I need to strip out phrases like:
id="something"
class="something"
style="bunch of css attributes"

I've been playing around with Notetab's (v4.95) regular expression search
and replace but I can't seem to find a combination that finds the above
expressions.

Is there a regular expression program that will break this down for me? For
example, the program RegEx Coach lets you enter your text, then test various
regular expressions. The results are highlighted in real time in the text
you entered.

I need something that works IN REVERSE. i.e. I enter text, highlight the
expression I want removed, then it tells me the regular expression needed to
achieve that.

Anything like that out there?

(PS, yes, I know that removing either the stylesheet or the embedded styles
will render any id and class calls irrelevant. However, there are times when
I need them intact, so it would be nice to have the option. . .)

M
 
Ben C

M said:
I thank all for some of your suggestions but most of them deal with CSS and
not the bigger issue of scripts, ads, irrelevant sidebars (tables or divs),
etc. Maybe I'm coming at this the wrong way.

As I mentioned, Notetab's script language does most stuff for me. In order
to strip out CSS though I need to strip out phrases like:
id="something"
class="something"
style="bunch of css attributes"
I've been playing around with Notetab's (v4.95) regular expression search
and replace but I can't seem to find a combination that finds the above
expressions.

(style|id|class)=".*?"

is your basic regexp for that in PCRE, which I think is what Notetab
uses. Not too difficult.

It reads: 'style or id or class, followed by =" and then everything up to
the next "'.

Is there a regular expression program that will break this down for me? For
example, the program RegEx Coach lets you enter your text, then test various
regular expressions. The results are highlighted in real time in the text
you entered.

I need something that works IN REVERSE. i.e. I enter text, highlight the
expression I want removed, then it tells me the regular expression needed to
achieve that.

That's very difficult for the program to know: there are a vast number
of ways to match a given bit of highlighted text, so how is the program
supposed to know which of them you want?

Anything like that out there?

Honestly it's easier just to read the manual. The Python docs have a
very clear explanation of Perl-style regexp syntax, which is close to PCRE.

http://docs.python.org/lib/re-syntax.html
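
For anyone who wants to try the pattern outside Notetab, here is a minimal sketch using Python's re module. The leading \s+ and the non-capturing group are additions to the pattern given above, not part of it: they swallow the space before each attribute and stop the pattern from matching inside longer names such as data-id="...". The function name strip_attrs is made up, and single-quoted or unquoted attribute values are not handled.

```python
import re

# Ben C's pattern, with an assumed leading \s+ so the space before each
# attribute is removed too and names like data-id are left alone.
ATTR_RE = re.compile(r'\s+(?:style|id|class)=".*?"')


def strip_attrs(html_text):
    """Remove double-quoted style, id and class attributes from html_text."""
    return ATTR_RE.sub("", html_text)


before = '<p id="intro" class="lead big" style="color: red">Hello</p>'
print(strip_attrs(before))
```

The non-greedy .*? matters here. A greedy .* would run from the first opening quote to the last closing quote on the line and take the attributes in between with it.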
 
