Strip out CSS

Discussion in 'HTML' started by M, Jul 19, 2007.

  1. M

    M Guest

    When saving web pages, I'd like to strip out all CSS and just leave the raw
    HTML intact. Some web developer toolbars will strip out the CSS, but for
    some reason they won't let you save the page this way. Any tools that can do
    this?

    (PS: Yes, I know I could manually delete any style sheets but would like to
    automate this process. Bonus points if it can strip out inline styles as
    well.)

    M
    M, Jul 19, 2007
    #1

  2. M

    dorayme Guest

    In article <bCMni.131315$NV3.476@pd7urf2no>,
    "M" <> wrote:

    > When saving web pages, I'd like to strip out all CSS and just leave the raw
    > HTML intact. Some web developer toolbars will strip out the CSS, but for
    > some reason they won't let you save the page this way. Any tools that can do
    > this?
    >
    > (PS: Yes, I know I could manually delete any style sheets but would like to
    > automate this process. Bonus points if it can strip out inline styles as
    > well.)
    >
    > M


    Give an example of one url you would like to do this to.

    --
    dorayme
    dorayme, Jul 19, 2007
    #2

  3. M

    M Guest

    "dorayme" <> wrote in message
    news:...
    > In article <bCMni.131315$NV3.476@pd7urf2no>,
    > "M" <> wrote:


    > Give an example of one url you would like to do this to.


    Not sure why this is relevant but, hey, if it leads to something. . . As an
    example:

    http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/

    Essentially, I just want to save barebones articles with any relevant
    images. I don't want Google ads, sidebars, irrelevant banner images, forms,
    search boxes, background images, scripts, etc.

    Sometimes the website is gracious enough to offer a print version which gets
    rid of most of this stuff.

    I have a Notetab script which does most of what I want but wanted to see if
    something else out there is better at it.

    M
    M, Jul 20, 2007
    #3
  4. M

    Guest

    On Jul 20, 12:49 am, "M" <> wrote:
    > When saving web pages, I'd like to strip out all CSS and just leave the raw
    > HTML intact. Some web developer toolbars will strip out the CSS, but for
    > some reason they won't let you save the page this way. Any tools that can do
    > this?


    What browser are you using?
    , Jul 20, 2007
    #4
  5. M

    dorayme Guest

    In article <tGSni.130529$xq1.5003@pd7urf1no>,
    "M" <> wrote:

    > "dorayme" <> wrote in message
    > news:...
    > > In article <bCMni.131315$NV3.476@pd7urf2no>,
    > > "M" <> wrote:

    >
    > > Give an example of one url you would like to do this to.

    >
    > Not sure why this is relevant but, hey, if it leads to something. . . As an
    > example:
    >
    > http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/
    >
    > Essentially, I just want to save barebones articles with any relevant
    > images. I don't want Google ads, sidebars, irrelevant banner images, forms,
    > search boxes, background images, scripts, etc.
    >
    > Sometimes the website is gracious enough to offer a print version which gets
    > rid of most of this stuff.
    >
    > I have a Notetab script which does most of what I want but wanted to see if
    > something else out there is better at it.
    >
    > M


    It is tricky to fashion a general facility to distinguish between
    relevant and irrelevant images as you can imagine. Best I can
    suggest is this, open in FF (equipped with free developer tools)
    and turn off all css and probably javascript too. Save as
    webpage. Open the saved page in a browser. If too rich for you still,
    just delete the associated folder which contains all the images
    and other stuff, or inspect the folder and get rid of things
    selectively (but this is not what you want to do). I am afraid
    there is nothing as intelligent as you for this job.

    --
    dorayme
    dorayme, Jul 20, 2007
    #5
  6. M

    M Guest

    <> wrote in message
    news:...
    > On Jul 20, 12:49 am, "M" <> wrote:
    >> When saving web pages, I'd like to strip out all CSS and just leave the raw
    >> HTML intact. Some web developer toolbars will strip out the CSS, but for
    >> some reason they won't let you save the page this way. Any tools that can do
    >> this?

    >
    > What browser are you using?


    Normally I use FF, but I'd use IE if there was a tool for it.

    M
    M, Jul 20, 2007
    #6
  7. M

    Guest

    , Jul 20, 2007
    #7
  8. M

    Guest

    On Jul 20, 12:49 am, "M" <> wrote:

    This may give you what you want.

    At the top of the Firefox browser select VIEW, PAGE STYLE, then NO
    STYLE.

    This will strip any web page that you're viewing of all CSS styling.
    , Jul 20, 2007
    #8
  9. M

    M Guest

    <> wrote in message
    news:...
    > On Jul 20, 12:49 am, "M" <> wrote:
    >
    > This may give you what you want.
    >
    > At the top of the Firefox browser select VIEW, PAGE STYLE, then NO
    > STYLE.
    >
    > This will strip any web page that you're viewing of all CSS styling.


    Yes, I know. However when you save the de-"css"-esified page, all the CSS is
    still saved with it. When you open the saved page again, all the CSS shows
    up again. It's the same with the web developer toolbars -- they let you
    turn off the CSS to view the page but they don't let you save the modified
    page. :(

    M
    M, Jul 20, 2007
    #9
  10. M

    Guest

    On Jul 20, 12:49 am, "M" <> wrote:

    Have a look at this one.

    Stylish
    https://addons.mozilla.org/en-US/firefox/addon/2108
    https://addons.mozilla.org/en-US/firefox/search?q=style&status=4
    http://dev.upian.com/hotlinks/tag/greasemonkey?tag=greasemonkey&n=4
    Firefox Extension for managing user styles - Stylish allows you to
    easily manage user styles for the application UI, all websites, or
    only certain websites. Stylish is better than using userChrome.css/
    userContent.css because styles are applied immediately instead of
    requiring a restart.
    Stylish is to CSS what Greasemonkey is to JavaScript.
    , Jul 20, 2007
    #10
  11. M

    Susan Bugher Guest

    M wrote:

    > When saving web pages, I'd like to strip out all CSS and just leave the raw
    > HTML intact. Some web developer toolbars will strip out the CSS, but for
    > some reason they won't let you save the page this way. Any tools that can do
    > this?


    I have a hunch those toolbars don't "strip out" anything. ISTM more
    likely they just ignore it.

    copied from another post:

    "Essentially, I just want to save barebones articles with any relevant
    images. I don't want Google ads, sidebars, irrelevant banner images,
    forms, search boxes, background images, scripts, etc."

    Have you looked at Net Picker?

    Program: Net Picker
    Company: 100share.com
    Ware: (Freeware)
    http://www.netpicker.net/
    http://www.netpicker.net/netpicker.html

    "NetPicker allows you to select and save a portion of the web page by
    dragging it from your browser. NetPicker can save all the useful format
    like image, table or font style, and organize your collection in a vivid
    tree structure. You can even write down your comments in the original
    page at any time. You can drag each item node on the tree view to a
    new position for a better arrangement. Select NEW to insert a new item;
    NEW SUBITEM to add a subitem; Press F2 to edit the item title."

    Susan
    --
    Posted to alt.comp.freeware
    Search alt.comp.freeware (or read it online):
    http://www.google.com/advanced_group_search?q= group:alt.comp.freeware
    Pricelessware & ACF: http://www.pricelesswarehome.org
    Pricelessware: http://www.pricelessware.org (not maintained)
    Susan Bugher, Jul 20, 2007
    #11
  12. M

    dorayme Guest

    In article <fGWni.131205$xq1.97652@pd7urf1no>,
    "M" <> wrote:

    > <> wrote in message
    > news:...
    > > On Jul 20, 12:49 am, "M" <> wrote:
    > >
    > > This may give you what you want.
    > >
    > > At the top of the Firefox browser select VIEW, PAGE STYLE, then NO
    > > STYLE.
    > >
    > > This will strip any web page that you're viewing of all CSS styling.

    >
    > Yes, I know. However when you save the de-"css"-esified page, all the CSS is
    > still saved with it. When you open the saved page again, all the CSS shows
    > up again. It's the same with the web developer toolbars -- they let you
    > turn off the CSS to view the page but they don't let you save the modified
    > page. :(
    >
    > M


    See my post, css did not activate in the saved html using the
    technique I outlined.

    --
    dorayme
    dorayme, Jul 20, 2007
    #12
  13. M

    Guest

  14. M

    Jim Moe Guest

    M wrote:
    >
    >> Give an example of one url you would like to do this to.

    >
    > Not sure why this is relevant but, hey, if it leads to something. . . As an
    > example:
    >
    > http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/
    >
    > Essentially, I just want to save barebones articles with any relevant
    > images. I don't want Google ads, sidebars, irrelevant banner images, forms,
    > search boxes, background images, scripts, etc.
    >

    CSS is the least of the problem, then. In most cases you can ignore
    anything between <style> and </style>, or style="inline_styling". Poof! No CSS!
    But the rest of the stuff? I doubt you'll find anything that can
    distinguish between a "desirable" image and an "undesirable" one.
    You can reduce the amount of crud received by the browser by using a
    filtering proxy like Squid.
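
    A minimal sketch of the "ignore the style bits" idea in Python, assuming
    the saved page is already in memory as a string. The function name
    strip_css and the three patterns are illustrative only, and I have added
    linked stylesheets as a third case; a regex pass like this will miss edge
    cases that a real HTML parser would handle.

```python
import re

# Illustrative patterns, not a complete HTML parser.
# 1. Embedded stylesheets: everything from <style ...> to </style>.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
# 2. Linked stylesheets: <link ... rel="stylesheet" ...>.
STYLESHEET_LINK = re.compile(
    r"<link\b[^>]*\brel\s*=\s*[\"']?stylesheet[\"']?[^>]*>", re.IGNORECASE
)
# 3. Inline styles: a style="..." or style='...' attribute and its leading space.
INLINE_STYLE = re.compile(r"\s+style\s*=\s*(\"[^\"]*\"|'[^']*')", re.IGNORECASE)


def strip_css(html: str) -> str:
    """Return html with style blocks, stylesheet links and inline styles removed."""
    html = STYLE_BLOCK.sub("", html)
    html = STYLESHEET_LINK.sub("", html)
    html = INLINE_STYLE.sub("", html)
    return html


if __name__ == "__main__":
    page = (
        '<html><head><link rel="stylesheet" href="a.css">'
        '<style type="text/css">p {color: red}</style></head>'
        '<body><p style="color: blue" class="x">Hi</p></body></html>'
    )
    print(strip_css(page))
```

    Writing the result back to the saved file gives a copy that stays
    unstyled when reopened. Class and id attributes are left alone here.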

    --
    jmm (hyphen) list (at) sohnen-moe (dot) com
    (Remove .AXSPAMGN for email)
    Jim Moe, Jul 20, 2007
    #14
  15. M

    Ben C Guest

    On 2007-07-19, M <> wrote:
    > "dorayme" <> wrote in message
    > news:...
    >> In article <bCMni.131315$NV3.476@pd7urf2no>,
    >> "M" <> wrote:

    >
    >> Give an example of one url you would like to do this to.

    >
    > Not sure why this is relevant but, hey, if it leads to something. . . As an
    > example:
    >
    > http://niftytutorials.com/basics/transform-your-photos-into-a-beautiful-mosaic/1/
    >
    > Essentially, I just want to save barebones articles with any relevant
    > images. I don't want Google ads, sidebars, irrelevant banner images, forms,
    > search boxes, background images, scripts, etc.
    >
    > Sometimes the website is gracious enough to offer a print version which gets
    > rid of most of this stuff.
    >
    > I have a Notetab script which does most of what I want but wanted to see if
    > something else out there is better at it.


    If you want to get a lot of stuff out of one particular site a script
    using curl and BeautifulSoup (which is a Python module) may be the way
    to go, especially if the content has class or id attributes in it that
    you can use to latch onto the bits you want.

    I use this method for TV listings and traffic news.
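
    A rough sketch of the "latch onto an id" idea, using only Python's
    standard html.parser module rather than BeautifulSoup, and assuming the
    page has already been fetched (with curl, say) into a string. The names
    ContentGrabber and grab_text, and the id "article", are made up for
    illustration; a real site will need its own id or class.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContentGrabber(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def grab_text(html, target_id):
    """Return the whitespace-normalised text of the element with the given id."""
    parser = ContentGrabber(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

    This keeps only the text. Keeping the images as well would mean
    re-emitting the tags inside the target element instead of just the data.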
    Ben C, Jul 20, 2007
    #15
  16. M

    M Guest

    "dorayme" <> wrote in message
    news:...
    > In article <tGSni.130529$xq1.5003@pd7urf1no>,
    > "M" <> wrote:


    > Best I can
    > suggest is this, open in FF (equipped with free developer tools)
    > and turn off all css and probably javascript too. Save as
    > webpage.


    I did this (via the View | Page Style | No style) but FF still saves with
    the CSS intact. When you open the saved page, there is all the CSS again. Am
    I doing this wrong?

    > Open the saved page in a browser. If too rich for you still,
    > just delete the associated folder which contains all the images
    > and other stuff,


    That is what I have been doing, combined with Notetab text editing and
    Scrapbook's DOM editor. It would be nice to have one easy-to-use tool to do
    all this. (I sometimes use Amaya for very busy pages. . .)

    M
    M, Jul 20, 2007
    #16
  17. M

    M Guest

    "Susan Bugher" <> wrote in message
    news:...
    >M wrote:
    >


    > Have you looked at Net Picker?


    I have used it. IIRC it converts everything to HTML 3.2. Also not sure that
    it would be any quicker than what I'm doing now, what with all the selective
    dragging and dropping.

    M
    M, Jul 20, 2007
    #17
  18. M

    M Guest

    <> wrote in message
    news:...
    > On Jul 20, 12:49 am, "M" <> wrote:
    >
    > Another to look at.
    >
    > CSS Spy


    I'm unclear from the description. Does it strip out CSS in bulk? And does it
    deal with scripts, iframes, ad tables, etc?

    M
    M, Jul 20, 2007
    #18
  19. M

    M Guest

    Regular expression evaluator [Was Re: Strip out CSS]

    I thank all for some of your suggestions but most of them deal with CSS and
    not the bigger issue of scripts, ads, irrelevant sidebars (tables or divs),
    etc. Maybe I'm coming at this the wrong way.

    As I mentioned, Notetab's script language does most stuff for me. In order
    to strip out CSS though I need to strip out phrases like:
    id="something"
    class="something"
    style="bunch of css attributes"

    I've been playing around with Notetab's (v4.95) regular expression search
    and replace but I can't seem to find a combination that finds the above
    expressions.

    Is there a regular expression program that will break this down for me? For
    example, the program RegEx Coach lets you enter your text, then test various
    regular expressions. The results are highlighted in real time in the text
    you entered.

    I need something that works IN REVERSE. i.e. I enter text, highlight the
    expression I want removed, then it tells me the regular expression needed to
    achieve that.

    Anything like that out there?

    (PS, yes, I know that removing either the stylesheet or the embedded styles
    will render any id and class calls irrelevant. However, there are times when
    I need them intact, so it would be nice to have the option. . .)

    M
    M, Jul 20, 2007
    #19
  20. M

    Ben C Guest

    Re: Regular expression evaluator [Was Re: Strip out CSS]

    On 2007-07-20, M <> wrote:
    > I thank all for some of your suggestions but most of them deal with CSS and
    > not the bigger issue of scripts, ads, irrelevant sidebars (tables or divs),
    > etc. Maybe I'm coming at this the wrong way.
    >
    > As I mentioned, Notetab's script language does most stuff for me. In order
    > to strip out CSS though I need to strip out phrases like:
    > id="something"
    > class="something"
    > style="bunch of css attributes"


    > I've been playing around with Notetab's (v4.95) regular expression search
    > and replace but I can't seem to find a combination that finds the above
    > expressions.


    (style|id|class)=".*?"

    is your basic regexp for that in PCRE, which I think is what Notetab
    uses. Not too difficult.

    It reads 'style or id or class followed by =" and then everything up to
    the next "'
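
    To see the pattern at work outside Notetab, here is a small sketch with
    Python's re module, whose syntax handles this expression the same way.
    BASIC is the regexp above as written; TIDIER is a variant of my own that
    also eats the whitespace before the attribute, so it leaves no stray
    spaces and cannot match the tail of a longer name such as valid="...".
    The names BASIC, TIDIER and strip_attrs are just for illustration.

```python
import re

# The regexp from the post, unchanged.
BASIC = re.compile(r'(style|id|class)=".*?"')
# Assumed variant: require leading whitespace and remove it with the attribute.
TIDIER = re.compile(r'\s+(?:style|id|class)=".*?"')


def strip_attrs(html, pattern=TIDIER):
    """Delete every match of pattern from html."""
    return pattern.sub("", html)


if __name__ == "__main__":
    tag = '<p id="intro" class="lead" style="color: red">Hello</p>'
    print(strip_attrs(tag, BASIC))  # attributes gone, separating spaces remain
    print(strip_attrs(tag))         # attributes and their leading spaces gone
```

    Both patterns only handle double-quoted values. Single-quoted or unquoted
    attributes would need extra alternatives.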

    > Is there a regular expression program that will break this down for me? For
    > example, the program RegEx Coach lets you enter your text, then test various
    > regular expressions. The results are highlighted in real time in the text
    > you entered.
    >
    > I need something that works IN REVERSE. i.e. I enter text, highlight the
    > expression I want removed, then it tells me the regular expression needed to
    > achieve that.


    That's very difficult for the program to know: there are a vast number
    of ways to match a given bit of highlighted text, how is the program
    supposed to know which of them you want?
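
    A quick illustration of the ambiguity, as a sketch in Python. The
    highlighted text and the five candidate patterns are made-up examples:
    every one of them matches the highlighted text exactly, yet they disagree
    as soon as they meet a different attribute.

```python
import re

highlighted = 'class="lead"'

# Five hypothetical regexps a "reverse" tool could plausibly propose.
candidates = [
    r'class="lead"',             # only this exact attribute
    r'class=".*?"',              # any class attribute
    r'(style|id|class)=".*?"',   # any style, id or class attribute
    r'\w+="\w+"',                # any attribute with a simple value
    r'c.*"',                     # anything from a "c" to a closing quote
]


def matches_exactly(pattern, text):
    """True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


# All five agree on the highlighted text...
on_highlight = [matches_exactly(p, highlighted) for p in candidates]
# ...but only some of them generalise to another attribute.
on_other = [matches_exactly(p, 'id="nav"') for p in candidates]

if __name__ == "__main__":
    print(on_highlight)
    print(on_other)
```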

    > Anything like that out there?


    Honestly it's easier just to read the manual. The Python docs have a
    very clear explanation of the Perl-style syntax, which is close to PCRE.

    http://docs.python.org/lib/re-syntax.html
    Ben C, Jul 20, 2007
    #20