cleaning up classes and ids in large site

windandwaves · Feb 16, 2007

Hi Folk

I am managing a rather large site. Over time, the css file has gone
from a few lines to 15Kb of styles. I would really like to clean this
up and simplify it. Do you know of any systems / applications (I use
PHP for the site) that I can use to

a. find all the IDs and classes used on the site.
b. check if these classes and IDs are listed in the css
c. find any classes and IDs listed in the css that are no longer used
on the site

My next job would then be to simplify the css, but that is more of a
manual task. Having said that, it would be great to see where IDs and
classes are used so that I can make decisions based on that and order
my css in such a way that regularly used classes are listed first,
etc....

Any recommendations greatly appreciated.

Cheers

Nicolaas

dorayme · Feb 16, 2007

windandwaves said:
Hi Folk

I am managing a rather large site. Over time, the css file has gone
from a few lines to 15Kb of styles. I would really like to clean this
up and simplify it. Do you know of any systems / applications (I use
PHP for the site) that I can use to

a. find all the IDs and classes used on the site.
b. check if these classes and IDs are listed in the css
c. find any classes and IDs listed in the css that are no longer used
on the site

This is technically very easy. Here is what I do: search for
instances of ids and classes in the html files using the Search
and Replace function that comes with any decent text editor. It
should of course be able to search over whole folders (including
all files in subfolders). If none are found, I search the css
files and delete those ids or classes, or otherwise attend to the
matter. It does not much matter if you run the search over the
whole website folder and some of the references found are in the
css; you just check whether there are references in both.

If you are using php includes then of course, you will search the
includes folder as well.

Good luck.

mbstevens · Feb 16, 2007

a. find all the IDs and classes used on the site.
b. check if these classes and IDs are listed in the css
c. find any classes and IDs listed in the css that are no longer used
on the site

I know of nothing ready-made, but it should be reasonably easy to
program in either Perl or Python.

shimmyshack · Feb 16, 2007

I know of nothing ready-made, but it should be reasonably easy to
program in either Perl or Python.

If I were you, I would write a small regular expression that grabs the
classes from the css files, put them inside a JavaScript object/array,
download the behaviour.js library, and include both in a script on
each page.
Each page then uses XHR to send back the entries of the css styles
array that it finds in use, and you cross those off your saved list
(and out of the styles js array/object).
This way your clients do the work in their browsers, reaching every
nook and cranny of your monster site over a "period of time", at the
end of which the remaining styles can be safely deleted from the
stylesheet.
As well as this, you get an easily togglable way to repeat the
process later, for any site.
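
A minimal sketch of the crossing-off bookkeeping on the receiving end, in Python rather than PHP. It assumes each browser posts back a plain list of the names it found; the SelectorAudit class, its method names and the sample names are all hypothetical, and the browser-side script and the XHR endpoint are not shown.

```python
class SelectorAudit:
    """Track which stylesheet names have been confirmed on live pages."""

    def __init__(self, css_names):
        # Every name from the stylesheet starts out unconfirmed.
        self.unseen = set(css_names)
        self.seen = set()

    def report(self, names_found_on_page):
        """Cross off names a browser reported; names not in the css are ignored."""
        confirmed = self.unseen & set(names_found_on_page)
        self.unseen -= confirmed
        self.seen |= confirmed
        return confirmed

    def removal_candidates(self):
        """Names that no page view has confirmed so far."""
        return sorted(self.unseen)


audit = SelectorAudit({".external_link", "#navMonday", ".oldPromo"})
audit.report([".external_link", ".notInCss"])   # first page view
audit.report(["#navMonday"])                    # second page view
print(audit.removal_candidates())
```

Whatever is still listed after the chosen period is only a candidate for deletion: a page nobody happened to visit in that time would never get the chance to report its names.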

Adrienne Boswell · Feb 16, 2007

Hi Folk

I am managing a rather large site. Over time, the css file has gone
from a few lines to 15Kb of styles. I would really like to clean this
up and simplify it. Do you know of any systems / applications (I use
PHP for the site) that I can use to

a. find all the IDs and classes used on the site.
b. check if these classes and IDs are listed in the css
c. find any classes and IDs listed in the css that are no longer used
on the site

My next job would then be to simplify the css, but that is more of a
manual task. Having said that, it would be great to see where IDs and
classes are used so that I can make decisions based on that and order
my css in such a way that regularly used classes are listed first,
etc....

Any recommendations greatly appreciated.

Cheers

Nicolaas

You could try TopStyle - it has such a feature -
<http://www.bradsoft.com/topstyle/>.

dorayme · Feb 16, 2007

shimmyshack said:
if i were you I would write a small reg exp that goes and grabs
classes from the css files, put them inside a javascript object/array
and download the behaviour.js library and include the object and class
in a script on each page.

Would you now! I gave a perfectly simple way to do the job and
yet you still would do this.

Ben C · Feb 16, 2007

This is very easy technically. This is what I do: search for
instances of ids and classes in the html files by using Search
and Replace functions that come with any decent text editor. It
should of course have search over whole folders (that includes
all files in sub folders). If none are found, I search the css
files and delete those ids or classes or otherwise attend to the
matter. It really does not much matter if you conduct the search
over the whole website folder and some of the found references
are to css instances, you just see if there are references in
both.

If you are using php includes then of course, you will search the
includes folder as well.

A less repetitive variant of the same idea:

1. Find all the occurrences in the html and save them to a file, using,
e.g. "grep -oP '(class|id)=".*?"' *.html > h"
2. Find all the occurrences in the css and save them to another file,
e.g. "grep -oP '[#\.](.*?)\s' tutorial.css > c". It may be your
editor can do these jobs just as well as grep.
3. Edit h and c to clean them up (yes the grep command could have been
cleverer, but this way is easier). And there will probably be a few
"false positives".
4. Do a unique sort on each of h and c. The command "sort h | uniq > hs;
mv hs h" for example, or I would use ":%!sort | uniq" in the editor.
5. Diff the files h and c. You can quickly see what classes/ids are in
one but not in the other.
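
The same five steps can be sketched in Python, where sets do the unique sort and the diff. This is a rough illustration under assumptions, not a robust tool: the function names and the sample markup are made up, the regexes are deliberately simple, and, as with grep, CSS comments or nested blocks such as @media would produce a few false positives.

```python
import re

def names_in_html(html):
    """Return the class and id names found in attribute values."""
    names = set()
    for attr, value in re.findall(r'\b(class|id)="(.*?)"', html):
        prefix = "." if attr == "class" else "#"
        # A class attribute may hold several space-separated names.
        for name in value.split():
            names.add(prefix + name)
    return names

def names_in_css(css):
    """Return the .class and #id names that appear in selectors."""
    # Drop declaration blocks so values like #ccc or 1.5em are not mistaken for names.
    selectors = re.sub(r"\{.*?\}", " ", css, flags=re.DOTALL)
    return set(re.findall(r"[.#][A-Za-z_][\w-]*", selectors))

html = '<div id="navMonday" class="menu external_link">x</div>'
css = "#navMonday { color: #ccc; } .menu a { margin: 0; } .oldPromo { top: 1.5em; }"

used, styled = names_in_html(html), names_in_css(css)
print(sorted(styled - used))   # in the css but not in the html
print(sorted(used - styled))   # in the html but not in the css
```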

Toby A Inkster · Feb 16, 2007

shimmyshack said:
Using XHR to send back the element id's of the css styles array which
you save crossing them off your list, (and out of the styles js array/
object)

Quite a nifty idea, and in certain cases more reliable than a regexp-based
search. I'm thinking specifically of something like:

<?php
// Build an opening tag; attribute names are lower-cased and values escaped.
function startTag($e, $attr = array())
{
    $r = '<' . strtolower($e);
    foreach ($attr as $k => $v) {
        $r .= ' ' . strtolower($k) . '="' . htmlentities($v) . '"';
    }
    $r .= '>';
    return $r;
}

// Build a closing tag; $attr is accepted for symmetry but not used.
function endTag($e, $attr = array())
{
    return '</' . strtolower($e) . '>';
}

// The class name appears here as an array value, not as class="...",
// so a text search for class="external_link" would not find it.
print startTag('A',
        array(
            'href'  => 'http://www.google.co.uk/',
            'class' => 'external_link',
        )
    )
    . 'Google'
    . endTag('A');

?>
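
To make the point concrete, here is a small Python sketch (an illustration only; start_tag is a hypothetical counterpart of the PHP startTag helper, and the source string stands in for what a text search would see in a template file). The pattern finds nothing in the source, but does find the class in the markup the browser receives.

```python
import re
from html import escape

def start_tag(element, attrs=None):
    """Build an opening tag with lower-cased names and escaped values."""
    parts = ["<" + element.lower()]
    for key, value in (attrs or {}).items():
        parts.append(' %s="%s"' % (key.lower(), escape(value)))
    return "".join(parts) + ">"

# The template source, as a search over the files would see it.
source = "start_tag('A', {'href': 'http://www.google.co.uk/', 'class': 'external_link'})"
# The markup that actually reaches the browser.
rendered = start_tag("A", {"href": "http://www.google.co.uk/", "class": "external_link"})

pattern = re.compile(r'class="(.*?)"')
print(pattern.findall(source))    # nothing matches in the source
print(pattern.findall(rendered))  # the class shows up only in the output
```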

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact
Geek of ~ HTML/SQL/Perl/PHP/Python*/Apache/Linux

* = I'm getting there!

shimmyshack · Feb 16, 2007

Would you now! I gave a perfectly simple way to do the job and
yet you still would do this.

Sorry to cause you so much offence, dorayme, but I felt your idea
didn't catch css built dynamically by client-side code, or css
coming from a database query - in fact any complex web application
which is not completely abstracted; it only covers plain text files
in a ready-built state. Even php files might have some css styles such as
echo 'border:1px #'.(($i%2==0)?'ccc':'ebebeb').' solid;';
Using users as your engine gives you a distributed solution, which you
can also use to check links, validate code, and many other things.
It is more reliable because it checks the final application while it is
running in all the different user agents, and therefore gets the true
picture. It is a bit different from real life, where you make the cake
with eggs; here we make the mixing bowl and throw it, a hen and some
grain at the client. If they have a compliant user-agent their cake
will taste nicer.

Andy Dingley · Feb 16, 2007

This is very easy technically. This is what I do: search for
instances of ids and classes in the html files by using Search
and Replace functions that come with any decent text editor.

That sounds like hard work!

I do it in Python, as there are a couple of decent HTML parsers in
existence for it: the event-driven HTMLParser is in the box and would
find class or id attributes very easily. BeautifulSoup is a separate
install, but rather friendlier to use for screen-scraping in general.
Trivial use of dictionaries (Parseltongue for associative arrays or
hashes) counts how many duplicate ids you have in each file.
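
A minimal sketch of that approach, assuming Python 3, where the bundled parser lives in html.parser (in Python 2 the module was named HTMLParser). The AttrCounter class and the sample page are made up for illustration, and Counter is just a dictionary that counts.

```python
from collections import Counter
from html.parser import HTMLParser

class AttrCounter(HTMLParser):
    """Count every id and class name seen in start tags."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue  # attribute written without a value
            if name == "id":
                self.ids[value] += 1
            elif name == "class":
                # One class attribute can carry several names.
                self.classes.update(value.split())

page = """
<ul id="nav">
  <li id="navMonday" class="day first">Mon</li>
  <li id="navMonday" class="day">Mon again</li>
</ul>
"""

counter = AttrCounter()
counter.feed(page)
duplicates = [name for name, n in counter.ids.items() if n > 1]
print(duplicates)              # ids used more than once in this file
print(counter.classes["day"])  # how often the class "day" is used
```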

I wouldn't like to do it in an editor, even though I have powerful
ones to hand. Is emacs and lisp the favoured choice on Mars?

Without Python I'd do it in a shell easily enough, but I'd use grep to
match things and it might be confused when parsing HTML tutorials
where the string "class=foo" occurs in the content, but not as an
attribute.
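
The difference is easy to show in a few lines of Python. This is only a sketch: ClassCollector and the tutorial string are invented for the example, and the attribute is quoted here to keep the pattern short. The pattern match picks up the class that is merely written about in the text, while the parser reports only the real attribute.

```python
import re
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collect class names from real attributes only."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.found.update(value.split())

# A tutorial page: "foo" occurs only in the visible text, as escaped markup.
tutorial = '<p class="note">Write &lt;p class="foo"&gt; to style it.</p>'

by_regex = set(re.findall(r'class="(.*?)"', tutorial))
collector = ClassCollector()
collector.feed(tutorial)
collector.close()

print(sorted(by_regex))          # includes the false positive
print(sorted(collector.found))   # real attributes only
```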

dorayme · Feb 16, 2007

shimmyshack said:
sorry to cause you so much offence dorayme, but I felt your idea
didn't catch dynamic building of css using client side code, or css
coming from a database query - in fact any complex web application
which is not completely abstracted, and only covers plain text files
in a ready built state. Even php files might have some css styles as
echo 'border:1px #'.(($i%2==0)?'ccc':'ebebeb').' solid;';
Using users as your engine gives you a distributed solution, which you
can also use to check links, validate code, and many other things.
It is more reliable because it checks the final application when it is
running in all different user agents, and therefore gets the true
picture, it is a bit different to real life where you make the cake
with eggs, here we make the mixing bowl and throw it, a hen and some
grain at the client. If they have a compliant user-agent their cake
will taste nicer.

God almighty! If the task is so much more complex than was
conveyed by the OP then sure, maybe what I do is too simple
minded for words. I can scarcely believe what you have just
said... php, dynamic, css, database, client this and that, shake
it all up and bake it for 2 hours at 250 C... <g>

I was not really offended, just miffed at the general silence or
off key replies whenever I propose S & R techniques. I am
beginning to think that a lot of people simply do not know the
capabilities of good S & R engines that come with good text
editors.

Yes, I know many of the regulars here know all about these things
(they know too much for their own good in my opinion) but perhaps
many many others have less of a clue?

Perhaps I have been spoilt with BBEdit, a fine Mac editor with
fine S & R capabilities including GREP. Perhaps Windows editors
are not generally as good or lack an absolutely crucial feature
that is required for the technique I proposed, namely to Search
over a whole folder (including sub folders) at the press of a
mouse in seconds and, equally crucial, to issue a report of all
found instances and in that report to be able to go to those
instances at their source at the press of a mouse...

But for what the OP seemed to want, not even GREP needed to be
used.

dorayme · Feb 16, 2007

Andy Dingley said:
That sounds like hard work!

Oh well, perhaps there are things I am missing. But I suspect -
perhaps unfairly - that there is a Force of Unnecessary
Complexity secretly acting on earthlings.

I have done these things without much work at all, see my
previous reply.

Perhaps I need to explain more? The OP had a problem we all
confront. Simple enough in most cases no matter how big the site.

Suppose you want to know if a #navMonday css instruction can be
safely removed from your CSS sheets. All you have to do is search
for ' id="navMonday" ' over the folder that contains all the
website files. It is a button press. What is hard about it? If
the search comes up with nothing, you search for '#navMonday' and
all instances in your css will show up and you delete them by
hand because they are there at your fingertips (see my previous
post). You can, when you get good at the S & R function be a
little cleverer and get any class (as well as or instead of id)
of this name. You can replace instances with nothing and so fix
all up automatically. But for most situations it is so quick and
easy, why bother your head, it is the finding not the deleting
that takes time. I doubt many would need fancy doodle perling and
reg exping and generally tooling about in some macho V8 flexing
about...
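
For anyone whose editor cannot search a whole folder, the same single check can be sketched in Python. The files_mentioning function, the list of file suffixes and the throwaway site built here are all assumptions made for the example; on a real site you would pass your own website folder as root.

```python
import tempfile
from pathlib import Path

def files_mentioning(root, needle, suffixes=(".html", ".php", ".inc")):
    """Return the files under root (subfolders included) whose text contains needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            if needle in path.read_text(encoding="utf-8", errors="replace"):
                hits.append(path)
    return hits

# Build a tiny throwaway site so the sketch can be run as it stands.
site = Path(tempfile.mkdtemp())
(site / "includes").mkdir()
(site / "index.html").write_text('<ul id="navTuesday"></ul>', encoding="utf-8")
(site / "includes" / "menu.php").write_text('<li id="navMonday">', encoding="utf-8")

hits = files_mentioning(site, 'id="navMonday"')
print([p.relative_to(site).as_posix() for p in hits])
```

An empty result would mean no page or include mentions the id in that exact form, which is the cue to go and look for '#navMonday' in the css.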

Come on guys, get real!

Ben C · Feb 16, 2007

Oh well, perhaps there are things I am missing. But I suspect -
perhaps unfairly - that there is a Force of Unnecessary
Complexity secretly acting on earthlings.

I have done these things without much work at all, see my
previous reply.

Perhaps I need to explain more? The OP had a problem we all
confront. Simple enough in most cases no matter how big the site.

Suppose you want to know if a #navMonday css instruction can be
safely removed from your CSS sheets. All you have to do is search
for ' id="navMonday" ' over the folder that contains all the
website files. It is a button press. What is hard about it? If
the search comes up with nothing, you search for '#navMonday' and
all instances in your css will show up and you delete them by
hand because they are there at your fingertips (see my previous
post). You can, when you get good at the S & R function be a
little cleverer and get any class (as well as or instead of id)
of this name. You can replace instances with nothing and so fix
all up automatically. But for most situations it is so quick and
easy, why bother your head, it is the finding not the deleting
that takes time. I doubt many would need fancy doodle perling and
reg exping and generally tooling about in some macho V8 flexing
about...

The only drawback with your way of doing it is that you have to do the
search and replace manually for each class or id that you're concerned
with, don't you? Ideally one would like to get the complete list of dead
classes and ids in one go, and then maybe also delete them in one go.
It's quite interesting to see what different people's approaches are to
this.

Of course it depends how many there are to deal with. If it's a huge
site with thousands of them, or if I found myself making a habit of it,
I'd write a Python program. If there were about 20 or 30, I'd mess about
with grep and the editor in unnecessarily complex ways, get it all wrong
several times, take longer but have more fun; if there were only about 4
I'd do simple search and replacing.

Andy Dingley · Feb 16, 2007

Suppose you want to know if a #navMonday css instruction can be
safely removed from your CSS sheets. All you have to do is search
for ' id="navMonday" ' over the folder that contains all the
website files.

That's OK for one class or id, but how about the full list?

With an editor, how do you produce the full list? You'd need an editor
and regex to match everything that _wasn't_ an id and then strip it.
That gets tricky.

mbstevens · Feb 16, 2007

Come on guys, get real!

If it will take you more time to do by hand
than it would to write a script, go for the script.
Time spent learning to script does not count!
Don't you want that bubbly feeling of absolute power
over your computing environment?

dorayme · Feb 17, 2007

Ben C said:
.....

The only detraction with your way of doing it is that don't you have to
manually do the search and replace for each class or id that you're
concerned with? Ideally one would like to get the complete list of dead
classes and ids in one go, and then maybe also delete them in one go.
It's quite interesting to see what different people's approaches are to
this.

Of course it depends how many there are to deal with. If it's a huge
site with thousands of them, or if I found myself making a habit of it,
I'd write a Python program. If there were about 20 or 30, I'd mess about
with grep and the editor in unnecessarily complex ways, get it all wrong
several times, take longer but have more fun; if there were only about 4
I'd do simple search and replacing.

You can do concatenated searches (they take a while to compose)
and then run them at the press of a button in an S & R function,
with almost instant results on a fast machine (yes, thousands of
instances all at once, gone like magic). This uses nothing but a
built-in S & R, like the one in the free TextWrangler or the
shareware BBEdit.
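
One way to read "concatenated search" is a single pattern with all the suspect names joined as alternatives. A rough Python sketch of that reading follows; the names_found function and the sample names are hypothetical, and the same alternation could be typed into an editor's grep-style search instead.

```python
import re

def names_found(text, suspects):
    """Report which suspect names occur as whole names in text, in one pass."""
    pattern = re.compile(
        # The lookarounds stop "external_link" matching inside "external_link-wide".
        r"(?<![\w-])(" + "|".join(re.escape(s) for s in suspects) + r")(?![\w-])"
    )
    return set(pattern.findall(text))

suspects = ["navMonday", "oldPromo", "external_link"]
html = '<li id="navMonday" class="external_link-wide">Mon</li>'

found = names_found(html, suspects)
print(sorted(found))                   # suspects that are still in use
print(sorted(set(suspects) - found))   # suspects with no match anywhere
```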

The real fact of the matter is if it was your own built site, you
simply would know that most of the classes are being used. You
cast your eye on the ones you are not sure of and search for
them.

Trying to get turn key solutions to things is often just plain
unrealistic in terms of time and effort (counter-intuitively).

I admit this: if you get landed with someone else's site, very
big and very complex and you have few bearings, maybe something
very special to automate the thing would be very nice, almost
necessary...

dorayme · Feb 17, 2007

Andy Dingley said:
That's OK for one class or id, but how about the full list?

Yes, see my reply to Ben. The special case of a need to check
every id and class is far fetched but interesting. I leave you
with that one. What would I know of such things? But
realistically, think about it, you will straight off be able to
see many ids and classes and things that are not worth searching
for by simple inspection of a few pages of the html, they are
clearly used. Earthlings are fast and good at knowing after brief
inspections what is not worth bothering about.

With an editor, how do you produce the full list? You'd need an editor
and regex to match everything that _wasn't_ an id and then strip it.
That gets tricky.

Am I missing something big here? I thought it was about something
in a css sheet that was uselessly there because nowhere is it
used in the html. So if it is not there in the html, nothing
needs to be done to the html. As for the css, well, these are
just not going to be thousands of pages, not on earth at least,
it does not happen. And there will not be that many of them. And
if some are missed in what I am proposing, they will do no harm
and will be picked up in later sweeps.

I am not trying to spoil a party here, just trying to inject a
bit of common sense into it.

shimmyshack · Feb 17, 2007

Oh well, perhaps there are things I am missing. But I suspect -
perhaps unfairly - that there is a Force of Unnecessary
Complexity secretly acting on earthlings.

I have done these things without much work at all, see my
previous reply.

Perhaps I need to explain more? The OP had a problem we all
confront. Simple enough in most cases no matter how big the site.

Suppose you want to know if a #navMonday css instruction can be
safely removed from your CSS sheets. All you have to do is search
for ' id="navMonday" ' over the folder that contains all the
website files. It is a button press. What is hard about it? If
the search comes up with nothing, you search for '#navMonday' and
all instances in your css will show up and you delete them by
hand because they are there at your fingertips (see my previous
post). You can, when you get good at the S & R function be a
little cleverer and get any class (as well as or instead of id)
of this name. You can replace instances with nothing and so fix
all up automatically. But for most situations it is so quick and
easy, why bother your head, it is the finding not the deleting
that takes time. I doubt many would need fancy doodle perling and
reg exping and generally tooling about in some macho V8 flexing
about...

Come on guys, get real!

I understand your point, and for a site with a few hundred pages of
html and perhaps 200 classes, finding "orphaned" classes by hand would
be possible, if tedious.
But if your site uses any kind of DOM manipulation, then even screen
scraping won't tell you which classes come into play.
Next time you're using gmail, take a look at some of those XHR
responses: you will see not only JSON but also styles (albeit in an
unorthodox way!) pushing their way into the DOM.
Using a greasemonkey extension to parse the DOM for classes wouldn't be
very much harder than using an editor's regular expression feature;
you are essentially writing a macro to be performed on every page
request, or at various states of the application.
However, using greasemonkey would require it to be installed on your
system, so just inserting the code into the main web app provides a
way to have your users check your site.
You could repeatedly audit your own site every month for broken links
or whatever by simply uncommenting a single line of javascript.
Although initially complex, the technique is scalable and very fast. I
used to use a similar script back in 2004, which ran on a major site I
wrote: if a user-agent came across a broken link, it would send the
data to my home webserver, which used COM to open and navigate IE to
the spot and highlight the link. It was a headache to get it all up
and running, but I got a real buzz from mistakes in the site, and have
been in love with weird solutions like this ever since! (I used to
play with Dreamweaver's site search and replace regular expression
feature until it corrupted that site in odd places here and there;
since then it has been more fun, and admittedly more painful, to write
my own solution, and at least then if it fails I can own it!)
Glad you didn't take offence!

dorayme · Feb 17, 2007

mbstevens said:
If it will take you more time to do by hand
than it would to write a script, go for the script.
Time spent learning to script does not count!
Don't you want that bubbly feeling of absolute power
over your computing environment?

Confession time: I can labour over a GREP pattern to do in an
hour what I could do without it in a minute or two. But do you
think I would actually recommend such madness to others?

mbstevens · Feb 17, 2007

Confession time: I can labour over a GREP pattern to do in an
hour what I could do without it in a minute or two. But do you
think I would actually recommend such madness to others?

Oh, goodness, not grep! A bit of Perl or Python that have modules for
HTML parsing and also have regular expressions integrated into a full
bodied language. (OK, a module for regexes in Python, but that's no
problem.)

Knowing you, I'm pretty sure that if you spent a week learning either,
paying particular attention to the regular expressions and HTML parser
modules, you would be able to write the program in about an hour.

I suggest Python if it is your first scripting language. Perl is easier
for people with C background. (Ducks as people from all over the net
throw brickbats.)
