Peter Rilling
Hi,
I have uploaded a library that, I hope, is better than using CDOSYS/CDONTS
for handling MHTML downloads. The infrastructure for downloading is in
place. The persistence system is not in place yet, but I will be working on
that shortly. If nothing else, this library provides a simple way to
download a page and all its referenced resources.
Here are some of the features that it currently supports:
** Downloads a page and all its referenced resources.
** Downloads them recursively. For instance, a CSS will be downloaded if
referenced by a page, but if that CSS references any images, those will also
be downloaded.
** Only one instance of a resource is downloaded regardless of how many
times it is referenced. The instance is still associated with all the
parent pages that contain it.
** Currently downloadable types include: audio, images, css, html, xml,
scripts.
** Current types of references that are processed include:
background/foreground images (both html and css), css, background sound,
JavaScript, iframe and framesets, xml islands.
** Cool demo app that shows the downloaded content, allowing you to sort by
type or by reference relationship.
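To make the recursive, download-once behaviour above concrete, here is a minimal sketch in Python. It is not the library's code or API: the names `collect`, `fetch`, `HTML_REF` and `CSS_REF` are hypothetical, the regular expressions cover only a few of the reference types listed, and the caller supplies `fetch(url)` returning a `(content_type, text)` pair.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Hypothetical, deliberately simplified patterns. A real parser would
# handle many more tags, attributes, and quoting styles.
HTML_REF = re.compile(
    r'''<(?:img|script|iframe|frame|bgsound|body|link)\b[^>]*?'''
    r'''\b(?:src|href|background)\s*=\s*["']([^"']+)["']''',
    re.I,
)
CSS_REF = re.compile(r'''url\(\s*["']?([^"')]+)["']?\s*\)''', re.I)


def collect(root_url, fetch):
    """Fetch root_url and everything it references, once per URL.

    fetch(url) -> (content_type, text) is supplied by the caller.
    Returns (resources, parents): the content keyed by URL, and for
    each URL the set of URLs that reference it.
    """
    resources = {}
    parents = {root_url: set()}
    queue = deque([root_url])
    while queue:
        url = queue.popleft()
        if url in resources:
            continue  # already downloaded, skip the duplicate request
        ctype, text = fetch(url)
        resources[url] = (ctype, text)
        if ctype == "text/html":
            # HTML may also carry CSS url(...) references in inline styles.
            refs = HTML_REF.findall(text) + CSS_REF.findall(text)
        elif ctype == "text/css":
            refs = CSS_REF.findall(text)
        else:
            refs = []  # images, audio, etc. are leaves
        for ref in refs:
            child = urljoin(url, ref)  # resolve relative to the referrer
            parents.setdefault(child, set()).add(url)
            if child not in resources:
                queue.append(child)
    return resources, parents
```

The `parents` map is what keeps a shared resource associated with every page that uses it, even though it is fetched only once.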
The following are issues that are on my agenda:
** Update the URLs, since the pages will eventually be viewed locally.
** Support saving to various formats, including a single mhtml file.
** Support loading of single mhtml files.
** Fix bugs.
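For the planned single-file save, the usual MHTML layout (RFC 2557) is a multipart/related MIME message in which each part carries a Content-Location header with its original URL. The sketch below shows that layout using Python's standard email package. It only illustrates the format and is not how the library will implement it; `build_mhtml` and the shape of the `resources` dictionary are assumptions.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(root_url, resources):
    """Pack downloaded resources into one multipart/related message.

    resources maps url -> (content_type, data_bytes); this shape is an
    assumption made for the sketch. The root page becomes the first part.
    """
    msg = MIMEMultipart("related", type="text/html")
    msg["Subject"] = root_url
    ordered = [root_url] + [u for u in resources if u != root_url]
    for url in ordered:
        ctype, data = resources[url]
        maintype, subtype = ctype.split("/", 1)
        if maintype == "text":
            # Assumes text resources are UTF-8 encoded.
            part = MIMEText(data.decode("utf-8"), subtype, "utf-8")
        else:
            part = MIMEBase(maintype, subtype)
            part.set_payload(data)
            encoders.encode_base64(part)  # binary parts travel as base64
        # Content-Location ties the part back to the URL it came from.
        part["Content-Location"] = url
        msg.attach(part)
    return msg.as_bytes()
```

Writing the returned bytes to a file with an .mht extension gives a single-file archive. Because each part keeps its original URL, a reader can resolve references without the URLs being rewritten.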
Now, what I need from this community is to put this code through its paces.
Find any bugs (including URLs that are not processed correctly). I also
welcome constructive criticism, so any comments about my architecture would
be great, keeping in mind that this is pre-alpha and far from perfect.
You can download it at http://www.codeproject.com/useritems/mhtmllib.asp.