regexp for parsing image filenames out of html code

  • Thread starter Georg Daniel Vassilopulos

Georg Daniel Vassilopulos

Hello!

I have a lot of html files and I would like to get all image filenames.
The problem is it is not always valid xml.
So I have to use regexps.

The image tags can have the following formats:
<img src="/images/pic1.png">
or
<img src='/images/pic1.png'>

The images can be *.png, *.gif, *.jpg, or *.bmp.

What is the regexp of choice?

Can anyone help?

Thanks a lot!
Georg


Tad McClellan

Georg Daniel Vassilopulos said:
I have a lot of html files and I would like to get all image filenames.
                ^^^^
The problem is it is not always valid xml.
                                      ^^^

So which is it, HTML or XML?

So I have to use regexps.


Then it will work correctly sometimes and not work correctly sometimes...

The image tags can have the following formats:
<img src="/images/pic1.png">
or
<img src='/images/pic1.png'>


Those look like valid HTML and valid XML; what is invalid about
your *ML?

These are also valid *ML:

<img src = "/images/pic1.png">

<img
src
=
"/images/pic1.png"
>

What is the regexp of choice?


There is never a regex of choice for a job not suited for regexes
in the first place.


m/<img src=("[^"]+"|'[^']+')/g
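The pattern above only matches when exactly one space separates `img` from `src` and nothing surrounds the `=`. As a rough sketch of a more tolerant variant, here is a translation into Python's `re` module (not Perl, purely for illustration). The names `IMG_SRC`, `WANTED`, and `image_filenames` are hypothetical, and the extension list is taken from the question. It still fails on some inputs, for example a `>` inside an earlier attribute value, so the caveat about regexes and markup still applies.

```python
import re

# Hypothetical pattern: allows other attributes before src, whitespace
# (including newlines) around "=", and either quote style.
IMG_SRC = re.compile(
    r"""<img\b[^>]*?\bsrc\s*=\s*(?:"([^"]+)"|'([^']+)')""",
    re.IGNORECASE,
)

# Extensions named in the question.
WANTED = (".png", ".gif", ".jpg", ".bmp")


def image_filenames(html):
    """Return src values of <img> tags ending in one of the WANTED extensions."""
    found = []
    # findall yields one (double_quoted, single_quoted) tuple per match;
    # exactly one of the two groups is non-empty.
    for double_quoted, single_quoted in IMG_SRC.findall(html):
        src = double_quoted or single_quoted
        if src.lower().endswith(WANTED):
            found.append(src)
    return found
```

Calling `image_filenames(text)` on the contents of one file returns the matching `src` values in document order.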
 

Alan J. Flavell

So which is it, HTML or XML?

Or maybe XHTML...

Can we say "petitio principii"? It used to be called "begging the
question" in English, until that phrase was rendered worthless by
folks who didn't know what it meant...

Then it will work correctly sometimes and not work correctly sometimes...

But isn't that inevitable if you propose to parse material which is
allowed to contain errors? OT but: if you're doing that with
XML-based markup, then you're already in a state of sin.

Those look like valid HTML and valid XML,

OK; but they're not, however, acceptable as XHTML. (Have to be

There is never a regex of choice for a job not suited for regexes
in the first place.

Seems a fair enough comment to me.

But if you're hoping (or "if one's hoping") to recover from syntax
errors - and if one's entitled to assume the much more restrictive
syntax of XML (rather than the bizarre backwaters of SGML), I'm not
sure what better approach to recommend. XML-conforming software is
mandated to deliver an error report and bale out when errors are
encountered, surely? So then what...?
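
For files that are HTML rather than strict XML, one possible middle ground is a deliberately lenient HTML tokenizer, which keeps going past malformed markup instead of bailing out. The following is a minimal sketch using Python's standard `html.parser` module (Python rather than Perl, for illustration only); the class `ImgSrcCollector` and the function `collect_img_sources` are hypothetical names, not anything proposed in this thread.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> start tag that is seen."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        # Self-closing tags such as <img ... /> also arrive here, because
        # the default handle_startendtag delegates to handle_starttag.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def collect_img_sources(html):
    """Return all <img> src values found in html, in document order."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources
```

Because the tokenizer handles quoting and whitespace itself, the multi-line `<img` form shown earlier in the thread and stray or unclosed tags elsewhere in the file need no special cases.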

all the best

--
Maybe so. But I see _no_ menu _at all_ there.
Because it is folded shut.
 
