regexp for parsing image filenames out of html code

Discussion in 'Perl Misc' started by Georg Daniel Vassilopulos, Aug 30, 2003.

  1. Hello!

    I have a lot of html files and I would like to get all image filenames.
    The problem is it is not always valid xml.
    So I have to use regexps.

    The imagetags can be following format:
    <img src="/images/pic1.png">
    or
    <img src='/images/pic1.png'>

    the images can be *.png *.gif *.jpg *.bmp

    What is the regexp of choice?

    Can anyone help?

    Thanks a lot!
    Georg

    Georg Daniel Vassilopulos, Aug 30, 2003
    #1

  2. Georg Daniel Vassilopulos wrote:

    > I have a lot of html files and I would like to get all image filenames.
                      ^^^^
    > The problem is it is not always valid xml.
                                            ^^^

    So which is it, HTML or XML?


    > So I have to use regexps.



    Then it will work correctly sometimes and not work correctly sometimes...


    > The imagetags can be following format:
    ><img src="/images/pic1.png">
    > or
    ><img src='/images/pic1.png'>



    Those look like valid HTML and valid XML, what is invalid about
    your *ML?

    These are also valid *ML:

    <img src = "/images/pic1.png">

    <img
    src
    =
    "/images/pic1.png"
    >
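    A pattern that expects exactly `<img src=` matches neither of these layouts. As a minimal sketch only, assuming `src` is the first attribute and the value is always quoted, the whitespace can be made optional. The sample string and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input covering the two layouts shown above.
my $html = qq{<img src = "/images/pic1.png">\n}
         . qq{<img\nsrc\n=\n"/images/pic2.gif"\n>\n};

# \s+ and \s* also match newlines, so the tag may span several lines.
# The /i flag accepts IMG and SRC written in upper case.
my @files;
while ($html =~ m/<img\s+src\s*=\s*(?:"([^"]+)"|'([^']+)')/gi) {
    # Exactly one of the two capture groups is defined per match.
    push @files, defined $1 ? $1 : $2;
}

print "$_\n" for @files;
```

    This still misses unquoted values and tags where another attribute comes before src, so it is a stopgap, not a parser.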



    > What is the regexp of choice?



    There is never a regex of choice for a job not suited for regexes
    in the first place.


    m/<img src=("[^"]+"|'[^']+')/g
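    A rough sketch of how that pattern might be applied, assuming the whole file has been read into one string. The capture keeps the quote characters, so they are stripped afterwards, and the extension filter uses the four types named in the question. The sample data and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; the last tag uses the spaced-out form.
my $html = q{<img src="/images/pic1.png"> <img src='/images/pic2.gif'> }
         . q{<img src="/docs/readme.txt"> <img src = "/images/pic3.jpg">};

# In list context, m//g returns every capture, quotes included.
my @quoted = $html =~ m/<img src=("[^"]+"|'[^']+')/g;

my @images;
for my $value (@quoted) {
    # Drop the surrounding quote characters.
    my $name = substr $value, 1, -1;
    # Keep only the extensions named in the question.
    push @images, $name if $name =~ /\.(?:png|gif|jpg|bmp)\z/i;
}

print "$_\n" for @images;
```

    Note that pic3.jpg is not found, because of the spaces around the equals sign. That is the "not work correctly sometimes" part.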


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Aug 30, 2003
    #2

  3. On Sat, Aug 30, Tad McClellan inscribed on the eternal scroll:

    > Georg Daniel Vassilopulos wrote:
    >
    > > I have a lot of html files and I would like to get all image filenames.
    >                   ^^^^
    > > The problem is it is not always valid xml.
    >                                         ^^^
    >
    > So which is it, HTML or XML?


    Or maybe XHTML...

    > > So I have to use regexps.


    Can we say "petitio principii"? It used to be called "begging the
    question" in English, until that phrase was rendered worthless by
    folks who didn't know what it meant...

    > Then it will work correctly sometimes and not work correctly sometimes...


    But isn't that inevitable if you propose to parse material which is
    allowed to contain errors? OT but: if you're doing that with
    XML-based markup, then you're already in a state of sin.

    > > The imagetags can be following format:
    > ><img src="/images/pic1.png">
    > > or
    > ><img src='/images/pic1.png'>

    >
    > Those look like valid HTML and valid XML,


    OK; but they're not, however, acceptable as XHTML. (Have to be
    <img ... /> )

    > There is never a regex of choice for a job not suited for regexes
    > in the first place.


    Seems a fair enough comment to me.

    But if you're hoping (or "if one's hoping") to recover from syntax
    errors - and if one's entitled to assume the much more restrictive
    syntax of XML (rather than the bizarre backwaters of SGML), I'm not
    sure what better approach to recommend. XML-conforming software is
    mandated to deliver an error report and bale out when errors are
    encountered, surely? So then what...?

    all the best

    --
    >> So it is a collapsible menu.

    > Maybe. But I don't see _any_ menu there.

    Because it is collapsed.
    Alan J. Flavell, Aug 30, 2003
    #3
