Replace and inserting strings within .txt files with the use of regex

Discussion in 'Python' started by Νίκος, Aug 8, 2010.

  1. Νίκος

    Νίκος Guest

    Hello dear Pythoneers,

    I have over 500 .php web pages in various subfolders under the 'data'
    folder that I have to rename to .html. I also have to remove the '<?'
    and '?>' tags from within them and insert, as the very first line,
    <!-- id -->, where id is a unique identification number for each page,
    used for counter tracking. Only pure HTML code must be left.

    Before I found out about Python I used PHP, and now I am switching to a
    templates + Python solution, so I have to change each and every page.

    I don't know how to handle such a big data replacement problem, and I
    cannot play with fire because those 500 pages are my clients' pages and
    the data in those files just cannot be messed up.

    Can you please provide me with a script that can perform this kind of
    page content replacement automatically?

    Thanks a million!
    Νίκος, Aug 8, 2010
    #1

  2. Νίκος

    rantingrick Guest

    Re: Replace and inserting strings within .txt files with the use of regex

    On Aug 7, 7:20 pm, Νίκος <> wrote:
    > Hello dear Pythoneers,


    I prefer Pythonista, but anywho..

    > I have over 500 .php web pages in various subfolders under 'data'
    > folder that i have to rename to .html


    import os
    os.rename(old, new)

    > and and ditch the '<?' and '?>' tages from within


    path = 'some/valid/path'
    f = open(path, 'r')
    data = f.read()
    f.close()
    data.replace('<?', '')
    data.replace('?>', '')

    > and also insert a very first line of <!-- id -->
    > where id must be an identification unique number of every page for
    > counter tracking purposes.


    comment = "<!-- %s -->"%(idnum)
    data.insert(idx, comment)

    > ONly pure html code must be left.


    Well then don't F up! However, judging from the number of typos in this
    post, I would suggest you do some major testing!

    > I don't know how to handle such a big data replacing problem and
    > cannot play with fire because those 500 pages are my cleints pages and
    > data of those files just cannot be messes up.


    Better do some serious testing first, or (if you have enough disk
    space) create copies instead!

    > Can you provide to me a script please that is able of performing an
    > automatic way of such a page content replacing?


    This is very basic stuff and the fine manual is free you know. But how
    much are you willing to pay?
    rantingrick, Aug 8, 2010
    #2

  3. Νίκος

    MRAB Guest

    Re: Replace and inserting strings within .txt files with the use of regex

    rantingrick wrote:
    > On Aug 7, 7:20 pm, Νίκος <> wrote:
    >> Hello dear Pythoneers,

    >
    > I prefer Pythonista, but anywho..
    >
    >> I have over 500 .php web pages in various subfolders under 'data'
    >> folder that i have to rename to .html

    >
    > import os
    > os.rename(old, new)
    >
    >> and and ditch the '<?' and '?>' tages from within

    >
    > path = 'some/valid/path'
    > f = open(path, 'r')
    > data = f.read()
    > f.close()
    > data.replace('<?', '')
    > data.replace('?>', '')
    >

    That should be:

    data = data.replace('<?', '')
    data = data.replace('?>', '')
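The correction above follows from Python strings being immutable: `str.replace` returns a new string and leaves the original untouched. A minimal illustration, using a made-up sample string rather than one of the real pages:

```python
# Hypothetical sample page content, for illustration only.
data = "<? echo 1; ?><p>hello</p>"

data.replace("<?", "")      # the result is discarded, so data is unchanged
unchanged = data

# Keep the returned string; calls can be chained.
cleaned = data.replace("<?", "").replace("?>", "")
```

Note that this only removes the delimiters; whatever sat between them stays in the text.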

    >> and also insert a very first line of <!-- id -->
    >> where id must be an identification unique number of every page for
    >> counter tracking purposes.

    >
    > comment = "<!-- %s -->"%(idnum)
    > data.insert(idx, comment)
    >

    Strings don't have an 'insert' method!
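Since strings have no `insert` method, one simple way to put a comment on the first line is plain concatenation. The helper name `add_id_comment` below is invented for this sketch and is not from the thread:

```python
def add_id_comment(data, idnum):
    """Return data with an HTML comment carrying idnum as its first line."""
    comment = "<!-- %d -->\n" % idnum
    return comment + data   # builds a new string; data itself is not modified


page = add_id_comment("<html></html>", 7)
```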

    >> ONly pure html code must be left.

    >
    > Well then don't F up! However judging from the amount of typos in this
    > post i would suggest you do some major testing!
    >
    >> I don't know how to handle such a big data replacing problem and
    >> cannot play with fire because those 500 pages are my cleints pages and
    >> data of those files just cannot be messes up.

    >
    > Better do some serous testing first, or (if you have enough disc
    > space ) create copies instead!
    >
    >> Can you provide to me a script please that is able of performing an
    >> automatic way of such a page content replacing?

    >
    > This is very basic stuff and the fine manual is free you know. But how
    > much are you willing to pay?
    MRAB, Aug 8, 2010
    #3
  4. Νίκος

    Νίκος Guest

    Re: Replace and inserting strings within .txt files with the use of regex

    # rename ALL php files to html in every subfolder of the folder 'data'
    # (how to tell python to rename ALL php files to html in ALL subfolders
    # under 'data'?)
    os.rename('*.php', '*.html')

    # current path of the file to be processed
    # (this must somehow be in a loop, i feel, that reads every file of
    # every subfolder)
    path = './data'

    # open an html file for reading
    f = open(path, 'rw')
    # read the contents of the whole file
    data = f.read()

    # replace all php tags with empty string
    data = data.replace('<?', '')
    data = data.replace('?>', '')

    # write replaced data to file
    data = f.write()

    # insert an increasing unique integer number at the very first line
    # of every html file processed
    # (how will the number change here and increase by one, file after file?)
    comment = "<!-- %s -->"%(idnum)
    f = f.close()

    Please help. I'm new to Python, and apart from the syntax it is a logic
    problem as well, which needs experience.
    Νίκος, Aug 8, 2010
    #4
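On the renaming question: `os.rename` takes one source path and one destination path and does not expand wildcards, so the files have to be visited one by one. A minimal sketch using `os.walk`, where the function name `rename_php_to_html` is invented for illustration:

```python
import os


def rename_php_to_html(root):
    """Rename every *.php file under root to *.html; return the new paths."""
    renamed = []
    # os.walk yields (directory path, subdirectory names, file names)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            base, ext = os.path.splitext(name)
            if ext.lower() == ".php":
                old = os.path.join(dirpath, name)
                new = os.path.join(dirpath, base + ".html")
                os.rename(old, new)
                renamed.append(new)
    return renamed
```

It would be called as `rename_php_to_html('data')`, ideally on a copy of the folder first.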
  5. Νίκος

    John S Guest

    Re: Replace and inserting strings within .txt files with the use of regex

    On Aug 7, 8:20 pm, Νίκος <> wrote:
    > Hello dear Pythoneers,
    >
    > I have over 500 .php web pages in various subfolders under 'data'
    > folder that i have to rename to .html and and ditch the '<?' and '?>'
    > tages from within and also insert a very first line of <!-- id -->
    > where id must be an identification unique number of every page for
    > counter tracking purposes. ONly pure html code must be left.
    >
    > I before find otu Python used php and now iam switching to templates +
    > python solution so i ahve to change each and every page.
    >
    > I don't know how to handle such a big data replacing problem and
    > cannot play with fire because those 500 pages are my cleints pages and
    > data of those filesjust cannot be messes up.
    >
    > Can you provide to me a script please that is able of performing an
    > automatic way of such a page content replacing?
    >
    > Thanks a million!


    If the 500 web pages are PHP only in the sense that there is only one
    pair of <? ?> tags in each file, surrounding the entire content, then
    what you ask for is doable.

    from os.path import join
    import os

    id = 1 # id number
    # os.walk yields (current dir, subdirectory names, file names)
    for currdir, dirs, files in os.walk('data'):
        for f in files:
            if f.endswith('.php'):
                # path to the file, relative to where the script is run
                source_file_name = join(currdir, f)
                source_file = open(source_file_name)
                source_contents = source_file.read() # read contents of PHP file
                source_file.close()

                # replace tags
                source_contents = source_contents.replace('<?', '')
                source_contents = source_contents.replace('?>', '')

                # add ID
                source_contents = ( '<!-- %d -->' % id ) + source_contents
                id += 1

                # create new file with .html extension
                source_file_name = source_file_name.replace('.php', '.html')
                dest_file = open(source_file_name, 'w')
                dest_file.write(source_contents) # write contents
                dest_file.close()

    Note: error checking left out for clarity.
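For comparison, here is a sketch of the same loop with basic error checking that writes the converted pages into a separate output tree, so the originals are never touched. The function name `convert_tree`, the UTF-8 encoding and the skip-on-error policy are all assumptions made for this example, not part of the script above:

```python
import os


def convert_tree(src_root, dst_root, start_id=1):
    """Write an .html version of every .php file under src_root into dst_root.

    The originals are only read. Returns the number of pages written.
    """
    page_id = start_id
    for dirpath, dirnames, filenames in os.walk(src_root):
        dirnames.sort()                  # fixed traversal order, so IDs are repeatable
        for name in sorted(filenames):
            if not name.endswith(".php"):
                continue
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel[:-len(".php")] + ".html")
            try:
                with open(src, "r", encoding="utf-8") as fh:   # encoding is assumed
                    contents = fh.read()
            except (OSError, UnicodeDecodeError) as exc:
                print("skipping %s: %s" % (src, exc))
                continue
            contents = contents.replace("<?", "").replace("?>", "")
            contents = "<!-- %d -->\n" % page_id + contents
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            with open(dst, "w", encoding="utf-8") as fh:
                fh.write(contents)
            page_id += 1
    return page_id - start_id
```

Keeping the output in its own folder makes it easy to diff the result against the source before anything is replaced.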

    On the other hand, if your 500 web pages contain embedded PHP
    variables or logic, you have a big job ahead. Django templates and PHP
    are two different languages for embedding data and logic in web pages.
    Converting a project from PHP to Django involves more than renaming
    the template files and deleting "<?" and friends.

    For example, here is a snippet of PHP which checks which browser is
    viewing the page:

    <?php
    if (strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE') !== FALSE) {
        echo 'You are using Internet Explorer.<br />';
    }
    ?>

    In Django, you would typically put this logic in a Django *view*
    (which btw is not what is called a 'view' in MVC term), which is the
    code that prepares data for the template. The logic would not live
    with the HTML. The template uses "template variables" that the view
    has associated with a Python variable or function. You might create a
    template variable (created via a Context object) named 'browser' that
    contains a value that identifies the browser.

    Thus, your Python template (HTML file) might look like this:

    {% if browser == 'IE' %}You are using Internet Explorer{% endif %}
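To make the split concrete, the logic that the PHP snippet runs inline could live in ordinary Python code that prepares the value the template reads. This is a plain-Python sketch with no Django imports; `detect_browser` and the `context` dictionary are hypothetical names standing in for what a view would build:

```python
def detect_browser(user_agent):
    """Return a short browser label; 'IE' mirrors the MSIE check in the PHP snippet."""
    if "MSIE" in user_agent:
        return "IE"
    return "other"


# The dictionary plays the role of the template context: the template only
# ever sees the name 'browser', never the detection logic.
sample_agent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"
context = {"browser": detect_browser(sample_agent)}
```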

    PHP tends to combine the presentation with the business logic, or in
    MVC terms, combines the view with the controller. Django separates
    them out, which many people find to be a better way. The person who
    writes the HTML doesn't have to speak Python, but only know the names
    of template variables and a little bit of template logic. In PHP, the
    HTML code and all the business logic lives in the same files. Even
    here, it would probably make sense to calculate the browser ID in the
    header of the HTML file, then access it via a variable in the body.

    If you have 500 static web pages that are part of the same
    application, but that do not contain any logic, your application might
    need to be redesigned.

    Also, you are doing your changes on a COPY of the application on a non-
    public server, aren't you? If not, then you really are playing with
    fire.


    HTH,
    John
    John S, Aug 8, 2010
    #5
  6. Νίκος

    rantingrick Guest

    Re: Replace and inserting strings within .txt files with the use of regex

    On Aug 7, 8:42 pm, MRAB <> wrote:

    > That should be:
    >
    >    data = data.replace('<?', '')
    >    data = data.replace('?>', '')


    Yes, Thanks MRAB. I did forget that important detail.

    > Strings don't have an 'insert' method!


    *facepalm*! I really must stop Usenet-ing whilst consuming large
    volumes of alcoholic beverages.
    rantingrick, Aug 8, 2010
    #6
  7. Νίκος

    John S Guest

    Re: Replace and inserting strings within .txt files with the use of regex

    Even though I just replied above, in reading over the OP's message, I
    think the OP might be asking:

    "How can I use RE string replacement to find PHP tags and convert them
    to Django template tags?"

    Instead of saying

    source_contents = source_contents.replace(...)

    say this instead:

    import re


    def replace_php_tags(m):
        ''' PHP tag replacer
        This function is called for each PHP tag. It gets a Match object as
        its parameter, so you can get the contents of the old tag, and
        should return the new (Django) tag.
        '''

        # m is the match object from the current match
        php_guts = m.group(1) # the contents of the PHP tag

        # now put the replacement logic here

        # and return whatever should go in place of the PHP tag,
        # which could be '{{ python_template_var }}'
        # or '{% template logic ... %}'
        # or some combination
        return ''   # placeholder: drops the tag until real logic is written

    # '?' is a regex metacharacter, so the literal '?' in the tags is escaped
    source_contents = re.sub(r'<\?\s*(.*?)\s*\?>', replace_php_tags, source_contents)
    John S, Aug 8, 2010
    #7
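The callback approach from post #7 can be tried on a small sample. In this sketch the conversion rule (turn `echo $name;` into `{{ name }}` and drop everything else) is invented purely for illustration, as is the sample text; the pattern also accepts the `<?php` form and uses `re.S` so that blocks spanning several lines match:

```python
import re

# Non-greedy pattern for <? ... ?> and <?php ... ?> blocks; re.S lets "." match newlines.
PHP_TAG = re.compile(r"<\?(?:php)?\s*(.*?)\s*\?>", re.S)


def replace_php_tags(m):
    """Hypothetical rule: 'echo $name;' becomes '{{ name }}', anything else is dropped."""
    guts = m.group(1)
    echo = re.match(r"echo\s+\$(\w+)\s*;?$", guts)
    if echo:
        return "{{ %s }}" % echo.group(1)
    return ""


sample = "<p>Hello <? echo $user; ?>!</p><?php\n$x = 1;\n?>"
converted = PHP_TAG.sub(replace_php_tags, sample)
```

Real pages would need a rule for each kind of PHP fragment they contain, which is where most of the conversion effort goes.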
  8. Νίκος

    Νίκος Guest

    Re: Replace and inserting strings within .txt files with the use of regex

    On Aug 8, 05:42, John S <> wrote:
    > If the 500 web pages are PHP only in the sense that there is only one
    > pair of <? ?> tags in each file, surrounding the entire content, then
    > what you ask for is doable.


    First of all, thank you very much John for your BIG effort to help
    me (I'm still reading your posts)!

    I have to tell you that those PHP files contain several instances of
    PHP opening and closing tags (about 3 in each PHP file). The rest is
    pure HTML data. That happened because those files were originally
    HTML-only files that later needed conversion to PHP, because some
    dynamic code had to be used to address some issues.

    Please tell me that the code you provided can be adjusted to handle
    several instances as well!
    Νίκος, Aug 8, 2010
    #8
  9. Νίκος

    Νίκος Guest

    Re: Replace and inserting strings within .txt files with the use of regex

    On Aug 8, 05:56, John S <> wrote:
    >"How can I use RE string replacement to find PHP tags and convert them
    >to Django template tags?"


    No, not at all John, at least not yet!

    I have only been learning Python for one week (switching from PHP and
    Perl), so I'm very fresh at this beautiful and straightforward language.

    When I have a good understanding of Python I will proceed to Django
    templates.
    Until then my Python templates will only be 'simple HTML files' whose
    only content, apart from the HTML data, is the special string
    formatting specifiers like '%s' :)
    Νίκος, Aug 8, 2010
    #9
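The '%s'-style templates described in post #9 rely on Python's `%` string formatting. A minimal sketch with a made-up template string and counter value shows the named form, `%(counter)d`, which is filled from a dictionary. One thing to watch for in HTML is that any literal percent sign has to be doubled:

```python
# Made-up template text; a real page would be read from its .html file.
template = '<p>Visitors: %(counter)d</p><table width="100%%"></table>'

# Named placeholders are filled from a dictionary; "%%" becomes a single "%".
page = template % {"counter": 42}
```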
  10. Re: Replace and inserting strings within .txt files with the use of regex

    On Sat, 07 Aug 2010 17:20:24 -0700, Νίκος wrote:

    > I don't know how to handle such a big data replacing problem and cannot
    > play with fire because those 500 pages are my cleints pages and data of
    > those filesjust cannot be messes up.


    Take a backup copy of the files, and only edit the copies. Don't replace
    the originals until you know they're correct.
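One way to follow this advice from Python is `shutil.copytree`, which copies a whole directory tree. The wrapper name `make_working_copy` is invented for this sketch:

```python
import shutil


def make_working_copy(src_root, copy_root):
    """Copy the whole directory tree and return the path of the copy.

    shutil.copytree raises FileExistsError if copy_root already exists,
    which guards against overwriting an earlier copy by accident.
    """
    return shutil.copytree(src_root, copy_root)
```

The conversion script would then be pointed at the copy, for example a hypothetical `data_copy` folder, instead of `data`.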

    --
    Steven
    Steven D'Aprano, Aug 8, 2010
    #10
  11. Re: Replace and inserting strings within .txt files with the use of regex

    On Aug 8, 11:09, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:
    > On Sat, 07 Aug 2010 17:20:24 -0700, Νίκος wrote:
    > > I don't know how to handle such a big data replacing problem and cannot
    > > play with fire because those 500 pages are my cleints pages and data of
    > > those filesjust cannot be messes up.

    >
    > Take a backup copy of the files, and only edit the copies. Don't replace
    > the originals until you know they're correct.
    >
    > --
    > Steven


    Yes, of course, but the code that John S provided needs some
    modification in order to handle several instances of PHP tags and not
    only one pair.
    Νίκος, Aug 8, 2010
    #11
  12. Re: Replace and inserting strings within .txt files with the use of regex

    Script so far:

    #!/usr/bin/python

    import cgitb; cgitb.enable()
    import cgi, re, os
    from os.path import join

    print ( "Content-type: text/html; charset=UTF-8 \n" )


    id = 0 # unique page_id

    # os.walk yields (current dir, subdirectory names, file names)
    for currdir, dirs, files in os.walk('data'):

        for f in files:

            if f.endswith('php'):

                # get path to filename
                src_f = join(currdir, f)

                # open php src file
                src = open(src_f, 'r')
                src_data = src.read() # read contents of PHP file
                src.close()
                print ( 'reading from %s' % src_f )

                # replace tags
                src_data = src_data.replace('<%', '')
                src_data = src_data.replace('%>', '')
                print ( 'replacing php tags' )

                # add ID
                src_data = ( '<!-- %d -->' % id ) + src_data
                id += 1
                print ( 'adding unique page_id' )

                # create new file with .html extension
                dest_name = src_f.replace('.php', '.html')

                # open newly created html file for inserting data
                dest_f = open(dest_name, 'w')
                dest_f.write(src_data) # write contents
                dest_f.close()
                print ( 'writing to %s' % dest_name )

    Please help me adjust it, if it needs extra modification to replace
    more PHP tags.
    Νίκος, Aug 8, 2010
    #12
  13. Re: Replace and inserting strings within .txt files with the use of regex

    On 08/08/2010 04:46 AM, rantingrick wrote:
    > *facepalm*! I really must stop Usenet-ing whilst consuming large
    > volumes of alcoholic beverages.


    THAT explains a lot.

    Cheers
    Thomas Jollans, Aug 8, 2010
    #13
  14. Re: Replace and inserting strings within .txt files with the use of regex

    On 08/08/2010 11:21 AM, Νίκος wrote:
    > Please help me adjust it, if need extra modification for more php tags
    > replacing.


    Have you tried it ? I haven't, but I see no immediate reason why it
    wouldn't work with multiple PHP blocks.

    > #!/usr/bin/python
    >
    > import cgitb; cgitb.enable()
    > import cgi, re, os
    >
    > print ( "Content-type: text/html; charset=UTF-8 \n" )
    >
    >
    > id = 0 # unique page_id
    >
    > for currdir, files, dirs in os.walk('data'):
    >
    > for f in files:
    >
    > if f.endswith('php'):
    >
    > # get abs path to filename
    > src_f = join(currdir,f)
    >
    > # open php src file
    > f = open(src_f, 'r')
    > src_data = f.read() # read contents of PHP file
    > f.close()
    > print 'reading from %s' % src_f
    >
    > # replace tags
    > src_data = src_data.replace('<%', '')
    > src_data = src_data.replace('%>', '')


    Did you read the script before posting? ;-)
    Here, you remove ASP-style tags. Which is fine, PHP supports them if you
    configure it that way, but you probably didn't. Change this to the start
    and end tags you actually use, and, if you use multiple forms (such as
    <?php vs <?), then add another line or two.
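When several delimiter forms are in use, the order of the replacements matters: removing `<?` first would leave a stray `php` behind from every `<?php`. A small sketch, where the function name and the list of forms are assumptions to be adjusted to what the pages really contain:

```python
def strip_php_delimiters(text):
    """Remove PHP open/close delimiters, longest opening form first."""
    # "<?php" and "<?=" must come before "<?", which is a prefix of both.
    for token in ("<?php", "<?=", "<?", "?>"):
        text = text.replace(token, "")
    return text


stripped = strip_php_delimiters("<?php a(); ?><b><? c(); ?></b>")
```

As in the script being discussed, only the delimiters go away; the code between them is left in the text.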

    > print 'replacing php tags'
    >
    > # add ID
    > src_data = ( '<!-- %d -->' % id ) + src_data
    > id += 1
    > print 'adding unique page_id'
    >
    > # create new file with .html extension
    > src_file = src_file.replace('.php', '.html')
    >
    > # open newly created html file for insertid data
    > dest_f = open(src_f, 'w')
    > dest_f.write(src_data) # write contents
    > dest_f.close()
    > print 'writing to %s' % dest_f
    >
    Thomas Jollans, Aug 8, 2010
    #14
  15. Re: Replace and inserting strings within .txt files with the use of regex

    On Aug 8, 13:13, Thomas Jollans <> wrote:
    > On 08/08/2010 11:21 AM, Νίκος wrote:
    >
    > > Please help me adjust it, if need extra modification for more php tags
    > > replacing.

    >
    > Have you tried it ? I haven't, but I see no immediate reason why it
    > wouldn't work with multiple PHP blocks.
    >
    > >             # replace tags
    > >             src_data = src_data.replace('<%', '')
    > >             src_data = src_data.replace('%>', '')

    >
    > Did you read the script before posting? ;-)
    > Here, you remove ASP-style tags. Which is fine, PHP supports them if you
    > configure it that way, but you probably didn't. Change this to the start
    > and end tags you actually use, and, if you use multiple forms (such as
    > <?php vs <?), then add another line or two.


    Yes, I have read the code very carefully; by mistake I wrote '<%'
    instead of '<?'.

    I was so dizzy and confused yesterday that I forgot to mention that I
    need to remove not only the PHP opening and closing tags but also
    whatever data lurks inside those tags, because now, with the
    'counter.py' script I wrote, the HTML files will be opened from there
    and the template variables like %(counter)d will be substituted.

    Also, before the

    </body>
    </html>

    of every HTML file, after removing the tags, this line must be
    inserted. It holds the template variable that 'counter.py' uses to
    produce data:

    <br><br><center><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d
    </h4>

    After making these modifications I can test the script on a COPY of
    the original data on my PC.

    *On my PC I run Windows 7, while the remote web hosting setup uses Linux
    servers.
    *That won't be a problem, right?
    Νίκος, Aug 8, 2010
    #15
  16. Re: Replace and inserting strings within .txt files with the use of regex

    On 08/08/2010 01:41 PM, Νίκος wrote:
    > I was so dizzy and confused yesterday that i forgot to metnion that
    > not only i need removal of php openign and closing tags but whaevers
    > data lurks inside those tags as well ebcause now with the 'counter.py'
    > script i wrote the html fiels would open ftm there and substitute the
    > tempalte variabels like %(counter)d


    I could just hand you a solution, but I'll be a bit of a bastard and
    just give you some hints.

    You could use regular expressions. If you know regular expressions, it's
    relatively trivial - but I doubt you know regexp.

    You could also repeatedly find the next occurrence of first a start tag,
    then an end tag, using either str.find or str.split, and build up a
    version of the file without PHP yourself.
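The second hint, finding each start tag and then the matching end tag with `str.find`, can be sketched as follows. The function name and the choice to drop an unterminated block up to the end of the file are assumptions of this example:

```python
def strip_php_blocks(text, start="<?", end="?>"):
    """Return text with every start...end block removed, delimiters included."""
    pieces = []
    pos = 0
    while True:
        open_at = text.find(start, pos)
        if open_at == -1:
            pieces.append(text[pos:])       # no more blocks: keep the tail
            break
        pieces.append(text[pos:open_at])    # keep the HTML before the block
        close_at = text.find(end, open_at + len(start))
        if close_at == -1:
            break                           # unterminated block: drop the rest
        pos = close_at + len(end)
    return "".join(pieces)


html = strip_php_blocks("<p>a</p><?php echo 1; ?><p>b</p><? f();\n?>")
```

Because `<?` is a prefix of `<?php`, the default arguments cover both opening forms.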


    > Also before the
    >
    > </body>
    > </html>
    >
    > of every html file afetr removing the tags this line must be
    > inserted(this holds the template variable) that 'counter.py' uses to
    > produce data
    >
    > <br><br><center><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d
    > </h4>


    This problem is truly trivial. I know you can do it yourself, or at
    least give it a good shot, and ask again when you hit a serious roadblock.

    If I may comment on your HTML: you forgot to close your <center> and
    <font> tags. Close them! Also, both (CENTER and FONT) have been
    deprecated since HTML 4.0 -- you should consider using CSS for these
    tasks instead. Also, this line does not look like a heading, so H4 is
    hardly fitting.

    >
    > After making this modifications then i can trst the script to a COPY
    > of the original data in my pc.


    It would be nice if you re-read your posts before sending and tried to
    iron out some of the more careless spelling mistakes. Maybe you are doing
    your best to post in good English -- it isn't bad and I realize this is
    neither your native language nor alphabet, in which case I apologize.
    The fact of the matter is: I originally interpreted "trst" as "trust",
    which made no sense whatsoever.

    >
    > *In my pc i run Windows 7 while remote web hosting setup uses Linux
    > Servers.
    > *That wont be a problem right?


    Nah.
    Thomas Jollans, Aug 8, 2010
    #16
  17. Re: Replace and inserting strings within .txt files with the use of regex

    On Aug 8, 15:40, Thomas Jollans <> wrote:
    > On 08/08/2010 01:41 PM, Νίκος wrote:
    >
    > > I was so dizzy and confused yesterday that i forgot to metnion that
    > > not only i need removal of php openign and closing tags but whaevers
    > > data lurks inside those tags as well ebcause now with the 'counter.py'
    > > script i wrote the html fiels would open ftm there and substitute the
    > > tempalte variabels like %(counter)d

    >
    > I could just hand you a solution, but I'll be a bit of a bastard and
    > just give you some hints.
    >
    > You could use regular expressions. If you know regular expressions, it's
    > relatively trivial - but I doubt you know regexp.


    Here is the code with some trial-and-error modifications I made, based
    on your hints. It is still not working:
    ==========================================================

    id = 0 # unique page_id

    for currdir, dirs, files in os.walk('varsa'):

        for f in files:

            if f.endswith('php'):

                # get path to filename
                src_f = join(currdir, f)

                # open php src file
                print ( 'reading from %s' % src_f )
                src = open(src_f, 'r')
                src_data = src.read() # read contents of PHP file
                src.close()

                # replace tags
                print ( 'replacing php tags and contents within' )
                # the dot matches any character i hope! no matter how many of them?!?
                src_data = src_data.replace(r'<?.?>', '')

                # add ID
                print ( 'adding unique page_id' )
                src_data = ( '<!-- %d -->' % id ) + src_data
                id += 1

                # add template variables
                print ( 'adding counter template variable' )
                # i can think of this, but the line must go above
                # </body></html>, NOT after it. how to write that?!?
                src_data = src_data + ''' <h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </font></h4> '''

                # rename old php file to new with .html extension
                dest_name = src_f.replace('.php', '.html')

                # open newly created html file for inserting data
                print ( 'writing to %s' % dest_name )
                dest_f = open(dest_name, 'w')
                dest_f.write(src_data) # write contents
                dest_f.close()

    This is the best I can do. Sorry for any typos I might have made.

    Please shed some LIGHT!
    Νίκος, Aug 8, 2010
    #17
  18. Re: Replace and inserting strings within .txt files with the use of regex

    On 08/08/2010 04:06 PM, Νίκος wrote:
    > On Aug 8, 15:40, Thomas Jollans <> wrote:
    >> On 08/08/2010 01:41 PM, Νίκος wrote:
    >>
    >>> I was so dizzy and confused yesterday that i forgot to metnion that
    >>> not only i need removal of php openign and closing tags but whaevers
    >>> data lurks inside those tags as well ebcause now with the 'counter.py'
    >>> script i wrote the html fiels would open ftm there and substitute the
    >>> tempalte variabels like %(counter)d

    >>
    >> I could just hand you a solution, but I'll be a bit of a bastard and
    >> just give you some hints.
    >>
    >> You could use regular expressions. If you know regular expressions, it's
    >> relatively trivial - but I doubt you know regexp.

    >
    > Here is the code with some try-and-fail modification i made, still non-
    > working based on your hints:
    > ==========================================================
    >
    > id = 0 # unique page_id
    >
    > for currdir, files, dirs in os.walk('varsa'):
    >
    > for f in files:
    >
    > if f.endswith('php'):
    >
    > # get abs path to filename
    > src_f = join(currdir, f)
    >
    > # open php src file
    > print 'reading from %s' % src_f
    > f = open(src_f, 'r')
    > src_data = f.read() # read contents of PHP file
    > f.close()
    >
    > # replace tags
    > print 'replacing php tags and contents within'
    > src_data = src_data.replace(r'<?.?>', '') #
    > the dot matches any character i hope! no matter how many of them?!?


    Two problems here:

    str.replace doesn't use regular expressions. You'll have to use the re
    module to use regexps. (the re.sub function to be precise)

    '.' matches a single character. Any character, but only one.
    '.*' matches as many characters as possible. This is not what you want,
    since it will match everything between the *first* <? and the *last* ?>.
    You want non-greedy matching.

    '.*?' is the same thing, without the greed.
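The difference between the greedy and non-greedy forms is easy to see on a small made-up string with two PHP blocks. Note that the literal `?` of the tags is escaped as `\?` in both patterns:

```python
import re

text = "<p>one</p><? a ?><p>two</p><? b ?><p>three</p>"

# Greedy: one match from the first "<?" to the last "?>", so "<p>two</p>" is lost.
greedy = re.sub(r"<\?.*\?>", "", text)

# Non-greedy: each block is matched and removed on its own.
lazy = re.sub(r"<\?.*?\?>", "", text)
```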

    >
    > # add ID
    > print 'adding unique page_id'
    > src_data = ( '<!-- %d -->' % id ) + src_data
    > id += 1
    >
    > # add template variables
    > print 'adding counter template variable'
    > src_data = src_data + ''' <h4><font color=green> Αριθμός
    > Επισκεπτών: %(counter)d </font></h4> '''
    > # i can think of this but the above line must be above </
    > body></html> NOT after but how to right that?!?


    You will have to find the </body> tag before inserting the string.
    str.find should help -- or you could use str.replace and replace the
    </body> tag with your counter line, plus a new </body>.
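A sketch of the "find the closing tag, then insert before it" idea. It uses a case-insensitive regex search rather than `str.find`, so that `</BODY>` is also caught; the function name, the fallback of appending when no tag exists, and the placeholder markup are assumptions of this example:

```python
import re


def insert_before_body_close(html, snippet):
    """Insert snippet just before the last </body>; append it if there is none."""
    matches = list(re.finditer(r"</body\s*>", html, flags=re.IGNORECASE))
    if not matches:
        return html + snippet
    at = matches[-1].start()
    return html[:at] + snippet + html[at:]


counter_line = "<p>Visitors: %(counter)d</p>\n"   # placeholder markup
result = insert_before_body_close("<html><body>hi\n</body>\n</html>", counter_line)
```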

    >
    > # rename old php file to new with .html extension
    > src_file = src_file.replace('.php', '.html')
    >
    > # open newly created html file for inserting data
    > print 'writing to %s' % dest_f
    > dest_f = open(src_f, 'w')
    > dest_f.write(src_data) # write contents
    > dest_f.close()
    >
    > This is the best i can do.


    No it's not. You're just giving up too soon.
    Thomas Jollans, Aug 8, 2010
    #18
  19. Νίκος

    John S Guest

    Re: Replace and inserting strings within .txt files with the use of regex

    On Aug 8, 10:59 am, Thomas Jollans <> wrote:
    > Two problems here:
    >
    > str.replace doesn't use regular expressions. You'll have to use the re
    > module to use regexps. (the re.sub function to be precise)
    >
    > '.'  matches a single character. Any character, but only one.
    > '.*' matches as many characters as possible. This is not what you want,
    > since it will match everything between the *first* <? and the *last* ?>.
    > You want non-greedy matching.
    >
    > '.*?' is the same thing, without the greed.


    When replacing text in an HTML document with re.sub, you want to use
    the re.S (DOTALL) flag; otherwise '.' does not match newlines, and your
    pattern won't match when the opening tag is on one line and the closing
    tag is on another.
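A short check of that behaviour on a made-up block that spans three lines:

```python
import re

text = "<p>a</p><?php\n  echo 1;\n?><p>b</p>"

# Without the flag "." stops at each newline, so nothing matches and
# the text comes back unchanged.
without_flag = re.sub(r"<\?.*?\?>", "", text)

# With re.S (also spelled re.DOTALL) "." matches newlines too.
with_flag = re.sub(r"<\?.*?\?>", "", text, flags=re.S)
```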
    John S, Aug 8, 2010
    #19
  20. Re: Replace and inserting strings within .txt files with the use of regex

    Νίκος wrote:
    > Hello dear Pythoneers,
    >
    > I have over 500 .php web pages in various subfolders under 'data'
    > folder that i have to rename to .html and and ditch the '<?' and '?>'
    > tages from within and also insert a very first line of <!-- id -->
    > where id must be an identification unique number of every page for
    > counter tracking purposes. ONly pure html code must be left.
    >
    > I before find otu Python used php and now iam switching to templates +
    > python solution so i ahve to change each and every page.
    >
    > I don't know how to handle such a big data replacing problem and
    > cannot play with fire because those 500 pages are my cleints pages and
    > data of those filesjust cannot be messes up.
    >
    > Can you provide to me a script please that is able of performing an
    > automatic way of such a page content replacing?
    >
    > Thanks a million!


    This is quite a vague description of the file contents. But, for a
    completely different approach, how about using a browser and doing view
    source, then saving the html that was generated. This will contain no
    php code, but it will contain the results of whatever the php was doing.

    If you don't have time to do this manually, look into wget or curl,
    which will do the job in a program environment.
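The same idea can also be scripted from Python with the standard library instead of wget or curl. This is only a sketch: the function name is invented, and the URL passed in would be whatever address the live PHP page is served from:

```python
from urllib.request import urlopen


def save_rendered_page(url, dest_path):
    """Fetch url and write the response body to dest_path unchanged.

    Returns the number of bytes written.
    """
    with urlopen(url) as response:
        body = response.read()
    with open(dest_path, "wb") as fh:
        fh.write(body)
    return len(body)
```

Because the server has already run the PHP, the saved file contains only the generated HTML, including any output the PHP produced.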

    The discussion so far has dealt with stripping the PHP and leaving the
    HTML. But the HTML must have embedded <?php some code to print something
    ?> blocks in it. Or there could be long fragments of HTML which are
    constructed by PHP and then echoed.

    Joel Goldstick
    Joel Goldstick, Aug 8, 2010
    #20