Replacing and inserting strings within .txt files with the use of regex

MRAB · Aug 9, 2010

Νίκος said:
src_data = re.sub( '<\?(.*?)\?>', '', src_data, re.DOTALL )

like this?

re.sub doesn't accept a flags argument. You can put the flag inside the
regex itself like this:

src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)

(Note that the abbreviation for re.DOTALL is re.S and the inline flag is
'(?s)'. This is for historical reasons!)
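For readers on newer interpreters: from Python 2.7 onward, re.sub does take a flags keyword, and its fourth positional parameter is count, so the quoted call handed re.DOTALL over as a count rather than as a flag. A minimal sketch of the two working spellings follows; the sample string and variable names are made up for illustration and are not from the thread.

```python
import re

# Hypothetical sample: a PHP block that spans two lines.
sample = "<html>\n<? echo 'a';\n echo 'b'; ?>\n<body>text</body>"

# Inline flag: (?s) lets '.' match newlines as well.
inline = re.sub(r'(?s)<\?(.*?)\?>', '', sample)

# Keyword flag (Python 2.7+ and 3.x): pass flags by name so it is
# not mistaken for the positional 'count' parameter.
keyword = re.sub(r'<\?(.*?)\?>', '', sample, flags=re.DOTALL)

print(inline == keyword)
```

Both spellings remove the multi-line block; which one to use is mostly a matter of taste, although the keyword form keeps the pattern itself shorter.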

MRAB · Aug 9, 2010

Νίκος said:
Now the code looks as follows:

=============================
#!/usr/bin/python

import re, os, sys

id = 0 # unique page_id

for currdir, files, dirs in os.walk('test'):
    for f in files:
        if f.endswith('php'):
[snip]

I just tried to test it. I created a folder named 'test' on my 'd:\'
drive.
Then I put two .php files from the original project inside it, to check
that the script works for those two files before running it on the whole
copy and afterwards on the original project.

So I opened a command prompt on my Win7 machine and tried

D:\>convert.py

D:\>

It just printed an empty line and nothing else. Why didn't it even try to
open the folder and the files within?
Syntactically it doesn't give me an error!
Is it something with the os.walk() method perhaps?

Can you help with this too, please?

Right now I am only able to convert a single file, 'd:\test\index.php'.

But this needs to be done for ALL the php files in every subfolder.

for currdir, files, dirs in os.walk('test'):
    for f in files:
        if f.endswith('php'):

Should the above lines enter the folders and find the php files in each
folder so that they can be edited?

I'd start by commenting-out the lines which change the files and then
add some more print statements to see which files it's finding. That
might give a clue. Only when it's fixed and finding the correct files
would I remove the additional print statements and then restore the
commented lines.
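One way to follow this advice is to print the raw tuples that os.walk() yields before doing anything with them, so it becomes visible what each position actually holds. The sketch below is only an illustration: the helper name dump_walk and the throwaway folder layout are assumptions, not part of the original script.

```python
import os
import tempfile

def dump_walk(root):
    """Print and return the raw tuples os.walk() yields; no file is modified."""
    entries = list(os.walk(root))
    for entry in entries:
        print(entry)
    if not entries:
        # os.walk() silently yields nothing when root cannot be listed.
        print("nothing found under %r (cwd is %r)" % (root, os.getcwd()))
    return entries

# Demo on a throwaway folder (hypothetical layout: one file, one subfolder).
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "sub"))
open(os.path.join(demo_root, "index.php"), "w").close()
open(os.path.join(demo_root, "sub", "page.php"), "w").close()

found = dump_walk(demo_root)
missing = dump_walk(os.path.join(demo_root, "no_such_folder"))
```

Printing the current working directory alongside an empty result helps tell "wrong folder" apart from "wrong loop".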

Νίκος · Aug 9, 2010

re.sub doesn't accept a flags argument. You can put the flag inside the
regex itself like this:

    src_data = re.sub(r'(?s)<\?(.*?)\?>', '', src_data)

(Note that the abbreviation for re.DOTALL is re.S and the inline flag is
'(?s)'. This is for historical reasons!)

This is so that '.' matches any character, including '\n', right?
So even if the php start tag and end tag are on different lines, they
will still be matched, correct?

Do we need the 'raw' string as well? Why? The regex doesn't contain
backslashes.

Νίκος · Aug 9, 2010

MRAB said:
I'd start by commenting-out the lines which change the files and then
add some more print statements to see which files it's finding. That
might give a clue. Only when it's fixed and finding the correct files
would I remove the additional print statements and then restore the
commented lines.

I did that, but it doesn't even get to the 'test' folder to search for
the files!

Νίκος · Aug 9, 2010

D:\>convert.py
File "D:\convert.py", line 34
SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line 34, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

D:\>

What is it referring to? Which character cannot be identified?

Line 34 is:

src_data = src_data.replace( '</body>', '<br><br><center><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </body>' )

Also,

for currdir, files, dirs in os.walk('test'):
    for f in files:
        if f.lower().endswith("php"):

in the above lines, should I write os.walk('test') or os.walk('d:\test')?

MRAB · Aug 9, 2010

Νίκος said:
This is so that '.' matches any character, including '\n', right?
So even if the php start tag and end tag are on different lines, they
will still be matched, correct?

Do we need the 'raw' string as well? Why? The regex doesn't contain
backslashes.

Yes it does; two of them!
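The two backslashes are the ones in the '\?' escapes. As it happens, '\?' would survive in a non-raw string because Python does not treat it as an escape of its own (recent Python versions warn about this), but other sequences such as '\b' are not so forgiving. A small sketch of the difference, with illustrative variable names:

```python
import re

pattern = r'(?s)<\?(.*?)\?>'
backslashes = pattern.count('\\')   # the two '\?' escapes

# Why the r prefix matters in general: without it Python interprets
# some escapes itself before the regex engine ever sees them.
plain_b = '\b'     # one character: a backspace
raw_b = r'\b'      # two characters: backslash + 'b' (regex word boundary)

raw_hit = re.search(r'\bphp\b', 'a php file')      # matches the word 'php'
plain_hit = re.search('\bphp\b', 'a php file')     # looks for literal backspaces
```

Using a raw string for every regex avoids having to remember which escapes Python consumes itself.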

MRAB · Aug 9, 2010

Νίκος said:
D:\>convert.py
File "D:\convert.py", line 34
SyntaxError: Non-ASCII character '\xce' in file D:\convert.py on line 34, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

D:\>

What is it referring to? Which character cannot be identified?

Line 34 is:

src_data = src_data.replace( '</body>', '<br><br><center><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </body>' )

Didn't you say that you're using Python 2.7 now? The default file
encoding will be ASCII, but your file isn't ASCII, it contains Greek
letters. Add the encoding line:

# -*- coding: utf-8 -*-

and check that the file is saved as UTF-8.
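To see where the '\xce' in the error message comes from, it helps to look at the bytes behind the first Greek letter. The sketch below assumes the literal on line 34 begins with the Greek text "Αριθμός Επισκεπτών" ("Number of visitors") and that the file was saved as UTF-8; under those assumptions the first non-ASCII byte is exactly the one reported.

```python
# -*- coding: utf-8 -*-

# The Greek text assumed to be in the quoted line ("Number of visitors").
text = u"Αριθμός Επισκεπτών"

encoded = text.encode("utf-8")
first_byte = encoded[:1]        # the first non-ASCII byte of the literal
print(repr(first_byte))
```

If the file had been saved in a single-byte Greek code page instead, the first byte of the capital alpha would differ, so the error message itself hints that the file is already UTF-8 and only the declaration is missing.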

Also,

for currdir, files, dirs in os.walk('test'):
    for f in files:
        if f.lower().endswith("php"):

in the above lines, should I write os.walk('test') or os.walk('d:\test')?

The path 'test' is relative to the current working directory. Is that
D:\ for your script? If not, then it won't find the (correct) folder.

It might be better to use an absolute path instead. You could use
either:

r'd:\test'

(note that I've made it a raw string because it contains a backslash
which I want treated as a literal backslash) or:

'd:/test'

(Windows should accept a slash as well as a backslash.)
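The reason the raw string matters here is that '\t' in an ordinary string literal is a TAB character, so 'd:\test' does not name the folder at all. A short sketch comparing the spellings; ntpath is the standard library module that applies Windows path rules and can be imported on any platform, and the variable names are illustrative.

```python
import ntpath  # Windows path rules, importable on any platform

plain = 'd:\test'      # '\t' is read as a TAB character
raw = r'd:\test'
doubled = 'd:\\test'
forward = 'd:/test'

has_tab = '\t' in plain
same = (raw == doubled)
normalized = ntpath.normpath(forward)   # forward slashes become backslashes
```

The raw, doubled and forward-slash spellings all refer to the same folder; only the plain one is wrong.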

Νίκος · Aug 10, 2010

Didn't you say that you're using Python 2.7 now? The default file
encoding will be ASCII, but your file isn't ASCII, it contains Greek
letters. Add the encoding line:

    # -*- coding: utf-8 -*-

and check that the file is saved as UTF-8.

The path 'test' is relative to the current working directory. Is that
D:\ for your script? If not, then it won't find the (correct) folder.

It might be better to use an absolute path instead. You could use
either:

    r'd:\test'

(note that I've made it a raw string because it contains a backslash
which I want treated as a literal backslash) or:

    'd:/test'

(Windows should accept a slash as well as a backslash.)

I will try it as soon as I make another change that I missed:

The ID number of each php page was contained in the old php code
within this string:

PageID = some_number

So instead of creating a new ID number for each page, I have to pull
this number out and store it at the beginning of the file as a comment
line, because it is directly tied to the mysql database, which uses it
to track each webpage and look up its counter.

# Grab the PageID contained within the php code and store it in the id variable
id = re.search( 'PageID = ', src_data )

How do I tell Python to grab the number after the 'PageID = ' string and
store it in the variable id for later use in the program?

Also, I made another change. Would something like this work?

===============================
# open same php file for storing modified data
print ( 'writing to %s' % src_f )
f = open(src_f, 'w')
f.write(src_data)
f.close()

# rename edited .php file to .html extension
dst_f = src_f.replace('.php', '.html')
os.rename( src_f, dst_f )
===============================

Instead of creating a new .html file and inserting the desired data
from the old php file, and so ending up with two files (the old php and
the new html), I decided to open the same php file for writing that data
and then rename it to html.
Would the above code work?

Νίκος · Aug 10, 2010

Please help me with these last changes before I try to run the
conversion on everything.
It's almost done!

MRAB · Aug 10, 2010

Νίκος wrote:
[snip]

The ID number of each php page was contained in the old php code
within this string:

PageID = some_number

So instead of creating a new ID number for each page, I have to pull
this number out and store it at the beginning of the file as a comment
line, because it is directly tied to the mysql database, which uses it
to track each webpage and look up its counter.

# Grab the PageID contained within the php code and store it in the id variable
id = re.search( 'PageID = ', src_data )

How do I tell Python to grab the number after the 'PageID = ' string and
store it in the variable id for later use in the program?

If the part of the file you're trying to match looks like this:

PageID = 12

then the regex should look like this:

PageID = (\d+)

and the code should look like this:

page_id = re.search(r'PageID = (\d+)', src_data).group(1)

The page_id will, of course, be a string.
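One thing to watch with the one-liner: re.search returns None when the text has no PageID, and calling .group(1) on None raises AttributeError. A minimal sketch of a guarded version follows; the helper name extract_page_id, the int conversion and the sample strings are assumptions added for illustration.

```python
import re

def extract_page_id(src_data):
    """Return the PageID as an int, or None when the text has no PageID."""
    match = re.search(r'PageID = (\d+)', src_data)
    if match is None:
        return None
    return int(match.group(1))

# Hypothetical inputs: one page with a PageID, one without.
with_id = extract_page_id("<? $PageID = 12; ?>")
without_id = extract_page_id("<html></html>")
```

Returning None lets the caller decide what to do with pages that carry no ID, for example assigning a fresh one.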

Also, I made another change. Would something like this work?

===============================
# open same php file for storing modified data
print ( 'writing to %s' % src_f )
f = open(src_f, 'w')
f.write(src_data)
f.close()

# rename edited .php file to .html extension
dst_f = src_f.replace('.php', '.html')
os.rename( src_f, dst_f )
===============================

Instead of creating a new .html file and inserting the desired data
from the old php file, and so ending up with two files (the old php and
the new html), I decided to open the same php file for writing that data
and then rename it to html.
Would the above code work?

Why wouldn't it?
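The approach works, with one caveat worth knowing: str.replace('.php', '.html') changes every occurrence in the path, including one inside a folder name, and it misses upper-case extensions. A hedged sketch that only touches the final extension via os.path.splitext follows; the helper name rewrite_and_rename and the demo folder are illustrative assumptions.

```python
import os
import tempfile

def rewrite_and_rename(src_f, new_data):
    """Overwrite src_f with new_data, then rename it to a .html extension."""
    with open(src_f, 'w') as f:
        f.write(new_data)
    base, _ext = os.path.splitext(src_f)   # split off only the last extension
    dst_f = base + '.html'
    os.rename(src_f, dst_f)
    return dst_f

# Demo in a throwaway folder whose name also contains '.php' (hypothetical).
demo_dir = os.path.join(tempfile.mkdtemp(), 'old.php.site')
os.mkdir(demo_dir)
src = os.path.join(demo_dir, 'index.php')
open(src, 'w').close()
dst = rewrite_and_rename(src, '<html></html>')
```

On Windows, os.rename raises an error if the destination file already exists, so a leftover .html from an earlier run would need to be removed first.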

Νίκος · Aug 11, 2010


MRAB said:
If the part of the file you're trying to match looks like this:

    PageID = 12

then the regex should look like this:

    PageID = (\d+)

and the code should look like this:

    page_id = re.search(r'PageID = (\d+)', src_data).group(1)

The page_id will, of course, be a string.

Thank you very much for helping me with the syntax.

Why wouldn't it?

I thought I had perhaps done something wrong in the code.

=========================================
for currdir, files, dirs in os.walk('d:\\test'): # 'd:/test' doesn't find the folder either
    for f in files:
        if f.lower().endswith("php"):
            print currdir, files, dirs, f
=========================================

As you advised in a previous post, I need to find out why the
conversion code, although it works for a single file, for some reason
doesn't enter folders and subfolders to grab the files to convert.

So, as you said, I should comment out all the other statements to find
the culprit in the lines above.

Those lines are supposed to print the current folder and its files, but
when I run the code it gives me nothing in response, not even 'f'.

So does that mean that the os.walk() method cannot enter Windows 7
folders?
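A way to check whether os.walk() is really being refused by a folder: by default it swallows listing errors silently, but it accepts an onerror callback that receives the OSError. The sketch below is illustrative only; the helper name walk_with_errors and the deliberately missing folder name are assumptions.

```python
import os

def walk_with_errors(root):
    """Walk root, returning (visited directories, errors os.walk ran into)."""
    errors = []
    visited = [entry[0] for entry in os.walk(root, onerror=errors.append)]
    return visited, errors

# Demo with a folder that is assumed not to exist in the current directory.
visited, errors = walk_with_errors(os.path.join(os.getcwd(), "no_such_folder_for_demo"))
for err in errors:
    print("os.walk could not list:", err)
```

If the errors list stays empty and directories are visited, the folder is being entered and the problem lies in the loop body instead.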

* One more thing: instead of running the above script from the command
line, wouldn't it be better to run it as a cgi script and see the
results in the browser, with the addition of this line?

print ( "Content-type: text/html; charset=UTF-8 \n" )

Or, for some reason, does this have to be run from the shell on both the
local (Windows 7) and the remote hosting (Linux) servers?

Νίκος · Aug 16, 2010

Didn't you say that you're using Python 2.7 now? The default file
encoding will be ASCII, but your file isn't ASCII, it contains Greek
letters. Add the encoding line:

    # -*- coding: utf-8 -*-

and check that the file is saved as UTF-8.

Actually, it's

for currdir, dirs, files in os.walk('test'):

that's why it couldn't run!!

After changing this and making some other modifications, my conversion
script finally ran!
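The reason the swapped names produced no output at all can be reproduced in a few lines: os.walk yields (dirpath, dirnames, filenames), so a variable called files in the second position actually receives the list of subdirectories, which is empty in a flat test folder. A minimal sketch on a throwaway folder with made-up file names:

```python
import os
import tempfile

# Throwaway flat folder with two PHP files (hypothetical names).
root = tempfile.mkdtemp()
for name in ("index.php", "about.php"):
    open(os.path.join(root, name), "w").close()

# Swapped names: 'files' actually receives the subdirectory list, which
# is empty here, so the inner loop body never runs.
swapped = []
for currdir, files, dirs in os.walk(root):
    for f in files:
        swapped.append(f)

# Correct order: (dirpath, dirnames, filenames).
correct = []
for currdir, dirs, files in os.walk(root):
    for f in files:
        if f.lower().endswith(".php"):
            correct.append(os.path.join(currdir, f))
```

No exception is raised in the swapped version, which is why the script ran "successfully" and printed nothing.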

Here it is for anyone who might want similar functionality.

======================================================================

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re, os, sys

count = 520

for currdir, dirs, files in os.walk('d:\\akis'):
    for f in files:
        if f.lower().endswith("php"):

            # get abs path to filename
            src_f = os.path.join(currdir, f)

            # open php src file
            f = open(src_f, 'r')
            src_data = f.read()
            f.close()

            # Grab the id number contained within the php code and insert it above all other data
            found = re.search( r'PageID = (\d+)', src_data )
            if found:
                id = found.group(1)
            else:
                # no PageID in this file: hand out the next free number
                count += 1
                id = count
            # NOTE: the HTML comment format below is assumed; the original literal was lost from this post
            src_data = ( '<!-- %s -->\n\n' % id ) + src_data

            # replace php tags and contents within
            src_data = re.sub( r'(?s)<\?(.*?)\?>', '', src_data )

            # add template variables
            src_data = src_data.replace( '</body>', '<br><br><center><h4><font color=green> Αριθμός Επισκεπτών: %(counter)d </body>' )

            # open same php file for storing modified data
            f = open(src_f, 'w')
            f.write(src_data)
            f.close()

            # rename edited .php file to .html extension
            dst_f = src_f.replace('.php', '.html')
            os.rename( src_f, dst_f )
            print ( "renaming: %s => %s\n" % (src_f, dst_f) )

======================================================================
