Replace unknown string variable in file


namire

Hey .python, first-time poster here. I'm pretty good with Python so
far, but I keep needing a function in my program without knowing how to
build it. =( Here's the problem:

Imagine an HTML file full of hundreds of these strings all mooshed
together onto many lines:
<!--"@@MARKER@@; id=ITEM"-->ITEM<br>
Here the word 'MARKER' is a constant that stays the same in each string,
and the word 'ITEM' is a random combination of ASCII characters of
unknown length. So, for example:
<!--"@@MARKER@@; id=CATFISH"-->CATFISH<br><h1>Text text text</h1><!--"@@MARKER@@; id=SPAM"-->SPAM<br>
and so on...

What I need to do is replace each instance of the random letters with a
constant and/or delete them. The file is an HTML file, so the stuff
inside <!-- --> is OK to keep, and I need that data to identify
where the strings are in the file (it's full of other stuff too). I
tried making a brute-forcer, but with an unknown length and 26 letters
in the alphabet I gave up because it would take too long (it was
something like: read the file; if '@@MARKER@@; id=' + str(gen_string) +
'"-->' + str(gen_string) + '<br>' is in the file, replace it with ''.
But I'm paraphrasing the code, and it's not the best solution anyway).
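To put a rough number on "it would take too long": assuming only the 26 letters of one case and items of at most n characters, a brute-forcer has to try 26 + 26^2 + ... + 26^n candidates, and every candidate means another search through the file. A short sketch of that count (the function name is just for illustration):

```python
# Rough count of how many candidate strings a brute-forcer must try,
# assuming (as in the post) a 26-letter alphabet and items of length 1..max_len.
def brute_force_candidates(alphabet_size: int, max_len: int) -> int:
    """Return alphabet_size**1 + alphabet_size**2 + ... + alphabet_size**max_len."""
    return sum(alphabet_size ** length for length in range(1, max_len + 1))


for max_len in (4, 6, 8, 10):
    print(max_len, brute_force_candidates(26, max_len))
```

Already at 10 letters the count is above 10^14, which is why describing the shape of the marker with a pattern, rather than guessing its contents, is the practical route.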

Just as a comparison, in Windows this seems easy to do when
managing files: say for the file a-blah-b-blah.tmp, where blah is
unknown, you can use del a-*-b-*.tmp to get rid of that file. But for
a string in a text file in Python I don't have a clue. @_@ Could
someone please help me?
 

Vlastimil Brom

2009/2/10 namire said:
...
Just as a comparison, in Windows this seems easy to do when
managing files: say for the file a-blah-b-blah.tmp, where blah is
unknown, you can use del a-*-b-*.tmp to get rid of that file. But for
a string in a text file in Python I don't have a clue. @_@ Could
someone please help me?

Hi,
It is not quite clear to me what should be achieved with the given
file, but the example with wildcard characters in Windows implies that
regular expressions can be of some use here (given the file is as
regular as the samples, especially without nested comments etc.).

The segments in the examples can be matched, e.g., with the expression:

<!--"@@MARKER@@; id=([^"]+)"-->\1<br>

The ITEM, CATFISH, SPAM ... elements are captured in the parenthesised
group and can be used for matching or replacing; the backreference \1
requires the text after --> to repeat the same item.
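A minimal sketch of that expression in use with Python's re module, run on the sample line from the question (the variable names are only illustrative). findall returns the captured group for each match, and because of \1 a segment whose visible text differs from its id is not matched:

```python
import re

# Group 1 captures the id inside the comment; \1 then requires the
# visible text right after "-->" to be exactly the same string.
MARKER_RE = re.compile(r'<!--"@@MARKER@@; id=([^"]+)"-->\1<br>')

sample = ('<!--"@@MARKER@@; id=CATFISH"-->CATFISH<br><h1>Text text text</h1>'
          '<!--"@@MARKER@@; id=SPAM"-->SPAM<br>')

# One captured string per marker segment found in the text.
items = MARKER_RE.findall(sample)
print(items)  # ['CATFISH', 'SPAM']

# Id and visible text disagree, so the backreference makes this fail.
mismatch = '<!--"@@MARKER@@; id=EGGS"-->HAM<br>'
print(MARKER_RE.search(mismatch))  # None
```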

Check the re module in the Python library:
http://docs.python.org/library/re.html

hth
vbr
 

r0g

namire said:
Just as a comparison, in Windows this seems easy to do when
managing files: say for the file a-blah-b-blah.tmp, where blah is
unknown, you can use del a-*-b-*.tmp to get rid of that file. But for
a string in a text file in Python I don't have a clue. @_@ Could
someone please help me?

Hi Namire,

The equivalent thing in programming languages is "Regular Expressions",
also known as "regex". It's like a small pattern matching sub-language.

There's quite a lot more to it than the odd "*", so it might take a bit
of googling around and study to really understand it, but the principle
is the same.
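To make the comparison with the Windows wildcard concrete, here is a small sketch using the standard fnmatch module, which handles shell-style * patterns much like del does and works by translating them into a regular expression. The hand-written regex at the end is one equivalent spelling: '*' becomes '.*' and the literal dot has to be escaped.

```python
import fnmatch
import re

wildcard = 'a-*-b-*.tmp'      # the pattern from the del example
filename = 'a-blah-b-blah.tmp'

# Shell-style matching, similar to the del command's wildcard.
print(fnmatch.fnmatch(filename, wildcard))  # True

# fnmatch turns the wildcard into a regex; the exact text it produces
# differs between Python versions, so it is not printed here.
translated = fnmatch.translate(wildcard)
print(re.match(translated, filename) is not None)  # True

# A hand-written equivalent: '*' -> '.*', and '.' escaped as '\.'.
print(re.fullmatch(r'a-.*-b-.*\.tmp', filename) is not None)  # True
```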

In Python you need to import the 're' module.

Have a look at this; the replace method is called 'sub'...

http://www.amk.ca/python/howto/regex/regex.html
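A minimal sketch of sub applied to the problem in the question, assuming the marker format shown there and reusing Vlastimil's pattern; REPLACEMENT is just a made-up constant. The first call deletes each segment; the second keeps the comment (with its id, so the spot can still be found) and replaces only the visible item, using \1 in the replacement string to copy the captured id back:

```python
import re

MARKER_RE = re.compile(r'<!--"@@MARKER@@; id=([^"]+)"-->\1<br>')

html = ('<!--"@@MARKER@@; id=CATFISH"-->CATFISH<br><h1>Text text text</h1>'
        '<!--"@@MARKER@@; id=SPAM"-->SPAM<br>')

# Delete every marker segment outright.
deleted = MARKER_RE.sub('', html)
print(deleted)  # <h1>Text text text</h1>

# Keep the comment and its id, but swap the visible item for a constant.
# "REPLACEMENT" is a hypothetical placeholder value.
replaced = MARKER_RE.sub(r'<!--"@@MARKER@@; id=\1"-->REPLACEMENT<br>', html)
print(replaced)
```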

Take your time with it, though; it can be confusing until you get used
to it. When you're building an expression, don't try to do it all at
once: start small and build it up a little at a time.

Roger Heathcote.
 

Terry Reedy

namire said:
What I need to do is replace each instance of the random letters with a
constant and/or delete them. The file is an HTML file, so the stuff
inside <!-- --> is OK to keep

I cannot understand what you want to do where. The last phrase implies
'leave the comments alone' but you only talked about random letters
within the comments. I suggest a minimal but complete example of
possible input and desired output.
 

namire

Thanks to Vlastimil Brom for his example and r0g for his helpful
attitude and hyperlinks, I was able to make the program do what was
needed. Terry, the comments in the HTML are not important; I was just
saying that if I could wrap the undesired strings in HTML comment
tags, they would not show up in a web browser viewing the file.
Luckily they get erased as well, so the comments aren't needed. If
anyone wants, I can post a pastebin of the full code that uses the
regexes.
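namire's actual code isn't shown in the thread. For anyone arriving with the same problem, here is a minimal sketch of what a whole-file version might look like, assuming the marker format from the question, a UTF-8 file, and that the marker segments should simply be deleted (strip_markers and the file names are hypothetical):

```python
import re

# Pattern from Vlastimil's reply: the id in the comment must match the
# visible text that follows it.
MARKER_RE = re.compile(r'<!--"@@MARKER@@; id=([^"]+)"-->\1<br>')


def strip_markers(in_path, out_path, encoding='utf-8'):
    """Delete every marker segment from in_path and write the result to out_path.

    Returns how many segments were removed. The UTF-8 default is an
    assumption; pass the file's real encoding if it differs.
    """
    with open(in_path, encoding=encoding) as f:
        html = f.read()  # read the whole file into one string
    # subn works like sub but also returns the number of replacements made.
    cleaned, count = MARKER_RE.subn('', html)
    with open(out_path, 'w', encoding=encoding) as f:
        f.write(cleaned)
    return count
```

Usage would be something like strip_markers('page.html', 'page_clean.html'). Writing to a separate output file leaves the original untouched until the result has been checked.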
 
