Deleting lines from many XML documents

Nora

Hi,

I have about 200 XML files, each containing one line that I want to delete.
This line is always the last line of the file and it always begins with
"<?Pub"
Transformations don't work because this line makes the document not valid,
and Saxon won't perform the transformation.

Does anyone have an idea how I can get rid of this last line in all documents
without having to open each document and delete the line manually?

Thanks in advance for your help!
Nora
 
Martin Honnen

Nora wrote:

I have about 200 XML files, each containing one line that I want to delete.
This line is always the last line of the file and it always begins with
"<?Pub"
Transformations don't work because this line makes the document not valid,
and Saxon won't perform the transformation.

Does anyone have an idea how I can get rid of this last line in all documents
without having to open each document and delete the line manually?

If it is not well-formed then it is not XML, and judging from your
comments it sounds as if it is not, so no XML parser will help. But
reading in text files line by line and writing some of the lines back is
a task that can be solved in many programming languages.
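
As a rough illustration of that line-by-line approach, here is a minimal shell sketch. The function name strip_pub_lines and the sample file contents are made up for the example and do not come from the thread; it assumes the unwanted line starts with the literal text <?Pub, as described in the question.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): copy a text file line by line,
# skipping every line that begins with the literal text "<?Pub".
strip_pub_lines() {
    # $1 = input file, $2 = output file
    # The "|| [ -n ... ]" part also handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '<?Pub'*) ;;                    # drop the unwanted line
            *) printf '%s\n' "$line" ;;     # write every other line back
        esac
    done < "$1" > "$2"
}

# Example run on a throwaway file with made-up content
tmp=$(mktemp -d)
printf '%s\n' '<doc>' '<p>text</p>' '</doc>' '<?Pub sample?>' > "$tmp/in.xml"
strip_pub_lines "$tmp/in.xml" "$tmp/out.xml"
cat "$tmp/out.xml"
rm -r "$tmp"
```

Writing to a separate output file keeps the originals untouched until the result has been checked.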
 
Boris Stumm

Nora said:
I have about 200 XML files, each containing one line that I want to delete.
This line is always the last line of the file and it always begins with
"<?Pub"
Transformations don't work because this line makes the document not valid,
and Saxon won't perform the transformation.

In Unix/Linux I'd do it this way:

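mkdir new
for i in *.xml ; do
    # In grep's default (basic) syntax "?" is literal, so no backslash is needed.
    # Quoting "$i" keeps file names containing spaces intact.
    grep -v '^<?Pub' "$i" > "new/$i"
done
mv new/* .
rmdir new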

Not tested, so the syntax may well be somewhat wrong, but
you get the idea.

 
Trevor Lowing

Grep for Windows (use version 2.2):

http://www.wingrep.com/download.html


 
Nora

Trevor,

I downloaded Grep for Windows, but I don't quite see how it should help
me with my problem.
I want to replace the last line of each document. These lines are not
identical in content, but all of them start the same way (so a
simple find and replace won't help).
Is that how you understood my question? And if so, how should grep help me?

Thanks, Nora
 
Gerald Aichholzer

Nora said:
I downloaded Grep for Windows, but I don't quite see how it should help
me with my problem.
I want to replace the last line of each document. These lines are not
identical in content, but all of them start the same way (so a
simple find and replace won't help).
Is that how you understood my question? And if so, how should grep help me?

As far as I know grep is a search tool only, unless the version
mentioned (Wingrep 2.2 or higher) has some extra features. If it
supplies an option to delete the lines matching a search expression,
it would solve your problem.

The regular expression ^<\?Pub identifies all lines beginning
with <?Pub (^ is for beginning of line, \ is the escape character
because ? has a special meaning). That holds for extended and
Perl-style syntax; in the basic syntax that grep and sed use by
default, ? is already literal and is written without the backslash.

If you have access to *nix or Cygwin you could apply the following
command to each XML file:

sed '/^<?Pub/d' xmlfile.xml > xmlfile-new.xml

This will delete all lines starting with <?Pub. Wingrep 2.2
might have similar features.
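
To check which spelling of the pattern a given tool expects, a small experiment along these lines may help. The sample input is invented for the test. It assumes a POSIX-style grep, where the default is basic syntax and -E selects extended syntax.

```shell
# Made-up sample input: only the second line really begins with "<?Pub".
sample=$(printf '%s\n' '<doc/>' '<?Pub sample?>' 'Pub without markup')

# Basic syntax (grep's default): "?" is literal and needs no backslash.
basic=$(printf '%s\n' "$sample" | grep -c '^<?Pub')

# Extended syntax (grep -E): "?" is a quantifier, so it has to be escaped.
extended=$(printf '%s\n' "$sample" | grep -E -c '^<\?Pub')

echo "basic: $basic, extended: $extended"
```

Both counts should come out as 1. Writing \? in basic syntax is best avoided: POSIX leaves it undefined, and GNU grep and GNU sed read it as "the preceding character is optional", which would match lines starting with Pub or <Pub instead.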

HTH,
Gerald
 
William Park

Nora said:
Hi,

I have about 200 XML files, each containing one line that I want to delete.
This line is always the last line of the file and it always begins with

To delete the last line of a file:
sed '$d' in > out
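
Since the question says the line is both the last one and begins with <?Pub, the two conditions can also be combined so that a file without the marker is left alone. This is only a sketch: the function name drop_trailing_pub and the sample files are hypothetical.

```shell
# Hypothetical helper: delete the last line only when it begins with "<?Pub".
drop_trailing_pub() {
    # $1 = input file, $2 = output file
    # "$" addresses the last line; the inner pattern restricts the delete.
    sed -e '${/^<?Pub/d' -e '}' "$1" > "$2"
}

# Example run on throwaway files with made-up content
tmp=$(mktemp -d)
printf '%s\n' '<doc>' '</doc>' '<?Pub sample?>' > "$tmp/with.xml"
printf '%s\n' '<doc>' '</doc>' > "$tmp/without.xml"
drop_trailing_pub "$tmp/with.xml" "$tmp/with-new.xml"
drop_trailing_pub "$tmp/without.xml" "$tmp/without-new.xml"
cat "$tmp/with-new.xml" "$tmp/without-new.xml"
rm -r "$tmp"
```

Splitting the script across two -e options keeps the closing brace on its own line, which is the most portable way to write a grouped sed command.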
 
