Newbie: working with a text file and converting to xml

Adam Teale

hi Guys,

I have a tab-delimited text file that I would like to convert into an
xml file that can be read/imported into Apple's Final Cut Pro.

The text file is 2 columns.
The first column is the time (timecode)
The second column is text (for sub-titling)

I thought this might be a good starting project to get into Ruby

Any suggestions on how I might approach this?

Thanks!

Adam Teale
 
Kevin Jackson

I have a tab-delimited text file that I would like to convert into an
xml file that can be read/imported into Apple's Final Cut Pro.

The text file is 2 columns.
The first column is the time (timecode)
The second column is text (for sub-titling)

I thought this might be a good starting project to get into Ruby

Any suggestions on how I might approach this?

Look at Builder (Builder::XmlMarkup) and FasterCSV.

Set up FasterCSV to use a tab as the delimiter instead of a comma, use
it to read the input, and then use Builder to output
<timecode>data</timecode><subtitle>data</subtitle>

It should be fairly simple. Alternatively, you can avoid libraries and
do it yourself, to learn more about Ruby without getting bogged down in
third-party libs.

require 'builder'

# data stands for a value read from the input file
x = Builder::XmlMarkup.new(:target => $stdout, :indent => 1)
x.instruct!
x.timecode data
x.subtitle data

etc

Kev
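
A minimal sketch of the library route suggested above, under a few assumptions. FasterCSV was merged into Ruby's standard library as csv in Ruby 1.9, so the sketch uses CSV with col_sep set to a tab. To stay free of third-party gems it builds the XML with plain strings instead of Builder. The element names (subtitles, item, timecode, subtitle), the helper names xml_escape and tsv_to_xml, and the second sample row are invented for illustration and are not Final Cut Pro's format.

```ruby
require 'csv'

# Escape the characters that are special in XML text content.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Parse tab-delimited text and return it as an XML string.
# Element names are placeholders, not a real Final Cut Pro schema.
def tsv_to_xml(text)
  items = CSV.parse(text, col_sep: "\t").map do |timecode, subtitle|
    "  <item><timecode>#{xml_escape(timecode)}</timecode>" \
    "<subtitle>#{xml_escape(subtitle)}</subtitle></item>"
  end
  header = ['<?xml version="1.0" encoding="UTF-8"?>', '<subtitles>']
  (header + items + ['</subtitles>']).join("\n")
end

# The second row is made up to show the escaping of "&".
puts tsv_to_xml("00:00:42:21\tDurbar Square\n00:01:05:06\tFish & Chips\n")
```

One caveat: with its default settings CSV treats double quotes as field quoting, so subtitles containing a literal " may need different parser options.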
 
Peter Szinek

Adam said:
hi Guys,

I have a tab-delimited text file that I would like to convert into an
xml file that can be read/imported into Apple's Final Cut Pro.

The text file is 2 columns.
The first column is the time (timecode)
The second column is text (for sub-titling)

Could you send us 2 example files? I guess the text file format is
obvious (but better to work with a real-life example) but I am not so
sure about the Final Cut Pro XML (or is it just a plain simple XML?)

Until then, check out this code:

============================================================
# \t is written out so the tab separators survive copy and paste
input = <<INPUT
0.12\tSalut, Foo!
0.15\tHola Bar! Did you see Baz?
0.22\tI guess he is hanging around with Fluff and Ork.
INPUT

template = <<TEMPLATE
<timecode>TIMECODE</timecode>
<sub-titling>SUB-TITLING</sub-titling>
TEMPLATE

# The XML declaration is not an element, so the rows need a root element
# of their own; the name "subtitles" is just a placeholder.
result = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<subtitles>\n"

input.split(/\n/).each do |line|
  data = line.split(/\t/)
  result += template.sub('TIMECODE') { data[0] }.sub('SUB-TITLING') { data[1] }
end

result += '</subtitles>'

puts result
============================================================

output:

<?xml version="1.0" encoding="ISO-8859-1"?>
<subtitles>
<timecode>0.12</timecode>
<sub-titling>Salut, Foo!</sub-titling>
<timecode>0.15</timecode>
<sub-titling>Hola Bar! Did you see Baz?</sub-titling>
<timecode>0.22</timecode>
<sub-titling>I guess he is hanging around with Fluff and Ork.</sub-titling>
</subtitles>


Cheers,
Peter

__
http://www.rubyrailways.com
 
Adam Teale

Hi Kev & Peter!

Thanks for responding so quickly!

The text file looks pretty much like that

00:00:30:13 Swayambhunath Temple: building started 460AD
00:00:42:21 Durbar Square
00:01:05:06 Driving to Trisuli River for Rafting
00:01:55:22 Day 1 Trekking: Pokhara to Tirkhedhunga (1540m)
00:02:20:20 Day 2 Trekking: Tirkhedhunga to Ghorephani (2750m)
00:02:33:19 Day 3 Trekking: Ghorephani to Ghandruk (1940m)
00:02:42:04 Day 4 Trekking: Ghandruk to Pothana (1900m)
00:03:10:13 Day 5 Trekking: Pothana to Phedi (1130m)

It'll take a while for your example to filter down into my brain - when
it does I'll get back to you about it.

Awesome!

Thank you so much!

Adam


 
Peter Szinek

Adam said:
The text file looks pretty much like that

Then it should be fine - as long as there are no tabs in the second
column. Of course even that would not mean an unsolvable problem but it
would not work with the code I sent you.
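
One way to handle tabs in the second column, sketched under the assumption that everything after the first tab belongs to the subtitle: pass a limit of 2 to String#split so the line is cut at the first tab only. The helper name split_row is made up for illustration.

```ruby
# Split a row at the first tab only, so any later tabs stay in the subtitle.
def split_row(line)
  line.chomp.split("\t", 2)
end

# The second tab is kept as part of the subtitle text.
p split_row("0.15\tHola Bar!\tDid you see Baz?\n")
```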
It'll take a while for your example to filter down into my brain - when
it does I'll get back to you about it.
Sure!


Awesome!
Yeah, Ruby is awesome! I am a beginner, too (picked up Ruby a few months
ago) and though I have very limited time to learn it, I can do a lot of
things already. The learning curve is really steep.

Cheers,
Peter

__
http://www.rubyrailways.com
 
Adam Teale

Hi Peter,

I saved your code and called it convert.rb. I ran it (replacing
'filename' with the path of my text file - was that right to do?)

I got this error:
convert.rb:1: unknown regexp options - atal

Any ideas?

Also, do you know if there is any way to run a script from the
command line like this?
./convert.rb mytextfile.txt
I made a shell script that used this kind of thing - it took the input
file as something like $ARGV (I think - sorry, I'm a super newbie!!)
Make sense?

Thanks Peter!

Adam
 
Peter Szinek

Adam said:
Hi Peter,

I saved your code and called it convert.rb. I ran it (replacing
'filename' with the path of my text file - was that right to do?)

i got this error:
convert.rb:1: unknown regexp options - atal

any ideas?
I guess you are referring to Paul's solution since I did not use any
files :) In any case, could you paste the code here (convert.rb) so I
can check what's going on?
Also, do you know if there is any way to run a script from the
command line like this?
./convert.rb mytextfile.txt

Sure. The array called ARGV contains all the command-line arguments.

------ test.rb
#!/usr/bin/ruby
puts ARGV[0]
puts ARGV[1]
------

./test.rb foo bar

will output

----
foo
bar
----
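
Building on the ARGV example above, here is a small sketch of taking the input path from the command line and failing with a usage message when it is missing. The helper name input_path is hypothetical. Passing the path this way also avoids pasting it into the script. One likely cause of the "unknown regexp options" error (an assumption, since the failing code was never posted) is a path written into the source without quotes: Ruby reads a bare /Users/... as a regular expression literal and treats the letters after the closing slash as regexp options.

```ruby
# Hypothetical helper: pick the input path out of an argument list such as
# ARGV, and fail with a usage message when no path was given.
def input_path(args)
  raise ArgumentError, "usage: convert.rb FILE" if args.empty?
  args[0]
end

# In a real script this would be input_path(ARGV); a literal list is used
# here so the example runs without command-line arguments.
puts input_path(["mytextfile.txt"])
```

If a path does have to appear in the source, it needs to be a quoted string, for example File.read("mytextfile.txt").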

Cheers,
Peter

__
http://www.rubyrailways.com
 
Adam Teale

doh! Sorry guys!

Peter - thanks for the ARGV tips!

I think I have Paul's script going using ARGV:
---------------------------------------------------
#!/usr/bin/ruby -w

data = File.read(ARGV[0])

output = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"

# each_line rather than each: String#each was removed in Ruby 1.9
data.each_line do |line|
  timecode, subtitle = line.strip.split("\t")
  xml = "<item><timecode>#{timecode}</timecode><subtitle>#{subtitle}</subtitle></item>"
  output += xml + "\n"
end

File.open("output.xml", "w") { |f| f.write output }
---------------------------------------------------


However it only outputs the first line from my txt file:
---------------------------------------------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<item><timecode>00:00:30:13</timecode><subtitle>Swayambhunath Temple:
building started 460AD
00:00:42:21</subtitle></item>
---------------------------------------------------

Apologies for my newbieness!

Cheers guys!

Adam
 
Peter Szinek

Hi,
However it only outputs the first line from my txt file:
---------------------------------------------------
<?xml version="1.0" encoding="ISO-8859-1"?>
<item><timecode>00:00:30:13</timecode><subtitle>Swayambhunath Temple:
building started 460AD
00:00:42:21</subtitle></item>
---------------------------------------------------
Hmm, strange. I cut and pasted this code and the data from your
previous mail, and for me it works perfectly (as do all of Paul's other
solutions). Are you sure your input txt file is OK?

Are you on a Mac? Maybe there is something going on with the line breaks?
Apologies for my newbieness!
No need to apologize. In no time, *you* will be answering others'
questions :)

Peter

__
http://www.rubyrailways.com
 
Adam Teale

Ah, thanks Peter - yes, on OS X - you are right, there is something funny
with the line breaks! Weird!
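
A sketch of one way around the line-break problem, assuming the file was saved with classic Mac carriage returns ("\r") instead of "\n". That is a guess consistent with the single merged item in the output shown earlier, not something confirmed in the thread. The helper names split_lines and to_items are made up for illustration.

```ruby
# Split text into lines whatever the line-ending convention is:
# "\r\n" (Windows), "\r" (classic Mac) or "\n" (Unix). Blank lines are dropped.
def split_lines(text)
  text.split(/\r\n|\r|\n/).reject { |line| line.strip.empty? }
end

# Turn each tab-delimited line into one <item> element.
def to_items(text)
  split_lines(text).map do |line|
    timecode, subtitle = line.strip.split("\t", 2)
    "<item><timecode>#{timecode}</timecode><subtitle>#{subtitle}</subtitle></item>"
  end
end

# Two rows separated by carriage returns only.
sample = "00:00:30:13\tSwayambhunath Temple: building started 460AD\r" \
         "00:00:42:21\tDurbar Square\r"
puts to_items(sample)
```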

Now I just have to work out how to add all the FCP XML stuff in there.

I appreciate all your help & encouraging words!!
 
