Find/Replace In Files Using Lookup Table

Andrew Porter · May 28, 2008

I have a directory full of HTML files. Some have anchor tags (<a
href="directory/filename.html">), some do not. I also have a tab-delimited
text file with (among other things) an ID, title, and filename.

What I need to do is create a script that will:

1. Search all of the HTML files in a directory for anchor tags
2. Strip out the file name from the href attribute
3. Use the file name to look up the correlating ID in the lookup file
4. Replace the contents of the href attribute with the ID

Being new to Ruby and command-line scripting, I'm not sure where to
begin looking for examples of how to do this. Any help is appreciated.

Eric I. · May 29, 2008

Obviously your goal is to get this processing done. But are you hoping to use
this to learn Ruby? If so, this is a nice-sized project that will
help you to learn the language. Here are some pointers to help you
figure out where to look or start with certain aspects of the project
(the numbers match up with your numbers above):

1. To get a list of all of the HTML files in a given directory, you
can use Dir.glob.

2. To parse an HTML file you can use the hpricot gem. Alternatively,
you could open the file and use regular expressions.

3. To read your tab-delimited file at the start of the program,
you can use the CSV class in the standard library or the fastercsv
gem. You can put the data into a hash where the file name is the key
and the ID is the value. Lookup becomes trivial then.

4. How you do this depends on whether you're using hpricot or regular
expressions. If you're using regular expressions, you might want to do a
gsub! call with a block, which would allow you to do your lookup and
replacement.
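
The four steps above can be sketched roughly as follows, taking the regular-expression route. This is only a sketch under stated assumptions, not a definitive implementation: it assumes the lookup file has no header row and that its first three columns are ID, title, and filename in that order (the original post does not give the column order), and it only handles double-quoted href values. The names load_lookup, replace_hrefs, process_directory, and HREF_RE are made up for illustration. To stay dependency-free it splits each line on tabs with String#split instead of using the CSV class or fastercsv, and it uses gsub rather than gsub!.

```ruby
# Build a { filename => ID } hash from tab-delimited text.
# ASSUMPTION: columns are ID, title, filename (in that order), no header row.
def load_lookup(text)
  lookup = {}
  text.each_line do |line|
    id, _title, filename = line.chomp.split("\t")
    next if id.nil? || filename.nil?   # skip blank or short lines
    lookup[File.basename(filename)] = id
  end
  lookup
end

# Matches the href attribute of an <a ...> tag.
# ASSUMPTION: the value is double-quoted; other quoting styles are ignored.
HREF_RE = /(<a\b[^>]*?\bhref=")([^"]*)(")/i

# Replace each href whose file name appears in the lookup with its ID.
# Links with no entry in the lookup are left exactly as they were.
def replace_hrefs(html, lookup)
  html.gsub(HREF_RE) do
    m = Regexp.last_match
    id = lookup[File.basename(m[2])]
    id ? "#{m[1]}#{id}#{m[3]}" : m[0]
  end
end

# Apply the replacement to every .html file in a directory, in place.
def process_directory(dir, lookup_path)
  lookup = load_lookup(File.read(lookup_path))
  Dir.glob(File.join(dir, '*.html')).each do |path|
    original = File.read(path)
    updated = replace_hrefs(original, lookup)
    File.write(path, updated) unless updated == original
  end
end
```

A call such as process_directory('html', 'lookup.txt') (both paths hypothetical) would rewrite the files in place, so it is worth trying it on a copy of the directory first.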

Some relevant information sources:

You should have one of the Ruby books to help you with basic syntax
and all that. They will also help you with regular expressions,
hashes, and file I/O.

For documentation on File (and IO), Dir, CSV, Regexp, and Hash, you can
use:

http://ruby-doc.org/core/

For hpricot:

http://code.whytheluckystiff.net/hpricot/

For fastercsv:

http://fastercsv.rubyforge.org/

I hope that's helpful,

Eric

====

LearnRuby.com offers Rails & Ruby HANDS-ON public & ON-SITE
workshops.
Ruby Fundamentals Wkshp June 16-18 Ann Arbor, Mich.
Ready for Rails Ruby Wkshp June 23-24 Ann Arbor, Mich.
Ruby on Rails Wkshp June 25-27 Ann Arbor, Mich.
Ruby Plus Rails Combo Wkshp June 23-27 Ann Arbor, Mich
Please visit http://LearnRuby.com for all the details.

David Masover · May 29, 2008

On May 28, 6:18 pm, Andrew Porter wrote:

2. To parse an HTML file you can use the hpricot gem. Alternatively,
you could open the file and use regular expressions.

I'd suggest hpricot or REXML if the files are reasonably well-formed and/or
XML-ish, and regex if they're not.
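
For the well-formed case, a minimal REXML sketch might look like the one below. It assumes each file parses as XML (REXML raises a parse error otherwise) and that a hash mapping file names to IDs has already been built; rewrite_hrefs_xml and ids are hypothetical names. REXML was part of the standard library when this thread was written; in Ruby 3.0 and later it ships as a bundled gem.

```ruby
require 'rexml/document'

# Replace the href of every <a> element whose file name is a key in ids.
# ASSUMPTION: xml is well-formed; ids maps file names to ID strings.
def rewrite_hrefs_xml(xml, ids)
  doc = REXML::Document.new(xml)
  doc.elements.each('//a') do |a|
    href = a.attributes['href']
    next unless href                     # anchors without href are skipped
    id = ids[File.basename(href)]
    a.attributes['href'] = id if id      # unknown files keep their href
  end
  doc.to_s
end
```

Because the document is re-serialized, details such as attribute quoting may come out differently from the input, which is one reason the regex route can be preferable when the markup should otherwise stay untouched.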

Andrew Porter · May 29, 2008

Thanks, Eric. These are excellent tips.

