Search algorithm for HTML files

Linda

Can someone point me to an open source search algorithm that would
search through HTML files in a directory and any subdirectories? The
results should not return anything that is within the tags. For
instance, if someone is searching for "blue" then it should not return
the pages that only have the string inside a tag like <div
class="blue"> but should return pages that had "blue" between tags
i.e. <div>blue</div> or <span>blue</span>.
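
The requirement above (match "blue" between tags, but not inside a tag such as <div class="blue">) can be sketched in C++ by dropping everything between '<' and '>' before searching. This is only a minimal sketch under stated assumptions, not a full HTML parser: the names visible_text and text_contains are hypothetical, and the code assumes no '>' appears inside attribute values and does not treat comments, <script>/<style> bodies, or character entities specially.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns `html` with everything between '<' and '>'
// dropped. Each tag is replaced by a single space so that words on either
// side of a tag do not run together.
std::string visible_text(const std::string& html) {
    std::string out;
    out.reserve(html.size());
    bool in_tag = false;
    for (char c : html) {
        if (in_tag) {
            if (c == '>') {          // end of tag: resume copying text
                in_tag = false;
                out.push_back(' ');
            }
        } else if (c == '<') {       // start of tag: stop copying
            in_tag = true;
        } else {
            out.push_back(c);        // ordinary text between tags
        }
    }
    return out;
}

// Hypothetical helper: true if `needle` occurs in the text between tags.
// Assumption: `needle` is non-empty and matching is case-sensitive.
bool text_contains(const std::string& html, const std::string& needle) {
    assert(!needle.empty());
    return visible_text(html).find(needle) != std::string::npos;
}
```

Because each tag becomes a space, a word split by inline markup (for example bl<b>ue</b>) would not match; whether that matters depends on the eBook markup.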

This search feature will be used in a native iPhone/iPad app to search
through eBooks that are loaded within an app on an iOS device.

I can adapt the search engine to work on the device, but I don't see
any need to reinvent the wheel when there should be many algorithms
already written to accomplish this. So far, though, I have not been
able to find one with a Google search.

Thanks,
Linda
 
Linda

Google, or any other search engine, often fails to find answers to complex
homework questions.

If you mean that I should do my homework or searching on my own, that
is what I am trying to achieve. If you mean that this is truly a
"homework" question for a school project, it is definitely not.

Perhaps you should try aiming low, and taking it one step at a time. First,
figure out how to read the contents of a directory, or how to recursively
scan a directory tree.

I already have a search written that works in a directory and
subdirectories. The problem is that it returns the path to pages that
also include the search string within tags.

So, I have already aimed low and taken the first steps. In fact, this
would be the last step, unless of course I start all over again and
use existing code. Objective-C on the iPhone allows the use of C/C++.

If you know of a search algorithm written in C++ that is open source,
then I would appreciate a link.

Linda
 
Juha Nieminen

Christian Hackl said:
Linda wrote:
Can someone point me to an open source search algorithm that would
search through HTML files in a directory and any subdirectories? [...]

You can use the Boost Filesystem library for that. It has a very compact
and simple interface. Iterating through all *.html or *.htm files in a
directory and its subdirectories should be a matter of a few lines of code.

This search feature will be used in a native iPhone/iPad app to search
through eBooks that are loaded within an app on an iOS device.

Don't know if iPhone/iPad are supported, though. 0 experience with
those. I don't even own one of them :)
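
The directory walk described in the quoted reply can be sketched as follows. Note the swap: this uses std::filesystem from C++17, whose interface is modeled on Boost Filesystem, rather than Boost itself, and the thread does not establish whether either is available on a given iOS toolchain. The names find_html_files and read_file are hypothetical, and extension matching here is case-sensitive (a file named PAGE.HTML would be skipped).

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: collects every regular file under `root`, including
// subdirectories, whose extension is exactly ".html" or ".htm".
// The result is sorted so that the order is predictable.
std::vector<fs::path> find_html_files(const fs::path& root) {
    assert(fs::is_directory(root));  // assumption: caller passes a directory
    std::vector<fs::path> found;
    for (const fs::directory_entry& entry :
         fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        const std::string ext = entry.path().extension().string();
        if (ext == ".html" || ext == ".htm") found.push_back(entry.path());
    }
    std::sort(found.begin(), found.end());
    return found;
}

// Hypothetical helper: reads a whole file into a string so that it can be
// handed to whatever search routine is applied to the page contents.
std::string read_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Each path returned by find_html_files can be passed to read_file, and the resulting string searched after the markup has been filtered out.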

You would use this:
http://developer.apple.com/library/.../NSFileManager_Class/Reference/Reference.html
 
