Interacting with an updatedb-generated data file within Python

birdsong

Does anybody have any recommendations on how to interact with the data
file that updatedb generates? I'm running through a file list in
sqlite that I want to check against the file system. updatedb is
pretty optimized for building an index and storing it, but I see no
way to query the db file other than calling locate itself. This would
require me to fork and exec for every single file I want to verify -
I'd be better off doing the stat myself in that case, but I'd really
rather let updatedb build the index for me.

I searched high and low for any sort of library that is well suited
for reading these data files, but I've found nothing for any language
other than the source for locate and updatedb itself.
 
John Machin


Disclaimer: I had to google to find out what "updatedb" is so don't
take me as any authority on this :)

The format appears to be documented e.g.
http://www.delorie.com/gnu/docs/findutils/locatedb.5.html
and thus should be found on the locatedb(5) man page on your system.

Assuming that you don't have the old version, it should take about 20
lines of Python to loop around extracting the file names, plus some
more to open the file, read it in as one big string (how big is it?),
and check the dummy "LOCATE02" entry up the front -- it's a bit hard
to be sure how the prefix length of the first non-dummy entry is
determined without seeing an actual example, but my guess is that the
file will start like this:

"\x00LOCATE02\x00\xF8name-of-first-file-in-full\x00........."
where the "\xF8" is -8 meaning ignore the 8-character previous name
"LOCATE02" i.e. previous name can be regarded as "".

Anyway, I reckon utter max 50 lines of Python to produce a module with
a generator that yields one file name at a time, or a function
returning a list or a set.

HTH ... feel free to ask more if the above is a little obscure. But do
accompany any questions with the result of doing this:
print(repr(open('the_locatedb_file', 'rb').read(400)))
plus what you believe the full name of the first non-dummy file should
be.

Cheers,
John
 
John Machin

The format appears to be documented e.g.
http://www.delorie.com/gnu/docs/findutils/locatedb.5.html
and thus should be found on the locatedb(5) man page on your system.

More comprehensive:
http://www.gnu.org/software/finduti...d_html/Database-Formats.html#Database-Formats
Assuming that you don't have the old version, it should take about 20
lines of Python to loop around extracting the file names, plus some
more to open the file, read it in as one big string (how big is it?),
and check the dummy "LOCATE02" entry up the front -- it's a bit hard
to be sure how the prefix length of the first non-dummy entry is
determined without seeing an actual example, but my guess is that the
file will start like this:

"\x00LOCATE02\x00\xF8name-of-first-file-in-full\x00........."
where the "\xF8" is -8 meaning ignore the 8-character previous name
"LOCATE02" i.e. previous name can be regarded as "".

After noticing there was in fact an example, make that:
"\x00LOCATE02\x00\x00name-of-first-file-in-full\x00........."

i.e. you can assert that buffer[:10] == "\x00LOCATE02\x00" and start
the loop from offset 10 with the previous name set to "".
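
For what it's worth, here is a minimal sketch of that loop, assuming the LOCATE02 layout as I read it in the findutils documentation: each entry is a signed one-byte change in the shared-prefix length (or the byte 0x80 followed by a two-byte big-endian signed value), then the rest of the name, NUL-terminated. The function name iter_locate02 and the SAMPLE bytes are made up for illustration (the sample encodes the four-path example from the format description), and none of this has been checked against a real database file.

```python
import struct

MAGIC = b"\x00LOCATE02\x00"  # dummy first entry, 10 bytes

def iter_locate02(data):
    """Yield each file name (as bytes) from a LOCATE02-format buffer."""
    if data[:10] != MAGIC:
        raise ValueError("buffer does not start with the LOCATE02 dummy entry")
    pos = 10
    prefix_len = 0  # how much of the previous name is reused
    prev = b""
    while pos < len(data):
        # Signed one-byte change in the shared-prefix length.
        delta = struct.unpack_from("b", data, pos)[0]
        pos += 1
        if delta == -128:
            # 0x80 escape: the real change follows as a 2-byte big-endian int.
            delta = struct.unpack_from(">h", data, pos)[0]
            pos += 2
        prefix_len += delta
        end = data.index(b"\x00", pos)  # the suffix is NUL-terminated
        prev = prev[:prefix_len] + data[pos:end]
        pos = end + 1
        yield prev

# Hand-built sample: shared prefix lengths 0, 8, 14, 5 -> deltas 0, 8, 6, -9.
SAMPLE = (
    MAGIC
    + b"\x00/usr/src\x00"
    + b"\x08/cmd/aardvark.c\x00"
    + b"\x06rmadillo.c\x00"
    + b"\xf7tmp/zoo\x00"
)

for name in iter_locate02(SAMPLE):
    print(name.decode())
```

To check a list of names against the index, it is probably simplest to read the whole file in binary mode, build set(iter_locate02(data)) once, and then test each name from sqlite with "in". Note that the names come back as bytes, so they may need decoding (or the sqlite values encoding) before comparing.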

Cheers,
John
 
