Tie::File question


robic0

Read through the docs on this module.
My question is about contraction and expansion of the file.
It is stated that the entire file is not read into memory
and I can understand that. It's also stated that changes made
to the record array happen instantaneously in the file.
To adjust the file, is it using a file realloc primitive, or
am I missing something? I understand mild buffering is done.
Is it a file realloc acting on the FAT, or am I missing something?
If so, it must be an API; it's not adjusting the FAT, like from a driver...
 

Uri Guttman

"r" == robic0 <robic0> writes:

r> Read through the docs on this module.
r> My question is about contraction and expansion of the file.
r> It is stated that the entire file is not read into memory
r> and I can understand that. It's also stated that changes made
r> to the record array happen instantaneously in the file.
r> To adjust the file, is it using a file realloc primitive, or
r> am I missing something? I understand mild buffering is done.
r> Is it a file realloc acting on the FAT, or am I missing something?
r> If so, it must be an API; it's not adjusting the FAT, like from a driver...

yes, you are misunderstanding many things.

go figure it out for yourself. i will watch the accident in progress.

your asking for help here is so full of chutzpah.

uri
 

Joe Smith

robic0 said:
> Read through the docs on this module.

Yes, but did you read the source code of Tie/File.pm?

> My question is about contraction and expansion of the file.
> It is stated that the entire file is not read into memory
> and I can understand that. It's also stated that changes made
> to the record array happen instantaneously in the file.
> To adjust the file, is it using a file realloc primitive, or
> am I missing something? I understand mild buffering is done.
> Is it a file realloc acting on the FAT, or am I missing something?

The module works on file systems that do not use a FAT, so
your assumption is completely off base.

> If so, it must be an API; it's not adjusting the FAT, like from a driver...

Not even close. Modification is done at the file level, not at
the device driver level. Do you not know how truncate() works?

# Truncate the file at the current position
sub _chop_file {
    my $self = shift;
    truncate $self->{fh}, tell($self->{fh});
}
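
For what it's worth, here is a rough standalone sketch (not Tie::File's actual code, and
'data.txt' is just a placeholder name) of how a record near the start of a file can be
shrunk using nothing but file-level calls: read the tail, seek back, rewrite it, then
truncate. No FAT or driver involvement is needed.

use strict;
use warnings;

# Shrink the first line of 'data.txt' using only seek/read/print/truncate.
open my $fh, '+<', 'data.txt' or die "open: $!";
my $first = <$fh>;                   # read the first record
my $rest  = do { local $/; <$fh> };  # slurp everything after it
$rest = '' unless defined $rest;     # handle a one-line file
seek $fh, 0, 0;                      # back to the start of the file
print $fh "short\n", $rest;          # rewrite the first record plus the tail
truncate $fh, tell $fh;              # chop off the leftover bytes at the end
close $fh;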
 

robic0

Read through the docs on this module.
My question is about contraction and expansion of the file.
It is stated that the entire file is not read into memory
and I can understand that. It's also stated that changes made
to the record array happen instantaneously in the file.
To adjust the file, is it using a file realloc primitive, or
am I missing something? I understand mild buffering is done.
Is it a file realloc acting on the FAT, or am I missing something?
If so, it must be an API; it's not adjusting the FAT, like from a driver...

Originally I had a simple question: whether the file was rewritten after each change.
I have my own answer now. Here are my observations:

Tie::File is simply a *toy* to be passed over.
This validates everything I suspected.

1. Uses level 1 I/O to manage a custom level 2 scheme.
2. Expansion (insertion) / contraction (deletion) requires a rewrite of all data from the
modified record to the end of the file if the file size increases/decreases (see the sketch after this list).
3. The phrase "The file is not loaded into memory" is a misnomer. If 1 byte is deleted/added
at the beginning of the file, the entire file will pass through memory in a caching scheme.
4. Read/write caches have a large memory overhead; tables must be constantly maintained for
record separators.
5. Deferred writing uses all available memory by default.
6. Operations on large files are useless; a 20 gigabyte file will probably crash the machine,
leaving a corrupt file system. I think the quote was "gigantic files".
7. Caching is useless for random operations on large files. The maintenance overhead alone,
with deferred writing, would lock up the process/machine and/or seriously slow it down with file swaps.
8. Tie::File is just a novelty. There is nothing serious about it, like views and sharing. No management.
It's just a simple level 2 buffered I/O scheme that maps lines of a file, with separator, as elements in an array.
The word 'record' is a misnomer; there are no fields. It could be a one-field record, but when is that necessary?
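
To illustrate point 2, a minimal sketch of the pattern in question (the file name 'big.txt'
is just a placeholder): adding a single record at the front moves the offset of every record
after it, so the module has to rewrite the rest of the file.

use strict;
use warnings;
use Tie::File;

# Each line of 'big.txt' becomes one element of @lines.
tie my @lines, 'Tie::File', 'big.txt' or die "tie failed: $!";

# One new record at the front shifts every following record's offset,
# forcing everything after it to be rewritten on disk.
unshift @lines, 'new first line';

untie @lines;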

Have a nice day!


(some quotes from Tie::File docs)

DESCRIPTION
================
The file is not loaded into memory, so this will work even for gigantic files.

Changes to the array are reflected in the file immediately.
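
For context (this example is mine, not quoted from the docs), a minimal usage sketch with a
placeholder file name: each line of the file appears as one element of the tied array, and an
assignment is written back to the file right away.

use strict;
use warnings;
use Tie::File;

# Map each line of 'file.txt' to one element of @array.
tie my @array, 'Tie::File', 'file.txt' or die "tie failed: $!";

print "line 4 is: $array[3]\n";    # reads only that record
$array[3] = 'a replacement line';  # reflected in the file immediately

untie @array;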



CAVEATS
=============
(That's Latin for 'warnings'.)

- Reasonable effort was made to make this module efficient. Nevertheless, changing the size of a record in the middle of a large file will always be fairly slow, because everything after the new
record must be moved.

-There is a large memory overhead for each record offset and for each cache entry: about 310 bytes per cached data record, and about 21 bytes per offset table entry.
The per-record overhead will limit the maximum number of records you can access per file. Note that accessing the length of the array via $x = scalar @tied_file accesses all records and stores their
offsets. The same for foreach (@tied_file), even if you exit the loop early.
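
Not quoted from the docs: the cache can be capped with the memory option, which takes a size
in bytes. A sketch with a placeholder file name:

use Tie::File;

# Limit the read/write cache to roughly 20 MB instead of the module's
# default (check the installed docs for the exact default value).
tie my @lines, 'Tie::File', 'huge.log', memory => 20_000_000
    or die "tie failed: $!";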


Deferred Writing
======================
(This is an advanced feature. Skip this section on first reading.)

Normally, modifying a Tie::File array writes to the underlying file immediately. Every assignment like $a[3] = ... rewrites as much of the file as is necessary; typically, everything from line 3
through the end will need to be rewritten. This is the simplest and most transparent behavior. Performance even for large files is reasonably good.

However, under some circumstances, this behavior may be excessively slow. For example, suppose you have a million-record file, and you want to do:

    for (@FILE) {
        $_ = "> $_";
    }

The first time through the loop, you will rewrite the entire file, from line 0 through the end. The second time through the loop, you will rewrite the entire file from line 1 through the end.
The third time through the loop, you will rewrite the entire file from line 2 to the end. And so on.

If the performance in such cases is unacceptable, you may defer the actual writing, and then have it done all at once. The following loop will perform much better for large files:

    (tied @a)->defer;
    for (@a) {
        $_ = "> $_";
    }
    (tied @a)->flush;

If Tie::File's memory limit is large enough, all the writing will be done in memory. Then, when you call ->flush, the entire file will be rewritten in a single pass.

(Actually, the preceding discussion is something of a fib. You don't need to enable deferred writing to get good performance for this common case, because Tie::File will do it for you automatically
unless you specifically tell it not to. See autodeferring, below.)
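
Not quoted from the docs: autodeferring can also be switched off explicitly with the autodefer
option if you do want every assignment flushed to disk at once. A sketch with a placeholder
file name:

use Tie::File;

# Disable automatic deferral so each assignment is written immediately,
# e.g. when another process is tailing the file between writes.
tie my @lines, 'Tie::File', 'notes.txt', autodefer => 0
    or die "tie failed: $!";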


dw_size
=================
(This is an advanced feature. Skip this section on first reading.)

If you use deferred writing (See Deferred Writing, below) then data you write into the array will not be written directly to the file; instead, it will be saved in the deferred write buffer to be
written out later. Data in the deferred write buffer is also charged against the memory limit you set with the memory option.

You may set the dw_size option to limit the amount of data that can be saved in the deferred write buffer. This limit may not exceed the total memory limit. For example, if you set dw_size to 1000 and
memory to 2500, that means that no more than 1000 bytes of deferred writes will be saved up. The space available for the read cache will vary, but it will always be at least 1500 bytes (if the
deferred write buffer is full) and it could grow as large as 2500 bytes (if the deferred write buffer is empty.)

If you don't specify a dw_size, it defaults to the entire memory limit.
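
Not quoted from the docs: putting the two options together, a sketch using the same figures as
the paragraph above, with a placeholder file name:

use Tie::File;

# A 2500-byte total memory limit, of which at most 1000 bytes may be
# held as deferred writes; the rest is available to the read cache.
tie my @records, 'Tie::File', 'example.dat',
    memory  => 2500,
    dw_size => 1000
    or die "tie failed: $!";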
 
