How to bypass Windows 'cooking' the I/O? (One more time, please) II

norseman

I know I saw the answer recently, as in since February '08, but I can't
re-find it. :( I tried the mail archives and such and my own
collections but the piece I saw still eludes me.


Problem: (sos=same old s...) Microsoft insists the world work its way
even when the Microsoft way was proven wrong decades ago. In this case
it's (still) 'cooking' the writes even with 'rwb' and O_RDWR|O_BINARY in
(proper respective) use.

Specific: python created and inspected binary file ends:
00460: 0D 1A (this is correct)

after a write
os.lseek(target, -1, 2)
os.write(target,record)
the expected result would be:
00460: 0D 20 .....data bytes.... 1A

BUT I get:
00460: 20 .... data bytes... 1A

It is one byte off!!! And the 0D has to be there. Signifies the end of
the header.
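
For reference, a minimal sketch of the operation described above, written for Python 3 (this thread predates it). It assumes the record is written over the trailing 0x1A and a fresh 0x1A is put back afterwards; the names overwrite_eof_marker and demo are made up for illustration, and O_BINARY is looked up with getattr because it exists only on Windows.

```python
import os
import tempfile

# os.O_BINARY is defined only on Windows; use 0 (no extra flag) elsewhere.
O_BINARY = getattr(os, "O_BINARY", 0)


def overwrite_eof_marker(path, record):
    """Write record on top of the final byte, then restore a trailing 0x1A."""
    fd = os.open(path, os.O_RDWR | O_BINARY)
    try:
        os.lseek(fd, -1, os.SEEK_END)   # position on the last byte (the 0x1A)
        os.write(fd, record + b"\x1a")  # record replaces it, new 0x1A follows
    finally:
        os.close(fd)


def demo():
    """Build a tiny file ending in 0D 1A, add one record, return the bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"HEADER\x0d\x1a")
        overwrite_eof_marker(path, b" data")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

With true binary I/O the 0D stays in place, so demo() should return b"HEADER\r data\x1a".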


Same python program runs as expected in Linux. Maybe because that's
where it was written?! :)


What I seek is the way to slap Microsoft up side the head and make it
work correctly. OK, well, at least in this situation.


Note: Things like this justify Python implementers bypassing OS calls
(data fetch, data write) and using the BIOS direct. Remember, the CPU
understands bit patterns only. It has no comprehension of 'text',
'program', 'number', 'pointer', blah blah blah.... All that is totally
beyond its understanding. A given bit pattern means 'do that'. The CPU
is 100% binary. Memory, storage and the rest is just bits-on, bits-off.
Patterns. Proper binary I/O is mandatory for the machine to function.


Anyway - if whoever mentioned the flags and such to 'over ride'
Microsoft's BS would re-send that piece I would be very appreciative.


Steve
(e-mail address removed)
==================================================================
Above is original request. I assume the answer I seek is by someone not
receiving the list currently. I'm on and off myself so I can understand.

I received two replies to the original request. Both insist on trying
to use TEXT modes to do BINARY work. Allow me to explain before hooting
and hollering that I'm all wet.

Yes - 'rwb' is a typo. The compiler catches it every time I use it.
READ as in read only,
WRITE as in write only
but NOT READ/WRITE in syntax when using STREAM I/O.
One is to use r+ for rw and w+ for wr and tack on the b to eliminate
the default of 'cooking' the data.
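
A small sketch of the difference between those two mode strings, in Python 3 syntax. The helper name demo_modes is hypothetical; the point is only that 'r+b' keeps the existing contents while 'w+b' truncates the file as soon as it is opened.

```python
import os
import tempfile


def demo_modes():
    """Return file contents after an r+b update and after a w+b open."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"0123456789")

        # r+b: read/write, binary, existing contents are kept.
        with open(path, "r+b") as f:
            f.seek(-1, os.SEEK_END)
            f.write(b"X")
        with open(path, "rb") as f:
            after_rplus = f.read()

        # w+b: read/write, binary, but the file is truncated on open.
        with open(path, "w+b") as f:
            after_wplus = f.read()

        return after_rplus, after_wplus
    finally:
        os.remove(path)
```

demo_modes() should return (b"012345678X", b""), which is why 'r+b' is the mode to use when the existing file has to survive.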

File.verb is defined as being --- well a STREAM I/O handler. The docs
insist on calling it a file descriptor somehow. Or at least that is the
way it reads. Larry's 'fp' (file pointer) is a correct short form.

Let's start with Larry:
> You may be the victim of buffering (not calling .flush() or .close()
> to commit your write to disk). Why aren't you using the file object
> to do your seek and write?

STREAM things take a great deal of overhead. I'm just reading,
re-arranging and writing. (supposedly) Simple buffer stuff.
>
> Normal file I/O sequence:
>
> fp = open(target, 'wb')
>
> fp.seek(-1, 2)
>
> fp.write(record)
>

Except it doesn't do that in Windows. See below.
> by going through os. methods instead of the file instance I think you
> are accessing the file through 2 different I/O buffers. I could be
> all wrong here.
Nope (on 2 diff...) and You are but I can see where you might think
as you do.



Tim is my next victim:
> >>> f = open('x.bat','r+b')
> >>> s = f.read()
> >>> s
> 'sed -e "s/[ \\t]*$//" -e "/^$/d" %1\rhow about that\r\n'
> >>> f.seek(-1,2)
> >>> f.write('xxx\r\n')
> >>> f.close()
> >>> f = open('x.bat','rb')
> >>> t = f.read()
> >>> t
> 'sed -e "s/[ \\t]*$//" -e "/^$/d" %1\rhow about that\rxxx\r\n'
> >>>

If you put that in a binary file, that file will never work again.
If you don't believe me, try it on ntldr and see what happens.
(Tim - please don't. Your Window$ system won't boot if you do.
Originally ALL STREAM I/O was supposed to 'cook' the stream for 'text'
transmission. It has evolved, BUT.... legacy tends to linger.)


This is the 'see below' place:

concerning seek and write:
let: 0123456789 be contents of a file. (offsets zero through nine)
File size will be reported as 10 bytes
If you seek to EOF and append ABC it will move to OFFSET 10 and put in
ABC with a result of 0123456789ABC and report a file size of 13.
With offsets it is simple math, quick and efficient.
So move filesize minus 3 shifts to byte 10 which has a content of 9
This is correct and this is what Python on Linux does.
(How many just got lost? Remember, byte 10 is offset 9)
(or offset plus 1 is byte count)
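
The offset arithmetic above can be checked with an in-memory buffer, which involves no operating system file layer at all. This is only a sketch of the expected byte positions (Python 3, hypothetical function name offsets_demo), not a test of what Windows does.

```python
import io
import os


def offsets_demo():
    """Walk through the 0123456789 example using an in-memory binary buffer."""
    buf = io.BytesIO(b"0123456789")

    # seek(-1, 2): one byte back from EOF, i.e. offset 9, the tenth byte.
    last_offset = buf.seek(-1, os.SEEK_END)
    last_byte = buf.read(1)

    # seek(0, 2): EOF itself, i.e. offset 10, which equals the file size.
    end_offset = buf.seek(0, os.SEEK_END)
    buf.write(b"ABC")

    return last_offset, last_byte, end_offset, buf.getvalue()
```

offsets_demo() returns (9, b"9", 10, b"0123456789ABC"): the last byte sits at offset 9, appending starts at offset 10, and the new size is 13.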

using ...seek(-1,2)
(be it: os.lseek, file.seek, whatever - don't get dumb on me)

What Microsoft does is go to EOF and IF it is a hex-1A preceded by a
hex-0D it then backs over the hex-0D also. If the last character of the
file is a hex-1A and no EOL precedes it, it starts next write on top of
hex-1A as it was told. Result: file is shifted left one byte.

using ...seek(0,2) (os.lseek, file.seek, whatever - staaayy....)

What Microsoft does is go to EOF and IF it is a hex-1A preceded by a
line terminator hex-0D it then backs over the hex-1A and starts writing
there. If the last character of the file is a hex-1A and no EOL precedes
it, it starts the next write right after the hex-1A, leaving the 1A in the file.
Result: file data shifts one byte right for each record appended having
a hex-1A terminator unless the 1A was preceded by a hex-0D. Fun Huh?

BOTH CASES occur whichever I/O system is chosen and used.
BOTH CASES are unacceptable behavior when the 'b' is added to the file
use mode. Once in binary there is supposed to be no bullshit.

In the mid 1960's I was updating code written in the early 1950's and my
0-9+ABC thing above was the rule then too. Over half a century of legacy.

Who's got the 'two by four' to pound Microsoft's nonsense switch into
the off position?



OK - the hooting and hollering that I'm all wet can start. :)


Steve
(e-mail address removed)
 
Iain King

        I wouldn't expect that sequence to work on any system... The "w"
implies "create new file, or truncate existing file to 0-bytes, then
write data to it" -- with no seeking permitted. You must include the "+"
to do seeking, and if you want to retain the existing file contents you
probably need to open with "a+" ("a" for append).

        The rest of your situation I won't touch. Other than to wonder why
the situation hasn't hit any of the various database servers which must
be operating in binary mode, and perform lots of seeking... Surely
somewhere out there someone else must have encountered a seek crossing an
apparent <cr><eof> mark (which isn't a normal Windows sequence anyway --
since Windows uses <cr><lf> for EOL, I'd have expected to see a problem
if backing over a <cr><lf><eof>)
--
        Wulfraed        Dennis Lee Bieber               KD6MOG
        (e-mail address removed)              (e-mail address removed)
                HTTP://wlfraed.home.netcom.com/
        (Bestiaria Support Staff:               (e-mail address removed))
                HTTP://www.bestiaria.com/


lol @ op not finding the answer to his question in the archives, then
being answered again by someone who doesn't let his answer go in the
archive. How useful.
 
N
norseman

Dennis said:
I wouldn't expect that sequence to work on any system... The "w"
implies "create new file, or truncate existing file to 0-bytes, then
write data to it" -- with no seeking permitted. You must include the "+"
to do seeking, and if you want to retain the existing file contents you
probably need to open with "a+" ("a" for append).

The rest of your situation I won't touch. Other than to wonder why
the situation hasn't hit any of the various database servers which must
be operating in binary mode, and perform lots of seeking... Surely
somewhere out there someone else must have encountered a seek crossing an
apparent <cr><eof> mark (which isn't a normal Windows sequence anyway --
since Windows uses <cr><lf> for EOL, I'd have expected to see a problem
if backing over a <cr><lf><eof>)
=============================================
"I wouldn't expect..." ABSOLUTELY CORRECT. No append because the hex-1A
has to be overwritten. (use r+b) There can be only one of those and it
has to be the last byte of the file. The hex-0D at the beginning of a 32
BYTE segment signifies end of structure definition. The hex-1A double
checks the record count. (standard Ashton-Tate dBASE file)
If someone wants to check it out, appending the hex-1A to each record
and backing up one byte on each write reduces coding complexity and
machine cycles considerably.
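
A rough sketch of that approach in Python 3, using the 'r+b' mode mentioned above. The names EOF_MARK, append_records and demo_append are invented for this example, the header bytes are placeholders rather than a real dBASE header, and the sketch deliberately leaves out updating the record count stored in the header.

```python
import os
import tempfile

EOF_MARK = b"\x1a"  # single end-of-file marker, always the last byte


def append_records(path, records):
    """Append each record with its own 0x1A, overwriting the previous one."""
    with open(path, "r+b") as f:
        for rec in records:
            f.seek(-1, os.SEEK_END)   # back up one byte, onto the old 0x1A
            f.write(rec + EOF_MARK)   # record lands there, new 0x1A follows


def demo_append():
    """Create a placeholder file ending in 0D 1A and append two records."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"HDR\x0d" + EOF_MARK)
        append_records(path, [b" AAA", b" BBB"])
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

demo_append() should return b"HDR\r AAA BBB\x1a": the 0D is untouched and exactly one 0x1A remains, as the last byte.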

"The rest of..." I have seen the answer posted but can't find it. I'm
hoping someone has it, sees this and posts the original solution again.
Or knows how to set things to bypass the nonsense and posts that.

Steve (e-mail address removed)
 
