python file API

Discussion in 'Python' started by zipher, Sep 24, 2012.

  1. zipher

    zipher Guest

    For some time now, I've wanted to suggest a better abstraction for the <file> type in Python. It currently uses an antiquated C-style interface for moving around in a file, with methods like tell() and seek(). But after attributes were introduced to Python, it seems it should be re-addressed.

    Let file-type have an attribute .pos for position. Now you can get rid of the seek() and tell() methods and manipulate the file pointer more easily with standard arithmetic operations.

    >>> file.pos = 0x0ae1 #move file pointer to an absolute address
    >>> file.pos +=1 #increment the file pointer one byte
    >>> curr_pos = file.pos #read current file pointer


    You've now simplified the API by the removal of two obscure legacy methods and replaced them with a more basic one called "position".

    Thoughts?

    markj
     
    zipher, Sep 24, 2012
    #1

  2. zipher

    Dave Angel Guest

    On 09/24/2012 05:35 PM, zipher wrote:
    > For some time now, I've wanted to suggest a better abstraction for the <file> type in Python. It currently uses an antiquated C-style interface for moving around in a file, with methods like tell() and seek(). But after attributes were introduced to Python, it seems it should be re-addressed.
    >
    > Let file-type have an attribute .pos for position. Now you can get rid of the seek() and tell() methods and manipulate the file pointer more easily with standard arithmetic operations.
    >
    >>>> file.pos = 0x0ae1 #move file pointer to an absolute address
    >>>> file.pos +=1 #increment the file pointer one byte
    >>>> curr_pos = file.pos #read current file pointer

    > You've now simplified the API by the removal of two obscure legacy methods and replaced them with a more basic one called "position".
    >
    > Thoughts?
    >
    > markj


    And what approach would you use for positioning relative to
    end-of-file? That's currently done with an optional second parameter to
    seek() method.




    --

    DaveA
     
    Dave Angel, Sep 24, 2012
    #2

  3. zipher

    Chris Kaynor Guest

    On Mon, Sep 24, 2012 at 2:49 PM, Dave Angel <> wrote:
    >
    > And what approach would you use for positioning relative to
    > end-of-file? That's currently done with an optional second parameter to
    > seek() method.
    >


    I'm not advocating for or against the idea, but that could be handled
    the same way indexing into lists can index relative to the end:
    negative indices.
     
    Chris Kaynor, Sep 24, 2012
    #3
  4. On Tue, Sep 25, 2012 at 7:49 AM, Dave Angel <> wrote:
    > On 09/24/2012 05:35 PM, zipher wrote:
    >> Let file-type have an attribute .pos for position. Now you can get rid of the seek() and tell() methods and manipulate the file pointer more easily with standard arithmetic operations.
    >>
    >>>>> file.pos = 0x0ae1 #move file pointer to an absolute address
    >>>>> file.pos +=1 #increment the file pointer one byte
    >>>>> curr_pos = file.pos #read current file pointer

    >
    > And what approach would you use for positioning relative to
    > end-of-file? That's currently done with an optional second parameter to
    > seek() method.


    Presumably the same way you reference a list element relative to
    end-of-list: negative numbers. However, this starts to feel like magic
    rather than attribute assignment - it's like manipulating the DOM in
    JavaScript, you set an attribute and stuff happens. Sure it's legal,
    but is it right? Also, it makes bounds checking awkward:

    file.pos = 42 # Okay, you're at position 42
    file.pos -= 10 # That should put you at position 32
    foo = file.pos # Presumably foo is the integer 32
    file.pos -= 100 # What should this do?
    foo -= 100 # But this sets foo to the integer -68
    file.pos = foo # And this would set the file pointer 68 bytes from end-of-file.

    I don't see it making sense for "file.pos -= 100" to suddenly put you
    near the end of the file; it should either cap and put you at position
    0, or do what file.seek(-100,1) would do and throw an exception. But
    doing the exact same operation on a saved snapshot of the position and
    reassigning it would then have quite different semantics in an unusual
    case, while still appearing identical in the normal case.
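
The asymmetry described above can be checked against today's seek(). A minimal sketch, assuming Python 3 and an unbuffered temporary file (both are my assumptions, not something from the thread): a relative seek that would land before the start of the file is rejected, while the same arithmetic on a saved integer silently goes negative.

```python
import os
import tempfile

# Unbuffered binary temp file, so seek() maps directly onto the OS call.
with tempfile.TemporaryFile(buffering=0) as f:
    f.write(b"x" * 64)
    f.seek(42)                     # absolute: position 42
    f.seek(-10, os.SEEK_CUR)       # relative: position 32
    foo = f.tell()                 # plain integer snapshot of the position

    try:
        f.seek(-100, os.SEEK_CUR)  # would land before the start of the file
        seek_failed = False
    except OSError:
        seek_failed = True

    foo -= 100                     # nothing stops the integer going negative
    pos_after = f.tell()           # the failed seek left the pointer alone
```

Whether a hypothetical file.pos should raise here, clamp to 0, or wrap to the end is exactly the design question being debated.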

    ChrisA
     
    Chris Angelico, Sep 24, 2012
    #4
  5. zipher

    Dave Angel Guest

    (forwarding to the list)

    On 09/24/2012 06:23 PM, Mark Adam wrote:
    > On Mon, Sep 24, 2012 at 4:49 PM, Dave Angel <> wrote:
    >> On 09/24/2012 05:35 PM, zipher wrote:
    >>> For some time now, I've wanted to suggest a better abstraction for the <file> type in Python. It currently uses an antiquated C-style interface for moving around in a file, with methods like tell() and seek(). But after attributes were introduced to Python, it seems it should be re-addressed.
    >>>
    >>> Let file-type have an attribute .pos for position. Now you can get rid of the seek() and tell() methods and manipulate the file pointer more easily with standard arithmetic operations.
    >>>
    >>>>>> file.pos = 0x0ae1 #move file pointer to an absolute address
    >>>>>> file.pos +=1 #increment the file pointer one byte
    >>>>>> curr_pos = file.pos #read current file pointer

    >
    >> And what approach would you use for positioning relative to
    >> end-of-file? That's currently done with an optional second parameter to
    >> seek() method.

    >
    > As size is an oft-useful construct, let it (like .name) be part of the
    > descriptor. Then
    >
    >>>> file.pos = file.size - 80 #80 chars from end-of-file

    >
    > (Or, one could make slices part of the API...)
    >
    > mark
    >


    Well, if one of the goals was to reduce the number of attributes, we're
    now back to the original number of them.
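
For reference, the size that the quoted suggestion relies on is already obtainable without a new attribute. A small sketch, assuming Python 3 and a temporary file of my own choosing, that positions 80 bytes before end-of-file using os.fstat():

```python
import os
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"a" * 200)
    f.flush()                            # make sure the OS sees all 200 bytes
    size = os.fstat(f.fileno()).st_size  # stands in for the proposed file.size
    f.seek(size - 80)                    # 80 bytes before end-of-file
    tail = f.read()
```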

    --

    DaveA
     
    Dave Angel, Sep 24, 2012
    #5
  6. zipher

    zipher Guest

    You raise a valid point: that by abstracting the file pointer into a position attribute you risk "de-coupling" the conceptual link between the underlying file and your abstraction in the python interpreter, but I think the programmer can take responsibility for maintaining the abstraction.

    The key possible fault will be whether you can trap (OS-level) exceptions when assigning to the pos attribute beyond the bounds of the actual file on the system...

    markj
     
    zipher, Sep 24, 2012
    #6
  8. zipher

    Ian Kelly Guest

    On Mon, Sep 24, 2012 at 4:14 PM, Chris Angelico <> wrote:
    > file.pos = 42 # Okay, you're at position 42
    > file.pos -= 10 # That should put you at position 32
    > foo = file.pos # Presumably foo is the integer 32
    > file.pos -= 100 # What should this do?


    Since ints are immutable, the language specifies that it should be the
    equivalent of "file.pos = file.pos - 100", so it should set the file
    pointer to 68 bytes before EOF.
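
The equivalence mentioned here is easy to observe. A sketch with a hypothetical Recorder class (not a real file; the name and the starting value 32 are mine) whose pos property logs each access: the augmented assignment performs one get followed by one set with the already-computed integer.

```python
class Recorder:
    """Hypothetical stand-in for a file object with a .pos property."""

    def __init__(self):
        self._pos = 32
        self.calls = []

    @property
    def pos(self):
        self.calls.append(("get", self._pos))
        return self._pos

    @pos.setter
    def pos(self, value):
        self.calls.append(("set", value))
        self._pos = value


r = Recorder()
r.pos -= 100   # getter returns the int 32, then the setter receives -68
```

Because the setter only ever sees -68, it cannot tell "32 minus 100" apart from a direct assignment of -68 as long as pos is a plain int.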

    > foo -= 100 # But this sets foo to the integer -68
    > file.pos = foo # And this would set the file pointer 68 bytes from end-of-file.


    Which is the same result.

    > I don't see it making sense for "file.pos -= 100" to suddenly put you
    > near the end of the file; it should either cap and put you at position
    > 0, or do what file.seek(-100,1) would do and throw an exception.


    I agree, but the language doesn't allow those semantics.

    Also, what about the use of `f.seek(0, os.SEEK_END)` to seek to EOF?
    I'm not certain what the use cases are, but a quick google reveals
    that this does happen in real code. If a pos of 0 means BOF, and a
    pos of -1 means 1 byte before EOF, then how do you seek to EOF without
    knowing the file length?
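
To make the EOF case concrete, here is a short sketch (Python 3, temporary file assumed) of the existing idiom. Offset 0 relative to SEEK_END is end-of-file itself, which a scheme where -1 means "one byte before EOF" has no integer left to express:

```python
import os
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"abcd\n")
    end = f.seek(0, os.SEEK_END)   # seek() returns the new absolute position
    f.seek(-1, os.SEEK_END)        # one byte before EOF
    last = f.read(1)
```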
     
    Ian Kelly, Sep 24, 2012
    #8
  9. On Tue, Sep 25, 2012 at 8:37 AM, Ian Kelly <> wrote:
    > On Mon, Sep 24, 2012 at 4:14 PM, Chris Angelico <> wrote:
    >> file.pos = 42 # Okay, you're at position 42
    >> file.pos -= 10 # That should put you at position 32
    >> foo = file.pos # Presumably foo is the integer 32
    >> file.pos -= 100 # What should this do?

    >
    > Since ints are immutable, the language specifies that it should be the
    > equivalent of "file.pos = file.pos - 100", so it should set the file
    > pointer to 68 bytes before EOF.


    Oh, I forgot that guaranteed equivalency. Well, at least it removes
    the ambiguity. I don't like it though.

    ChrisA
     
    Chris Angelico, Sep 24, 2012
    #9
  10. On 24/09/2012 22:35, zipher wrote:
    > For some time now, I've wanted to suggest a better abstraction for the <file> type in Python. It currently uses an antiquated C-style interface for moving around in a file, with methods like tell() and seek(). But after attributes were introduced to Python, it seems it should be re-addressed.
    >
    > Let file-type have an attribute .pos for position. Now you can get rid of the seek() and tell() methods and manipulate the file pointer more easily with standard arithmetic operations.
    >
    >>>> file.pos = 0x0ae1 #move file pointer to an absolute address
    >>>> file.pos +=1 #increment the file pointer one byte
    >>>> curr_pos = file.pos #read current file pointer

    >
    > You've now simplified the API by the removal of two obscure legacy methods and replaced them with a more basic one called "position".
    >
    > Thoughts?
    >
    > markj
    >


    This strikes me as being a case of if it ain't broke don't fix it.

    --
    Cheers.

    Mark Lawrence.
     
    Mark Lawrence, Sep 25, 2012
    #10
  11. zipher

    Chris Kaynor Guest

    On Mon, Sep 24, 2012 at 3:37 PM, Ian Kelly <> wrote:
    > On Mon, Sep 24, 2012 at 4:14 PM, Chris Angelico <> wrote:
    >> file.pos = 42 # Okay, you're at position 42
    >> file.pos -= 10 # That should put you at position 32
    >> foo = file.pos # Presumably foo is the integer 32
    >> file.pos -= 100 # What should this do?

    >
    > Since ints are immutable, the language specifies that it should be the
    > equivalent of "file.pos = file.pos - 100", so it should set the file
    > pointer to 68 bytes before EOF.


    There is no reason that it has to be an int object, however. It could
    well return a "FilePosition" object which does not allow subtraction
    to produce a negative result. Not saying it's a good idea... Similarly,
    it could be a more complex object with properties on it to determine
    whether to seek from beginning or end.
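
One way such an object could look, purely as an illustration: the FilePosition name comes from the post above, but the int-subclass design and the choice of ValueError are my assumptions. Subtraction that would go below zero raises instead of producing a negative position.

```python
class FilePosition(int):
    """Hypothetical non-negative position type; arithmetic is bounds-checked."""

    def __new__(cls, value):
        if value < 0:
            raise ValueError("file position cannot be negative")
        return super().__new__(cls, value)

    def __add__(self, other):
        return FilePosition(int(self) + other)

    def __sub__(self, other):
        return FilePosition(int(self) - other)


pos = FilePosition(32)
pos -= 10                  # falls back to __sub__, still a FilePosition

try:
    pos -= 100             # 22 - 100 would be negative
    rejected = False
except ValueError:
    rejected = True
```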
     
    Chris Kaynor, Sep 25, 2012
    #11
  12. On Tue, 25 Sep 2012 08:14:01 +1000, Chris Angelico wrote:

    > Presumably the same way you reference a list element relative to
    > end-of-list: negative numbers. However, this starts to feel like magic
    > rather than attribute assignment - it's like manipulating the DOM in
    > JavaScript, you set an attribute and stuff happens. Sure it's legal, but
    > is it right? Also, it makes bounds checking awkward:
    >
    > file.pos = 42 # Okay, you're at position 42
    > file.pos -= 10 # That should put you at position 32
    > foo = file.pos # Presumably foo is the integer 32
    > file.pos -= 100 # What should this do?
    > foo -= 100 # But this sets foo to the integer -68
    > file.pos = foo # And this would set the file pointer 68 bytes from end-of-file.
    >
    > I don't see it making sense for "file.pos -= 100" to suddenly put you
    > near the end of the file; it should either cap and put you at position
    > 0, or do what file.seek(-100,1) would do and throw an exception.


    I would expect it to throw an exception, like file.seek and like list
    indexing.

    > But
    > doing the exact same operation on a saved snapshot of the position and
    > reassigning it would then have quite different semantics in an unusual
    > case, while still appearing identical in the normal case.


    But this applies equally to file.seek and list indexing today. In neither
    case can you perform your own index operations outside of the file/list
    and expect to get the same result, for the simple and obvious reason that
    arithmetic doesn't perform the same bounds checking as actual seeking and
    indexing.
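
The list half of that comparison, sketched in Python 3 with made-up values: arithmetic on a detached index is unchecked and can silently wrap to the other end, whereas the indexing operation itself still rejects a truly out-of-range index.

```python
letters = list("abcde")

i = 2
i -= 4                      # plain int arithmetic: i is now -2, no error
wrapped = letters[i]        # silently indexes from the end

try:
    letters[-10]            # indexing itself is bounds-checked
    out_of_range = False
except IndexError:
    out_of_range = True
```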

    --
    Steven
     
    Steven D'Aprano, Sep 25, 2012
    #12
  13. On Mon, 24 Sep 2012 15:36:20 -0700, zipher wrote:

    > You raise a valid point: that by abstracting the file pointer into a
    > position attribute you risk "de-coupling" the conceptual link between
    > the underlying file and your abstraction in the python interpreter


    I don't think this argument holds water. With the ease of writing
    attributes, it is more likely that people will perform file position
    operations directly on file.pos rather than decoupling it into a
    variable. Decoupling is more likely with file.seek, because it is so much
    more verbose to use, and you get exactly the same lack of bounds checking:

    py> f = open("junk", "w") # make a sample file
    py> f.write("abcd\n")
    py> f.close()
    py> f = open("junk") # now do decoupled seek operations
    py> p = f.tell()
    py> p += 2000
    py> p -= 4000
    py> p += 2
    py> p += 2000
    py> f.seek(p)
    py> f.read(1)
    'c'


    But really, who does such a sequence of arithmetic operations on the file
    pointer without intervening reads or writes? We're arguing about
    something that almost never happens.

    By the way, the implementation of this is probably trivial in Python 2.x.
    Untested:

    class MyFile(file):
        @property
        def pos(self):
            return self.tell()
        @pos.setter
        def pos(self, p):
            if p < 0:
                self.seek(p, 2)
            else:
                self.seek(p)

    You could even use a magic sentinel to mean "seek to EOF", say, None.

        if p is None:
            self.seek(0, 2)

    although I don't know if I like that.
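
The snippet above targets Python 2, where the file builtin exists. A rough Python 3 adaptation, assuming io.FileIO as the base class (the PosFile name and the temporary-file setup are mine), with the None sentinel folded in:

```python
import io
import os
import tempfile


class PosFile(io.FileIO):
    """Sketch only: .pos as a thin wrapper around tell()/seek()."""

    @property
    def pos(self):
        return self.tell()

    @pos.setter
    def pos(self, p):
        if p is None:                  # sentinel: seek to EOF
            self.seek(0, os.SEEK_END)
        elif p < 0:                    # negative: relative to EOF
            self.seek(p, os.SEEK_END)
        else:                          # non-negative: absolute
            self.seek(p)


fd, path = tempfile.mkstemp()
os.close(fd)
with PosFile(path, "w+") as f:
    f.write(b"abcd\n")
    f.pos = 2
    third = f.read(1)
    f.pos = -1
    last = f.read(1)
    f.pos = None
    at_end = f.pos
    f.pos += 10                        # get 5, then set 15: beyond EOF
    beyond = f.pos
os.remove(path)
```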



    --
    Steven
     
    Steven D'Aprano, Sep 25, 2012
    #13
  14. zipher

    Mark Adam Guest

    On Mon, Sep 24, 2012 at 5:55 PM, Oscar Benjamin
    <> wrote:
    > There are many situations where a little bit of attribute access magic is a
    > good thing. However, operations that involve the underlying OS and that are
    > prone to raising exceptions even in bug free code should not be performed
    > implicitly like this. I find the following a little cryptic:
    > try:
    >     f.pos = 256
    > except IOError:
    >     print('Unseekable file')


    Well it might be that the coupling between the python interpreter and
    the operating system should be more direct and there should be a
    special exception class that bypasses the normal overhead in the
    CPython implementation so that error can be caught in the code without
    breaking syntax. But I don't think I'm ready to argue that point....

    markj
     
    Mark Adam, Sep 25, 2012
    #14
  15. On 25.09.2012 04:28, Steven D'Aprano wrote:

    > By the way, the implementation of this is probably trivial in Python 2.x.
    > Untested:
    >
    > class MyFile(file):
    >     @property
    >     def pos(self):
    >         return self.tell()
    >     @pos.setter
    >     def pos(self, p):
    >         if p < 0:
    >             self.seek(p, 2)
    >         else:
    >             self.seek(p)
    >
    > You could even use a magic sentinel to mean "seek to EOF", say, None.
    >
    >     if p is None:
    >         self.seek(0, 2)
    >
    > although I don't know if I like that.


    The concept is incomplete in one respect: self.seek(10, 2) seeks
    beyond EOF, potentially creating a sparse file. That is something the
    pos setter above cannot do.
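
What seeking past EOF does with the existing API, as a small Python 3 sketch on a temporary file (my setup, not from the thread). Whether the gap is actually stored sparsely depends on the filesystem; reading it back yields zero bytes either way.

```python
import os
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"abcd\n")
    f.seek(10, os.SEEK_END)   # 10 bytes past the current end of file
    beyond = f.tell()
    f.write(b"x")             # writing here extends the file across the gap
    f.seek(0)
    data = f.read()
```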

    But the idea is great. I'd suggest adding another property:

    [...]
        @pos.setter
        def pos(self, p):
            self.seek(p)
        @property
        def eofpos(self): # to be consistent
            return self.tell()
        @eofpos.setter
        def eofpos(self, p):
            self.seek(p, 2)

    Another option could be a special descriptor which can be used as well
    for relative seeking:

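    class FilePositionDesc(object):
        def __init__(self):
            pass
        def __get__(self, instance, owner):
            return FilePosition(instance) # wrap the file, not the descriptor
        def __set__(self, instance, value):
            if isinstance(value, FilePosition):
                return # "f.pos += n" has already moved the pointer
            instance.seek(value)

    class FilePosition(object):
        def __init__(self, file):
            self.file = file
        def __iadd__(self, offset):
            self.file.seek(offset, 1)
            return self # augmented assignment rebinds to the returned object
        def __isub__(self, offset):
            self.file.seek(-offset, 1)
            return self

    class MyFile(file):
        pos = FilePositionDesc()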
    [...]

    Stop.

    This could be handled with a property as well.

    Besides, this breaks some other expectations of pos. So let's
    introduce a third property named relpos:

    class FilePosition(object):
        def __init__(self, file):
            self.file = file
            self.seekoffset = 0
        def __iadd__(self, offset):
            self.seekoffset += offset
            return self # augmented assignment rebinds to the returned object
        def __isub__(self, offset):
            self.seekoffset -= offset
            return self
        def __int__(self):
            return self.file.tell() + self.seekoffset

    class MyFile(file):
        @property
        def relpos(self):
            return FilePosition(self) # from above
        @relpos.setter
        def relpos(self, ofs):
            try:
                o = ofs.seekoffset # is it a FilePosition?
            except AttributeError:
                self.seek(ofs, 1) # no, but ofs can be an int as well
            else:
                self.seek(o, 1) # yes, it is


    Thomas
     
    Thomas Rachel, Sep 25, 2012
    #15
  16. On 25.09.2012 00:37, Ian Kelly wrote:
    > On Mon, Sep 24, 2012 at 4:14 PM, Chris Angelico<> wrote:
    >> file.pos = 42 # Okay, you're at position 42
    >> file.pos -= 10 # That should put you at position 32
    >> foo = file.pos # Presumably foo is the integer 32
    >> file.pos -= 100 # What should this do?

    >
    > Since ints are immutable, the language specifies that it should be the
    > equivalent of "file.pos = file.pos - 100", so it should set the file
    > pointer to 68 bytes before EOF.


    But this is not a "real int", it has a special use. So I don't think it
    is absolutely required to behave like an int.

    This reminds me of some special purpose registers in embedded
    programming, where bits can only be set by hardware and are cleared by
    the application by writing 1 to them.

    Or some bit setting registers, like on ATxmega: OUT = 0x10 sets bit 7
    and clears all others, OUTSET = 0x10 only sets bit 7, OUTTGL = 0x10
    toggles it and OUTCLR = 0x10 clears it.

    If this behaviour is documented properly enough, it is quite OK, IMHO.


    Thomas
     
    Thomas Rachel, Sep 25, 2012
    #16
  17. On 24.09.2012 23:49, Dave Angel wrote:
    > And what approach would you use for positioning relative to
    > end-of-file? That's currently done with an optional second
    > parameter to seek() method.


    Negative indices.

    ;)

    Uli
     
    Ulrich Eckhardt, Sep 25, 2012
    #17
  18. On 25/09/2012 03:32, Mark Adam wrote:
    > On Mon, Sep 24, 2012 at 5:55 PM, Oscar Benjamin
    > <> wrote:
    >> There are many situations where a little bit of attribute access magic is a
    >> good thing. However, operations that involve the underlying OS and that are
    >> prone to raising exceptions even in bug free code should not be performed
    >> implicitly like this. I find the following a little cryptic:
    >> try:
    >>     f.pos = 256
    >> except IOError:
    >>     print('Unseekable file')

    >
    > Well it might be that the coupling between the python interpreter and
    > the operating system should be more direct and there should be a
    > special exception class that bypasses the normal overhead in the
    > CPython implementation so that error can be caught in the code without
    > breaking syntax. But I don't think I'm ready to argue that point....
    >
    > markj
    >


    Something along these lines
    http://docs.python.org/dev/whatsnew/3.3.html#pep-3151-reworking-the-os-and-io-exception-hierarchy
    ?
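
For context on that link: since Python 3.3, IOError is simply another name for OSError, so one except clause covers both spellings. A sketch, assuming a POSIX-style pipe as the unseekable stream (my choice of example; behaviour on other platforms may differ):

```python
import os

# The read end of a pipe is a convenient unseekable file object.
read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb", buffering=0) as reader:
    try:
        reader.seek(256)
        unseekable = False
    except OSError:            # IOError names the same class on Python 3.3+
        unseekable = True
os.close(write_fd)

same_class = IOError is OSError
```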

    --
    Cheers.

    Mark Lawrence.
     
    Mark Lawrence, Sep 25, 2012
    #18
  19. On Tue, 25 Sep 2012 07:25:48 +0200, Thomas Rachel wrote:

    > On 25.09.2012 04:28, Steven D'Aprano wrote:
    >
    >> By the way, the implementation of this is probably trivial in Python
    >> 2.x. Untested:
    >>
    >> class MyFile(file):
    >>     @property
    >>     def pos(self):
    >>         return self.tell()
    >>     @pos.setter
    >>     def pos(self, p):
    >>         if p < 0:
    >>             self.seek(p, 2)
    >>         else:
    >>             self.seek(p)
    >>
    >> You could even use a magic sentinel to mean "seek to EOF", say, None.
    >>
    >>     if p is None:
    >>         self.seek(0, 2)
    >>
    >> although I don't know if I like that.

    >
    > The whole concept is incomplete at one place: self.seek(10, 2) seeks
    > beyond EOF, potentially creating a sparse file. This is a thing you
    > cannot achieve.


    On the contrary, since the pos attribute is just a wrapper around seek,
    you can seek beyond EOF easily:

    f.pos = None
    f.pos += 10

    But for anything but the most trivial usage, I would recommend sticking
    to the seek method.

    The problem with this idea is that the seek method takes up to three
    arguments (the file being operated on, the position, and the mode), and
    attribute syntax can only take two (the file and the position, e.g.
    file.pos = position). So either there are cases that file.pos cannot
    handle (and so we need to keep tell/seek around, which leaves file.pos
    redundant), or we need multiple attributes (one for each mode), or we
    build a complicated, inconvenient API using special data types instead of
    plain integers.

    So all up, I'm -1 on trying to replace the tell/seek API, and -0 on
    adding a second, redundant API.

    Wait, there is another alternative: tuple arguments:

    f.pos = (where, whence)

    being the equivalent to seek(where, whence). At this point you just save
    two characters "f.pos=a,b" vs "f.seek(a,b)" so it simply isn't worth it
    for such a trivial benefit.
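
A sketch of what the tuple form might look like, assuming Python 3 and an in-memory io.BytesIO base so it runs without touching disk (the TuplePosFile name is hypothetical):

```python
import io
import os


class TuplePosFile(io.BytesIO):
    """Sketch only: .pos accepts either an int or a (where, whence) tuple."""

    @property
    def pos(self):
        return self.tell()

    @pos.setter
    def pos(self, value):
        if isinstance(value, tuple):
            where, whence = value
            self.seek(where, whence)
        else:
            self.seek(value)


f = TuplePosFile(b"abcd\n")
f.pos = -1, os.SEEK_END       # same as f.seek(-1, os.SEEK_END)
last = f.read(1)
f.pos = 1                     # plain absolute position
f.pos = 2, os.SEEK_CUR        # relative: 1 + 2
fourth = f.read(1)
```

The setter ends up as a second spelling of seek(), which is the redundancy the post objects to.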


    --
    Steven
     
    Steven D'Aprano, Sep 25, 2012
    #19
  20. On Tue, 25 Sep 2012 07:32:31 +0200, Thomas Rachel
    <>
    declaimed the following in gmane.comp.python.general:

    > Or some bit setting registers, like on ATxmega: OUT = 0x10 sets bit 7
    > and clears all others, OUTSET = 0x10 only sets bit 7, OUTTGL = 0x10
    > toggles it and OUTCLR = 0x10 clears it.
    >
    > If this behaviour is documented properly enough, it is quite OK, IMHO.
    >

    I don't think I'd want to work with any device where 0x10 (00010000
    binary) modifies bit SEVEN. 0x40, OTOH, would fit my mental impression
    of bit 7.

    It doesn't even fit my mind if the value is supposed to be the /bit
    number/ unless the device considers "bit 7" to be the EIGHTH bit (that
    is, the LSB is considered bit 1, not bit 0)
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, Sep 25, 2012
    #20
