FileString - request for comments

Discussion in 'Ruby' started by apeiros@gmx.net, Nov 9, 2009.

  1. Guest

    Hi there

    I just put FileString on github: http://github.com/apeiros/filestring
    FileString is a class that wraps a path on the filesystem (a file) and provides an exact copy of the String API. This means you can code as if you had a String and your file on the disk gets manipulated just "magically".

    The library is very young (just a bit more than 24h), so please use with care.

    I'd appreciate any kind of comment.

    Regards
    Stefan
     
    , Nov 9, 2009
    #1

  2. On Nov 8, 2009, at 7:47 PM, wrote:

    > I just put FileString on github: http://github.com/apeiros/filestring
    > FileString is a class that wraps a path on the filesystem (a file)
    > and provides an exact copy of the String API. This means you can
    > code as if you had a String and your file on the disk gets
    > manipulated just "magically".


    Interesting choice to use a String. I used Tie::File a couple of
    times in Perl code. It works as an Array instead:

    http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm

    James Edward Gray II
     
    James Edward Gray II, Nov 9, 2009
    #2

  3. James Edward Gray II wrote:
    > On Nov 8, 2009, at 7:47 PM, wrote:
    >
    >> I just put FileString on github: http://github.com/apeiros/filestring
    >> FileString is a class that wraps a path on the filesystem (a file) and
    >> provides an exact copy of the String API. This means you can code as
    >> if you had a String and your file on the disk gets manipulated just
    >> "magically".

    >
    > Interesting choice to use a String. I used Tie::File a couple of times
    > in Perl code. It works as an Array instead:
    >
    > http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm
    >
    > James Edward Gray II


    What would the advantage over mmap[1] be? FileString is pure ruby
    (right?) and hence more portable, but probably mmap is much more
    efficient? Any other tradeoffs?

    [1] http://moulon.inra.fr/ruby/mmap.html; looks like this project of Guy
    Decoux's has been recently adopted by knu: http://github.com/knu/ruby-mmap.

    --
    vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
     
    Joel VanderWerf, Nov 9, 2009
    #3
  4. Guest

    Hi Joel

    > What would the advantage over mmap[1] be? FileString is pure ruby
    > (right?) and hence more portable, but probably mmap is much more
    > efficient? Any other tradeoffs?


    Interesting, I was looking if a solution existed already and didn't find mmap. Yes, FileString is pure ruby and should therefore run on all ruby implementations. And yes, I'd expect mmap to be more efficient on the other hand. It'd be interesting to combine the two (if that's at all possible).
    In a quick test it seems FileString is more complete too, e.g. Mmap doesn't have #replace (should be trivial to add). But Mmap has the feature to only tie a part of the file.

    > http://github.com/knu/ruby-mmap.

    Thanks for the link

    Regards
    Stefan
     
    , Nov 9, 2009
    #4
  5. Guest

    -------- Original Message --------
    > Date: Mon, 9 Nov 2009 12:37:17 +0900
    > From: James Edward Gray II <>
    > To:
    > Subject: Re: FileString - request for comments


    > On Nov 8, 2009, at 7:47 PM, wrote:
    >
    > > I just put FileString on github: http://github.com/apeiros/filestring
    > > FileString is a class that wraps a path on the filesystem (a file)
    > > and provides an exact copy of the String API. This means you can
    > > code as if you had a String and your file on the disk gets
    > > manipulated just "magically".

    >
    > Interesting choice to use a String. I used Tie::File a couple of
    > times in Perl code. It works as an Array instead:
    >
    > http://search.cpan.org/~mjd/Tie-File-0.96/lib/Tie/File.pm
    >
    > James Edward Gray II


    Somebody I know already implemented a TieFile in ruby, the repository is at http://killerfox.protection-fault.ch/gitrepo/tie_file.git

    Personally, I don't tend to think of a file as an array. I'd use Tie::File if I needed a persistent array, so there the problem comes from "the other way round". With FileString I explicitly want to deal with a file, but not through an IO-like API (of course you could approach it as "I need a persistent String" too, but that wasn't and isn't the case for me).

    Regards
    Stefan
     
    , Nov 9, 2009
    #5
  6. On 9 Nov 2009, at 13:54, wrote:
    > Hi Joel
    >
    >> What would the advantage over mmap[1] be? FileString is pure ruby
    >> (right?) and hence more portable, but probably mmap is much more
    >> efficient? Any other tradeoffs?

    >
    > Interesting, I was looking if a solution existed already and didn't
    > find mmap. Yes, FileString is pure ruby and should therefore run on
    > all ruby implementations. And yes, I'd expect mmap to be more
    > efficient on the other hand. It'd be interesting to combine the two
    > (if that's at all possible).
    > In a quick test it seems FileString is more complete too, e.g. Mmap
    > doesn't have #replace (should be trivial to add). But Mmap has the
    > feature to only tie a part of the file.


    It would probably be fairly trivial for you to directly support mmap
    at the OS level using Ruby/DL, Ruby-FFI or even syscall (although
    that's ugly and fragile). Take a look at some of my Plumber's Guide
    presentations at the link in my signature and also at http://kenai.com/projects/ruby-ffi
    for details of how to wrap these kinds of system calls such that
    they'll run identically on JRuby, Rubinius and MRI.


    Ellie

    Eleanor McHugh
    Games With Brains
    http://slides.games-with-brains.net
    ----
    raise ArgumentError unless @reality.responds_to? :reason
     
    Eleanor McHugh, Nov 9, 2009
    #6
  7. 2009/11/9 Eleanor McHugh <>:
    > On 9 Nov 2009, at 13:54, wrote:


    > It would probably be fairly trivial for you to directly support mmap at the
    > OS level using Ruby/DL, Ruby-FFI or even syscall (although that's ugly and
    > fragile).


    I am still trying to wrap my head around the question whether hiding
    file IO behind a String API is a good idea. Basically the reason to
    create something like this is to be able to use a file in places which
    expect to be given a String instance. However, code that uses String
    assumes fast access to arbitrary portions of the string. When those
    accesses are translated into random accesses to a file performance
    _might_ suffer dramatically. Put differently: hiding the fact that we
    are dealing with a file is convenient but may actually break your
    neck. And although at a certain level of abstraction a file and a
    String are pretty much the same (sequence of chars / bytes) it may
    actually be a good thing to keep the API separate in order to treat
    both appropriately. Stefan, what's your experience?

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Nov 9, 2009
    #7
  8. Robert,

    RK> I am still trying to wrap my head around the question whether hiding
    RK> file IO behind a String API is a good idea.

    As the PickAxe book points out, by having file i/o represented by a
    String ... that is, making it irrelevant whether one is talking to a
    String or a File ... makes for some nice unit testing.
     
    Ralph Shnelvar, Nov 9, 2009
    #8
  9. Guest

    -------- Original Message --------
    > Date: Tue, 10 Nov 2009 00:28:56 +0900
    > From: Robert Klemme <>
    > To:
    > Subject: Re: FileString - request for comments
    >
    > I am still trying to wrap my head around the question whether hiding
    > file IO behind a String API is a good idea. Basically the reason to
    > create something like this is to be able to use a file in places which
    > expect to be given a String instance.


    No. At least that was not the idea (though, you could).
    The reason is that e.g. replacing a part of a file is cumbersome.
    Compare:

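    # IO API:
    File.open(path, "r+b") do |fh|
      fh.seek(offset + length)
      rest = fh.read            # everything after the part being replaced
      fh.seek(offset)
      fh.write(replacement)
      fh.write(rest)
      fh.truncate(fh.pos)       # drop leftover bytes if the file got shorter
    end

    # String API:
    fs = FileString.new(path)
    fs[offset, length] = replacement # done!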

    Imagine how much more inconvenient it becomes when it's not offset & length but a Range, or when you have to accommodate negative offsets etc.
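
    To make the Range and negative-offset point concrete, here is a minimal sketch of the index normalization that has to happen before any seek. The helper name normalize_slice is made up for illustration and is not taken from FileString's source; it assumes a bounded Range and, for the integer form, an explicit length.

```ruby
# Hypothetical helper (not FileString's code): turn a String-style index
# (offset + length, or a bounded Range, either possibly negative) into an
# absolute [offset, length] pair for a file of `size` bytes.
def normalize_slice(size, index, length = nil)
  if index.is_a?(Range)
    first = index.begin
    last  = index.end
    first += size if first < 0          # negative values count from the end
    last  += size if last < 0
    last  -= 1 if index.exclude_end?    # a...b stops one before b
    raise IndexError, "range out of bounds" if first < 0 || first > size
    count = [[last, size - 1].min - first + 1, 0].max
    [first, count]
  else
    offset = index < 0 ? index + size : index
    raise IndexError, "offset out of bounds" if offset < 0 || offset > size
    # Clamp the length so it never runs past the end of the file.
    [offset, [length, size - offset].min]
  end
end
```

    The resulting pair can be fed straight into the seek/read/write sequence of the IO version, e.g. offset, length = normalize_slice(File.size(path), -4..-1).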

    And there are other examples, just dive a bit in FileString's source :)

    The String API is *far* more convenient.

    > However, code that uses String
    > assumes fast access to arbitrary portions of the string. When those
    > accesses are translated into random accesses to a file performance
    > _might_ suffer dramatically.


    Yes. If you get that kind of problem - you can always use File.read instead of FileString#to_s (or to_str).

    > Put differently: hiding the fact that we
    > are dealing with a file is convenient but may actually break your
    > neck.


    As with all high-level things. If you don't know the things you're dealing with, you can easily kill performance. Consider e.g. ary.any? { |obj| other.include?(obj) } - there, you just accidentally created an O(n^2) algorithm. It can happen anywhere and it can look totally innocent.
    That's not a problem that's specific to FileString but to everything that's abstract.
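
    A small sketch of that accidental O(n^2) case next to a roughly linear variant. The sample arrays are invented for illustration; Set comes from Ruby's standard library.

```ruby
require "set"

ary   = [1, 2, 3, 4]
other = [10, 20, 3]

# Quadratic: Array#include? scans `other` once per element of `ary`.
slow = ary.any? { |obj| other.include?(obj) }

# Roughly linear: build a Set once, then each lookup is a hash probe.
lookup = other.to_set
fast = ary.any? { |obj| lookup.include?(obj) }
```

    Both expressions give the same answer; only the amount of work differs as the arrays grow.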

    > And although at a certain level of abstraction a file and a
    > String are pretty much the same (sequence of chars / bytes) it may
    > actually be a good thing to keep the API separate in order to treat
    > both appropriately. Stefan, what's your experience?


    As you see, I disagree :)
    However, what you say is of course correct. Using FileString means you have to keep in mind that you're dealing with a file.
    But: if you know you're dealing with a file, it can even help you making things faster. For example, if you indeed want to compare two files for equality, FileString#== will be faster and less memory intensive than you doing File.read(a) == File.read(b) if the two files are big.
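
    For illustration, a chunked comparison along those lines might look like the sketch below. This is an assumption about how such a comparison can be done, not FileString's actual #== implementation; the name files_equal? and the 64 KiB chunk size are made up.

```ruby
# Hypothetical helper, not FileString's implementation: compare two files
# without ever holding more than one chunk of each in memory.
def files_equal?(path_a, path_b, chunk_size = 64 * 1024)
  # Files of different size can never be equal, and this needs no reads at all.
  return false unless File.size(path_a) == File.size(path_b)

  File.open(path_a, "rb") do |a|
    File.open(path_b, "rb") do |b|
      # IO#read(n) returns nil at end of file, which ends the loop.
      while (chunk_a = a.read(chunk_size))
        # Stop at the first differing chunk; the blocks still close both files.
        return false unless chunk_a == b.read(chunk_size)
      end
    end
  end
  true
end
```

    Compared with File.read(a) == File.read(b), this can bail out early on a size mismatch or on the first differing chunk.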

    > Kind regards
    >
    > robert


    Thanks for your thoughts robert, much appreciated

    regards
    Stefan
     
    , Nov 9, 2009
    #9
  10. On 9 Nov 2009, at 16:49, Ralph Shnelvar wrote:
    > Robert,
    >
    > RK> I am still trying to wrap my head around the question whether
    > hiding
    > RK> file IO behind a String API is a good idea.
    >
    > As the PickAxe book points out, by having file i/o represented by a
    > String ... that is, making it irrelevant whether one is talking to a
    > String or a File ... makes for some nice unit testing.


    Using a given representation just because it's unit testing friendly
    isn't necessarily a good idea...


    Ellie

    Eleanor McHugh
    Games With Brains
    http://slides.games-with-brains.net
    ----
    raise ArgumentError unless @reality.responds_to? :reason
     
    Eleanor McHugh, Nov 9, 2009
    #10
  11. Eleanor McHugh wrote:
    [...]
    > Using a given representation just because it's unit testing friendly
    > isn't necessarily a good idea...


    ...or necessarily a bad idea. There's something to be said for letting
    architecture emerge from testability.

    >
    >
    > Ellie
    >
    > Eleanor McHugh
    > Games With Brains
    > http://slides.games-with-brains.net
    > ----
    > raise ArgumentError unless @reality.responds_to? :reason


    Best,
    --
    Marnen Laibow-Koser
    http://www.marnen.org

    --
    Posted via http://www.ruby-forum.com/.
     
    Marnen Laibow-Koser, Nov 9, 2009
    #11
  12. On 09.11.2009 17:29, wrote:
    > -------- Original Message --------
    >> From: Robert Klemme <>

    > As all highlevel things. If you don't know the things you're dealing with you can easily kill performance. Consider e.g. ary.any? { |obj| other.include?(obj) } - there, just accidentally created an O(n^2) algorithm. It can happen everywhere and it can look totally innocent.
    > That's not a problem that's specific to FileString but to everything that's abstract.


    True.

    >> And although at a certain level of abstraction a file and a
    >> String are pretty much the same (sequence of chars / bytes) it may
    >> actually be a good thing to keep the API separate in order to treat
    >> both appropriately. Stefan, what's your experience?

    >
    > As you see, I disagree :)


    > However, what you say is of course correct. Using FileString means
    > you have to keep in mind that you're dealing with a file.
    > But: if you know you're dealing with a file, it can even help you
    > making things faster. For example, if you indeed want to compare two
    > files for equality, FileString#== will be faster and less memory
    > intensive than you doing File.read(a) == File.read(b) if the two
    > files are big.

    A good point! You're probably right and I was too pessimistic. I'd
    love to see

    fs[/foo(\w+)/, 1] = "bar"
    fs.gsub! /foo/, "bar"

    etc. because those would be the ones that would make FileString
    convenient for me. :)

    > Thanks for your thoughts robert, much appreciated


    Thanks for listening and sharing!

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Nov 9, 2009
    #12
  13. Guest

    -------- Original Message --------
    > Date: Tue, 10 Nov 2009 06:25:08 +0900
    > From: Robert Klemme <>
    > Subject: Re: FileString - request for comments


    > A good point! You're probably right and I was too pessimistic. I'd
    > love to see
    >
    > fs[/foo(\w+)/, 1] = "bar"
    > fs.gsub! /foo/, "bar"
    >
    > etc. because those would be the ones that would make FileString
    > convenient for me. :)


    Those already exist. Unfortunately, optimizing regex matching is too involved for me to have done it within 24h :)
    That means fs[/foo(\w+)/, 1] = "bar" is, for now, just a more convenient way of writing:
    data = File.read(path)
    data[/foo(\w+)/, 1] = "bar"
    File.open(path, "w") { |fh| fh.write(data) }
    But I think that's already quite worth it :)
    I mean - that's just lots of boilerplate.
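
    For readers without FileString, the same boilerplate can be folded into a tiny helper. This is only a sketch; the name edit_file is an assumption and not part of FileString or Ruby's standard library.

```ruby
# Hypothetical helper: read the whole file, let the block mutate the String
# in place, then write the result back. Not part of FileString.
def edit_file(path)
  data = File.read(path)
  yield data
  File.open(path, "w") { |fh| fh.write(data) }
  data
end

# Usage, assuming `path` names an existing file that contains a match:
#   edit_file(path) { |s| s[/foo(\w+)/, 1] = "bar" }
```

    Like the three-line version above, this loads the entire file into memory, so it trades efficiency for convenience.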

    > Thanks for listening and sharing!


    Always :D
    The listening part has made me change the docs, btw. I now hint at thinking about performance and, where it matters, just using a String and writing it back when all is done.

    Regards
    Stefan
     
    , Nov 9, 2009
    #13
  14. Guest

    -------- Original Message --------
    > Date: Tue, 10 Nov 2009 06:25:08 +0900
    > From: Robert Klemme <>
    > Subject: Re: FileString - request for comments


    > A good point! You're probably right and I was too pessimistic. I'd
    > love to see
    >
    > fs[/foo(\w+)/, 1] = "bar"


    I just noticed that I actually didn't have that functionality in. I added it now in the way described in the earlier reply.

    Also a small correction of one of my earlier statements (typo):
    You can use File.read or FileString#to_s (or to_str) instead of the FileString instance. FileString#to_s returns the contents of the file.

    Regards
    Stefan
     
    , Nov 10, 2009
    #14
  15. On Monday 09 November 2009 06:54:15 am wrote:
    > Hi Joel
    >
    > > What would the advantage over mmap[1] be? FileString is pure ruby
    > > (right?) and hence more portable, but probably mmap is much more
    > > efficient? Any other tradeoffs?

    >
    > Interesting, I was looking if a solution existed already and didn't find
    > mmap. Yes, FileString is pure ruby and should therefore run on all ruby
    > implementations. And yes, I'd expect mmap to be more efficient on the other
    > hand.


    I'd have looked for mmap first, knowing the concept from Linux. I'd also expect
    that with mmap, you should be able to implement an efficient regex, though I'm
    not sure how well gsub! would work, unless you can guarantee the match is
    always exactly the length of the target string.

    (And for gsub to be efficient, you'd need some fancy copy-on-write stuff, which
    would make it that much more difficult to chain them.)

    But if you were looking for comments, it looks awesome. Thanks!
     
    David Masover, Nov 10, 2009
    #15
