Is Random Access File really "random access"?

Discussion in 'Java' started by Kevin, Feb 6, 2006.

  1. Kevin

    Kevin Guest

    Hi,
    I am fairly new to this topic. Does anyone know whether Java's
    Random Access File is really "random access", or whether Java just
    "simulates" it to make coding easier for newbies?

    The difference is this: for a 100G file of many records, if access to
    it is REAL random access, then accessing any record takes almost the
    same (short) time. Accessing the first record takes the same time as
    accessing the 100th record, the 1000000th record, etc., and every
    access should be fast and use few resources.

    So I think it comes down to how "seek(position)" works. Will it:
    1) just read forward/backward to the position?
    or
    2) "jump" to that position directly?

    From my limited knowledge, I think it could be done this way: each
    file's header keeps a linked list (or pointers, whatever) of the
    blocks of the file, so Java can read those "informative" blocks that
    describe the data blocks, calculate which data block is the required
    one, and read that block directly.

    Am I right on this point? And why have I seen some articles say
    something like "since a random access file needs to access the
    underlying OS, its performance is not so good"?

    Thank you.
     
    Kevin, Feb 6, 2006
    #1

  2. Kevin

    Kevin Guest

    By the way, another description (which is my real case) would be:

    If I have 100000000 fixed-size records, each record 100 bytes for
    example, and I write them out to a file, it will be about 10G in size.

    And I need to access those records randomly, about 100 (random)
    records every 1 - 5 seconds. And of course I don't have 10G of memory,
    so I cannot keep the file in memory.

    Using a random access file, can I expect to access them in this way
    reasonably fast?

    Thanks. :)
     
    Kevin, Feb 6, 2006
    #2

  3. Roedy Green

    Roedy Green Guest

    On 5 Feb 2006 16:59:05 -0800, "Kevin" <> wrote,
    quoted or indirectly quoted someone who said :

    >I am fairly new to this topic. Does anyone know whether Java's
    >Random Access File is really "random access", or whether Java just
    >"simulates" it to make coding easier for newbies?


    It is true random access. At the low level on disk is a list of the
    clusters (head, track and sectors) where the various fragments of the
    file are stored.

    If you seek to offset 345333 of the file, the OS figures out which
    fragment it is in, and the offset within that fragment. Then it
    calculates the head, track and sector containing that offset and how
    many sectors are needed to fulfil your read. Then it schedules the disk
    to seek to that location. The CPU does not wait, it gets on with other
    things. When the disk arm gets to that location, it reads the data (in
    SCSI without CPU help), and when it is done it taps the CPU on the
    shoulder to tell it to look in RAM for the sectors requested. The CPU
    copies the bytes you wanted into your buffer.

    So the computer does not need to read the file sequentially at all.
    Even when it reads sequentially, it is just a series of random reads,
    one after the other.
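
    For fixed-size records like those described earlier in the thread, the
    seek offset is just the record index times the record size. The following
    is a minimal sketch, assuming 100-byte records; the class name
    FixedRecordReader and its method are hypothetical, chosen only for
    illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class FixedRecordReader {
    // Assumption: every record is exactly 100 bytes, as in the thread's example.
    static final int RECORD_SIZE = 100;

    /** Reads the record with the given 0-based index by seeking straight to its offset. */
    static byte[] readRecord(RandomAccessFile file, long index) throws IOException {
        byte[] record = new byte[RECORD_SIZE];
        // long arithmetic, so offsets beyond 2 GB do not overflow
        file.seek(index * RECORD_SIZE);
        // readFully throws EOFException if the file ends before the record does
        file.readFully(record);
        return record;
    }
}
```

    Nothing before the requested offset is read: seek() only moves the file
    pointer, and the following read asks the OS for just those bytes.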

    Sequential devices are: mag tape, CD writing, DVD writing, TCP/IP,
    printers.


    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Feb 6, 2006
    #3
  4. Roedy Green

    Roedy Green Guest

    On 5 Feb 2006 17:10:40 -0800, "Kevin" <> wrote,
    quoted or indirectly quoted someone who said :

    >And I need to access those records randomly, about 100 (random)
    >records every 1 - 5 seconds. And of course I don't have 10G of memory,
    >so I cannot keep the file in memory.
    >
    >Using a random access file, can I expect to access them in this way
    >reasonably fast?


    With nio there is an intermediate alternative. You memory map the
    file. The OS then tries to keep as much of the file as it can in RAM.
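
    A rough sketch of the memory-mapping approach with java.nio follows. A
    single MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes, so the
    sketch maps only the window of the file that contains the requested
    record. The class name MappedRecords, the 100-byte record size and the
    1 GB window size are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedRecords {
    static final int RECORD_SIZE = 100;                  // assumed record length in bytes
    static final long RECORDS_PER_WINDOW = 10_000_000L;  // assumed: 1 GB per mapped window

    /** Reads one record through a read-only mapping of the window that contains it. */
    static byte[] readRecord(Path path, long index) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long windowBytes = RECORDS_PER_WINDOW * RECORD_SIZE;
            long windowStart = (index / RECORDS_PER_WINDOW) * windowBytes;
            // the last window may be shorter than the others
            long length = Math.min(windowBytes, channel.size() - windowStart);
            MappedByteBuffer window =
                    channel.map(FileChannel.MapMode.READ_ONLY, windowStart, length);
            byte[] record = new byte[RECORD_SIZE];
            window.position((int) (index % RECORDS_PER_WINDOW) * RECORD_SIZE);
            window.get(record);
            return record;
        }
    }
}
```

    Remapping on every call keeps the sketch short; a real program would
    more likely keep its windows mapped and reuse them.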
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Feb 6, 2006
    #4
  5. Chris Uppal

    Chris Uppal Guest

    Kevin wrote:

    > So I think it comes down to how "seek(position)" works. Will it:
    > 1) just read forward/backward to the position?
    > or
    > 2) "jump" to that position directly?


    2.

    (I suppose that technically it is implementation dependent, but if the
    underlying OS provides random access files and the Java implementation didn't
    use them then we'd have a right to be more than merely astonished, we could
    lynch someone ;-) The same goes for an OS with a "real", general-purpose,
    filesystem that didn't provide random access).


    > From my limited knowledge, I think it could be done this way: each
    > file's header keeps a linked list (or pointers, whatever) of the
    > blocks of the file, so Java can read those "informative" blocks that
    > describe the data blocks, calculate which data block is the required
    > one, and read that block directly.


    That kind of complexity is implemented in the OS and/or filesystem rather than
    in the Java code.


    > [...] I saw some articles say something like
    > "since a random access file needs to access the underlying OS, its
    > performance is not so good"?


    I /suspect/ that what they mean is that random access and buffering are largely
    incompatible. The point of buffering is that by holding data in the process's
    own memory, you can avoid going to the OS with lots of small reads/writes. But
    that depends on the reads/writes being adjacent. If you read a byte at offset
    1, then one at offset 10000000, no implementation has any chance of finding the
    second byte in the buffer that it filled to satisfy the first request (unless
    you had stupidly big buffers -- which would be /very/ inefficient in this
    case). Broadly speaking, if you are doing random access then either you can't
    take advantage of buffering at all or you have to do it yourself. It may be
    that the Java implementation provides a /little/ buffering so that sequential
    reads (with no intervening seek()) will read from a small buffer. That's the
    way I'd implement it myself, but I'm afraid that I don't know whether the Java
    people did the same (the spec vanishes into a maze of abstract classes, and I
    can't be bothered to check the actual code) -- on the whole I'd guess not.
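
    As an illustration of "doing it yourself", here is a minimal sketch of an
    application-level cache that keeps the most recently used records in
    memory. It only helps if some records are requested repeatedly. The
    class name CachedRecordFile, the 100-byte record size and the
    least-recently-used eviction policy are all assumptions, not anything the
    Java library provides for RandomAccessFile.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

final class CachedRecordFile {
    static final int RECORD_SIZE = 100; // assumed record length in bytes

    private final RandomAccessFile file;
    private final Map<Long, byte[]> cache;

    CachedRecordFile(RandomAccessFile file, final int maxEntries) {
        this.file = file;
        // accessOrder = true makes the map iterate from least to most recently used
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxEntries; // evict the least recently used record
            }
        };
    }

    /** Returns the record from memory if cached, otherwise seeks and reads it. */
    byte[] read(long index) throws IOException {
        byte[] record = cache.get(index);
        if (record == null) {
            record = new byte[RECORD_SIZE];
            file.seek(index * RECORD_SIZE);
            file.readFully(record);
            cache.put(index, record);
        }
        return record; // note: callers share the cached array and must not modify it
    }
}
```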

    BTW1, 10G is a bit on the large side for a file. You may find it unwieldy, if
    only for things like backing up, etc (and hope to Hell that you don't have a
    virus checker that insists on scanning the whole thing after every write ;-)
    It also may be less efficient, than -- say -- 10 x 1G files since the
    OS/filesystem will have to build rather complex on-disk structures to find each
    of your blocks on disk.

    BTW2, you say you want to handle a peak of approx 100 random reads per second.
    That translates to a disk-head seek time of at worst 10 milliseconds. Which is
    plausible, but you are close to the hardware limit[*]. If the OS+filesystem
    has to do another (internal) seek to find data defining the location of your
    real data on-disk each time, then you are even closer to the hardware limit.
    That's another reason why you may find it better to use more than one file
    located on different /physical/ disks.

    ([*] I haven't been following hard-disk specs for some years, but I doubt if
    seek time has speeded up all that much)

    -- chris
     
    Chris Uppal, Feb 6, 2006
    #5
  6. Chris Uppal

    Chris Uppal Guest

    Roedy Green wrote:


    > With nio there is an intermediate alternative. You memory map the
    > file. The OS then tries to keep as much of the file as it can in RAM.


    10 Gig ?

    Unless the OP's using a 64-bit JVM there won't be enough address space
    to map it in.

    -- chris
     
    Chris Uppal, Feb 6, 2006
    #6
  7. Roedy Green

    Roedy Green Guest

    On 06 Feb 2006 13:53:29 GMT, "Chris Uppal"
    <-THIS.org> wrote, quoted or indirectly
    quoted someone who said :

    >> With nio there is an intermediate alternative. You memory map the
    >> file. The OS then tries to keep as much of the file as it can in RAM.

    >
    >10 Gig ?


    If you have a 32 bit address space that will limit you to somewhere
    between 1 and 4 gig depending on how they implemented it. If you have
    a 64 bit addressing space, 10 gig, no problem.

    How are home-use 64-bit machines coming along?
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Feb 6, 2006
    #7
  8. Roedy Green

    Roedy Green Guest

    On Mon, 6 Feb 2006 12:19:43 -0000, "Chris Uppal"
    <-THIS.org> wrote, quoted or indirectly
    quoted someone who said :

    >I /suspect/ that what they mean is that random access and buffering are largely
    >incompatible


    In theory with sequential i/o, either Java or the OS could presume you
    are going to read the next block, and get it ready ahead of time while
    you are still computing. That is how it was done in the days of
    computers with 16K of RAM. We called it double buffering. You
    processed data in one buffer while it was reading the next. I know
    early versions of Windows did not do this. When it did an i/o,
    computation stopped until the i/o was completed, and it never did any
    read ahead while you were busy computing. NT was a little bit
    cleverer, at least scheduling i/o for several different tasks. I
    don't know if XP has finally graduated to the level of circa 1965
    computers.



    With random I/O it has no idea what you will read next, so it can't
    very well read ahead.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Feb 6, 2006
    #8
  9. Kevin

    Kevin Guest

    So, I think our conclusion is:

    1) it is real random access, as far as the OS supports it.

    2) the time needed to access each data block is:

    a) time needed to compute the physical location of the data block
    plus
    b) time of disk head movement to that location.
    plus
    c) time needed to read in that physical data block from disk.
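
    The combined cost of a) + b) + c) can be measured roughly from Java. The
    sketch below times a number of random record reads and reports the
    average; it cannot separate the three components, and reads served from
    the OS file cache will look much faster than reads that reach the disk.
    The class name SeekTimer and the 100-byte record size are assumptions.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

final class SeekTimer {
    static final int RECORD_SIZE = 100; // assumed record length in bytes

    /** Average wall-clock microseconds per random record read. */
    static double averageReadMicros(RandomAccessFile file, int samples, long seed)
            throws IOException {
        long recordCount = file.length() / RECORD_SIZE;
        if (recordCount == 0 || samples <= 0) {
            throw new IllegalArgumentException("need at least one record and one sample");
        }
        Random random = new Random(seed); // fixed seed makes the access pattern repeatable
        byte[] record = new byte[RECORD_SIZE];
        long start = System.nanoTime();
        for (int i = 0; i < samples; i++) {
            // nextDouble() is in [0, 1), so the index is always below recordCount
            long index = (long) (random.nextDouble() * recordCount);
            file.seek(index * RECORD_SIZE);
            file.readFully(record);
        }
        return (System.nanoTime() - start) / 1000.0 / samples;
    }
}
```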

    Thank you all. :)
     
    Kevin, Feb 6, 2006
    #9
  10. "Roedy Green" <> wrote in
    message news:...
    > On 06 Feb 2006 13:53:29 GMT, "Chris Uppal"
    > <-THIS.org> wrote, quoted or indirectly
    > quoted someone who said :
    >
    >>> With nio there is an intermediate alternative. You memory map the
    >>> file. The OS then tries to keep as much of the file as it can in RAM.

    >>
    >>10 Gig ?

    >
    > If you have a 32 bit address space that will limit you to somewhere
    > between 1 and 4 gig depending on how they implemented it. If you have
    > a 64 bit addressing space, 10 gig, no problem.
    >
    > How are home-use 64-bit machines coming along?


    There are no single chip X86-64 solutions existing offering more than 8 Gb
    right now.

    Many dual Xeon boards are limited to 8 Gb total, but dual Opteron boards can
    accept up to 16 Gb. (Actually for Opteron just multiply the number of chips
    by 8 Gb and you get the total main memory potential.)

    It is at least possible that in the new socket switch AMD might allow more
    pins for additional main memory. (I haven't heard any discussion one way or
    the other.) Hopefully they don't have a 3 socket blunder again like they
    did last time. Sheesh! AMD has made PR mistake after PR mistake at the
    hands of Mr Huiz. But

    --
    LTP

    :)
     
    Luc The Perverse, Feb 6, 2006
    #10
  11. Roedy Green wrote:
    > Even when it reads sequentially, it is just a series of random reads,
    > one after the other.


    Which get turned back into sequential reads by the disk controller
    to avoid having to wave the drive arm around too much,

    Richard
     
    Richard Wheeldon, Feb 6, 2006
    #11
  12. Chris Uppal

    Chris Uppal Guest

    Luc The Perverse wrote:

    > > If you have a 32 bit address space that will limit you to somewhere
    > > between 1 and 4 gig depending on how they implemented it. If you have
    > > a 64 bit addressing space, 10 gig, no problem.
    > >
    > > How are home-use 64-bit machines coming along?

    >
    > There are no single chip X86-64 solutions existing offering more than 8 Gb
    > right now.


    Oh bugger! And I'd been hoping for 128 Gig in my next laptop too...

    BTW, the issue here is actually the size of the address-space, rather than that
    of the physical RAM -- unless these chips/boards have limited address lines
    too.

    -- chris
     
    Chris Uppal, Feb 7, 2006
    #12
  13. "Chris Uppal" <-THIS.org> wrote in message
    news:43e85981$0$1173$...
    > Luc The Perverse wrote:
    >
    >> > If you have a 32 bit address space that will limit you to somewhere
    >> > between 1 and 4 gig depending on how they implemented it. If you have
    >> > a 64 bit addressing space, 10 gig, no problem.
    >> >
    >> > How are home-use 64-bit machines coming along?

    >>
    >> There are no single chip X86-64 solutions existing offering more than 8
    >> Gb
    >> right now.

    >
    > Oh bugger! And I'd been hoping for 128 Gig in my next laptop too...
    >
    > BTW, the issue here is actually the size of the address-space, rather than
    > that
    > of the physical RAM -- unless these chips/boards have limited address
    > lines
    > too.



    If I understand you correctly, yes this is a problem.

    I don't know about Xeon chips, they are way out of my price range - but
    Opterons have on board memory controller, and are actually limited by their
    pins. (I'm an AMD guy anyway.)

    128 Gb - that seems a little insane. I'm all about technological jumps -
    but I'm not sure what you'd fill it up with? Illegally downloading DVD
    ISOs . .. to ram?

    --
    LTP

    :)
     
    Luc The Perverse, Feb 7, 2006
    #13
  14. Roedy Green

    Roedy Green Guest

    On Mon, 06 Feb 2006 22:17:41 +0000, Richard Wheeldon
    <> wrote, quoted or indirectly quoted someone
    who said :

    >Which get turned back into sequential reads by the disk controller
    >to avoid having to wave the drive arm around too much,


    not quite sequential, elevator seeks. It waves the arms back and
    forth over the disk like a bus on a route, picking up passengers in
    order, different from the order the requests were made.
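
    The real elevator scheduling happens in the OS or the drive, not in
    application code, but the ordering idea can be sketched in a few lines.
    Given the current head position and a batch of pending offsets, the
    sketch services everything at or above the head on the way up, then the
    rest on the way back down. The class name ElevatorOrder and its method
    are hypothetical.

```java
import java.util.Arrays;

final class ElevatorOrder {
    /**
     * Returns the requests in the order a single upward sweep followed by a
     * single downward sweep would service them, starting from head.
     */
    static long[] sweepOrder(long head, long[] requests) {
        long[] sorted = requests.clone();
        Arrays.sort(sorted);
        long[] order = new long[sorted.length];
        int n = 0;
        // upward sweep: everything at or beyond the head, in ascending order
        for (long request : sorted) {
            if (request >= head) {
                order[n++] = request;
            }
        }
        // downward sweep: everything behind the head, in descending order
        for (int i = sorted.length - 1; i >= 0; i--) {
            if (sorted[i] < head) {
                order[n++] = sorted[i];
            }
        }
        return order;
    }
}
```

    Starting at position 50 with requests 90, 10, 60, 30, 70, the sweep
    visits 60, 70, 90 and then 30, 10.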

    I want to write a defragger some day that uses similar "bus" logic.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Feb 7, 2006
    #14
  15. Chris Uppal

    Chris Uppal Guest

    Luc The Perverse wrote:

    [me:]
    > > Oh bugger! And I'd been hoping for 128 Gig in my next laptop too...
    > >
    > > BTW, the issue here is actually the size of the address-space, rather
    > > than that
    > > of the physical RAM -- unless these chips/boards have limited address
    > > lines
    > > too.

    >
    >
    > If I understand you correctly, yes this is a problem.
    >
    > I don't know about Xeon chips, they are way out of my price range - but
    > Opterons have on board memory controller, and are actually limited by
    > their pins. (I'm an AMD guy anyway.)


    Eek!

    Or, maybe not. I suppose (I wish I knew more about this stuff) it depends on
    whether the limitation is on addresses passed /in/ to the address-translation
    hardware, or on the translated addresses that it emits. I'd hope it's only the
    latter.


    > 128 Gb - that seems a little insane. I'm all about technological jumps -
    > but I'm not sure what you'd fill it up with? Illegally downloading DVD
    > ISOs . .. to ram?


    (I'm sure you realised that I was joking, but I'll take that question anyway)

    "Insane" ?!? You mean having "only" 1 Gig in a laptop /isn't/ insane ? ;-)

    Anyway -- seriously -- with that much RAM there are a number of interesting
    things you can do (assuming that program size hasn't grown in proportion --
    it's hard to imagine why it should[*]). For instance you could maintain a
    seriously useful amount of state in RAM, enough to be able to treat the
    hard-disk as merely a stable backup for memory (this whole "file" thing is
    really, like, so 20th century). Or you could run every program in its own OS
    (I'd really like to run all my network-facing applications -- web browsers etc --
    as separate virtual Linuxes).

    -- chris

    ([*] Yeah, right...)
     
    Chris Uppal, Feb 7, 2006
    #15
  16. Roedy Green

    Roedy Green Guest

    On Tue, 07 Feb 2006 12:30:36 GMT, Roedy Green
    <> wrote, quoted or
    indirectly quoted someone who said :

    >not quite sequential, elevator seeks. It waves the arms back and
    >forth over the disk like a bus on a route, picking up passengers in
    >order, different from the order the requests were made.
    >
    >I want to write a defragger some day that uses similar "bus" logic.


    I keep waiting for two hardware devices that never come.

    1. multiarm disks

    2. disks with marthaing in firmware -- remapping the logical tracks to
    physical ones, with background defragging to minimise head motion
    independent of the OS.
    see http://mindprod.com/jgloss/martha.html
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Feb 7, 2006
    #16
  17. On Tue, 07 Feb 2006 13:51:15 GMT, Roedy Green wrote:
    > I keep waiting for two hardware devices that never come.
    >
    > 1. multiarm disks


    AFAIK, the IBM 3340 (1973) and Conner Chinook (~1990) had multiple
    actuators.

    /gordon

    --
    [ do not email me copies of your followups ]
    g o r d o n + n e w s @ b a l d e r 1 3 . s e
     
    Gordon Beaton, Feb 7, 2006
    #17
  18. On 2006-02-07, Roedy Green penned:
    > On Tue, 07 Feb 2006 12:30:36 GMT, Roedy Green
    ><> wrote, quoted or
    >indirectly quoted someone who said :
    >
    >>not quite sequential, elevator seeks. It waves the arms back and
    >>forth over the disk like a bus on a route, picking up passengers in
    >>order, different from the order the requests were made.
    >>
    >>I want to write a defragger some day that uses similar "bus" logic.

    >
    > I keep waiting for two hardware devices that never come.
    >
    > 1. multiarm disks
    >


    I know some folks in the disk drive industry. It seems like they have
    enough trouble keeping one arm from misbehaving.

    --
    monique

    Ask smart questions, get good answers:
    http://www.catb.org/~esr/faqs/smart-questions.html
     
    Monique Y. Mudama, Feb 7, 2006
    #18
  19. Chris Uppal

    Chris Uppal Guest

    Monique Y. Mudama wrote:

    > I know some folks in the disk drive industry. It seems like they have
    > enough trouble keeping one arm from misbehaving.


    Like Dr. Strangelove...

    -- chris
     
    Chris Uppal, Feb 7, 2006
    #19
  20. "Monique Y. Mudama" <> burped up warm pablum in
    news::

    > I know some folks in the disk drive industry. It seems like they have
    > enough trouble keeping one arm from misbehaving.
    >


    Like "Police Inspector Hans Wilhelm Friederich Kemp" in Young Frankenstein?
     
    Tris Orendorff, Feb 13, 2006
    #20