finding a tag in a binary file

Discussion in 'Ruby' started by rob stanton, Feb 27, 2011.

  1. rob stanton

    rob stanton Guest

    I have a binary file in which I'd like to find multiple occurrences of
    the byte sequence 10 00 10 00 (hex) among all the other values; each one
    is followed by a name.

    I've found that

    contents_array.find_all { |e| e == 0x10 }

    shows all the 0x10 values in the file but not their indexes; there are
    several hundred.

    contents_array.index(0x10)
    shows the first index of 0x10 (242), but how do I go on to list
    subsequent indexes of 0x10?

    puts(contents_array[242,4])
    16
    0
    85
    73
    => nil

    shows me that the first 0x10 I find is not the right one, i.e. it's
    10 00 55 49, so I need to go on to the next 0x10 and test again.

    I'm a bit stuck now as to how to do that; I'm very new and finding it
    difficult to find information...
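    Both steps can be sketched in a few lines, assuming contents_array is an
    Array of Integer byte values, as the puts output above suggests. The
    sample bytes are made up and include a false hit (10 00 55 49) like the
    one at index 242:

```ruby
# Made-up sample bytes; in practice contents_array would come from the file,
# e.g. File.binread(path).bytes
contents_array = [0x41, 0x10, 0x00, 0x55, 0x49, 0x10, 0x00, 0x10, 0x00, 0x50, 0x4e]

# Every index holding the byte 0x10 (Array#index only returns the first one)
all_0x10 = contents_array.each_index.select { |i| contents_array[i] == 0x10 }

# Indexes where the whole 4-byte sequence 10 00 10 00 starts:
# slide a 4-element window over the array and keep the matching positions
pattern = [0x10, 0x00, 0x10, 0x00]
matches = contents_array.each_cons(pattern.size).with_index
                        .select { |window, _i| window == pattern }
                        .map { |_window, i| i }

p all_0x10  # => [1, 5, 7]
p matches   # => [5]
```

    each_cons yields every run of four consecutive bytes, so only positions
    where all four bytes match survive the select.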

    --
    Posted via http://www.ruby-forum.com/.
    rob stanton, Feb 27, 2011
    #1

  2. Robert Dober

    Robert Dober Guest

    On Sun, Feb 27, 2011 at 11:50 AM, rob stanton <> wrote:
    > I have a binary file in which I'd like to find multiple strings of 10
    > 00 10 00 (hex) amongst all the other values, then following that is a
    > name.


    ruby-1.9.2-p136 :024 > content = [ 97, 10, 0, 10, 0, 97, 98, 32, 32,
    10, 0, 10, 0, 98, 99, 10, 0, 10, 0 ].map(&:chr).join
    => "a\n\x00\n\x00ab  \n\x00\n\x00bc\n\x00\n\x00"
    ruby-1.9.2-p136 :025 >
    ruby-1.9.2-p136 :026 > p content.scan(/\n\0\n\0(\w+)/)
    [["ab"], ["bc"]]
    => [["ab"], ["bc"]]

    should do the trick

    If you need the index for some reason other than checking for the
    name, let us know; that would be a little more work ;).
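    One thing to watch in the demo above: decimal 10 is the newline byte
    (\n, hex 0a), whereas the bytes in the question are hex 10 (decimal 16).
    A minimal sketch for the question's data, assuming the bytes are already
    in an Integer array like contents_array: Array#pack("C*") turns them into
    a binary String, and the pattern spells the bytes as \x10\x00. The sample
    values are made up:

```ruby
# Made-up stand-in for the byte array from the question
contents_array = [0x41, 0x10, 0x00, 0x10, 0x00, 0x61, 0x62, 0x20]

# pack("C*") turns an Array of byte values (0..255) into a binary String
content = contents_array.pack("C*")

# Hex 10 00 10 00 followed by one or more word characters (captured)
names = content.scan(/\x10\x00\x10\x00(\w+)/).flatten

p names  # => ["ab"]
```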

    HTH
    Robert
    --
    The 1,000,000th fibonacci number contains '42' 2039 times; that is
    almost 30 occurrences more than expected (208988 digits).
    N.B. The 42nd fibonacci number does not contain '1000000' that is
    almost the expected 3.0e-06 times.
    Robert Dober, Feb 27, 2011
    #2

  3. rob stanton

    rob stanton Guest

    wow, a little beyond my just-started status... So the array you created
    has a couple of 10 00 10 00 in it, correct?
    Then I don't know what you did with it and how you got ab and bc.
    Thanks, but could you explain a little more? I'm new to this

    rob stanton, Feb 27, 2011
    #3
  4. Robert Dober

    Robert Dober Guest

    On Sun, Feb 27, 2011 at 12:34 PM, rob stanton <> wrote:
    > wow a little beyond my just started status... So the array you created
    > has a coupe of 10 00 10 00 in correct ?
    > then I don't know what you did with it and you got ab bc ? thanks but
    > could you explain a little more I'm new to this
    >

    right, I created a string like "a\n\0\n\0bc..." then I used String#scan
    to get all matches of the regular expression matching \n\0\n\0
    followed by a non-empty sequence of word characters (\w+), which I
    grouped.
    To demonstrate what that does, let us look at this code (I got rid of
    one \n\0 out of laziness ;)

    content.scan /\n\0\w+/
    => ["\n\x00ab", "\n\x00bc"]

    but if we use a group in the regex, we get only the group(s), one
    sub-array per match

    content.scan /\n\0(\w+)/
    => [["ab"], ["bc"]]

    So if all you need is to scan for the names following \n\0\n\0, you
    are done; if you need the positions in the string, it is a little bit
    more work.
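    For the positions, one possible approach (a sketch, not necessarily what
    Robert had in mind) is to call Regexp#match with a start offset in a
    loop; each MatchData carries both the offset (begin(0)) and the captured
    name. The sample uses the hex bytes from the question and is made up:

```ruby
# Made-up sample: two 10 00 10 00 markers, each followed by a name
content = [0x41, 0x10, 0x00, 0x10, 0x00, 0x61, 0x62, 0x20,
           0x10, 0x00, 0x10, 0x00, 0x62, 0x63].pack("C*")

name_re = /\x10\x00\x10\x00(\w+)/
hits = []
pos = 0

# Regexp#match(str, pos) starts searching at offset pos (bytes, for a binary String)
while (m = name_re.match(content, pos))
  hits << [m.begin(0), m[1]]  # offset of the marker, captured name
  pos = m.end(0)              # continue searching after this match
end

p hits  # => [[1, "ab"], [8, "bc"]]
```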



    Robert Dober, Feb 27, 2011
    #4
  5. rob stanton

    rob stanton Guest

    Hi Robert, got it now, but the data is all in hex and your code uses
    the ASCII codes? It's almost there, though. The name follows 10 00 10 00
    in this format: 50 4e (P)(N), then xx 00, then "surname", 5e,
    "first name", followed by 10 and then 00.
    I'll see what I can do with your code, but any help would be appreciated
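    On the hex versus ASCII question: hex, decimal and characters are just
    different views of the same bytes. A small sketch using the three byte
    values named above:

```ruby
bytes = [0x50, 0x4e, 0x5e]    # "50 4e" and the 5e separator
s = bytes.pack("C*")          # the same bytes as a binary String

p bytes                           # => [80, 78, 94]  (Array#inspect prints decimal)
p s                               # => "PN^"
p s.unpack("H*").first            # => "504e5e"      (hex view, as in a hex viewer)
p s.bytes.map { |b| "%02x" % b }  # => ["50", "4e", "5e"]
```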

    rob stanton, Feb 27, 2011
    #5
  6. Robert Dober

    Robert Dober Guest

    On Sun, Feb 27, 2011 at 2:32 PM, rob stanton <> wrote:
    > Hi Robert, got it now, but the data is all in hex, your code gives the
    > ascii code ? but its almost there. the name follows 10 00 10 00 in this
    > format 50 4e (P) (N) then xx 00 "surname" 5e "first name" followed by 10
    > and then 00
    > I'll see what I can do with your code but any help would be appreciated
    >

    Well, if in your encoding the letters do not match \w, you will need to
    specify the values as hex escapes in the regex. This is a little
    more work, but there should not be any difficulty.

    you can match the hex value 4e with /\x4e/,
    a range of hex values with /[\x20-\x32]/,
    or in the worst case you enumerate the characters that shall match,
    without separators (a comma inside the brackets would itself be
    matched), with /[\x32\x36\x42...]/

    assuming that your letters are encoded with the characters 0x40, 0x42
    and 0x44 to 0x50, the expression

    content.scan( /\x10\x00\x10\x00([\x40\x42\x44-\x50]+)/ )

    would do the trick.
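    A sketch of the hex-range idea applied to the layout described earlier in
    the thread (10 00 10 00, 50 4e, a length byte and 00, then surname, 5e,
    first name), on made-up bytes. The character class assumes the names use
    only A-Z, a-z and the ^ (0x5e) separator; /m is added because "."
    otherwise does not match a length byte of 0x0a:

```ruby
# Made-up sample: 10 00 10 00, "PN", 08 00, then "DOE^JOHN"
content = [0x10, 0x00, 0x10, 0x00, 0x50, 0x4e, 0x08, 0x00,
           0x44, 0x4f, 0x45, 0x5e, 0x4a, 0x4f, 0x48, 0x4e].pack("C*")

# \x41-\x5a = A-Z, \x5e = '^', \x61-\x7a = a-z (no commas inside the class)
# "PN.\x00": "PN", any length byte, then 00; /m lets "." match 0x0a as well
name_re = /\x10\x00\x10\x00PN.\x00([\x41-\x5a\x5e\x61-\x7a]+)/m

p content.scan(name_re).flatten  # => ["DOE^JOHN"]
```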

    HTH
    R.
    Robert Dober, Feb 27, 2011
    #6
  7. rob stanton

    rob stanton Guest

    hmm, that does not work for me. Could I send you the file I'm working
    with (well, a reduced file, as it's very big) and see what you make of
    it? Maybe you'll see what I'm after. It makes sense if you look at it
    with a hex viewer and search for 10 00 10 00. Thanks for the help so far

    rob stanton, Feb 27, 2011
    #7
  8. Robert Dober

    Robert Dober Guest

    Sure, but by all means let us take this offline.
    Please send the file privately and I will try to make some time to
    look at it, but I'll probably not manage before next weekend. Maybe some
    good soul on this list will volunteer?

    Robert Dober, Feb 28, 2011
    #8
  9. Robert Dober

    Robert Dober Guest

    Now I somehow succeeded in helping our friend, but I have to admit quite
    some ignorance of 1.9 encoding issues. In order to parse a binary
    file with a regex I needed to encode the regex in ASCII-8BIT; the best
    I could do was to add a completely unnecessary byte to the regex (0xf2
    at the start)

    /\xf2?\x10\x00\x10\x00PN.\x00([\w^]+)/

    I sure would appreciate it if someone could point me to how to do this properly.
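    A hedged guess at the cause, since the actual script isn't shown: if the
    file contents are read as a binary (ASCII-8BIT) String, for example with
    File.binread or a file opened in "rb" mode, a pattern made only of ASCII
    escapes matches it directly, so the dummy \xf2? isn't needed. A sketch on
    made-up bytes (the path in the comment is hypothetical):

```ruby
# Made-up bytes, including a non-ASCII byte (0xf2) before the marker.
# In practice: content = File.binread("DICOMDIR")  # hypothetical path
content = [0xf2, 0x10, 0x00, 0x10, 0x00, 0x50, 0x4e, 0x08, 0x00].pack("C*") + "DOE^JOHN"

# Same pattern without the leading \xf2?; every escape is below 0x80,
# so the regexp is compatible with a binary String.
# /m lets "." match a length byte of 0x0a too.
name_re = /\x10\x00\x10\x00PN.\x00([\w^]+)/m

p content.encoding == Encoding::ASCII_8BIT  # => true
p content.scan(name_re)                     # => [["DOE^JOHN"]]
```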

    Thx in advance

    Robert
    Robert Dober, Mar 1, 2011
    #9
  10. Robert Dober

    Robert Dober Guest

    Eventually I found some time to investigate this. Searching
    ruby-core, Redmine and ruby-spec, I found no indication whatsoever that
    it is possible to specify the encoding explicitly (with the exception
    of the u, n and s switches). I would love to have an `encoding:'
    parameter in Regexp.new,
    or at least a switch to force ASCII-8BIT.
    Any thoughts on that?
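    If a pattern genuinely has to contain a byte above 0x7F, one workaround
    that avoids a dummy byte (a sketch, assuming building the pattern at
    runtime is acceptable) is to start from a binary String: Regexp.new uses
    the encoding of a pattern string that contains non-ASCII bytes, so raw
    bytes in an ASCII-8BIT string produce an ASCII-8BIT regexp.
    Regexp.escape keeps any metacharacter bytes literal. Sample bytes are
    made up:

```ruby
# Literal bytes to search for, including one above 0x7F (0xf2)
needle = [0xf2, 0x10, 0x00, 0x10, 0x00].pack("C*")   # binary (ASCII-8BIT) String

# Escape regexp metacharacters, then append an ASCII-only tail with a capture group;
# the non-ASCII byte in the binary pattern string makes the regexp ASCII-8BIT.
re = Regexp.new(Regexp.escape(needle) + "PN.\\x00([\\w^]+)", Regexp::MULTILINE)

content = [0x00, 0xf2, 0x10, 0x00, 0x10, 0x00, 0x50, 0x4e, 0x08, 0x00].pack("C*") + "DOE^JOHN"

p re.encoding == Encoding::ASCII_8BIT  # => true
p content[re, 1]                       # => "DOE^JOHN"
```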

    Cheers
    Robert
    Robert Dober, Mar 6, 2011
    #10
  11. Chris Lervag

    Chris Lervag Guest

    rob stanton wrote in post #984216:
    > I have a binary file in which I'd like to find multiple strings of 10
    > 00 10 00 (hex) amongst all the other values, then following that is a
    > name.
    >


    Sounds to me like you're trying to extract instances of Patient's Name
    from a DICOM file (for those of you who don't know, DICOM is a medical
    image format). Why don't you just use ruby-dicom? It will parse the
    DICOM file for you and give you a lot of convenience methods to interact
    with the DICOM object.

    http://dicom.rubyforge.org/
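    If the file is indeed DICOM, the layout rob described earlier matches an
    explicit VR little-endian data element: the tag (0010,0010) Patient's
    Name is stored as 10 00 10 00, followed by the VR PN, a 2-byte
    little-endian length (the xx 00), and the value, with ^ between name
    components. A minimal sketch that reads that length rather than guessing
    which bytes are letters; it assumes explicit VR little endian throughout
    (the encoding DICOMDIR files normally use) and does not guard against the
    byte pattern occurring by chance in other data, which a real parser such
    as ruby-dicom handles. patient_names and PN_TAG are hypothetical names:

```ruby
# Tag (0010,0010) in little-endian byte order, followed by the VR "PN"
PN_TAG = [0x10, 0x00, 0x10, 0x00].pack("C*") + "PN"

# Returns [[surname, first name, ...], ...] for every Patient's Name element found
def patient_names(content)
  names = []
  pos = 0
  while (i = content.index(PN_TAG, pos))
    len_bytes = content[i + 6, 2]                      # 2-byte value length
    break if len_bytes.nil? || len_bytes.bytesize < 2
    length = len_bytes.unpack("v").first               # "v" = 16-bit unsigned, little-endian
    value = content[i + 8, length]
    break if value.nil?
    names << value.sub(/[ \x00]+\z/, "").split("^")    # drop padding, split components
    pos = i + 8 + length                               # continue after this element
  end
  names
end

# Made-up sample: a false hit (10 00 55 49), then two name elements,
# the second padded to an even length with a trailing space
content = [0x00, 0x10, 0x00, 0x55, 0x49].pack("C*") +
          PN_TAG + [8, 0].pack("C*") + "DOE^JOHN" +
          PN_TAG + [10, 0].pack("C*") + "SMITH^ANN "

p patient_names(content)  # => [["DOE", "JOHN"], ["SMITH", "ANN"]]
```

    In practice content would be File.binread(path) for the DICOMDIR file.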

    Best regards,
    Chris

    Chris Lervag, Mar 6, 2011
    #11
  12. rob stanton

    rob stanton Guest

    Chris Lervag wrote in post #985736:
    > Sounds to me like you're trying to extract instances of Patient's Name
    > from a DICOM file [...] Why don't you just use ruby-dicom?


    Hi Chris, yes, I have used ruby-dicom; it is very good at giving info
    for a given DICOM image, but what I wanted to do was to read the
    DICOMDIR and find the names and scan dates. These can then be put into
    an Excel sheet (or OpenOffice). I thought it might be easy, but I'm
    still trying to do it! Robert D helped a lot with suggestions, and at
    the moment I just need to convert the date from DICOM YYYYMMDD into a
    format that Excel recognizes as a date.
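    For the date, a sketch assuming the DICOM value is a plain YYYYMMDD
    string: Date.strptime parses it, and it can then be written either as
    ISO YYYY-MM-DD, which spreadsheet programs usually import as a date
    (locale settings may interfere), or as an Excel serial day number
    counted from 1899-12-30, which holds for dates from March 1900 onward.
    The name and date are made up; dicom_date and EXCEL_EPOCH are
    hypothetical names:

```ruby
require "date"

# Day 0 of Excel's 1900 date system for dates after February 1900
EXCEL_EPOCH = Date.new(1899, 12, 30)

def dicom_date(yyyymmdd)
  Date.strptime(yyyymmdd, "%Y%m%d")   # raises on malformed input
end

d = dicom_date("20110227")
iso    = d.iso8601                    # ISO 8601 text, e.g. "2011-02-27"
serial = (d - EXCEL_EPOCH).to_i       # days since the Excel epoch

# One row per scan: name and date joined by commas (Ruby's csv library handles quoting)
row = ["DOE^JOHN", iso].join(",")
p row     # => "DOE^JOHN,2011-02-27"
p serial  # => 40601
```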

    rob stanton, Mar 8, 2011
    #12

Similar Threads
  1. shruds
    Replies: 1, Views: 783
    John C. Bollinger, Jan 27, 2006
  2. yaipa
    Replies: 13, Views: 722
    yaipa, Jan 19, 2005
  3. Reading binary file finding EOF
    , Dec 13, 2004, in forum: C Programming
    Replies: 11, Views: 649
    Lawrence Kirby, Dec 14, 2004
  4. rob s.
    Replies: 5, Views: 151
    David Jacobs, Feb 25, 2011
  5. Shashank Khanvilkar
    finding a binary pattern in a file.
    Shashank Khanvilkar, Sep 20, 2005, in forum: Perl Misc
    Replies: 2, Views: 125
    News KF, Sep 20, 2005