Byte Array to String

Discussion in 'ASP .Net' started by AG, Nov 22, 2007.

  1. AG

    AG Guest

    I have a file that contains ASCII and Extended ASCII characters.
    I need to get the file contents into a string, but the Extended ASCII
    characters (dec 128 and 129) are being changed to dec 63.

    I have tried several methods, but here is the one I thought would have
    worked.

    Dim strReturn As String
    Dim arBytes() As Byte
    arBytes = System.IO.File.ReadAllBytes(<myfile>)
    strReturn = System.Text.Encoding.UTF8.GetString(arBytes)

    When I examine strReturn, I find that the chars that should be chr(128) and
    chr(129) are all chr(63).

    The only thing I could get to work is

    Dim strReturn As String = String.Empty
    Dim arBytes() As Byte
    Dim sB As New StringBuilder
    Dim byT As Byte

    arBytes = System.IO.File.ReadAllBytes(strPathFile)
    For Each byT In arBytes
        sB.Append(Chr(byT))
    Next
    strReturn = sB.ToString

    Can anyone offer an explanation, and/or a better method?

    --

    AG
    Email: discussATadhdataDOTcom
    AG, Nov 22, 2007
    #1

  2. Nick Chan

    Nick Chan Guest

    I use this to read file contents:

    Public Function GetFileContents(ByVal FullPath As String, _
            Optional ByRef ErrInfo As String = "") As String

        Dim strContents As String
        Dim objReader As StreamReader
        Try
            ' With no encoding argument, StreamReader decodes the file as UTF-8.
            objReader = New StreamReader(FullPath)
            strContents = objReader.ReadToEnd()
            objReader.Close()
            Return strContents
        Catch Ex As Exception
            ' Report the failure through ErrInfo instead of throwing.
            ErrInfo = Ex.Message
            Return Nothing
        End Try
    End Function

    Nick Chan, Nov 22, 2007
    #2

  3. Look at the Encoding object, as it is the quickest. The other method is some
    form of loop, as suggested by Nick.

    --
    Gregory A. Beamer
    MVP, MCP: +I, SE, SD, DBA

    *************************************************
    | Think outside the box!
    |
    *************************************************
    Cowboy \(Gregory A. Beamer\), Nov 22, 2007
    #3
  4. Hi AG,

    If the file contains characters beyond the ASCII code range (and those
    characters are stored correctly), the file's content is not stored in
    ASCII encoding (a 7-bit, single-byte charset).

    Generally speaking, if you're reading a text file (that is, its content is
    character text rather than unreadable binary content), you should read it
    in text mode rather than reading the bytes and converting them yourself.

    To read a file in text mode, you need to know the encoding/charset of the
    file's content. For example, you can use the "StreamReader" class in .NET
    to read a file in text mode as below:

    =================
    StreamReader sr = new StreamReader("inputfile.txt", Encoding.UTF8);
    string content = sr.ReadToEnd();

    sr.Close();
    ================

    Or you can let the StreamReader determine the encoding automatically
    (through the file's BOM). But a BOM (byte order mark) is not always
    present in a text file:

    ======================
    StreamReader sr1 = new StreamReader("inputfile.txt", true);

    string content1 = sr1.ReadToEnd();

    sr1.Close();
    =================

    In your case, I think the file's encoding is likely not UTF-8, and if you
    use UTF-8 to decode the bytes, you'll probably get the wrong characters.
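
    To make the mismatch concrete, here is a minimal VB.NET sketch (not from
    the thread) that decodes the same four bytes two ways. The module name,
    the Dump helper, and the sample bytes are made up for illustration, and it
    assumes the .NET Framework, where code page 1252 is available without
    extra setup (newer .NET versions need CodePagesEncodingProvider registered
    first). Bytes 128 and 129 are not valid UTF-8 on their own, so the UTF-8
    decoder substitutes the replacement character U+FFFD; that character has
    no ANSI equivalent, which is presumably why it showed up as Chr(63), a
    question mark.

    ```vbnet
    Imports System
    Imports System.Text
    Imports Microsoft.VisualBasic

    Module DecodeDemo
        ' Print the Unicode code point of every character in a string.
        Sub Dump(ByVal label As String, ByVal s As String)
            Console.Write(label & ":")
            For Each ch As Char In s
                Console.Write(" U+" & AscW(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End Sub

        Sub Main()
            ' Illustrative sample: "A", byte 128, byte 129, "B".
            Dim bytes() As Byte = {65, 128, 129, 66}

            ' Lone bytes 128 and 129 are invalid UTF-8, so they are replaced.
            Dump("UTF-8", Encoding.UTF8.GetString(bytes))

            ' A single-byte code page maps every byte to exactly one character.
            Dump("1252 ", Encoding.GetEncoding(1252).GetString(bytes))
        End Sub
    End Module
    ```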

    Sincerely,

    Steven Cheng

    Microsoft MSDN Online Support Lead



    This posting is provided "AS IS" with no warranties, and confers no rights.

    Steven Cheng[MSFT], Nov 22, 2007
    #4
  5. AG

    AG Guest

    Thanks Nick.
    That is one of the methods that I tried which does not produce the desired
    results.

    --

    AG
    Email: discussATadhdataDOTcom
    AG, Nov 22, 2007
    #5
  6. AG

    AG Guest

    Thanks Gregory.
    I have looked at it and tried several different methods. I guess I don't
    understand it well enough.

    --

    AG
    Email: discussATadhdataDOTcom
    AG, Nov 22, 2007
    #6
  7. AG

    AG Guest

    Thanks for the reply Steven.

    I ended up reading as byte and converting myself because text reading mode
    (streamreader) produced the wrong characters for the extended ASCII
    characters.

    Perhaps a bit more of an explanation.

    The file is created by an Access application using VBA, as a method of
    exporting some database data.
    Since the data may contain all the usual record and field separators like
    crlf, commas, tabs, quotes, etc., the extended ASCII chars are used as
    record and field separators.

    It is created using the Open for append method and data added via the Print
    method, as follows. This method can not be changed, as it is in use in too
    many locations.

    Dim strRecord As String
    strRecord = "field1data" & Chr(128) & "field2data" & Chr(128) & "field3data" _
        & Chr(129)
    Open <thefile> For Append As #1
    Print #1, strRecord
    Close #1

    As you can see, there is no BOM.

    The file is easily opened and read in VBA using Open For Binary:
    Dim strFileData as String
    Open <thefile> For Binary As #1
    strFileData = Space(FileLen(<thefile>))
    Get #1, , strFileData
    Close #1

    This all works fine in VBA. Now, I would like to read the file using .NET
    framework.
    While my method of using Chr() on each byte works, it would seem that there
    should be a similar simple method in .NET to get the file contents without
    looping through each byte.
    According to the help file, Chr uses the Encoding class to return the
    appropriate character, so isn't there a method in the Encoding class that
    would perform the operation on the entire stream?

    --

    AG
    Email: discussATadhdataDOTcom
    AG, Nov 22, 2007
    #7
  8. Henk Kelder

    Henk Kelder Guest

    You probably need to figure out what the code page of the ASCII file is.

    It is probably 437 or 850 (if you are American or Western European), but
    others are also possible.
    Most likely it depends on where you are located.

    GetEncoding("cp437") or something like that should give you the encoding.
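
    One way to compare candidates is to decode just the two separator bytes
    with each code page and look at the resulting code points. The sketch
    below is an illustration, not part of the original reply: the module name
    is made up, it uses numeric code page identifiers rather than a name such
    as "cp437", and it assumes the .NET Framework, where these code pages are
    available without registering an extra provider.

    ```vbnet
    Imports System
    Imports System.Text
    Imports Microsoft.VisualBasic

    Module CodePageProbe
        Sub Main()
            ' The two separator bytes used by the export file in this thread.
            Dim separators() As Byte = {128, 129}
            ' Candidate code pages mentioned in the thread.
            Dim candidates() As Integer = {437, 850, 1252}

            For Each candidate As Integer In candidates
                Dim enc As Encoding = Encoding.GetEncoding(candidate)
                Console.Write(enc.WebName & ":")
                For Each ch As Char In enc.GetString(separators)
                    ' Show each decoded character as a Unicode code point.
                    Console.Write(" U+" & AscW(ch).ToString("X4"))
                Next
                Console.WriteLine()
            Next
        End Sub
    End Module
    ```

    Both DOS code pages decode byte 128 as Ç, whereas Windows-1252 decodes it
    as €, which is the character VB's Chr(128) returns on a system whose ANSI
    code page is 1252.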

    Henk
    Henk Kelder, Nov 22, 2007
    #8
  9. AG

    AG Guest

    Thanks Henk.

    Neither 437 nor 850 worked, but you put me on the right track.
    1252 did work!
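
    For readers who want the loop-free version spelled out, a minimal sketch
    of decoding the whole byte array with code page 1252 might look like the
    following. It is an illustration rather than the code AG actually used:
    the module, function, and variable names are invented, and the sample
    record only mimics the layout described earlier in the thread.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Text

    Module ReadExportFile
        ' On .NET Core / .NET 5+, code page 1252 must first be enabled with
        ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        Private ReadOnly Ansi As Encoding = Encoding.GetEncoding(1252)

        ' Decode the whole file in one call instead of calling Chr() per byte.
        Function ReadAnsiFile(ByVal filePath As String) As String
            Return Ansi.GetString(File.ReadAllBytes(filePath))
        End Function

        Sub Main()
            ' Hypothetical sample record laid out like the VBA export.
            Dim sample As New List(Of Byte)
            sample.AddRange(Encoding.ASCII.GetBytes("field1data"))
            sample.Add(128)
            sample.AddRange(Encoding.ASCII.GetBytes("field2data"))
            sample.Add(128)
            sample.AddRange(Encoding.ASCII.GetBytes("field3data"))
            sample.Add(129)

            Dim fileName As String = Path.GetTempFileName()
            File.WriteAllBytes(fileName, sample.ToArray())

            Dim content As String = ReadAnsiFile(fileName)

            ' Derive the separator characters from the same encoding so they
            ' match whatever bytes 128 and 129 decode to.
            Dim separators As String = Ansi.GetString(New Byte() {128, 129})
            Dim fieldSep As Char = separators(0)
            Dim recordSep As Char = separators(1)

            For Each record As String In content.Split(recordSep)
                If record.Length = 0 Then Continue For
                Console.WriteLine(String.Join(" | ", record.Split(fieldSep)))
            Next

            File.Delete(fileName)
        End Sub
    End Module
    ```

    With the sample record this prints "field1data | field2data | field3data".
    The real export also has a line break after each record, written by
    Print #, which this sketch does not handle.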

    --

    AG
    Email: discussATadhdataDOTcom
    AG, Nov 23, 2007
    #9
  10. Thanks for your reply,

    Yes, for a text file, if we don't use the correct encoding/charset, the
    retrieved text will not match the original characters.

    For your scenario, I think VBA may use the default system locale to
    encode the characters. You can also try "Encoding.Default" as the
    parameter in the StreamReader's constructor. "Encoding.Default" means the
    current system ANSI code page. If this still doesn't work, I think VBA is
    producing the file in something like a binary format (without a
    consistent encoding for the entire file), and thus using binary read mode
    to decode it individually should be reasonable.

    Anyway, if you have any further questions on this, you're welcome to post
    here.

    Sincerely,

    Steven Cheng

    Microsoft MSDN Online Support Lead


    This posting is provided "AS IS" with no warranties, and confers no rights.



    Steven Cheng[MSFT], Nov 23, 2007
    #10
  11. AG

    AG Guest

    Thanks Steven.
    Encoding.Default, which is 1252, does work, as I reported in my response to
    Henk's post.
    I knew there had to be a simple solution.
    Thanks to all responders.
    --

    AG
    Email: discussATadhdataDOTcom
    AG, Nov 23, 2007
    #11
  12. AG

    Henk Kelder Guest

    1252 is the default Windows code page on Western systems, most often (loosely) called ANSI.
    For American and Western European Windows installations you can probably also use Encoding.Default....

    It is always a problem detecting the code page a text is in. Once loaded into memory, a string is always Unicode (UTF-16 in .NET).
    But on disk or in streams the text is always encoded in some charset.

    Under DOS this used to be code page 437 (US) or 850 (Western Europe), but many others are also possible.
    Quite frankly ... a pain in the neck, because you would never know which code page a text was in unless you knew where it came from.

    Nowadays, the most frequently used encodings are ANSI, UTF-8 and UTF-16.

    It is quite common for a file encoded in UTF-8 or UTF-16 to start with a few marker bytes (3 for UTF-8, 2 for UTF-16).
    These bytes are called the Byte Order Mark, or BOM for short.

    .NET is capable of detecting this BOM, but when there is no BOM you need to tell .NET which encoding the file is in.
    In your situation it appears to be ANSI = CP1252 = Encoding.Default.

    I would go for a StreamReader:

    Public Sub New ( _
    stream As Stream, _
    encoding As Encoding, _
    detectEncodingFromByteOrderMarks As Boolean _
    )

    where detectEncodingFromByteOrderMarks is set to True, and Encoding.Default (= CP1252 here) is passed as the encoding to fall back on for files without a BOM.
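
    A minimal sketch of that approach, for illustration only. It uses the StreamReader overload that takes a file path rather than a Stream, and it names code page 1252 explicitly instead of relying on Encoding.Default. The helper name ReadFileText and the file name "export.dat" are made up for the example.

    ```vbnet
    Imports System
    Imports System.IO
    Imports System.Text

    Module ReadAnsiFileSketch

        ' Hypothetical helper: returns the file's text, honouring a BOM if one
        ' is present and otherwise decoding the bytes as Windows-1252.
        Function ReadFileText(ByVal path As String) As String
            ' Assumption: code page 1252 is available. It is built into the
            ' .NET Framework; on .NET Core / .NET 5+ the code pages encoding
            ' provider has to be registered before this call succeeds.
            Dim fallback As Encoding = Encoding.GetEncoding(1252)

            ' True = look for a byte order mark first, then fall back.
            Using reader As New StreamReader(path, fallback, True)
                Return reader.ReadToEnd()
            End Using
        End Function

        Sub Main()
            ' "export.dat" is a placeholder file name.
            Dim contents As String = ReadFileText("export.dat")
            Console.WriteLine("Read {0} characters.", contents.Length)
        End Sub

    End Module
    ```

    Naming the code page means the result does not change if the code later runs on a machine with a different regional setting.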






    "AG" <> wrote in message news:...
    > Thanks Henk.
    >
    > Neither 437 or 850 worked, but you put me on the right track.
    > 1252 did work!
    >
    > --
    >
    > AG
    > Email: discussATadhdataDOTcom
    > "Henk Kelder" <> wrote in message
    > news:4745b481$0$5778$2.nl...
    >> You probably need to figure out what the codepage of the ASCII file is.
    >>
    >> It is probably 437 or 850 (if you are American or Western European), but others
    >> are also possible.
    >> Most likely it depends on where you are located.
    >>
    >> GetEncoding("cp437") or something like that should give you the encoding.
    >>
    >> Henk
    >>
    Henk Kelder, Nov 25, 2007
    #12
  13. AG

    AG Guest

    Thanks Henk,

    You are correct, I am now using a stream and Encoding.Default does work. I had not previously tried Default, as the documentation on encoding, while not saying so explicitly, led me to believe that UTF-8 was the default. I agree, a pain...

    --

    AG
    Email: discussATadhdataDOTcom
    AG, Nov 25, 2007
    #13
  14. Yes, for your machine, since it is set to a Western European region, the
    default encoding is usually Windows-1252. However, Encoding.Default
    will also work on systems set to other regions. For example, on a machine
    configured with an East Asian locale, Encoding.Default will return
    the encoding/code page set on that machine for non-Unicode conversion.
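
    To see the difference on a given machine, a small sketch along these lines may help. It prints what Encoding.Default resolves to next to an explicitly requested code page 1252. The module name is made up, and the note about newer runtimes is an addition, not something from the original question.

    ```vbnet
    Imports System
    Imports System.Text

    Module ShowDefaultEncodingSketch

        Sub Main()
            ' Machine-dependent: on the .NET Framework this is the system's
            ' ANSI code page. On .NET Core / .NET 5+ it is always UTF-8.
            Console.WriteLine("Default: {0} ({1})", _
                              Encoding.Default.CodePage, Encoding.Default.WebName)

            ' Machine-independent: Windows-1252 regardless of locale.
            ' Assumption: the code page is available on this runtime.
            Dim pinned As Encoding = Encoding.GetEncoding(1252)
            Console.WriteLine("Pinned:  {0} ({1})", _
                              pinned.CodePage, pinned.WebName)
        End Sub

    End Module
    ```

    If the two lines differ, a file written as 1252 elsewhere would need the explicitly requested encoding to be read correctly.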

    Anyway, glad that it has been working for you now:)

    Sincerely,

    Steven Cheng

    Microsoft MSDN Online Support Lead


    This posting is provided "AS IS" with no warranties, and confers no rights.


    Steven Cheng[MSFT], Nov 26, 2007
    #14