String exceeding length - Getting absolute string length

Discussion in 'Java' started by james.w.appleby@gmail.com, Jan 9, 2007.

  1. Guest

    Hello,

    I am having a problem when inputting very long strings into a database.
    The application I am writing can use different databases (thanks to
    the wonders of JDBC), and this issue has been causing problems on both
    Oracle and SQL Server.

    Because one of the design objectives was to support any JDBC-compatible
    database, a concern was raised about text widths. It was therefore
    decided that the maximum column width for a VARCHAR would be a
    configurable value. We knew that, in theory, data might not fit in
    a single row, so we introduced a sequence number to allow it to span
    multiple rows. (Don't ask me why we didn't use CLOBs instead; this is
    the schema I'm stuck with.)

    We now need to store base64 data in the same fields. The problem is
    that, in one example, a value that is 4000 characters long according
    to the Java String object has a physical size of approximately 4430.
    This seems to be because of the amount of markup involved, either in
    the base64 data or possibly in the text between.

    It occurs to me that while a non-ASCII value may be only a single
    character in a Unicode string, it is 6 characters in UTF-8. Therefore
    I'm looking for a way of calculating the absolute length, rather than a
    count of characters.

    Is this possible or will I have to change the schema?
    , Jan 9, 2007
    #1

  2. Hybris Guest

    On Tue, 09 Jan 2007 04:34:45 -0800, james.w.appleby wrote:


    > I'm looking for a way of calculates the absolute length, rather than a
    > count of characters.


    see String method getBytes
    Hybris, Jan 9, 2007
    #2

  3. Ian Wilson Guest

    wrote:
    > It occurs to me that while a non-ASCII value many be only a single
    > character in a unicode string,


    I think you mean the opposite, that an ASCII (not non-ASCII) character
    will be represented in UTF-8 using a single *byte*.

    > it is 6 characters in UTF-8.


    No it isn't. UTF-8 uses a *variable* number of *bytes* for one Unicode
    character.

    > Therefore
    > I'm looking for a way of calculates the absolute length, rather than a
    > count of characters.


    String has a getBytes() method for this purpose.
    Ian Wilson, Jan 10, 2007
    #3
  4. Manfred Rosenboom, Jan 10, 2007
    #4
  5. Oliver Wong Guest

    "Ian Wilson" <> wrote in message
    news:...
    > wrote:
    >> It occurs to me that while a non-ASCII value many be only a single
    >> character in a unicode string,

    >
    > I think you mean the opposite, that an ASCII (not non-ASCII) character
    > will be represented in UTF-8 using a single *byte*.
    >
    >> it is 6 characters in UTF-8.

    >
    > No it isn't. UTF-8 uses a *variable* number of *bytes* for one Unicode
    > character.


    And even then, UTF-8 only ranges from 1 to 4 octets. The code point
    values start at 0x000000 and go to 0x10FFFF.

    - Oliver
    Oliver Wong, Jan 10, 2007
    #5
  6. John W. Kennedy Guest

    Oliver Wong wrote:
    > "Ian Wilson" <> wrote in message
    > news:...
    >> wrote:
    >>> It occurs to me that while a non-ASCII value many be only a single
    >>> character in a unicode string,

    >> I think you mean the opposite, that an ASCII (not non-ASCII) character
    >> will be represented in UTF-8 using a single *byte*.
    >>
    >>> it is 6 characters in UTF-8.

    >> No it isn't. UTF-8 uses a *variable* number of *bytes* for one Unicode
    >> character.

    >
    > And even then, UTF-8 only ranges from 1 to 4 octects. The values start
    > at 0x000000 and go to 0x10FFFF.


    CESU-8 and Java's "Modified UTF-8" use as many as six bytes, because
    they first encode characters above U+FFFF as UTF-16 surrogate pairs,
    and then UTF-8 encode each half of the pair. "UTF-8", albeit wrongly,
    is often taken to include one or both of those schemes, so the
    incorrect figure of 6 is often encountered.

    --
    John W. Kennedy
    "The blind rulers of Logres
    Nourished the land on a fallacy of rational virtue."
    -- Charles Williams. "Taliessin through Logres: Prelude"
    John W. Kennedy, Jan 11, 2007
    #6
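A minimal sketch of the getBytes() suggestion from the thread, assuming Java 8 or later: it prints String.length(), the number of code points, the standard UTF-8 byte count and, for comparison, the byte count of Java's modified UTF-8 as produced by DataOutputStream.writeUTF. The class and method names (ByteLengths, utf8Length, modifiedUtf8Length) are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ByteLengths {

    // Number of bytes the string occupies when encoded as standard UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes DataOutputStream.writeUTF emits for the string body,
    // i.e. Java's "modified UTF-8". writeUTF rejects strings whose encoding
    // exceeds 65535 bytes, so this is only suitable for short strings.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // writeUTF prefixes the data with a 2-byte length; exclude it.
        return buffer.size() - 2;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, plain ASCII
            "\u00E9",       // U+00E9, e with acute accent
            "\u20AC",       // U+20AC, euro sign
            "\uD83D\uDE00"  // U+1F600, stored in a Java String as a surrogate pair
        };
        for (String s : samples) {
            System.out.printf("U+%04X: length()=%d, code points=%d, UTF-8=%d, modified UTF-8=%d%n",
                s.codePointAt(0), s.length(), s.codePointCount(0, s.length()),
                utf8Length(s), modifiedUtf8Length(s));
        }
    }
}
```

The four samples need 1, 2, 3 and 4 bytes in standard UTF-8, while the last one needs 6 bytes in modified UTF-8, which matches the figures discussed above. Calling getBytes() with no argument uses the platform default charset, so passing the charset explicitly keeps the count tied to the encoding the database actually uses.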
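For the schema described in the first post (a configurable VARCHAR width plus a sequence number), one possible approach is to split values on UTF-8 byte counts instead of on characters. This sketch assumes the column width is measured in bytes and that the database stores text as UTF-8; neither is stated in the thread, and Oracle, for example, lets VARCHAR2 widths be declared with either BYTE or CHAR semantics. Utf8Chunker and splitByUtf8Bytes are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8Chunker {

    // Bytes needed to encode one code point in standard UTF-8 (1 to 4).
    static int utf8Bytes(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Splits s into pieces whose UTF-8 encodings are at most maxBytes long,
    // never cutting a code point (or surrogate pair) in half.
    // Hypothetically, each piece would be stored as one row, with its list
    // index used as the sequence number.
    static List<String> splitByUtf8Bytes(String s, int maxBytes) {
        if (maxBytes < 4) {
            // A smaller limit could not hold a 4-byte character.
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;  // char index where the current chunk begins
        int bytes = 0;  // UTF-8 bytes accumulated in the current chunk
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int n = utf8Bytes(cp);
            if (bytes + n > maxBytes) {
                // The next code point would overflow: close the current chunk.
                chunks.add(s.substring(start, i));
                start = i;
                bytes = 0;
            }
            bytes += n;
            i += Character.charCount(cp);
        }
        if (start < s.length()) {
            chunks.add(s.substring(start));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "price: 10\u20AC \uD83D\uDE00";
        for (String chunk : splitByUtf8Bytes(text, 8)) {
            int len = chunk.getBytes(StandardCharsets.UTF_8).length;
            System.out.println("[" + chunk + "] " + len + " bytes");
        }
    }
}
```

Counting per code point rather than calling getBytes() on ever-longer prefixes keeps the split linear in the length of the string, and stepping by Character.charCount ensures a surrogate pair is never divided between two rows.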

Similar Threads
  1. Brett Robichaud

    Exceeding File Upload max size - trapping error

    Brett Robichaud, Apr 7, 2004, in forum: ASP .Net
    Replies: 1, Views: 2,863
    bruce barker, Apr 7, 2004
  2. Kevin Goodsell

    Exceeding container::max_size()?

    Kevin Goodsell, Apr 3, 2004, in forum: C++
    Replies: 1, Views: 359
    Victor Bazarov, Apr 4, 2004
  3. Chris

    Exceeding -Xmx memory size

    Chris, May 24, 2006, in forum: Java
    Replies: 3, Views: 8,213
    Chris Smith, May 24, 2006
  4. Elhanan
    Replies: 0, Views: 438
    Elhanan, May 25, 2006
  5. Mike Aubury

    Exceeding limits during arithmetic

    Mike Aubury, Jun 12, 2007, in forum: C Programming
    Replies: 7, Views: 286
    Keith Thompson, Jun 12, 2007