Special Character Token

Discussion in 'Java' started by Sameer, Mar 1, 2005.

  1. Sameer

    Sameer Guest

    Hello,
    I am designing a chat system in which I have to send some text
    from one machine to another. This text usually contains 3 to 4 parts
    separated by a token like ~ or ^ or $. At the other end I use
    StringTokenizer to decode the text.
    The parts separated by these tokens must not themselves contain the
    tokens. We cannot expect users to respect that, though: a user may
    type a message that contains one of these tokens, and the chat system
    will then malfunction.
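
    A minimal sketch of the failure described above, assuming ~ as the
    separator and a hypothetical three-part message (sender, recipient,
    body). The class name TokenizerPitfall is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerPitfall {
    // Splits a wire string on '~' with StringTokenizer, as the post describes.
    static List<String> split(String wire) {
        List<String> parts = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(wire, "~");
        while (st.hasMoreTokens()) {
            parts.add(st.nextToken());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Well-behaved message: three parts come back.
        System.out.println(split("alice~bob~see you at 5").size());
        // The body contains '~', so the receiver sees four parts.
        System.out.println(split("alice~bob~back in ~10 min").size());
        // StringTokenizer also skips empty tokens, so an empty part vanishes.
        System.out.println(split("alice~~hello").size());
    }
}
```

    The last case shows a second problem: even without hostile input,
    an empty part silently shifts the positions of the parts after it.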

    Can I use special separator characters that cannot easily be typed
    on a keyboard or that do not occur in normal typing?
    How do I generate such characters?
    Please answer in terms of Java and Unicode, and give methods for
    encoding and decoding the characters and for embedding them in
    text.
    -Sameer
    Sameer, Mar 1, 2005
    #1

  2. Eric Sosman

    Eric Sosman Guest

    Sameer wrote:
    > Hello,
    > In the process of designing a chatting system, I have to send some text
    > from one machine to another. This text usually contains 3 to 4 parts
    > separated by a token like ~ or ^ or $. At the other end I use
    > StringTokenizer to decode the text.
    > It is expected that the texts separated by these tokens must not
    > contain such tokens. We do not expect such things from users and a user
    > may type a message which contain these tokens and it will lead to
    > malfunctioning of the chatting system.
    >
    > Can I insert some special character tokens which can not be generated
    > by keyboard easily or in general typing.
    > How to generate such token characters?
    > Please give answer in Java and Unicode context.
    > Give methods for coding and decoding of characters and to embed them in
    > text.


    "Security by obscurity" is not very robust. As soon as
    somebody figures out the right ALT sequence or similar trick,
    the vandals will have a field day with your chat system.

    A better way is to develop an encoding that can handle
    all characters, even those that would ordinarily have special
    meaning. One simple approach is to double a special character
    whenever it appears in a non-special context (e.g., in the
    message body). For example, if you use # to delimit the
    parts of the message and the three parts are

    Knick-knack paddy-whack

    Give # dog # bone

    This old ### came rolling home

    ... you could transmit the message as

    #
    Knick-knack paddy-whack
    #
    Give ## dog ## bone
    #
    This old ###### came rolling home
    #

    When the receiver gets this stream of characters it looks
    for each #. If a # is followed by another #, the two become
    one # considered as an ordinary data character. But if a #
    is followed by something other than a second #, it is a part
    separator, not a data character.
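
    A minimal sketch of the escaping idea, not code from this thread.
    It deliberately swaps in a separate escape character (backslash, an
    assumption) instead of doubling the delimiter. If the parts are
    simply concatenated, pure doubling is ambiguous when a part starts
    or ends with #: the parts "a#", "b" and the parts "a", "#b" would
    both be sent as a###b. The class name FieldCodec and its methods
    are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FieldCodec {
    static final char DELIM = '#';  // part separator, as in the post above
    static final char ESC = '\\';   // assumed escape character, not from the post

    // Joins the parts with DELIM, prefixing every DELIM or ESC in the data with ESC.
    static String encode(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (int f = 0; f < fields.size(); f++) {
            if (f > 0) out.append(DELIM);
            String field = fields.get(f);
            for (int i = 0; i < field.length(); i++) {
                char c = field.charAt(i);
                if (c == DELIM || c == ESC) out.append(ESC);
                out.append(c);
            }
        }
        return out.toString();
    }

    // Reverses encode(): an escaped character is data, a bare DELIM ends a part.
    // Note that an empty string decodes to one empty part.
    static List<String> decode(String wire) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < wire.length(); i++) {
            char c = wire.charAt(i);
            if (c == ESC) {
                if (++i == wire.length()) {
                    throw new IllegalArgumentException("dangling escape character");
                }
                cur.append(wire.charAt(i));
            } else if (c == DELIM) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList(
                "Knick-knack paddy-whack",
                "Give # dog # bone",
                "This old ### came rolling home");
        String wire = encode(parts);
        System.out.println(wire);
        System.out.println(decode(wire).equals(parts));
    }
}
```

    Because the receiver scans character by character, it no longer
    needs StringTokenizer, and empty parts survive the round trip.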

    --
    Eric Sosman, Mar 1, 2005
    #2

  3. Oscar kind

    Oscar kind Guest

    Sameer wrote:
    > In the process of designing a chatting system, I have to send some text
    > from one machine to another. This text usually contains 3 to 4 parts
    > separated by a token like ~ or ^ or $. At the other end I use
    > StringTokenizer to decode the text.
    > It is expected that the texts separated by these tokens must not
    > contain such tokens. We do not expect such things from users and a user
    > may type a message which contain these tokens and it will lead to
    > malfunctioning of the chatting system.
    >
    > Can I insert some special character tokens which can not be generated
    > by keyboard easily or in general typing.
    > How to generate such token characters?
    > Please give answer in Java and Unicode context.
    > Give methods for coding and decoding of characters and to embed them in
    > text.


    As stated earlier by Eric, such a thing will not work because the text of
    the user can include anything. His idea of doubling special characters is
    therefore a good one.

    <plug mode="shameless">

    Another solution is to use CSV records, although implementing this from
    scratch would be more work. See my playground project on
    http://oscar.stachanov.com/java/
    (look for the classes CSVParser & CSVFormatter)

    </plug mode="shameless">
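
    For illustration, a minimal sketch of CSV-style formatting for one
    record. This is not the CSVFormatter class from the project linked
    above; the class name CsvLineSketch and the quoting rules chosen
    here (quote a field if it contains a comma, a double quote, a line
    break, or leading/trailing whitespace, and double any embedded
    quotes) are assumptions based on common CSV practice.

```java
import java.util.Arrays;
import java.util.List;

class CsvLineSketch {
    // Quotes a field only when needed, doubling any embedded double quotes.
    static String formatField(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0
                || !field.equals(field.trim());
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the formatted fields with commas into one record.
    static String formatRecord(List<String> fields) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) out.append(',');
            out.append(formatField(fields.get(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical chat message: sender, recipient, body.
        System.out.println(formatRecord(Arrays.asList("alice", "bob", "hi, \"you\"")));
    }
}
```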


    --
    Oscar Kind http://home.hccnet.nl/okind/
    Software Developer for contact information, see website

    PGP Key fingerprint: 91F3 6C72 F465 5E98 C246 61D9 2C32 8E24 097B B4E2
    Oscar kind, Mar 1, 2005
    #3
  4. shriop

    shriop Guest

    I went and looked at this project of yours. Do you really think
    wrapping up ReadLine.Split(',') inside a class is going to fool anyone?
    And your description for the project says that you're cleanly and
    correctly handling the csv format. This is totally wrong. I'm sorry.
    shriop, Mar 3, 2005
    #4
  5. Oscar kind

    Oscar kind Guest

    shriop wrote:
    > I went and looked at this project of yours. Do you really think
    > wrapping up ReadLine.Split(',') inside a class is going to fool anyone?
    > And your description for the project says that you're cleanly and
    > correctly handling the csv format. This is totally wrong. I'm sorry.


    The implementation is correct: it handles the CSV format exactly as
    specified here:
    http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

    This implementation behaves more predictably than, for example, the
    one from Microsoft. That one uses the list separator from the
    regional settings, but sometimes silently ignores it. Microsoft
    documented neither that their implementation does this, nor when,
    nor which separator is used instead.

    Also, IMHO, using String.split(String, int) doesn't make an implementation
    unclean (and there is no ReadLine class btw). I'm therefore not trying to
    fool anyone.

    Admittedly, there are improvements possible, and I welcome any
    constructive criticism. This requires arguments though. Did you have any?


    Oscar kind, Mar 3, 2005
    #5
  6. shriop

    shriop Guest

    You're absolutely right. I was too quick to judge, and now I see how
    you're handling all the situations. On a second look, the only rule
    I can find that you still seem to be violating is:

    Fields with leading or trailing spaces must be delimited with
    double-quote characters.

    You appear to always trim leading and trailing whitespace,
    whether in quotes or not. Other than that, and other than the fact
    that your class is very string heavy, it does appear correct.
    shriop, Mar 4, 2005
    #6
  7. Oscar kind

    Oscar kind Guest

    shriop wrote:
    > You appear to always trim leading and trailing whitespace,
    > whether in quotes or not. Other than that, and other than the fact
    > that your class is very string heavy, it does appear correct.


    It is rather heavy: String.split(String, int) uses regular expressions,
    which isn't efficient for a case as simple as this. It's just easy to
    understand and maintain.

    Also, note that I trim leading and trailing whitespace first, and then
    remove surrounding quotes (if present): the field separator (',') may
    be surrounded by whitespace. That whitespace isn't considered part of
    the fields (hence it's trimmed). This is also the reason that fields
    with leading and/or trailing whitespace should be quoted.
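
    The order of operations described above (trim first, then strip
    surrounding quotes) can be sketched as follows. This is an
    illustration only, not the code of the CSVParser class under
    discussion; the names FieldCleanup, clean and splitNaive are
    hypothetical, and this naive split does not handle commas or
    doubled quotes inside quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

class FieldCleanup {
    // Trims whitespace around the raw token, then removes one pair of
    // surrounding double quotes if present. Whitespace inside the quotes stays.
    static String clean(String raw) {
        String s = raw.trim();
        if (s.length() >= 2 && s.charAt(0) == '"' && s.charAt(s.length() - 1) == '"') {
            s = s.substring(1, s.length() - 1);
        }
        return s;
    }

    // Splits on commas with String.split(String, int); the limit -1 keeps
    // trailing empty fields instead of discarding them.
    static List<String> splitNaive(String line) {
        List<String> fields = new ArrayList<>();
        for (String raw : line.split(",", -1)) {
            fields.add(clean(raw));
        }
        return fields;
    }

    public static void main(String[] args) {
        // The unquoted field loses its padding; the quoted one keeps it.
        System.out.println(splitNaive("a , \" b \",c"));
    }
}
```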

    If I were to optimize it, I would need to do the following:
    - Read the stream character by character (probably buffered, but still)
    - Add field values character by character instead of token by token

    This works approximately the same, but the algorithm is (IMHO) less easy
    to understand, as it is more low-level. I'm not used to that.
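
    A rough sketch of what such a character-by-character approach
    might look like for a single line that is already in memory. It is
    an assumption-laden illustration, not the optimized version of the
    class discussed here: the name CharCsvParser is hypothetical, and
    it assumes comma-separated fields, "" as an escaped quote inside a
    quoted field, and trimming of unquoted whitespace.

```java
import java.util.ArrayList;
import java.util.List;

class CharCsvParser {
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;   // currently between an opening and closing quote
        boolean wasQuoted = false;  // the current field had a quoted section
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    cur.append(c);               // commas and spaces are data here
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"');             // "" inside quotes is one quote
                    i++;
                } else {
                    inQuotes = false;            // closing quote
                }
            } else if (c == '"' && !wasQuoted && cur.toString().trim().isEmpty()) {
                cur.setLength(0);                // drop whitespace before the quote
                inQuotes = true;
                wasQuoted = true;
            } else if (c == ',') {
                fields.add(wasQuoted ? cur.toString() : cur.toString().trim());
                cur.setLength(0);
                wasQuoted = false;
            } else if (!wasQuoted || !Character.isWhitespace(c)) {
                cur.append(c);                   // skip whitespace after a closing quote
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated quoted field");
        }
        fields.add(wasQuoted ? cur.toString() : cur.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("alice , \" padded \",\"hi, \"\"you\"\"\""));
    }
}
```

    Compared with splitting on a regular expression, the state has to
    be tracked by hand, which is the loss of readability mentioned
    above.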


    Oscar kind, Mar 4, 2005
    #7
  8. shriop

    shriop Guest

    You got me again, you're right about the trimming.
    shriop, Mar 5, 2005
    #8