Generating a unique string without normal character sets


angelochen960

Hi,

I use UID to generate a unique number in a server app. Here is my
code:

import java.rmi.server.UID;

UID inviteId = new UID();
String uid = new sun.misc.BASE64Encoder().encode(inviteId.toString().getBytes());

sample output:

NjhkZmMyNDQ6MTIwMTQ0YTYwMDk6LTgwMDA=

I'd like to have a unique string consisting of normal characters, not
signs like = and others. Any hints on this? Thanks,

A.c
p.s. normal character set, i meant A to Z, a to z, 0..9
 

Angelo Chen

You only want alpha-numeric characters.

You want to look into hashing, not encoding. Did you use BASE64Encoder
without looking up what it's for?

Thanks, I will give an MD5 hash a try. The = sign was acceptable before,
but it no longer is in the new situation. Just a related question: will
a unique string always generate a unique MD5 hash?
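Angelo's MD5 plan could look like the minimal sketch below. It assumes the UID's toString() form is hashed as UTF-8 and that the 128-bit digest is written out as 32 lowercase hex digits, so the result contains only 0-9 and a-f. The class name Md5Id is illustrative; MessageDigest is the standard JDK API.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.rmi.server.UID;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Id {
    // Hashes the input with MD5 and returns the 128-bit digest as 32 hex digits.
    public static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // new BigInteger(1, ...) reads the bytes as an unsigned number;
            // %032x left-pads with zeros so the result is always 32 characters.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex(new UID().toString()));
    }
}
```

The hex step is what keeps the output within A-Z, a-z, 0-9; the hash itself still carries the collision caveat discussed in the replies.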
 

Eric Sosman

Angelo said:
[...] Just a related question: will a unique string always generate
a unique MD5 hash?

Obviously not. An MD5 hash is 128 bits long, so there
are at most 2^128 different MD5 values. The number of String
values is 65536^0 + 65536^1 + ... + 65536^2147483647, which
comes to (65536^2147483648 - 1) / 65535, or approximately
2^34359738352 distinct String values. The quotient -- the
average number of distinct Strings that have a given MD5
hash value -- is a number with about 10.3 billion decimal
digits.
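Eric's arithmetic can be checked as follows, assuming (as he does) that a Java String holds at most 2147483647 = 2^31 - 1 chars, each with 65536 = 2^16 possible values:

$$
\sum_{k=0}^{2^{31}-1} 65536^{k} = \frac{65536^{2^{31}} - 1}{65535}
\approx \frac{2^{16 \cdot 2^{31}}}{2^{16}} = 2^{2^{35} - 16} = 2^{34359738352}
$$

$$
\frac{2^{34359738352}}{2^{128}} = 2^{34359738224},
\qquad 34359738224 \cdot \log_{10} 2 \approx 1.034 \times 10^{10} \text{ decimal digits}
$$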

Of course, the number of String values your program is
actually likely to encounter may be somewhat smaller than
2^34359738352 ...
 

Daniel Pitts

Angelo said:
I'd like to have a unique string consisting of normal characters, not
signs like = and others. Any hints on this?
I believe the "=" is added at the end of the string by BASE64Encoder as
an indicator that you've reached the end of the Base64 encoding. That
means that you can do uid = uid.substring(0, uid.length() - 1), and you're
good to go.

Alternatively, you can convert all the bytes to hex strings.
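The hex alternative could be sketched like this, assuming the UID's toString() form is the value being encoded, as in Angelo's code. Unlike trimming the "=", hex output is guaranteed to use only 0-9 and a-f (Base64 output can also contain '+' and '/', which trimming does not remove), and it is reversible. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.rmi.server.UID;

public class HexId {
    // Encodes each byte as exactly two lowercase hex digits.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16)); // high nibble
            sb.append(Character.forDigit(b & 0xF, 16));        // low nibble
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String id = new UID().toString();
        // Twice as long as the byte array, but free of '=', '+' and '/'.
        System.out.println(toHex(id.getBytes(StandardCharsets.UTF_8)));
    }
}
```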
 

Daniel Pitts

Angelo said:
Thanks, I will give an MD5 hash a try. The = sign was acceptable before,
but it no longer is in the new situation. Just a related question: will
a unique string always generate a unique MD5 hash?
No, Hashes are never guaranteed to be unique, but the good ones have a
"very low" chance of collision.

If you need absolutely unique, hash is the wrong way to go.
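To put a number on "very low", a standard birthday-bound estimate (not from the thread) says that for n inputs mapped by a hash that behaves like a random b-bit function, the chance of at least one collision is at most about

$$
p \le \frac{n(n-1)}{2^{b+1}} < \frac{n^{2}}{2^{b+1}}
$$

For MD5 (b = 128) and n = 2^40 (about 1.1 x 10^12) ids, that gives p < 2^80 / 2^129 = 2^-49, roughly 1.8 x 10^-15. This assumes the inputs are not chosen by an attacker; MD5 collisions can be constructed deliberately.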
 

Jon Gomez

Daniel said:
I believe the "=" is added at the end of the string by BASE64Encoder as
an indicator that you've reached the end of the Base64 encoding. That
means that you can do uid = uid.substring(0, uid.length() - 1), and you're
good to go.

Alternatively, you can convert all the bytes to hex strings.


The "=" character is only used as padding at the end. However, there
may be zero, one, or two of them [1]. In other applications where
reversing the encoding is necessary, it would be advisable to keep track
of the number.

It is probably a bad idea to use the sun.* packages, since they may be
unavailable on other JDK platforms or may change (or who knows, could
vanish) between JDK versions, at least according to a report by Sun in
1996. Their use is not advised unless you want to take that risk [2].
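Both points can be illustrated with java.util.Base64, which was added in Java 8 (well after this thread) as a supported alternative to sun.misc.BASE64Encoder. The sketch shows the zero, one, or two "=" characters depending on input length, and the withoutPadding() encoder that omits them. The class name is illustrative, and dropping padding still does not make the output alphanumeric, since '+' and '/' can appear.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PaddingDemo {
    // Standard Base64 with '=' padding.
    public static String padded(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.US_ASCII));
    }

    // Same encoding with the trailing '=' characters omitted.
    public static String unpadded(String s) {
        return Base64.getEncoder().withoutPadding()
                .encodeToString(s.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // Every 3 input bytes become 4 characters; a final group of 1 or 2 bytes
        // is padded with "==" or "=" respectively.
        for (String s : new String[] {"a", "ab", "abc"}) {
            System.out.println(s + " -> " + padded(s) + " / " + unpadded(s));
        }
    }
}
```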

Jon.

--------

[1] "The Base16, Base32, and Base64 Data Encodings: Base64 Encoding"
http://tools.ietf.org/html/rfc4648#section-4

[2] "Why Developers Should Not Write Programs That Call 'sun' Packages"
http://java.sun.com/products/jdk/faq/faq-sun-packages.html

------
 

Jon Gomez

Jon said:
In other applications where reversing the encoding is necessary,
it would be advisable to keep track of the number.

Well, now that I think about it, I suppose you could recalculate it from
the length, unless I made a mistake in my calculations: an unpadded
encoding always has length 4i, 4i+2, or 4i+3?
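Jon's recalculation does hold: unpadded Base64 has length 4i (no padding needed), 4i+2 (needs "==") or 4i+3 (needs "="), and a remainder of 1 cannot occur. A sketch with a made-up method name follows; note that java.util.Base64's decoder already accepts unpadded input, so this mainly matters for decoders that insist on the "=".

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Repad {
    // Re-appends the '=' padding that an unpadded Base64 string is missing.
    public static String restorePadding(String unpadded) {
        switch (unpadded.length() % 4) {
            case 0:
                return unpadded;        // 4i: only complete groups
            case 2:
                return unpadded + "=="; // 4i+2: last group encoded 1 byte
            case 3:
                return unpadded + "=";  // 4i+3: last group encoded 2 bytes
            default:
                throw new IllegalArgumentException("length 4i+1 is not valid Base64: " + unpadded);
        }
    }

    public static void main(String[] args) {
        String restored = restorePadding("YWI");
        byte[] decoded = Base64.getDecoder().decode(restored);
        System.out.println(restored + " -> " + new String(decoded, StandardCharsets.US_ASCII));
    }
}
```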

Jon.
 

Jon Gómez

Lew said:
p.s. normal character set, i [sic] meant A to Z, a to z, 0..9

What is so all-fired abnormal about ç, è, ñ, ß or ø?

We see posts in this forum from Arne Vajhøj nearly every day.

I'm finally motivated to add the accent to my last name on my identity
on my Linux newsgroup reader. I've been posting inconsistently back and
forth depending on which dual-boot OS I'm on.

Jon.
 

Lew

p.s. normal character set, i [sic] meant A to Z, a to z, 0..9
What is so all-fired abnormal about ç, è, ñ, ß or ø?

We see posts in this forum from Arne Vajhøj nearly every day.
I'm finally motivated to add the accent to my last name on my identity
on my Linux newsgroup reader.  I've been posting inconsistently back and
forth depending on which dual-boot OS I'm on.

It occurs to me that ó, ç, è, ñ, ß and ø may very well sort between a
and z for many locales.
 

Arne Vajhøj

Eric said:
The number of String
values is 65536^0 + 65536^1 + ... + 65536^2147483647, which
comes to (65536^2147483648 - 1) / 65535, or approximately
2^34359738352 distinct String values.

Can you list them all ?

:)

Arne
 

Arne Vajhøj

Lew said:
p.s. normal character set, i [sic] meant A to Z, a to z, 0..9

What is so all-fired abnormal about ç, è, ñ, ß or ø?

We see posts in this forum from Arne Vajhøj nearly every day.

It is a regular letter in the Danish alphabet.

A..ZÆØÅ
a..zæøå

Fifteen years ago, email was often 7-bit only, and when the high bit
is lost, ø becomes x.

Back then several people thought I was Chinese.

:)

Arne
 

Arne Vajhøj

Jon said:
It is probably a bad idea to use the sun.* packages, since they may be
unavailable on other JDK platforms or may change (or who knows, could
vanish) between JDK versions, at least according to a report by Sun in
1996. Their use is not advised unless you want to take that risk [2].

And why should anyone want to do that when there is a supported
way !?!?

The only drawback is that one needs a Java EE environment or the
JavaMail RI.

Code below.

Arne

===========================================

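import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

public class B64 {
    public static String b64encode(byte[] b) throws MessagingException, IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        OutputStream b64os = MimeUtility.encode(baos, "base64");
        b64os.write(b);
        b64os.close(); // flushes the last group, including any "=" padding
        return new String(baos.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static byte[] b64decode(String s) throws MessagingException, IOException {
        InputStream b64is = MimeUtility.decode(
                new ByteArrayInputStream(s.getBytes(StandardCharsets.US_ASCII)), "base64");
        ByteArrayOutputStream res = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        // A single read() may return only part of the data, so loop until end of stream.
        while ((n = b64is.read(buf)) != -1) {
            res.write(buf, 0, n);
        }
        return res.toByteArray();
    }
}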
 

Arne Vajhøj

Thomas said:
Arguably, if you need absolutely unique, using a computer is also the
wrong way to go. Any given computer has a (usually small) probability of
getting a result wrong, if only by interaction with a freak high-energy
particle aimlessly wandering the Universe and accidentally flipping a
bit in RAM or in a CPU register. The probability of such an occurrence
is low, but who can claim that he has never seen a computer crash?

Any computational process which gives you "unique" numbers with a risk
of collision much lower than the probability that the computer just goes
kaboom or replaces a 1 with a 0 in the data should be considered good
enough.

I would distinguish sharply between collisions caused by software
design and collisions caused by hardware malfunction.
It so happens that using a proper hash function (i.e. one which
is believed to be cryptographically strong and with a sufficiently large
output -- 256 bits ought to be enough for several decades of
technological advances) provides a low enough collision probability.

I.e., use:
java.security.MessageDigest.getInstance("SHA-256")

SHA-256 is good.
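A sketch of the SHA-256 suggestion that also meets Angelo's character restriction, assuming base 36 (digits and lowercase letters, via BigInteger.toString(36)) is acceptable. Since 36^50 > 2^256, 50 characters always suffice, and the sketch left-pads to that fixed width. The class name and width constant are illustrative.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.rmi.server.UID;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Id {
    // 36^50 > 2^256, so 50 base-36 digits can hold any SHA-256 digest.
    private static final int WIDTH = 50;

    public static String sha256Base36(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            String s = new BigInteger(1, digest).toString(36); // only 0-9 and a-z
            // Left-pad with '0' so every id has the same length.
            return String.format("%" + WIDTH + "s", s).replace(' ', '0');
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Base36(new UID().toString()));
    }
}
```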

Arne
 

Arne Vajhøj

Lew said:
p.s. normal character set, i [sic] meant A to Z, a to z, 0..9
What is so all-fired abnormal about ç, è, ñ, ß or ø?

We see posts in this forum from Arne Vajhøj nearly every day.
I'm finally motivated to add the accent to my last name on my identity
on my Linux newsgroup reader. I've been posting inconsistently back and
forth depending on which dual-boot OS I'm on.

It occurs to me that ó, ç, è, ñ, ß and ø may very well sort between a
and z for many locales.

Some of them do.

Even in English (the letters are not part of the English alphabet, but
English collation is defined for them).
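The difference between plain String ordering and locale collation can be seen with java.text.Collator. In this sketch, String.compareTo (used by Arrays.sort) puts "\u00e9tude" after "zoo" because U+00E9 is numerically above 'z', while the English collator treats é as a variant of e. The example only checks é; where a particular letter such as ø or ß lands depends on each locale's rules. Names are illustrative.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationDemo {
    // Sorts by UTF-16 code unit, which is what String.compareTo does.
    public static String[] codeUnitOrder(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts with the given locale's collation rules.
    public static String[] collatedOrder(Locale locale, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"zoo", "\u00e9tude", "apple"};
        System.out.println(Arrays.toString(codeUnitOrder(words)));
        System.out.println(Arrays.toString(collatedOrder(Locale.ENGLISH, words)));
    }
}
```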

Arne
 

alexandre_paterson

No, Hashes are never guaranteed to be unique, but the good ones have a
"very low" chance of collision.

Just to nitpick...

You seem to be talking about "Hashes" (with an uppercase 'H') in
general, so I'd argue that a perfect hash is a hash, that a perfect
hash is a good hash and that perfect hashes are guaranteed to be
unique and have zero chance of collision.

So I find that saying: "No, Hashes are never guaranteed
to be unique, but the good ones have a very low chance
of collision" isn't entirely correct and doesn't tell the
whole story about Hashes [sic].

If you need absolutely unique, hash is the wrong way to go.

I'd reword that to:

The hash produced by Java's String hashCode() method is the
wrong way to go in the OP's case.

A perfect hash would be unique and in many cases perfect hashes
or minimal perfect hashes are the way to go.

Just my 0.02 nitpick, since I just considerably sped up a
process by replacing a binary search with a
"one-table-lookup-no-collision" using a minimal perfect hash ;)
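For a small, fixed set of keys, a minimal perfect hash can be found by brute force: try seeds until every key lands in a distinct slot of a table with exactly as many slots as keys. This is a toy construction, not the method alexandre used; the seed mixing borrows MurmurHash3's 32-bit finalizer, and all names are hypothetical. Assuming the mixed hash behaves roughly randomly, the expected number of tries grows like n^n / n!, and keys with equal String.hashCode() values (such as "Aa" and "BB") can never be separated.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MinimalPerfectHash {
    // Maps a key into [0, n) by mixing its hashCode with a seed
    // (the mixing steps are MurmurHash3's "fmix32" finalizer).
    public static int slot(String key, int seed, int n) {
        int h = key.hashCode() ^ seed;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return Math.floorMod(h, n);
    }

    // Returns the first seed giving every key its own slot, or -1 if none is found.
    public static int findSeed(List<String> keys, int maxTries) {
        int n = keys.size();
        for (int seed = 0; seed < maxTries; seed++) {
            Set<Integer> used = new HashSet<>();
            boolean collisionFree = true;
            for (String key : keys) {
                if (!used.add(slot(key, seed, n))) {
                    collisionFree = false;
                    break;
                }
            }
            if (collisionFree) {
                return seed;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("red", "green", "blue", "cyan", "magenta");
        int seed = findSeed(keys, 1_000_000);
        if (seed < 0) {
            System.out.println("no seed found");
            return;
        }
        String[] table = new String[keys.size()];
        for (String key : keys) {
            table[slot(key, seed, keys.size())] = key; // one lookup per key, no collisions
        }
        System.out.println("seed " + seed + ": " + Arrays.toString(table));
    }
}
```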
 

Daniel Pitts

alexandre_paterson said:
A perfect hash would be unique and in many cases perfect hashes
or minimal perfect hashes are the way to go.
In your description of perfect hashes, the hash-code itself would have
to have as much information in it as the original data. As was
described somewhere else in this thread, you *will* have collisions if
your hash space is smaller than your data space.

You *may* have collisions if your possible data space is larger than
your hash space.
 

Arne Vajhøj

Eric said:
Sure, but I won't. Among those String values are many that would
amount to libel if I published them and anyone happened to take offense.
Since every person on the planet -- living, dead, or yet to be born --
will appear in the list and will be charged with every imaginable
turpitude, the number of potential legal actions against me (while not
infinite) is more than I'm willing to risk.

If you're less timid, go ahead ...

I will do that.

But it may take some time to send.

You should receive it in a few billion years so stay tuned ....

:)

Arne
 

Roedy Green

Angelo said:
p.s. normal character set, i meant A to Z, a to z, 0..9

see http://mindprod.com/jgloss/hex.html

When you see how you pull it off for base 16, you can generalise to
any set of symbols.
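Generalising from base 16 as suggested, a base-62 encoder uses exactly the A-Z, a-z, 0-9 alphabet Angelo asked for. This sketch treats the bytes as one unsigned number and repeatedly divides by 62; the alphabet order and class name are assumptions. Leading zero bytes are lost, which does not matter for a UID's text form (its first byte is an ASCII character) but would for arbitrary binary data.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.rmi.server.UID;

public class Base62 {
    // Assumed alphabet order: digits, then upper case, then lower case.
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final BigInteger BASE = BigInteger.valueOf(ALPHABET.length());

    // Writes the bytes, read as an unsigned big-endian number, in base 62.
    public static String toBase62(byte[] bytes) {
        BigInteger n = new BigInteger(1, bytes);
        if (n.signum() == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(BASE);
            sb.append(ALPHABET.charAt(qr[1].intValue())); // least significant digit first
            n = qr[0];
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        String id = new UID().toString();
        System.out.println(toBase62(id.getBytes(StandardCharsets.UTF_8)));
    }
}
```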
--
Roedy Green Canadian Mind Products
http://mindprod.com

"Nature provides a free lunch, but only if we control our appetites."
~ William Ruckelshaus, America’s first head of the EPA
 
