Parsing "February 24th, 2006" to java.util.Date

stevengarcia

How does one write a SimpleDateFormat pattern to take into account the
"th" or the "nd" that might be present on any date?

March 1st, 2006
March 2nd, 2006
March 3rd, 2006
March 4th, 2006

I'm not sure how to write a mask that can take into account "st", "nd",
"rd", "th".

Thanks for your help.
 
Dave Mandelin

I don't think SimpleDateFormat can do it. I'd use a regexp to remove
those characters.
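
The regexp idea could be sketched roughly as follows. This is only an illustration, not code from the thread: the class name OrdinalDateParser, the pattern "MMMM d, yyyy", and the choice to rethrow ParseException as IllegalArgumentException are all assumptions made for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper: strips the English ordinal suffix, then parses.
public class OrdinalDateParser {
    // One or two digits followed by st/nd/rd/th; the digits are captured as group 1.
    private static final String ORDINAL_DAY = "(\\d{1,2})(?:st|nd|rd|th)\\b";

    public static Date parse(String text) {
        // "February 24th, 2006" becomes "February 24, 2006".
        String cleaned = text.replaceAll(ORDINAL_DAY, "$1");
        // SimpleDateFormat is not thread-safe, so a new one is created per call.
        SimpleDateFormat format = new SimpleDateFormat("MMMM d, yyyy", Locale.ENGLISH);
        format.setLenient(false);
        try {
            return format.parse(cleaned);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("February 24th, 2006"));
        System.out.println(parse("March 1st, 2006"));
    }
}
```

Because the regexp requires digits in front of the suffix, the "st" in a month name such as "August" is left alone.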
 
Roedy Green

Any other takers?

// Parsing a Date of the form: "February 24th, 2006"

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class ParseDate
{
    // The pattern expects a literal "th", so the other suffixes are
    // rewritten to "th" before parsing. Locale.ENGLISH keeps the month
    // names independent of the default locale.
    private static final SimpleDateFormat pattern =
        new SimpleDateFormat( "MMM dd'th', yyyy", Locale.ENGLISH );

    /**
     * test harness
     *
     * @param args not used
     */
    public static void main ( String[] args )
    {
        String dateString = "February 24th, 2006";
        int where;
        // Look for "st,", "nd," or "rd," and replace it with "th,".
        if ( ( where = dateString.indexOf( "st," ) ) >= 0
             || ( where = dateString.indexOf( "nd," ) ) >= 0
             || ( where = dateString.indexOf( "rd," ) ) >= 0 )
        {
            dateString = dateString.substring( 0, where )
                         + "th,"
                         + dateString.substring( where + 3 );
        }
        Date d = null;
        try
        {
            d = pattern.parse( dateString );
        }
        catch ( ParseException e )
        {
            System.err.println( "oops:" + dateString );
        }

        // Prints null if the parse failed.
        System.out.println( d );
    }
}

With JDK 1.5 you could use String.replace( "nd," ,"th," );
 
James McGill

With JDK 1.5 you could use String.replace( "nd," ,"th," );

Localizing it to handle e.g., "-ieme, -ere", or "-zig"... seems like
there's a case to be made for I18n-ized ordinal number parsing...
Hard-coding strings for "-st", "-nd", "-rd", "-th" just smells bad, in a
language that puts such emphasis on i18n.
 
Oliver Wong

James McGill said:
Localizing it to handle e.g., "-ieme, -ere", or "-zig"... seems like
there's a case to be made for I18n-ized ordinal number parsing...
Hard-coding strings for "-st", "-nd", "-rd", "-th" just smells bad, in a
language that puts such emphasis on i18n.

In some languages, the entire word is changed when going from number to
ordinal, rather than just having a suffix added. It's like how the word
"one" changes to "first" in English (note that the two words have zero
letters in common).

So yeah, this is a non-trivial problem, and it'd probably be a great
boon to programmers if a standardized i18n API call existed for this. But
the syntax wouldn't be as simple as "MMM dd[ordinal-suffix], yyyy", but
rather, something like "MMM
[pure-ordinal-or-number-followed-by-ordinal-suffix], yyyy".

- Oliver
 
Twisted

Even though "first" is utterly different from "one", "1st" is just "1"
with a suffix.

RFE: add getSuffixFor(int) and getWordFor(int) to Locale? Typically,
there'll be some special cases for small enough integers (and an
illegal argument exception if argument <= 0?) and a simple algorithm
for larger integers. (In the case of English, starting at 20.) The
English algorithm for suffixes is especially simple, as it's just

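// Check the last two digits so that 111, 112, 113 etc. also get "th".
int lastTwo = arg % 100;
if (lastTwo > 10 && lastTwo < 14) return "th";
switch (arg % 10) {
    case 1:
        return "st";
    case 2:
        return "nd";
    case 3:
        return "rd";
    default:
        return "th";
}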

(The only special cases are 11th, 12th, and 13th instead of 11st, 12nd,
and 13rd.)
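
Wrapped up as a complete method, the English suffix rule might look like the sketch below. The class name OrdinalSuffix and the method name suffixFor are made up for the example (no such method exists on Locale). The check uses the last two digits so that 111, 112, and 113 also get "th".

```java
// Hypothetical helper illustrating the English ordinal suffix rule.
public class OrdinalSuffix {
    public static String suffixFor(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        int lastTwo = n % 100;
        // 11, 12, 13 (and 111, 112, 113, ...) always take "th".
        if (lastTwo >= 11 && lastTwo <= 13) {
            return "th";
        }
        switch (n % 10) {
            case 1:
                return "st";
            case 2:
                return "nd";
            case 3:
                return "rd";
            default:
                return "th";
        }
    }

    public static void main(String[] args) {
        // Print the ordinals for every possible day of the month.
        for (int day = 1; day <= 31; day++) {
            System.out.println(day + suffixFor(day));
        }
    }
}
```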

The word case in English is similar -- you special-case 11, 12, and 13
("eleventh, twelfth, thirteenth" -- note you can't just add "th" or you
get "twelveth" for 12), and for the rest, you turn the least significant
digit into an ending "-first", "-second", "-third", or "-" + number's
name + "th", and the remaining digits into a beginning, e.g. "three
hundred and seventy", generating e.g. "three hundred and seventy-sixth".

Doing this for other languages is left as an exercise for the reader.
:)
 
Roedy Green

So yeah, this is a non-trivial problem, and it'd probably be a great
boon to programmers if a standardized i18n API call existed for this. But
the syntax wouldn't be as simple as "MMM dd[ordinal-suffix], yyyy", but
rather, something like "MMM
[pure-ordinal-or-number-followed-by-ordinal-suffix], yyyy".

This is related to the problem of expressing numbers in words.

See http://mindprod.com/applets/inwords.html

It handles English ordinals in words.

Ordinals are used much less frequently than I remember them being used
as a child. Perhaps the irregularity discouraged their use in
computers.
 
Oliver Wong

Twisted said:
Even though "first" is utterly different from "one", "1st" is just "1"
with a suffix.

Yeah, my point was that English (and most Latin/European languages) have
this "feature" that you can add a suffix to an Arabic numeral (e.g. '1', '2',
'3') to turn it into an ordinal (e.g. '1st', '2nd', '3rd'), but this is not
true for ALL languages.

Then I tried to give an analogy, but took an example from English.
Admittedly, that might be confusing, but I felt if I had used any other
language, I could not expect most readers here to relate to the example.

Twisted said:
RFE: add getSuffixFor(int) and getWordFor(int) to Locale?

Would getSuffixFor() return null, or throw an exception, for a Locale
for which these concepts of suffix don't exist?
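
One way to sidestep the null-versus-exception question is to make "no suffix" an explicit return value. The sketch below is purely hypothetical: OrdinalSuffixes.suffixFor is not a real JDK method, and it only knows the English rule, so an empty result here cannot distinguish "this language has no suffixes" from "this sketch has no rule for it".

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical API shape; nothing like this exists on java.util.Locale.
public class OrdinalSuffixes {
    public static Optional<String> suffixFor(Locale locale, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        // Assumption: only English is supported by this sketch.
        if (!"en".equals(locale.getLanguage())) {
            return Optional.empty();
        }
        int lastTwo = n % 100;
        if (lastTwo >= 11 && lastTwo <= 13) {
            return Optional.of("th");
        }
        switch (n % 10) {
            case 1:
                return Optional.of("st");
            case 2:
                return Optional.of("nd");
            case 3:
                return Optional.of("rd");
            default:
                return Optional.of("th");
        }
    }

    public static void main(String[] args) {
        // The caller decides what a missing suffix means for its output.
        System.out.println(22 + suffixFor(Locale.ENGLISH, 22).orElse(""));
        System.out.println(22 + suffixFor(Locale.JAPANESE, 22).orElse(""));
    }
}
```

Returning an Optional leaves the invalid-argument case (n <= 0) as the only exception, which keeps "bad input" and "no such concept" apart.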

Also, with "getWordFor(int)", there exist some languages where "the
word for a number" changes depending on what you are counting. For example,
in French, you might say "un homme" to mean "one man", but "une femme" to
mean "one woman". The word varies depending on the gender of the thing you
are counting. In Japanese, you vary the word depending on whether you're
counting something round, something flat, something pointy, etc.

- Oliver
 
James McGill

Also, with "getWordFor(int)", there exist some languages where "the
word for a number" changes depending on what you are counting. For example,
in French, you might say "un homme" to mean "one man", but "une femme" to
mean "one woman". The word varies depending on the gender of the thing you
are counting. In Japanese, you vary the word depending on whether you're
counting something round, something flat, something pointy, etc.

Yes, this is the kind of stuff I think about whenever I notice that
people believe we've reached some sort of plateau in technology. We
have MUCH further left to go than we've come. I hope the comfortable
equilibrium compromise we're in right now doesn't destroy us with
complacency.
 
Chris Uppal

Oliver said:
Yeah, my point was that English (and most Latin/European languages)
have this "feature" that you can add a suffix to an Arabic numeral (e.g.
'1', '2', '3') to turn them into ordinals (e.g. '1st', '2nd', '3rd'), but
this is not true for ALL languages.

And even in English the pattern isn't uniform. I would feel very odd talking
about the "thousand and first dalmatian" -- at some point (at least in British
English) the pattern reverts to "zillion-and-oneth".

But for the case in question -- where we are talking about number names for
days in a month -- I don't see why the whole lot can't be hard-wired into the
language/calendar-specific localisation.

Maybe there are languages where the number names for days don't follow a
(feasibly) computable pattern, and don't fit into a table-driven approach
either, but they must surely be in the tiny minority.
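
A table-driven version for days of the month can be sketched with java.time (Java 8 and later) rather than the SimpleDateFormat discussed in this thread. DateTimeFormatterBuilder.appendText accepts a lookup table from field value to text, so the 31 day names can simply be listed. The class name OrdinalDayTable and the English-only table are assumptions for the example; a localised version would supply a different table.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical example: day-of-month text comes from a fixed table.
public class OrdinalDayTable {
    // English suffix rule, valid for days 1..31.
    private static String suffix(int day) {
        if (day >= 11 && day <= 13) {
            return "th";
        }
        switch (day % 10) {
            case 1:  return "st";
            case 2:  return "nd";
            case 3:  return "rd";
            default: return "th";
        }
    }

    // Builds the table 1 -> "1st", 2 -> "2nd", ..., 31 -> "31st".
    private static Map<Long, String> dayTable() {
        Map<Long, String> table = new HashMap<>();
        for (int day = 1; day <= 31; day++) {
            table.put((long) day, day + suffix(day));
        }
        return table;
    }

    public static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
            .appendPattern("MMMM ")
            .appendText(ChronoField.DAY_OF_MONTH, dayTable())
            .appendPattern(", uuuu")
            .toFormatter(Locale.ENGLISH);

    public static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMAT);
    }

    public static String format(LocalDate date) {
        return date.format(FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parse("February 24th, 2006"));
        System.out.println(format(LocalDate.of(2006, 3, 2)));
    }
}
```

The same formatter handles both directions, so parsing and printing cannot drift apart.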

-- chris
 
