Changing case in a sentence to Capitalize Case.

Discussion in 'Javascript' started by jackson.rayne@gmail.com, Sep 22, 2008.

  1. Guest

    Hello,

    I am a javascript newbie and I'm stuck at one place.

    I have a requirement where I will get a sentence in a variable

    example

    var v1 ="This is a sentence"

    Now I have to change the sentence to Capitalize case where the first
    alphabet of every word will be in caps

    So for above example the output should be

    This Is A Sentence

    I have found some scripts that can change case to uppercase or
    lowercase but I'm not able to come up with a solution for this.

    One more thing, there is no limit on the number of words that I'll get
    in the sentence. I may get one word or even ten words.

    I'm looking for a solution that will work for all scenarios,

    Regards,
    Rayne
    , Sep 22, 2008
    #1

  2. wrote:
    > I have a requirement where I will get a sentence in a variable
    >
    > example
    >
    > var v1 ="This is a sentence"
    >
    > Now I have to change the sentence to Capitalize case where the first
    > alphabet of every word will be in caps


    You mean _letter_; *not* alphabet, which is a set of letters.

    <http://en.wikipedia.org/wiki/Alphabet>

    > So for above example the output should be
    >
    > This Is A Sentence


    v1 = v1.replace(/(^|\s)([a-z])/g,
      function(m, p1, p2) {
        return p1 + p2.toUpperCase();
      });

    You may adapt the character class to fit your needs.


    PointedEars
    --
    var bugRiddenCrashPronePieceOfJunk = (
    navigator.userAgent.indexOf('MSIE 5') != -1
    && navigator.userAgent.indexOf('Mac') != -1
    ) // Plone, register_function.js:16
    Thomas 'PointedEars' Lahn, Sep 22, 2008
    #2

  3. Tom de Neef Guest

    <> wrote in message
    news:...
    > Hello,
    >
    > I am a javascript newbie and I'm stuck at one place.
    >
    > I have a requirement where I will get a sentence in a variable
    >
    > example
    >
    > var v1 ="This is a sentence"
    >
    > Now I have to change the sentence to Capitalize case where the first
    > alphabet of every word will be in caps
    >
    > So for above example the output should be
    >
    > This Is A Sentence
    >


    a) Regular Expressions, but I don't know how to use them.

    b) 1:Split string into words; 2:capitalize first letters; 3:concatenate
    words into string.
    1: look up what var wordarray = []; wordarray = v1.split(' ') will do
    2: for all k: wordarray[k] = wordarray[k].charAt(0).toUpperCase() +
    wordarray[k].substr(1);
    3: check out the join function: output = wordarray.join(' ');
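
The three steps above can be put together as a minimal sketch. The function name `capitalizeWords` is only illustrative, and `slice(1)` is used here in place of `substr(1)`; both return the rest of the word after its first character.

```javascript
// Sketch of the split / capitalize / join approach.
function capitalizeWords(sentence) {
  // Step 1: split the string into words at single spaces.
  var words = sentence.split(' ');

  // Step 2: uppercase the first character of every word.
  // An empty "word" (from two consecutive spaces) stays empty.
  for (var k = 0; k < words.length; k++) {
    words[k] = words[k].charAt(0).toUpperCase() + words[k].slice(1);
  }

  // Step 3: glue the words back together with single spaces.
  return words.join(' ');
}

var output = capitalizeWords('This is a sentence');
```

Because the split is on a single space only, words separated by tabs or line breaks would not be treated as separate words in this sketch.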

    Tom
    Tom de Neef, Sep 22, 2008
    #3
  4. Conrad Lender wrote:
    > On 2008-09-22 18:19, wrote:
    >> var v1 ="This is a sentence"
    >>
    >> Now I have to change the sentence to Capitalize case where the first
    >> alphabet of every word will be in caps

    >
    > v1 = v1.replace(/\b(\w)/g, function (s, c) {
    > return c.toUpperCase();
    > });


    \b matches a word boundary; it does not work with non-ASCII letters.
    \w matches ASCII letters, decimal digits and `_'.
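
A small sketch of what this limitation means in practice, using the quoted `\b(\w)` pattern unchanged. The wrapper name `capitalizeAsciiWords` and the sample strings are only illustrative.

```javascript
// The quoted pattern, wrapped in a function for reuse.
function capitalizeAsciiWords(text) {
  return text.replace(/\b(\w)/g, function (s, c) {
    return c.toUpperCase();
  });
}

// Plain ASCII words behave as expected.
var english = capitalizeAsciiWords('this is a sentence');

// "é" is not matched by \w, so the first word boundary in "élan"
// lies between "é" and "l", and "l" is the letter that gets uppercased.
var french = capitalizeAsciiWords('élan vital');

// An apostrophe is not a word character either, so a boundary
// also appears in the middle of "don't".
var apostrophe = capitalizeAsciiWords("don't stop");
```

For the second and third inputs, the uppercased letter is the one after the first non-`\w` character rather than the first letter of the word.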

    > BTW, your question looks like a typical homework assignment. If that's
    > the case: letting other people solve your beginner assignments is not a
    > clever idea, if you want to learn the language or have to pass
    > exams later. If this wasn't homework, please disregard.


    Full ACK.


    PointedEars
    --
    Use any version of Microsoft Frontpage to create your site.
    (This won't prevent people from viewing your source, but no one
    will want to steal it.)
    -- from <http://www.vortex-webdesign.com/help/hidesource.htm>
    Thomas 'PointedEars' Lahn, Sep 22, 2008
    #4
  5. SAM Guest

    wrote:
    >
    > var v1 ="This is a sentence"
    >
    > Now I have to change the sentence to Capitalize case where the first
    > alphabet of every word will be in caps


    function capitalize( t ) {
      // Split the sentence into words, then uppercase each word's first letter.
      t = t.split(' ');
      for(var i=0; i<t.length; i++) {
        t[i] = t[i].charAt(0).toUpperCase()+t[i].substring(1);
      }
      return t.join(' ');
    }

    alert(capitalize(v1));

    > I'm looking for a solution that will work for all scenarios,


    alert(capitalize('ask google for charAt, join, '+
    'substring and split in javaScript'));




    HTML :
    ======
    <a href="javascript:void(document.getElementById('here').innerHTML = v1)">
    capitalize the variable v1</a>

    <p id="here" style="text-transform: capitalize"></p>
    SAM, Sep 22, 2008
    #5
  6. Conrad Lender wrote:
    > On 2008-09-22 20:00, Thomas 'PointedEars' Lahn wrote:
    >> Conrad Lender wrote:
    >>> v1 = v1.replace(/\b(\w)/g, function (s, c) { return c.toUpperCase();
    >>> });

    >> \b matches a word boundary; it does not work with non-ASCII letters.
    >> \w matches ASCII letters, decimal digits and `_'.

    >
    > Yes, I was assuming simple English sentences, where \b will usually work
    > (and it doesn't matter when toUpperCase is applied to digits or the
    > underscore).


    It matters because it would be needlessly inefficient.

    > In this case, my earlier example could even be simplified to:
    >
    > v1 = v1.replace(/\b\w/g, function (c) {
    > return c.toUpperCase();
    > });


    Correct, \b would match the empty string before the \w then.

    > Your character class approach (in your other post) would work if the
    > character set is known and rather small. Latin1, for example, could use
    > [a-zàáâãäåæçèéêëìíîïðñòóôõöøßùúûüý]. But if we're assuming a random
    > international setting, this is going to be a lot harder.


    Harder, granted.

    > Creating a character class that would work on the complete Unicode set
    > would be almost impossible, and also error prone.


    I do not think any of the above would apply, though. ISTM you are
    unaware of the fact that, while the Unicode Standard (4.0) already defines a
    finite character set of which ECMAScript implementations only support the
    Basic Multilingual Plane (U+0000 to U+FFFF), the number of characters that
    can be subject to case switching is even more limited, and that character
    ranges can be used in regular expressions, whereas their boundaries can also
    be written as Unicode escape sequences.

    All it takes is a bit of research on the defined Unicode character ranges
    and the scripts (as in writing) they provide support for. Take some Latin
    character ranges for example:

    /[a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f]/i

    (This can be optimized, of course, but it helps [you] to get the picture.)

    See also: <http://www.unicode.org/charts/>
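
For comparison, engines that implement ES2018 Unicode property escapes can express "any lowercase letter" without listing ranges by hand. This is a sketch under that assumption and would not have run in the browsers of 2008; the function name `capitalizeUnicodeWords` is illustrative.

```javascript
// Assumes ES2018 support: the "u" flag enables \p{...} property escapes.
// \p{Ll} matches any character in the Unicode category "Lowercase Letter".
function capitalizeUnicodeWords(text) {
  return text.replace(/(^|\s)(\p{Ll})/gu, function (m, p1, p2) {
    // p1 is the start of the string or the whitespace before the word,
    // p2 is the lowercase letter that begins the word.
    return p1 + p2.toUpperCase();
  });
}

var latin = capitalizeUnicodeWords('élan ça vient');
```

The pattern still treats only whitespace as a word separator, so a word that begins with a quote or follows an apostrophe is left unchanged.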

    > It would be simpler to define custom "word boundary" characters, and just
    > let JavaScript uppercase everything following them:


    Would it? ISTM the punctuation of languages is a lot more complicated than
    their letters; take Spanish, for example. But then ISTM capitalizing titles
    is not something that is common in other languages than English, and some
    even consider it deprecated there already. However, for uniformity one
    might be inclined to apply this formatting to non-English (song) titles as
    well; I have seen that before.

    > var wBound = '\\s,.;:?!\'"';
    > var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');
    >
    > v1 = v1.replace(rex, function (s, g1, g2) {
    > return g1 + g2.toUpperCase();
    > });


    That does not make much sense, though, since with the exception of white
    space, and single and double quote, none of those (punctuation) characters
    is likely to occur directly before something that can be considered a word
    character. In fact, it is customary to have (white) space between those
    characters and the word character to be uppercased, so there would never be
    a match then.
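
To make the point concrete, here is the quoted boundary-based pattern run on conventionally spaced text and on text with the spaces left out. The wrapper name `capitalizeAfterBoundary` and both sample sentences are assumptions for illustration.

```javascript
// The boundary characters and pattern from the quoted post.
var wBound = '\\s,.;:?!\'"';
var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');

function capitalizeAfterBoundary(text) {
  return text.replace(rex, function (s, g1, g2) {
    return g1 + g2.toUpperCase();
  });
}

// With a space after each punctuation mark, only the whitespace
// (and the start of the string) ever precedes a word character.
var spaced = capitalizeAfterBoundary('this is a sentence! and so is this, see?');

// Without those spaces, "!" and "," are what precede the next word.
var unspaced = capitalizeAfterBoundary('this is a sentence!and so is this,see?');
```

In the spaced input the punctuation entries of `wBound` never take part in a match; they only make a difference for input like the second string.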

    > wBound would still have to be adjusted as required to include, for
    > example, different types of quotes, or the Japanese/Chinese full stop
    > character (。).


    I am afraid it would have to be rewritten entirely anyway.


    PointedEars
    --
    Anyone who slaps a 'this page is best viewed with Browser X' label on
    a Web page appears to be yearning for the bad old days, before the Web,
    when you had very little chance of reading a document written on another
    computer, another word processor, or another network. -- Tim Berners-Lee
    Thomas 'PointedEars' Lahn, Sep 22, 2008
    #6
  7. SAM Guest

    Thomas 'PointedEars' Lahn wrote:
    > Conrad Lender wrote:
    >
    >> var wBound = '\\s,.;:?!\'"';
    >> var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');
    >>
    >> v1 = v1.replace(rex, function (s, g1, g2) {
    >> return g1 + g2.toUpperCase();
    >> });

    >
    > That does not make much sense, though, since with the exception of white
    > space, and single and double quote, none of those (punctuation) characters
    > is likely to occur directly before something that can be considered a word
    > character.


    Huh? At least ' and " can be found
    (the others too, in case of a typo).

    > In fact, it is customary to have (white) space between those


    I see no white space after ' or " in the following:
    l'éléphant ça "trompe" énormément

    > characters and the word character to be uppercased, so there would never be
    > a match then.


    In any case, we never need to capitalize every word of a sentence. In
    French it is a spelling mistake, if not a grammatical one.

    >> wBound would still have to be adjusted as required to include, for
    >> example, different types of quotes, or the Japanese/Chinese full stop
    >> character (。).

    >
    > I am afraid it would have to be rewritten entirely anyway.


    and your solution doesn't work for me

    'l\'éléphant ça "trompe" énormément'.replace(/(^|\s)([a-z])/g,
    function(m, p1, p2) {
    return p1 + p2.toUpperCase();
    });

    result :
    L'éléphant ça "trompe" énormément

    While Conrad's code gives :
    L'Éléphant Ça "Trompe" Énormément
    if the charset is e.g. Latin 1 (and not utf-8)

    --
    sm
    SAM, Sep 23, 2008
    #7
  8. Conrad Lender wrote:
    > On 2008-09-23 00:20, Thomas 'PointedEars' Lahn wrote:
    > [...]
    >> All it takes is a bit of research on the defined Unicode character ranges
    >> and the scripts (as in writing) they provide support for.

    >
    > ... this is still quite an undertaking, and I wouldn't presume to
    > understand enough about, say, Mongolian or Burmese to decide which of
    > the characters could/should be converted to uppercase. There are over a
    > hundred scripts in the BMP, not including the symbol collections.


    ISTM few writing systems have a concept of letter case, see below. (CMIIW)

    >> Take some Latin character ranges for example:
    >>
    >> /[a-z\u00c0-\u00f6\u00f8-\u00ff\u0100-\u017f\u0180-\u01bf\u01c4-\u024f]/i
    >>
    >> (This can be optimized, of course, but it helps [you] to get the picture.)

    >
    > Again, I think I've already got a pretty good picture, but thanks for
    > the effort. Just to illustrate the pitfalls of your approach - out of
    > only 591 characters (basic latin to latin extended-b), you have
    >
    > - included all the uppercase characters like À (U+00C0)


    That was done on purpose, though, because although it should,
    case-insensitive matching might not recognize the proper uppercase character
    for a non-ASCII lowercase letter and vice-versa.

    > - included the × character (U+00D7) which is a symbol


    ACK, I overlooked that one.

    > - used the "i" modifier, which is redundant because you have already
    > listed the exact code points that you want included


    It is *not* redundant because it would definitely be supported for /[a-z]/.

    > That's for a group of characters that we're largely familiar with. Now,
    > to find out which of the characters in the more exotic groups are
    > lowercase letters, that would take more than just "a bit of research".
    >
    > Perhaps somebody else has already collected all the interesting
    > character ranges, and we could use that information in our character
    > class,


    <http://www.unicode.org/Public/UNIDATA/CaseFolding.txt> looks promising.

    > but why should we, if JavaScript's toUpperCase() already does the
    > right thing with all types of characters?


    Iff it does. And that would still not mean anything for other implementations.
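
One way to check this is to drop the letter class entirely and hand every first non-whitespace character to `toUpperCase()`. This is a sketch only; the name `capitalizeAfterSpace` is illustrative, and the result depends on how well the implementation's `toUpperCase()` handles non-ASCII letters.

```javascript
// Match any non-whitespace character at the start of the string
// or directly after whitespace, and let toUpperCase() decide
// whether it has an uppercase form.
function capitalizeAfterSpace(text) {
  return text.replace(/(^|\s)(\S)/g, function (m, p1, p2) {
    return p1 + p2.toUpperCase();
  });
}

var sample = capitalizeAfterSpace('l\'éléphant ça "trompe" énormément');
```

Characters without an uppercase form, such as a leading quote, are returned unchanged, so a quoted word keeps its lowercase first letter.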

    >>> It would be simpler to define custom "word boundary" characters, and just
    >>> let JavaScript uppercase everything following them:

    >> Would it? ISTM the punctuation of languages is a lot more complicated than
    >> their letters; take Spanish, for example. But then ISTM capitalizing titles
    >> is not something that is common in other languages than English, and some
    >> even consider it deprecated there already. However, for uniformity one
    >> might be inclined to apply this formatting to non-English (song) titles as
    >> well; I have seen that before.

    >
    > That's beside the point. For one thing, to a lesser extent, capitalising
    > the first letters in titles is also common in some Germanic languages,
    > in Italian, etc. More importantly, deciding which languages do or do not
    > use capitalisation, or have deprecated it, or are only using it for
    > certain words, is beyond what we can do in a simple script function;
    > that's up to the person requesting the functionality.


    My point was that ISTM punctuation is more difficult to handle than letters.

    >>> var wBound = '\\s,.;:?!\'"';
    >>> var rex = new RegExp('(^|[' + wBound + '])([^' + wBound + '])', 'g');
    >>>
    >>> v1 = v1.replace(rex, function (s, g1, g2) {
    >>> return g1 + g2.toUpperCase();
    >>> });

    >> That does not make much sense, though, since with the exception of white
    >> space, and single and double quote, none of those (punctuation) characters
    >> is likely to occur directly before something that can be considered a word
    >> character. In fact, it is customary to have (white) space between those
    >> characters and the word character to be uppercased, so there would never be
    >> a match then.

    >
    > "is likely"? "it is customary"? That's wishful thinking, just look at
    > some of the postings in this group (SCNR). You often see people omitting
    > the space after full stops or commas, for example:
    >
    > "this is a sentence!and so is this,see?"
    >
    > It may not be pretty, but there is no doubt that "and" and "see" are
    > both separate words, and thus should be capitalised.


    Non sequitur; one should only capitalize properly written text. YMMV.


    PointedEars
    --
    Use any version of Microsoft Frontpage to create your site.
    (This won't prevent people from viewing your source, but no one
    will want to steal it.)
    -- from <http://www.vortex-webdesign.com/help/hidesource.htm>
    Thomas 'PointedEars' Lahn, Sep 23, 2008
    #8
  9. SAM Guest

    Conrad Lender wrote:
    > On 2008-09-23 01:30, SAM wrote:
    >> In any case, we never need to capitalize every word of a sentence. In
    >> French it is a spelling mistake, if not a grammatical one.

    >
    > Neither my example nor Thomas's were meant as complete implementations.


    Yes. It was just a manner of speaking.

    > All in all, a perfect solution isn't going to be posted here
    > (at least not by me).
    >
    >> and your solution doesn't work for me
    >>
    >> 'l\'éléphant ça "trompe" énormément'.replace(/(^|\s)([a-z])/g,
    >> function(m, p1, p2) {
    >> return p1 + p2.toUpperCase();
    >> });
    >>
    >> result :
    >> L'éléphant ça "trompe" énormément

    >
    > To be fair, he did mention that the character class should be adapted.


    And that should not be necessary with your solution
    (except with charset utf-8 in my Fx in quirks mode, and I do not really
    understand why).


    > If you use [a-z\u00DF-\u00FF] instead of [a-z], "ça" and "énormément"
    > will be capitalised as well.


    a little better :
    L'éléphant Ça "trompe" Énormément
    --^------------^

    Yes, but those \u... or \x... escapes do not look clean.
    Can't JS find the corresponding Unicode characters by itself?

    --
    sm
    SAM, Sep 23, 2008
    #9
