Why is this sub removing newlines??

Discussion in 'Perl Misc' started by John Black, Dec 5, 2013.

  1. Since you apparently missed this: While there's doubtlessly many a
    developer who is convinced to have invented something comparable to 'the
    wheel', ie, a basic design which will remain in use for a few thousand
    years, as soon as he managed to tack three lines of code together doing
    something other than 'crash immediately', possibly even more so on CPAN,
    using this comparison is either a case of hubris bordering on serious
    megalomania or just someone babbling along without spending much effort
    on thinking about what he's actually saying, not the least because this
    simile is actually wrong: Wheels come in many different kinds and even a
    seriously reality-impaired mathead should have noticed the difference
    between, say, tanks, push chairs, racing cars and pottery wheels. "But
    can't you see they're all round and rotate!" isn't much of a similarity at
    the technical level.
     
    Rainer Weikusat, Dec 7, 2013
    #21

  2. But I'm not quite ready to declare it unredeemable:

    $string =~ s/ ^\s+ | \s+(?=\n)$ | \s*[^\n\S]+$ //gx;


    [ depending on flavor of white space you want ]
     
    Charles DeRykus, Dec 10, 2013
    #22

  3. One thing I missed was a trailing newline without other whitespace in
    front of it. Making this

    s/\s*?(?=\n)?$//;

    instead works with that as well (although this should surely be called a
    questionable construct, given the number of ?s ...).
     
    Rainer Weikusat, Dec 10, 2013
    #23
  4. See also perlrecharclass, look for [[:blank:]] and \h.
     
    Dr.Ruud, Dec 11, 2013
    #24
  5. Thanks. Looks like what I really wanted in most cases was \h. [[:blank:]] sounds like it
    would work too, but it's just too bulky to put into regexes since it can easily be avoided
    with \h.

    John Black
     
    John Black, Dec 11, 2013
    #25
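A minimal sketch of the \h approach mentioned above, assuming the goal is to strip spaces and tabs at both ends of a string while leaving a final newline alone. The sub name trim_horizontal is made up for illustration, and \h needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is not from the thread): strip horizontal
# whitespace at both ends of a string. \h never matches "\n", so a final
# newline survives.
sub trim_horizontal {
    my ($string) = @_;
    $string =~ s/^\h+|\h+$//g;
    return $string;
}

for my $input ("  foo \t\n", "foo  ", "foo\n") {
    my $output = trim_horizontal($input);
    (my $shown = $output) =~ s/\n/\\n/g;    # make the newline visible
    print "[$shown]\n";
}
```

Without /m, ^ only matches at the start of the string and $ only at the end or just before a final newline, so whitespace around interior newlines is left untouched.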
  6. [remove whitespace at end of string but keep \n if it is there]
    [this also deals with whitespace at the beginning]
    Maybe logically simpler:

    s/\s*?(\n)?$/$1/;

    (this will likely result in a warning when there's no newline at the
    end of the line and runtime warnings are enabled).
     
    Rainer Weikusat, Dec 12, 2013
    #26

  7. Hm, as you note though, doesn't handle initial w/s and coughs an
    'uninitialized' warning if no ending \n. However, you could take a cough
    suppressant:

    s{\s*?(\n)?$}{$1 // ''}e;


    But, more significantly, doesn't handle multiple ending newlines, eg,
    "foo \n\n\n"
    [which of course may not be an issue for the OP]
     
    Charles DeRykus, Dec 13, 2013
    #27
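A short sketch of how the substitution from this post behaves on the three cases discussed, assuming Perl 5.10 or later for the // operator. The wrapper name trim_end is hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # the // (defined-or) operator needs Perl 5.10 or later

# Hypothetical wrapper around the substitution quoted above.
sub trim_end {
    my ($string) = @_;
    $string =~ s{\s*?(\n)?$}{$1 // ''}e;
    return $string;
}

my $with_newline    = trim_end("foo  \n");       # "foo\n"
my $without_newline = trim_end("foo  ");         # "foo", no warning
my $many_newlines   = trim_end("foo \n\n\n");    # "foo\n\n"
```

In the last case the match stops as soon as $ can succeed just before the final newline, so only the space and one newline are consumed and two newlines remain.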
  8. It wasn't supposed to handle initial whitespace because that's not
    really related to the \n-issue (also true for the first) ...
    ... and it certainly wasn't supposed to do that, either: When processing
    something line-by-line which I assumed to be the case here, "foo \n\n\n"
    will be the three lines

    "foo \n"
    "\n"
    "\n"

    and assuming that handling "foo \n bla\n \n" should result in
    "foo\n bla\n \n", ie the purpose is to remove leading whitespace at
    the beginning of a multi-line text but not leading whitespace on the
    individual lines seems rather bizarre to me. Or that processing
    "a \n " should remove the \n given that newlines are not supposed to
    be removed. And what about " a b \n bbb\n\n c \n"?
     
    Rainer Weikusat, Dec 13, 2013
    #28
  9. What's confusing here is that $ matches two different things depending
    on the context: Apparently, if it is preceded by \s*?, it matches
    immediately before \n at the end of the line and if that is \s*, it
    matches after the \n. But that's certainly good to know.
     
    Rainer Weikusat, Dec 13, 2013
    #29
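The observation above can be checked with a small sketch. As documented in perlre, without /m the $ assertion can succeed either at the very end of the string or just before a newline at the end; which of the two positions is found depends on how much the preceding quantifier consumes first. The sample string and variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "foo  \n";

# Lazy: \s*? grows one character at a time, so $ first succeeds
# just before the final newline.
my ($lazy) = $line =~ /(\s*?)$/;

# Greedy: \s* takes the newline as well, so $ succeeds at the very end.
my ($greedy) = $line =~ /(\s*)$/;

printf "lazy matched %d characters, greedy matched %d\n",
    length $lazy, length $greedy;    # 2 and 3
```

Used in a substitution, the greedy form therefore removes the newline and the lazy form keeps it. \z is the assertion that only matches at the very end of the string.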
  10. Yes, I was over-generalizing. But, imo, a one-liner handles both goals
    without being rocket science[1].

    That wasn't really specified though. The goal was to remove
    whitespace from the beginning and end of "strings" [rather than just
    well-behaved lines] without clobbering a trailing newline.

    Yes, agreed. Without more certainty about original intent, it becomes
    bizarre. But less bizarrely, a string might easily have multiple
    newlines on the end with the reasonable goal of removing all but the
    final one.
     
    Charles DeRykus, Dec 13, 2013
    #30
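One possible way to get the behaviour described in this post (strip leading whitespace, strip trailing whitespace, and reduce any run of trailing newlines to a single one) is sketched below. This is not the one-liner the poster refers to, which is not shown in the thread; the sub name and the two-step approach are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not code from the thread.
sub trim_keep_one_newline {
    my ($string) = @_;
    $string =~ s/\A\s+//;            # leading whitespace, newlines included
    $string =~ s/\s*?(\n?)\z/$1/;    # trailing whitespace; keep a final "\n" if present
    return $string;
}

my $result = trim_keep_one_newline("  foo \n\n\n");    # "foo\n"
```

Because the newline is optional inside the capture group, $1 is always defined and no warning is raised when the string has no trailing newline. A string consisting only of whitespace becomes empty, which may or may not be what is wanted.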
