diff multi-line whitespace to verify Beautifier output

Discussion in 'C++' started by Shug, Feb 12, 2007.

  1. Shug

    Shug Guest

    Hi,

    We're reformatting a lot of our project code using the excellent
    uncrustify beautifier.

    However, to gain confidence that it really is only changing whitespace
    (forget { } issues for just now), we were hoping to do a diff - a
    textual comparison of the files, ignoring whitespace.

    However, most diffs we've tried can't handle multi-line whitespace, so
    the following two prototypes are deemed to be different:

    void doStuff( int a, float b);

    void doStuff(int a,
    float b);

    Has anyone found a way to do a diff like this, that handles multi-line
    whitespace?

    Shug
     
    Shug, Feb 12, 2007
    #1

  2. Shug wrote:
    > Has anyone found a way to do a diff like this, that handles multi-line
    > whitespace?


    I would actually do it differently: tokenize both sources. If the
    sequence of tokens is the same, you have the same source (now, don't ask
    me where you can find C++ tokenizers, I don't know, GIYF). The other way
    is to convert both of those into a third type of formatting (which should
    give you exactly the same output) and compare them. If the formatter
    makes mistakes, it's likely to make them independently on each input,
    so they would show up in the comparison.
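
    A minimal sketch of the token-comparison idea above, assuming a
    deliberately naive lexer rather than a real C++ tokenizer. The names
    naiveTokens and sameTokens are made up for illustration. It ignores
    comments, string and character literals, and preprocessor lines, and it
    splits multi-character operators such as >> into single characters, so
    treat it as a first-pass check only:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Naive lexer (illustrative, not a conforming C++ tokenizer):
// a run of letters, digits and '_' becomes one token, and every other
// non-whitespace character becomes a one-character token.
// Assumption: comments, string/character literals and preprocessor
// lines get no special handling.
std::vector<std::string> naiveTokens(const std::string& src)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace only separates tokens, it is never emitted
        } else if (std::isalnum(c) || c == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}

// True when both sources yield the same tokens in the same order.
bool sameTokens(const std::string& before, const std::string& after)
{
    return naiveTokens(before) == naiveTokens(after);
}
```

    With the two doStuff prototypes from the original post, sameTokens
    returns true, while "int a;" and "inta;" compare as different.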

    V
    --
    Please remove capital 'A's when replying by e-mail
    I do not respond to top-posted replies, please don't ask
     
    Victor Bazarov, Feb 12, 2007
    #2

  3. On Feb 12, 3:24 pm, "Victor Bazarov" <> wrote:
    > I would actually do it differently: tokenize both sources. If the
    > sequence of tokens is the same, you have the same source (now, don't ask
    > me where you can find C++ tokenizers, I don't know, GIYF). The other way
    > is to convert both of those into a third type of formatting (which should
    > give you exactly the same output) and compare them. If the formatter
    > makes mistakes, it's likely to make them independently on each input,
    > so they would show up in the comparison.


    A simple third format would be one where every whitespace is replaced
    by a newline, which will give a format that is easy to compare (and I
    think it will still be valid C++ :)

    --
    Erik Wikström
     
    Erik Wikström, Feb 12, 2007
    #3
  4. Erik Wikström wrote:
    > A simple third format would be one where every whitespace is replaced
    > by a newline, which will give a format that is easy to compare (and I
    > think it will still be valid C++ :)


    It wouldn't be valid C++ without some continuation characters (\) in
    macro definitions. And broken up include directives aren't going to
    work either. :)
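
    For comparison purposes only, the "every whitespace becomes a newline"
    format could be sketched roughly as below. oneWordPerLine is a
    hypothetical helper name, and as noted above the output is not meant to
    compile. One caveat, assuming the formatter also adds or removes spaces
    next to punctuation: "doStuff( int" and "doStuff(int" still split
    differently, so this normalizes line breaks but not that kind of change.

```cpp
#include <cctype>
#include <string>

// Collapse every run of whitespace into a single '\n', dropping leading
// and trailing whitespace. The result is only meant to be diffed.
std::string oneWordPerLine(const std::string& text)
{
    std::string out;
    bool pendingBreak = false;
    for (char ch : text) {
        if (std::isspace(static_cast<unsigned char>(ch))) {
            // Remember the break, but never start the output with one.
            pendingBreak = !out.empty();
        } else {
            if (pendingBreak) {
                out.push_back('\n');
                pendingBreak = false;
            }
            out.push_back(ch);
        }
    }
    return out;
}
```

    Two files that differ only in where lines are wrapped produce identical
    output, which an ordinary line-based diff can then compare.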

    V
     
    Victor Bazarov, Feb 12, 2007
    #4
  5. Shug

    Shug Guest

    Thanks for your contributions guys.

    In the end, we've managed to find another satisfactory solution.

    After reformatting the source code, we run both the before and after
    source files through tr:

    tr -d '\r\n' < file1.cpp > temp1.txt
    tr -d '\r\n' < file2.cpp > temp2.txt

    and then run diff on the tr'd files:

    C:\cygwin\bin\diff -bBw temp1.txt temp2.txt

    This is all using a cygwin installation on Windows XP.

    This does exactly what we need.
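
    As a rough in-program equivalent of the tr plus diff -w steps above,
    here is a sketch that strips all whitespace from both texts and compares
    what is left. The function names are illustrative only. Note the
    assumption this relies on: because all whitespace is dropped, "int a"
    and "inta" compare as equal, so it confirms "nothing but whitespace
    changed" only in that loose sense.

```cpp
#include <cctype>
#include <fstream>
#include <iterator>
#include <string>

// Remove every whitespace character (spaces, tabs, CR, LF, ...).
std::string stripWhitespace(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char ch : text) {
        if (!std::isspace(static_cast<unsigned char>(ch))) {
            out.push_back(ch);
        }
    }
    return out;
}

// True when the two texts are identical once all whitespace is ignored.
bool sameIgnoringAllWhitespace(const std::string& a, const std::string& b)
{
    return stripWhitespace(a) == stripWhitespace(b);
}

// Read a whole file into a string; yields an empty string if the file
// cannot be opened (no error reporting in this sketch).
std::string readFile(const std::string& path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

    A caller would pass readFile("file1.cpp") and readFile("file2.cpp") to
    sameIgnoringAllWhitespace and treat a false result as a sign that the
    beautifier changed more than whitespace.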

    Thanks again.

    Shug
     
    Shug, Feb 13, 2007
    #5
  6. james cook

    It does strike me that this is a problem with Cygwin on Windows (I have run into it too).
    diff -w or -b should ignore whitespace, but when the whitespace is a line break, whether 0A (Unix) or 0D 0A (Windows), it doesn't work.

    void doStuff( int a, float b);

    void doStuff(int a,[HEX 0a]or[HEX 0d 0a]
    float b);

    Should be the same, but is not recognised as such.
    Does this diff work on non-windows machines?
     
    james cook, Jun 12, 2008
    #6