Strange behavior with strings...

Discussion in 'Perl Misc' started by zeroaffinity, May 26, 2006.

  1. zeroaffinity

    zeroaffinity Guest

    I have an array of strings. Each string (called $line) has various html
    formatting removed (with s//) to leave a substring that is basically a
    concatenated name-value pair.

    I'll show the code in a sec, but here is what is strange. When I print
    the string and try to append a character, it actually PREpends the
    character and overwrites the first character of the string in the
    process.

    My code:
    1 ($name,$value) = split('#',$line);
    2 print $value . "*\n";

    So $line had a # in it that was a delimiter. I just split it up and
    attempted to print the string. This is the output...

    *une 6, 2006

    If I change line 2 to this: print $value . "**\n"; then I
    get the following output:

    **ne 6, 2006

    The clincher. If I swap $value for $name, this problem goes away. In
    fact, the results would be Date* and Date** in the cases above,
    respectively. It seems like the data in $value is affecting the
    behavior.

    What could be causing this?
     
    zeroaffinity, May 26, 2006
    #1

  2. Could you post all of the code, and also, try printing $line before you
    split. It'll help debug.
     
    it_says_BALLS_on_your forehead, May 26, 2006
    #2

  3. Guest

    Guest Guest

    : I'll show the code in a sec

    Yes, but that's not enough. Please show the data as well.

    : When I print
    : the string and try to append a character, it actually PREpends the
    : character and overwrites the first character of the string in the
    : process.

    No, I can't believe it.

    : My code:
    : 1 ($name,$value) = split('#',$line);
    : 2 print $value . "*\n";

    And this is my code. Note that warnings and strictures are enabled, so
    Perl itself flags many common mistakes for you.

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $line="nonsense,#June 12, 2006";
    my ($name,$value)=split('#',$line);
    print $value."*\n";

    And I get this result:

    June 12, 2006*

    As expected.

    : What could be causing this?

    Show us your $line: not one you retyped in your editor window, but
    the real one copied from the source (and, please, no copying and pasting
    by mouse!). I suspect you have some unorthodox sequence of \n and/or \r
    somewhere in your data.
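
    One way to check for stray control characters is to print the string
    with them made visible. This is only a sketch: show_hidden is a
    made-up helper name, and the sample value is an assumption about what
    your data might look like, not your real $line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the text with control and
# non-ASCII characters rewritten as visible escape sequences.
sub show_hidden {
    my ($text) = @_;
    $text =~ s/\r/\\r/g;    # carriage return -> literal backslash + r
    $text =~ s/\n/\\n/g;    # line feed       -> literal backslash + n
    $text =~ s/\t/\\t/g;    # tab             -> literal backslash + t
    # Anything else outside printable ASCII becomes a \xNN escape.
    $text =~ s/([^\x20-\x7e])/sprintf('\\x%02X', ord $1)/ge;
    return $text;
}

# Assumed sample: a value that still carries a trailing carriage return.
my $value = "June 6, 2006\r";

# The brackets make leading or trailing junk easy to spot.
print '[', show_hidden($value), "]\n";    # prints [June 6, 2006\r]
```

    If a \r shows up inside the brackets, the terminal is returning to
    column 0 before it prints whatever you appended.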

    Oliver.
     
    Guest, May 26, 2006
    #3
  4. zeroaffinity

    zeroaffinity Guest

    Thanks for looking at this. Here is more of the code.

    #Get the web content
    $html = get($url);

    #Break it up into lines
    @html = split(/\n/,$html);

    #Operate on each line
    foreach $line(@html) {
    $count++;

    #Yum
    chomp($line);

    #Clean up the formatting
    $line =~ s/\'/\\\'/g;

    #There is some IF-logic here to only do something
    #with lines containing keywords of interest

    #Get rid of <A HREF...>
    $line =~ s/<a(\s*|%|&|_|\.|\?|=|"|\w*|[0-9]*)*>//ig;

    #Blank space is separating the pair. Replace it
    #with something that's a single char
    $line =~ s/:&nbsp;/#/ig;

    #Now split on that char...
    ($name,$value) = split('#',$line);

    #Print the name
    print $name . "**\n";

    #Print the value
    print $value . "**\n";

    #End IF-logic
    }
    ===== end of code =====
    This prints the following:

    Date**
    **ne 6, 2006
     
    zeroaffinity, May 26, 2006
    #4
  5. zeroaffinity

    zeroaffinity Guest

    Okay. Here you go, Oliver. To test it, run from the command line like
    so ...

    C:\Perl\examples>DataGrabber.pl http://www.dailyreportonline.com/Public_Notice/Consumer_Alerts/new_listCA.asp

    === Code ===
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple;

    print "Content-type: text/html\n\n";
    my($httpData);
    my($line);
    my($url);
    my($html);
    my($count);
    my(@html);
    my($prepend_address);
    my($name,$value);

    $url = $ARGV[0];

    if(!defined($url) || $url eq "") {
    print "You need to provide a URL.\n";
    exit;
    }


    $html = get($url);
    @html = split(/\n/,$html);

    if($html =~ /<frame/i) {
    print "Frames! Can't do it. Get the URL for the specific frame containing data you want.\n";
    exit;
    }



    $count = 0;
    foreach $line(@html) {
    $count++;
    chomp($line);
    $line =~ s/\'/\\\'/g;

    #Below is a filter for a specific web site, the Fulton County Daily Report.
    #I need to make this block executable based on command line parameters since
    #other web pages would require different filtering.
    if(
    ($line =~ /individual_SQL/i) ||
    ($line =~ /Publication Date/i) ||
    ($line =~ /Auction Date/i) ||
    ($line =~ /Deed Book/i) ||
    ($line =~ /Original Mortgage/i) ||
    ($line =~ /Borrower/i) ||
    ($line =~ /Lender/i) ||
    ($line =~ /Contact/i))
    {
    if($line =~ /individual_SQL/i)
    { $prepend_address = 1; }
    else { $prepend_address = 0; }

    $line =~ s/<table(\s*|=|"|\w*|[0-9]*)*>//ig; #Gets the <table...
    $line =~ s/<a(\s*|%|&|_|\.|\?|=|"|\w*|[0-9]*)*>//ig; #Gets the <a link...
    $line =~ s/<td width="[0-9]+">//ig; #Gets the <td width...>
    $line =~ s/<\w*>//ig; #Gets the <whatever>
    $line =~ s/<\/\w*>//ig; #Gets the </whatever>
    if($prepend_address == 1)
    { $line = "Address#" . $line;}
    else { $line =~ s/:&nbsp;/#/ig;}
    ($name,$value) = split('#',$line);
    print $name . "**\n";
    print $value . "**\n";
    }
    $prepend_address = 0;
    }


    if($count == 0) {
    print "I couldn't find anything here: $url \n";
    }
     
    zeroaffinity, May 26, 2006
    #5
  6. zeroaffinity

    zeroaffinity Guest

    Yep, you're right. I added one more substitution line to globally
    replace all \r characters with nothing. When I saw the "\r" comment in
    your post, I knew exactly what dumb thing I had been tangling with for
    the past 2 hours.

    I should have guessed. It's pretty obvious now: $value ended with a
    \r, which moves the cursor back to the start of the line without
    advancing to the next one (that is the job of \n), so whatever I
    appended overwrote my string from the beginning.
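
    For anyone who finds this thread later, here is a minimal sketch of
    the fix. The sample string and its field values are made up for
    illustration (the real script gets its text from get($url)).
    Splitting on /\r?\n/ is just one way to do it; a separate s/\r//g
    pass like the one I added works as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample data: two lines with Windows-style CRLF endings,
# already reduced to the name#value shape used in the script above.
my $html = "Publication Date#June 6, 2006\r\nAuction Date#July 4, 2006\r\n";

# Splitting on an optional \r followed by \n drops both characters,
# so no trailing carriage return is left on any line.
my @lines = split /\r?\n/, $html;

for my $line (@lines) {
    my ($name, $value) = split /#/, $line;
    print $name,  "**\n";    # first pass prints: Publication Date**
    print $value, "**\n";    # first pass prints: June 6, 2006**
}
```

    With split(/\n/, ...) and chomp alone, each $value would still end in
    \r, because chomp only removes the trailing "\n" here.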

    Thanks.
     
    zeroaffinity, May 26, 2006
    #6

  7. This is the only relevant piece for now. Can you reproduce the 'strange
    behavior' with only the code below? Also, can you provide real data?

    my $line = q{<you paste a line here>};

    my ( $name, $value ) = split('#', $line);
    print $name . "**\n";
    print $value . "**\n";


    Here is a sample from me:

    use strict; use warnings;

    my $line = 'name#bob';

    my ($name, $value) = split( '#', $line );

    print $name . "**\n";
    print $value . "**\n";

    __OUTPUT__
    name**
    bob**


    (Also, you should favor printing a list over concatenation in this
    instance, i.e.:
    print $name, "**\n";
    print $value, "**\n";
    )
     
    it_says_BALLS_on_your forehead, May 26, 2006
    #7
  8. J. Gleixner

    J. Gleixner Guest

    You do know that there are modules to parse HTML, don't you?
     
    J. Gleixner, May 26, 2006
    #8
  9. hymie!

    hymie! Guest

    In our last episode, the evil Dr. Lacto had captured our hero,
    Your data isn't chomp'd.

    hymie! http://www.smart.net/~hymowitz
    ===============================================================================
     
    hymie!, May 30, 2006
    #9