Strange behavior with strings...

Discussion in 'Perl Misc' started by zeroaffinity, May 26, 2006.

  1. zeroaffinity

    zeroaffinity Guest

    I have an array of strings. Each string (called $line) has various html
    formatting removed (with s//) to leave a substring that is basically a
    concatenated name-value pair.

    I'll show the code in a sec, but here is what is strange. When I print
    the string and try to append a character, it actually PREpends the
    character and overwrites the first character of the string in the
    process.

    My code:
    1 ($name,$value) = split('#',$line);
    2 print $value . "*\n";

    So $line had a # in it that was a delimiter. I just split it up and
    attempted to print the string. This is the output...

    *une 6, 2006

    If I change line 2 to this: print $value . "**\n"; then I
    get the following output:

    **ne 6, 2006

    The clincher. If I swap $value for $name, this problem goes away. In
    fact, the results would be Date* and Date** in the cases above,
    respectively. It seems like the data in $value is affecting the

    What could be causing this?
    zeroaffinity, May 26, 2006

  2. Could you post all of the code, and also, try printing $line before you
    split. It'll help debug.
    it_says_BALLS_on_your forehead, May 26, 2006

  3. zeroaffinity

    Guest Guest

    : I'll show the code in a sec

    Yes, but that's not enough. Please show the data as well.

    : When I print
    : the string and try to append a character, it actually PREpends the
    : character and overwrites the first character of the string in the
    : process.

    No, I can't believe it.

    : My code:
    : 1 ($name,$value) = split('#',$line);
    : 2 print $value . "*\n";

    And this is my code. Note that warnings and strictures are enabled, which
    helps you make sure your code is bullet-proof.

    use strict;
    use warnings;
    my $line="nonsense,#June 12, 2006";
    my ($name,$value)=split('#',$line);
    print $value."*\n";

    And I get this result:

    June 12, 2006*

    As expected.

    : What could be causing this?

    Show us your $line - not the one you retype in your editor window, but
    the real one copied from the source (and, please! no copying and pasting
    by mouse!). I assume you have some unorthodox sequence of \n and|or \r
    somewhere in your data.
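
    One way to check for a stray \r or \n is to print the data with its
    control characters made visible. The following is only a minimal sketch:
    show_hidden is a hypothetical helper name, and the sample $line is assumed
    data modelled on the output quoted above, not the poster's real input.

```perl
use strict;
use warnings;

# Hypothetical helper: render control characters as visible escapes so a
# stray carriage return shows up as \r instead of moving the cursor.
sub show_hidden {
    my ($text) = @_;
    my %names = ("\r" => '\r', "\n" => '\n', "\t" => '\t');
    (my $visible = $text) =~ s{([\x00-\x1f\x7f])}
        { exists $names{$1} ? $names{$1} : sprintf('\x%02x', ord $1) }ge;
    return $visible;
}

# Assumed sample: a line that was split on \n but still ends in \r.
my $line = "Date#June 6, 2006\r";
my ($name, $value) = split /#/, $line;

print show_hidden($value), "*\n";   # prints: June 6, 2006\r*
```

    If the escape appears just before the appended character, the terminal
    was returning to the start of the line before printing it.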

    Guest, May 26, 2006
  4. zeroaffinity

    zeroaffinity Guest

    Thanks for looking at this. Here is more of the code.

    #Get the web content
    $html = get($url);

    #Break it up into lines
    @html = split(/\n/,$html);

    #Operate on each line
    foreach $line (@html) {

        #Clean up the formatting
        $line =~ s/\'/\\\'/g;

        #There is some IF-logic here to only do something
        #with lines containing keywords of interest

        #Get rid of <A HREF...>
        $line =~ s/<a(\s*|%|&|_|\.|\?|=|"|\w*|[0-9]*)*>//ig;

        #Blank space is separating the pair. Replace it
        #with something that's a single char
        $line =~ s/:&nbsp;/#/ig;

        #Now split on that char...
        ($name,$value) = split('#',$line);

        #Print the name
        print $name . "**\n";

        #Print the value
        print $value . "**\n";

        #End IF-logic
    }
    ===== end of code =====
    This prints the following:

    **ne 6, 2006
    zeroaffinity, May 26, 2006
  5. zeroaffinity

    zeroaffinity Guest

    Okay. Here you go, Oliver. To test it, run from the command line like
    so ...


    === Code ===
    use strict;
    use warnings;
    use LWP::Simple;

    print "Content-type: text/html\n\n";

    my $url = $ARGV[0];

    if (!defined $url || $url eq "") {
        print "You need to provide a URL.\n";
        exit;
    }

    my $html = get($url);
    my @html = split(/\n/, $html);

    if ($html =~ /<frame/i) {
        print "Frames! Can't do it. Get the URL for the specific frame containing data you want.\n";
        exit;
    }

    my $count = 0;
    foreach my $line (@html) {
        $line =~ s/\'/\\\'/g;

        #Below is a filter for a specific web site, the Fulton County Daily
        #I need to make this block executable based on command line parameters
        #other web pages would require different filtering.
        if (($line =~ /individual_SQL/i) ||
            ($line =~ /Publication Date/i) ||
            ($line =~ /Auction Date/i) ||
            ($line =~ /Deed Book/i) ||
            ($line =~ /Original Mortgage/i) ||
            ($line =~ /Borrower/i) ||
            ($line =~ /Lender/i) ||
            ($line =~ /Contact/i))
        {
            $count++;    #Count the lines that matched a keyword

            my $prepend_address;
            if ($line =~ /individual_SQL/i)
                { $prepend_address = 1; }
            else { $prepend_address = 0; }

            $line =~ s/<table(\s*|=|"|\w*|[0-9]*)*>//ig;           #Gets the <table ...>
            $line =~ s/<a(\s*|%|&|_|\.|\?|=|"|\w*|[0-9]*)*>//ig;   #Gets the <a link...
            $line =~ s/<td width="[0-9]+">//ig;                    #Gets the <td width=...>
            $line =~ s/<\w*>//ig;                                  #Gets the <whatever>
            $line =~ s/<\/\w*>//ig;                                #Gets the </whatever>
            if ($prepend_address == 1)
                { $line = "Address#" . $line; }
            else { $line =~ s/:&nbsp;/#/ig; }
            my ($name, $value) = split('#', $line);
            print $name . "**\n";
            print $value . "**\n";
            $prepend_address = 0;
        }
    }

    if ($count == 0) {
        print "I couldn't find anything here: $url \n";
    }
    zeroaffinity, May 26, 2006
  6. zeroaffinity

    zeroaffinity Guest

    Yep, you're right. I added one more substitution line to globally
    replace all \r characters with nothing. When I saw the "\r" comment in
    your post, I knew exactly what dumb thing I had been tangling with for
    the past 2 hours.

    I should have guessed that, since it's pretty obvious now: I was
    printing after a \r, which doesn't send you on to the next line (thus
    the purpose of \n), and so I was overwriting my string from the
    beginning of the line.
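
    For anyone landing here with the same symptom, a minimal sketch of that
    kind of fix follows. It assumes CRLF-terminated input that was split on
    \n only; strip_cr is a hypothetical helper name and the sample data is
    made up for illustration. Note that chomp alone would probably not help
    here, since by default it removes only a trailing \n.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every carriage return from a line, mirroring
# the "replace all \r characters with nothing" substitution described above.
sub strip_cr {
    my ($text) = @_;
    $text =~ s/\r//g;
    return $text;
}

# Assumed sample data: CRLF line endings, split on \n only, so each
# line would otherwise keep a trailing \r.
my $html = "Publication Date#June 6, 2006\r\nAuction Date#July 4, 2006\r\n";

for my $line (split /\n/, $html) {
    my ($name, $value) = split /#/, strip_cr($line);
    print $name,  "**\n";
    print $value, "**\n";
}
```

    With the \r gone, the appended characters land at the end of the value
    instead of overwriting its first characters.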

    zeroaffinity, May 26, 2006

  7. This is the only relevant piece for now. Can you reproduce the 'strange
    behavior' with only the code below? Also, can you provide real data?

    my $line = q{<you paste a line here>};

    my ( $name, $value ) = split('#', $line);
    print $name . "**\n";
    print $value . "**\n";

    Here is a sample from me:

    use strict; use warnings;

    my $line = 'name#bob';

    my ($name, $value) = split( '#', $line );

    print $name . "**\n";
    print $value . "**\n";


    Also, you should favor printing a list over concatenation in this instance:
    print $name, "**\n";
    print $value, "**\n";
    it_says_BALLS_on_your forehead, May 26, 2006
  8. zeroaffinity

    J. Gleixner Guest

    You do know that there are modules to parse HTML, don't you?
    J. Gleixner, May 26, 2006
  9. zeroaffinity

    hymie! Guest

    In our last episode, the evil Dr. Lacto had captured our hero,
    Your data isn't chomp'd.

    hymie!, May 30, 2006