accessing single characters of strings

Marten Lehmann

Hello,

how can I access a single character of a string in Perl like I can do it
in C with text[10]? I don't want to split it up and work on an array.
How can I work directly in place?

Kind Regards
Marten
 
Tad J McClellan

Marten Lehmann said:
> like I can do it
> in C with text[10]? I don't want to split it up and work on an array.


It is not possible to do it like in C, because in C it _is_ an array.
 
Jürgen Exner

Marten Lehmann said:
> how can I access a single character of a string in Perl like I can do it
> in C with text[10]? I don't want to split it up and work on an array.
> How can I work directly in place?

substr() is the function you are looking for.
However I very strongly recommend that you change your mental model.
Perl strings are _NOT_ arrays of characters. That primitive model is
adequate for C. Perl strings are much more powerful and if you use them
like you would use arrays of characters in C, then you are missing most
of their functionality.

jue
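
For readers who want to see the substr() suggestion in action, here is a minimal sketch. The sample string and variable names are made up for illustration and are not from the thread. It shows reading one character, peeking at the next one, and changing a character in place without splitting the string into an array.

```perl
use strict;
use warnings;

my $text = "Hello, world";                # hypothetical sample string

# Read one character: substr(EXPR, OFFSET, LENGTH), zero-based like C.
my $char = substr($text, 7, 1);

# Peek at the following character as well.
my $next = substr($text, 8, 1);

# Modify in place by using substr() as an lvalue ...
substr($text, 7, 1) = "W";

# ... or with the four-argument form, which returns the replaced text.
my $old = substr($text, 0, 1, "J");

print "$char $next $old $text\n";
```

Both the lvalue form and the four-argument form change the original scalar, which is probably the closest Perl equivalent to assigning to text[10] in C.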
 
Marten Lehmann

>> Hello,
>> how can I access a single character of a string in Perl like I can do it
>> in C with text[10]? I don't want to split it up and work on an array.
>> How can I work directly in place?
>
> substr() is the function you are looking for.
> However I very strongly recommend that you change your mental model.
> Perl strings are _NOT_ arrays of characters. That primitive model is
> adequate for C. Perl strings are much more powerful and if you use them
> like you would use arrays of characters in C, then you are missing most
> of their functionality.

I need to parse a file line by line. It is basically a CSV file, but not
completely.

Imagine this content:

"one","""two"",""three"""

"" means a replacement for one "

Correctly parsed, I would get two values:

one
and
"two","three"

But I cannot split at , and I cannot split at ",", since both would lead
to wrong parsing. So I have to lexically go through every character, and
thus I need to know not only the current character but also the next
one. Is substr() really the only choice? It looks a bit awkward to call
substr() dozens of times.

Kind regards
Marten
 
Willem

Marten Lehmann wrote:
) I need to parse a file line by line. It is basically a CSV file, but not
) completely.
)
) Imagine this content:
)
) "one","""two"",""three"""
)
) "" means a replacement for one "
)
) Correctly parsed, I would get two values:
)
) one
) and
) "two","three"
)
) But I cannot split at , and I cannot split at ",", since both would lead
) to wrong parsing.

Obviously.

) So I have to lexically go through every character.

A strange conclusion. There are dozens of other ways to do it.

One example: split at , and then use some simple processing to determine
if an entry has an odd number of quotes and, if so, join it to the next
entry. Then postprocess the entries for the quotes.
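
The split-and-rejoin idea above could be sketched roughly as follows. The helper name parse_line is hypothetical, and the sketch assumes a record never spans several lines; it is an illustration of the approach, not a full CSV parser.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV line on commas, then repair fields
# that were cut apart inside a quoted section.
sub parse_line {
    my ($line) = @_;
    my @fields;
    my $pending;                              # partial field still inside quotes

    for my $piece (split /,/, $line, -1) {    # -1 keeps trailing empty fields
        $pending = defined $pending ? "$pending,$piece" : $piece;
        my $quotes = () = $pending =~ /"/g;   # count the double quotes so far
        next if $quotes % 2;                  # odd: the comma was inside quotes
        push @fields, $pending;
        undef $pending;
    }
    die "Unbalanced quotes in: $line\n" if defined $pending;

    for (@fields) {
        s/^"(.*)"$/$1/s;                      # strip the enclosing quotes
        s/""/"/g;                             # "" stands for one literal "
    }
    return @fields;
}

my @fields = parse_line(q("one","""two"",""three"""));
print "$_\n" for @fields;
```

For the sample line from the question this yields the two expected values, one and "two","three".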

Another example: Use one of the many existing CSV parsing modules.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Jürgen Exner

Marten Lehmann said:
> I need to parse a file line by line. It is basically a CSV file, but not
> completely.
>
> Imagine this content:
>
> "one","""two"",""three"""
>
> "" means a replacement for one "

That is a normal, standard, boring CSV, nothing special. Why do you
think the standard CSV parsers wouldn't be able to parse it?

> Correctly parsed, I would get two values:
> one
> and
> "two","three"

Which of the existing CSV parsers did you try? How did they fail to
parse that line?

> But I cannot split at , and I cannot split at ",", since both would lead
> to wrong parsing.

Yes, but that's not how you normally would write a parser anyway.

> So I have to lexically go through every character. And
> thus I don't only need to know the current character, but also the next
> one.

It has been a really long time, but AFAIR that depends on the kind of
tokenizer you are using.

> Is substr() really the only choice? Looks a bit awkward to call
> substr() dozens of times.

If you really insist on reinventing the wheel and writing your own CSV
parser (why would you want to do that? An exercise in parser writing?)
then the low-level tokenizer may indeed work best on an array of
characters.

jue
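
As an illustration of the "array of characters" tokenizer mentioned above, here is a minimal sketch of a character-by-character scanner with one character of lookahead. The function name scan_line is hypothetical, and the sketch assumes well-formed input on a single line (it does not report an unterminated quote).

```perl
use strict;
use warnings;

# Hypothetical state machine: walk the characters of one line and use
# one character of lookahead to tell "" (escaped quote) from a closing ".
sub scan_line {
    my ($line) = @_;
    my @chars = split //, $line;              # one array element per character
    my (@fields, $in_quotes);
    my $field = '';

    for (my $i = 0; $i < @chars; $i++) {
        my $c    = $chars[$i];
        my $next = $i < $#chars ? $chars[$i + 1] : '';

        if ($in_quotes) {
            if ($c eq '"' && $next eq '"') { $field .= '"'; $i++ }   # escaped quote
            elsif ($c eq '"')              { $in_quotes = 0 }        # closing quote
            else                           { $field .= $c }
        }
        elsif ($c eq '"') { $in_quotes = 1 }                         # opening quote
        elsif ($c eq ',') { push @fields, $field; $field = '' }      # field separator
        else              { $field .= $c }
    }
    push @fields, $field;                     # the last field has no trailing comma
    return @fields;
}

my @fields = scan_line(q("one","""two"",""three"""));
print "$_\n" for @fields;
```

Splitting once into @chars means the loop indexes an array instead of calling substr() for every position, which addresses the awkwardness raised in the question while still looking one character ahead.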
 
Tad J McClellan

Marten Lehmann said:
> I need to parse a file line by line. It is basically a CSV file, but not
> completely.
>
> Imagine this content:
>
> "one","""two"",""three"""
>
> "" means a replacement for one "

That looks like normal CSV to me...

> Correctly parsed, I would get two values:
>
> one
> and
> "two","three"


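---------------------------
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV_XS;

# The sample line from the question; q() avoids escaping the quotes.
my $line = q("one","""two"",""three""");
my $csv = Text::CSV_XS->new();

# parse() returns false on malformed input, so check it before using fields().
$csv->parse($line) or die "Cannot parse line: " . $csv->error_input() . "\n";
print "$_\n" for $csv->fields();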
 
Marten Lehmann

> Have a look at the Text::CSV_XS[1] module, which will parse your CSV
> data correctly and spare you a lot of headache about balanced
> quotations. Using the double quote itself to escape a literal
> double quote isn't that uncommon, so it's a default setting in
> Text::CSV_XS.

Thanks, you are right. I usually do simple line-by-line parsing on my
own and had just planned to extend it a bit, but I realized that
recognizing doubled quotes and embedded newlines is quite a bit more
work, so I'm now using the Text::CSV_XS module you recommended.

Regards
Marten
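
For the embedded-newline case mentioned above, a rough sketch of reading whole records with Text::CSV_XS might look like this. The sample data and the in-memory filehandle are assumptions made for illustration; in practice $fh would be opened on the real file.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data: the second record has a line break inside quotes.
my $data = qq("one","""two"",""three"""\n"multi\nline","plain"\n);
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# binary => 1 allows line breaks and other special bytes inside quoted fields.
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag();

my @rows;
while (my $row = $csv->getline($fh)) {        # one array reference per record
    push @rows, $row;
}
$csv->eof or die "Parse error: " . $csv->error_diag();
close $fh;

printf "%d records, %d fields in the first one\n", scalar @rows, scalar @{ $rows[0] };
```

Because getline() reads from the filehandle itself, a quoted field that continues on the next physical line stays part of the same record, which a plain line-by-line loop with parse() would not handle.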
 
