Splitting paragraph into array.

Sandman · Aug 13, 2004

I am splitting a text block into paragraphs, to be able to add images and stuff
like that to a specific paragraph in a content management system.

Well, right now I'm splitting on two or more newlines, so this text block
(indentation added for clarity):

Hello, my nickname is Sandman and I am coding
some Perl

Call me

Would be split into two parts, with "Call me" being the second one.
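
For reference, the current approach might look roughly like the sketch below. The helper name split_paragraphs is made up for illustration; the thread does not show the actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: split a text block on runs of two or more newlines
# and drop any empty pieces.
sub split_paragraphs {
    my ($text) = @_;
    return grep { length } split /\n{2,}/, $text;
}

my $text  = "Hello, my nickname is Sandman and I am coding\nsome Perl\n\nCall me";
my @parts = split_paragraphs($text);

print "---------------\n$_\n" for @parts;
print "---------------\n";
```

The single newline inside the first paragraph is left alone; only the blank line before "Call me" acts as a delimiter.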

My problem now is that if I have a text block like below:

Hello, my nickname is Sandman and I am coding
some Perl. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>

Call me

The above would, given the rules I use now, yield four parts, as such:

---------------------------------------------
Hello, my nickname is Sandman and I am coding
some Perl. Here is an example:
---------------------------------------------
<code>
print "Hello World!";
---------------------------------------------
print "Foo";
</code>
---------------------------------------------
Call me
---------------------------------------------

But I would want it to end up in three parts, as such:

---------------------------------------------
Hello, my nickname is Sandman and I am coding
some Perl. Here is an example:
---------------------------------------------
<code>
print "Hello World!";

print "Foo";
</code>
---------------------------------------------
Call me
---------------------------------------------

So, basically, I want to split the text block on the delimiter "\n{2,}",
but not when that delimiter falls inside an *unclosed* HTML tag. Some
examples:

<div class='quote'>
Hello, my nickname is Sandman and I am coding
some Perl. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>

Call me
</div>

Ends up in:

---------------------------------------------
<div class='quote'>
Hello, my nickname is Sandman and I am coding
some Perl. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>

Call me
</div>
---------------------------------------------

And

<div class='quote'>
Hello, my nickname is Sandman and I am coding
some Perl. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>
</div>

Call me

Ends up in:

---------------------------------------------
<div class='quote'>
Hello, my nickname is Sandman and I am coding
some Perl. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>
</div>
---------------------------------------------
Call me
---------------------------------------------

Hopefully you get the idea.

Any ideas on how to solve it?

gnari · Aug 13, 2004

[balanced tags]

Is the code HTML-encoded, or can this happen?

foo

<code>
$a = "</code>";

print "$a\n<code>";

print "\n";
</code>

bar

<code>
$b = "</code>";

print "$b\n<code>";

print "\n";
</code>

fubar

gnari
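
To illustrate what "HTML-encoded" would mean here: if the code were entity-encoded before being stored, a literal </code> inside a string could not be mistaken for a real tag. A minimal sketch, assuming a hypothetical helper named encode_code (not something from this thread):

```perl
use strict;
use warnings;

# Hypothetical helper: entity-encode the characters that could form a tag.
sub encode_code {
    my ($code) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    $code =~ s/([&<>])/$entity{$1}/g;
    return $code;
}

# q{} keeps the snippet literal, so $a is not interpolated.
my $snippet = q{$a = "</code>";};
print encode_code($snippet), "\n";   # prints: $a = "&lt;/code&gt;";
```

With the content encoded this way, only the real <code> and </code> delimiters would remain as tags in the text block.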

Sandman · Aug 13, 2004

That could happen, but it's pretty unlikely.

I have a working version now that iterates through each line and checks
whether there is a start tag but no end tag; if so, it adds 1 to a variable.
The aggregated lines are only added as a paragraph when this variable is zero.

Your above example outputs this:

Debug:
0: foo
1: <code>
0: = "</code>";
0: print "
1: <code>";
1:
1: print "
1: ";
0: </code>
0: bar
1: <code>
0: = "</code>";
0: print "
1: <code>";
1:
1: print "
1: ";
0: </code>

Paragraphs:
---------------
foo
---------------
<code>
= "</code>";
---------------
print "
<code>";

print "
";
</code>
---------------
bar
---------------
<code>
= "</code>";
---------------
print "
<code>";

print "
";
</code>
---------------

Which is completely wrong. But this text:

-------------------------------------------------------------
Hello, my nickname is Sandman, and I like PHP, some examples:

<code>
print "Hello World";

print "Foobar";
</code>

Here are nested tags:

<quote>
<quote>
He said he liked flowers
</quote>

Well, he doesn't, ok.

<quote>I like them</quote>

Good for you
</quote>

<div class="paragraph">
Nice paragraph
</div>

<img src="foo.jpg"> <- Nice pic!
-------------------------------------------------------------

Outputs this:

Debug:
0: Hello, my nickname is Sandman, and I like PHP, some examples:
1: <code>
1: print "Hello World";
1:
1: print "Foobar";
0: </code>
0: Here are nested tags:
1: <quote>
2: <quote>
2: He said he liked flowers
1: </quote>
1:
1: Well, he doesn't, ok.
1:
1: <quote>I like them</quote>
1:
1: Good for you
0: </quote>
1: <div class="paragraph">
1: Nice paragraph
0: </div>
0: <img src="foo.jpg"> <- Nice pic!

Paragraphs:
---------------
Hello, my nickname is Sandman, and I like PHP, some examples:
---------------
<code>
print "Hello World";

print "Foobar";
</code>
---------------
Here are nested tags:
---------------
<quote>
<quote>
He said he liked flowers
</quote>

Well, he doesn't, ok.

<quote>I like them</quote>

Good for you
</quote>
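
The line-by-line counter described in this post could be sketched roughly as follows. This is not the poster's actual code: the helper name split_balanced, the list of tags that never get a closing tag, and the handling of stray closing tags are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumption: tags that never take a closing tag are ignored by the counter.
my %void = map { $_ => 1 } qw(img br hr);

# Hypothetical helper: split on blank lines, but only while no tag is open.
sub split_balanced {
    my ($text) = @_;
    my $depth = 0;
    my (@buffer, @paragraphs);
    for my $line (split /\n/, $text) {
        # Count every opening and closing tag found on this line.
        while ($line =~ m{<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>}g) {
            next if $void{ lc $2 };
            $depth += $1 ? -1 : 1;
        }
        $depth = 0 if $depth < 0;    # tolerate a stray closing tag
        if ($line =~ /\S/ or $depth > 0) {
            push @buffer, $line;     # part of a paragraph or of an open tag
        }
        elsif (@buffer) {
            push @paragraphs, join "\n", @buffer;   # blank line at depth 0
            @buffer = ();
        }
    }
    push @paragraphs, join "\n", @buffer if @buffer;
    return @paragraphs;
}

my @demo = split_balanced("Intro:\n\n<code>\nprint 1;\n\nprint 2;\n</code>\n\nCall me");
print "---------------\n$_\n" for @demo;
print "---------------\n";
```

Like the version described above, this only counts tags, so a literal </code> inside a string (as in gnari's example) still closes the block too early.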

Tad McClellan · Aug 13, 2004

Any ideas on how to solve it?

foreach ( grep {defined and length} split m#\n{2,}|(<code>.*?</code>)#s, $txt )

Buggy and fragile, but that is to be expected when processing HTML
without a real parser. (hint: you should use an HTML::* module
for processing HTML data).
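
A self-contained sketch of how the split above behaves on the sample text from the question. The variable names and the printing are only for illustration:

```perl
use strict;
use warnings;

my $txt = <<'END';
Hello, my nickname is Sandman and I am coding
some Perl. Here is an example:

<code>
print "Hello World!";

print "Foo";
</code>

Call me
END

# The capturing group makes split return each <code>...</code> block as a
# field of its own; grep drops the undefined and empty fields that split
# also produces around it.
my @parts = grep { defined and length }
            split m#\n{2,}|(<code>.*?</code>)#s, $txt;

print "---------------\n$_\n" for @parts;
```

This yields the three wanted parts for this input, but it only knows about <code>. Nested tags or an enclosing <div>, as in the other examples from the question, are not handled, which is part of why it is fragile.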
