Regex Help

Ezra Zygmuntowicz

Hey Guys-
I have a regex problem that I am not sure how to tackle. I am
parsing some classified ads in order to format them for display
online. I have most of the parsing done, but I need help with the
final step. The file has one ad per line, and a line looks like this:

<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath,
irrigation, horse barn. $122,000. 509-697-6519<endad>

I have already parsed everything to get it to this state. What I
need to do next is count 50 chars after the <begad:11559303> tag
and insert </ftditm> there. The tricky part is that if the 50th
char lands in the middle of a word, I need to match the rest of
that word as well. So I need a way to match at least 50 chars plus
the rest of the current word.

For this particular ad, 50 chars makes it to here:

<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath,
irri #<= 50 chars ends here# gation, horse barn. $122,000.
509-697-6519<endad>

So it ends in the middle of the word "irrigation", and I need it to
consume the whole word.

Any help is much appreciated-
-Ezra Zygmuntowicz
Yakima Herald-Republic
WebMaster
509-577-7732
(e-mail address removed)
 
David A. Black

Hi --

Here's one idea:

str.sub(/(<begad:[^>]+>.{1,50}.*?\b)/, "\\1<\/ftditm>")


David
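To see what the substitution above does, here is a minimal sketch that runs it against the sample ad from the original post. It assumes the ad sits on a single line (the wrapping in the post looks like mail formatting), and the variable names `ad` and `result` are only illustrative. The `\/` escape in the posted replacement string is written as a plain `/` here, which should be equivalent in a double-quoted Ruby string.

```ruby
# The sample ad from the original post, joined onto a single line
# (the file is described as having one ad per line).
ad = "<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath, " \
     "irrigation, horse barn. $122,000. 509-697-6519<endad>"

# .{1,50} greedily takes up to 50 characters after the <begad:...> tag,
# which ends at "irri"; .*?\b then lazily extends the match to the next
# word boundary, which is the end of "irrigation".
result = ad.sub(/(<begad:[^>]+>.{1,50}.*?\b)/, "\\1</ftditm>")

puts result
# <ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath, irrigation</ftditm>, horse barn. $122,000. 509-697-6519<endad>
```

Note that `.` does not match a newline by default in Ruby, so this relies on each ad really being one line.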
 
John Halderman

Seems to me that you're trying to do too much with one regular expression. I
would just grab the content between your tags and then trim that down to 50
characters and reassemble it afterwards.

-j
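The grab, trim, and reassemble approach suggested above might look roughly like the following sketch. The helper name `close_ftditm` and its `limit` parameter are hypothetical, and the code assumes every ad line contains one `<begad:...>` tag followed by `<endad>`, as in the sample from the original post.

```ruby
# Hypothetical helper: split the line around the <begad:...> tag, cut the
# ad text at `limit` characters, extend the cut to the end of the current
# word, and reassemble with </ftditm> inserted at the cut.
def close_ftditm(line, limit = 50)
  m = line.match(/\A(.*?<begad:[^>]+>)(.*?)(<endad>.*)\z/m)
  return line unless m # leave lines that do not look like an ad untouched

  head, text, tail = m[1], m[2], m[3]
  cut = [limit, text.length].min

  # Step forward while the cut sits between two word characters.
  while cut > 0 && cut < text.length &&
        text[cut - 1] =~ /\w/ && text[cut] =~ /\w/
    cut += 1
  end

  head + text[0, cut] + "</ftditm>" + text[cut..-1] + tail
end

ad = "<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath, " \
     "irrigation, horse barn. $122,000. 509-697-6519<endad>"
puts close_ftditm(ad)
```

This is longer than a single substitution, but each step can be checked on its own, and an ad with fewer than `limit` characters is handled by the `min` call.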

 
David A. Black

Hi --

Seems to me that you're trying to do too much with one regular expression. I
would just grab the content between your tags and then trim that down to 50
characters and reassemble it afterwards.

I'm not sure what you mean by "too much". I think the substitution I
suggested does what Ezra said he needed. Is there an error in it?


David
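One possible edge case, not raised in the thread, is an ad with fewer than 50 characters of text: `.{1,50}` is then free to match into the `<endad>` tag. The sketch below uses a made-up short ad to show this, along with a hypothetical variant that stops at the next `<` and finishes the current word with `\w*`. The variant is an assumption for illustration, not something posted in the thread.

```ruby
# Made-up short ad (not from the thread): only 20 characters of ad text.
short = "<ftditm><begad:1>Nice house. 555-1234<endad>"

# The substitution as posted: .{1,50} can run into the <endad> tag, so the
# match ends at the word boundary after "endad".
posted = short.sub(/(<begad:[^>]+>.{1,50}.*?\b)/, "\\1</ftditm>")

# Hypothetical variant: [^<]{1,50} cannot cross the next tag, and \w*
# finishes the current word if the cut lands inside one.
variant = short.sub(/(<begad:[^>]+>[^<]{1,50}\w*)/, "\\1</ftditm>")

puts posted   # <ftditm><begad:1>Nice house. 555-1234<endad</ftditm>>
puts variant  # <ftditm><begad:1>Nice house. 555-1234</ftditm><endad>
```

For ads with at least 50 characters of text, such as the sample in the original post, both patterns should place `</ftditm>` after "irrigation". They can differ when the 50th character is punctuation, because `\w*` does not skip ahead to the start of the next word.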
 
Ezra Zygmuntowicz

David-
Thanks, the regex you posted works great. I had considered just
trimming the text inside the tags and then untrimming until a word
end, but I figured there would be a regex that would do it all at once.

Thanks Dave-
Ezra

 
