Jesse P.
Hi all,
I'm trying to solve this problem:
string = "\302</u>"
TEXT_PATTERN = /\A([^<]*)/um
text_data = string.match(TEXT_PATTERN).to_s
=> "\302</u>"
As you can see, the regular expression incorrectly captures not only
the text part but also the closing tag, whereas what is supposed to be
captured is just "\302".
This problem is actually part of the REXML::Source#match method
(http://www.germane-software.com/projects/rexml/browser/trunk/src/rexml/source.rb?rev=1266#L104)
and causes REXML to parse UTF-8
documents incorrectly sometimes.
Any ideas why the pattern matching doesn't work? I don't see anything
wrong with the regular expression, although I'm not sure what the \A
escape is for.
Best regards,
Jesse
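
A possible way to narrow this down, offered as a sketch rather than a confirmed diagnosis: \A is an anchor that matches only at the start of the string, so it is unlikely to be the culprit. The byte "\302" (0xC2) is the lead byte of a two-byte UTF-8 sequence, so the assumption here is that a UTF-8-aware match under the /u flag treats "\302<" as a single character. The snippet below assumes Ruby 1.9 or later (for String#valid_encoding? and String#b); the variable names are illustrative and are not taken from REXML.

```ruby
# Same bytes as in the post, explicitly tagged as UTF-8.
string = "\302</u>".dup.force_encoding(Encoding::UTF_8)

# 0xC2 announces a two-byte sequence, but "<" is not a continuation
# byte, so the string is not well-formed UTF-8.
valid = string.valid_encoding?

# Matching byte by byte avoids any multibyte interpretation:
# String#b returns a binary (ASCII-8BIT) copy, and /n makes the
# pattern encoding-agnostic.
byte_text_pattern = /\A([^<]*)/mn
text_bytes = string.b[byte_text_pattern, 1].bytes

puts valid               # false
puts text_bytes.inspect  # [194]
```

If the byte-wise match stops at "\302" while the /u match does not, the difference points at how the incomplete multibyte sequence is handled rather than at the pattern itself.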