RegExp Assistance


VUNETdotUS

Hi, I am working with this regexp to extract address: city, state, and
zip. This version kinda works but it extracts one element of an array
instead of three and keeps my "city" too long, including all text
before it.
.....................
var regex = /\s*(.*)\s*,\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)\s*/g;
function doit(){
var arr = d.innerHTML.match(regex);
if(arr.length=3){
d2.innerHTML = arr[0]+" | "+arr[1]+" | "+arr[2];
}else{
d2.innerHTML = "Found "+arr.length+" matches";
}
}

//-->
</script>
.......................
<div id="myDiv">
Some text here, not always break after <br>New Haven, CT 06460 plus
whatever text here too
</div>

Thanks.
 

Evertjan.

VUNETdotUS wrote on 17 Oct 2007 in comp.lang.javascript:
Hi, I am working with this regexp to extract address: city, state, and
zip. This version kinda works but it extracts one element of an array
instead of three and keeps my "city" too long, including all text
before it.
....................
var regex = /\s*(.*)\s*,\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)\s*/g;
function doit(){
var arr = d.innerHTML.match(regex);

What is d?
if(arr.length=3){

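'=' is an assignment operator, not an equality operator. Use:

(arr.length == 3)

You also made the mistake of thinking that match() gives 3 array members
per location. With the 'g' flag it returns one element per match (the
whole matched string). Better read up on match().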
d2.innerHTML = arr[0]+" | "+arr[1]+" | "+arr[2];
}else{
d2.innerHTML = "Found "+arr.length+" matches";
}
}

//-->

Do not use last-century code; skip the '//-->' line.
</script>
......................
<div id="myDiv">
Some text here, not always break after <br>New Haven, CT 06460 plus
whatever text here too
</div>

Try:

<script type='text/javascript'>

var regex = /((\s*\b[A-Z]\w+)+),\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)/g;

var d = 'Some text here, not always break after'+
' <br>New Haven, CT 06460 plus whatever text here too';

// d = d + ' Buffalo, NY 12345 '; // dual test
// d = 'abc'; // empty test

var arr = d.match(regex);

if (arr) {
  alert(arr.length + ' location(s) found');
  // With the 'g' flag, each arr[i] is one whole matched string.
  for (var i = 0; i < arr.length; i++)
    alert( arr[i].replace(/(,)/,' |').replace(/([A-Z]{2})/,'$1 |') );
}

</script>
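
The replace() calls above reformat each matched string after the fact. A minimal sketch of another route, assuming the same regex and sample string: calling exec() repeatedly on a global regex returns the capture groups of each match in turn, so city, state and zip can be read directly. The helper name extractLocations is made up for illustration and is not part of the original post.

```javascript
// Same pattern as above; group 1 = city, group 3 = state, group 4 = zip.
var regex = /((\s*\b[A-Z]\w+)+),\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)/g;

// Hypothetical helper: collects one {city, state, zip} object per match.
function extractLocations(text) {
  var results = [];
  var m;
  regex.lastIndex = 0; // restart the global regex for each new input
  while ((m = regex.exec(text)) !== null) {
    // Group 1 can begin with whitespace (the pattern allows \s* first), so trim it.
    results.push({ city: m[1].trim(), state: m[3], zip: m[4] });
  }
  return results;
}

var d = 'Some text here, not always break after' +
        ' <br>New Haven, CT 06460 plus whatever text here too' +
        ' Buffalo, NY 12345 '; // second address, as in the dual test

var locations = extractLocations(d);
for (var i = 0; i < locations.length; i++) {
  console.log(locations[i].city + ' | ' + locations[i].state + ' | ' + locations[i].zip);
}
```

This keeps the three fields separate instead of relying on where the first comma or the first pair of capitals happens to fall in the matched string.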
 

pr

VUNETdotUS said:
Hi, I am working with this regexp to extract address: city, state, and
zip. This version kinda works but it extracts one element of an array

Do you mean result.length == 1?
instead of three and keeps my "city" too long, including all text
before it.
....................
var regex = /\s*(.*)\s*,\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)\s*/g;

Looks reasonable to me, although I'm no expert on what a zip code should
contain. Try to use fewer *s though, because they frequently match zero
occurrences (out of zero-or-more), which can be unintended. + is better
wherever you can do it. And you don't need a 'g' flag on a single match.
function doit(){
var arr = d.innerHTML.match(regex);
if(arr.length=3){

Assuming success, and with the 'g' flag removed, arr.length will be 5
(the four parenthesized groups plus one for the whole match).
d2.innerHTML = arr[0]+" | "+arr[1]+" | "+arr[2];

Don't forget arr[0] is the *entire match*, arr[1] is the first bracketed
subexpression, arr[2] the second, etc.
}else{
d2.innerHTML = "Found "+arr.length+" matches";
}
}

//-->
</script>
......................
<div id="myDiv">
Some text here, not always break after <br>New Haven, CT 06460 plus

If your city is one or more words preceded by one or more words, then
it's impossible to tell where it starts, unless perhaps it is the only
thing that starts with initial capitals. Something to think about.
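
To make the points about array indexes and the over-long city concrete, here is a small sketch. It assumes the original pattern with the 'g' flag dropped and the sample text inlined as a plain string; the variable names text and arrG are illustrative only.

```javascript
// Original pattern from the question, minus the 'g' flag.
var regex = /\s*(.*)\s*,\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)\s*/;

var text = 'Some text here, not always break after <br>New Haven, CT 06460 plus';

var arr = text.match(regex);

// Without 'g', match() returns the whole match plus one slot per group.
console.log(arr.length); // 5
console.log(arr[2]);     // "CT"
console.log(arr[3]);     // "06460"
console.log(arr[4]);     // undefined: the optional "-dddd" group did not take part

// The greedy (.*) runs up to the last comma that still lets the rest match,
// which is why the "city" drags along all the text before it.
console.log(arr[1]); // "Some text here, not always break after <br>New Haven"

// With 'g', the same pattern yields one element per match and no groups.
var arrG = text.match(new RegExp(regex.source, 'g'));
console.log(arrG.length); // 1
```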
 

VUNETdotUS

Evertjan. wrote:
Try:

var regex = /((\s*\b[A-Z]\w+)+),\s*([A-Z]{2})\s+(\d{5}(\-\d{4})?)/g;

This worked fine for me. Thanks for the advice.
 
