Remove unwanted characters from column

matt.s.marotta

School assignment is to create tab-separated output with the given addresses in one column, unchanged, and then the addresses split into other columns (e.g., columns for city, postal code, street suffix).

Here is my code:

inHandler = open(inFile, 'r')
outHandler = open(outFile, 'w')
outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
for line in inHandler:
    str = line.replace("FarmID\tAddress", " ")
    outHandler.write(str[0:-1])

    str = str.replace(" ","\t", 1)
    str = str.replace(" Rd, ","\tRd\t\t")
    str = str.replace("Rd ","\tRd\t\t")
    str = str.replace("Ave, ","\tAve\t\t")
    str = str.replace("Ave ","\tAve\t\t")
    str = str.replace("St ","\tSt\t\t")
    str = str.replace("St, ","\tSt\t\t")
    str = str.replace("Dr, ","\tDr\t\t")
    str = str.replace("Lane, ","\tLane\t\t")
    str = str.replace("Pky, ","\tPky\t\t")
    str = str.replace(" Sq, ","\tSq\t\t")
    str = str.replace(" Pl, ","\tPl\t\t")

    str = str.replace("\tE, ","E\t")
    str = str.replace("\tN, ","N\t")
    str = str.replace("\tS, ","S\t")
    str = str.replace("\tW, ","W\t")
    str = str.replace(",\t","\t\t")
    str = str.replace(", ON ","\tON\t")

    outHandler.write(str)

inHandler.close()
outHandler.close()


Here are some sample addresses (there are over 100):

FarmID Address
1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
2 4260 Mountainview Rd, Lincoln, ON L0R 1B2
3 25 Hunter Rd, Grimsby, ON L3M 4A3
4 1091 Hutchinson Rd, Haldimand, ON N0A 1K0
5 5172 Green Lane Rd, Lincoln, ON L0R 1B3
6 500 Glenridge Ave, St. Catharines, ON L2S 3A1
7 471 Foss Rd, Pelham, ON L0S 1C0
8 758 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
9 3836 Main St, Lincoln, ON L0R 1S0



I have everything worked out, except that the final output places the FarmID at the end of the postal code, as seen in the example below (the brackets show where the FarmID is placed):

FarmID Address StreetNum StreetName SufType Dir City Province PostalCode
1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0(1) 1067 Niagara Stone Rd Niagara-On-The-Lake ON L0S 1J0

Any ideas on how to fix this? Keep in mind as well that the FarmID will have two characters at some point.
 
Dave Angel

Your specific concern is triggered by having two writes in the loop.

Get rid of the first and you're marginally closer.
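A minimal sketch of what a single write per input line could look like, assuming the FarmID is separated from the address by the first space (as in the sample data). The helper names split_address and convert_line are hypothetical, and split_address is only a cut-down version of the replace() chain above. The key change is that the replace chain runs on the address alone, so the FarmID is written exactly once, whatever its length.

```python
def split_address(address):
    """Cut-down version of the replace() chain, applied to the address only."""
    s = address.replace(" ", "\t", 1)      # StreetNum | rest of the address
    s = s.replace(" Rd, ", "\tRd\t\t")     # suffix, then an empty Dir column
    s = s.replace(" Ave, ", "\tAve\t\t")
    s = s.replace(" St, ", "\tSt\t\t")
    s = s.replace(", ON ", "\tON\t")       # City | Province | PostalCode
    return s


def convert_line(line):
    """Build one complete output row, ready for a single write() call."""
    # Assumes the FarmID is everything before the first space.
    farm_id, address = line.rstrip("\n").split(" ", 1)
    return farm_id + "\t" + address + "\t" + split_address(address) + "\n"


row = convert_line("1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n")
print(row.split("\t"))
```

Inside the loop this becomes a single outHandler.write(convert_line(line)). The input's header line would then need to be skipped (for example with next(inHandler) before the loop), and the header written before the loop needs its own "\n".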

But really, you've got much bigger troubles. All those
unrestricted replace calls are not at all robust. But maybe
you'll get away with it for a school assignment if the test data
is very limited.
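To make the robustness point concrete, here is a hypothetical address (not from the posted data) whose city name begins with "St ". Only the first replace() and the two "St" replace() calls from the original code are applied, in the same order.

```python
# Hypothetical address, not from the sample data: the city starts with "St ".
address = "10 Main St, St Davids, ON L0S 1P0"

s = address.replace(" ", "\t", 1)     # split off the street number
s = s.replace("St ", "\tSt\t\t")      # meant for a suffix; here it matches the city
s = s.replace("St, ", "\tSt\t\t")     # now the real suffix is replaced as well

print(s.split("\t"))
```

Because replace() changes every match anywhere in the string, the city is split apart, extra empty columns appear, and the street name keeps a trailing space. Rules that only look at the field they are meant for do not have this problem.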

Better would be to treat it like a parsing problem, figuring out what
delimiter rule applies to each field and building a list. Then
use str.join to build the line for the outHandler.
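One way to read that suggestion, as a sketch: split each line on the delimiters that actually separate the fields (the first space, the commas, the words of the street part) into a list, then join the list with tabs. The function name parse_record is hypothetical, and it assumes every address has exactly two commas and an optional single-letter direction (N, S, E or W) after the suffix.

```python
DIRECTIONS = {"N", "S", "E", "W"}


def parse_record(line):
    """Return the nine output columns for one input line (a sketch).

    Assumes "FarmID street, city, province postal" with the FarmID separated
    by the first space and exactly two commas in the address.
    """
    farm_id, address = line.strip().split(" ", 1)
    street, city, prov_postal = [part.strip() for part in address.split(",")]

    street_num, street_rest = street.split(" ", 1)
    words = street_rest.split()
    direction = ""
    if words[-1] in DIRECTIONS:       # e.g. a hypothetical "Main St E"
        direction = words.pop()
    suffix = words.pop()              # last remaining word: Rd, Ave, St, ...
    street_name = " ".join(words)

    province, postal_code = prov_postal.split(" ", 1)
    return [farm_id, address, street_num, street_name, suffix,
            direction, city, province, postal_code]


fields = parse_record("6 500 Glenridge Ave, St. Catharines, ON L2S 3A1")
print("\t".join(fields))
```

A line that breaks these assumptions raises an exception (typically ValueError from the unpacking), which at least makes bad input visible instead of silently producing shifted columns.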
 
matt.s.marotta

The code that I used is the way we were supposed to complete the assignment. All I need now is an 'if...then' statement to get rid of the unwanted FarmID at the end of the addresses. I just don't know what comes after the 'if' part.
 
Steven D'Aprano

Show us what you do know. If you don't know the "if", what about the
"then"?


if .... :
    do what?


What do you intend to do inside the if? Under what circumstances would
you do it?

If you can answer those questions in English, then we can help you write
code to do it.
 
matt.s.marotta

If the farmID < 10:
    remove one character from the address column
Elif farmID > 10:
    remove two characters from the address column
 
Chris Angelico

What if farmID == 10?

ChrisA
 
matt.s.marotta

OK, sorry, this is how it should be.

If the FarmID < 10:
    remove one character from the address column

If the FarmID > 9:
    remove two characters from the address column

My issue is I can't figure out what statement to use to define FarmID.
 
Chris Angelico

More commonly, that would be written as

if farmID < 10:
    # remove one character
    ...
else:
    # remove two characters
    ...

Though this still suffers from the limitation of not handling 100 or
1000, so you might want to look at len(str(farmID)) instead.
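A small sketch of the len() idea, assuming the FarmID is the text before the first whitespace on the line and that the line has no leading whitespace. Because the ID is taken from the line as a string, its length is already known and no if/else on the numeric value is needed. strip_farm_id is a hypothetical helper name.

```python
def strip_farm_id(text):
    """Split off the leading FarmID, whatever its number of digits."""
    farm_id = text.split(None, 1)[0]    # characters before the first space/tab
    rest = text[len(farm_id) + 1:]      # skip the ID and one separator
    return farm_id, rest


print(strip_farm_id("7 471 Foss Rd, Pelham, ON L0S 1C0"))
# Hypothetical three-digit ID, to show that 100+ needs no extra branch:
print(strip_farm_id("100 3836 Main St, Lincoln, ON L0R 1S0"))
```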

ChrisA
 
Denis McMahon

School assignment is to create a tab separated output with the original
given addresses in one column and then the addresses split into other
columns (ex, columns for city, postal code, street suffix).

If you're trying to create fixed width output from variable width fields,
format specifiers may be better to use than tabs.

The problem with tabs is that columns end up misaligned when the data
fields in a column contain a mixture of items of less length than the tab
spacing and items of greater length than the tab spacing, unless you can
work out the tab spacing and adjust accordingly.
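A minimal sketch of fixed-width output with format specifiers: the width of each column is taken from its longest entry, and each cell is left-aligned with the {value:<{width}} format spec. The function name format_fixed_width and the two-space gap between columns are arbitrary choices.

```python
def format_fixed_width(rows, gap="  "):
    """Return the rows as lines of left-aligned, fixed-width columns."""
    # Width of each column = length of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return [gap.join(f"{cell:<{w}}" for cell, w in zip(row, widths))
            for row in rows]


rows = [
    ["FarmID", "City", "Province", "PostalCode"],
    ["1", "Niagara-On-The-Lake", "ON", "L0S 1J0"],
    ["3", "Grimsby", "ON", "L3M 4A3"],
]
for line in format_fixed_width(rows):
    print(line)
```

Unlike tabs, the columns stay aligned regardless of tab stops, but the result is no longer tab-separated, so it suits display rather than a TSV file.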

For example, my code, which uses the re module to separate the various
record components, a format specifier to print the text and HTML
versions, and csv.writer for the CSV version, creates the outputs seen here:

http://www.sined.co.uk/tmp/farms.txt
http://www.sined.co.uk/tmp/farms.csv
http://www.sined.co.uk/tmp/farms.htm
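Denis's code isn't shown in the thread, so the following is only a sketch of the same general approach: a regular expression with named groups pulls the components out of each record, and csv.writer with delimiter="\t" writes the tab-separated file the assignment asks for. The pattern is an assumption fitted to the sample lines (it expects a Canadian-style postal code and an optional N/S/E/W direction) and will not match every possible address; unmatched lines, including the header, are simply skipped.

```python
import csv
import io
import re

# Assumed record layout: "FarmID num name suffix [dir], city, PROV postal"
RECORD = re.compile(
    r"^(?P<farm_id>\d+)\s+"
    r"(?P<address>"
    r"(?P<num>\d+)\s+(?P<name>.+?)\s+(?P<suffix>[A-Za-z]+)"
    r"(?:\s+(?P<dir>[NSEW]))?,\s*"
    r"(?P<city>[^,]+),\s*"
    r"(?P<prov>[A-Z]{2})\s+"
    r"(?P<postal>[A-Z]\d[A-Z]\s?\d[A-Z]\d)"
    r")\s*$"
)
GROUPS = ["farm_id", "address", "num", "name", "suffix",
          "dir", "city", "prov", "postal"]
HEADER = ["FarmID", "Address", "StreetNum", "StreetName", "SufType",
          "Dir", "City", "Province", "PostalCode"]


def convert(lines, out):
    """Write a tab-separated table for every line that matches RECORD."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for line in lines:
        match = RECORD.match(line)
        if match is None:
            continue  # header line, or an address the pattern doesn't cover
        # An absent direction gives None; write an empty column instead.
        writer.writerow([match.group(g) or "" for g in GROUPS])


sample = [
    "FarmID Address\n",
    "1 1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n",
    "6 500 Glenridge Ave, St. Catharines, ON L2S 3A1\n",
]
buf = io.StringIO()
convert(sample, buf)
print(buf.getvalue())
```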
 
