replacing multiple instances of commas beginning at specific position


striker

I have a comma delimited text file that has multiple instances of
multiple commas. Each file will contain approximately 300 lines. For
example:

one, two, three,,,,four,five,,,,six
one, two, three,four,,,,,,,,,,eighteen, and so on.

There is one time when multiple commas are allowed. Just prior to the
letters ADMNSRC there should be one instance of 4 commas. (
,eight,,,,ADMNSRC,thirteen, ). The text ADMNSRC is NOT in the same
place on each line.

What would be the best approach to replace all instances of multiple
commas with just one comma, except for the 4 commas prior to ADMNSRC?

Any help would be greatly appreciated.
TIA,
Kevin
 

bruno at modulix

striker said:
I have a comma delimited text file that has multiple instances of
multiple commas. Each file will contain approximately 300 lines. For
example:

one, two, three,,,,four,five,,,,six
one, two, three,four,,,,,,,,,,eighteen, and so on.

There is one time when multiple commas are allowed. Just prior to the
letters ADMNSRC there should be one instance of 4 commas. (
,eight,,,,ADMNSRC,thirteen, ). The text ADMNSRC is NOT in the same
place on each line.

What would be the best approach to replace all instances of multiple
commas with just one comma, except for the 4 commas prior to ADMNSRC?

Seems like a typical use case for the re module...
-> now you've got *2* problems- !-)
 

Micah Elliott

striker said:
I have a comma delimited text file that has multiple instances of
multiple commas. Each file will contain approximately 300 lines.
For example:

one, two, three,,,,four,five,,,,six
one, two, three,four,,,,,,,,,,eighteen, and so on.

There is one time when multiple commas are allowed. Just prior to
the letters ADMNSRC there should be one instance of 4 commas. (
,eight,,,,ADMNSRC,thirteen, ). The text ADMNSRC is NOT in the same
place on each line.

What would be the best approach to replace all instances of multiple
commas with just one comma, except for the 4 commas prior to
ADMNSRC?

One possible approach:

#! /usr/bin/env python

import re

# This list simulates the actual opened file.
infile = [
    'one, two, three,four,,,,,,ADMNSRC,,,,eighteen,',
    'one, two, three,four,five,six'
]

# Placeholder for resultant list.
result = []

for item in infile:
    # Use a regex to reduce *all* multi-comma runs to single commas.
    item = re.sub(r',{2,}', ',', item)
    # Add back the three extra commas for the special case.
    item = item.replace('ADMNSRC', ',,,ADMNSRC')
    # Remove spaces??
    item = item.replace(' ', '')
    # Add to resultant list.
    result.append(item)
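
To apply the same idea to a real file rather than a simulated list, a minimal sketch might look like the following. The names squeeze_commas and squeeze_file and the path arguments are hypothetical, and the sketch assumes ADMNSRC only ever appears as a whole field preceded by at least one comma. It leaves spaces alone, since removing them would also change fields such as "and so on".

```python
import re

def squeeze_commas(line):
    """Collapse comma runs to one comma, keeping four before ADMNSRC."""
    # Reduce every run of two or more commas to a single comma.
    line = re.sub(r',{2,}', ',', line)
    # Restore the three extra commas in front of the special field.
    return line.replace(',ADMNSRC', ',,,,ADMNSRC')

def squeeze_file(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path, line by line."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(squeeze_commas(line))

print(squeeze_commas('one, two, three,four,,,,,,ADMNSRC,,,,eighteen,'))
```

With roughly 300 lines per file, reading line by line is more than fast enough, and the newline at the end of each line passes through untouched.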
 

Dennis Lee Bieber

striker said:
What would be the best approach to replace all instances of multiple
commas with just one comma, except for the 4 commas prior to ADMNSRC?

Simplify the problem... Start by rephrasing... Since ADMNSRC is to
always have four commas leading up to it (at least, as I understand your
statement of intent), you could consider three of those to be part of
the string. So...

Phase one: split on commas, tossing out any null fields
Phase two: replace "ADMNSRC" with ",,,ADMNSRC"
Phase three: rejoin the parts that remain.

Doing this efficiently may be another matter but...

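data = [ "one, two, three,,,,four,,,,ADMNSRC, five,,,,six",
         "one, two, three,four,,,,,,,,ADMNSRC,,,,,,eighteen, and so on" ]

result = []
for ln in data:
    # Phase one: split on commas, dropping empty fields.
    wds = [x.strip() for x in ln.split(",") if x]
    # Phase two: put the three extra commas back in front of ADMNSRC.
    for i in range(len(wds)):
        if wds[i] == "ADMNSRC":
            wds[i] = ",,,ADMNSRC"
    # Phase three: rejoin the parts that remain.
    result.append(",".join(wds))

print(result)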
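
One edge case the split approach can trip over: a field containing only spaces (as in "a, ,b") is not an empty string, so it survives the "if x" filter and then strips down to nothing, which puts a double comma back into the output. A small variation, sketched here on the assumption that such fields should be dropped too and that ADMNSRC is never the first field on a line, filters after stripping. The function name clean_line and its keep parameter are made up for illustration.

```python
def clean_line(line, keep="ADMNSRC"):
    """Split on commas, drop blank fields, and rejoin with single commas."""
    # Strip first, then filter, so whitespace-only fields are discarded too.
    fields = [f.strip() for f in line.split(",")]
    fields = [f for f in fields if f]
    # Prefix the special field with three commas; join() supplies the fourth
    # as long as the field is not the first one on the line.
    fields = [",,," + f if f == keep else f for f in fields]
    return ",".join(fields)

print(clean_line("one, two, three,,,,four,,,,ADMNSRC, five,,,,six"))
print(clean_line("a, ,b"))
```

Note that this approach, like the loop it is based on, also strips the spaces around every field.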
 

Bengt Richter

Or if data is from a single file read, maybe (untested beyond what you see ;-)
... one, two, three,,,,four,,,,ADMNSRC, five,,,,six
... one, two, three,four,,,,,,,,ADMNSRC,,,,,,eighteen, and so on
... """
one, two, three,four,,,,ADMNSRC, five,six
one, two, three,four,,,,ADMNSRC,eighteen, and so on
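
The expression that produced this output is not preserved in the session above. One way to get the same result from a whole-file string, offered as a sketch rather than as the original code, is a single re.sub with a callback: every comma run becomes one comma unless ADMNSRC follows it directly, in which case it becomes exactly four. The names fix_commas and _repl are hypothetical.

```python
import re

def _repl(match):
    # group(1) is set only when the comma run is directly followed by ADMNSRC.
    return ",,,,ADMNSRC" if match.group(1) else ","

def fix_commas(text):
    """Normalize comma runs across a whole multi-line string in one pass."""
    return re.sub(r",+(ADMNSRC)?", _repl, text)

data = """\
one, two, three,,,,four,,,,ADMNSRC, five,,,,six
one, two, three,four,,,,,,,,ADMNSRC,,,,,,eighteen, and so on
"""
print(fix_commas(data), end="")
```

Because the pattern never touches spaces or newlines, the text can come straight from a single read of the file and be written back as is.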

Regards,
Bengt Richter
 
