regexp to replace comma in text

K

Kev Jackson

given the following input:

"INSERT INTO PROJECT ( PROJECT_ID, PROJECTTITLE, PROJECTNO, PROJECTDATE,
CATEGORY, SECTOR, SECTION, DEPARTMENT, LOCATION, STATUS,
COMPLETIONSTATUS, CLIENT, TYPE ) VALUES ( 2319, 'SKIM PERPINDAHAN DAN
PERUMAHAN KAMPONG RATAIE, PERUMAHAN, KAMPONG RATAIE HOUSING CONTRACT H-3
(64 HOUSES)', 'AD520/0105', TO_Date( '01/01/1981 12:00:00 AM',
'MM/DD/YYYY HH:MI:SS AM'), '2', '2', 'T', 'G', 'D', 'D', NULL, NULL,
NULL, 3000000, 'Y', 2);"

How would you replace the commas embedded in the strings in the SQL?

I'm struggling with something like:
line.sub!(/(VALUES \(\s*\d+,\s*\'[\w\&\-\s]+)\',/,'\1')

to find the first 'string' after the VALUES and to attempt to remove any
commas. The problem is that the input can contain superfluous commas in
many places and I need to remove all commas in a quoted string (liek
gsub) before I can split the string.

I can't think of an easy way to achieve this, can any regexp/ruby expert
help me out?

Kev
 
N

nobuyoshi nakada

Hi,

At Thu, 6 Oct 2005 19:54:12 +0900,
Kev Jackson wrote in [ruby-talk:159336]:
How would you replace the commas embedded in the strings in the SQL?

line.gsub(/'(?:[^\']|'')*'/) {$&.delete(",")}
 
K

Kev Jackson

nobuyoshi said:
Hi,

At Thu, 6 Oct 2005 19:54:12 +0900,
Kev Jackson wrote in [ruby-talk:159336]:

How would you replace the commas embedded in the strings in the SQL?

line.gsub(/'(?:[^\']|'')*'/) {$&.delete(",")}
Thanks, it's almost there...

Here's my code

def munge_data(line)
unless line == nil
# remove star value
line.sub!(/'\*', /,'')
# replace client
line.sub!(/'([A-Z]{3})'/) {
val = @client_pks["'"+$1.dup+"'"]
val.to_s
}
# remove commas (",") from inside descriptions/titles/briefs etc
line.gsub!(/'(?:[^\']|'')*'/) {$&.delete(",")}
# remove TO_Date

line.sub!(/TO_Date\(\s(\'\d*\/\d*\/\d*)\s[\s\w\/\:]+\',\s\'MM\/DD\/YYYY\sHH\:MI\:SS\sAM\'\)/,'\1\'')

data = line.split(",")
p data.length
#p data[16]
line = data.join(",")
end
end

The regexp seems to work (on visual inspection), but when I split on ","
I still get varying line lengths, if the only commas where between
fields I should have all lines.split.length equal. Unless I'm being
stupid.

Here's another line,

INSERT INTO ADMUSER.TBLPROJ ( TBLPROJ_ID, PROJ_TITLE, PROJ_NO, STAR,
PROJ_DATE, PROJ_CATEGORY, PROJ_SECTOR, PROJ_SECTION_CODE,
PROJ_DEPT_CODE, PROJ_LOC_CODE, PROJ_STATUS, PROJ_STATC, PROJ_BRIEF,
PROJ_REMARKS, SCHEME_VAL, NEWPROJ, PROJ_CLIENT ) VALUES ( 2225,
'TANGGUNGAN TANGGUNGAN BAGI PROJEK PROJEK RKN 7 YANG TELAH SIAP - Kerja
Membaiki Kerosakan, Tempat Letak Kerita Bertingkat"', 'AD418/0501',
'*', TO_Date( '01/01/1986 12:00:00 AM', 'MM/DD/YYYY HH:MI:SS AM'), '3',
'4', 'D', 'A', 'A', 'E', 'C3', NULL, 'Completed, Finalised and Retention
Money Released on 21/11/2001.', 11328, 'Y', 'JKS');

Not sure how I'm going to do this as I thought that split would be the
way to go, but these extra commas are scuppering my plans completely :)

Kev
 
R

Robert Klemme

Kev said:
nobuyoshi said:
Hi,

At Thu, 6 Oct 2005 19:54:12 +0900,
Kev Jackson wrote in [ruby-talk:159336]:

How would you replace the commas embedded in the strings in the SQL?

line.gsub(/'(?:[^\']|'')*'/) {$&.delete(",")}
Thanks, it's almost there...

Here's my code

def munge_data(line)
unless line == nil
# remove star value
line.sub!(/'\*', /,'')
# replace client
line.sub!(/'([A-Z]{3})'/) {
val = @client_pks["'"+$1.dup+"'"]
val.to_s
}
# remove commas (",") from inside descriptions/titles/briefs etc
line.gsub!(/'(?:[^\']|'')*'/) {$&.delete(",")}
# remove TO_Date

line.sub!(/TO_Date\(\s(\'\d*\/\d*\/\d*)\s[\s\w\/\:]+\',\s\'MM\/DD\/YYYY\sH
H\:MI\:SS\sAM\'\)/,'\1\'')

data = line.split(",")
p data.length
#p data[16]
line = data.join(",")
end
end

The regexp seems to work (on visual inspection), but when I split on
"," I still get varying line lengths, if the only commas where between
fields I should have all lines.split.length equal. Unless I'm being
stupid.

Here's another line,

INSERT INTO ADMUSER.TBLPROJ ( TBLPROJ_ID, PROJ_TITLE, PROJ_NO, STAR,
PROJ_DATE, PROJ_CATEGORY, PROJ_SECTOR, PROJ_SECTION_CODE,
PROJ_DEPT_CODE, PROJ_LOC_CODE, PROJ_STATUS, PROJ_STATC, PROJ_BRIEF,
PROJ_REMARKS, SCHEME_VAL, NEWPROJ, PROJ_CLIENT ) VALUES ( 2225,
'TANGGUNGAN TANGGUNGAN BAGI PROJEK PROJEK RKN 7 YANG TELAH SIAP -
Kerja Membaiki Kerosakan, Tempat Letak Kerita Bertingkat"',
'AD418/0501', '*', TO_Date( '01/01/1986 12:00:00 AM', 'MM/DD/YYYY
HH:MI:SS AM'), '3', '4', 'D', 'A', 'A', 'E', 'C3', NULL, 'Completed,
Finalised and Retention Money Released on 21/11/2001.', 11328, 'Y',
'JKS');

Not sure how I'm going to do this as I thought that split would be the
way to go, but these extra commas are scuppering my plans completely
:)

Kev

You could split along strings with grouping. Something like

s.split(%r{('(?:[^\']|'')*')})

=> "'aaa', 'bbbb'"
s.split(%r{('(?:[^\']|'')*')})
=> ["", "'aaa'", ", ", "'bbbb'"]

Or use #scan
s.scan(%r{'(?:[^\']|'')*'})
=> ["'aaa'", "'bbbb'"]

Kind regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top