Reverse string-formatting (maybe?)

Dustan

Is there any builtin function or module with a function similar to my
made-up, not-written deformat function as follows? I can't imagine it
would be too easy to write, but possible...
('coding', 'coded', 'week')

expanded (for better visual):
('coding', 'coded', 'week')

It would return a tuple of strings, since it has no way of telling what
the original type of each item was.


Any input? I've looked through the documentation of the string module
and re module, did a search of the documentation and a search of this
group, and come up empty-handed.
 
Tim Chase

template = 'I am %s, and he %s last %s.'
('coding', 'coded', 'week')

expanded (for better visual):
('coding', 'coded', 'week')

It would return a tuple of strings, since it has no way of telling what
the original type of each item was.

Any input? I've looked through the documentation of the string module
and re module, did a search of the documentation and a search of this
group, and come up empty-handed.


Yes, in the trivial case you provide, it can be done fairly
easily using the re module:
>>> import re
>>> template = 'I am %s, and he %s last %s.'
>>> values = ('coding', 'coded', 'week')
>>> formatted = template % values
>>> unformat_re = re.escape(template).replace(re.escape('%s'), '(.*)')
>>> # unformat_re = unformat_re.replace(re.escape('%i'), '([0-9]+)')
>>> r = re.compile(unformat_re)
>>> r.match(formatted).groups()
('coding', 'coded', 'week')
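Taking the commented-out '%i' line a step further, the template can be translated one conversion specifier at a time. This is only a sketch: the names SPEC_PATTERNS and template_to_regex are made up for illustration, it assumes that only %s, %i, %d and a literal %% appear, and it uses re.fullmatch (Python 3.4 or later) so trailing text is not silently ignored. Anything fancier, such as %(key)s or %05.2f, is rejected rather than guessed at.

```python
import re

# Hypothetical table: conversion character -> capture group.
# '[0-9]+' assumes the integers are non-negative.
SPEC_PATTERNS = {'s': '(.*)', 'i': '([0-9]+)', 'd': '([0-9]+)'}


def template_to_regex(template):
    """Compile a regex that reverses a simple %-style template (sketch)."""
    parts = []
    pos = 0
    # '%(.)' consumes a '%' plus the next character, so '%%s' is read as
    # '%%' (a literal percent sign) followed by a plain 's'.
    for m in re.finditer(r'%(.)', template, re.DOTALL):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        spec = m.group(1)
        if spec == '%':
            parts.append(re.escape('%'))
        elif spec in SPEC_PATTERNS:
            parts.append(SPEC_PATTERNS[spec])
        else:
            raise ValueError('unsupported conversion: %' + spec)
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile(''.join(parts))


sentence_rx = template_to_regex('I am %s, and he %s last %s.')
print(sentence_rx.fullmatch('I am coding, and he coded last week.').groups())
# ('coding', 'coded', 'week')

percent_rx = template_to_regex('%i%% of %s')
print(percent_rx.fullmatch('42% of users').groups())
# ('42', 'users')
```

Escaping the literal pieces separately (instead of escaping the whole template and then searching for '%s') sidesteps the fact that different Python versions escape '%' differently.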

Things get crazier when you have things like
'The value is 00003.1415'

or
was very %(adj)s.' % {'name': 'Grandma', 'gift': 'sweater', 'adj':
'nice'}
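For the dictionary style, one possible approach (a sketch, not a standard library feature) is to turn each %(key)s into a named regex group, so the result comes back as a dict rather than a tuple. The helper name and the template below are made up for illustration; it assumes every key is a valid identifier that appears only once, handles nothing but %(key)s, and uses re.fullmatch (Python 3.4 or later).

```python
import re


def named_template_to_regex(template):
    """Turn '%(key)s' placeholders into named groups (hypothetical sketch).

    Only '%(key)s' is handled. Each key is assumed to be a valid Python
    identifier that appears once; everything else is matched literally.
    """
    parts = []
    pos = 0
    for m in re.finditer(r'%\((\w+)\)s', template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        parts.append('(?P<%s>.*)' % m.group(1))           # named group
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile(''.join(parts))


# Made-up template; the values are the ones from the example above.
template = 'Thank %(name)s for the %(gift)s; it was very %(adj)s.'
values = {'name': 'Grandma', 'gift': 'sweater', 'adj': 'nice'}
formatted = template % values
print(formatted)
# Thank Grandma for the sweater; it was very nice.

result = named_template_to_regex(template).fullmatch(formatted).groupdict()
print(result == values)
# True
```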

Additionally, things get a little tangled when the replacement
values duplicate text in the template. Should the unformatting
of "I am tired, and he didn't last last All Saint's Day" be
parsed as ('tired', "didn't last", "All Saint's Day") or
('tired', "didn't", "last All Saint's Day")? The /intent/ is
likely the former, but getting a computer to understand intent is
a non-trivial task ;)
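Both readings really are reachable with a regex; which one you get depends only on whether the groups are greedy or lazy. A minimal sketch (the helper name unformat and its lazy flag are made up for illustration; re.fullmatch needs Python 3.4 or later):

```python
import re


def unformat(template, formatted, lazy=False):
    """Reverse a template that uses only %s (hypothetical helper).

    The literal text between the %s markers is escaped; each %s becomes
    a greedy '(.*)' or, with lazy=True, a non-greedy '(.*?)' group.
    Returns None when the string does not fit the template at all.
    """
    group = '(.*?)' if lazy else '(.*)'
    pattern = group.join(re.escape(piece) for piece in template.split('%s'))
    match = re.fullmatch(pattern, formatted)
    return match.groups() if match else None


template = 'I am %s, and he %s last %s.'
formatted = "I am tired, and he didn't last last All Saint's Day."
print(unformat(template, formatted))
# ('tired', "didn't last", "All Saint's Day")
print(unformat(template, formatted, lazy=True))
# ('tired', "didn't", "last All Saint's Day")
```

Neither result is wrong from the regex's point of view: the pattern can only apply one fixed rule, not guess the intent.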

Just a few early-morning thoughts...

-tkc
 
Peter Otten

Dustan said:
Is there any builtin function or module with a function similar to my
made-up, not-written deformat function as follows? I can't imagine it
would be too easy to write, but possible...

('coding', 'coded', 'week')

expanded (for better visual):
('coding', 'coded', 'week')

It would return a tuple of strings, since it has no way of telling what
the original type of each item was.


Any input? I've looked through the documentation of the string module
and re module, did a search of the documentation and a search of this
group, and come up empty-handed.

Simple, but unreliable:
... r = re.compile("(.*)".join(template.split("%s")))
... return r.match(formatted).groups()
...
('coding', 'coded', 'week')
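One concrete source of the unreliability: the literal pieces of the template are passed to re.compile unescaped, so any regex metacharacters in them change what the pattern means. A small illustration with a made-up template containing parentheses:

```python
import re

template = 'Total: %s (approx)'
formatted = template % '5'          # 'Total: 5 (approx)'

# Unescaped: '(approx)' becomes a capture group instead of literal text,
# so the pattern looks for ' approx' and the match fails.
naive = re.compile("(.*)".join(template.split("%s")))
print(naive.match(formatted))
# None

# Escaping each literal piece keeps '(' and ')' literal.
safe = re.compile("(.*)".join(re.escape(p) for p in template.split("%s")))
print(safe.match(formatted).groups())
# ('5',)
```

In the same way, a '.' in the template matches any character rather than only a period, which usually still "works" but can accept strings it should not.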

Peter
 
Dustan

Peter said:
Simple, but unreliable:

... r = re.compile("(.*)".join(template.split("%s")))
... return r.match(formatted).groups()
...
('coding', 'coded', 'week')

Peter

Trying to figure out the 'unreliable' part of your statement...

I'm sure 2 '%s' characters in a row would be a bad idea, and if you
have similar expressions for the '%s' characters within as well as in
the neighborhood of the '%s', that would cause difficulty. Is there any
other reason it might not work properly?

My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?
 
Tim Chase

My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?

Given this new (key) information along with the assumption that
you're doing straight string replacement (not dictionary
replacement of the form "%(key)s" or other non-string types such
as "%05.2f"), then yes, a reversal is possible. To make it more
explicit, one would do something like
>>> template = '%s, %s, %s'
>>> values = ('Tom', 'Dick', 'Harry')
>>> formatted = template % values
>>> import re
>>> unformat_string = template.replace('%s', '([^, ]+)')
>>> unformatter = re.compile(unformat_string)
>>> extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".
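Note that search() is happy to match a prefix of the string. The sketch below keeps the same pattern but uses re.fullmatch (Python 3.4 or later), so that a value breaking the "no commas or spaces" assumption is reported instead of being silently truncated; the extra sample strings are made up for illustration.

```python
import re

template = '%s, %s, %s'
unformatter = re.compile(template.replace('%s', '([^, ]+)'))

good = 'Tom, Dick, Harry'
print(unformatter.fullmatch(good).groups())
# ('Tom', 'Dick', 'Harry')

# 'Mary Ann' contains a space, which the stated assumption rules out.
bad = 'Tom, Dick, Mary Ann'
print(unformatter.search(bad).groups())
# ('Tom', 'Dick', 'Mary'), silently wrong
print(unformatter.fullmatch(bad))
# None, so the problem is visible
```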

-tkc
 
Dustan

Tim said:
My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?

Given this new (key) information along with the assumption that
you're doing straight string replacement (not dictionary
replacement of the form "%(key)s" or other non-string types such
as "%05.2f"), then yes, a reversal is possible. To make it more
explicit, one would do something like
template = '%s, %s, %s'
values = ('Tom', 'Dick', 'Harry')
formatted = template % values
import re
unformat_string = template.replace('%s', '([^, ]+)')
unformatter = re.compile(unformat_string)
extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".

-tkc

Thanks.

One more thing (I forgot to mention this other situation earlier)
The %s characters are ints, and outside can be anything except int
characters. I do have one situation of '%s%s%s', but I can change it to
'%s', and change the output into the needed output, so that's not
important. Think something along the lines of "abckdaldj iweo%s
qwierxcnv !%sjd".
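Under the constraints described here (every %s receives a non-negative int and the surrounding text never contains digits), each placeholder can be matched with a run of digits and the results converted back to int. A sketch under those assumptions; the helper name unformat_ints is hypothetical, and re.fullmatch needs Python 3.4 or later:

```python
import re


def unformat_ints(template, formatted):
    """Recover the ints substituted for each %s (hypothetical helper).

    Assumes the values are non-negative ints and that the literal text of
    the template contains no digits.
    """
    pattern = '([0-9]+)'.join(re.escape(p) for p in template.split('%s'))
    match = re.fullmatch(pattern, formatted)
    if match is None:
        raise ValueError('string does not fit the template')
    return tuple(int(g) for g in match.groups())


template = 'abckdaldj iweo%s qwierxcnv !%sjd'
formatted = template % (12, 345)
print(unformat_ints(template, formatted))
# (12, 345)
```

Adjacent placeholders such as '%s%s%s' cannot be split this way: '([0-9]+)([0-9]+)' will read '123' as ('12', '3'), so collapsing them into a single %s, as planned, is the safer route.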
 
Dustan

Dustan said:
Tim said:
My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?

Given this new (key) information along with the assumption that
you're doing straight string replacement (not dictionary
replacement of the form "%(key)s" or other non-string types such
as "%05.2f"), then yes, a reversal is possible. To make it more
explicit, one would do something like
template = '%s, %s, %s'
values = ('Tom', 'Dick', 'Harry')
formatted = template % values
import re
unformat_string = template.replace('%s', '([^, ]+)')
unformatter = re.compile(unformat_string)
extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".

-tkc

Thanks.

One more thing (I forgot to mention this other situation earlier)
The %s characters are ints, and outside can be anything except int
characters. I do have one situation of '%s%s%s', but I can change it to
'%s', and change the output into the needed output, so that's not
important. Think something along the lines of "abckdaldj iweo%s
qwierxcnv !%sjd".

That was written in haste. All the information is true. The question:
I've already created a function to do this, using your original
deformat function. Is there any way in which it might go wrong?
 
Dustan

Dustan said:
Dustan said:
Tim said:
My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?

Given this new (key) information along with the assumption that
you're doing straight string replacement (not dictionary
replacement of the form "%(key)s" or other non-string types such
as "%05.2f"), then yes, a reversal is possible. To make it more
explicit, one would do something like

template = '%s, %s, %s'
values = ('Tom', 'Dick', 'Harry')
formatted = template % values
import re
unformat_string = template.replace('%s', '([^, ]+)')
unformatter = re.compile(unformat_string)
extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".

-tkc

Thanks.

One more thing (I forgot to mention this other situation earlier)
The %s characters are ints, and outside can be anything except int
characters. I do have one situation of '%s%s%s', but I can change it to
'%s', and change the output into the needed output, so that's not
important. Think something along the lines of "abckdaldj iweo%s
qwierxcnv !%sjd".

That was written in haste. All the information is true. The question:
I've already created a function to do this, using your original
deformat function. Is there any way in which it might go wrong?

Again, haste. I used Peter's deformat function.
 
Tim Chase

template = '%s, %s, %s'
values = ('Tom', 'Dick', 'Harry')
formatted = template % values
import re
unformat_string = template.replace('%s', '([^, ]+)')
unformatter = re.compile(unformat_string)
extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".

One more thing (I forgot to mention this other situation earlier)
The %s characters are ints, and outside can be anything except int
characters. I do have one situation of '%s%s%s', but I can change it to
'%s', and change the output into the needed output, so that's not
important. Think something along the lines of "abckdaldj iweo%s
qwierxcnv !%sjd".

That was written in haste. All the information is true. The question:
I've already created a function to do this, using your original
deformat function. Is there any way in which it might go wrong?

Only you know what anomalies will be found in your data-sets. If
you know/assert that

-the only stuff in the formatting string is one set of characters

-that stuff in the replacement-values can never include any of
your format-string characters

-that you're not using funky characters/formatting in your format
string (such as "%%" possibly followed by an "s" to get the
resulting text of "%s" after formatting, or trying to use other
formatters such as the aforementioned "%f" or possibly "%i")

then you should be safe. It could also be possible (with my
original replacement of "(.*)") if your values will never include
any substring of your format string. If you can't guarantee
these conditions, you're trying to make a cow out of hamburger.
Or a pig out of sausage. Or a whatever out of a hotdog. :)
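The second condition in that list can be checked mechanically before relying on the reversal. A rough sketch, assuming templates that use only %s; the function name values_are_reversible is made up for illustration, and it compares the values as strings, the way %s renders them:

```python
def values_are_reversible(template, values):
    """Return True if no value shares a character with the literal text
    of a %s-only template (hypothetical, conservative check)."""
    literal_chars = set(''.join(template.split('%s')))
    return all(not (set(str(v)) & literal_chars) for v in values)


print(values_are_reversible('%s, %s, %s', ('Tom', 'Dick', 'Harry')))
# True
print(values_are_reversible('%s, %s, %s', ('Tom', 'Dick', 'Mary Ann')))
# False
```

The check is conservative: with the greedy "(.*)" replacement a shared character is not always fatal, so False means "not guaranteed" rather than "will fail".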

Conventional wisdom would tell you to create a test-suite of
format-strings and sample values (preferably worst-case funkiness
in your expected format-strings/values), and then have a test
function that will assert that the unformatting of every
formatted string in the set returns the same set of values that
went in. Something like

import re

tests = {
    'I was %s but now I am %s': [
        ('hot', 'cold'),
        ('young', 'old'),
        ],
    'He has 3 %s and 2 %s': [
        ('brothers', 'sisters'),
        ('cats', 'dogs'),
        ],
    }

# Iterate over (template, list-of-value-tuples) pairs.
for format_string, values in tests.items():
    # Each %s becomes a greedy capture group; compile once per template.
    unformatter = re.compile(format_string.replace('%s', '(.*)'))
    for value_tuple in values:
        formatted = format_string % value_tuple
        unformatted = unformatter.search(formatted).groups()
        if unformatted != value_tuple:
            print("%s doesn't match %s when unformatting %s" % (
                unformatted,
                value_tuple,
                format_string))

-tkc
 
Dan Sommers

Is there any builtin function or module with a function similar to my
made-up, not-written deformat function as follows? I can't imagine it
would be too easy to write, but possible...

[ snip ]
Any input? I've looked through the documentation of the string module
and re module, did a search of the documentation and a search of this
group, and come up empty-handed.

Track down pyscanf. (Google is your friend, but I can't find any sort
of licensing/copyright information, and the web addresses in the source
code aren't available, so I hesitate to post my ancient copy.)

HTH,
Dan
 
Dustan

Thanks for all your help. I've gotten the idea.
 
