Reverse string-formatting (maybe?)

Dustan · Oct 14, 2006

Is there any builtin function or module with a function similar to my
made-up, not-written deformat function as follows? I can't imagine it
would be too easy to write, but possible...

template = 'I am %s, and he %s last %s.'

('coding', 'coded', 'week')

expanded (for better visual):

('coding', 'coded', 'week')

It would return a tuple of strings, since it has no way of telling what
the original type of each item was.

Any input? I've looked through the documentation of the string module
and re module, did a search of the documentation and a search of this
group, and come up empty-handed.

Tim Chase · Oct 14, 2006

Dustan said:
template = 'I am %s, and he %s last %s.'

('coding', 'coded', 'week')

expanded (for better visual):
('coding', 'coded', 'week')

It would return a tuple of strings, since it has no way of telling what
the original type of each item was.

Any input? I've looked through the documentation of the string module
and re module, did a search of the documentation and a search of this
group, and come up empty-handed.

Yes, in the trivial case you provide, it can be done fairly
easily using the re module:

>>> import re
>>> template = 'I am %s, and he %s last %s.'
>>> values = ('coding', 'coded', 'week')
>>> formatted = template % values
>>> unformat_re = re.escape(template).replace('%s', '(.*)')
>>> # unformat_re = unformat_re.replace('%i', '([0-9]+)')
>>> r = re.compile(unformat_re)
>>> r.match(formatted).groups()
('coding', 'coded', 'week')

Things get crazier when you have things like

or

'The value is 00003.1415'

or
was very %(adj)s.' % {'name': 'Grandma', 'gift': 'sweater', 'adj':
'nice'}
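The dictionary-style case can be handled by turning each `%(name)s` field into a named regex group, so the values come back as a dict instead of a tuple. This is a minimal sketch, not a library API: `unformat_named` is a hypothetical helper, it assumes every key is made of word characters and appears only once, and the template below is an illustrative stand-in because the original example is incomplete.

```python
import re

def unformat_named(template, formatted):
    """Hypothetical helper: reverse a template that uses only %(name)s fields.

    Assumes each key is made of word characters and appears at most once.
    Returns a dict of strings, or None if `formatted` does not fit.
    """
    # With one capture group, re.split alternates: literal, key, literal, key, ..., literal
    parts = re.split(r'%\((\w+)\)s', template)
    pattern = ''
    for index, part in enumerate(parts):
        if index % 2:
            pattern += '(?P<%s>.*)' % part   # odd positions are the captured keys
        else:
            pattern += re.escape(part)       # even positions are literal text
    match = re.fullmatch(pattern, formatted)
    return match.groupdict() if match else None

# Illustrative template; the keys and values mirror the dict in the thread
template = '%(name)s gave me a %(gift)s. It was very %(adj)s.'
values = {'name': 'Grandma', 'gift': 'sweater', 'adj': 'nice'}
print(unformat_named(template, template % values))
```

If a key is repeated in the template, compiling this pattern fails with a duplicate group name; using a backreference such as `(?P=name)` for the later occurrences would be one way around that.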

Additionally, things get a little tangled when the replacement
values duplicate text in the template. Should the unformatting
of "I am tired, and he didn't last last All Saints' Day." be
parsed as ('tired', "didn't last", "All Saints' Day") or
('tired', "didn't", "last All Saints' Day")? The /intent/ is
likely the former, but getting a computer to understand intent is
a non-trivial task.
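The ambiguity can be made concrete: once the literal text is escaped, the same template yields both readings depending on whether the capture groups are greedy (`(.*)`) or lazy (`(.*?)`). This is a minimal sketch; `unformat` and its `lazy` flag are hypothetical names, not an existing API, and neither setting recovers intent in general.

```python
import re

def unformat(template, formatted, lazy=False):
    """Hypothetical helper: reverse a template that contains only %s fields.

    lazy=False captures with greedy '(.*)' groups, lazy=True with '(.*?)'.
    Returns a tuple of strings, or None if `formatted` does not fit.
    """
    group = '(.*?)' if lazy else '(.*)'
    # Escape the literal text between the %s markers so '.', '(' and the
    # like are matched literally, then put a capture group where each %s was.
    pattern = group.join(re.escape(piece) for piece in template.split('%s'))
    match = re.fullmatch(pattern, formatted)
    return match.groups() if match else None

template = 'I am %s, and he %s last %s.'
formatted = "I am tired, and he didn't last last All Saints' Day."

print(unformat(template, formatted))             # ('tired', "didn't last", "All Saints' Day")
print(unformat(template, formatted, lazy=True))  # ('tired', "didn't", "last All Saints' Day")
```

Here the greedy reading happens to match the likely intent, but that is luck of this particular sentence, not a property of greedy matching.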

Just a few early-morning thoughts...

-tkc

Peter Otten · Oct 14, 2006

Dustan said:
Is there any builtin function or module with a function similar to my
made-up, not-written deformat function as follows? I can't imagine it
would be too easy to write, but possible...

('coding', 'coded', 'week')

expanded (for better visual):
('coding', 'coded', 'week')

It would return a tuple of strings, since it has no way of telling what
the original type of each item was.

Any input? I've looked through the documentation of the string module
and re module, did a search of the documentation and a search of this
group, and come up empty-handed.

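Simple, but unreliable:

>>> import re
>>> def deformat(template, formatted):
...     r = re.compile("(.*)".join(template.split("%s")))
...     return r.match(formatted).groups()
...
>>> deformat('I am %s, and he %s last %s.', 'I am coding, and he coded last week.')
('coding', 'coded', 'week')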
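One concrete way the split-and-join version above can go wrong: the literal pieces of the template are put into the regex unescaped, so characters such as `(`, `)` or `.` are read as regex syntax. The sketch below compares it with a variant that passes each piece through `re.escape`; both function names are hypothetical, and the `'%s (%s)'` template is an illustrative assumption.

```python
import re

def deformat_unescaped(template, formatted):
    # Split-and-join as above: the literal pieces go into the regex as-is
    r = re.compile("(.*)".join(template.split("%s")))
    return r.match(formatted).groups()

def deformat_escaped(template, formatted):
    # Same idea, but each literal piece is escaped before joining
    r = re.compile("(.*)".join(re.escape(p) for p in template.split("%s")))
    return r.match(formatted).groups()

template = '%s (%s)'
formatted = template % ('a', 'b')               # 'a (b)'

print(deformat_unescaped(template, formatted))  # ('a', '(b)', '(b)'): the '(' became a group
print(deformat_escaped(template, formatted))    # ('a', 'b')
```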

Peter

Dustan · Oct 15, 2006

Peter said:
Simple, but unreliable:

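>>> import re
>>> def deformat(template, formatted):
...     r = re.compile("(.*)".join(template.split("%s")))
...     return r.match(formatted).groups()
...
>>> deformat('I am %s, and he %s last %s.', 'I am coding, and he coded last week.')
('coding', 'coded', 'week')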

Peter

Trying to figure out the 'unreliable' part of your statement...

I'm sure two '%s' markers in a row would be a bad idea, and if the
values substituted for '%s' resemble the text around each '%s', that
would cause difficulty. Is there any other reason it might not work
properly?

My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?

Tim Chase · Oct 15, 2006

Dustan said:
My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?

Given this new (key) information along with the assumption that
you're doing straight string replacement (not dictionary
replacement of the form "%(key)s" or other non-string types such
as "%05.2f"), then yes, a reversal is possible. To make it more
explicit, one would do something like

>>> template = '%s, %s, %s'
>>> values = ('Tom', 'Dick', 'Harry')
>>> formatted = template % values
>>> import re
>>> unformat_string = template.replace('%s', '([^, ]+)')
>>> unformatter = re.compile(unformat_string)
>>> extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".
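The `[^, ]+` version depends entirely on that guarantee. If a value ever does contain a comma, `search` can still succeed on a prefix of the string and quietly return the wrong values, whereas `fullmatch` (Python 3.4+) fails outright. A small illustration, with `'Dick, Jr.'` as a made-up offending value:

```python
import re

template = '%s, %s, %s'
unformatter = re.compile(template.replace('%s', '([^, ]+)'))

good = template % ('Tom', 'Dick', 'Harry')
bad = template % ('Tom', 'Dick, Jr.', 'Harry')  # a value that breaks the assumption

print(unformatter.search(good).groups())  # ('Tom', 'Dick', 'Harry')
print(unformatter.search(bad).groups())   # ('Tom', 'Dick', 'Jr.'): silently wrong
print(unformatter.fullmatch(bad))         # None: the mismatch is detected
```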

-tkc

Dustan · Oct 15, 2006

Tim said:
My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?

Given this new (key) information along with the assumption that
you're doing straight string replacement (not dictionary
replacement of the form "%(key)s" or other non-string types such
as "%05.2f"), then yes, a reversal is possible. To make it more
explicit, one would do something like

template = '%s, %s, %s'
values = ('Tom', 'Dick', 'Harry')
formatted = template % values
import re
unformat_string = template.replace('%s', '([^, ]+)')
unformatter = re.compile(unformat_string)
extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".

-tkc

Thanks.

One more thing (I forgot to mention this other situation earlier): the
values that replace each %s are ints, and the text outside them can be
anything except digits. I do have one situation of '%s%s%s', but I can
change it to '%s' and convert the output into what I need, so that's not
important. Think something along the lines of "abckdaldj iweo%s
qwierxcnv !%sjd".
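Under exactly those assumptions (every `%s` filled with a non-negative int, no digits in the surrounding text), the capture group can be narrowed to digits and the results converted back to `int`. A minimal sketch; `deformat_ints` is a hypothetical helper, and adjacent fields such as `'%s%s%s'` would still be ambiguous, since nothing marks where one number ends and the next begins.

```python
import re

def deformat_ints(template, formatted):
    """Hypothetical helper: recover the ints that filled each %s in template.

    Assumes the literal text contains no digits and every value is a
    non-negative int, so a run of digits can never spill into the literals.
    """
    pattern = r'(\d+)'.join(re.escape(piece) for piece in template.split('%s'))
    match = re.fullmatch(pattern, formatted)
    if match is None:
        raise ValueError('%r does not fit template %r' % (formatted, template))
    return tuple(int(group) for group in match.groups())

template = "abckdaldj iweo%s qwierxcnv !%sjd"
print(deformat_ints(template, template % (12, 345)))  # (12, 345)
```

Raising on a mismatch, rather than returning partial groups, makes bad input visible instead of silently producing wrong numbers.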

Dustan · Oct 15, 2006

Dustan said:
Tim said:

My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?

Given this new (key) information along with the assumption that
you're doing straight string replacement (not dictionary
replacement of the form "%(key)s" or other non-string types such
as "%05.2f"), then yes, a reversal is possible. To make it more
explicit, one would do something like

template = '%s, %s, %s'
values = ('Tom', 'Dick', 'Harry')
formatted = template % values
import re
unformat_string = template.replace('%s', '([^, ]+)')
unformatter = re.compile(unformat_string)
extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".

-tkc

Thanks.

One more thing (I forgot to mention this other situation earlier): the
values that replace each %s are ints, and the text outside them can be
anything except digits. I do have one situation of '%s%s%s', but I can
change it to '%s' and convert the output into what I need, so that's not
important. Think something along the lines of "abckdaldj iweo%s
qwierxcnv !%sjd".

That was written in haste. All the information is true. The question:
I've already created a function to do this, using your original
deformat function. Is there any way in which it might go wrong?

Dustan · Oct 15, 2006

Dustan said:
Dustan said:

Tim said:

My template outside of the '%s' characters contains only commas and
spaces, and within, neither commas nor spaces. Given that information,
is there any reason it might not work properly?

Given this new (key) information along with the assumption that
you're doing straight string replacement (not dictionary
replacement of the form "%(key)s" or other non-string types such
as "%05.2f"), then yes, a reversal is possible. To make it more
explicit, one would do something like

template = '%s, %s, %s'
values = ('Tom', 'Dick', 'Harry')
formatted = template % values
import re
unformat_string = template.replace('%s', '([^, ]+)')
unformatter = re.compile(unformat_string)
extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".

-tkc

Thanks.

One more thing (I forgot to mention this other situation earlier): the
values that replace each %s are ints, and the text outside them can be
anything except digits. I do have one situation of '%s%s%s', but I can
change it to '%s' and convert the output into what I need, so that's not
important. Think something along the lines of "abckdaldj iweo%s
qwierxcnv !%sjd".

That was written in haste. All the information is true. The question:
I've already created a function to do this, using your original
deformat function. Is there any way in which it might go wrong?

Again, haste. I used Peter's deformat function.

Tim Chase · Oct 15, 2006

Dustan said:
template = '%s, %s, %s'
values = ('Tom', 'Dick', 'Harry')
formatted = template % values
import re
unformat_string = template.replace('%s', '([^, ]+)')
unformatter = re.compile(unformat_string)
extracted_values = unformatter.search(formatted).groups()

using '[^, ]+' to mean "one or more characters that aren't a
comma or a space".

One more thing (I forgot to mention this other situation earlier): the
values that replace each %s are ints, and the text outside them can be
anything except digits. I do have one situation of '%s%s%s', but I can
change it to '%s' and convert the output into what I need, so that's not
important. Think something along the lines of "abckdaldj iweo%s
qwierxcnv !%sjd".

That was written in haste. All the information is true. The question:
I've already created a function to do this, using your original
deformat function. Is there any way in which it might go wrong?

Only you know what anomalies will be found in your data-sets. If
you know/assert that

-the only stuff in the formatting string is one set of characters

-that stuff in the replacement-values can never include any of
your format-string characters

-that you're not using funky characters/formatting in your format
string (such as "%%" possibly followed by an "s" to get the
resulting text of "%s" after formatting, or trying to use other
formatters such as the aforementioned "%f" or possibly "%i")

then you should be safe. It could also be possible (with my
original replacement of "(.*)") if your values will never include
any substring of your format string. If you can't guarantee
these conditions, you're trying to make a cow out of hamburger.
Or a pig out of sausage. Or a whatever out of a hotdog.
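The third condition is the easiest to relax in code: if the template is tokenized so that `%%` is recognised before `%s`, a literal percent sign no longer confuses the parser, and any other conversion (`%d`, `%f`, `%(key)s` and so on) can be rejected loudly instead of being mis-parsed. A minimal sketch; `template_to_regex` is a hypothetical helper that supports only `%s` and `%%`.

```python
import re

def template_to_regex(template, group='(.*)'):
    """Hypothetical helper: compile a regex for a template using only %s and %%.

    Any other '%' conversion raises ValueError rather than being guessed at.
    """
    parts = []
    # The capturing split keeps the markers; '%%' is listed first so '%%s'
    # is read as a literal '%' followed by 's', just as the % operator does.
    for token in re.split(r'(%%|%s)', template):
        if token == '%s':
            parts.append(group)
        elif token == '%%':
            parts.append(re.escape('%'))
        elif '%' in token:
            raise ValueError('unsupported conversion in %r' % token)
        else:
            parts.append(re.escape(token))
    return re.compile(''.join(parts))

template = '%s is 100%% done, says %s'
formatted = template % ('It', 'Tim')  # 'It is 100% done, says Tim'
print(template_to_regex(template).fullmatch(formatted).groups())  # ('It', 'Tim')
```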

Conventional wisdom would tell you to create a test-suite of
format-strings and sample values (preferably worst-case funkiness
in your expected format-strings/values), and then have a test
function that will assert that the unformatting of every
formatted string in the set returns the same set of values that
went in. Something like

import re

tests = {
    'I was %s but now I am %s': [
        ('hot', 'cold'),
        ('young', 'old'),
    ],
    'He has 3 %s and 2 %s': [
        ('brothers', 'sisters'),
        ('cats', 'dogs'),
    ],
}

for format_string, values in tests.items():
    # compile a regex that captures whatever replaced each %s
    unformatter = re.compile(format_string.replace('%s', '(.*)'))
    for value_tuple in values:
        formatted = format_string % value_tuple
        unformatted = unformatter.search(formatted).groups()
        if unformatted != value_tuple:
            print("%s doesn't match %s when unformatting %s" % (
                unformatted,
                value_tuple,
                format_string))

-tkc

Dan Sommers · Oct 15, 2006

Dustan said:
Is there any builtin function or module with a function similar to my
made-up, not-written deformat function as follows? I can't imagine it
would be too easy to write, but possible...

[ snip ]

Any input? I've looked through the documentation of the string module
and re module, did a search of the documentation and a search of this
group, and come up empty-handed.

Track down pyscanf. (Google is your friend, but I can't find any sort
of licensing/copyright information, and the web addresses in the source
code aren't available, so I hesitate to post my ancient copy.)

HTH,
Dan

Dustan · Oct 15, 2006

Thanks for all your help. I've gotten the idea.
