Crawl nested data structure, apply code block to each

Randy Westlund

I have a simple problem, but am not sure how to solve it. I'm
getting data from a MongoDB database with the MongoDB module. This
returns a BSON (JSON-like) document as a nested data structure with
arbitrary element types. I'm taking that and building a LaTeX
document with the Template::Latex module. This is currently
working, most of the time.

My problem is that the strings I'm pulling from the database
sometimes have an '&' in them, which screws up my tabularx sections
in LaTeX. So I need some way to crawl this data structure and
escape them. I want to do something like call map on all the scalar
values found.

I looked at Data::Nested, but didn't see anything useful for me. Is
there a module that has a function like this, or a concise way to
write this myself?
 
Bjoern Hoehrmann

* Randy Westlund wrote in comp.lang.perl.misc:
I have a simple problem, but am not sure how to solve it. I'm
getting data from a MongoDB database with the MongoDB module. This
returns a BSON (JSON-like) document as a nested data structure with
arbitrary element types. I'm taking that and building a LaTeX
document with the Template::Latex module. This is currently
working, most of the time.

My problem is that the strings I'm pulling from the database
sometimes have an '&' in them, which screws up my tabularx sections
in LaTeX. So I need some way to crawl this data structure and
escape them. I want to do something like call map on all the scalar
values found.

This sounds like the `&` needs to be escaped when the Template::Latex
module combines your template code with the data from the structure.
Ordinarily there should be ways to indicate how values need to be
escaped (consider generating HTML documents from templates: sometimes
values need to be escaped per HTML rules, sometimes JavaScript rules,
maybe CSS rules, sometimes even a combination of the rules), and I'd
suggest looking for that instead of transforming your data this way.
 
Rainer Weikusat

Randy Westlund said:
I have a simple problem, but am not sure how to solve it. I'm
getting data from a MongoDB database with the MongoDB module. This
returns a BSON (JSON-like) document as a nested data structure with
arbitrary element types. I'm taking that and building a LaTeX
document with the Template::Latex module. This is currently
working, most of the time.

My problem is that the strings I'm pulling from the database
sometimes have an '&' in them, which screws up my tabularx sections
in LaTeX. So I need some way to crawl this data structure and
escape them.

Why don't you escape them on extraction?
 
John Bokma

Randy Westlund said:
I have a simple problem, but am not sure how to solve it. I'm
getting data from a MongoDB database with the MongoDB module. This
returns a BSON (JSON-like) document as a nested data structure with
arbitrary element types. I'm taking that and building a LaTeX
document with the Template::Latex module. This is currently
working, most of the time.

My problem is that the strings I'm pulling from the database
sometimes have an '&' in them, which screws up my tabularx sections
in LaTeX. So I need some way to crawl this data structure and
escape them. I want to do something like call map on all the scalar
values found.

Make your own TT filter?

http://template-toolkit.org/docs/modules/Template/Filters.html#section_FILTERS

IIRC you can chain filters, so you can first run your data through your
custom filter, then through the latex filter.
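For example (a sketch: it assumes Template::Latex accepts the standard
Template Toolkit FILTERS option, and the filter name 'escape_amp' is
made up here):

    use Template::Latex;

    my $tt = Template::Latex->new({
        FILTERS => {
            escape_amp => sub {
                my $text = shift;
                $text =~ s/&/\\&/g;
                return $text;
            },
        },
    });

and then in the template, where filters chain left to right:

    [% field | escape_amp %]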
 
Randy Westlund

Make your own TT filter?

http://template-toolkit.org/docs/modules/Template/Filters.html#section_FILTERS

IIRC you can chain filters, so you can first run your data through your
custom filter, then through the latex filter.

This looks promising. The remaining obstacle is that when I'm
building the hash to feed Template::Latex, I'm intentionally
inserting some ampersands for formatting. So I need to escape some
of them, but not others. Perhaps for the ones I'm intentionally
putting there, I'll write them as '&&' and have the filter transform
it like this:
'&'  => '\&'
'&&' => '&'

Of course, then any user data containing '&&' will break it :/
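(A sketch of what that filter body could look like: a single pass,
with the '&&' alternative listed first so it wins over a lone '&'.
The caveat about user data containing '&&' still applies: it would
come out as a literal '&'.)

    sub amp_filter {
        my $text = shift;
        # leftmost alternative wins, so '&&' is matched before '&'
        $text =~ s/(&&|&)/ $1 eq '&&' ? '&' : '\\&' /ge;
        return $text;
    }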
 
Rainer Weikusat

Randy Westlund said:
This looks promising. The remaining obstacle is that when I'm
building the hash to feed Template::Latex, I'm intentionally
inserting some ampersands for formatting.

Then why on earth don't you escape ampersands in the input data before
putting it in the hash and insert real 'table format &s' afterwards?
 
Randy Westlund

Then why on earth don't you escape ampersands in the input data before
putting it in the hash and insert real 'table format &s' afterwards?

That's why I was trying to figure out how I could crawl the data
structure, to do it before I inserted stuff. My code is laid out
like this:

- get complicated document from MongoDB
- spend two pages of perl pulling things out of the nested data
structure, transforming the complicated data structure into a
complicated mess of LaTeX formatting mixed with variables in a
hash
- feed to template

The whole thing generates something like an invoice, but with a lot
of conditional formatting depending on what things are in the DB
record.
 
Peter J. Holzer

My code is laid out like this:

- get complicated document from MongoDB
- spend two pages of perl pulling things out of the nested data
structure, transforming the complicated data structure into a
complicated mess of LaTeX formatting mixed with variables in a
hash
- feed to template

This may be part of the problem. I find that it is generally a good idea
to delay output conversion (in this case applying LaTeX formatting, but
the same applies to HTML or just character encoding) as long as
possible, and ideally to leave it to your templating engine, output
filter, or whatever. Otherwise it is too easy to lose track of what
still needs to be converted and what doesn't (leading to either
double-converted strings or unconverted input in the output).

hp
 
Rainer Weikusat

Randy Westlund said:
That's why I was trying to figure out how I could crawl the data
structure, to do it before I inserted stuff. My code is laid out
like this:

- get complicated document from MongoDB
- spend two pages of perl pulling things out of the nested data
structure, transforming the complicated data structure into a
complicated mess of LaTeX formatting mixed with variables in a
hash

Has it occurred to you that this is already "code crawling the data
structure", although specialized for your problem? All you need to add is
an intermediate processing step between

'get data out of the BSON document'

and

'transform data before putting it into the hash'

How are you accessing the serialized data?
 
Randy Westlund

Has it occurred to you that this is already "code crawling the data
structure", although specialized for your problem? All you need to add is
an intermediate processing step between

'get data out of the BSON document'

and

'transform data before putting it into the hash'

How are you accessing the serialized data?

I'm using Data::Diver to pull fields out one at a time. I solved
the problem by wrapping those calls with my own sub that does some
simple substitution. It's the obvious solution, but it isn't very
pretty. This being perl, I was hoping I could find some nice
declarative way to do it, like how map works. I guess in this case
there isn't one.
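
(Roughly like this, as a sketch: the wrapper name is made up here,
and it assumes Data::Diver's exported Dive() function.)

    use Data::Diver qw(Dive);

    # pull a field out and escape ampersands on the way
    sub dive_tex {
        my ($root, @path) = @_;
        my $val = Dive($root, @path);
        $val =~ s/&/\\&/g if defined $val && !ref $val;
        return $val;
    }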
 
Rainer Weikusat

Randy Westlund said:
[...]

I'm using Data::Diver to pull fields out one at a time. I solved
the problem by wrapping those calls with my own sub that does some
simple substitution. It's the obvious solution, but it isn't very
pretty. This being perl, I was hoping I could find some nice
declarative way to do it, like how map works.

'map' works by looping over the input list and collecting the results of
evaluating the 'map expression' in an output list. You could use that,
too, by turning this into a multi-pass algorithm which first builds a
list of keys and values, then uses map to transform that into a list of
keys and escaped values, then runs whatever your other formatting code
happens to do on this list and finally puts the results into a hash. I
don't quite get why someone would consider this 'a pretty solution',
especially when comparing it with a single-pass algorithm which performs
the escaping step before the other processing, so that the formatting
ampersands that processing inserts are never escaped by mistake.
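
(The multi-pass version described above might look something like
this sketch, for a single-level hash; %data here is a stand-in for
the flattened key/value list:)

    my %escaped = map {
        my $v = $data{$_};
        $v =~ s/&/\\&/g unless ref $v;
        ($_ => $v);
    } keys %data;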

If Data::Diver were an OO module, you could subclass it, override Dive,
and then your main 'processing logic' would be independent of the 'data
extraction logic', in the sense that escaping might or might not be
performed depending on which kind of 'diver object' is used to extract
the values. But since it isn't, that's not an option.
 
