Quoting literals properly


yoni

Hi,

Consider the following XML document:

<article>
This is a sample <literal>document</literal>.
Some <literal>words</literal>, from some reason, are tagged with the
<literal>literal</literal> tag.
</article>

I'm using the following XSL to output html, where each literal is
surrounded by double quotes:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<html>
<xsl:apply-templates/>
</html>
</xsl:template>

<xsl:template match="literal">
<xsl:text>&quot;</xsl:text>
<xsl:apply-templates/>
<xsl:text>&quot;</xsl:text>
</xsl:template>

</xsl:stylesheet>

The result is the following:
<html>
This is a sample "document". Some "words", from some reason, are tagged
with the "literal" tag.
</html>

Problem is that this output doesn't obey the (somewhat strange) rules
of American English. The proper way to quote those literals would be:

This is a sample "document." Some "words," from some reason, are tagged
with the "literal" tag.
Notice that periods and commas that follow a literal end up being
inside the quotes.

My question is: Can anyone help me write an XSL stylesheet that would do
this quoting job correctly?

Thanks!

Yoni
 

Joe Kesselman

yoni said:
My question is: Can anyone help me write an XSL stylesheet that would do
this quoting job correctly?

See recent discussion. XSLT is tuned much more for operating on nodes as
units than munging the contents of nodes. You can do it, but it isn't
going to be very pretty.

Might be easier to do what you're doing now, then run the output through
an appropriate sed script.
 

Andy Dingley

yoni said:
Problem is that this output doesn't obey the (somewhat strange) rules
of American English. The proper way to quote those literals would be:

This is a sample "document." Some "words," from some reason, are tagged
with the "literal" tag.
Notice that periods and commas that follow a literal end up being
inside the quotes.

Then stop using this bizarre American (?) construction and use the
Queen's English. It's right, it's rational, it's defensible by citation
and it's much easier to implement.

http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style#Quotation_marks

"When punctuating quoted passages include the mark of punctuation
inside the quotation marks only if the sense of the mark of punctuation
is part of the quotation. This is the style used in Australia, New
Zealand, and Britain, for example."
 

Peter Flynn

yoni said:
Hi,

Consider the following XML document:

<article>
This is a sample <literal>document</literal>.
Some <literal>words</literal>, from some reason, are tagged with the
<literal>literal</literal> tag.
</article>

An excellent, if possibly unintentional, example of tag abuse.

<literal> (at least in DocBook) is for identifying strings which must
be used character-for-character as they are shown, despite the possible
ambiguity of the surrounding text. They are normally displayed in a
monospace (typewriter) font to ensure that they are visually distinct
and that there is no l/1/I or 0/O misinterpretation.

I'm using the following XSL to output html, where each literal is
surrounded by double quotes:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<html>
<xsl:apply-templates/>
</html>
</xsl:template>

<xsl:template match="literal">
<xsl:text>&quot;</xsl:text>
<xsl:apply-templates/>
<xsl:text>&quot;</xsl:text>
</xsl:template>

Except that that will output typewriter-style unidirectional quotes.
Use ‘ and ’ (&#x2018; and &#x2019;) if you want typographic ("curly") quotes.
</xsl:stylesheet>

The result is the following:
<html>
This is a sample "document". Some "words", from some reason, are tagged
with the "literal" tag.
</html>

Problem is that this output doesn't obey the (somewhat strange) rules
of American English. The proper way to quote those literals would be:

This is a sample "document." Some "words," from some reason, are tagged
with the "literal" tag.

Yet another good reason why the MLA is wrong :) It's incredible
that supposedly intelligent people can continue to peddle this
canard year after year. Sadly, there seems to be no way to stop
them.

yoni said:
Notice that periods and commas that follow a literal end up being
inside the quotes.

Which is why using the right markup is actually quite important.

yoni said:
My question is: Can anyone help me write an XSL stylesheet that would do
this quoting job correctly?

Ewww. Gag. Spit. :) Sure, no problem...

<?xml version="1.0" encoding="iso-8859-1" ?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">
<html>
<xsl:apply-templates/>
</html>
</xsl:template>

<xsl:template match="literal">
<xsl:text>&#x2018;</xsl:text>
<xsl:apply-templates/>
<xsl:choose>
<!-- add more tests for other punctuation in the first parens -->
<xsl:when
test="(substring(following-sibling::text()[1],1,1)='.' or
substring(following-sibling::text()[1],1,1)=',') and
generate-id(following-sibling::node()[1])=
generate-id(following-sibling::text()[1])">
<xsl:value-of
select="substring(following-sibling::text()[1],1,1)"/>
<xsl:text>&#x2019;</xsl:text>
<xsl:apply-templates
select="following-sibling::text()[1]" mode="mla"/>
</xsl:when>
<xsl:otherwise>
<xsl:text>&#x2019;</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:template>

<!-- duplicate the test for any additional punctuation here also -->
<xsl:template
match="text()[substring(.,1,1)='.' or substring(.,1,1)=',']
[name(preceding-sibling::node()[1])='literal']"/>

<xsl:template match="text()" mode="mla">
<xsl:value-of select="substring(.,2)"/>
</xsl:template>

</xsl:stylesheet>

///Peter
 

Joe Kesselman

Peter said:
Ewww. Gag. Spit. :) Sure, no problem...

Good job of typing while holding your nose... <smile/>

Of course this solution doesn't handle ellipses properly. I suppose
another case could be added to handle that; "it's just a Simple Matter
Of Programming".
 

yoni

Thanks for the solution, Peter. It works great!

Peter said:
An excellent, if possibly unintentional, example of tag abuse.

<literal> (at least in DocBook) is for identifying strings which must
be used character-for-character as they are shown, despite the possible
ambiguity of the surrounding text. They are normally displayed in a
monospace (typewriter) font to ensure that they are visually distinct
and that there is no l/1/I or 0/O misinterpretation.

DocBook (again) has the <wordasword> element type, which is probably
closer to what your example intends.

My example document is probably not the best. The article documents
that I'm working on are used for an on-line help for a web application.
The words that are marked as literals would actually be UI elements
such as button names and screen names.
Initially, we had them displayed in bold, later it was decided to use
quotes instead, and that's how I came upon this problem.

Thanks again,

Yoni
 

Joe Kesselman

yoni said:
Thanks for the solution, Peter. It works great!




My example document is probably not the best. The article documents
that I'm working on are used for an on-line help for a web application.
The words that are marked as literals would actually be UI elements
such as button names and screen names.
Initially, we had them displayed in bold, later it was decided to use
quotes instead, and that's how I came upon this problem.

For that application, I'd definitely say "Ignore the rules, or go back
to bold -- or, better, actually render the buttons as buttons."
 

yoni

Ignoring the rules is not an option in this case. Mainly because the
tech writer will kill me, claiming that people will think that she
doesn't know the rules...
Personally, I thought the bolding looked good, but our graphic designer
claims that it looks too much like the headers.
But hey, the code that Peter sent worked great. And I also learned a
couple of new XSL tricks.
 

Peter Flynn

Joe said:
Good job of typing while holding your nose... <smile/>

A double brandy solved the problem.

Joe said:
Of course this solution doesn't handle ellipses properly. I suppose
another case could be added to handle that; "it's just a Simple Matter
Of Programming".

Hah! Depends whether the user has inserted a UTF-8 horizontal ellipsis
character (qua character, or &hellip;, or &#x2026;) or if they've
just typed dot dot dot into a dumb editor. And I'm not sure if even the
MLA require "word..." rather than "word"..., although I wouldn't put it
past them.

///Peter
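
A rough sketch of how the ellipsis case might be added; it is not part of Peter's stylesheet. It assumes the ellipsis should move inside the quotes (which, as noted above, is not certain), handles both three typed dots and the single U+2026 character, and uses the double curly quotes from the original request. The variable names `next` and `pull` are purely illustrative.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>

  <xsl:template match="literal">
    <!-- The node right after this element, but only if it is a text node -->
    <xsl:variable name="next"
      select="following-sibling::node()[1][self::text()]"/>
    <!-- Punctuation to pull inside the quotes (empty if none applies).
         The three-dot test must come before the single-dot test. -->
    <xsl:variable name="pull">
      <xsl:choose>
        <xsl:when test="starts-with($next, '...')">
          <xsl:text>...</xsl:text>
        </xsl:when>
        <xsl:when test="starts-with($next, '&#x2026;')">
          <xsl:text>&#x2026;</xsl:text>
        </xsl:when>
        <xsl:when test="starts-with($next, '.') or starts-with($next, ',')">
          <xsl:value-of select="substring($next, 1, 1)"/>
        </xsl:when>
      </xsl:choose>
    </xsl:variable>
    <xsl:text>&#x201C;</xsl:text>
    <xsl:apply-templates/>
    <xsl:value-of select="$pull"/>
    <xsl:text>&#x201D;</xsl:text>
  </xsl:template>

  <!-- Text directly after a literal: drop whatever was pulled inside -->
  <xsl:template
    match="text()[preceding-sibling::node()[1][self::literal]]">
    <xsl:choose>
      <xsl:when test="starts-with(., '...')">
        <xsl:value-of select="substring(., 4)"/>
      </xsl:when>
      <xsl:when test="starts-with(., '&#x2026;') or starts-with(., '.')
                      or starts-with(., ',')">
        <xsl:value-of select="substring(., 2)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the pulled punctuation in one variable means a new mark only needs one extra `xsl:when` in each of the two templates, rather than a longer boolean test.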
 

Peter Flynn

yoni said:
Thanks for the solution, Peter. It works great!


My example document is probably not the best. The article documents
that I'm working on are used for an on-line help for a web application.

All the more reason to use markup designed for the purpose.

yoni said:
The words that are marked as literals would actually be UI elements
such as button names and screen names.

What vocabulary are you using? I'd be interested to know if there was
a specific reason you didn't pick DocBook for the job.

yoni said:
Initially, we had them displayed in bold, later it was decided to use
quotes instead, and that's how I came upon this problem.

Interesting...I wonder if the MLA insist on the period after a bold
keyword being a bold period? :)

///Peter
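
For illustration only, here is roughly what purpose-built markup for UI elements could look like, assuming DocBook's `guibutton` and `guilabel` elements. The sample sentence, the names "Save" and "Settings", and the `ui` class are made up for this sketch and do not come from the thread.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical source fragment this stylesheet is meant for:
       <para>Click <guibutton>Save</guibutton> on the
       <guilabel>Settings</guilabel> screen.</para>
-->
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>

  <!-- UI elements get their own styling hook instead of quotes,
       so the punctuation-placement problem never arises -->
  <xsl:template match="guibutton | guilabel">
    <span class="ui">
      <xsl:apply-templates/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

With the styling moved into CSS, switching between bold, a sans-serif face, or a button-like box would not require touching the stylesheet again.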
 

Peter Flynn

Joe said:
For that application, I'd definitely say "Ignore the rules, or go back
to bold -- or, better, actually render the buttons as buttons."

I did exactly that for a book once. Spent a while doing a neat LaTeX
macro so that DocBook <keycap>x</keycap> became \keycap{x} and made
a little curved-sided key, shaded in light and dark grey, with an
italic x on top just where the Mac keyboard had it. Publisher didn't
believe it was automated and asked me to confirm that I had got the
"copyright" agreement over the "image" I had used :) And then the
printer screwed it all up by over-inking on poor paper so they all
came out as muddy blotches :)

*Never* trust a publisher.

///Peter
 

Peter Flynn

yoni said:
Ignoring the rules is not an option in this case. Mainly because the
tech writer will kill me, claiming that people will think that she
doesn't know the rules...

Yes, that's a big problem. Maybe ask her if you can jointly write a
disclaimer for the foreword, explaining why the MLA is wrong :)

yoni said:
Personally, I thought the bolding looked good, but our graphic designer
claims that it looks too much like the headers.

Medium sans can be good for this, if the body copy is in a conventional
serif typeface.

///Peter
 

yoni

Peter said:
What vocabulary are you using? I'd be interested to know if there was
a specific reason you didn't pick DocBook for the job.

I actually did not know of DocBook before. I had to build an on-line
help framework for a new web application, so I figured that XML would be
a good choice. I have 3 types of documents: 'glossary', 'FAQ' and 'help
articles'. The first 2 are quite straightforward: the glossary is a
list of terms, each with a name and a definition, and the FAQ is a list
of questions and answers.
For the articles, the vocabulary is based on HTML documents that I got
from the tech writer. Looking at the class names she was using and
comparing it to DocBook, I do see some resemblance. I suspect that the
previous company she was working for did have their schema based on
DocBook.
 
