Text Encoding Initiative / Feature Requests / #356 @rend from data.word to text

Martin Holmes - 2012-04-23

Another argument in favour of this is that the Google algorithm which generates TEI code from Google Books currently puts CSS code in @rend -- here's an example:


<pb/>
<hi rend="font-weight: bold">THE</hi>

SHILLING ENTERTAINING LIBRAEY.

EDITED BY J. S. LAURIE.

<hi rend="font-weight: bold">GULLIVER'S TRAVELS.</hi>

James Cummings - 2012-04-27

I would be one of those arguing against this change. (And Martin: is arguing from the existence of poor programming really a good argument? Surely we should be pointing this out to Google?)

During the discussion I summarised some of my disagreements at
http://blogs.oucs.ox.ac.uk/jamesc/2012/03/26/more-about-rend/
and earlier at
http://blogs.oucs.ox.ac.uk/jamesc/2011/12/01/rend-and-the-war-on-text-bearing-attributes/

I'm probably not the right person to see clearly on this matter since I'm fairly convinced of the exact opposite and believe strong datatyping to generally be a good thing. So I'll try not to argue more than my devil's advocate position. :-)

-James

Sebastian Rahtz - 2012-05-07

The discussion on TEI-L struck me as one of those finely-balanced ones which never really reach a conclusion, and where it is hard to recall the arguments without starting them all over again. Going ahead with a change when there is no consensus seems wrong, a priori. Just changing the datatype to avoid the arguments about whether or not it can contain CSS will make @rend even more non-interoperable than it already is. IMHO. I confess, tho, I am sitting quite close to the fence.

BODARD Gabriel - 2012-05-08

I'm in favour of keeping @rend as data.word, and being more explicit in the guidelines about what this means and how it should be used. In fact I argued recently on the TEI Council list that we should provide a recommended (but not closed) list of values for @rend globally, to try to encourage interoperability around a single value (e.g. rend="italic") as against all the millions of possible variants that occur in the wild ("i", "I", "italics", "it", "ital", "font-style:italic", etc.)
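
A minimal sketch of what converging on a single recommended value could look like in a processing step. The variant list is taken from the paragraph above; the function names and the choice to lowercase before matching are assumptions made for illustration, not anything the Guidelines define.

```python
# Variants of "italic" seen in the wild (from the comment above), lowercased.
# Treating "italic" as the recommended value follows the rend="italic" example.
ITALIC_VARIANTS = {"i", "italics", "it", "ital", "italic", "font-style:italic"}


def normalise_rend_token(token: str) -> str:
    """Return "italic" for a known variant, otherwise the token unchanged."""
    key = token.strip().lower()
    return "italic" if key in ITALIC_VARIANTS else token


def normalise_rend(value: str) -> str:
    """Normalise each whitespace-separated token of a @rend value."""
    return " ".join(normalise_rend_token(t) for t in value.split())


print(normalise_rend("I bold"))
print(normalise_rend("font-style:italic"))
```

Note that a CSS declaration written with a space after the colon would be split into two tokens and so would not be caught by this lookup.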

If people are using CSS to describe renditional features in a document, I would suggest (in decreasing order of preference):

1. they use @rendition and declare a class, although I can see this is cumbersome for unique values;

2. we propose the creation of a new attribute @rend-css for the purpose;

3. we allow the use of a function such as @rend="css(font-style:italic)" [I'm pretty sure I didn't just invent this--someone must have proposed this on TEI-L many years ago!]

4. we tell people just not to do this. (And presumably they ignore us, their CSS validates because RNG doesn't know it isn't multiple tokens, and the unsatisfactory status quo is preserved.)
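
To illustrate the point in option 4, here is a rough sketch of why inline CSS slips through a token-list check. It assumes a simplified stand-in for data.word (any run of non-whitespace characters); the real datatype is defined with a pattern over Unicode character classes, so this approximation is looser than the schema.

```python
def tokens_of(value: str) -> list:
    """Split a @rend value on whitespace, as a token-list datatype would."""
    return value.split()


def validates_as_token_list(value: str) -> bool:
    """Simplified check: at least one whitespace-separated, non-empty token."""
    return len(tokens_of(value)) >= 1


css = "font-weight: bold"
print(tokens_of(css))                # two "tokens", neither meaningful alone
print(validates_as_token_list(css))  # the value is still accepted
```

Under this reading the validator sees two tokens, `font-weight:` and `bold`, and has no way of knowing they were meant as one CSS declaration.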

Kevin Hawkins - 2012-05-08

For the record, Google is using CSS per the recommendation in the Best Practices for TEI in Libraries, which Peter Gorman and then I are guilty of encouraging Google to use. (The request to Google to generate TEI originated from Google's library partners, so Google probably agreed to follow the BP for this reason.)

Lou Burnard - 2012-05-09

I think this makes good sense for the reasons John gives. The opposite view (that we should instead be constraining the values of @rend, or suggesting recommended values) also has merit, but adopting it would make invalid or "unrecommended" huge numbers of existing documents. Furthermore, the two opposed positions are not entirely the same in terms of their potential negative effects. If people do wish to constrain their @rend values, they can do so by a simple ODD with a vallist and remain conformant; if on the other hand they wished to unconstrain values constrained by the Guidelines this would not be the case. John's suggestion that we need to specify how to process @rend values in the <encodingDesc> is also a good one.

James Cummings - 2012-05-10

I think we're setting up a false dichotomy between 'allow any text' and 'constrain the values'. Currently the Guidelines *do* make a recommendation on this, and that recommendation is that these are discrete space-separated tokens. That some people are abusing this to add free text is in no way a logical justification for throwing out the existing recommendations.

Yes, people can constrain their @rend values, and I hope they do. If a separate feature request goes in for proposing values that the TEI should suggest for @rend, then that might be helpful in convincing people to standardise (but, I accept, is so hard to do because of the nature of what @rend records).

The suggestion to have a mechanism to describe how you are using @rend is, however, generally a good one (similar to the mechanism for documenting how one is using private URIs). However, if one is already going through that amount of trouble, it seems ludicrous not to use @rendition since it exists specifically for this purpose. Moreover, I still think the proper place to define your list of attribute values and what they mean is in the ODD with a valList.

I like the flexibility given by @rend and believe that the order of these tokens is immaterial. rend="big bold beautiful" should be understood to be the same as rend="bold beautiful big", IMHO. However, I'm willing to concede that we do not currently specify that.
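
A small sketch of the order-insensitive reading described above, assuming a processor chooses to treat the tokens as a set. As conceded, the Guidelines do not currently specify this, and a set also collapses repeated tokens, which is a further assumption.

```python
def rend_tokens(value: str) -> frozenset:
    """Interpret a @rend value as an unordered set of tokens."""
    return frozenset(value.split())


def same_rend(a: str, b: str) -> bool:
    """True if two @rend values carry the same tokens in any order."""
    return rend_tokens(a) == rend_tokens(b)


print(same_rend("big bold beautiful", "bold beautiful big"))
```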

I do not in any way buy the argument that @rendition is difficult to use. Encoders can do rendition="#big #bold #beautiful" just as easily. There is no need to immediately define a <rendition> for each of these (the file still validates), and distinct rendition statements could be easily automatically generated for later processing. (However, the same is true for conversion from @rend to @rendition, which is another reason why white-space separation is so meaningful).
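
The conversion mentioned here can be sketched in a few lines. This is an illustration under assumptions: the function names are hypothetical, each token is reused directly as an identifier (so tokens that are not valid XML names would need extra handling), and the generated `<rendition>` elements are empty stubs to be filled in later.

```python
from xml.sax.saxutils import quoteattr


def rend_to_rendition(value: str) -> str:
    """Turn whitespace-separated @rend tokens into @rendition pointers."""
    return " ".join("#" + token for token in value.split())


def rendition_stubs(value: str) -> list:
    """Generate one empty <rendition> stub per distinct token, in first-seen order."""
    distinct = dict.fromkeys(value.split())  # de-duplicates, keeps order
    return ["<rendition xml:id=%s/>" % quoteattr(token) for token in distinct]


print(rend_to_rendition("big bold beautiful"))
print(rendition_stubs("big bold big"))
```

Because the tokens are separated by whitespace, the split is unambiguous, which is the property the paragraph above relies on.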

Looking at John's arguments here:
1) That it is backwards compatible is not an argument for loosening this up.
2) I still don't think we should allow inline CSS, and it certainly shouldn't be encouraged in @rend, because this is expressly for magic tokens understood by the user. If people want to use CSS, that is what @rendition is for, and the claim that using it is so very difficult is a great exaggeration.
3) It may be useful to allow inline and referenced styles. The way the TEI does this is by having a pointer system to <rendition> and a magic token system where your processing contains the logic as to what is meant by 'bold'. To have a third way just seems unnecessary to me. To suggest that this inline style _must_ be CSS really is looking towards only certain types of output. If one allows other schemes, then one must document them and we already have a mechanism for that, called <rendition>.
4) The current datatype *does* discourage users from using CSS in @rend, because it is a bad practice and using the much more rigorous @rendition mechanism is a much better idea.
5) You believe it would be wise for the TEI to support this standard... and look, it *does*... that is what <rendition> is for and the way the TEI has chosen to support it. We expressly decided that we did not want free text attribute values, and <rendition> is a consequence of that. If we allow free text in @rend, then why not in @type? And if there why not everywhere else?
6) Some people not only believe the change is unnecessary but I believe that I and others argued that it was harmful and dangerous. I think it undoes lots of the good work we did in creating attribute value datatypes in the construction of P5. I feel it weakens the standard and very very significantly hampers interoperability and interchange by legitimising people doing things that are bad practice. I really think this would be a retrograde step and so far haven't seen any convincing arguments for why it is such a good idea.
7) As I said above, I wouldn't object to a way to document a scheme by which @rend values were given but do think that as soon as you are going to that much trouble either all you need is a paragraph somewhere in <encodingDesc> or to use @rendition properly.

Some of my own arguments and answers might be a bit flippant and not entirely rigorous in themselves, for which I apologise, no disrespect is intended.

Again, these are just my personal opinions and I'd abide by whatever Council decided, but the first customisation I would make (and I think most projects *SHOULD* make) is to constrain @rend to be a tight set of agreed values.

Rebecca Welzenbach - 2012-07-01

I am also troubled by the implications of this change.

I either don't understand or don't buy the argument that this change is backwards compatible. That multiple occurrences of data.word are also valid as data.text seems like just the reverse of the problem we have now. Currently, white space used in a phrase as the value of @rend is syntactically valid, because each word in the phrase is assumed to be a single token. As James has written elsewhere:

"if someone marks up a text using:

<hi rend="It looks a bit like that other one">text</hi>

This actually has 8 tokens “It”, “looks”, “a”, “bit”, “like”, “that”, “other”, “one”. The point is that the whitespace between these words in the attribute make these each separate values or tokens, not a phrase."

If we changed the datatype to data.text, don't we get just the opposite problem? <hi rend="orange tall bold"> will now be interpreted as one token instead of three? That is, by changing to data.text, we make valid and meaningful something that used to be @rend abuse. But don't we also render meaningless (if still syntactically valid) something that used to be the proper use of @rend?

The use of CSS inline also makes me a bit anxious because I have a hard time believing that it will really be used to capture what a source document looks like, and not how the encoder wants it to be rendered online. Using CSS to capture what is on a page seems convenient but somehow abusive. I think CSS belongs between the marked up text and the browser, not between the page and the marked up text. I'm not sure we want to encourage its use on @rend.

Martin Holmes - 2012-07-02

I think there is an ideological divide here, between those who believe (as I do) that CSS is a fantastic way to describe the appearance and placement of text in a source document (especially a printed document), and those (like Becky) who think the opposite. Then there is a similar ideological divide between those who believe (like me) that the pragmatic acceptance of what is already common usage in the community (CSS in @rend attributes) is nothing but helpful, and those who think (like James) that it's a betrayal of all those who dutifully constrained their @rends all these years to space-delimited tokens.

I don't see any of us really changing our minds on these two issues, because they are really ideological positions. So I think the only satisfactory response is likely to be a compromise whereby @rend is supplemented with a new attribute (why not @style?) which is specifically intended for CSS code.

John A. Walsh - 2012-07-02

I support Martin's compromise, although I still think it would be unfortunate to introduce a new attribute when the existing @rend, with a modified datatype, could easily support both use cases: tokens and CSS.

Supporting both tokens and CSS need not introduce ambiguity of the sort Rebecca describes. We still have the venerable <encodingDesc>. It would be easy enough to avoid any ambiguity by inserting one of the following in one's documents:

<encodingDesc>
In this document, the <att>rend</att> attribute contains a whitespace-separated list of token(s). Each token refers to a different renditional feature of the element.
</encodingDesc>

-or-

<encodingDesc>
In this document, the <att>rend</att> attribute contains CSS code that describes the renditional features of the element.
</encodingDesc>

But if we can't get past the ideological divide summarized by Martin, I would endorse his proposed compromise.

BODARD Gabriel - 2012-07-02

I prefer Martin's compromise to John's, because while a statement in the encodingDesc is nice and human-readable, it does nothing to help someone who is trying to process a large number of TEI files and is not sure what to do with a mix of (a) space-delimited tokens, (b) CSS and (c) free-text in texts from different provenances.

I don't think I really disagree ideologically with Martin on either of his two axes of opinion. Sure CSS can be a useful way to describe existing text (although I don't do it myself), and we should indeed offer a way to do this. (Although we *can* already do this with @rendition, I'm generally with those who find inline tags easier than pointing to controlled lists in the header.) I don't see a resolution on the other axis, however, because either of the original proposed solutions (changing the datatype or leaving everything as it is) breaks somebody's usage. If we have to decide for one or the other, I think it's only fair that those who are using @rend correctly (by which I mean *as described* currently) continue to be able to do so, and those who have to date been abusing it should have the opportunity to make their XML compliant TEI again by changing to a new attribute.

(If someone came up with evidence that 90% of people abuse this attribute and only 10% use it correctly, I suppose that would be an argument for the new attribute to be the space-delimited tokens datatype, and @rend change to allowing [only?] CSS. It wouldn't make me very happy, but I might not argue very hard.)

In short: +1 to create a new attribute, e.g. @style.

Sebastian Rahtz - 2012-07-08

In a recent conversation on TEI Council, it was said that some people may be doing things like this:

 hello <hi rend="handwritten">everybody</hi>

which seems horribly plausible. If so, I believe this nails the lid on the coffin of the current proposal. Only a new @style attribute can unambiguously deal with the situation. I cannot imagine anyone will countenance a rule which says "if there is a colon in the rend value it must be CSS"?
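
A minimal sketch of why such a rule would be fragile. The token `indent:2` is a hypothetical project-defined magic token invented for this illustration (it does not come from the thread); it is used only to show that a colon alone cannot separate tokens from CSS.

```python
def looks_like_css(value: str) -> bool:
    """The naive rule under discussion: any colon means the value is CSS."""
    return ":" in value


print(looks_like_css("font-weight: bold"))  # intended as CSS
print(looks_like_css("handwritten"))        # a plain token
print(looks_like_css("indent:2"))           # hypothetical token, misread as CSS
```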

BODARD Gabriel - 2012-07-30

Why not make @rendition the attribute which can contain *either* a pointer to a description of CSS in the header, or pure CSS in the attribute? That would seem to me both far less confusing than allowing a choice of tokens and CSS in @rend (a pointer always begins with '#', right?), and more semantically consistent, since rend is then for tokens and rendition for CSS (whether direct or pointed to).

Kevin Hawkins - 2012-07-30

Gabby's suggestion (the previous comment on this ticket) would require that we change the data type of @rendition from data.pointer to something else. Wouldn't this prevent someone from checking the content of @rendition for well-formedness as a URI? I'm not thrilled about this and am therefore more interested in using an attribute that's neither @rend nor @rendition for CSS info (whether it's @html:style or a new @tei:style).

Sebastian Rahtz - 2012-07-30

I am not very keen on a single attribute allowing the two datatypes, especially in this case, where a malformed URL (gtp:/foo.bar) will always be accepted as it would match the alternative.
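
The concern can be sketched as follows. Both checks are simplified assumptions for illustration: `is_pointer` is a deliberately strict, project-level test and not TEI's data.pointer definition, and `is_free_text` stands in for whatever permissive alternative would admit inline CSS.

```python
def is_pointer(value: str) -> bool:
    """Strict illustrative check: a local fragment or an http(s) URL."""
    return value.startswith("#") or value.startswith(("http://", "https://"))


def is_free_text(value: str) -> bool:
    """Permissive alternative: any non-blank string."""
    return value.strip() != ""


def valid_as_union(value: str) -> bool:
    """A value is valid if it matches either alternative."""
    return is_pointer(value) or is_free_text(value)


print(is_pointer("gtp:/foo.bar"))      # rejected by the strict check alone
print(valid_as_union("gtp:/foo.bar"))  # accepted once free text is allowed
```

Once the permissive branch exists, the strict branch no longer rejects anything, so a mistyped pointer passes silently.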

BODARD Gabriel - 2012-07-30

I think I really agree with Kevin and Sebastian and am being a bit Devil's-advocatory here, but wouldn't it be much easier to choose which datatype you're using based on a flag in encodingDesc (as was suggested below for distinguishing between the two types of rend), without the problem that people are going to want to use @rendition-pointer and @rendition-css in the same file?

In any case, if this is unacceptable, then @rend-token/@rend-css ambiguity is even more unacceptable, right?

Sebastian Rahtz - 2012-07-30

A binary switch in the header precludes you from using both inline and header CSS, which seems perverse.

BODARD Gabriel - 2012-07-30

Perhaps, although I had assumed that people want to use inline CSS because the hassle of using the header was too much; if you're going to use the header at all, why not use it always?

But in any case, not as perverse as having to choose between CSS and tokens in @rend, as you pointed out on 2012-07-08. :-)

I think the choice comes down to one between (a) banning the use of CSS in attributes altogether [because it's a lax way of doing what @rendition is designed for], and (b) a new @style attribute. Is anyone other than me and James willing to support (a)?

If not, is anyone opposed to (b)?

Martin Holmes - 2012-07-30

I'm in favour of b), and would not support a).

Martin Holmes - 2012-07-30

To answer Gabby's question "why not use the header", the reality is that in any large document, there are hundreds of idiosyncratic styles that you need to encode, many of which are only used in one location in the document, so creating a separate <rendition> element for each is simply painful, and editing the resulting document becomes more awkward because the style is removed from the locus of its application.

In actual encoding, I would usually expect encoders to describe the style in situ, and then later, when the encoding is complete, I might use XSLT to find instances of duplicated style rulesets, and remove those to the header to reduce the document size, while leaving unique rulesets in situ for easier editing later.
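
A rough sketch of that clean-up pass, written in Python rather than the XSLT mentioned above. It assumes the proposed (at this point hypothetical) @style attribute holds the inline CSS, ignores namespaces, compares rulesets as exact strings after trimming, and invents identifiers of the form r1, r2 and so on; none of these choices come from the thread.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def hoist_duplicate_styles(root):
    """Replace repeated inline styles with @rendition pointers.

    Returns a mapping of generated id -> CSS that would be written to the
    header as <rendition> elements. Styles used only once stay inline.
    """
    styled = [el for el in root.iter() if el.get("style") is not None]
    counts = Counter(el.get("style").strip() for el in styled)
    ids = {}  # CSS string -> generated id
    for el in styled:
        css = el.get("style").strip()
        if counts[css] > 1:
            rid = ids.setdefault(css, "r%d" % (len(ids) + 1))
            del el.attrib["style"]
            el.set("rendition", "#" + rid)
    return {rid: css for css, rid in ids.items()}


doc = ET.fromstring(
    '<p><hi style="font-weight: bold">THE</hi> '
    '<hi style="color: red">one-off</hi> '
    '<hi style="font-weight: bold">GULLIVER</hi></p>'
)
renditions = hoist_duplicate_styles(doc)
print(renditions)
print(ET.tostring(doc, encoding="unicode"))
```

The one-off ruleset stays on its element, which matches the stated aim of keeping unique styles next to the text they describe.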

In HTML, it's common to use both a stylesheet and inline styles together, for convenience, and this is no different.

James Cummings - 2012-08-09

For the record, my preference is for 'a' as people might suspect. 'b' is a compromise that I _might_ be willing to accept but it really makes me feel unwell.

Imagine someone asking on TEI-L what the difference between @rend, @rendition and @proposed:style was?

In each case these are 3 different ways to record the rendition of the original. @rend gives you a magic token way to record a classification that is easily constrained in your customisation for consistency. @rendition gives you a way to point to the header to define this rendition in more detail (in whatever formal language). The @proposed:style allows anything in it from magic tokens to CSS. When should they use one of these over the other? It is fairly clear when to use @rend (a quick approximate inline categorisation) and @rendition (no, we really care about being more precise so are going to measure things and do it properly). It seems less clear, to me, when to use style. If you are going to the effort to care that much about the original rendition to measure it, then recording it in the header is hardly a stumbling block and much better practice that we should encourage. If you would put @style="font-weight:bold;" then I would argue that this is bad practice. Either @rend="bold" is enough or pointing to the header is easy enough since you really care about the degree of boldness (which is something CSS is quite bad at recording!). If people really want a way to add one specific language (CSS) inline, then I think they should be defining their own attribute @my:style and then canonicalising this back to @rendition when they are done.

Another thing that bothers me about this is that a lot of new people will look at this, assume it is like @html:style, and that it is to be used for *output* stylistic information. I hope that we can agree that recording output stylistic information in a TEI file is nonsensical. (Since there may be hundreds of different outputs.) (Note: I don't feel the same about recording multiple potential output possibilities in a processing stylesheet pointed to by <equiv> in the ODD for any particular document instance. That could be done sensibly.)

I'll repeat that if I was quickly encoding documents and wanted inline stylistic information I would use the magic token system the TEI already provides and mark things as 'left1cm' and 'redNumber2', and rationalize these at the end to become <rendition> statements.

This will definitely be discussed at our next face2face meeting, where I'll try to keep an open mind. ;-)

James Cummings - 2012-08-09

labels: 1223994 --> TEI: New or Changed Class

milestone: --> 871207

BODARD Gabriel - 2012-08-09

To clarify, my understanding is that the proposed triumvirate of attributes would be clearly distinct in their definitions:

(a) @rend is for magic token(s) *only* (as currently defined)

(b) @style is for inline css *only* (as html:style)

(c) @rendition is for pointer(s) to defined styles, css or otherwise, in the teiHeader (as currently defined)

My feeling, like James's, is that (b) is not entirely necessary given the existence of (c) for formal CSS declarations plus the fact that I suspect 90% of the CSS usage that people want can be expressed as magic tokens pretty unambiguously ('bold' = 'font-weight:bold').

*But* there is clearly a strong contingent of users who would like to be able validly to use CSS in-line, and if they are not to be persuaded to desist (as it seems they are not), then if the choice is between (ab)using @rend for this purpose or introducing a new attribute in which they can do so cleanly, then I'm for the new attribute. Att.global is already full of attributes that I don't understand and/or for which I have no use, so I'm not terribly bothered by it anyway.

Sebastian Rahtz - 2012-08-09

I am with Gabby. I think James is being a bit too purist (dare I say "old skool TEI") in thinking everyone must use @rendition and have no local overrides. People will abuse ALL the mechanisms for output formatting, but @style would not make that any worse. Some renditions are well expressed in CSS, and are one-off, that's life.

John A. Walsh - 2012-08-09

It seems we have two camps here, the Champions of CSS, or CoCCS (pronounced, "cocks"), and the Defenders of Antiquated Token Systems, or DoATS (rhymes with "boats"). I am, of course, one of the CoCCS.

Like Gabby and Sebastian, I support the compromise @style. I would strongly recommend that it *not* be defined as containing inline css *only*. HTML's @style is (wisely, I think) not restricted to CSS: "This attribute specifies style information for the current element." And HTML has a mechanism for "setting the default style sheet language," e.g., <META http-equiv="Content-Style-Type" content="text/css">. But the style language does not have to be CSS. In practice, it always is, but the HTML spec wisely does not lock the attribute into a single style language. When a new style language comes along, HTML does not have to create yet another style-related attribute (as TEI would be doing with a new @style). Instead, one simply needs to declare an alternative default style language.

We have a perfectly good @rend attribute that could be used similarly and support tokens *or* CSS (certainly not both within the same document). I believe it would be much preferable to define a system to declare the default values of @rend and let people use it as they will, which is what they do now. Council would be improving the current situation tremendously by providing a mechanism to define exactly what is in @rend. Such a mechanism could indicate unambiguously the style language being used in @rend. It could be something like the <taxonomy> element that includes a <bibl> that could include a link to the CSS standard or to a document that defines tokens that are used, e.g., 'bold' = 'font-weight:bold'.

*But* there is clearly a strong contingent of users who would like to restrict @rend values to non-standard tokens, and if they are not to be persuaded to desist (as it seems they are not), then if the choice is between (ab)using @rend for CSS or introducing a new attribute which supports CSS (and potentially/hopefully other style languages), then I'm for the new @style attribute. But please keep @style flexible, and don't lock it into CSS. It should be an inline equivalent for @rendition, which is not restricted to CSS.

As for the need for inline style information, consider this use case: A modernist text composed in collage fashion with letters cut from magazines and newspapers. Every character on the page has a distinct font family, font size, style, etc. The style of each character is unique in the document. It seems entirely reasonable that one would want to use a <c> element for each character and an inline @rend (or if we must, @style) to define the rendition of each character. The @rendition pointing system works great for *classes* of styles that are reused. It doesn't make much sense for unique renditions such as we find in this use case.

Thanks again to Council for their thoughtful consideration of this issue.

John
