OddThinking

A blog for odd things and odd thoughts.

Marking Up Sections and Headings

In a typical document, there is a hierarchy of sections. Each section contains a heading, which is displayed more prominently than the section text.

In this article, I describe three separate issues with the handling of sections and headings in modern typesetting tools. I propose a dull but important change to markup languages to deal with these issues. This proposal is not particularly innovative, but just better engineered.

Then I mention a new problem that I only just noticed, and propose an innovative new feature that would be way cool.

Issue 1: Section Motility

During the editing of a document, the section hierarchy can be quite fluid. Sections can be moved around the document, or promoted or demoted in place.

Microsoft Word makes moving these sections relatively easy (especially if you configure a custom toolbar to always display the “Promote”, “Demote” and “Demote To Body Text” icons – they normally only appear in Outline mode, but are universally useful.)

However, the common markup languages do not make this easy at all: HTML, LaTeX and Markdown all require you to visit every heading in the text being moved, and correct the heading level.

For example, if I decide to demote a large section of text in HTML, I would need to change all the <H1> tags to <H2>, but only after first changing all the <H2> tags to <H3>, and so on. In LaTeX, I would need to change all the \section{} commands to \subsection{}, etc.

Compare that to bulleted lists, where it is possible to move list items around without renumbering everything. The browser can work out the appropriate bullet level from context.
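
To make the chore concrete, here is a minimal sketch in Python of what "demote this block" means for HTML headings. The function name `demote_headings` is my own invention, and a regular expression is a blunt instrument for real HTML. The point is only that every heading tag in the moved text has to be rewritten, and that doing it in one pass sidesteps the "change the <H2> tags first" ordering trap.

```python
import re

# Matches opening and closing heading tags such as <h2>, <H2 class="x"> or </h2>.
HEADING_TAG = re.compile(r"<(/?)[hH]([1-6])\b([^>]*)>")


def demote_headings(html: str, by: int = 1) -> str:
    """Push every heading in `html` down by `by` levels in a single pass."""

    def shift(match: re.Match) -> str:
        slash, level, rest = match.group(1), int(match.group(2)), match.group(3)
        new_level = level + by
        if new_level > 6:
            # HTML has nothing below <h6>, so there is nowhere left to go.
            raise ValueError(f"<h{level}> cannot be demoted by {by} level(s)")
        # Tag names are written back in lower case.
        return f"<{slash}h{new_level}{rest}>"

    return HEADING_TAG.sub(shift, html)
```

Calling `demote_headings("<h1>Animals</h1><h2>Mammals</h2>")` rewrites both headings, and anything already at the bottom of the ladder raises an error rather than being silently flattened.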

Issue 2: Arbitrary Limits

I once wasted several days’ development time in producing code that marked up a stack-dump in LaTeX tables. It worked fine in the unit tests, but the first time that I tried a “substantial” program (for small values of substantial: it calculated the oddness of 2), LaTeX complained that I had blown its limit of sub-sub-sub-tables, and suggested that I modify its source code to increase the arbitrary limit and recompile it. I declined, and re-wrote the stack dump into raw PostScript. Having arbitrary limits in your software should be avoided; it just means you annoy people at arbitrary times.

I had a problem with tables, but the same applies to sections too: In HTML and LaTeX, there are arbitrary limits to the depth of the section hierarchy. HTML has <H1> to <H6>. LaTeX runs out of puff after subsubsections. Both of these are perfectly sufficient for 99% of documents, but why introduce any limit at all?

Issue 3: Body Text/Heading Associations

One method to emphasise the hierarchical nature of a document is to modify the style of the text that appears in a section. Here are some examples: deeply nested sections might be indented further to the right; the first word of a chapter might use drop-caps.

This is especially important if you want to be able to add more text to the parent section beneath the text of the sub-section.

None of Word, HTML, LaTeX and Markdown offers the ability to do this easily.

A Dull But Important Proposal

I would propose a markup like so:

 <SUBSECTION TITLE="Vertebrates">
    <SUBSECTION TITLE="Mammals">
        Mammals are warm-blooded and suckle their young. 
        They are often cute and cuddly.
    </SUBSECTION>
    <SUBSECTION TITLE="Reptiles">
        Reptiles are slimy and icky. Ewww! Don’t cuddle them.
    </SUBSECTION>
    There are other types of vertebrates, too, 
    but I don’t want to mention them here.
 </SUBSECTION>

This is a better model than the existing systems, because it fixes all the above issues:

  • Sections have greater modifiability and motility. This entire section could be placed under a section called “Animals”. Then that section could be put under a section called “Essay On Animals”. Then that section could be put into a chapter on “Selected Essays”. None of these would require any modification to the markup of the original text.

  • There are no arbitrary limitations to the depth.

  • Text can be associated with the appropriate section, and formatted appropriately. Text like “There are other types of vertebrates[…]” can be formatted distinctly from “Reptiles are slimy[…]” to make it clear that you have popped up a level.

XML-lawyers may require me to move the TITLE field out to a separate <TITLE> tag because of the sub-tags that are permitted in the heading. I can live with that.
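
As a rough illustration of how a processor could consume this markup, here is a sketch in Python that walks the nested SUBSECTION elements and emits flat headings, taking the level from the nesting depth alone. It assumes the markup is well-formed XML, and the names `flatten` and `SAMPLE` are mine, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from html import escape

# A trimmed-down copy of the example markup from this article.
SAMPLE = """
<SUBSECTION TITLE="Vertebrates">
    <SUBSECTION TITLE="Mammals">Mammals are warm-blooded.</SUBSECTION>
    <SUBSECTION TITLE="Reptiles">Reptiles are slimy.</SUBSECTION>
    There are other types of vertebrates, too.
</SUBSECTION>
"""


def flatten(section: ET.Element, depth: int = 1) -> list[str]:
    """Turn nested SUBSECTION elements into flat heading and paragraph lines."""
    lines = [f"<h{depth}>{escape(section.get('TITLE', ''))}</h{depth}>"]
    if section.text and section.text.strip():
        lines.append(f"<p>{escape(' '.join(section.text.split()))}</p>")
    for child in section.findall("SUBSECTION"):
        lines.extend(flatten(child, depth + 1))
        # Text after a child's closing tag belongs to the parent section.
        if child.tail and child.tail.strip():
            lines.append(f"<p>{escape(' '.join(child.tail.split()))}</p>")
    return lines


doc = ET.fromstring(SAMPLE)
top_level = flatten(doc)          # headings come out as h1 and h2
nested = flatten(doc, depth=2)    # same markup, now h2 and h3
```

The same unmodified markup renders one level deeper simply by starting at a different depth, which is the motility argument in miniature. The sketch still inherits the target format's limits: nest deeply enough and it will happily emit an `<h7>` that HTML does not have.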

Issue 4: Heading 1 is too big.

Like most users, I don’t want a new style-sheet for every document I write or every article on this blog. I define the rules once, and then use them over and over again.

The hierarchy is defined once, for the largest possible document, and then re-used. So the top-level heading is huge – it is suitable for chapter headings, and (in printed documents) will often start a new page. The second-level heading is very large, to represent subchapters. The third-level heading is a more comfortable size. The fourth-level heading is generally not much more than bolding.

When I write a small document, this means the top-level heading dominates the page. I tend to skip straight to the second or third heading. For example, the sections of this article start at <H3>.

Does that sound like the wrong semantics to you? It does to me.

An Innovative but Important Proposal

Rather than me lying to the computer to ensure that the formatting matches the importance, the importance should be inferred from the markup that I have done.

Somewhere in the typesetting (whether it is the web-browser, WordPress, Word, whatever), the typesetter should count the depth of the hierarchy, and build up from there.

If I only have one level of sections, then the top-level heading should merely be bolded.

If I have two levels of sections, then the top-level heading should be in 16-point. The second-level heading should merely be bolded.

If I have five levels of sections, then the top-level heading should start a new page, and be in 28-point.

I think this is an important improvement to the notion of stylesheets. Is it important enough for me to write a demonstration prototype plugin for WordPress? That seems rather unlikely.
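
For what it is worth, the counting itself is not the hard part. Here is a minimal sketch in Python, assuming the nested SUBSECTION markup proposed earlier. The names `LADDER`, `max_depth` and `heading_styles` are hypothetical, and only the first, second and last rungs of the ladder come from the examples above; the two in the middle are placeholders I made up.

```python
import xml.etree.ElementTree as ET

# Styles from least to most prominent. "bold", "16pt" and the final entry
# follow the examples in the text; "20pt" and "24pt" are invented filler.
LADDER = ["bold", "16pt", "20pt", "24pt", "28pt, new page"]


def max_depth(section: ET.Element) -> int:
    """Number of section levels at or below `section`."""
    children = section.findall("SUBSECTION")
    return 1 + max((max_depth(child) for child in children), default=0)


def heading_styles(levels: int) -> dict[int, str]:
    """Map heading level (1 = top) to a style, building up from the deepest level."""
    if not 1 <= levels <= len(LADDER):
        raise ValueError(f"this sketch only has styles for 1 to {len(LADDER)} levels")
    return {level: LADDER[levels - level] for level in range(1, levels + 1)}
```

A typesetter would call `heading_styles(max_depth(document))` once per document: the deepest level always gets plain bold, and the top level only grows as large as the depth demands. A fixed ladder is of course an arbitrary limit of its own, so a real implementation would want a scaling rule rather than a list.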


Comments

  1. Your dull-but-important proposal was implemented in DocBook. DocBook originally had <sect1> to <sect5> elements, and they introduced the <section> element for much the same reasons as you cite in issue 1. It becomes much more of a problem when you have the document hierarchy split across multiple files, multiple authors, etc.

    Issue 3 sounds like a presentation issue, unlike the structural markup issues 1 & 2. Also is it not the case that CSS solves this?

    I agree that issue 4 is a problem (again a presentation one) that needs to be solved. I guess the issue can be generalised to the scalability of a stylesheet – the ability to apply appropriate formatting for short documents and to long ones.

    Let me see if I understand your innovative-but-important proposal. At the moment on girtby.net I just create an h4, knowing that this is sufficiently deep in the hierarchy to not conflict with the styles of the rest of the blog. It would be better to just say “start a new subsection here” without worrying about where you are in the hierarchy. I’m not sure how this could work as a plugin, but I certainly support your endeavour.

  2. Your dull-but-important proposal was implemented in DocBook.

    Excellent! Now, all I need is a DocBook-markup-language-to-WordPress plugin!

    Issue 3 sounds like a presentation issue, unlike the structural markup issues 1 & 2. Also is it not the case that CSS solves this?

    None of Word, HTML or LaTeX offers such a concept (Heading + Section, tied together) by default. It would be necessary to mark the respective text with a home-made style or SPAN tag. (So, yes, it is solvable by CSS, but not with the out-of-the-box tags that most documents use.)

    At the moment on girtby.net I just create an h4, knowing that this is sufficiently deep in the hierarchy to not conflict with the styles of the rest of the blog.

    There is a confounding issue here that may be confusing you. Let me clear it up, and see if it helps.

    Your WordPress template provides a framework for your article. It reserves H1 for the title of your blog, and H2 for the title of your article. So the biggest heading tag you should use in your article is probably H3, so it doesn’t conflict with the rest of your blog template. This isn’t the issue I am talking about.

    If you are like me, you might skip H3, and go straight to H4 for small articles, so the section headers don’t look so large as to be out of place. In this blog, I often use the section headers to help me keep my arguments linear (and hence intelligible), rather than to help people find the section that they are looking for in a large document, so I often make them relatively small, and I am tempted to make them even smaller. It is this issue I am talking about.

    (Just to confound items further, the custom somethinkodd template is different to the girtby.net one. I only use H1, not H2, in the template, but I am not proud of that fact – it means some of the semantic structure is hidden to automated tools. I hope to fix it one day. So for me, the largest legal tag is H2, and I often jump to H3.)

    STOP PRESS: Since writing this comment, I have changed the stylesheet to correct it. Now all posts should start at H3, so this article is more confusing.

    I was thinking of a WordPress plugin filter that took the whole article, with the new markup, counted the depth, subtracted that from 7 (to get a value between 2 and 6), and instrumented my text with the corresponding H2..H6 tag and custom SPAN tag.

  3. Note though that your proposal makes it somewhat harder to use stylesheets, and particularly makes it quite a bit harder to use stylesheets accurately for particular components in aggregate documents.

    I suppose you could wrap the whole subdocument in a <div id="foo"> and then use that ID to anchor your stylesheet… hmm.

  4. Aristotle,

    Could you please elaborate? Which proposal are you warning me about?

    If the dull-but-important proposal was implemented in out-of-the-box HTML (with a subtag for TITLE, in accordance with the XML-lawyers demands), then your stylesheet might look like this:

     section { /* regular text, with Drop Caps */ }
     section section { /* lightly indented, no Drop Caps */ }
     section section section { /* indented a bit more */ }
     section title { /* H1 equivalent */ }
     section section title { /* H2 equivalent */ }
     section section section title { /* H3 equivalent */ }
    

    If the innovative-but-important proposal was implemented in out-of-the-box HTML, without the first one, then negative numbers come into effect! Your stylesheet might look like this:

     H0:  {/* bold */}
     H-1: {/* bold, large */}
     H-2: {/* bold, small-caps, large */}
     H-3: {/* huge, blue */}
    

    I see now that if both the first and second were implemented in out-of-the-box HTML, it would be trickier. Sections and Headings need to be separated, because they scale in opposite directions.
    Perhaps something like:

     section .level1 { /* regular text, with Drop Caps */ }
     section .level2 { /* lightly indented, no Drop Caps */ }
     section .level3 { /* indented a bit more */ }
    
     section title .level0   { /* bold */ }
     section title .level-1  { /* bigger */ }
     section title .level-2  { /* You get the idea by now */ }
    

    I don’t feel entirely comfortable with that one yet.

    If you are not implementing these with out-of-the-box HTML, but are using a plug-in for WordPress, then the generated code will be full of <div class="foo"> tags.

  5. A DocBook plugin probably wouldn’t be that hard. PHP already has an XSLT engine, and the XSL stylesheets for DocBook are available. The trick, of course, is getting a usable editing environment…

    I think Aristotle is talking about the recommendation in the HTML spec to use <div> elements to associate headings with the sections they refer to.

  6. That HTML recommendation is a nice find, Alastair.

    While I wait for the HTML design committee to invite me to present to them, I would have to implement the first proposal within the existing HTML framework. Using their example, I would search the HTML and replace every instance of subsection with section. I would search the CSS and replace every instance of DIV.subsection with DIV.section DIV.section.

    That’s a start, but it still leaves H1 and H2 stuck there (hard-coding the structure with magic numbers).

    I note with wry amusement that:

    Some people consider skipping heading levels to be bad practice. They accept H1 H2 H1 while they do not accept H1 H3 H1 since the heading level H2 is skipped.

    I like to think that these people would support my proposals, even as they discredit my current practice.

  7. I think the only real reason they recommend still using Hn is because it supports older browsers. In addition, you could use relative scaling to set everything up correctly. So it’d be something like this:

    <div class="2levels">
    <section> <title> Level 1 </title>
    <section> <title> Level 2 </title>
    </section>
    </section>
    </div>

    Sure, this means the formatter doesn’t figure it out, but it’s still not so bad. Ideally, you’d have separate stylesheets for nlevels anyway, because the semantics of what you’re typing are likely to be different.

  8. Is this close to what you’re talking about?

  9. [Note to self: I should have posted these proposals as separate articles. This is getting confusing.]

    Sunny,

    Thanks for the interesting comments.

    First, you describe a compromise for the second (innovative) proposal. You suggest a hand-coded tag per document to select the appropriate part of the style sheet. I guess I could live with that compromise for the short-term. It is only a small step to modify “2Levels” to “Article”, “Book”, “Memo” – or perhaps “Long Article” versus “Short Article”.

    Then you offer an implementation for the first (dull) proposal. (I have created a mirror for technical reasons related to the validator. I hope you don’t mind.)

    Yes, that is close… but not quite. You have produced an example of what the code should look like, but that code is not valid. It looks good in Firefox but both IE and Lynx, quite validly, ignore your invented tags. It also doesn’t let automated tools extract an outline (see bottom of the validator link for an example.)

    So the reason why they still recommend using Hn, is because that is what the standard says!

    There are a couple of solutions to this problem:

    • Elect me President of the Internet. XHTML 2.0 standard will then contain these tags within 3 months, or heads will roll.

    • Map the markup language you have demonstrated to the lower-level standards-compliant HTML. The implementation choices include defining a DTD plus a (preferably server-side) XSLT transformation, creating a WordPress plugin, or building a blog system based on DocBook.

  10. Julian, writing my thesis in LaTeX made me agree very strongly with your dull-but-important proposal to offer true nested sections. In practice, I sometimes got around this by using well-delineated formatting types such as lists, which I could exit from while continuing the section. I support a structural markup of sections, but can’t work out how it should relate to DIVs.

    It seems you’ve already noticed that negative indexing of heading styles gets you the upwards-scalability that you want for your innovative-and-important proposal. Aristotle’s issue is non-trivial, though: the problem is that the inclusion of some more text (potentially beyond your control) could radically change the layout of the whole page. For instance, if I had the ability to use sections and headings within my comment, I could add three more sections and then all of the headings in your blog article would become much “larger”.

    In terms of syntax, I think that it is a better idea to separate sections from their titles. LaTeX embeds titles in sections. HTML does two things: the page title gets a TITLE tag, and headings take their embedded span as the title. I think that the TITLE approach is nicer, so you would get:

    <section><title>Vertebrates</title>…

    The CSS would be nice, TITLE could have its own attributes beyond the text, and so on.

  11. Casey writes:

    For instance, if I had the ability to use sections and headings within my comment, I could add three more sections and then all of the headings in your blog article would become much “larger”.

    You say that as though it is a bad thing!

    There’s the issue of malicious commenters, but that exists now, and the solution is that you can’t post Hn tags in a comment.

    On the other hand, if the additional levels were genuine – if the document did become deeper – then it makes sense that the headings have to grow in size to maintain the hierarchy.

    If the document has many levels and I want them to be distinguishable, then the top-level naturally has to be pretty large. If the document doesn’t have many levels, I don’t need the top-level to be so awkwardly large.

    I don’t need to be bothered with this level of typesetting; the computer can work it out. Yes, it is beyond my control, but I am happy to abdicate that control to the typesetter.

    <section><title>Vertebrates</title>…

    I think you, Sunny and I have all converged towards this TITLE subtag solution over the TITLE="Vertebrates" field of the SECTION tag. Sunny used H (for Heading) instead of TITLE. Either way is fine by me.

  12. I used ‘h’ because “title” is already used in HTML. While my document doesn’t make for valid XHTML, I’d like to see if a simple change of the doctype to XML would do the trick…

  13. In the interests of completeness it should be noted that reST also suffers from issue 1 (although the “simplified pseudo-XML” they used to illustrate the effect of the markup doesn’t).

  14. Oops, strike that last comment, replace with:

    reST takes an interesting approach. The level of a given section is inferred from the style of underlining used for the heading. Each newly-encountered underlining style is inferred to be a newly-defined level in the hierarchy. Interesting, and addresses issue 1 at least partially … ?

  15. I’ve just discovered that XHTML 2 will implement the dull-but-important proposal. Which is great news.

  16. Alastair,

    This is great news. The author, Steve Pemberton, even identifies the advantages as including:

    easier to cut and paste and keep your heading levels consistent

    which is the same as my Issue 1, and

    you are no longer restricted to 6 levels of header.

    which is the same as my issue 2.

    It seems I need not run for Internet President quite yet.

