
Wrap each wiki page section contents in a container
Open, Low, Public

Description

On a wiki page a "section" is a HTML heading element (H2, H3, etc) followed by text using any type of formatting, up to the next heading element. It generally consists of an "editsection" DIV if they are enabled, followed by a P element containing an anchor presumably for the TOC etc, then the content - which may contain subsections.

Now if each section was wrapped from start to finish in an HTML DIV it would make it much easier to implement such things as section-folding or giving various kinds of sections special colours. To do this now involves parsing the entire "bodyContents" and rebuilding it with such DIVs inserted - a slow and uncertain process.

Even better would be an outer DIV including the H tag and an inner DIV beginning after the H tag.
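A minimal sketch of the outer/inner nesting described above, assuming the page has already been reduced to a flat list of (level, heading text, body text) tuples. The function name `wrap_sections` and the class names `section` and `section-body` are hypothetical; this is not what MediaWiki emits.

```python
from html import escape


def wrap_sections(sections):
    """Wrap a flat list of (level, title, body) tuples in nested DIVs.

    Each section gets an outer DIV that includes the heading and an inner
    DIV that starts after it. A section stays open until a heading of the
    same or a higher level (smaller number) appears.
    """
    out = []
    open_levels = []  # heading levels of the sections currently open

    def close_until(level):
        # Close every open section at this level or deeper.
        while open_levels and open_levels[-1] >= level:
            open_levels.pop()
            out.append("</div></div>")  # inner DIV, then outer DIV

    for level, title, body in sections:
        close_until(level)
        out.append(f'<div class="section section-h{level}">')
        out.append(f"<h{level}>{escape(title)}</h{level}>")
        out.append('<div class="section-body">')
        if body:
            out.append(f"<p>{escape(body)}</p>")
        open_levels.append(level)

    close_until(0)  # close whatever is still open at the end of the page
    return "\n".join(out)
```

With this shape, section folding only has to toggle the inner DIV, and per-section colours can hang off the outer one.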

Another enhancement would be to auto-generate an ID for each section and subsection based on the contents of the owning H tag, perhaps including its parents in the case of subsections.
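One possible way to derive such IDs, sketched under assumptions: headings arrive as (level, title) pairs, parent slugs are joined with a dot, and repeated IDs get a numeric suffix. The names `slugify` and `section_ids` and the separator choices are illustrative only, and the slug rule is ASCII-only for brevity.

```python
import re


def slugify(text):
    """Lower-case the text and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def section_ids(headings):
    """Yield one ID per (level, title) heading, prefixed by its parents' slugs."""
    path = []  # (level, slug) for the currently open ancestors
    seen = {}  # how many times each ID has been produced so far
    for level, title in headings:
        # Drop ancestors that this heading closes (same or deeper level).
        while path and path[-1][0] >= level:
            path.pop()
        path.append((level, slugify(title)))
        base = ".".join(slug for _, slug in path)
        seen[base] = seen.get(base, 0) + 1
        yield base if seen[base] == 1 else f"{base}_{seen[base]}"
```

For a Wiktionary-style entry with "English > Noun" and "French > Noun", this keeps the two "Noun" subsections distinguishable because each carries its language heading as a prefix.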

See also

Details

Reference
bz6104

Event Timeline

bzimport raised the priority of this task from to Low. Nov 21 2014, 9:15 PM
bzimport set Reference to bz6104.
bzimport added a subscriber: Unknown Object (MLST).

This isn't practical with the current system; sections may be split up
across table cells, etc.

I'm reopening just to ask if a simple solution can be implemented in the short
term which does not attempt to surmount the problem Brion mentions above.
Specifically, the English Wiktionary and probably all Wiktionaries shouldn't
cause that problem but could make immediate use of section CSS.

On other wikis such a temporary solution should of course not be enabled.

See also Bug 4741: Semantic HTML for section anchors

ayg wrote:

(In reply to comment #2)

> I'm reopening just to ask if a simple solution can be implemented in the short
> term which does not attempt to surmount the problem Brion mentions above.
> Specifically, the English Wiktionary and probably all Wiktionaries shouldn't
> cause that problem but could make immediate use of section CSS.

Some kind of check would still be needed to make sure this is possible.
doHeadings() just uses a bunch of regex passes; it's not aware of contextual
stuff like whether there are table cells or whatnot nearby. Then again, maybe
Tidy and/or Sanitizer would be clever enough to fix any resulting screwups
acceptably. I suppose that would depend upon the exact styles and so on that
people would try to give it. If we had a *proper* parser, of course, we could
presumably use a <tbody> instead of a div inside tables and properly nest it so
as to maintain validity, but that's not happening soon.

I have a proof of concept of this running at http://wiktionarydev.leuksman.com
(Hit random - it's not on the main page)

So far it's modified core code though it only touches one function in
Parser.php. It doesn't seem to be compatible with the normal TOC so I've disabled it.

ayg wrote:

Copying a relevant post I was going to make to wikitech-l but decided not to because it was off-topic:

Done naively, this breaks XHTML validity if the header is wrapped in any tag at all, and it seems very difficult to fix it for perfectly reasonable, nontrivial, legal cases like

This is the Declaration of Independence.
<div class="cited-document">
== Section 1 ==
We don't like the British.
== Section 2 ==
Therefore we're not going to be your colony anymore.
</div>
This is a very compelling and historical document.

Observe that there, the trailing text is not intended to be part of Section 2 even though MediaWiki would consider it as such at present. Possibly you could construct some algorithm that would figure this out, but it's not particularly easy to do, especially if the tag structure is not as reasonable as this (use your imagination!). Are we going to start rewriting the document structure when an algorithm doesn't think it makes any sense, even if it's valid XHTML? Further issues arise with tables, where <div> wrappers are illegal and you have to hope that you can fit a <tbody> around what you want.
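To make the nesting problem concrete, here is a small sketch of the naive approach applied to that example. It assumes the two section titles were level-2 headings (`== Section 1 ==`), and it uses a made-up `<section-div>` pseudo-tag purely so the wrapper can be told apart from the author's own `<div>`; in real output both would be plain `<div>` and the author's `</div>` would silently close the Section 2 wrapper instead.

```python
import re

HEADING = re.compile(r"^==\s*(.+?)\s*==\s*$")
TAG = re.compile(r"<(/?)([a-z-]+)\b[^>]*>")

EXAMPLE = """This is the Declaration of Independence.
<div class="cited-document">
== Section 1 ==
We don't like the British.
== Section 2 ==
Therefore we're not going to be your colony anymore.
</div>
This is a very compelling and historical document."""


def naive_wrap(wikitext):
    """Open a wrapper at every heading line, closing the previous one first."""
    out, in_section = [], False
    for line in wikitext.splitlines():
        if HEADING.match(line):
            if in_section:
                out.append("</section-div>")
            out.append("<section-div>")
            in_section = True
        out.append(line)
    if in_section:
        out.append("</section-div>")
    return out


def first_mismatch(lines):
    """Return the first closing tag that does not match the innermost open tag."""
    stack = []
    for line in lines:
        for closing, name in TAG.findall(line):
            if not closing:
                stack.append(name)
            elif not stack or stack.pop() != name:
                return f"</{name}>"
    return None
```

Calling `first_mismatch(naive_wrap(EXAMPLE))` reports the author's `</div>`, because at that point the innermost open element is the wrapper for Section 2.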

I think section wrappers *could* be extremely useful, but when you get down to it, their utility is limited even in the abstract by the fact that not all text is required to be part of any section, except in the technical sense. It would take some fairly drastic overhauling of how we look at and deal with sections for section wrappers to be practicable.

One approach to solving cases like that would be to simply parse each section independently of all the others, and run Tidy and so on on each section separately. That would make scenarios like the above impossible. This would be totally unacceptable for Wikipedia, but if what you say is true, it might be reasonable for the main namespace of Wiktionary. There might be a way of marking a section as not needing a section div for some reason, too (cf. bug 6575).
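A rough sketch of the "parse each section independently" idea. `balance_divs` is a crude, hypothetical stand-in for running Tidy on a single section (it only handles `<div>`), and all function names and the `section` class are assumptions for illustration. Note that in this simplified version the lead text before the first heading is wrapped too.

```python
import re

HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
DIV = re.compile(r"<(/?)div\b[^>]*>")


def split_sections(wikitext):
    """Split wikitext into chunks, starting a new chunk at every heading line."""
    chunks = [[]]
    for line in wikitext.splitlines():
        if HEADING.match(line) and chunks[-1]:
            chunks.append([])
        chunks[-1].append(line)
    return ["\n".join(chunk) for chunk in chunks if chunk]


def balance_divs(chunk):
    """Drop stray </div> tags and close any <div> left open in this chunk."""
    depth = 0

    def fix(match):
        nonlocal depth
        if not match.group(1):  # opening <div ...>
            depth += 1
            return match.group(0)
        if depth == 0:          # stray </div> with nothing open: drop it
            return ""
        depth -= 1
        return match.group(0)

    fixed = DIV.sub(fix, chunk)
    return fixed + "</div>" * depth


def render_independently(wikitext):
    """Wrap each independently balanced section in its own DIV."""
    return [
        f'<div class="section">{balance_divs(chunk)}</div>'
        for chunk in split_sections(wikitext)
    ]
```

Run on the Declaration example, every wrapper comes out balanced, but the author's `cited-document` div is closed at the end of the lead chunk and no longer spans both sections. That loss of author intent is the reason this would be unacceptable on wikis that rely on such markup.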

michael wrote:

That's a good example of the kind of problem that can make this a sticky issue to resolve.

But it also shows why it ought to be resolved and why it blocks bug 10467 (Use semantic XHTML). With the current wikitext parser, the example code infers an incorrect semantic interpretation for the document. The HTML specification says "A heading element briefly describes the topic of the section it introduces", so "Section 2" is a heading introducing both the second part of the constitution and the article copy after it. The author clearly did not intend this.

In (X)HTML 4, every heading implies a section which ends at the next heading of the same or higher level. This bug proposes making that exact hierarchy explicit. If we accept this, then I think there is a relatively simple solution.

The multi-section div element entered in wikitext explicitly creates a new section within the surrounding text (i.e. one level lower than the previous section heading). Any section headings within that section imply enclosed sections, so they should be bumped down a further level in the hierarchy, and the last one closed before the closing /div tag. Following sections should resume the normal flow until the end of the document.

So the sample wikitext above implies the following structure, which ought to be rendered in the page's XHTML. I've assumed the original had preceding and following sections, to show what could happen (they are unaffected).

== Preceding section ==
This is the Declaration of Independence

  === Editor-entered div/section === <div class="cited-document">

    ==== Section 1 ====
    We don't like the British.
    </div><!-- Section 1 ends -->

    ==== Section 2 ====
    Therefore we're not going to be your colony anymore.
    </div><!-- Section 2 ends: implied closure made explicit by the renderer -->

  </div><!-- editor-entered div/section closure  -->

This is a very compelling and historical document.
</div><!-- Preceding section ends -->

== Following section ==
American Revolutionary War follows.
</div><!-- Following section ends -->

Unfortunately, it is impossible to duplicate this structure explicitly in wikitext only, since there is no way to end a section before the next equal or higher section (as happens at the end of Section 2 here).
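The rule proposed above (an editor-entered div opens a new nesting level, and any sections still open inside it are closed before its `</div>`) can be sketched as a small stack algorithm. This assumes the wikitext has already been tokenised; the token shapes, the `<section>` pseudo-tags and the function name are all hypothetical, and an unmatched closing div is simply ignored.

```python
def explicit_sections(tokens):
    """Return indented lines with every section opened and closed explicitly.

    Tokens are ("h", level, title), ("text", s), ("div_open",) or ("div_close",).
    """
    out = []
    stack = []  # open items: a heading level (int) or the marker "div"

    def emit(line):
        out.append("  " * len(stack) + line)

    def close_sections(min_level):
        # Close open sections of this level or deeper, but never cross
        # the boundary of an editor-entered div.
        while stack and stack[-1] != "div" and stack[-1] >= min_level:
            stack.pop()
            emit("</section>")

    for token in tokens:
        kind = token[0]
        if kind == "h":
            _, level, title = token
            close_sections(level)
            emit(f"<section> {title}")
            stack.append(level)
        elif kind == "text":
            emit(token[1])
        elif kind == "div_open":
            emit("<div>")
            stack.append("div")
        elif kind == "div_close" and "div" in stack:
            close_sections(0)  # implied closures made explicit
            stack.pop()        # remove the "div" marker itself
            emit("</div>")
        # an unmatched div_close falls through and is ignored

    while stack:
        closing = "</div>" if stack.pop() == "div" else "</section>"
        emit(closing)
    return out
```

Fed the sample above, the trailing sentence ends up one level inside "Preceding section", after the closing `</div>`, rather than inside "Section 2".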

Questions:

  • Do the automatically-generated sections get a heading or not? If so, how is the text generated?
  • Can this be logically extended to cover nested divs? Or should the div hierarchy remain flat, with following div tags automatically closing previous ones?
  • What happens if div tags are not balanced? Can authors enter only a closing </div> tag to end a subordinate section?

michael wrote:

Another option: such a div element could be considered mis-nested, and ignored by the wikitext renderer.

ayg wrote:

(In reply to comment #7)

> In (X)HTML 4, every heading implies a section which ends at the next heading of
> the same or higher level.

Not really. My above example is reason enough to discard that. Even if you add a heading for the whole Declaration, HTML provides no way to indicate that the ended <div> terminates the section. It only says that user agents should be able to construct a table of contents automatically, which they can, and in fact MediaWiki does exactly that. To use another counterexample, the final heading tag in the source of http://www.w3.org/ is the one entitled "Systems", yet it precedes the completely unrelated footer, which has no heading tag.

(Incidentally, the last draft of XHTML 2.0 that I looked at had some kind of tag to explicitly delimit sections, <section> or something.)

> The multi-section div element entered in wikitext explicitly creates a new
> section within the surrounding text (i.e. one level lower than the previous
> section heading).

Sure, but what about this template-generated table?

== Widget sales for 2006 ==
<a href="...">edit this template</a>
Month | Number

This has the same form as the div example, but its semantics are different. That is, the heading is cordoned off from the section by a parent element, but it does *not* logically cover only its following siblings (in this case only the <a> element), it covers the entire table, which includes cousin nodes and even parents. How do you plan to automatically differentiate these cases? You'd need explicit, user-entered section delimiters for this to work reliably.

I've got a basic version of this working in JavaScript here: http://en.wiktionary.org/wiki/User:Hippietrail/addstructure.js

It is designed for and tested only on the English Wiktionary so far but is not installed there for all users.

It may however be of interest to anybody following this feature request.

michael wrote:

See also Bug 16190: Relate section anchors to section headings in HTML, describing an alternative which may be simpler to implement and provides some benefits. Bug 4741: Use id's for section anchors instead of <a name=...> is similar to this one.

michael wrote:

If HTML5 were to be supported, then a better solution would be to use a <section> element.

See also bug 23932 - “Enable, whitelist, and incorporate semantic HTML5 elements: article, aside, figcaption, figure, footer, header, hgroup, mark, nav, section, time.”

  • Bug 61615 has been marked as a duplicate of this bug.
  • Bug 70198 has been marked as a duplicate of this bug.

Since we do some stuff in this area with mobile and Parsoid/VE these days, I wonder: what if we do this only for H2s? Is there any way we can measure how many pages we would break?

Parsoid has done metrics on similar problems, right? Perhaps through that route we could explore it?

I do know that this:
<div class="cited-document">
== Section 1 ==
We don't like the British.
== Section 2 ==
Therefore we're not going to be your colony anymore.
</div>

is often used on user pages, so those would likely all break.

Have it activate for h1s and h2s unless they're embedded in something (another div that doesn't span the entire page, a table, etc), perhaps?

So each h1 and following content would get its own div, which would include the divs for h2s and their following content.

If you have an in-wikitext <div> around two h2s and content, this could either just ignore those, or put the first h2 div around both and just ignore the second h2... or perhaps put both h2+content divs inside the parent div.
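The "unless they're embedded in something" test could be approximated along these lines; run over a dump, the same check would also give a rough count of affected pages, which was asked about earlier in the thread. This is a line-based heuristic under stated assumptions: it only tracks `<div>`, `<table>` and wikitext `{| ... |}` tables, it ignores templates entirely, and the names `embedded_headings` and `count_affected` are made up for the example.

```python
import re

HEADING = re.compile(r"^(=+)\s*(.*?)\s*(=+)$")
OPEN = re.compile(r"<(?:div|table)\b[^>]*>|^\{\|")
CLOSE = re.compile(r"</(?:div|table)\s*>|^\|\}")


def embedded_headings(wikitext, max_level=2):
    """Return titles of h1/h2 headings that sit inside an open div or table."""
    depth, found = 0, []
    for line in wikitext.splitlines():
        match = HEADING.match(line.strip())
        if match and len(match.group(1)) == len(match.group(3)):
            if len(match.group(1)) <= max_level and depth > 0:
                found.append(match.group(2))
            continue
        # Track how many containers are open after this line.
        depth += len(OPEN.findall(line)) - len(CLOSE.findall(line))
        depth = max(depth, 0)
    return found


def count_affected(pages):
    """Count pages (a dict of title -> wikitext) with any embedded heading."""
    return sum(1 for text in pages.values() if embedded_headings(text))
```

A wrapper could then be emitted only for headings that this check does not flag, leaving embedded ones untouched.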

Whatever the solution, this would be very useful or even needed on several projects. wikiHow comes to mind, considering how all the content on a howto is broken up into sections in just such a way.

Eight years ahead of my time, apparently (-;
Glad to see others finally noticing some need for this!

He7d3r set Security to None.

@ssastry Possibly off topic: I know doing this in the PHP parser is trickier, but can we imagine allowing skins to opt into this with the PHP parser in the near future? The Minerva desktop skin is suffering from various bugs due to a lack of section wrapping (https://phabricator.wikimedia.org/project/view/2859/), which it loses along with the MobileFormatter, since that only applies in mobile mode.

I think the Parser has a separate cache for skins.. but I may be wrong.

> Eight years ahead of my time, apparently (-;
> Glad to see others finally noticing some need for this!

Over ten years, now! But we're getting really close to actually deploying this: https://gerrit.wikimedia.org/r/364933

This is implemented in Parsoid, but is waiting for T55784: [EPIC] Use Parsoid HTML for all page views to be widely visible to readers.