Page MenuHomePhabricator

Don't parse templates recursively
Closed, ResolvedPublic

Description

When previewing an edit of an article that contains template parameters inside
an article link in it ([[{{text}}]] or [[{{disambig}}]] doesn't matter), it
causes the following error:
Fatal error: Call to a member function on a non-object in
/home/wikipedia/htdocs/test/w/includes/OutputPage.php on line 814


Version: unspecified
Severity: major

Details

Reference
bz60

Event Timeline

bzimport raised the priority of this task from to High.Nov 21 2014, 6:47 PM
bzimport set Reference to bz60.
bzimport added a subscriber: Unknown Object (MLST).

jeluf wrote:

[[{{text}}]] only crashes if Template:text does not exist.
What happens:

{{text}} is checked for existence. since it does not exist, it's replaced by an
edit link to Template:text, which is parsed into <!--LINK Template:text bla bla
bla-->. Than, [[<!--LINK Template:text bla bla bla-->]] is parsed into another
link, and <!-- is not valid inside of titles.

OutputPage.php 1.165 and Skin.php 1.255 contain code that prevents the fatal
error. The generated output is not nice, though: It contains the bla bla bla
part of the above mentioned link.

(In reply to comment #1)

[[{{text}}]] only crashes if Template:text does not exist.

That's not true, because [[{{disambig}}]] causes the same error, and
{{disambig}} exists.

OutputPage.php 1.165 and Skin.php 1.255 contain code that prevents the fatal
error. The generated output is not nice, though: It contains the bla bla bla
part of the above mentioned link.

Are these only in CVS or are they on http://test.wikipedia.org/ too?

test.wikipedia.org is updated from time to time using the cvs code

The fix for bug 267 (removing { and } from legal title chars again) has some
effect on this, but it's not totally fixed. The [[{{text}}]] case still produces
awful broken HTML. Will investigate...

The problem is that when replaceInternalLinks() is operating, the text it's
looking at has had templates stripped. The replacement text is something like
"NaodW29-item21773b7b0640b5", which is perfectly legal in a title so the link is
generated.

Then, the text gets unstripped and the instances of "NaodW29-item21773b7b0640b5"
in the <a href> tag (1.3) or <!-- LINK --> placeholder (1.4) get replaced with
the template text... which may be itself a link or link placeholder, screwing
everything up.

A simple "fix" is to use a strip placeholder that's not legal in links, however
this has a couple problems. Namely, it means you can't use a template or
template parameter as the source of a link; this is problematic anyway, but
people try it and it might be better to make it work properly than to take it out.

Also, there are other places where text may be replicated or moved into HTML
tags where the unstripping causes breakage. A more formal solution that covers
all might be good.

  • Bug 464 has been marked as a duplicate of this bug. ***

wmahan_04 wrote:

Don't parse templates recursively

I was looking at what Brion described, and here are my thoughts:

This looks like a fundamental result of the way templates are stripped
and replaced by placeholders. Stripping things like <math>, <nowiki>,
and <pre> make sense because they are logically separate from the
surrounding text; for example, a link would never begin inside a
<math>...</math> pair and end outside. People do want to do this sort
of thing with templates and template arguments, however; there is really
no logical boundary between the template and the rest of the text.

For things like <nowiki> et al, stripping and substituting a placeholder
is a way of saying "this is parsed, so don't touch it". But perhaps
stripping out templates and template arguments and replacing them with
placeholders is the wrong thing to do.

What if instead of recursively parsing templates and template arguments, we
only recursively _substitute_ them? We could then parse the result once,
non-recursively.

This (experimental, proof-of-concept) patch replaces templates and
template arguments by just shoving in their text. No parsing is done
immediately, so no placeholders are used. With the the patch,
internalParse() is no longer recursive; replaceVariables() does
not actually do any stripping and parsing, but simply substitutes
variables recursively.

Does anyone see any obvious problems with this approach?

Attached:

jeluf wrote:

Right approach, in my eyes.

I think removeHTMLtags should be called after including templates.
Else one could insert "bad" HTML using templates.

wmahan_04 wrote:

Thanks for taking a look. The reason I call stripHTMLtags() before
replaceVariables() is that that's the order used in internalParse(), and I was
trying to preserve the existing behavior.

I think the existing patch is safe from inserting bad HTML, at least in these
two cases:

  1. Trying to include a template with bad HTML: we start out in internalParse(),

which calls replaceVariables(). The template with bad HTML gets inserted by
braceSubstitution(). Then right before we substitute any template arguments with
the recursive call to replaceVariables(), the bad HTML gets removed by my
stripHTMLtags() call.

  1. Trying to insert bad HTML as a template parameter: if it doesn't initially

get removed by the call in internalParse(), the bad HTML in the argument will
get removed by my call in argSubstitution().

Astronouth7303 wrote:

2 things:
(1) This bug prevents partial syntax parsing of templates (translation: If you put a row of a table in a
template, it will appear as data in the last cell ie,
http://endeavour.zapto.org/furc/Wiki/index.php/User:Astronouth7303/New_roster )

(2) Can you please backport this to 1.3?

(In reply to comment #11)

2 things:
(1) This bug prevents partial syntax parsing of templates (translation: If you

put a row of a table in a

template, it will appear as data in the last cell ie,
http://endeavour.zapto.org/furc/Wiki/index.php/User:Astronouth7303/New_roster )

Jamie Bliss case is solved by the patch submitted (Parser.php:1.272).

wmahan_04 wrote:

Patch for REL1_3

  1. Here is an untested patch with the change for REL1_3. Warning: my fix is a

lot more intrusive than might appear from the small number of lines changed. I
would be wary of using it in production before there is more testing.

  1. Another bug that should be fixed by my patch (which is in CVS HEAD) is bug 80.
  2. At the end of braceSubstitution(), notice that we suspend the LinkCache

before recursing and then resume it afterwards. I think with my patch this may
no longer be necessary. Since no parsing is done when including templates,
replaceInternalLinks() doesn't get called in this code path. I'm not certain
about this, however, so if someone could confirm I would appreciate it.

Attached:

jeluf wrote:

Tested in CVS HEAD:
Bug 60 Fixed
Bug 70 Fixed
Bug 80 Fixed

new issue occured while testing, I'll file a 
bug report for this new problem.

Bug 41 Fixed
Bug 283 Fixed
Bug 161 Fixed
Bug 131 Fixed
Bug 302 Fixed

jeluf wrote:

Another one in CVS HEAD:

Bug 16 Fixed

Going to check the patch in REL1_3.

jeluf wrote:

Tested in REL1_3:
Bug 16 Fixed
Bug 41 NOT FIXED
Bug 60 Fixed
Bug 70 Fixed
Bug 80 Fixed
Bug 283 Fixed
Bug 161 Fixed
Bug 131 Fixed
Bug 302 Fixed

river wrote:

This patch makes the issue at bug 266 more serious. Instead of template
sections editing the wrong section, _all_ edit links (even for the article)
after a template inclusion with a section header are wrong. Example at
[[User:Kate/sandbox2]].