Page MenuHomePhabricator

Image within an image caption breaks rendering
Closed, ResolvedPublic

Description

Author: rowan.collins

Description:
This was originally entered as bug 1033 comment 2, but I think it is actually a
seperate issue, with a different effect (an image with an image in its caption
is accepted as such by the parser, but the HTML produced is mangled).

It's hard to work out exactly what's being produced because of things like
'tidy' going round closing tags, but it looks like there's something wrong with
the order in which things are being escaped and/or removed from 'alt' and
'title' attributes, so that the HTML image markup for the inner image is being
inserted into those attributes. The lines in Skin::make{Image|Thumb}LinkObj()
beginning with
$alt = preg_replace( '/<[^>]*>/', '', $alt );
should cover this, leading me to suspect that the actual link-parsing is
happening in the wrong order (e.g. "...[[Image:...]]..." being replaced with
HTML when it's already in the attributes) , but I can't see anything wrong with
Parser::replaceInternalLinks() at a glance.

I'm CCing Wil Mahan on this, because we hacked the relevant bits of
replaceInternalLinks() around a lot between us (see bug 637), so he might have
some thoughts.


Version: 1.4.x
Severity: major
URL: http://en.wikipedia.org/wiki/Flag_of_Afghanistan

Details

Reference
bz1217

Revisions and Commits

Related Objects

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 8:06 PM
bzimport set Reference to bz1217.
bzimport added a subscriber: Unknown Object (MLST).

D.U.Thibault wrote:

This is a serious bug, introduced recently. Just about all of the "Flag of"
articles in Wikipedia are broken as a result, because they systematically
include a small symbol image in their captions. See for example [[Flag of Costa
Rica]] or [[Flag of Canada]]. Well-meaning editors have started "fixing" those
pages by removing the image within the caption, which is a loss of information.

Please get this fixed ASAP!

rowan.collins wrote:

fix: use "http-noparse" rather than a full-on insertStripItem()

I got round to tracking this one down at last: the entire image code was being
stripped out and replaced with a placeholder; when one image was inside
another, the placeholder was surviving in the alt and title attributes, and
then being replaced with the full code. Hence the attributes ended up
"containing" whole stanzas of HTML, as I suspected.

As far as I can see, the only thing that needed hiding was full URLs, which
replaceExternalLinks() mustn't touch. I've therefore borrowed somebody else's
hack (it's used in Skin::makeMediaLinkObj()) of replacing "http://" with
"http-noparse://" - a kind of very specific place-holder, which gets swapped
back again once replaceExternalLinks() has finished.

I ran parserTests.php, and was annoyed to find that it failed the test because
the test assumed an image existed which in reality didn't. I'm 99% sure the
code produced is actually correct. But in doing so, I realised that I'd also
inadvertently solved bug 1219, which I'd never heard of before. Which is nice.

Attached:

jeluf wrote:

Fixed in CVS HEAD and REL1_4.
Deployed to wikipedia.org

epriestley added a commit: Unknown Object (Diffusion Commit).Mar 4 2015, 8:21 AM
epriestley added a commit: Unknown Object (Diffusion Commit).Mar 4 2015, 8:24 AM
epriestley added a commit: Unknown Object (Diffusion Commit).