page or section content duplication on edit conflict with self
Closed, Resolved, Public

Description

Author: rowan.collins

Description:
BUG MIGRATED FROM SOURCEFORGE
http://sourceforge.net/tracker/?func=detail&aid=949323&group_id=34373&atid=411192
Originally submitted by IMSoP 2004-05-06 17:37

There have been numerous instances that I know of on
the English Wikipedia of the entire contents of a page
becoming duplicated - i.e. an additional copy of all
text being appended to itself. This is especially
problematic if it happens on large and busy utility or
discussion pages, since it is often not spotted
immediately and therefore leads to each discussion on
the page being forked without anyone realising, and
having to be carefully merged later.

This appears to be caused by users attempting to submit
more than one edit in competition with themselves, and
specifically submitting the same change twice. Since
large pages are likely to load rather slowly after
editing, people *will* think their changes haven't gone
through, and so click the submit button again - forum
software often includes specific filters to overcome
such multiple submissions. It possibly also interacts
with section editing, since this presumably entails
"construction" of the new page content from the form
data and the existing version.

When this was mentioned on the mailing list, Brion
stated: "There is explicitly no edit conflict
resolution between submissions by the same user."
Clearly, the edit created in such circumstances is
inappropriate, somehow concatenating two (probably
identical) versions, rather than over-writing one with
the other. The software needs to do at least one of:

  • Treat multiple submissions from the same user as a
    normal edit conflict (creating potential confusion if
    they just hit the same button twice)

  • detect multiple submissions which contain identical
    data, and silently accept one or the other

  • detect submissions in very quick succession, and
    flatten them into one edit (if they are by the same user)

  • at the very least, ignore such situations, as now,
    but in a sane way - i.e. use the content from one and
    only one edit submission, even if the edit was to a
    particular section
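
As an illustration of the second option (detecting identical resubmissions), here is a minimal Python sketch. It is not MediaWiki code: the SubmissionFilter class, its is_duplicate method and the choice of fields that make up the fingerprint are all assumptions invented for this example.

```python
import hashlib


class SubmissionFilter:
    """Hypothetical helper: remembers fingerprints of submissions already seen."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, user, page, section, base_timestamp, text):
        # Fingerprint the whole form submission (all fields are strings).
        # Assumption: two saves count as "the same" only if every field matches.
        raw = "\x00".join([user, page, section, base_timestamp, text])
        fingerprint = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        if fingerprint in self._seen:
            return True  # same data submitted again: caller can drop it silently
        self._seen.add(fingerprint)
        return False
```

A second click on "save" with unchanged form data would then be recognised and discarded, while a genuinely different edit by the same user would still go through. A real implementation would also need to expire old fingerprints, which is left out here.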

Related mailing list posts:
http://mail.wikipedia.org/pipermail/wikitech-l/2004-April/009752.html
http://mail.wikipedia.org/pipermail/wikitech-l/2004-April/009750.html

IMSoP
[http://en.wikipedia.org/wiki/User:IMSoP]

---- Additional comments (in reverse order) ----

Date: 2004-06-28 22:06
Sender: robert_dodier
Logged In: YES
user_id=501686

Hello, another bit that might help track down this bug --
[[vfd]] got duplicated sometime today (June 28 2004). Maybe
by checking the editing history, it could be determined
which edit yielded the duplication. Hope this helps.


Date: 2004-06-22 00:18
Sender: wfmcwalter
Logged In: YES
user_id=1036616

It seems only to be with section editing. It's not confined
to edits of new sections (although it may be _caused_ by the
unrelated addition of another section). The duplicated
section certainly isn't always the new one.

It seems to happen more often when the system is slow,
leading me to believe it relates to a user resubmitting an
edit believing it to be "stuck". But that alone isn't
sufficient to cause it.

It happens occasionally (but frequently enough to be a
problem) on heavily edited pages. Before it returned to a
transclusion-based scheme, [[en:Wikipedia:Votes for
deletion]] would exhibit this behaviour several times per day.

It's going to be nearly impossible to obtain a reasonable
idea what users did to precipitate matters, I'm afraid, because:

  1. it seems to require interaction of two (or more)
     simultaneous editors

  2. at the time it happens, neither is aware (both writes
     seem to succeed without an error)

  3. by the time the error is discovered (often hours later)
     it's unlikely either submitter will be able to recall
     sufficient detail

  4. clearly the window in which this occurs is tiny, making
     the chances of a manual attempt at reproducing it
     diminishingly small

I figure the only way to be able to reliably reproduce it
will be to set two (or more) bots on the same page, making
section edits.


Date: 2004-06-21 23:53
Sender: vibber
Logged In: YES
user_id=446709

Is this only in section editing?

Is this only when adding new sections? [This has long been known to
duplicate the added section on double submission, since it simply adds a
new section to the end of whatever is there.]

Is this when editing existing sections? [This is virtually guaranteed
trouble as sections are numbered in a fashion that is liable to change.]

Is this when editing whole pages?

Can you reliably reproduce the problem?

When you see it happen, please record *everything* you can. When the
edit occurred, whether it was by section or whole page, whether any edit
conflicts were involved, how many times submitted, etc.


Date: 2004-06-16 17:06
Sender: imsop
Logged In: YES
user_id=1053535

The introduction of edit conflict merging in 1.3 seems to
have made this problem worse:

Firstly, pages like [[en:VfD]] have gone back to being one
large page, and in general slowness has been rearing its
ugly head a lot. Since if something's going that slowly,
people will be more likely to click save twice, this is
triggering more instances of the bug.

Secondly, some people are reporting problems with section
editing, where sections seem to overwrite each other - see
http://meta.wikipedia.org/wiki/MediaWiki_1.3_comments_and_bug_reports#edit_conflict_management_problem
- which may or may not be due to the new code. Since, from
Brion's comment, the behaviour in these conditions appears
to be essentially "undefined", it seems to me that the new
code could be interfering somehow and making the results
even more confusing.

Either way, this seems to be causing major problems, and
needs to be fixed ASAP.

IMSoP
[http://en.wikipedia.org/wiki/User:IMSoP]


Date: 2004-05-06 18:28
Sender: wfmcwalter
Logged In: YES
user_id=1036616

Here's one instance on en.wikipedia.org's [[Reference desk]]
earlier today:

Change log:
Of these two transactions, the latter seemed to cause the
duplication:

m 15:20, 6 May 2004 .. Bodnotbod (=Wikipedia Talk and
Google= how is suppression of Google indexing of VfD done?)
m 15:26, 6 May 2004 .. Bodnotbod (=Wikipedia Talk and
Google= how is suppression of Google indexing of VfD done?)

URL for the problematic change:
http://en.wikipedia.org/w/wiki.phtml?title=Wikipedia:Reference_desk&diff=3472769&oldid=3472734


Version: unspecified
Severity: major

Details

Reference
bz275

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 6:48 PM
bzimport set Reference to bz275.
bzimport added a subscriber: Unknown Object (MLST).

rowan.collins wrote:

Note that bug 56 may or may not be related to or the same as this - that report
suggests similar-sounding behaviour with two users editing different sections.
It may be that there's just interaction between two bugs, or one may be a
misinterpretation.

qleah wrote:

*** Bug 552 has been marked as a duplicate of this bug. ***

rowan.collins wrote:

Another bug that may or may not be related: bug 317, where users report page
*blanking*, but which it seems to me may also include getting into a conflict
with oneself.

beesley wrote:

A self-edit conflict, or section blanking when there is no conflict are
frequently reported on Wikicities when users have "Show preview on first edit"
option selected in their preferences.

richholton wrote:

As I believe Brion remarked once in a wikitech-l conversation, there is code in
the software to explicitly ignore any conflict with yourself. By ignore, I mean
that it treats it as a non-conflict. The later submitted change will just drop
itself on top of what was there.

Can anyone think of a reason for this behavior? The only thing I can imagine is
if a user has two windows open editing the same page. The user submits from one
window, then realizes that the other window has better changes, and submits that.

_If_ this is the only reason to have the code, I'd suggest forcing a normal edit
conflict resolution. Perhaps in the process we'll fix this bug.

-Rich Holton
en.Wikipedia:User:Rholton

The purpose is so that someone who saves an edit, clicks "back", makes another change, and clicks "save" again
won't receive an edit conflict message. We got a lot of complaints about that back in the day.

rickblock wrote:

Perhaps not exactly the same issue, but there have been instances where I think entire articles are duplicated by users inappropriately
responding to an edit conflict for a section edit by copying and pasting the entire article's content (shown in the "your changes" box)
and submitting this as the new contents for the section they were editing. Yes, this is user error - BUT, it's happened enough times
that I think the software should be changed so that when an edit conflict occurs for a section edit, only the section is shown.

rickblock wrote:

(In reply to comment #7)
> Perhaps not exactly the same issue, but there have been instances where I think entire articles are duplicated by users inappropriately
> responding to an edit conflict for a section edit by copying and pasting the entire article's content (shown in the "your changes" box)
> and submitting this as the new contents for the section they were editing. Yes, this is user error - BUT, it's happened enough times
> that I think the software should be changed so that when an edit conflict occurs for a section edit, only the section is shown.

A user has confirmed the above sequence as a mechanism resulting in duplicated content, please see
http://en.wikipedia.org/wiki/User_talk:Rick_Block#How_to_duplicate . Is there any particular reason the edit conflict page shows the entire
article rather than just the section being edited? Seems like this should be a fairly simple fix.

rowan.collins wrote:

(In reply to comment #8)
> Is there any particular reason the edit conflict page shows the entire article rather than just
> the section being edited? Seems like this should be a fairly simple fix.

The problem with only showing one section in an edit conflict screen is that
changes to other parts of the article could make the original section edit not
make sense - for instance, the section might no longer exist, or have been
moved, or the information one user was about to add to it has been added to
another section by another user. All these situations require the user presented
with the edit conflict to be able to see and manipulate the entire article, not
just the section they originally elected to edit.

I'm also unable to reproduce your analysis of the screen's behaviour - as far as
I can see, an edit conflict screen presented when editing a section correctly
displays the entire article in both boxes (for the reason explained above), and
the resulting save correctly replaces the entire article text with just the text
in the top box. So, unfortunately, the bug is not as simple as you are
suggesting (i.e. it's a genuine bug, not a bad UI)

The factors that all the examples I've seen have in common appear to be:

  • large pages - if anyone has an example that rules this out as a factor, it
    would be worth knowing about

  • edit conflict with self - with or without seeing the warning screen, which
    *should* be suppressed in this situation

  • editing a particular section, rather than the whole page - I'm still not 100%
    clear that this is always the case, but it seems a reasonable assumption

It thus occurs to me that the following sequence of events would describe the
behaviour of the bug:

  1. user edits a section of a large page; the edited section is merged into the
     rest of the page and saved

  2. same user edits same page in a way that would trigger an edit conflict;
     again, the section is merged in to create the page to save

  3. when it overrides the edit conflict (because this is the same user), the
     software "forgets" that it has already merged the section, and mistakenly treats
     the new version of the page as the contents of a single section

I have yet to come up with the exact circumstances under which this happens (and
therefore can't reproduce the bug on demand for testing) - for all I know, it
may involve very subtle coincidences of timing and/or some specific size of
page, etc - but I think it's the most thorough hypothesis so far that fits the
facts.

sjorford wrote:

http://en.wikipedia.org/wiki/Wikipedia:Vandalism_in_progress seems to suffer
from this kind of duplication very badly, partly because the highly compact
style of writing means it's difficult to spot when it happens. I've twice in the
last month fixed almost complete duplication of the page, in both cases after a
whole week had passed.

Duplication edits:
http://en.wikipedia.org/w/index.php?title=Wikipedia:Vandalism_in_progress&diff=14316926&oldid=14315993
http://en.wikipedia.org/w/index.php?title=Wikipedia:Vandalism_in_progress&diff=next&oldid=15115600

I hate this bug. I hate it so much, that I sacrificed part of my life staring at the Mediawiki source to
try to figure out what is causing it, and I think I have figured it out.

In EditPage.php, function editForm, section "if ( 'save' == $formtype )":

I believe it is possible to get through this branch with both $isConflict = True and $this->section != ''.
If this occurs, then the edit conflict screen will place the full page's text in textbox1, but still have a
hidden field stating that all of this text belongs as a replacement of only a single section. Hence, if
someone then saves from this screen the entire page's content would be dumped into that single section,
effectively doubling the content of the page.

I believe the event that allows this to happen is a return of false from the call to
$this->mArticle->updateArticle(...), which can occur if there is a late edit conflict such that the database is updated
BETWEEN when editForm calls $this->mArticle->getTimestamp() and when Article.php::updateArticle
calls "$this->updateRevisionOn". Since updateArticle will fail even in self-conflicts (unlike editForm),
the easiest way to trigger this would be to get in a race with oneself by trying to submit multiple times,
though it is not necessary that this be a self-conflict.

Once updateArticle returns false for any reason, the only response is to set $isConflict = True. Unlike
the earlier part of editForm, there is no code to ensure that $this->section is reset to '' before
proceeding to the Edit Conflict screen.

Hence, if one is A) performing a section edit, B) does not trigger an edit conflict in editForm, & C) does
trigger a conflict in the slightly later updateArticle, then one arrives at the Edit Conflict screen with
the section identifier still set and the possibility to save from this screen and dump the entire page's
content into a single section of that page.

The quick resolution is simple: if updateArticle returns false, make sure $this->section = ''. (Though
ideally, such late edit conflicts should loop back to the beginning to see if they can be resolved through
merging or something similar).
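
The mechanism described above can be modelled in a few lines of Python. This is only a toy sketch, not the actual EditPage.php logic: split_sections, replace_section, conflict_screen and save_form are hypothetical helpers, sections are assumed to start at any line beginning with "==", and the reset_section flag stands in for the proposed "$this->section = ''" fix.

```python
def split_sections(page_text):
    """Split text into [lead, section 1, section 2, ...] at lines starting with '=='."""
    sections = [""]
    for line in page_text.splitlines(keepends=True):
        if line.startswith("=="):
            sections.append(line)   # a heading opens a new section
        else:
            sections[-1] += line    # body text belongs to the current section
    return sections


def replace_section(page_text, section, new_text):
    """Return the page with one numbered section swapped for new_text."""
    parts = split_sections(page_text)
    parts[section] = new_text
    return "".join(parts)


def conflict_screen(current_page, section, reset_section):
    """Model the conflict form: the top box holds the FULL page, plus a hidden section field."""
    hidden_section = "" if reset_section else section
    return current_page, hidden_section


def save_form(current_page, textbox, section):
    """Model a save: an empty section means 'textbox is the whole page'."""
    if section == "":
        return textbox
    return replace_section(current_page, int(section), textbox)


page = "Intro\n== A ==\nalpha\n== B ==\nbeta\n"

# Without the fix: the section id survives, so the whole page lands inside section 2.
box, sec = conflict_screen(page, "2", reset_section=False)
doubled = save_form(page, box, sec)

# With the fix: the section id is cleared and the page is saved as a whole page.
box, sec = conflict_screen(page, "2", reset_section=True)
fixed = save_form(page, box, sec)
```

In this model `doubled` contains "alpha" twice, because the full page was substituted for section 2, while `fixed` is identical to the original page.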

So, please fix this.

-DF

rowan.collins wrote:

(In reply to comment #11)
> I hate this bug. I hate it so much, that I sacrificed part of my life staring at the Mediawiki source to
> try to figure out what is causing it, and I think I have figured it out.

Wow! I think you win the prize - at least for the most thoroughly analysed
hypothesis! If you're right about the specific race condition, would it be
possible to artificially simulate it - i.e. put huge pauses in the code at the
point the second edit has to come? It would be great to replicate the bug on
demand, and then be confident that it was in fact fixed.
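
One way to picture such an artificial pause is a toy model in which a hook runs between the early check and the late write, so the competing edit can be injected deterministically. The Python sketch below is purely illustrative: ToyStore, submit and pause_hook are invented names, and the revision counter stands in for the timestamp comparison discussed in comment 11.

```python
class ToyStore:
    """Hypothetical stand-in for the article table: text plus a revision counter."""

    def __init__(self, text):
        self.text = text
        self.revision = 1

    def update(self, new_text, expected_revision):
        # Late check: refuse the write if someone else saved in the meantime.
        if self.revision != expected_revision:
            return False
        self.text = new_text
        self.revision += 1
        return True


def submit(store, new_text, pause_hook=None):
    base = store.revision      # early check (stands in for reading the timestamp)
    if pause_hook is not None:
        pause_hook()           # the artificial "pause" in which a rival edit lands
    return store.update(new_text, base)  # late write; returns False on a late conflict


store = ToyStore("v1")
# The hook plays the part of the second submission arriving inside the window.
ok = submit(store, "mine", pause_hook=lambda: store.update("rival", store.revision))
```

Here `ok` is False and the store holds the rival text, which is the "late conflict" branch that would need to be exercised to confirm a fix.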

dogma wrote:

Please fix this as a matter of urgency. This has made my life hell for the last
9 hours or so as I've tried to keep the page
http://en.wikipedia.org/wiki/7_July_2005_London_bombings sane. I've had this bug
about 20 or 30 times or so in that space of time, the only sensible solution
being to revert to the last un-duplicated revision, sometimes wiping out dozens
of edits.

j.niesen wrote:

I think I saw this bug just happen with my own edits at
http://en.wikipedia.org/wiki/Wikipedia:Templates_for_deletion . I wanted to vote
delete for two templates, so I right-clicked on both the edit links for the
corresponding sections in quick succession. I'm using Firefox and right-clicking
opens the link in a new tab. Then I went to the first tab, added my vote, and
saved:
http://en.wikipedia.org/w/index.php?title=Wikipedia:Templates_for_deletion&diff=prev&oldid=19299850.
Then I went to the second tab, added my vote, and pressed save. This was
apparently also saved, see
http://en.wikipedia.org/w/index.php?title=Wikipedia:Templates_for_deletion&diff=prev&oldid=19299862,
but I did get an edit conflict screen, with the complete page in the top edit
box and only the section I was editing in the bottom edit box. I then copied the
text in the bottom edit box to the top edit box and saved:
http://en.wikipedia.org/w/index.php?title=Wikipedia:Templates_for_deletion&diff=prev&oldid=19300028.
I think that, if I hadn't copied the text, but just edited the top edit box, the
text in the top edit would have been substituted for the section and the page
would effectively be duplicated.

Example of doubling bug edit conflict

By accidentally double clicking the save button, I believe I have managed to
capture an image of an edit conflict page that leads to the doubling bug.

This is attached. If you look at the source, you will note that even though
this is an edit conflict, it includes: "<input type='hidden' value="9"
name="wpSection" />" indicating that section editing is still turned on,
consistent with my hypothesis for how an entire copy of a page gets dumped into
a single section's spot.

Curiously, "my text" is also represented as only the material in the section
being edited, rather than a new version of the whole page as is customary for
edit conflicts.

Hope this helps.

This bug should now be fixed per the analysis in comment 11.

That fix was committed and synchronized around 15:20 UTC. Can
you confirm that this happened afterwards and that it still happens?

(In reply to comment #16)
> This bug should now be fixed per the analysis in comment 11.
> That fix was committed and synchronized around 15:20 UTC. Can
> you confirm that this happened afterwards and that it still happens?

The file I uploaded showing the edit conflict dates from several days ago. I saved it at
that time, but didn't get around to posting it till today. So, hopefully the fix you have
now applied will have settled the issue.

Great. :)

Marking this tentatively FIXED, for both the duplication and the diff display.