
Add pagination to Special:Lonelypages
Open, Low, Public, Feature

Description

Author: mapellegrini

Description:
[[Special:Lonelypages]] (which is linked from the English Wikipedia main page to increase
the visibility of orphaned articles) cannot display more than the first 1,000
orphaned pages (in alphabetical order).


Version: unspecified
Severity: enhancement
URL: https://meta.wikimedia.org/w/index.php?title=Special:LonelyPages&limit=500&offset=500
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=69660

Details

Reference
bz2415

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 8:33 PM
bzimport set Reference to bz2415.
bzimport added a subscriber: Unknown Object (MLST).

The limit of 1000 pages is set to make it faster. Resolve
some of those lonely pages and you will see the next ones.

zigger wrote:

(In reply to comment #1)

The limit of 1000 pages is set to make it faster. Resolve
some of those lonely pages and you will see the next ones.

The combination of the limit, the caching, and the lack of any visual
indication of resolved orphans makes this difficult. See bug 2599.

Static dumps of these special-page outputs once a [month? for starters] -- the entire output, even if it's 20k article titles -- would be helpful. Likewise for Ancientpages, etc.

avarab wrote:

This bug depends on bug 4699 to be able to enable this without bogging down the
servers; adding dependency. Changing component since this is a Wikimedia
configuration issue.

shunpiker wrote:

Patch for files in include and maintenance (updated 3/17 to include updaters/patch, nullable qc_serial to support this)

Instead of using LIMIT to restrict the number of rows fetched from the query
cache, SELECT an indexed range. This should make operations on the query cache
trivially cheap, so that the limits can be disabled in the $wgQueryPages
array. (Loading the cache without a LIMIT is only a little more expensive, since
the bulk of the expense of these queries is in assembling the results rather
than returning them.)
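
To make the contrast concrete, here is a sketch of the two access patterns against the query cache, using the qc_serial column this patch introduces (the 35,000-row offset is purely illustrative):

-- LIMIT/OFFSET: the server must walk and discard the first 35,000
-- matching rows before returning the page.
select qc_namespace, qc_title, qc_value from querycache
where qc_type = 'Lonelypages'
order by qc_value ASC
limit 500 offset 35000;

-- Indexed range: the unique (qc_type, qc_serial) key seeks directly
-- to the first row of the page and reads exactly 500 entries.
select qc_namespace, qc_title, qc_value from querycache
where qc_type = 'Lonelypages'
and qc_serial between 35001 and 35500
order by qc_serial;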

For the purpose of this patch, I only turned off the cache limit on
SpecialLonelypages, but I recommend that the rest of the cache limits be
likewise disabled.

Tested with a build of the Simple English data.

attachment consolidated.patch ignored as obsolete

shunpiker wrote:

Comment on attachment 3106
Patch for files in include and maintenance (updated 3/17 to include updaters/patch, nullable qc_serial to support this)

might also be relevant to #4699
http://bugzilla.wikimedia.org/show_bug.cgi?id=4699

shunpiker wrote:

Comment on attachment 3106
Patch for files in include and maintenance (updated 3/17 to include updaters/patch, nullable qc_serial to support this)

Please note schema changes -- before deploying code:

alter table querycache add column qc_serial int(5) unsigned NOT NULL default '0';
alter table querycache add unique key (qc_type, qc_serial);

shunpiker wrote:

Wish I could edit previous comments! Anyway, with deeper study of MySQL's LIMIT
optimizations, it seems like some queries may benefit from LIMITs more than
others, particularly ones where the results can be compiled during a range scan.
But queries like this one (and its evil twin, Deadendpages) seem to do most of
their work before the LIMIT is applied. Would it be possible to test this one
time -- disabling the LIMIT for Lonelypages and comparing run time and row count
to the LIMITed version?
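
A minimal sketch of such a test, assuming direct database access (the query is reconstructed from SpecialLonelypages.php; the IS NULL orphan condition on the join is an assumption here, and SQL_NO_CACHE keeps MySQL's query cache out of the timing):

-- LIMITed version: note run time and row count.
select SQL_NO_CACHE page_namespace, page_title from page
left join pagelinks
on page_namespace = pl_namespace and page_title = pl_title
where pl_namespace IS NULL  -- assumed: no incoming link means orphan
and page_namespace = 0
and page_is_redirect = 0
limit 1000;

-- Un-LIMITed version: the same query with the LIMIT clause removed;
-- compare run time and row count against the run above.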

shunpiker wrote:

Comment on attachment 3106
Patch for files in include and maintenance (updated 3/17 to include updaters/patch, nullable qc_serial to support this)

Index: includes/QueryPage.php
===================================================================
--- includes/QueryPage.php	(revision 19480)
+++ includes/QueryPage.php	(working copy)
@@ -19,7 +19,7 @@
 	array( 'DoubleRedirectsPage', 'DoubleRedirects' ),
 	array( 'ListUsersPage', 'Listusers' ),
 	array( 'ListredirectsPage', 'Listredirects' ),
-	array( 'LonelyPagesPage', 'Lonelypages' ),
+	array( 'LonelyPagesPage', 'Lonelypages', false ),
 	array( 'LongPagesPage', 'Longpages' ),
 	array( 'MostcategoriesPage', 'Mostcategories' ),
 	array( 'MostimagesPage', 'Mostimages' ),
@@ -114,6 +114,15 @@
 		return "SELECT 'sample' as type, 0 as namespace, 'Sample result' as title, 42 as value";
 	}
 
+	// ASSUMPTION: arguments are SQL clean
+	function getCacheSQL( $querycache, $type, $limit, $offset ) {
+		$first = $offset + 1;
+		$last = $offset + $limit;
+		$sql = "SELECT qc_type as type, qc_serial as serial_id, qc_namespace as namespace, qc_title as title, qc_value as value FROM $querycache WHERE qc_type='$type' AND qc_serial BETWEEN $first AND $last ORDER BY qc_serial";
+		// Cache results are assumed to be sorted on insert
+		return $sql;
+	}
+
 	/**
 	 * Override to sort by increasing values
 	 */
@@ -221,8 +230,9 @@
 		if ( $res ) {
 			$num = $dbr->numRows( $res );
 			# Fetch results
-			$insertSql = "INSERT INTO $querycache (qc_type,qc_namespace,qc_title,qc_value) VALUES ";
+			$insertSql = "INSERT INTO $querycache (qc_type,qc_serial,qc_namespace,qc_title,qc_value) VALUES ";
 			$first = true;
+			$serial_id = 0;
 			while ( $res && $row = $dbr->fetchObject( $res ) ) {
 				if ( $first ) {
 					$first = false;
@@ -237,6 +247,7 @@
 				$insertSql .= '(' .
 					$dbw->addQuotes( $row->type ) . ',' .
+					$dbw->addQuotes( ++$serial_id ) . ',' .
 					$dbw->addQuotes( $row->namespace ) . ',' .
 					$dbw->addQuotes( $row->title ) . ',' .
 					$dbw->addQuotes( $value ) . ')';
@@ -288,13 +299,13 @@
 		if ( !$this->isCached() ) {
 			$sql = $this->getSQL();
+			$sql .= $this->getOrder();
+			$sql = $dbr->limitResult( $sql, $limit, $offset );
 		} else {
 			# Get the cached result
 			$querycache = $dbr->tableName( 'querycache' );
 			$type = $dbr->strencode( $sname );
-			$sql =
-				"SELECT qc_type as type, qc_namespace as namespace, qc_title as title, qc_value as value
-				FROM $querycache WHERE qc_type='$type'";
+			$sql = $this->getCacheSQL( $querycache, $type, $limit, $offset );
 			if( !$this->listoutput ) {
@@ -324,8 +335,6 @@
 		}
-		$sql .= $this->getOrder();
-		$sql = $dbr->limitResult( $sql, $limit, $offset );
 		$res = $dbr->query( $sql );
 		$num = $dbr->numRows( $res );

Index: includes/SpecialLonelypages.php
===================================================================
--- includes/SpecialLonelypages.php	(revision 19480)
+++ includes/SpecialLonelypages.php	(working copy)
@@ -35,8 +35,7 @@
 		return
 		  "SELECT 'Lonelypages'  AS type,
 			  page_namespace AS namespace,
-			  page_title AS title,
-			  page_title AS value
+			  page_title AS title
 		     FROM $page
 		LEFT JOIN $pagelinks
 		       ON page_namespace=pl_namespace AND page_title=pl_title
@@ -45,6 +44,11 @@
 		      AND page_is_redirect=0";
 	}
+
+	function getOrder() {
+		return ' ORDER BY title '; // no value column for this query
+	}
+
 }
 
 /**

Index: maintenance/archives/patch-querycache-serial.sql
===================================================================
--- maintenance/archives/patch-querycache-serial.sql	(revision 0)
+++ maintenance/archives/patch-querycache-serial.sql	(revision 0)
@@ -0,0 +1,8 @@
+-- Add a serial column to store an id for each row according to qc_type.
+-- Add an index (qc_type, qc_serial) to facilitate paged sequential reads.
+
+ALTER TABLE querycache
+  ADD (qc_serial int(5) unsigned);
+
+ALTER TABLE querycache
+  ADD UNIQUE KEY (qc_type, qc_serial);

Index: maintenance/mysql5/tables-binary.sql
===================================================================
--- maintenance/mysql5/tables-binary.sql	(revision 19480)
+++ maintenance/mysql5/tables-binary.sql	(working copy)
@@ -946,6 +946,9 @@
   -- A key name, generally the base name of the special page
   qc_type char(32) NOT NULL,
 
+  -- A serial id for each cache row according to type
+  qc_serial int(5) unsigned,
+
   -- Some sort of stored value. Sizes, counts...
   qc_value int(5) unsigned NOT NULL default '0',

Index: maintenance/mysql5/tables.sql
===================================================================
--- maintenance/mysql5/tables.sql	(revision 19480)
+++ maintenance/mysql5/tables.sql	(working copy)
@@ -937,6 +937,9 @@
   -- A key name, generally the base name of the special page
   qc_type char(32) NOT NULL,
 
+  -- A serial id for each cache row according to type
+  qc_serial int(5) unsigned,
+
   -- Some sort of stored value. Sizes, counts...
   qc_value int(5) unsigned NOT NULL default '0',

Index: maintenance/postgres/tables.sql
===================================================================
--- maintenance/postgres/tables.sql	(revision 19480)
+++ maintenance/postgres/tables.sql	(working copy)
@@ -362,10 +362,12 @@
 CREATE TABLE querycache (
   qc_type       TEXT      NOT NULL,
+  qc_serial     SMALLINT,
   qc_value      SMALLINT  NOT NULL,
   qc_namespace  SMALLINT  NOT NULL,
   qc_title      TEXT      NOT NULL
 );
+CREATE UNIQUE INDEX querycache_type_serial ON querycache (qc_type, qc_serial);
 CREATE INDEX querycache_type_value ON querycache (qc_type, qc_value);
 
 CREATE TABLE querycache_info (

Index: maintenance/tables.sql
===================================================================
--- maintenance/tables.sql	(revision 19480)
+++ maintenance/tables.sql	(working copy)
@@ -925,6 +925,9 @@
   -- A key name, generally the base name of the special page
   qc_type char(32) NOT NULL,
 
+  -- A serial id for each cache row according to type
+  qc_serial int(5) unsigned,
+
   -- Some sort of stored value. Sizes, counts...
   qc_value int(5) unsigned NOT NULL default '0',
 
@@ -932,6 +935,7 @@
   qc_namespace int NOT NULL default '0',
   qc_title char(255) binary NOT NULL default '',
 
+  UNIQUE KEY (qc_type,qc_serial),
   KEY (qc_type,qc_value)
 
 ) TYPE=InnoDB;

Index: maintenance/updaters.inc
===================================================================
--- maintenance/updaters.inc	(revision 19480)
+++ maintenance/updaters.inc	(working copy)
@@ -40,6 +40,7 @@
 $wgNewFields = array(
 	# table          field          patch file (in maintenance/archives)
+	array( 'querycache',    'qc_serial',    'patch-querycache-serial.sql' ),
 	array( 'ipblocks',      'ipb_id',       'patch-ipblocks.sql' ),
 	array( 'ipblocks',      'ipb_expiry',   'patch-ipb_expiry.sql' ),
 	array( 'recentchanges', 'rc_type',      'patch-rc_type.sql' ),
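
To make the paging arithmetic in getCacheSQL() concrete: for limit=500 and offset=500 (the second page of results), $first = 501 and $last = 1000, so the generated query would be:

select qc_type as type, qc_serial as serial_id, qc_namespace as namespace,
       qc_title as title, qc_value as value
from querycache
where qc_type='Lonelypages' and qc_serial between 501 and 1000
order by qc_serial;

The unique (qc_type, qc_serial) key added by the schema patches lets this read exactly the 500 rows of the requested page rather than scanning past the preceding 500.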

shunpiker wrote:

Querycache enhancement (serial range scans for paging), version 2

Update to previous patch to include updaters.inc/archives. qc_serial is now
nullable with no default in order to support updating that table without having
to empty the legacy querycache.
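
Making the column nullable works because MySQL unique indexes tolerate any number of NULLs, so legacy cache rows that predate the column never collide on the new (qc_type, qc_serial) key. A minimal sanity check with hypothetical rows, assuming the upgraded schema:

-- Both legacy-style inserts leave qc_serial NULL; neither violates
-- the UNIQUE KEY (qc_type, qc_serial), since NULLs compare as distinct.
insert into querycache (qc_type, qc_namespace, qc_title, qc_value)
values ('Lonelypages', 0, 'Legacy_row_1', 0);
insert into querycache (qc_type, qc_namespace, qc_title, qc_value)
values ('Lonelypages', 0, 'Legacy_row_2', 0);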

attachment consolidated3.patch ignored as obsolete

sumanah wrote:

The bug is still reproducible. https://meta.wikimedia.org/w/index.php?title=Special:LonelyPages&limit=500&offset=500

Riley, I'm sorry it's taken so long for you to get a response to your patch. I am trying to get developers to respond soon.

shunpiker wrote:

Thanks, Sumana. I'd be glad to look into the details if the points of contact for this patch have changed since (wow!) 2007.

Marking patch as obsolete since it no longer applies at all. The QueryPage subsystem saw major architectural changes in December 2010. For one thing, there is now no raw SQL left, which means that most of the patch will need to be rewritten.

sumanah wrote:

Removed "patch" and "need-review" keywords.

Riley, it does look like -- as you put it -- the points of contact for the patch have changed rather a lot since 2007. You can look around on mediawiki.org with a search for "QueryPage", visit the MediaWiki development community in IRC -- https://www.mediawiki.org/wiki/MediaWiki_on_IRC -- or in the mailing list -- https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- to learn more about the current state of this component and talk about approach. Thank you for your interest; hope we can help!

shunpiker wrote:

Rewrite of previous patch using branch REL1_17 (104384)

Rewrote the patch to conform to changes since it was originally written.

Attached:

shunpiker wrote:

With a querycache of 250,000 rows, including 50,000 lonelypages rows, a query with a large offset without the patch:

select SQL_NO_CACHE qc_type, qc_namespace, qc_title, qc_value from querycache
where qc_type = 'lonelypages'
order by qc_value ASC
limit 500 offset 35000;

--> about 0.191 seconds

The same offset with the patch:

select SQL_NO_CACHE qc_type, qc_namespace, qc_title, qc_value from querycache
where qc_type = 'lonelypages'
and qc_seq between 35001 and 35500
order by qc_value ASC;

--> about 0.004 seconds
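
To confirm where the speedup comes from without relying on wall-clock timing alone, both forms can be EXPLAINed; the patched query should show a bounded range read on the (qc_type, qc_seq) key, as named in this version of the patch, instead of a scan that discards the first 35,000 rows (a sketch, not output from an actual run):

-- Unpatched form: expect the server to walk the type's rows and throw
-- away the first 35,000 before returning 500.
EXPLAIN select qc_type, qc_namespace, qc_title, qc_value from querycache
where qc_type = 'lonelypages'
order by qc_value ASC
limit 500 offset 35000;

-- Patched form: expect a bounded range scan over 500 index entries.
EXPLAIN select qc_type, qc_namespace, qc_title, qc_value from querycache
where qc_type = 'lonelypages'
and qc_seq between 35001 and 35500
order by qc_value ASC;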

sumanah wrote:

Riley, you based your current patch off branch REL1_17 -- for better freshness & reviewability, you should base it off SVN trunk.

sumanah wrote:

Riley noted to me:

"You linked Lonelypages on meta with a note that the bug is still reproducible:

https://meta.wikimedia.org/w/index.php?title=Special:LonelyPages&limit=500&offset=500

But when I click through, I don't see the problem; that is, I'm able to click through to the following page of 500 results, and the next, with the last result being the 1948th. Am I failing to notice something?

On the other hand, I do see the problem on the Spanish Wikipedia:

http://es.wikipedia.org/w/index.php?title=Especial:P%C3%A1ginasHu%C3%A9rfanas&limit=500&offset=4500

Here you can see that the results are limited to 5,000, which falls somewhat short of the letter "C"."

I thought the bug was reproducible on meta when I posted that comment on the 9th, but now I cannot reproduce it there. However, I can, like Riley, reproduce it on Spanish Wikipedia.

shunpiker wrote:

Ok, will rewrite patch for trunk. Thanks!

sumanah wrote:

Riley, when you rewrite the patch, please submit it directly into Git, our source control system. You can do that with developer access: https://www.mediawiki.org/wiki/Developer_access

Thanks!

sumanah wrote:

Riley, were you able to rewrite for trunk?

Riley: Did you manage to rewrite the patch? If not, how could we help?

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:02 AM
Aklapper removed a subscriber: wikibugs-l-list.