
Add evaluation param to action=raw and Special:Export page
Closed, Resolved (Public)

Description

Author: sfkeller

Description:
To read information from Wikipedia or other MediaWiki sites (and later write it
back!), we need to get data from a Machine-friendly_wiki_interface. Categories
help us select well-specified content.

Currently, http://de.wikipedia.org/wiki/Kategorie:Foo?action=raw as well as
Special:Export do a good job. The problem is that for categories, action=raw
returns the wiki tags. What we need is the evaluated content, i.e. the list of
articles as presented in HTML in default mode.

How about a keyword "evaluated" (and an option in Special:Export) in addition to
action=raw?

For example:
http://de.wikipedia.org/wiki/Kategorie:Ort_in_der_Schweiz?action=raw&evaluated
Instead of ([[Kategorie: ....]] ....), this would return a list of all articles
as a response.
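
A minimal Python sketch of how a client might build the two request URLs discussed here. Note that `evaluated` is only the parameter proposed in this request, not an existing MediaWiki option, and the helper name `raw_url` is a hypothetical name chosen for illustration.

```python
from urllib.parse import quote, urlencode


def raw_url(base, title, evaluated=False):
    """Build an action=raw URL for a wiki page title."""
    # Wiki URLs use underscores for spaces; keep ':' and '/' readable
    # in the path and percent-encode everything else.
    path = quote(title.replace(" ", "_"), safe=":/")
    query = urlencode({"action": "raw"})
    if evaluated:
        # ASSUMPTION: "evaluated" is the flag proposed in this request.
        # It is not an existing MediaWiki parameter.
        query += "&evaluated"
    return f"{base}/wiki/{path}?{query}"


if __name__ == "__main__":
    print(raw_url("http://de.wikipedia.org", "Kategorie:Ort in der Schweiz"))
    print(raw_url("http://de.wikipedia.org", "Kategorie:Ort in der Schweiz",
                  evaluated=True))
```

The first call yields the URL that works today; the second yields the form suggested above.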


Version: unspecified
Severity: enhancement
Platform: PC
URL: http://www.geometa.info

Details

Reference
bz1012

Event Timeline

bzimport raised the priority of this task from to Medium. (Nov 21 2014, 8:01 PM)
bzimport set Reference to bz1012.
bzimport added a subscriber: Unknown Object (MLST).

sfkeller wrote:

I have looked again at what has been proposed so far and found this:

  • The most similar is "Bug 208: API for external access"
  • Some requests on Wikitech-l, like a "Minimalistic Web-API for use by Tools
    and Bots" at
    http://mail.wikipedia.org/pipermail/wikitech-l/2004-September/025373.html

These are the solutions proposed so far:

  1. Use Special:Export with XML (Wikipedia built-in)
  2. Use action=raw (Wikipedia built-in) with wiki text (e.g. with a Java API)
  3. Use the Python framework on Wikipedia HTML
  4. Use Perl on an SQL dump of Wikipedia

Remarks:

  1. Special:Export uses an XML format, and I am not sure whether it is well
     defined for long-term use (found:
     http://wikipedia.sourceforge.net/xml/export-0.1.xsd).
  2. action=raw returns the same text that users/editors see, thus "the
     original" format (there is a Java API prototype underway; see
     http://meta.wikimedia.org/wiki/Wikimaps/GeonamenDB#Java).
  3. The Python framework has been mentioned several times and we tested it. It
     is based on screen-scraping the HTML output of Wikipedia, so it is
     definitely error-prone and may break after changes in MediaWiki and even
     in individual style sheets.
  4. Using Perl on an SQL dump of Wikipedia reads wiki text (like 2.) and is
     limited to local installations.

So the most promising approach to me is still action=raw. But, as mentioned in
this bug (= feature request), we still need an extension for categories!
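
To illustrate why plain action=raw falls short for categories, here is a minimal Python sketch of what a client can extract from raw wiki text: only the [[Kategorie:...]] tags that a page itself carries, not the articles that belong to a category. The function name and the regular expression are illustrative assumptions, not part of MediaWiki, and the pattern covers only the simple link forms.

```python
import re

# Matches [[Kategorie:Name]] and [[Category:Name|sort key]] style tags.
# ASSUMPTION: a simplified pattern for illustration; real wiki text has
# more variants (localized namespaces, templates, nowiki sections).
CATEGORY_LINK = re.compile(
    r"\[\[\s*(?:Kategorie|Category)\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]",
    re.IGNORECASE,
)


def categories_in_wikitext(text):
    """Return the category names tagged in a piece of raw wiki text."""
    return [match.group(1) for match in CATEGORY_LINK.finditer(text)]


if __name__ == "__main__":
    sample = ("Text [[Kategorie:Ort in der Schweiz]] more text "
              "[[Kategorie: Kanton Bern|Sortkey]] [[Bern]]")
    print(categories_in_wikitext(sample))
```

Ordinary links such as [[Bern]] are ignored. The result lists the categories a page is filed under, which is the opposite direction from the member list requested here.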

> Can anyone point me to where to begin in the Wikimedia-PHP code?

Stefan

action=raw and Special:Export are for accessing editable page text.

Things like category memberships are a distinct kind of data, and require a distinct kind of interface.

avarab wrote:

This would ideally be part of some future SOAP API; marking it as a duplicate of
bug 208.

*** This bug has been marked as a duplicate of 208 ***

sfkeller wrote:

I would like to thank the former committers for paying attention to this
request, but pointing to #208 resolves only part of it.

Please let me point out the following: this bug report mentions two requests
("get article by name" and "get category by category_name") and proposes one
possible API as a solution via HTTP/URI (see "REST"
http://c2.com/cgi/wiki?RestArchitecturalStyle). Now you propose SOAP as another
solution, which is all right too. But:

  1. Please don't forget to include "get category..." in the SOAP API of #208,
     and
  2. don't consider SOAP as the only API: a RESTful approach is most probably
     better suited to this simple request with one(!) parameter for getting
     back an XML and/or a wiki text stream. RESTful simply means parametrized
     HTTP GET (POST, UPDATE,...) calls and is definitely less time-consuming as
     well as easier to implement than SOAP.

avarab wrote:

> 1. don't consider SOAP as the only API: RESTful is most probably more adapted to
> this simple request with one(!) parameter for getting back an XML and/or a Wiki
> text stream. RESTful means simply parametrized HTTP GET (POST, UPDATE,...) calls
> and is definitely less time consuming as well as easier to implement than SOAP.

Our needs aren't simple. Ideally, a SOAP API would handle all sorts of things,
such as getting edit histories, feeds, rendered HTML, pages that link to page
$1, and so on. Also, there are standard APIs that implement SOAP for most
programming languages.

I'm closing the bug again; please make further comments at bug 208 if you want
to suggest other APIs.

*** This bug has been marked as a duplicate of 208 ***