
Request: properties in articles (structured data)
Closed, ResolvedPublic

Description

Author: john

Description:
I propose that articles should be allowed to have named properties.
For example, in an article on an actor, the actor's first and last
names, date of birth, films, roles etc. could all be marked up.

A possible syntax could be:
{{Prop:FirstName=Marlon}} {{Prop:Surname=Brando}}
which would render simply as
Marlon Brando
(if extensively used, a more compact syntax should be introduced).
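As a rough illustration of the markup above, here is a minimal sketch that assumes a simple regular expression is enough for the flat `{{Prop:Name=Value}}` form. The names `extract_properties` and `render` are hypothetical and are not part of MediaWiki.

```python
import re

# Matches the flat form proposed above: {{Prop:Name=Value}}
PROP_RE = re.compile(r"\{\{Prop:(\w+)=([^{}|]*)\}\}")

def extract_properties(wikitext):
    """Return a list of (name, value) pairs in document order."""
    return [(m.group(1), m.group(2)) for m in PROP_RE.finditer(wikitext)]

def render(wikitext):
    """Replace each property tag with its bare value."""
    return PROP_RE.sub(lambda m: m.group(2), wikitext)

text = "{{Prop:FirstName=Marlon}} {{Prop:Surname=Brando}}"
print(extract_properties(text))
print(render(text))
```

The same pattern serves both purposes: the captured pairs could feed an index, while the substitution produces the text the reader sees.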

Providing such machine-readable properties would greatly enhance the
ability to create lists automatically and to search, sort and
cross-reference entries. For example, one could search on
"Category:Actor Prop:FirstName=John"
which would return all actors called John. The Prop: namespace
would contain articles describing exactly what each property means
and what its valid values are.
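A search like the one above could be sketched as follows, assuming properties have already been extracted into an in-memory index. The `PAGES` data, `parse_query` and `search` are illustrative assumptions only; values containing spaces are not handled by this simple query format.

```python
# Hypothetical in-memory index: page title -> (categories, properties).
PAGES = {
    "John Wayne": ({"Actor"}, {"FirstName": "John", "Surname": "Wayne"}),
    "Marlon Brando": ({"Actor"}, {"FirstName": "Marlon", "Surname": "Brando"}),
    "John Grisham": ({"Novelist"}, {"FirstName": "John", "Surname": "Grisham"}),
}

def parse_query(query):
    """Split 'Category:Actor Prop:FirstName=John' into categories and filters."""
    categories, props = set(), {}
    for term in query.split():
        if term.startswith("Category:"):
            categories.add(term[len("Category:"):])
        elif term.startswith("Prop:"):
            name, _, value = term[len("Prop:"):].partition("=")
            props[name] = value
    return categories, props

def search(query, pages=PAGES):
    """Return sorted titles that are in every category and match every filter."""
    categories, props = parse_query(query)
    return sorted(
        title
        for title, (page_cats, page_props) in pages.items()
        if categories <= page_cats
        and all(page_props.get(k) == v for k, v in props.items())
    )

print(search("Category:Actor Prop:FirstName=John"))
```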

This request is similar to the request at bug #1775, but it is more
maintainable because the existing information in the article is
marked up and so there is no need to keep it in more than one place.
Additionally, it allows for properties and their values to be
documented.

It should also be possible to create hidden properties, e.g.
{{Prop:ShortDescription=Marlon Brando, Jr. was an American actor who
is widely regarded as the greatest film actor of the twentieth
century|}}.
Note the pipe at the end of the text, indicating that the value of
the property should not be rendered.
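One possible reading of the trailing-pipe rule, sketched under the assumption that a hidden property is simply rendered as an empty string. The pattern and the `render` function are hypothetical.

```python
import re

# Value, then an optional trailing pipe that marks the property as hidden.
PROP_RE = re.compile(r"\{\{Prop:(\w+)=([^{}|]*)(\|?)\}\}")

def render(wikitext):
    """Show visible property values; drop hidden ones (trailing pipe)."""
    def replace(match):
        value, hidden = match.group(2), match.group(3)
        return "" if hidden else value
    return PROP_RE.sub(replace, wikitext)

text = (
    "{{Prop:ShortDescription=An American actor|}}"
    "{{Prop:FirstName=Marlon}} {{Prop:Surname=Brando}}"
)
print(render(text))
```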

It would be necessary for properties to be containable in links so
that the property value could itself be a link or part of one.

Finally, it might make sense to allow compound properties, e.g.
{{Prop:Role|film=The Godfather|part=Vito Corleone|year=1972}}
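A compound property of this shape could be parsed into a name plus a dictionary of fields. This is only a sketch: it assumes the tag is closed with `}}` like the other examples, that values contain no pipes, and `parse_compound` is a hypothetical helper.

```python
def parse_compound(tag):
    """Parse '{{Prop:Name|key=value|...}}' into (name, fields)."""
    if not (tag.startswith("{{Prop:") and tag.endswith("}}")):
        raise ValueError("not a property tag: %r" % tag)
    head, *parts = tag[len("{{Prop:"):-2].split("|")
    fields = {}
    for part in parts:
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % part)
        fields[key.strip()] = value.strip()
    return head.strip(), fields

name, fields = parse_compound(
    "{{Prop:Role|film=The Godfather|part=Vito Corleone|year=1972}}"
)
print(name, fields)
```

All values stay strings here; converting `year` to a number would be a job for property typing.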


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28proposals%29#Semantics:_Categories.2C_properties_and_navigation

Details

Reference
bz1911

Related Objects

StatusSubtypeAssignedTask
InvalidNone
ResolvedNone
ResolvedNone

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 8:19 PM)
bzimport set Reference to bz1911.
bzimport added a subscriber: Unknown Object (MLST).

scot wrote:

Templates can be created which would allow entry of such information. Searching
and data standards with template implementation are other issues.

boris.povazay wrote:

Don't you think that properties should rather be placed within
categories?
This way the number of database lookups could be reduced, and properties
inside an article could be treated as variables, dynamically looked up
from the category.
The editor could feature a variable list and a property editor to edit
properties in a centralized manner.
Templates would allow for sortable and limited tables, navigation
bars, theme rings, or whatever else in a structured way while
keeping the text readable and maintainable. (There could be standard
templates and user-implemented ones, with an option to turn them on or off
on demand.)

NevilleDNZ.wikipedia wrote:

There have been various discussions on Wikipedia about using such structured
data in a table enhancement:
http://en.wikipedia.org/wiki/Wikipedia:Categories_for_deletion/Log/2005_September_1#Enhancement_table_example_and_toolbar

NevilleDNZ

boris.povazay wrote:

There is another point in favour of concentrating properties in a category: not every typo in an article would trigger a
completely new set of properties, and the properties within a group would converge. This would also resolve the
problem of having extremely large sets of global properties and allow for multiple "namespaces". I do not want to
link all the discussions that could be resolved with this feature, combined with calling these properties as variables
within the article and other articles of this group (e.g. dynamic data lookups).
One could create self-updating tables, navigation bars (looking up the entries before and after the current one), overviews (with
limited listings and automatic sorting by a property, with the name of the article also being one of these
properties), timelines, theme rings, etc.
To summarize, we need: properties within categories, variables to access these properties (= single-entry lookup) and
SQL-like table lookups with templates.
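The navigation-bar idea (looking up the entries before and after the current one) might be sketched like this, assuming the members of a category and their properties are available as a dictionary. The `navigation` function and the sample data are purely illustrative.

```python
def navigation(pages, current, sort_by):
    """Return (previous, next) titles around `current`, ordered by a property."""
    ordered = sorted(pages, key=lambda title: pages[title][sort_by])
    index = ordered.index(current)
    previous = ordered[index - 1] if index > 0 else None
    following = ordered[index + 1] if index + 1 < len(ordered) else None
    return previous, following

# Illustrative data only: members of one category with a numeric property.
films = {
    "Film A": {"year": 1972},
    "Film B": {"year": 1954},
    "Film C": {"year": 1979},
}
print(navigation(films, "Film A", "year"))
```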

You might want to search for "wikidata" in the mailing list archives,
there was some discussion about keywords some time ago.

http://www.google.com/search?hl=en&q=wikidata+site%3Amail.wikimedia.org
http://www.google.com/search?hl=en&q=wikidata+site%3Amail.wikipedia.org

john wrote:

There are some good reasons not to have properties in categories:

  1. Having the properties as marked-up text means that the information appears in only one place. e.g. {{Prop:FirstName=John}} {{Prop:LastName=Grisham}} (born {{Prop:Birthday=[[February 8]], [[1955]]}}) If the birthday is incorrect, it is only necessary to correct it in one place, rather than several.
  2. Suppose an article is in multiple categories, e.g. American novelists, Thriller writers, People from Arkansas. If we associate properties with categories, many of the same properties will have to be repeated across these categories e.g. firstname. This would be messy and annoying to maintain. Using global properties there is no need to repeat this information.
  3. It leaves no standard place to document the properties - documenting them on the category page would be messy and would also lead to duplication of a lot of documentation.

However, to address some of the problems raised above:

  1. You could associate a list of properties with a category (using a very similar syntax) and print a warning if a page in that category does not have all the required properties, e.g. "Warning: this article is in the 'American novelists' category, it should have 'FirstName' property."
  2. An attempt to give a value to an unknown property (e.g. due to a typo) should also give a warning or error when the page is saved/previewed.
  3. Whilst there is a danger in having a global set of properties, there are already means for disambiguating overloaded article names. The page for a property would provide sufficient description and examples to make the meaning of each property quite clear.
  4. If it was necessary to treat a set of properties as a group for some reason (e.g. to disambiguate them all at once) then a compound property could be used.
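The first two points above could be sketched as a check run on save or preview. The registries `KNOWN_PROPERTIES` and `REQUIRED_BY_CATEGORY` and the function `check_page` are assumptions for illustration; in the proposal this information would come from category pages and from pages in the Prop: namespace.

```python
# Hypothetical registries standing in for category and property pages.
KNOWN_PROPERTIES = {"FirstName", "LastName", "Birthday"}
REQUIRED_BY_CATEGORY = {"American novelists": ["FirstName", "LastName"]}

def check_page(categories, properties):
    """Return warnings (never errors) for a page being saved or previewed."""
    warnings = []
    # Unknown property names, e.g. caused by a typo.
    for name in properties:
        if name not in KNOWN_PROPERTIES:
            warnings.append("Warning: unknown property '%s'." % name)
    # Properties a category expects but the page does not set.
    for category in categories:
        for name in REQUIRED_BY_CATEGORY.get(category, []):
            if name not in properties:
                warnings.append(
                    "Warning: this article is in the '%s' category, "
                    "it should have '%s' property." % (category, name)
                )
    return warnings

warnings = check_page(
    ["American novelists"], {"FirstName": "John", "LastNmae": "Grisham"}
)
for line in warnings:
    print(line)
```

A single typo ("LastNmae") produces two warnings here: one for the unknown name and one for the required property that is now missing.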

Additional idea - it should be possible to call a template but pass in the name of a page whose properties should be
set as parameters to the template. In the model/view programming model, the article would be the model and the
template would be the view. So to create an infobox in a page, pass the current page properties to the infobox
template.
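The model/view idea can be sketched with Python's standard `string.Template` standing in for a wiki template. The infobox layout and `render_with_page` are hypothetical; real MediaWiki templates use a different parameter syntax.

```python
from string import Template

# Hypothetical infobox "view"; $-placeholders name the properties it needs.
INFOBOX = Template("Name: $FirstName $LastName\nBorn: $Birthday")

def render_with_page(template, page_properties):
    """Render a template, using a page's properties as its parameters."""
    # safe_substitute leaves a placeholder untouched if the page lacks
    # that property, rather than raising KeyError.
    return template.safe_substitute(page_properties)

# The article's properties act as the "model".
page = {"FirstName": "John", "LastName": "Grisham", "Birthday": "February 8, 1955"}
print(render_with_page(INFOBOX, page))
```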

Finally, if adopted, a more compact syntax than the one I suggested previously should be used e.g.
{{#FirstName=Marlon}} {{#LastName=Brando}}

john wrote:

One more thing - the page for a property could itself have properties
indicating the type of the property, valid values for the property, etc. so that
properties could be validated on entry.

For example in the English page for the 'Birthday' property you could have:
{{#Type=date}}.

or in the page for the 'Country' property, you could have

Valid values are:
*{{#Value=Afghanistan}}
*{{#Value=Albania}}
*{{#Value=Algeria}}
etc.

Then if you attempt to save a page with an invalid property e.g.

Born in {{#Country=Aghanistan}}, this great man...

You would get a warning (not an error):
'Aghanistan' is not a valid value for the 'Country' property.

The validation must be weakly enforced - so it would generate warnings rather than errors.
Weak enforcement allows correction of invalid properties to be carried out as a separate editing activity if
necessary (as can currently be done for tidying up grammar or spelling).
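Weak validation of this kind might look like the following sketch. The `DEFINITIONS` table stands in for what would be read from the property pages, and the ISO date format is an assumption made only to keep the example short.

```python
from datetime import datetime

# Hypothetical property definitions, as they might be read from the
# property pages ({{#Type=date}}, {{#Value=...}}).
DEFINITIONS = {
    "Birthday": {"type": "date"},
    "Country": {"values": {"Afghanistan", "Albania", "Algeria"}},
}

def validate(name, value, date_format="%Y-%m-%d"):
    """Return a warning string, or None if the value looks valid."""
    definition = DEFINITIONS.get(name, {})
    allowed = definition.get("values")
    if allowed is not None and value not in allowed:
        return "'%s' is not a valid value for the '%s' property." % (value, name)
    if definition.get("type") == "date":
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            return "'%s' is not a valid date for the '%s' property." % (value, name)
    return None

print(validate("Country", "Aghanistan"))
```

Because the function only returns a message and never raises, the caller can still save the page and show the warning alongside it.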

bp wrote:

I am still wondering whether local storage of property information is an efficient
way to resolve the problems.
In the example of the author, the main category would be "people", and
people have very different properties from "mountains". The "American
novelists" would be a lookup of business="Novelist" and citizenship="United
States of America", both inheriting the properties of their parent. That's what
a semantic structure is good for.
Thereby no duplication takes place and no artificial categories are generated,
like category:"American novelists, originally immigrated from Poland, now living
in cities 200m above sea-level".
If it is implementable and not too hard to keep these lists (=categories,
indices) updated, local storage would be possible too. However, it seems that
the category (group) + variables (properties) creates a natural namespace for
articles that can be maintained more easily...

john wrote:

We need to distinguish between the proposed logical design (syntax, functionality) and physical design
(implementation, database structure). Given the logical design I propose, we can choose the physical design to be
efficient for whatever sort/search we want to offer i.e. we can extract the properties into database tables however
we like. If a common task is category specific sort/search then we can create category-specific tables, provided
properties are associated with categories in the manner I suggested above.

The idea of merging properties from different categories relies on these having no properties in common (what if
someone is both an author and a politician - how do we merge conflicting 'name' properties?). To resolve this, we
would still need a global property set.
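The merge problem can be made concrete with a small sketch. It assumes each category carries its own property values for the same article; `merge_category_properties` and the sample data are hypothetical.

```python
def merge_category_properties(per_category):
    """Merge {category: {property: value}}; report names with differing values."""
    merged, conflicts = {}, {}
    for category, props in per_category.items():
        for name, value in props.items():
            if name in merged and merged[name] != value:
                # Same property name, different value: record every variant seen.
                conflicts.setdefault(name, {merged[name]}).add(value)
            else:
                merged.setdefault(name, value)
    return merged, conflicts

# Illustrative data: one person filed under two categories.
merged, conflicts = merge_category_properties({
    "Authors": {"name": "A. Writer", "genre": "Thriller"},
    "Politicians": {"name": "Alex Writer", "party": "Independent"},
})
print(merged)
print(conflicts)
```

The first value seen wins in `merged`, which is arbitrary; that arbitrariness is the reason a shared global definition of `name` would still be needed.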

Having global properties loosely associated with categories seems to give the best of both worlds: flexibility
combined with the possibility of efficient implementation.

bp wrote:

I agree concerning the design; however, if properties can be inherited there
shouldn't be such a problem. Even the allowed countries could be defined within
the category the property is part of (i.e. the definition of the variable 'country
of birth' in the category:'famous people' looks like 'country of
birth'={category:country, lookup:name, planet=earth, sortby:name}).
There is no need for global properties that are lost in semantic space;
disambiguation and association are performed directly by choosing the category.
Thereby anyone who wants to look up a famous guy immediately uses the correct
definition, and the new article can be sorted just like every other element.
Sorting in might be just a job of selecting the right properties out of a list.

  • That's usually what librarians do, but here it could be a much more detailed structure. One might for instance ask: "Which U.S. American presidents were born
in Michigan?" or "famous people born in Cardiff, Wales, U.K.?" in just one line.
Do you have an idea how one can do this without sacrificing the way
Wikipedia works? IMHO a dropdown list with the categories mentioned in the article
and their associated variables is better than a list of warnings.

Please see [http://meta.wikimedia.org/wiki/Semantic_MediaWiki this MediaWiki project]
and compare to your proposal.

bp wrote:

I would say it touches some very interesting aspects, however the implementation
of lists by template lookups of the ontology information has not even been
discussed yet.

voelkel wrote:

Semantic MediaWiki 0.4 has built-in support to <ask> for lists.

gero.scholz wrote:

Even without adding new "syntactic sugar" we can get more out of MediaWiki if we use
a tool like DPL (DynamicPageList). With DPL you can generate lists of articles which
match certain criteria (i.e. belong to a category, use a certain template, contain a link
to a certain page, match a name pattern, reside in a certain namespace) and you can
extract part of the contents, like chapters with a special heading,
marked sections or template arguments (replacing the original template invocation with a
different template that you define).

I know it is not an answer to all questions and it doesn't compete with "real semantic wikis",
but it can be quite useful ...

see http://semeb.com/dpldemo

ayg wrote:

*** Bug 10295 has been marked as a duplicate of this bug. ***

p.selitskas wrote:

Can we transit this to Wikidata?

(In reply to Pavel Selitskas [wizardist] from comment #16)

Can we transit this to Wikidata?

Yes, while I think that some details pertain to Semantic MediaWiki.

Now that we have Wikidata, this is basically fixed. :)