
Unicode normalization leaves red links
Closed, Resolved (Public)

Description

Author: muke

Description:
The normalization equates Ὀξύς (where ύ is U+03CD, upsilon with tonos) and Ὀξύς (where ύ is U+1F7B, upsilon with oxia); visiting http://la.wiktionary.org/wiki/Ὀξύς (with oxia) redirects without notice to http://la.wiktionary.org/wiki/Ὀξύς (with tonos). That's rational behavior.
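The equivalence described above is Unicode canonical equivalence: U+1F7B has a singleton canonical decomposition to U+03CD, so canonical composition (NFC) maps the oxia form to the tonos form. A minimal Python sketch, assuming NFC is the form applied to page titles (the report itself does not name the normalization form):

```python
import unicodedata

# Ὀξύς typed on a polytonic keyboard: ύ is U+1F7B (upsilon with oxia)
oxia = "\u1f48\u03be\u1f7b\u03c2"
# Ὀξύς as stored by the wiki: ύ is U+03CD (upsilon with tonos)
tonos = "\u1f48\u03be\u03cd\u03c2"

print(oxia == tonos)                                # False: different code points
print(unicodedata.decomposition("\u1f7b"))          # 03CD: singleton canonical decomposition
print(unicodedata.normalize("NFC", oxia) == tonos)  # True: NFC turns oxia into tonos
```

Because the decomposition is a singleton, no normalization form ever produces U+1F7B, which is why the oxia spelling can never survive as a distinct title.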

However:

  • A wikilink to [[Ὀξύς]] (with oxia) is not recognized by the software as pointing to an extant page. The link does "work" in the sense that it leads to Ὀξύς (with tonos), but the software renders it as an edit link, not a regular link. (The example is the first link in the table under "declinatio".)

The {{sofixit}} solution is to edit the link so that it points to the tonos form. This works, but it is counterintuitive for people with polytonic keyboards, who may expect a page they thought they created with oxia (which MediaWiki silently converted to tonos) to be linkable with oxia.
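The red link is what one would expect if the page title is stored in normalized form while the existence check compares the link target exactly as typed. A hypothetical sketch of that mismatch and of the obvious fix (normalize the target before the lookup); `existing_pages`, `link_class_raw` and `link_class_normalized` are illustrative names, not MediaWiki code, and NFC is assumed:

```python
import unicodedata

# Hypothetical page table: titles are stored after NFC normalization,
# mirroring the silent oxia -> tonos conversion described above.
existing_pages = {unicodedata.normalize("NFC", "\u1f48\u03be\u1f7b\u03c2")}

def link_class_raw(target: str) -> str:
    """Buggy lookup: compares the link target exactly as typed."""
    return "regular" if target in existing_pages else "edit"

def link_class_normalized(target: str) -> str:
    """Fixed lookup: normalizes the target before checking existence."""
    key = unicodedata.normalize("NFC", target)
    return "regular" if key in existing_pages else "edit"

oxia_link = "\u1f48\u03be\u1f7b\u03c2"  # [[Ὀξύς]] typed with oxia
print(link_class_raw(oxia_link))         # edit
print(link_class_normalized(oxia_link))  # regular
```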


Version: unspecified
Severity: minor
URL: http://la.wiktionary.org/wiki/Ὀξύς

Details

Reference: bz1375


Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 8:11 PM)
bzimport set Reference to bz1375.
bzimport added a subscriber: Unknown Object (MLST).

muke wrote:

Another part of this problem is that links to Ὀξύς with oxia
do not show up in the whatlinkshere for Ὀξύς with tonos.
And because Ὀξύς with oxia and Ὀξύς with tonos are merged in
URLs by the normalization, it's impossible to do a
whatlinkshere on Ὀξύς with oxia -- you get silently
redirected to the whatlinkshere on Ὀξύς with tonos. I don't
know if this is a separate bug or not.

For an example, compare:
http://la.wiktionary.org/wiki/Usor:Mycēs/oxia
http://la.wiktionary.org/wiki/Usor:Mycēs/tonos
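The Whatlinkshere symptom fits a backlink index keyed by the link target as typed: the request for the oxia title is redirected to the tonos title, and the oxia link was filed under a different key. A hypothetical sketch (the record layout and `build_backlinks` are illustrative, not MediaWiki's link tables) contrasting a raw index with one keyed by the NFC-normalized target:

```python
import unicodedata
from collections import defaultdict

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

OXIA = "\u1f48\u03be\u1f7b\u03c2"   # Ὀξύς with oxia
TONOS = "\u1f48\u03be\u03cd\u03c2"  # Ὀξύς with tonos

# Hypothetical link records: (source page, link target as typed).
links = [("Usor:Mycēs/oxia", OXIA), ("Usor:Mycēs/tonos", TONOS)]

def build_backlinks(records, normalize):
    """Map each (normalized) target title to the set of pages linking to it."""
    index = defaultdict(set)
    for source, target in records:
        index[normalize(target)].add(source)
    return index

raw = build_backlinks(links, lambda t: t)
fixed = build_backlinks(links, nfc)

# Whatlinkshere for either spelling ends up asking for the tonos title.
print(sorted(raw[TONOS]))    # ['Usor:Mycēs/tonos']
print(sorted(fixed[TONOS]))  # ['Usor:Mycēs/oxia', 'Usor:Mycēs/tonos']
```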

muke wrote:

Okay, text now appears to be normalized on input; this is no longer an issue.

gangleri wrote:

Hello!

  1. [[wiktionary:la:Usor:Mycēs/oxia]]
     ὀξύς with oxia generates
     http://la.wiktionary.org/w/index.php?title=%E1%BD%88%CE%BE%E1%BD%BB%CF%82&action=edit

  2. [[wiktionary:la:Usor:Mycēs/tonos]]
     ὀξύς with tonos ([[wiktionary:la:Special:whatlinkshere/ὀξύς]]) generates
     http://la.wiktionary.org/wiki/%E1%BD%88%CE%BE%CF%8D%CF%82

http://la.wiktionary.org/w/index.php?title=%E1%BD%88%CE%BE%E1%BD%BB%CF%82&action=edit
normalises to
http://la.wiktionary.org/wiki/%E1%BD%88%CE%BE%CF%8D%CF%82

or, with the differing character set apart:
http://la.wiktionary.org/w/index.php?title=%E1%BD%88%CE%BE %E1%BD%BB %CF%82&action=edit
http://la.wiktionary.org/wiki/%E1%BD%88%CE%BE %CF%8D %CF%82

The *Special:Whatlinkshere* links generated at *both* test pages are the same:

  1. http://la.wiktionary.org/wiki/Special:Whatlinkshere/%E1%BD%80%CE%BE%CF%8D%CF%82
  2. http://la.wiktionary.org/wiki/Special:Whatlinkshere/%E1%BD%80%CE%BE%CF%8D%CF%82

because

http://la.wiktionary.org/wiki/Special:Whatlinkshere/%E1%BD%88%CE%BE%E1%BD%BB%CF%82
is normalised by MediaWiki to
http://la.wiktionary.org/wiki/Special:Whatlinkshere/%E1%BD%88%CE%BE%CF%8D%CF%82
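The percent-encoded titles above can be decoded to see which UTF-8 byte sequence belongs to which character, and to check that normalizing the oxia title yields exactly the tonos URL. A sketch with Python's standard `urllib.parse`, assuming NFC as the normalization form:

```python
import unicodedata
from urllib.parse import quote, unquote

edit_title = unquote("%E1%BD%88%CE%BE%E1%BD%BB%CF%82")  # title in the oxia edit link
page_title = unquote("%E1%BD%88%CE%BE%CF%8D%CF%82")     # title in the tonos page URL

# One line per character: code point and its percent-encoded UTF-8 bytes.
for ch in edit_title:
    print(f"U+{ord(ch):04X} {quote(ch)}")

normalized = unicodedata.normalize("NFC", edit_title)
print(normalized == page_title)  # True
print(quote(normalized))         # %E1%BD%88%CE%BE%CF%8D%CF%82
```

Only the third character changes: `%E1%BD%BB` (U+1F7B) becomes `%CF%8D` (U+03CD), matching the spaced-out URLs in the comment.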

Information about these characters can be found here:

  1. Unicode Character GREEK SMALL LETTER UPSILON WITH OXIA - U+1F7B
     http://www.fileformat.info/info/unicode/char/1f7b/index.htm
     Unicode Block: Greek Extended
     HTML Entity: (decimal) &#8059; (hex) &#x1f7b;
     UTF-8 (hex): 0xE1 0xBD 0xBB (e1bdbb)

  2. Unicode Character GREEK SMALL LETTER UPSILON WITH TONOS - U+03CD
     http://www.fileformat.info/info/unicode/char/03cd/index.htm
     Unicode Block: Greek and Coptic
     HTML Entity: (decimal) &#973; (hex) &#x3cd;
     UTF-8 (hex): 0xCF 0x8D (cf8d)
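The character details listed above can be derived directly from the code points. A small sketch using only the Python standard library; `char_info` is a hypothetical helper name:

```python
import unicodedata

def char_info(ch: str) -> dict:
    """Collect name, code point, HTML entities and UTF-8 bytes for one character."""
    cp = ord(ch)
    utf8 = ch.encode("utf-8")
    return {
        "name": unicodedata.name(ch),
        "codepoint": f"U+{cp:04X}",
        "entity_dec": f"&#{cp};",
        "entity_hex": f"&#x{cp:x};",
        "utf8": " ".join(f"0x{b:02X}" for b in utf8),
        "utf8_hex": utf8.hex(),
    }

for ch in ("\u1f7b", "\u03cd"):  # upsilon with oxia, upsilon with tonos
    print(char_info(ch))
```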

Resolving as a duplicate of bug 1527;
further comments are posted there.

best regards reinhardt [[user:gangleri]]

gangleri wrote:

*** This bug has been marked as a duplicate of 1527 ***

gangleri wrote:

(In reply to comment #4)

http://la.wiktionary.org/w/index.php?title=%E1%BD%88%CE%BE%E1%BD%BB%CF%82&action=edit
normalises to
http://la.wiktionary.org/wiki/%E1%BD%88%CE%BE%CF%8D%CF%82

*Note:*
WIKT:LA does / can *not* use lowercase titles. This means that in this test case
the first lowercase letter is replaced with the corresponding capital letter. That
function (first-letter capitalisation) is also involved here.

If capitalisation were switched off, different links would be generated at
[[wiktionary:la:]].
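The interaction noted above can be sketched as two steps applied to every title: Unicode normalization, then uppercasing of the first character (in MediaWiki the latter is controlled by the `$wgCapitalLinks` setting). The function below is a hypothetical illustration of that order of operations, not MediaWiki's code; both the step order and the use of NFC are assumptions:

```python
import unicodedata

def normalize_title(title: str, capital_links: bool = True) -> str:
    """Hypothetical title normalization: NFC, then optional first-letter uppercasing."""
    title = unicodedata.normalize("NFC", title)
    if capital_links and title:
        # Python's str.upper maps U+1F40 (small omicron with psili) to U+1F48.
        title = title[0].upper() + title[1:]
    return title

typed = "\u1f40\u03be\u1f7b\u03c2"  # ὀξύς: lowercase first letter, upsilon with oxia
print(normalize_title(typed))                       # capital Ὀ, upsilon with tonos
print(normalize_title(typed, capital_links=False))  # lowercase ὀ, upsilon with tonos
```

With capitalisation on, the result is the title behind the `%E1%BD%88%CE%BE%CF%8D%CF%82` URL; with it off, the first letter stays U+1F40 and a different title (and link) results.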