Collation Demo Guide


Text collation supports language-sensitive comparison of strings, allowing for text searching and alphabetical sorting. The collation classes provide a choice of ordering strength (for example, to ignore or not ignore case differences) and handle ignored, expanding, and contracting characters.

Developers don't need to know anything about the collation rules for various languages. Any features requiring collation can use the collation object associated with the current default locale, or with a specific locale (like France or Japan) if appropriate.
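As a minimal sketch of that idea, the following uses the standard java.text.Collator class to sort a list with the collator for a given locale. The class name LocaleSortSketch and the helper sortFor are illustrative names, not part of the demo.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LocaleSortSketch {
    /** Returns a sorted copy of words, ordered by the collator for the given locale. */
    static List<String> sortFor(Locale locale, List<String> words) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator is a Comparator, so it plugs into sort directly
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");
        // Collator for the current default locale
        System.out.println(sortFor(Locale.getDefault(), words));
        // Collator for a specific locale
        System.out.println(sortFor(Locale.FRANCE, words));
    }
}
```

Unlike plain String comparison, which would put "Banana" first because of its uppercase letter, the collator compares base letters before case.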



Collation Basics

Correctly sorting strings is tricky, even in English. The results of a sort must be consistent: any given difference between strings must always be sorted the same way. The sort assigns relative priorities to different features of the text, based on the characters themselves and on the current ordering strength of the collation object.

To See This...

Do This...

Consistent sorting: In English, uppercase letters always sort after lowercase letters whenever there are no other differences in compared strings.

· Click on the Sort Ascending button.
· Click on the Sort Descending button.
· The relative order of "pat", "Pat", and "PAT" reverses.

Differences in ordering strength: Secondary ordering strength means case differences are disregarded (enabling case-insensitive searching). With primary ordering strength, accents are also ignored, so only base letters are compared.

· Select Primary from the Strength menu.
· Click alternately on Sort Ascending and Sort Descending.
· The relative order of "pat", "Pat", and "PAT" stays the same.
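The effect of the Strength menu can be sketched in code with Collator.setStrength. This is an illustration under the assumption that the runtime's English collation data behaves as described above; StrengthSketch and compareAt are hypothetical names.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthSketch {
    /** Compares a and b with the English collator set to the given strength. */
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        // Case is a tertiary difference: visible only at TERTIARY strength.
        System.out.println(compareAt(Collator.TERTIARY, "pat", "Pat"));
        System.out.println(compareAt(Collator.SECONDARY, "pat", "Pat"));
        // Accents are a secondary difference: ignored at PRIMARY strength.
        System.out.println(compareAt(Collator.SECONDARY, "peche", "p\u00eache"));
        System.out.println(compareAt(Collator.PRIMARY, "peche", "p\u00eache"));
    }
}
```

A result of zero means the two strings are treated as equal at that strength, which is why their relative order no longer flips between ascending and descending sorts.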

Other special characters, including accented or grouped characters, add other complications. For example, the "-" hyphen character in the word "black-bird" is only significant if the other letters in the compared strings are identical.


Localizable Collation

Different collation objects associated with various locales handle the differences required when sorting text strings for different languages.

To See This...

Do This...

In French, accent differences are sorted from the end of the word, so the ordering of "pêche" and "péché" changes from the English ordering.

· Select Tertiary from the Strength menu.
· Select French (France) from the Locale menu.

In German, the ordering of "Töne" changes because German treats ö (o + umlaut) as if it were oe.

· Select German (Germany) from the Locale menu.
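The Locale menu corresponds roughly to asking for a different collator per locale. The sketch below sorts the same words with three locales at tertiary strength and prints each result. The word list and the names LocaleOrderSketch and sortFor are assumptions made for illustration, and the actual orderings depend on the collation data shipped with the Java runtime, so they may not match the demo exactly.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LocaleOrderSketch {
    /** Sorts a copy of words with the given locale's collator at tertiary strength. */
    static List<String> sortFor(Locale locale, List<String> words) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.TERTIARY);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only:
        // p\u00eache = "pêche", p\u00e9ch\u00e9 = "péché", T\u00f6ne = "Töne"
        List<String> words = Arrays.asList(
                "p\u00eache", "p\u00e9ch\u00e9", "T\u00f6ne", "Tofu", "Tone");
        for (Locale locale : new Locale[] {Locale.ENGLISH, Locale.FRANCE, Locale.GERMANY}) {
            System.out.println(locale + ": " + sortFor(locale, words));
        }
    }
}
```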


Customization

You can produce a new collation by adding to or changing an existing one. In the demo, you do this in the Collation Rules field, which shows the rules that make up the collation sequence for the selected language. (At the start of the list are a number of odd-looking items such as "\u0308". These use Java notation for Unicode characters, used here because most browsers are currently unable to display the full range of Unicode characters.)

In all of the following examples, you can cut and paste sample rules or test cases instead of typing them in manually. Paste them at the end of the respective fields.

To See This...

Do This...

You can modify an existing collation. Adding items at the end of a collation overrides earlier information.

For example, you can make the letter P sort at the end of the alphabet.

· Enter the sample rules at the end of the Collation Rules field.
· Hit the Set Rules button.
· Select Sort Ascending to see the resulting sort order.

Sample Rules:

< p , P

Making P sort at the end may not seem terribly useful, but the same technique is used to modify the sorting sequence for different languages.
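In code, the same override can be sketched with java.text.RuleBasedCollator by appending the sample rule to an existing rule string. To keep the example self-contained, it starts from a small toy alphabet (an assumption for illustration) rather than the full rules of a locale; with a real locale, the base string would typically come from the getRules() method of that locale's RuleBasedCollator. The names AppendRuleSketch, toyAlphabetRules, and build are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

class AppendRuleSketch {
    /** Toy base rules: a < b < ... < z, each lowercase letter just before its uppercase form. */
    static String toyAlphabetRules() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(" , ").append(Character.toUpperCase(c)).append(' ');
        }
        return rules.toString();
    }

    /** Builds a collator, converting the checked ParseException into an unchecked one. */
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator base = build(toyAlphabetRules());
        // Appending "< p , P" moves p and P after every other entry in the rules.
        RuleBasedCollator pLast = build(base.getRules() + "< p , P");
        System.out.println(base.compare("pat", "zoo") < 0);  // p before z in the base rules
        System.out.println(pLast.compare("zoo", "pat") < 0); // z before p after the override
    }
}
```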

To See This...

Do This...

You can add new rules to an existing collation. For example, you can add CH as a single letter after C, as in traditional Spanish sorting.

· Enter the sample rules at the end of the Collation Rules field.
· Enter the test cases at the end of the test field.
· Hit the Set Rules button.
· Select Sort Ascending to see the resulting sort order.

Sample Rules:

& c < ch , cH, Ch, CH

Sample Test Cases:

cat
czar
churo
darn
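The CH rule and its test cases can be tried outside the demo with a sketch like the one below. It appends the sample rule to a toy a-to-z alphabet (an assumption that keeps the example self-contained, standing in for a locale's real rules). ContractionSketch and its helpers are illustrative names.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ContractionSketch {
    /** Toy base rules: a < b < ... < z, each lowercase letter just before its uppercase form. */
    static String toyAlphabetRules() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(" , ").append(Character.toUpperCase(c)).append(' ');
        }
        return rules.toString();
    }

    /** Returns a collator in which "ch" is a single letter sorting after c. */
    static RuleBasedCollator withCh() {
        try {
            return new RuleBasedCollator(toyAlphabetRules() + "& c < ch , cH, Ch, CH");
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("darn", "churo", "czar", "cat"));
        words.sort(withCh());
        // "churo" lands after "czar" because "ch" now counts as its own letter after c.
        System.out.println(words);
    }
}
```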

As well as adding sequences of characters that act as a single character (this is known as contraction), you can also add characters that act like a sequence of characters (this is known as expansion).

To See This...

Do This...

You can also add other sequences to the collation rules, such as sorting symbols with their alphabetic equivalents.

· Enter the sample rules at the end of the Collation Rules field.
· Enter the test cases at the end of the test field.
· Hit the Set Rules button.
· Select Sort Ascending to see the resulting sort order.

Sample Rules:

& Question-mark ; ?
& Hash-mark ; #
& Ampersand ; '&'

Sample Test Cases:

?
#
&
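An expansion can be sketched in code as follows, using only the ampersand rule from the samples above and a toy a-to-z alphabet as the base (an assumption for illustration). The other two sample rules are left out because, as far as the RuleBasedCollator documentation describes it, punctuation such as "?", "#", and "-" must be enclosed in single quotes in rule strings, so they would need quoting before they could be used this way. ExpansionSketch and its helpers are hypothetical names.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

class ExpansionSketch {
    /** Toy base rules: a < b < ... < z, each lowercase letter just before its uppercase form. */
    static String toyAlphabetRules() {
        StringBuilder rules = new StringBuilder();
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(" , ").append(Character.toUpperCase(c)).append(' ');
        }
        return rules.toString();
    }

    /** Returns a collator in which '&' sorts like the word "Ampersand". */
    static RuleBasedCollator withAmpersand(int strength) {
        try {
            RuleBasedCollator collator =
                    new RuleBasedCollator(toyAlphabetRules() + "& Ampersand ; '&'");
            collator.setStrength(strength);
            return collator;
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        // At primary strength the symbol and the word are indistinguishable.
        System.out.println(withAmpersand(Collator.PRIMARY).compare("&", "Ampersand"));
        // At tertiary strength the ';' (secondary) difference puts the word first.
        System.out.println(withAmpersand(Collator.TERTIARY).compare("Ampersand", "&") < 0);
    }
}
```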

Expansion and contraction can actually be combined.

To See This...

Do This...

In Japanese there is a length character that acts as though it doubles a character in sorting. Using analogous English letters, it would be as though "a-" sorts as "aa", "e-" sorts as "ee", etc.

· Enter the sample rules at the end of the Collation Rules field.
· Enter the test cases at the end of the test field.
· Hit the Set Rules button.
· Select Sort Ascending to see the resulting sort order.

Sample Rules:

& aa ; a-
& ee ; e-
& ii ; i-
& oo ; o-
& uu ; u-

Sample Test Cases:

aardvark
a-rdvark
abbot
coop
co-p
cop
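The combined case can be sketched the same way. Two things here are assumptions rather than part of the demo: the base is a toy alphabet in which the hyphen has its own weight before "a", and the hyphen is written in single quotes, which the RuleBasedCollator rule syntax requires for punctuation as far as its documentation describes. Only the "a" and "o" rules are included, since those are the ones the test cases exercise. LengthMarkSketch and its helpers are hypothetical names.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LengthMarkSketch {
    /** Toy base rules: hyphen first, then a < b < ... < z with lowercase before uppercase. */
    static String toyRules() {
        StringBuilder rules = new StringBuilder("< '-' ");
        for (char c = 'a'; c <= 'z'; c++) {
            rules.append("< ").append(c).append(" , ").append(Character.toUpperCase(c)).append(' ');
        }
        return rules.toString();
    }

    /** Returns a collator in which "a-" sorts like "aa" and "o-" sorts like "oo". */
    static RuleBasedCollator withLengthMark() {
        try {
            // "a-" is a contraction (two characters, one entry) that also
            // expands (it is weighted like the two characters "aa").
            return new RuleBasedCollator(toyRules() + "& aa ; a'-' & oo ; o'-'");
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList(
                "cop", "co-p", "coop", "abbot", "a-rdvark", "aardvark"));
        words.sort(withLengthMark());
        System.out.println(words);
    }
}
```

In this sketch, "a-rdvark" sorts directly after "aardvark" and "co-p" directly after "coop", because each hyphenated form differs from its doubled-letter form only at the secondary level.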


For more information on how collation rules are constructed, see Details. You can also type in additional words to see different collation behaviors. Try it out!






This demo was developed by Taligent and modified by JavaSoft.
© Copyright 1997. All rights reserved. Taligent, Inc., IBM Corp.