"unicodedata" --- Unicode Database
**********************************

======================================================================

This module provides access to the Unicode Character Database
(UCD), which defines character properties for all Unicode characters.
The data contained in this database is compiled from the UCD version
16.0.0.

The module uses the same names and symbols as defined by Unicode
Standard Annex #44, "Unicode Character Database". It defines the
following functions:

See also:

  Unicode HOWTO for more information about Unicode and how to use this
  module.

unicodedata.lookup(name)

   Look up a character by name. If a character with the given name is
   found, return the corresponding character. If not found, "KeyError"
   is raised. For example:

      >>> unicodedata.lookup('LEFT CURLY BRACKET')
      '{'

   The characters returned by this function are the same as those
   produced by the "\N" escape sequence in string literals. For example:

      >>> unicodedata.lookup('MIDDLE DOT') == '\N{MIDDLE DOT}'
      True

   Changed in version 3.3: Support for name aliases [1] and named
   sequences [2] has been added.
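   A minimal sketch of the lookups mentioned above, assuming the alias
   and named-sequence data referenced in [1] and [2]: a control-character
   alias, a named sequence that expands to two code points, and an
   unknown name. The helper "safe_lookup" is hypothetical and only turns
   the "KeyError" into "None".

```python
import unicodedata

def safe_lookup(name):
    """Hypothetical helper: return the character(s) for *name*, or None if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

# A name alias: control characters have no formal name, but they do have aliases.
print(repr(safe_lookup('LINE FEED')))                        # '\n'
# A named sequence resolves to more than one code point.
print(len(safe_lookup('LATIN SMALL LETTER R WITH TILDE')))   # 2
# lookup() raises KeyError for unknown names; the helper returns None instead.
print(safe_lookup('NO SUCH CHARACTER NAME'))                 # None
```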

unicodedata.name(chr, default=None, /)

   Returns the name assigned to the character *chr* as a string. If no
   name is defined, *default* is returned, or, if not given,
   "ValueError" is raised. For example:

      >>> unicodedata.name('½')
      'VULGAR FRACTION ONE HALF'
      >>> unicodedata.name('\uFFFF', 'fallback')
      'fallback'
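   Passing a *default* makes "name()" safe to call on arbitrary text,
   including control characters, which have no formal name. A small
   sketch with the hypothetical helper "describe":

```python
import unicodedata

def describe(text):
    """Hypothetical helper: pair each character with its name, or a placeholder."""
    return [(ch, unicodedata.name(ch, '<no name>')) for ch in text]

for ch, char_name in describe('a\n\N{VULGAR FRACTION ONE HALF}'):
    print(repr(ch), char_name)
# 'a' LATIN SMALL LETTER A
# '\n' <no name>
# '½' VULGAR FRACTION ONE HALF
```

   For named characters, "lookup(name(c))" returns "c" again.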

unicodedata.decimal(chr, default=None, /)

   Returns the decimal value assigned to the character *chr* as an
   integer. If no such value is defined, *default* is returned, or, if
   not given, "ValueError" is raised. For example:

      >>> unicodedata.decimal('\N{ARABIC-INDIC DIGIT NINE}')
      9
      >>> unicodedata.decimal('\N{SUPERSCRIPT NINE}', -1)
      -1
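   As an illustration of "decimal()", the sketch below reads a run of
   decimal digits written in any script. The helper
   "parse_decimal_digits" is hypothetical; in practice Python's "int()"
   already accepts Unicode decimal digits, as the last line shows.

```python
import unicodedata

def parse_decimal_digits(text):
    """Hypothetical helper: read a run of decimal digits from any script as an int."""
    if not text:
        raise ValueError('empty string')
    value = 0
    for ch in text:
        # decimal() raises ValueError for characters that have no decimal value,
        # such as superscript digits or letters.
        value = value * 10 + unicodedata.decimal(ch)
    return value

arabic_42 = '\N{ARABIC-INDIC DIGIT FOUR}\N{ARABIC-INDIC DIGIT TWO}'
print(parse_decimal_digits(arabic_42))  # 42
print(int(arabic_42))                   # 42: int() already accepts these digits
```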

unicodedata.digit(chr, default=None, /)

   Returns the digit value assigned to the character *chr* as an
   integer. If no such value is defined, *default* is returned, or, if
   not given, "ValueError" is raised. For example:

      >>> unicodedata.digit('\N{SUPERSCRIPT NINE}')
      9

unicodedata.numeric(chr, default=None, /)

   Returns the numeric value assigned to the character *chr* as a
   float. If no such value is defined, *default* is returned, or, if not
   given, "ValueError" is raised. For example:

      >>> unicodedata.numeric('½')
      0.5
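   "decimal()", "digit()" and "numeric()" differ in which characters
   they accept. A small sketch with the hypothetical helper
   "numeric_properties" shows them side by side, passing "None" as
   *default* so nothing raises:

```python
import unicodedata

def numeric_properties(ch):
    """Hypothetical helper: (decimal, digit, numeric) for *ch*, None where undefined."""
    return (
        unicodedata.decimal(ch, None),  # decimal digits only
        unicodedata.digit(ch, None),    # also digits such as superscripts
        unicodedata.numeric(ch, None),  # any numeric value, as a float
    )

for ch in ['7', '\N{SUPERSCRIPT NINE}', '\N{VULGAR FRACTION ONE HALF}']:
    print(repr(ch), numeric_properties(ch))
# '7' (7, 7, 7.0)
# '⁹' (None, 9, 9.0)
# '½' (None, None, 0.5)
```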

unicodedata.category(chr)

   Returns the general category assigned to the character *chr* as a
   string. General category names consist of two letters. See the
   General Category Values section of the Unicode Character Database
   documentation for a list of category codes. For example:

      >>> unicodedata.category('A')  # 'L'etter, 'u'ppercase
      'Lu'
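   Because the first letter of a category code names its major class
   (for example, "P" for punctuation), category prefixes make simple
   filters. A sketch with the hypothetical helper "strip_punctuation":

```python
import unicodedata

def strip_punctuation(text):
    """Hypothetical helper: drop every character whose general category starts with 'P'."""
    return ''.join(ch for ch in text if not unicodedata.category(ch).startswith('P'))

print(unicodedata.category(','))                    # Po
print(strip_punctuation('Hello, world! «quoted»'))  # Hello world quoted
```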

unicodedata.bidirectional(chr)

   Returns the bidirectional class assigned to the character *chr* as a
   string. If no such value is defined, an empty string is returned.
   See the Bidirectional Class Values section of the Unicode Character
   Database documentation for a list of bidirectional codes. For
   example:

      >>> unicodedata.bidirectional('\N{ARABIC-INDIC DIGIT SEVEN}') # 'A'rabic, 'N'umber
      'AN'
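   A rough sketch, not an implementation of the Unicode Bidirectional
   Algorithm: the hypothetical helper "contains_rtl" checks whether a
   string contains any strong right-to-left character, assuming the
   classes "R" (for example, Hebrew letters) and "AL" (Arabic letters)
   are the ones of interest.

```python
import unicodedata

# Assumption: only strong right-to-left classes count; 'AN' (Arabic number) does not.
RTL_CLASSES = {'R', 'AL'}

def contains_rtl(text):
    """Hypothetical helper: True if any character has a strong RTL bidirectional class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)

print(contains_rtl('abc'))                              # False
print(contains_rtl('abc \N{HEBREW LETTER ALEF}'))       # True
print(contains_rtl('\N{ARABIC-INDIC DIGIT SEVEN}'))     # False: class 'AN'
```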

unicodedata.combining(chr)

   Returns the canonical combining class assigned to the character
   *chr* as an integer. Returns "0" if no combining class is defined.
   See the Canonical Combining Class Values section of the Unicode
   Character Database for more information.
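   A common use of "combining()" is to drop accents after canonical
   decomposition. The sketch below, with the hypothetical helper
   "strip_marks", is a heuristic: letters without a decomposition (such
   as "ø") are left unchanged, and some marks have combining class "0".

```python
import unicodedata

print(unicodedata.combining('a'))       # 0: a base character
print(unicodedata.combining('\u0301'))  # 230: COMBINING ACUTE ACCENT

def strip_marks(text):
    """Hypothetical helper: decompose (NFD), then drop characters with a nonzero combining class."""
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_marks('A\u00e7\u00e3o'))   # Acao
```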

unicodedata.east_asian_width(chr)

   Returns the East Asian width assigned to the character *chr* as a
   string. For a list of widths and more information, see the Unicode
   Standard Annex #11.
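   A rough sketch of how the width property can estimate terminal
   columns. The helper "display_width" is hypothetical and simplified:
   it counts wide ("W") and fullwidth ("F") characters as two columns
   and everything else as one, ignoring ambiguous ("A") widths, combining
   marks and control characters.

```python
import unicodedata

def display_width(text):
    """Hypothetical helper: approximate column count for monospaced display."""
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1 for ch in text)

print(unicodedata.east_asian_width('a'))       # Na
print(unicodedata.east_asian_width('\u65e5'))  # W
print(unicodedata.east_asian_width('\uff21'))  # F (FULLWIDTH LATIN CAPITAL LETTER A)
print(display_width('abc\u65e5\u672c'))        # 7
```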

unicodedata.mirrored(chr)

   Returns the mirrored property assigned to the character *chr* as an
   integer. Returns "1" if the character has been identified as a
   "mirrored" character in bidirectional text, "0" otherwise. For
   example:

      >>> unicodedata.mirrored('>')
      1
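   A short sketch that picks out the mirrored characters in a string;
   brackets and comparison signs are typical examples, while letters and
   "=" are not mirrored:

```python
import unicodedata

text = 'f(x) <= [a, b] and {c}'
mirrored_chars = ''.join(ch for ch in text if unicodedata.mirrored(ch))
print(mirrored_chars)  # ()<[]{}
```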

unicodedata.decomposition(chr)

   Returns the character decomposition mapping assigned to the
   character *chr* as a string. An empty string is returned in case no
   such mapping is defined. For example:

      >>> unicodedata.decomposition('Ã')
      '0041 0303'
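   The returned string is a space-separated list of hexadecimal code
   points, optionally preceded by a tag in angle brackets (such as
   "<compat>" or "<fraction>") for compatibility mappings. A minimal
   parser, with the hypothetical name "parse_decomposition":

```python
import unicodedata

def parse_decomposition(ch):
    """Hypothetical helper: split decomposition() output into (tag, characters).

    The tag is None for canonical mappings and for characters with no mapping.
    """
    fields = unicodedata.decomposition(ch).split()
    tag = None
    if fields and fields[0].startswith('<'):
        tag = fields.pop(0).strip('<>')
    return tag, ''.join(chr(int(code, 16)) for code in fields)

print(parse_decomposition('\N{VULGAR FRACTION ONE HALF}'))  # ('fraction', '1⁄2')
print(parse_decomposition('a'))                             # (None, '')
```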

unicodedata.normalize(form, unistr)

   Return the normal form *form* for the Unicode string *unistr*.
   Valid values for *form* are 'NFC', 'NFKC', 'NFD', and 'NFKD'.

   The Unicode standard defines various normalization forms of a
   Unicode string, based on the definition of canonical equivalence
   and compatibility equivalence. In Unicode, several characters can
   be expressed in various ways. For example, the character U+00C7
   (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the
   sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING
   CEDILLA).

   For each character, there are two normal forms: normal form C and
   normal form D. Normal form D (NFD) is also known as canonical
   decomposition, and translates each character into its decomposed
   form. Normal form C (NFC) first applies a canonical decomposition,
   then composes pre-combined characters again.
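   Using the U+00C7 example above, a short sketch of how the two
   spellings compare before and after canonical normalization:

```python
import unicodedata

precomposed = '\u00c7'  # LATIN CAPITAL LETTER C WITH CEDILLA
decomposed = 'C\u0327'  # LATIN CAPITAL LETTER C + COMBINING CEDILLA

print(precomposed == decomposed)                                # False
print(len(unicodedata.normalize('NFD', precomposed)))           # 2
print(len(unicodedata.normalize('NFC', decomposed)))            # 1
print(unicodedata.normalize('NFC', decomposed) == precomposed)  # True
```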

   In addition to these two forms, there are two additional normal
   forms based on compatibility equivalence. In Unicode, certain
   characters are supported which normally would be unified with other
   characters. For example, U+2160 (ROMAN NUMERAL ONE) is really the
   same thing as U+0049 (LATIN CAPITAL LETTER I). However, it is
   supported in Unicode for compatibility with existing character sets
   (for example, gb2312).

   The normal form KD (NFKD) will apply the compatibility
   decomposition, that is, replace all compatibility characters with
   their equivalents. The normal form KC (NFKC) first applies the
   compatibility decomposition, followed by the canonical composition.
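   A brief sketch contrasting canonical and compatibility forms:
   NFC leaves ROMAN NUMERAL ONE alone, while the compatibility forms
   replace it, a vulgar fraction and a ligature with their plain
   equivalents.

```python
import unicodedata

roman_one = '\u2160'  # ROMAN NUMERAL ONE
print(unicodedata.normalize('NFC', roman_one) == roman_one)  # True: canonical form keeps it
print(unicodedata.normalize('NFKC', roman_one))              # I
print(unicodedata.normalize('NFKD', '\u00bd'))               # 1⁄2 (with FRACTION SLASH)
print(unicodedata.normalize('NFKC', '\ufb01'))               # fi
```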

   Even if two Unicode strings are normalized and look the same to a
   human reader, they may not compare equal if one has combining
   characters and the other doesn't (for example, when one is in NFC and
   the other in NFD).
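   One way to avoid such mismatches is to normalize both strings to the
   same form before comparing. A sketch with the hypothetical helper
   "normalized_equal":

```python
import unicodedata

def normalized_equal(a, b, form='NFC'):
    """Hypothetical helper: compare two strings after normalizing both to *form*."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

a = 'A\u00e7\u00e3o'                    # precomposed characters
b = unicodedata.normalize('NFD', a)     # same text, decomposed
print(a == b)                           # False
print(normalized_equal(a, b))           # True
print(normalized_equal('\u2160', 'I'))          # False: only compatibility-equivalent
print(normalized_equal('\u2160', 'I', 'NFKC'))  # True
```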

unicodedata.is_normalized(form, unistr)

   Return whether the Unicode string *unistr* is in the normal form
   *form*. Valid values for *form* are 'NFC', 'NFKC', 'NFD', and
   'NFKD'.

   Added in version 3.8.
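   A minimal sketch of "is_normalized()" used as a guard before
   normalizing. The helper "ensure_nfc" is hypothetical; CPython's
   "normalize()" may already perform a similar quick check internally,
   so the guard mainly illustrates the API.

```python
import unicodedata

def ensure_nfc(text):
    """Hypothetical helper: return *text* unchanged if it is already NFC."""
    if unicodedata.is_normalized('NFC', text):
        return text
    return unicodedata.normalize('NFC', text)

print(unicodedata.is_normalized('NFC', '\u00c7'))   # True
print(unicodedata.is_normalized('NFC', 'C\u0327'))  # False
print(unicodedata.is_normalized('NFD', 'C\u0327'))  # True
```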

In addition, the module exposes the following constants:

unicodedata.unidata_version

   The version of the Unicode database used in this module.
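   The version is a string such as "16.0.0". A small sketch that turns
   it into a tuple for feature checks; the threshold used below is only
   an example:

```python
import unicodedata

# Convert '16.0.0'-style version strings into a comparable tuple of ints.
version_tuple = tuple(int(part) for part in unicodedata.unidata_version.split('.'))
print(unicodedata.unidata_version)

if version_tuple >= (16, 0, 0):
    print('Unicode 16.0 data is available.')
else:
    print('Running with an older Unicode database.')
```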

unicodedata.ucd_3_2_0

   This is an object that has the same methods as the entire module,
   but uses the Unicode database version 3.2 instead, for applications
   that require this specific version of the Unicode database (such as
   IDNA).
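   A short sketch comparing the current database with "ucd_3_2_0" for a
   character assigned long after Unicode 3.2 (GRINNING FACE, U+1F600):
   in the 3.2 view it has no name and is reported as unassigned ("Cn").

```python
import unicodedata

old = unicodedata.ucd_3_2_0
print(old.unidata_version)                        # 3.2.0

grinning = '\N{GRINNING FACE}'
print(unicodedata.name(grinning))                 # GRINNING FACE
print(old.name(grinning, '<unassigned in 3.2>'))  # <unassigned in 3.2>
print(old.category(grinning))                     # Cn
```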

-[ Footnotes ]-

[1] https://www.unicode.org/Public/16.0.0/ucd/NameAliases.txt

[2] https://www.unicode.org/Public/16.0.0/ucd/NamedSequences.txt
