32.7. "tokenize" — Tokenizer for Python source
**********************************************

**Source code:** Lib/tokenize.py

======================================================================

The "tokenize" module provides a lexical scanner for Python source
code, implemented in Python.  The scanner in this module returns
comments as tokens as well, which makes it useful for implementing
*pretty-printers*, typically for syntax highlighting.

To simplify token stream handling, all operator and delimiter tokens
are returned using the generic "token.OP" token type.  The exact type
can be determined by checking the second field of the tuples returned
by "tokenize.generate_tokens()" (it contains the actual token string
matched) for the character sequence that identifies a specific
operator token.
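
As an illustration (a minimal sketch, not part of the original
documentation), the exact operator can be recovered from the token
string; the source is wrapped in a "StringIO" object to obtain a
*readline* callable:

   from StringIO import StringIO
   import token
   import tokenize

   source = "x = (1 + 2)\n"
   g = tokenize.generate_tokens(StringIO(source).readline)
   for toknum, tokval, _, _, _ in g:
       if toknum == token.OP:
           # All operators and delimiters share the OP type; the token
           # string tells them apart ('=', '(', '+', ')').
           print "operator:", repr(tokval)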

The primary entry point is a *generator*:

tokenize.generate_tokens(readline)

   The "generate_tokens()" generator requires one argument,
   *readline*, which must be a callable object which provides the same
   interface as the "readline()" method of built-in file objects (see
   section Objets fichiers).  Each call to the function should return
   one line of input as a string. Alternately, *readline* may be a
   callable object that signals completion by raising "StopIteration".

   The generator produces 5-tuples with these members: the token type;
   the token string; a 2-tuple "(srow, scol)" of ints specifying the
   row and column where the token begins in the source; a 2-tuple
   "(erow, ecol)" of ints specifying the row and column where the
   token ends in the source; and the line on which the token was
   found.  The line passed (the last tuple item) is the *logical*
   line; continuation lines are included.

   New in version 2.2.
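
   As a brief sketch (not from the original documentation), the
   following feeds "generate_tokens()" a *readline* callable built
   from an in-memory string and prints the members of each 5-tuple
   for a statement that continues over two physical lines:

      from StringIO import StringIO
      from tokenize import generate_tokens

      source = "total = (1 +\n         2)\n"
      g = generate_tokens(StringIO(source).readline)
      for toknum, tokval, start, end, line in g:
          # start and end are (row, col) pairs; line is the text of
          # the line the token was found on.
          print toknum, repr(tokval), start, end, repr(line)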

An older entry point is retained for backward compatibility:

tokenize.tokenize(readline[, tokeneater])

   The "tokenize()" function accepts two parameters: one representing
   the input stream, and one providing an output mechanism for
   "tokenize()".

   The first parameter, *readline*, must be a callable object which
   provides the same interface as the "readline()" method of built-in
   file objects (see section Objets fichiers).  Each call to the
   function should return one line of input as a string. Alternately,
   *readline* may be a callable object that signals completion by
   raising "StopIteration".

   Changed in version 2.5: Added "StopIteration" support.

   The second parameter, *tokeneater*, must also be a callable object.
   It is called once for each token, with five arguments,
   corresponding to the tuples generated by "generate_tokens()".
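
   A minimal sketch of this older interface follows; the callback
   name "print_token" is illustrative only:

      from StringIO import StringIO
      import tokenize

      def print_token(toknum, tokval, start, end, line):
          # Called once per token with the same five values that
          # generate_tokens() would yield as a tuple; tok_name maps
          # token numbers to their names.
          print tokenize.tok_name[toknum], repr(tokval), start, end

      tokenize.tokenize(StringIO("x = 1\n").readline, print_token)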

All constants from the "token" module are also exported from
"tokenize", as are two additional token type values that might be
passed to the *tokeneater* function by "tokenize()":

tokenize.COMMENT

   Token value used to indicate a comment.

tokenize.NL

   Token value used to indicate a non-terminating newline.  The
   NEWLINE token indicates the end of a logical line of Python code;
   NL tokens are generated when a logical line of code is continued
   over multiple physical lines.
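
The following sketch (not part of the original documentation) makes
both token types visible: the comment-only first line produces a
"COMMENT" token followed by "NL", while the assignment on the second
line ends with a regular "NEWLINE":

   from StringIO import StringIO
   import tokenize

   source = "# a comment\nx = 1\n"
   g = tokenize.generate_tokens(StringIO(source).readline)
   for toknum, tokval, _, _, _ in g:
       # tok_name maps token numbers to their names.
       print tokenize.tok_name[toknum], repr(tokval)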

Another function is provided to reverse the tokenization process.
This is useful for creating tools that tokenize a script, modify the
token stream, and write back the modified script.

tokenize.untokenize(iterable)

   Converts tokens back into Python source code.  The *iterable* must
   return sequences with at least two elements, the token type and the
   token string.  Any additional sequence elements are ignored.

   The reconstructed script is returned as a single string.  The
   result is guaranteed to tokenize back to match the input, so that
   the conversion is lossless and round-trips are assured.  The
   guarantee applies only to the token type and token string, as the
   spacing between tokens (column positions) may change.

   New in version 2.5.
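
   A minimal round-trip sketch: the stream is reduced to (type,
   string) pairs before being passed to "untokenize()", so the
   spacing of the rebuilt source may differ even though it tokenizes
   back to the same pairs:

      from StringIO import StringIO
      from tokenize import generate_tokens, untokenize

      source = "x=(1+ 2)\n"
      g = generate_tokens(StringIO(source).readline)
      pairs = [(toknum, tokval) for toknum, tokval, _, _, _ in g]
      print repr(untokenize(pairs))   # spacing changes, tokens do not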

exception tokenize.TokenError

   Raised when either a docstring or an expression that could be
   split over several lines is not completed anywhere in the file,
   for example:

      """Beginning of
      docstring

   or:

      [1,
       2,
       3

Note that unclosed single-quoted strings do not cause an error to be
raised. They are tokenized as "ERRORTOKEN", followed by the
tokenization of their contents.
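
Both behaviours can be seen with a short sketch (not part of the
original documentation): the unterminated triple-quoted string raises
"TokenError" while the generator is consumed, whereas the unclosed
single-quoted string yields an "ERRORTOKEN" followed by the tokens of
its contents:

   from StringIO import StringIO
   import tokenize

   g = tokenize.generate_tokens(
       StringIO('"""Beginning of\ndocstring\n').readline)
   try:
       list(g)                        # consuming the generator raises
   except tokenize.TokenError, err:
       print "TokenError:", err

   g = tokenize.generate_tokens(StringIO("'unterminated\n").readline)
   for tok in g:
       print tokenize.tok_name[tok[0]], repr(tok[1])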

Example of a script re-writer that transforms float literals into
Decimal objects:

   from StringIO import StringIO
   from tokenize import generate_tokens, untokenize, NUMBER, NAME, OP, STRING

   def decistmt(s):
       """Substitute Decimals for floats in a string of statements.

       >>> from decimal import Decimal
       >>> s = 'print +21.3e-5*-.1234/81.7'
       >>> decistmt(s)
       "print +Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7')"

       >>> exec(s)
       -3.21716034272e-007
       >>> exec(decistmt(s))
       -3.217160342717258261933904529E-7

       """
       result = []
       g = generate_tokens(StringIO(s).readline)   # tokenize the string
       for toknum, tokval, _, _, _  in g:
           if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
               result.extend([
                   (NAME, 'Decimal'),
                   (OP, '('),
                   (STRING, repr(tokval)),
                   (OP, ')')
               ])
           else:
               result.append((toknum, tokval))
       return untokenize(result)
