32.1. parser
— Accès aux arbres syntaxiques¶
Le module parser
expose une interface à l’analyseur et au compilateur de byte-code internes de Python. Son objectif principal est de permettre à du code Python de modifier l’arbre syntaxique d’une expression Python puis de la rendre exécutable. Cette approche est plus fiable que celle consistant à manipuler des chaines de caractères, puisque l’analyse est faite avec le même analyseur que celui utilisé pour le code de l’application. C’est aussi plus rapide.
Note
À partir de Python 2.5, il est plus pratique de faire ces manipulations entre la génération de l’AST (Abstract Syntax Tree) et la compilation, en utilisant le module ast
.
Certaines particularités de ce module sont importantes à retenir pour en faire un bon usage. Ce n’est pas un tutoriel sur la modification d’arbres syntaxiques Python, mais certains exemples d’utilisation du module parser
sont présentés.
Le prérequis le plus important est une bonne compréhension de la grammaire Python utilisée par l’analyseur interne dont la syntaxe est documentée exhaustivement dans La référence du langage Python. L’analyseur lui-même est généré à partir d’une grammaire spécifiée dans le fichier Grammar/Grammar
dans la distribution standard de Python. Les arbres syntaxiques stockés dans les objets ST créés par les fonctions expr()
ou suite()
de ce module sont directement le résultat de l’analyseur interne, alors que les objets ST créés par sequence2st()
simulent ces structures. N’oubliez pas qu’une séquence considérée « correcte » dans une version de Python peut ne pas l’être dans une autre, la grammaire de Python pouvant évoluer. Cependant, déplacer du code source d’une version de Python à une autre sous forme textuelle donnera toujours des arbres syntaxique corrects, à l’exception qu’une version plus ancienne de l’interpréteur ne pourra pas analyser les constructions récentes du langage. Les arbres syntaxiques quant à eux ne sont généralement pas compatibles d’une version à l’autre, alors que le code source a toujours conservé la compatibilité ascendante.
Each element of the sequences returned by st2list()
or st2tuple()
has a simple form. Sequences representing non-terminal elements in the grammar
always have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given symbolic names
in the C header file Include/graminit.h
and the Python module
symbol
. Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent. An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword if
in an if_stmt
, are included in the
node tree without any special treatment. For example, the if
keyword
is represented by the tuple (1, 'if')
, where 1
is the numeric value
associated with all NAME
tokens, including variable and function names
defined by the user. In an alternate form returned when line number information
is requested, the same token might be represented as (1, 'if', 12)
, where
the 12
represents the line number at which the terminal symbol was found.
Terminal elements are represented in much the same way, but without any child
elements and the addition of the source text which was identified. The example
of the if
keyword above is representative. The various types of
terminal symbols are defined in the C header file Include/token.h
and
the Python module token
.
The ST objects are not required to support the functionality of this module, but are provided for three purposes: to allow an application to amortize the cost of processing complex parse trees, to provide a parse tree representation which conserves memory space when compared to the Python list or tuple representation, and to ease the creation of additional modules in C which manipulate parse trees. A simple « wrapper » class may be created in Python to hide the use of ST objects.
The parser
module defines functions for a few distinct purposes. The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.
Voir aussi
32.1.1. Creating ST Objects¶
ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the 'eval'
and 'exec'
forms.
-
parser.
expr
(source)¶ The
expr()
function parses the parameter source as if it were an input tocompile(source, 'file.py', 'eval')
. If the parse succeeds, an ST object is created to hold the internal parse tree representation, otherwise an appropriate exception is raised.
-
parser.
suite
(source)¶ The
suite()
function parses the parameter source as if it were an input tocompile(source, 'file.py', 'exec')
. If the parse succeeds, an ST object is created to hold the internal parse tree representation, otherwise an appropriate exception is raised.
-
parser.
sequence2st
(sequence)¶ This function accepts a parse tree represented as a sequence and builds an internal representation if possible. If it can validate that the tree conforms to the Python grammar and all nodes are valid node types in the host version of Python, an ST object is created from the internal representation and returned to the called. If there is a problem creating the internal representation, or if the tree cannot be validated, a
ParserError
exception is raised. An ST object created this way should not be assumed to compile correctly; normal exceptions raised by compilation may still be initiated when the ST object is passed tocompilest()
. This may indicate problems not related to syntax (such as aMemoryError
exception), but may also be due to constructs such as the result of parsingdel f(0)
, which escapes the Python parser but is checked by the bytecode compiler.Sequences representing terminal tokens may be represented as either two-element lists of the form
(1, 'name')
or as three-element lists of the form(1, 'name', 56)
. If the third element is present, it is assumed to be a valid line number. The line number may be specified for any subset of the terminal symbols in the input tree.
-
parser.
tuple2st
(sequence)¶ This is the same function as
sequence2st()
. This entry point is maintained for backward compatibility.
32.1.2. Converting ST Objects¶
ST objects, regardless of the input used to create them, may be converted to parse trees represented as list- or tuple- trees, or may be compiled into executable code objects. Parse trees may be extracted with or without line numbering information.
-
parser.
st2list
(st, line_info=False, col_info=False)¶ This function accepts an ST object from the caller in st and returns a Python list representing the equivalent parse tree. The resulting list representation can be used for inspection or the creation of a new parse tree in list form. This function does not fail so long as memory is available to build the list representation. If the parse tree will only be used for inspection,
st2tuple()
should be used instead to reduce memory consumption and fragmentation. When the list representation is required, this function is significantly faster than retrieving a tuple representation and converting that to nested lists.If line_info is true, line number information will be included for all terminal tokens as a third element of the list representing the token. Note that the line number provided specifies the line on which the token ends. This information is omitted if the flag is false or omitted.
-
parser.
st2tuple
(st, line_info=False, col_info=False)¶ This function accepts an ST object from the caller in st and returns a Python tuple representing the equivalent parse tree. Other than returning a tuple instead of a list, this function is identical to
st2list()
.If line_info is true, line number information will be included for all terminal tokens as a third element of the list representing the token. This information is omitted if the flag is false or omitted.
-
parser.
compilest
(st, filename='<syntax-tree>')¶ The Python byte compiler can be invoked on an ST object to produce code objects which can be used as part of a call to the built-in
exec()
oreval()
functions. This function provides the interface to the compiler, passing the internal parse tree from st to the parser, using the source file name specified by the filename parameter. The default value supplied for filename indicates that the source was an ST object.Compiling an ST object may result in exceptions related to compilation; an example would be a
SyntaxError
caused by the parse tree fordel f(0)
: this statement is considered legal within the formal grammar for Python but is not a legal language construct. TheSyntaxError
raised for this condition is actually generated by the Python byte-compiler normally, which is why it can be raised at this point by theparser
module. Most causes of compilation failure can be diagnosed programmatically by inspection of the parse tree.
32.1.3. Queries on ST Objects¶
Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite. Neither of these functions can be used to
determine if an ST was created from source code via expr()
or
suite()
or from a parse tree via sequence2st()
.
-
parser.
isexpr
(st)¶ When st represents an
'eval'
form, this function returns true, otherwise it returns false. This is useful, since code objects normally cannot be queried for this information using existing built-in functions. Note that the code objects created bycompilest()
cannot be queried like this either, and are identical to those created by the built-incompile()
function.
32.1.4. Exceptions and Error Handling¶
The parser module defines a single exception, but may also pass other built-in exceptions from other portions of the Python runtime environment. See each function for information about the exceptions it can raise.
-
exception
parser.
ParserError
¶ Exception raised when a failure occurs within the parser module. This is generally produced for validation failures rather than the built-in
SyntaxError
raised during normal parsing. The exception argument is either a string describing the reason of the failure or a tuple containing a sequence causing the failure from a parse tree passed tosequence2st()
and an explanatory string. Calls tosequence2st()
need to be able to handle either type of exception, while calls to other functions in the module will only need to be aware of the simple string values.
Note that the functions compilest()
, expr()
, and suite()
may
raise exceptions which are normally raised by the parsing and compilation
process. These include the built in exceptions MemoryError
,
OverflowError
, SyntaxError
, and SystemError
. In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.
32.1.5. ST Objects¶
Ordered and equality comparisons are supported between ST objects. Pickling of
ST objects (using the pickle
module) is also supported.
-
parser.
STType
¶ The type of the objects returned by
expr()
,suite()
andsequence2st()
.
ST objects have the following methods:
-
ST.
compile
(filename='<syntax-tree>')¶ Same as
compilest(st, filename)
.
-
ST.
isexpr
()¶ Same as
isexpr(st)
.
-
ST.
issuite
()¶ Same as
issuite(st)
.
-
ST.
tolist
(line_info=False, col_info=False)¶ Same as
st2list(st, line_info, col_info)
.
-
ST.
totuple
(line_info=False, col_info=False)¶ Same as
st2tuple(st, line_info, col_info)
.
32.1.6. Example: Emulation of compile()
¶
While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing. For this purpose, using
the parser
module to produce an intermediate data structure is equivalent
to the code
>>> code = compile('a + 5', 'file.py', 'eval')
>>> a = 5
>>> eval(code)
10
The equivalent operation using the parser
module is somewhat longer, and
allows the intermediate internal parse tree to be retained as an ST object:
>>> import parser
>>> st = parser.expr('a + 5')
>>> code = st.compile('file.py')
>>> a = 5
>>> eval(code)
10
An application which needs both ST and code objects can package this code into readily available functions:
import parser
def load_suite(source_string):
st = parser.suite(source_string)
return st, st.compile()
def load_expression(source_string):
st = parser.expr(source_string)
return st, st.compile()