Abstract Syntax Trees

New in version 2.5.

The _ast module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like.

An abstract syntax tree can be generated by passing _ast.PyCF_ONLY_AST as a flag to the compile() builtin function. The result will be a tree of objects whose classes all inherit from _ast.AST.

A modified abstract syntax tree can be compiled into a Python code object using the built-in compile() function.

The actual classes are derived from the Parser/Python.asdl file, which is reproduced below. There is one class defined for each left-hand side symbol in the abstract grammar (for example, _ast.stmt or _ast.expr). In addition, there is one class defined for each constructor on the right-hand side; these classes inherit from the classes for the left-hand side trees. For example, _ast.BinOp inherits from _ast.expr. For production rules with alternatives (aka “sums”), the left-hand side class is abstract: only instances of specific constructor nodes are ever created.

Each concrete class has an attribute _fields which gives the names of all child nodes.

Each instance of a concrete class has one attribute for each child node, of the type as defined in the grammar. For example, _ast.BinOp instances have an attribute left of type _ast.expr. Instances of _ast.expr and _ast.stmt subclasses also have lineno and col_offset attributes. The lineno is the line number of source text (1 indexed so the first line is line 1) and the col_offset is the utf8 byte offset of the first token that generated the node. The utf8 offset is recorded because the parser uses utf8 internally.

If these attributes are marked as optional in the grammar (using a question mark), the value might be None. If the attributes can have zero-or-more values (marked with an asterisk), the values are represented as Python lists. All possible attributes must be present and have valid values when compiling an AST with compile().

The constructor of a class _ast.T parses their arguments as follows:

  • If there are positional arguments, there must be as many as there are items in T._fields; they will be assigned as attributes of these names.
  • If there are keyword arguments, they will set the attributes of the same names to the given values.

For example, to create and populate a UnaryOp node, you could use

node = _ast.UnaryOp()
node.op = _ast.USub()
node.operand = _ast.Num()
node.operand.n = 5
node.operand.lineno = 0
node.operand.col_offset = 0
node.lineno = 0
node.col_offset = 0

or the more compact

node = _ast.UnaryOp(_ast.USub(), _ast.Num(5, lineno=0, col_offset=0),
                    lineno=0, col_offset=0)

Abstract Grammar

The module defines a string constant __version__ which is the decimal Subversion revision number of the file shown below.

The abstract grammar is currently defined as follows:

-- ASDL's five builtin types are identifier, int, string, object, bool

module Python version "$Revision: 62047 $"
{
	mod = Module(stmt* body)
	    | Interactive(stmt* body)
	    | Expression(expr body)

	    -- not really an actual node but useful in Jython's typesystem.
	    | Suite(stmt* body)

	stmt = FunctionDef(identifier name, arguments args, 
                            stmt* body, expr* decorator_list)
	      | ClassDef(identifier name, expr* bases, stmt* body, expr *decorator_list)
	      | Return(expr? value)

	      | Delete(expr* targets)
	      | Assign(expr* targets, expr value)
	      | AugAssign(expr target, operator op, expr value)

	      -- not sure if bool is allowed, can always use int
 	      | Print(expr? dest, expr* values, bool nl)

	      -- use 'orelse' because else is a keyword in target languages
	      | For(expr target, expr iter, stmt* body, stmt* orelse)
	      | While(expr test, stmt* body, stmt* orelse)
	      | If(expr test, stmt* body, stmt* orelse)
	      | With(expr context_expr, expr? optional_vars, stmt* body)

	      -- 'type' is a bad name
	      | Raise(expr? type, expr? inst, expr? tback)
	      | TryExcept(stmt* body, excepthandler* handlers, stmt* orelse)
	      | TryFinally(stmt* body, stmt* finalbody)
	      | Assert(expr test, expr? msg)

	      | Import(alias* names)
	      | ImportFrom(identifier module, alias* names, int? level)

	      -- Doesn't capture requirement that locals must be
	      -- defined if globals is
	      -- still supports use as a function!
	      | Exec(expr body, expr? globals, expr? locals)

	      | Global(identifier* names)
	      | Expr(expr value)
	      | Pass | Break | Continue

	      -- XXX Jython will be different
	      -- col_offset is the byte offset in the utf8 string the parser uses
	      attributes (int lineno, int col_offset)

	      -- BoolOp() can use left & right?
	expr = BoolOp(boolop op, expr* values)
	     | BinOp(expr left, operator op, expr right)
	     | UnaryOp(unaryop op, expr operand)
	     | Lambda(arguments args, expr body)
	     | IfExp(expr test, expr body, expr orelse)
	     | Dict(expr* keys, expr* values)
	     | ListComp(expr elt, comprehension* generators)
	     | GeneratorExp(expr elt, comprehension* generators)
	     -- the grammar constrains where yield expressions can occur
	     | Yield(expr? value)
	     -- need sequences for compare to distinguish between
	     -- x < 4 < 3 and (x < 4) < 3
	     | Compare(expr left, cmpop* ops, expr* comparators)
	     | Call(expr func, expr* args, keyword* keywords,
			 expr? starargs, expr? kwargs)
	     | Repr(expr value)
	     | Num(object n) -- a number as a PyObject.
	     | Str(string s) -- need to specify raw, unicode, etc?
	     -- other literals? bools?

	     -- the following expression can appear in assignment context
	     | Attribute(expr value, identifier attr, expr_context ctx)
	     | Subscript(expr value, slice slice, expr_context ctx)
	     | Name(identifier id, expr_context ctx)
	     | List(expr* elts, expr_context ctx) 
	     | Tuple(expr* elts, expr_context ctx)

	      -- col_offset is the byte offset in the utf8 string the parser uses
	      attributes (int lineno, int col_offset)

	expr_context = Load | Store | Del | AugLoad | AugStore | Param

	slice = Ellipsis | Slice(expr? lower, expr? upper, expr? step) 
	      | ExtSlice(slice* dims) 
	      | Index(expr value) 

	boolop = And | Or 

	operator = Add | Sub | Mult | Div | Mod | Pow | LShift 
                 | RShift | BitOr | BitXor | BitAnd | FloorDiv

	unaryop = Invert | Not | UAdd | USub

	cmpop = Eq | NotEq | Lt | LtE | Gt | GtE | Is | IsNot | In | NotIn

	comprehension = (expr target, expr iter, expr* ifs)

	-- not sure what to call the first argument for raise and except
	excepthandler = ExceptHandler(expr? type, expr? name, stmt* body)
	                attributes (int lineno, int col_offset)

	arguments = (expr* args, identifier? vararg, 
		     identifier? kwarg, expr* defaults)

        -- keyword arguments supplied to call
        keyword = (identifier arg, expr value)

        -- import name with optional 'as' alias.
        alias = (identifier name, identifier? asname)
}