dataclasses --- Data Classes

原始碼:Lib/dataclasses.py


This module provides a decorator and functions for automatically adding generated special methods such as __init__() and __repr__() to user-defined classes. It was originally described in PEP 557.

在这些生成的方法中使用的成员变量是使用 PEP 526 类型标注来定义的。 例如以下代码:

from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

will add, among other things, a __init__() that looks like:

def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0):
    self.name = name
    self.unit_price = unit_price
    self.quantity_on_hand = quantity_on_hand

请注意,此方法会自动添加到类中:它不会在上面显示的 InventoryItem 定义中直接指定。

在 3.7 版本新加入.

模块内容

@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)

这个函数是 decorator ,用于将生成的 special method 添加到类中,如下所述。

dataclass() 装饰器会检查类以查找 fieldfield 被定义为具有 类型标注 的类变量。 除了下面描述的两个例外,在 dataclass() 中没有什么东西会去检查在变量标注中所指定的类型。

所有生成的方法中的字段顺序是它们在类定义中出现的顺序。

dataclass() 装饰器将向类中添加各种“dunder”方法,如下所述。 如果所添加的方法已存在于类中,则行为将取决于下面所列出的参数。 装饰器会返回调用它的类本身;不会创建新的类。

如果 dataclass() 仅用作没有参数的简单装饰器,它就像它具有此签名中记录的默认值一样。也就是说,这三种 dataclass() 用法是等价的:

@dataclass
class C:
    ...

@dataclass()
class C:
    ...

@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False,
           match_args=True, kw_only=False, slots=False, weakref_slot=False)
class C:
    ...

dataclass() 的参数有:

  • init: If true (the default), a __init__() method will be generated.

    If the class already defines __init__(), this parameter is ignored.

  • repr: If true (the default), a __repr__() method will be generated. The generated repr string will have the class name and the name and repr of each field, in the order they are defined in the class. Fields that are marked as being excluded from the repr are not included. For example: InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10).

    If the class already defines __repr__(), this parameter is ignored.

  • eq: If true (the default), an __eq__() method will be generated. This method compares the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type.

    If the class already defines __eq__(), this parameter is ignored.

  • order: If true (the default is False), __lt__(), __le__(), __gt__(), and __ge__() methods will be generated. These compare the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type. If order is true and eq is false, a ValueError is raised.

    If the class already defines any of __lt__(), __le__(), __gt__(), or __ge__(), then TypeError is raised.

  • unsafe_hash: If False (the default), a __hash__() method is generated according to how eq and frozen are set.

    __hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer's intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the dataclass() decorator.

    By default, dataclass() will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.

    If __hash__() is not explicitly defined, or if it is set to None, then dataclass() may add an implicit __hash__() method. Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.

    Here are the rules governing implicit creation of a __hash__() method. Note that you cannot both have an explicit __hash__() method in your dataclass and set unsafe_hash=True; this will result in a TypeError.

    If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).

  • frozen: If true (the default is False), assigning to fields will generate an exception. This emulates read-only frozen instances. If __setattr__() or __delattr__() is defined in the class, then TypeError is raised. See the discussion below.

  • match_args: If true (the default is True), the __match_args__ tuple will be created from the list of parameters to the generated __init__() method (even if __init__() is not generated, see above). If false, or if __match_args__ is already defined in the class, then __match_args__ will not be generated.

在 3.10 版本新加入.

  • kw_only: If true (the default value is False), then all fields will be marked as keyword-only. If a field is marked as keyword-only, then the only effect is that the __init__() parameter generated from a keyword-only field must be specified with a keyword when __init__() is called. There is no effect on any other aspect of dataclasses. See the parameter glossary entry for details. Also see the KW_ONLY section.

在 3.10 版本新加入.

  • slots: If true (the default is False), __slots__ attribute will be generated and new class will be returned instead of the original one. If __slots__ is already defined in the class, then TypeError is raised.

在 3.10 版本新加入.

在 3.11 版本變更: If a field name is already included in the __slots__ of a base class, it will not be included in the generated __slots__ to prevent overriding them. Therefore, do not use __slots__ to retrieve the field names of a dataclass. Use fields() instead. To be able to determine inherited slots, base class __slots__ may be any iterable, but not an iterator.

  • weakref_slot: If true (the default is False), add a slot named "__weakref__", which is required to make an instance weakref-able. It is an error to specify weakref_slot=True without also specifying slots=True.

在 3.11 版本新加入.

fields 可以选择使用普通的 Python 语法指定默认值:

@dataclass
class C:
    a: int       # 'a' has no default value
    b: int = 0   # assign a default value for 'b'

In this example, both a and b will be included in the added __init__() method, which will be defined as:

def __init__(self, a: int, b: int = 0):

如果具有默认值的字段之后存在没有默认值的字段,将会引发 TypeError。 无论此情况是发生在单个类中还是作为类继承的结果,都是如此。

dataclasses.field(*, default=MISSING, default_factory=MISSING, init=True, repr=True, hash=None, compare=True, metadata=None, kw_only=MISSING)

对于常见和简单的用例,不需要其他功能。但是,有些数据类功能需要额外的每字段信息。为了满足这种对附加信息的需求,你可以通过调用提供的 field() 函数来替换默认字段值。例如:

@dataclass
class C:
    mylist: list[int] = field(default_factory=list)

c = C()
c.mylist += [1, 2, 3]

如上所示, MISSING 值是一个哨兵对象,用于检测一些参数是否由用户提供。这个哨兵对象的使用是因为``None``是一些具有独特意义的参数的有效值。 任何代码都不应该直接使用 MISSING 值。

field() 参数有:

  • default :如果提供,这将是该字段的默认值。这是必需的,因为 field() 调用本身会替换一般的默认值。

  • default_factory :如果提供,它必须是一个零参数可调用对象,当该字段需要一个默认值时,它将被调用。除了其他目的之外,这可以用于指定具有可变默认值的字段,如下所述。 同时指定 defaultdefault_factory 将产生错误。

  • init: If true (the default), this field is included as a parameter to the generated __init__() method.

  • repr: If true (the default), this field is included in the string returned by the generated __repr__() method.

  • hash: This can be a bool or None. If true, this field is included in the generated __hash__() method. If None (the default), use the value of compare: this would normally be the expected behavior. A field should be considered in the hash if it's used for comparisons. Setting this value to anything other than None is discouraged.

    设置 hash=Falsecompare=True 的一个可能原因是,如果一个计算 hash 的代价很高的字段是检验等价性需要的,但还有其他字段可以计算类型的 hash 。 即使从 hash 中排除某个字段,它仍将用于比较。

  • compare: If true (the default), this field is included in the generated equality and comparison methods (__eq__(), __gt__(), et al.).

  • metadata :这可以是映射或 None 。 None 被视为一个空的字典。这个值包含在 MappingProxyType() 中,使其成为只读,并暴露在 Field 对象上。数据类根本不使用它,它是作为第三方扩展机制提供的。多个第三方可以各自拥有自己的键值,以用作元数据中的命名空间。

  • kw_only: If true, this field will be marked as keyword-only. This is used when the generated __init__() method's parameters are computed.

在 3.10 版本新加入.

如果通过调用 field() 指定字段的默认值,则该字段的类属性将替换为指定的 default 值。如果没有提供 default ,那么将删除类属性。目的是在 dataclass() 装饰器运行之后,类属性将包含字段的默认值,就像指定了默认值一样。例如,之后:

@dataclass
class C:
    x: int
    y: int = field(repr=False)
    z: int = field(repr=False, default=10)
    t: int = 20

类属性 C.z 将是 10 ,类属性 C.t 将是 20,类属性 C.xC.y 将不设置。

class dataclasses.Field

Field 对象描述每个定义的字段。这些对象在内部创建,并由 fields() 模块级方法返回(见下文)。用户永远不应该直接实例化 Field 对象。 其有文档的属性是:

  • name :字段的名字。

  • type :字段的类型。

  • default, default_factory, init, repr, hash, compare, metadatakw_only 具有与 field() 函数中对应参数相同的含义和值。

可能存在其他属性,但它们是私有的,不能被审查或依赖。

dataclasses.fields(class_or_instance)

返回 Field 对象的元组,用于定义此数据类的字段。 接受数据类或数据类的实例。如果没有传递一个数据类或实例将引发 TypeError 。 不返回 ClassVarInitVar 的伪字段。

dataclasses.asdict(obj, *, dict_factory=dict)

将数据类``obj``转换为一个字典(通过使用工厂函数``dict_factory``)。 每个数据类被转换为其字段的字典,作为``name: value``键值对。数据类、字典、列表和元组被递归到。 其他对象用 copy.deepcopy() 来复制。

在嵌套的数据类上使用 asdict() 的例子:

@dataclass
class Point:
     x: int
     y: int

@dataclass
class C:
     mylist: list[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
assert asdict(c) == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

要创建一个浅拷贝,可以使用以下方法:

dict((field.name, getattr(obj, field.name)) for field in fields(obj))

如果 obj 不是一个数据类实例, asdict() 引发 TypeError

dataclasses.astuple(obj, *, tuple_factory=tuple)

将数据类``obj``转换为一个元组(通过使用工厂函数``tuple_factory``)。 每个数据类被转换为其字段值的元组。数据类、字典、列表和元组被递归到。其他对象用 copy.deepcopy() 来复制。

继续前一个例子:

assert astuple(p) == (10, 20)
assert astuple(c) == ([(0, 0), (10, 4)],)

要创建一个浅拷贝,可以使用以下方法:

tuple(getattr(obj, field.name) for field in dataclasses.fields(obj))

如果``obj``不是一个数据类实例, astuple() 引发 TypeError

dataclasses.make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)

Creates a new dataclass with name cls_name, fields as defined in fields, base classes as given in bases, and initialized with a namespace as given in namespace. fields is an iterable whose elements are each either name, (name, type), or (name, type, Field). If just name is supplied, typing.Any is used for type. The values of init, repr, eq, order, unsafe_hash, frozen, match_args, kw_only, slots, and weakref_slot have the same meaning as they do in dataclass().

此函数不是严格要求的,因为用于任何创建带有 __annotations__ 的新类的 Python 机制都可以应用 dataclass() 函数将该类转换为数据类。提供此功能是为了方便。例如:

C = make_dataclass('C',
                   [('x', int),
                     'y',
                    ('z', int, field(default=5))],
                   namespace={'add_one': lambda self: self.x + 1})

等价于

@dataclass
class C:
    x: int
    y: 'typing.Any'
    z: int = 5

    def add_one(self):
        return self.x + 1
dataclasses.replace(obj, /, **changes)

创建一个与``obj``类型相同的新对象,将字段替换为来自``changes``的值。如果``obj``不是数据类,则引发 TypeError 。如果``changes``里面的值没有指定字段,引发 TypeError

The newly returned object is created by calling the __init__() method of the dataclass. This ensures that __post_init__, if present, is also called.

Init-only variables without default values, if any exist, must be specified on the call to replace() so that they can be passed to __init__() and __post_init__.

changes 包含任何定义为 init=False 的字段是错误的。在这种情况下会引发 ValueError

Be forewarned about how init=False fields work during a call to replace(). They are not copied from the source object, but rather are initialized in __post_init__, if they're initialized at all. It is expected that init=False fields will be rarely and judiciously used. If they are used, it might be wise to have alternate class constructors, or perhaps a custom replace() (or similarly named) method which handles instance copying.

dataclasses.is_dataclass(obj)

如果其形参为 dataclass 或其实例则返回 True,否则返回 False

如果你需要知道一个类是否是一个数据类的实例(而不是一个数据类本身),那么再添加一个 not isinstance(obj, type) 检查:

def is_dataclass_instance(obj):
    return is_dataclass(obj) and not isinstance(obj, type)
dataclasses.MISSING

一个表示缺失 default 或 default_factory 的监视值。

dataclasses.KW_ONLY

A sentinel value used as a type annotation. Any fields after a pseudo-field with the type of KW_ONLY are marked as keyword-only fields. Note that a pseudo-field of type KW_ONLY is otherwise completely ignored. This includes the name of such a field. By convention, a name of _ is used for a KW_ONLY field. Keyword-only fields signify __init__() parameters that must be specified as keywords when the class is instantiated.

在这个例子中,字段 yz 将被标记为仅限关键字字段:

@dataclass
class Point:
    x: float
    _: KW_ONLY
    y: float
    z: float

p = Point(0, y=1.5, z=2.0)

在单个数据类中,指定一个以上 KW_ONLY 类型的字段将导致错误。

在 3.10 版本新加入.

exception dataclasses.FrozenInstanceError

Raised when an implicitly defined __setattr__() or __delattr__() is called on a dataclass which was defined with frozen=True. It is a subclass of AttributeError.

初始化后处理

The generated __init__() code will call a method named __post_init__(), if __post_init__() is defined on the class. It will normally be called as self.__post_init__(). However, if any InitVar fields are defined, they will also be passed to __post_init__() in the order they were defined in the class. If no __init__() method is generated, then __post_init__() will not automatically be called.

在其他用途中,这允许初始化依赖于一个或多个其他字段的字段值。例如:

@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)

    def __post_init__(self):
        self.c = self.a + self.b

The __init__() method generated by dataclass() does not call base class __init__() methods. If the base class has an __init__() method that has to be called, it is common to call this method in a __post_init__() method:

@dataclass
class Rectangle:
    height: float
    width: float

@dataclass
class Square(Rectangle):
    side: float

    def __post_init__(self):
        super().__init__(self.side, self.side)

Note, however, that in general the dataclass-generated __init__() methods don't need to be called, since the derived dataclass will take care of initializing all fields of any base class that is a dataclass itself.

See the section below on init-only variables for ways to pass parameters to __post_init__(). Also see the warning about how replace() handles init=False fields.

类变量

One of the few places where dataclass() actually inspects the type of a field is to determine if a field is a class variable as defined in PEP 526. It does this by checking if the type of the field is typing.ClassVar. If a field is a ClassVar, it is excluded from consideration as a field and is ignored by the dataclass mechanisms. Such ClassVar pseudo-fields are not returned by the module-level fields() function.

仅初始化变量

Another place where dataclass() inspects a type annotation is to determine if a field is an init-only variable. It does this by seeing if the type of a field is of type dataclasses.InitVar. If a field is an InitVar, it is considered a pseudo-field called an init-only field. As it is not a true field, it is not returned by the module-level fields() function. Init-only fields are added as parameters to the generated __init__() method, and are passed to the optional __post_init__ method. They are not otherwise used by dataclasses.

例如,假设在创建类时没有为某个字段提供值,初始化时将从数据库中取值:

@dataclass
class C:
    i: int
    j: int | None = None
    database: InitVar[DatabaseType | None] = None

    def __post_init__(self, database):
        if self.j is None and database is not None:
            self.j = database.lookup('j')

c = C(10, database=my_database)

在这种情况下, fields() 将返回 ijField 对象,但不包括 database

冻结的实例

It is not possible to create truly immutable Python objects. However, by passing frozen=True to the dataclass() decorator you can emulate immutability. In that case, dataclasses will add __setattr__() and __delattr__() methods to the class. These methods will raise a FrozenInstanceError when invoked.

There is a tiny performance penalty when using frozen=True: __init__() cannot use simple assignment to initialize fields, and must use __setattr__().

继承

当数组由 dataclass() 装饰器创建时,它会查看反向 MRO 中的所有类的基类(即从 object 开始 ),并且对于它找到的每个数据类, 将该基类中的字段添加到字段的有序映射中。添加完所有基类字段后,它会将自己的字段添加到有序映射中。所有生成的方法都将使用这种组合的,计算的有序字段映射。由于字段是按插入顺序排列的,因此派生类会重载基类。一个例子:

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class C(Base):
    z: int = 10
    x: int = 15

最后的字段列表依次是 xyzx 的最终类型是 int ,如类 C 中所指定的那样。

The generated __init__() method for C will look like:

def __init__(self, x: int = 15, y: int = 0, z: int = 10):

Re-ordering of keyword-only parameters in __init__()

After the parameters needed for __init__() are computed, any keyword-only parameters are moved to come after all regular (non-keyword-only) parameters. This is a requirement of how keyword-only parameters are implemented in Python: they must come after non-keyword-only parameters.

在这个例子中,Base.y, Base.w, and D.t 是仅限关键字字段,而 Base.xD.z 是常规字段:

@dataclass
class Base:
    x: Any = 15.0
    _: KW_ONLY
    y: int = 0
    w: int = 1

@dataclass
class D(Base):
    z: int = 10
    t: int = field(kw_only=True, default=0)

The generated __init__() method for D will look like:

def __init__(self, x: Any = 15.0, z: int = 10, *, y: int = 0, w: int = 1, t: int = 0):

请注意形参原来在字段列表中出现的位置已被重新排序:前面是来自常规字段的形参而后面是来自仅限关键字字段的形参。

The relative ordering of keyword-only parameters is maintained in the re-ordered __init__() parameter list.

默认工厂函数

如果一个 field() 指定了一个 default_factory ,当需要该字段的默认值时,将使用零参数调用它。例如,要创建列表的新实例,请使用:

mylist: list = field(default_factory=list)

If a field is excluded from __init__() (using init=False) and the field also specifies default_factory, then the default factory function will always be called from the generated __init__() function. This happens because there is no other way to give the field an initial value.

可变的默认值

Python 在类属性中存储默认成员变量值。思考这个例子,不使用数据类:

class C:
    x = []
    def add(self, element):
        self.x.append(element)

o1 = C()
o2 = C()
o1.add(1)
o2.add(2)
assert o1.x == [1, 2]
assert o1.x is o2.x

请注意,类 C 的两个实例共享相同的类变量 x ,如预期的那样。

使用数据类, 如果 此代码有效:

@dataclass
class D:
    x: list = []      # This code raises ValueError
    def add(self, element):
        self.x += element

它會生成類似的程式碼:

class D:
    x = []
    def __init__(self, x=x):
        self.x = x
    def add(self, element):
        self.x += element

assert D().x is D().x

This has the same issue as the original example using class C. That is, two instances of class D that do not specify a value for x when creating a class instance will share the same copy of x. Because dataclasses just use normal Python class creation they also share this behavior. There is no general way for Data Classes to detect this condition. Instead, the dataclass() decorator will raise a TypeError if it detects an unhashable default parameter. The assumption is that if a value is unhashable, it is mutable. This is a partial solution, but it does protect against many common errors.

使用默认工厂函数是一种创建可变类型新实例的方法,并将其作为字段的默认值:

@dataclass
class D:
    x: list = field(default_factory=list)

assert D().x is not D().x

在 3.11 版本變更: Instead of looking for and disallowing objects of type list, dict, or set, unhashable objects are now not allowed as default values. Unhashability is used to approximate mutability.

Descriptor-typed fields

Fields that are assigned descriptor objects as their default value have the following special behaviors:

  • The value for the field passed to the dataclass's __init__ method is passed to the descriptor's __set__ method rather than overwriting the descriptor object.

  • Similarly, when getting or setting the field, the descriptor's __get__ or __set__ method is called rather than returning or overwriting the descriptor object.

  • To determine whether a field contains a default value, dataclasses will call the descriptor's __get__ method using its class access form (i.e. descriptor.__get__(obj=None, type=cls). If the descriptor returns a value in this case, it will be used as the field's default. On the other hand, if the descriptor raises AttributeError in this situation, no default value will be provided for the field.

class IntConversionDescriptor:
    def __init__(self, *, default):
        self._default = default

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, obj, type):
        if obj is None:
            return self._default

        return getattr(obj, self._name, self._default)

    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))

@dataclass
class InventoryItem:
    quantity_on_hand: IntConversionDescriptor = IntConversionDescriptor(default=100)

i = InventoryItem()
print(i.quantity_on_hand)   # 100
i.quantity_on_hand = 2.5    # calls __set__ with 2.5
print(i.quantity_on_hand)   # 2

Note that if a field is annotated with a descriptor type, but is not assigned a descriptor object as its default value, the field will act like a normal field.