dataclasses --- 数据类

源码: Lib/dataclasses.py


这个模块提供了一个装饰器和一些函数,用于自动为用户自定义的类添加生成的 special method 例如 __init__()__repr__()。 它的初始描述见 PEP 557

在这些生成的方法中使用的成员变量是使用 PEP 526 类型标注来定义的。例如以下代码:

from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

将添加多项内容,包括如下所示的 __init__():

def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0):
    self.name = name
    self.unit_price = unit_price
    self.quantity_on_hand = quantity_on_hand

请注意,此方法会自动添加到类中:而不是在如上所示的 InventoryItem 定义中被直接指定。

在 3.7 版本加入.

模块内容

@dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)

此函数是一个 decorator,它被用于将生成的 特殊方法 添加到类中,如下所述。

@dataclass 装饰器会检查类以找到其中的 fieldfield 被定义为具有 类型标注 的类变量。 除了下面所述的两个例外,在 @dataclass 中没有任何东西会去检查变量标注中指定的类型。

这些字段在所有生成的方法中的顺序,都是它们在类定义中出现的顺序。

The @dataclass decorator will add various "dunder" methods to the class, described below. If any of the added methods already exist in the class, the behavior depends on the parameter, as documented below. The decorator returns the same class that it is called on; no new class is created.

If @dataclass is used just as a simple decorator with no parameters, it acts as if it has the default values documented in this signature. That is, these three uses of @dataclass are equivalent:

@dataclass
class C:
    ...

@dataclass()
class C:
    ...

@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False,
           match_args=True, kw_only=False, slots=False, weakref_slot=False)
class C:
    ...

The parameters to @dataclass are:

  • init: 如为真值(默认),将生成一个 __init__() 方法。

    If the class already defines __init__(), this parameter is ignored.

  • repr: 如果为真值(默认),将生成一个 __repr__() 方法。 生成的 repr 字符串将带有类名及每个字符的名称和 repr,并按它们在类中定义的顺序排列。 不包括被标记为从 repr 排除的字段。 例如: InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)

    If the class already defines __repr__(), this parameter is ignored.

  • eq: 如果为真值(默认),将生成 __eq__() 方法。 此方法将把类当作由其字段组成的元组那样按顺序进行比较。 要比较的两个实例必须是相同的类型。

    If the class already defines __eq__(), this parameter is ignored.

  • order: 如果为真值 (默认为 False),将生成 __lt__(), __le__(), __gt__()__ge__() 方法。 这些方法将把类当作由其字段组成的元组那样按顺序进行比较。 要比较的两个实例必须是相同的类型。 如果 order 为真值并且 eq 为假值,则会引发 ValueError

    If the class already defines any of __lt__(), __le__(), __gt__(), or __ge__(), then TypeError is raised.

  • unsafe_hash: 如果为 False (默认值),则会根据 eqfrozen 的设置情况生成 __hash__() 方法。

    __hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer's intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the @dataclass decorator.

    By default, @dataclass will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.

    If __hash__() is not explicitly defined, or if it is set to None, then @dataclass may add an implicit __hash__() method. Although not recommended, you can force @dataclass to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can still be mutated. This is a specialized use case and should be considered carefully.

    Here are the rules governing implicit creation of a __hash__() method. Note that you cannot both have an explicit __hash__() method in your dataclass and set unsafe_hash=True; this will result in a TypeError.

    If eq and frozen are both true, by default @dataclass will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).

  • frozen: 如果为真值 (默认为 False),则对字段赋值将引发异常。 这模拟了只读的冻结实例。 如果在类中定义了 __setattr__()__delattr__(),则将引发 TypeError。 参见下文的讨论。

  • match_args: If true (the default is True), the __match_args__ tuple will be created from the list of parameters to the generated __init__() method (even if __init__() is not generated, see above). If false, or if __match_args__ is already defined in the class, then __match_args__ will not be generated.

在 3.10 版本加入.

  • kw_only: If true (the default value is False), then all fields will be marked as keyword-only. If a field is marked as keyword-only, then the only effect is that the __init__() parameter generated from a keyword-only field must be specified with a keyword when __init__() is called. There is no effect on any other aspect of dataclasses. See the parameter glossary entry for details. Also see the KW_ONLY section.

在 3.10 版本加入.

  • slots: If true (the default is False), __slots__ attribute will be generated and new class will be returned instead of the original one. If __slots__ is already defined in the class, then TypeError is raised.

在 3.10 版本加入.

在 3.11 版本发生变更: 如果某个字段名称已经包括在基类的 __slots__ 中,它将不会被包括在所生成的 __slots__ 中以防止 覆盖它们。 因此,请不要使用 __slots__ 来获取数据类的字段名称。 而应改用 fields()。 为了能够确定所继承的槽位,基类 __slots__ 可以为任意可迭代对象,但是 不可以为 迭代器。

  • weakref_slot:如果为真值(默认为 False),则添加一个名为 “__weakref__” 的槽位,这是使得一个实例可以被弱引用所必需的。指定 weakref_slot=True 而不同时指定 slots=True 将会导致错误。

在 3.11 版本加入.

可以用普通的 Python 语法为各个 field 指定默认值:

@dataclass
class C:
    a: int       # 'a' has no default value
    b: int = 0   # assign a default value for 'b'

在这个例子中,ab 都将被包括在所添加的 __init__() 方法中,该方法将被定义为:

def __init__(self, a: int, b: int = 0):

如果在具有默认值的字段之后存在没有默认值的字段,将会引发 TypeError。无论此情况是发生在单个类中还是作为类继承的结果,都是如此。

dataclasses.field(*, default=MISSING, default_factory=MISSING, init=True, repr=True, hash=None, compare=True, metadata=None, kw_only=MISSING)

For common and simple use cases, no other functionality is required. There are, however, some dataclass features that require additional per-field information. To satisfy this need for additional information, you can replace the default field value with a call to the provided field() function. For example:

@dataclass
class C:
    mylist: list[int] = field(default_factory=list)

c = C()
c.mylist += [1, 2, 3]

如上所示,MISSING 值是一个哨兵对象,用于检测一些形参是否由用户提供。使用它是因为 None 对于一些形参来说是有效的用户值。任何代码都不应该直接使用 MISSING 值。

The parameters to field() are:

  • default: If provided, this will be the default value for this field. This is needed because the field() call itself replaces the normal position of the default value.

  • default_factory:如果提供,它必须是一个需要零个参数的可调用对象,当该字段需要一个默认值时,它将被调用。这能解决当默认值是可变对象时会带来的问题,如下所述。同时指定 defaultdefault_factory 将产生错误。

  • init: 如果为真值(默认),则该字段将作为一个形参被包括在所生成的 __init__() 方法中。

  • repr: 如果为真值(默认),则该字段将被包括在所生成的 __repr__() 方法返回的字符串中。

  • hash: 这可以是一个布尔值或为 None。 如果为真值,则此字段将被包括在所生成的 __hash__() 方法中。 如果为 None (默认),则将使用 compare 的值:这通常是预期的行为。 一个字段如果被用于比较那么就应当在哈希时考虑到它。 不建议将该值设为 None 以外的任何其他对象。

    设置 hash=Falsecompare=True 的一个合理情况是,一个计算哈希值的代价很高的字段是检验等价性需要的,且还有其他字段可以用于计算类型的哈希值。可以从哈希值中排除该字段,但仍令它用于比较。

  • compare: 如果为真值(默认),则该字段将被包括在所生成的相等性和大小比较方法中 (__eq__(), __gt__() 等等)。

  • metadata:可以是映射或 None。None 被视为一个空的字典。这个值将被包装在 MappingProxyType() 中,使其只读,并暴露在 Field 对象上。数据类不使用它——它是作为第三方扩展机制提供的。多个第三方可以各自拥有自己的键,以用作元数据中的命名空间。

  • kw_only: 如果为真值,则该字段将被标记为仅限关键字字段。 这将在计算所生成的 __init__() 方法的形参时被使用。

在 3.10 版本加入.

If the default value of a field is specified by a call to field(), then the class attribute for this field will be replaced by the specified default value. If no default is provided, then the class attribute will be deleted. The intent is that after the @dataclass decorator runs, the class attributes will all contain the default values for the fields, just as if the default value itself were specified. For example, after:

@dataclass
class C:
    x: int
    y: int = field(repr=False)
    z: int = field(repr=False, default=10)
    t: int = 20

类属性 C.z 将是 10,类属性 C.t 将是 20,类属性 C.xC.y 将不设置。

class dataclasses.Field

Field objects describe each defined field. These objects are created internally, and are returned by the fields() module-level method (see below). Users should never instantiate a Field object directly. Its documented attributes are:

  • name:字段的名称。

  • type:字段的类型。

  • default, default_factory, init, repr, hash, compare, metadatakw_only 具有与 field() 函数中对应参数相同的含义和值。

可能存在其他属性,但它们是私有的。用户不应检查或依赖于这些属性。

dataclasses.fields(class_or_instance)

返回一个能描述此数据类所包含的字段的元组,元组的每一项都是 Field 对象。接受数据类或数据类的实例。如果没有传递一个数据类或实例将引发 TypeError。不返回 ClassVarInitVar 等伪字段。

dataclasses.asdict(obj, *, dict_factory=dict)

将数据类 obj 转换为一个字典(使用工厂函数 dict_factory)。每个数据类被转换为以 name: value 键值对来储存其字段的字典。数据类、字典、列表和元组的内容会被递归地访问。其它对象用 copy.deepcopy() 来复制。

Example of using asdict() on nested dataclasses:

@dataclass
class Point:
     x: int
     y: int

@dataclass
class C:
     mylist: list[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
assert asdict(c) == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}

要创建一个浅拷贝,可以使用以下的变通方法:

dict((field.name, getattr(obj, field.name)) for field in fields(obj))

asdict() raises TypeError if obj is not a dataclass instance.

dataclasses.astuple(obj, *, tuple_factory=tuple)

将数据类 obj 转换为一个元组(使用工厂函数 tuple_factory)。每个数据类被转换为其字段的值的元组。数据类、字典、列表和元组的内容会被递归地访问。其它对象用 copy.deepcopy() 来复制。

继续前一个例子:

assert astuple(p) == (10, 20)
assert astuple(c) == ([(0, 0), (10, 4)],)

要创建一个浅拷贝,可以使用以下的变通方法:

tuple(getattr(obj, field.name) for field in dataclasses.fields(obj))

astuple() raises TypeError if obj is not a dataclass instance.

dataclasses.make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)

Creates a new dataclass with name cls_name, fields as defined in fields, base classes as given in bases, and initialized with a namespace as given in namespace. fields is an iterable whose elements are each either name, (name, type), or (name, type, Field). If just name is supplied, typing.Any is used for type. The values of init, repr, eq, order, unsafe_hash, frozen, match_args, kw_only, slots, and weakref_slot have the same meaning as they do in @dataclass.

This function is not strictly required, because any Python mechanism for creating a new class with __annotations__ can then apply the @dataclass function to convert that class to a dataclass. This function is provided as a convenience. For example:

C = make_dataclass('C',
                   [('x', int),
                     'y',
                    ('z', int, field(default=5))],
                   namespace={'add_one': lambda self: self.x + 1})

等价于:

@dataclass
class C:
    x: int
    y: 'typing.Any'
    z: int = 5

    def add_one(self):
        return self.x + 1
dataclasses.replace(obj, /, **changes)

创建一个与 obj 类型相同的新对象,将字段替换为 changes 里的值。如果 obj 不是数据类,则抛出 TypeError 。如果 changes 里的值没有指定要替换的字段名,则抛出 TypeError

The newly returned object is created by calling the __init__() method of the dataclass. This ensures that __post_init__, if present, is also called.

Init-only variables without default values, if any exist, must be specified on the call to replace() so that they can be passed to __init__() and __post_init__().

changes 包含任何定义为 init=False 的字段是错误的。在这种情况下会引发 ValueError

Be forewarned about how init=False fields work during a call to replace(). They are not copied from the source object, but rather are initialized in __post_init__(), if they're initialized at all. It is expected that init=False fields will be rarely and judiciously used. If they are used, it might be wise to have alternate class constructors, or perhaps a custom replace() (or similarly named) method which handles instance copying.

dataclasses.is_dataclass(obj)

如果其形参为数据类,或其实例,返回 True,否则返回 False

如果你需要知道一个类是否是一个数据类的实例(而不是一个数据类本身),那么再添加一个 not isinstance(obj, type) 检查:

def is_dataclass_instance(obj):
    return is_dataclass(obj) and not isinstance(obj, type)
dataclasses.MISSING

一个指明“没有提供 default 或 default_factory”的监视值。

dataclasses.KW_ONLY

A sentinel value used as a type annotation. Any fields after a pseudo-field with the type of KW_ONLY are marked as keyword-only fields. Note that a pseudo-field of type KW_ONLY is otherwise completely ignored. This includes the name of such a field. By convention, a name of _ is used for a KW_ONLY field. Keyword-only fields signify __init__() parameters that must be specified as keywords when the class is instantiated.

在这个例子中,字段 yz 将被标记为仅限关键字字段:

@dataclass
class Point:
    x: float
    _: KW_ONLY
    y: float
    z: float

p = Point(0, y=1.5, z=2.0)

In a single dataclass, it is an error to specify more than one field whose type is KW_ONLY.

在 3.10 版本加入.

exception dataclasses.FrozenInstanceError

在定义时设置了 frozen=True 的类上调用隐式定义的 __setattr__()__delattr__() 时引发。 这是 AttributeError 的一个子类。

初始化后处理

The generated __init__() code will call a method named __post_init__(), if __post_init__() is defined on the class. It will normally be called as self.__post_init__(). However, if any InitVar fields are defined, they will also be passed to __post_init__() in the order they were defined in the class. If no __init__() method is generated, then __post_init__() will not automatically be called.

When defined on the class, it will be called by the generated __init__(), normally as self.__post_init__(). However, if any InitVar fields are defined, they will also be passed to __post_init__() in the order they were defined in the class. If no __init__() method is generated, then __post_init__() will not automatically be called.

@dataclass class C:

a: float b: float c: float = field(init=False)

def __post_init__(self):

self.c = self.a + self.b

The __init__() method generated by @dataclass does not call base class __init__() methods. If the base class has an __init__() method that has to be called, it is common to call this method in a __post_init__() method:

class Rectangle:
    def __init__(self, height, width):
      self.height = height
      self.width = width

@dataclass
class Square(Rectangle):
    side: float

    def __post_init__(self):
        super().__init__(self.side, self.side)

Note, however, that in general the dataclass-generated __init__() methods don't need to be called, since the derived dataclass will take care of initializing all fields of any base class that is a dataclass itself.

请参阅下面有关仅初始化变量的小节来了解如何将形参传递给 __post_init__()。 另请参阅关于 replace() 如何处理 init=False 字段的警告。

类变量

One of the few places where @dataclass actually inspects the type of a field is to determine if a field is a class variable as defined in PEP 526. It does this by checking if the type of the field is typing.ClassVar. If a field is a ClassVar, it is excluded from consideration as a field and is ignored by the dataclass mechanisms. Such ClassVar pseudo-fields are not returned by the module-level fields() function.

仅初始化变量

Another place where @dataclass inspects a type annotation is to determine if a field is an init-only variable. It does this by seeing if the type of a field is of type dataclasses.InitVar. If a field is an InitVar, it is considered a pseudo-field called an init-only field. As it is not a true field, it is not returned by the module-level fields() function. Init-only fields are added as parameters to the generated __init__() method, and are passed to the optional __post_init__ method. They are not otherwise used by dataclasses.

例如,假设在创建类时没有为某个字段提供值,初始化时将从数据库中取值:

@dataclass
class C:
    i: int
    j: int | None = None
    database: InitVar[DatabaseType | None] = None

    def __post_init__(self, database):
        if self.j is None and database is not None:
            self.j = database.lookup('j')

c = C(10, database=my_database)

在这种情况下, fields() 将返回 ijField 对象,但不包括 database

冻结的实例

It is not possible to create truly immutable Python objects. However, by passing frozen=True to the @dataclass decorator you can emulate immutability. In that case, dataclasses will add __setattr__() and __delattr__() methods to the class. These methods will raise a FrozenInstanceError when invoked.

There is a tiny performance penalty when using frozen=True: __init__() cannot use simple assignment to initialize fields, and must use __setattr__().

继承

When the dataclass is being created by the @dataclass decorator, it looks through all of the class's base classes in reverse MRO (that is, starting at object) and, for each dataclass that it finds, adds the fields from that base class to an ordered mapping of fields. After all of the base class fields are added, it adds its own fields to the ordered mapping. All of the generated methods will use this combined, calculated ordered mapping of fields. Because the fields are in insertion order, derived classes override base classes. An example:

@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class C(Base):
    z: int = 10
    x: int = 15

最后的字段列表依次是 xyzx 的最终类型是 int ,如类 C 中所指定的那样。

C 生成的 __init__() 方法看起来像是这样:

def __init__(self, x: int = 15, y: int = 0, z: int = 10):

Re-ordering of keyword-only parameters in __init__()

在计算出 __init__() 所需要的形参之后,任何仅限关键字形参会被移至所有常规(非仅限关键字)形参的后面。 这是 Python 中实现仅限关键字形参所要求的:它们必须位于非仅限关键字形参之后。

在这个例子中,Base.y, Base.w, and D.t 是仅限关键字字段,而 Base.xD.z 是常规字段:

@dataclass
class Base:
    x: Any = 15.0
    _: KW_ONLY
    y: int = 0
    w: int = 1

@dataclass
class D(Base):
    z: int = 10
    t: int = field(kw_only=True, default=0)

The generated __init__() method for D will look like:

def __init__(self, x: Any = 15.0, z: int = 10, *, y: int = 0, w: int = 1, t: int = 0):

请注意形参原来在字段列表中出现的位置已被重新排序:前面是来自常规字段的形参而后面是来自仅限关键字字段的形参。

The relative ordering of keyword-only parameters is maintained in the re-ordered __init__() parameter list.

默认工厂函数

如果一个 field() 指定了一个 default_factory ,当需要该字段的默认值时,将使用零参数调用它。例如,要创建列表的新实例,请使用:

mylist: list = field(default_factory=list)

If a field is excluded from __init__() (using init=False) and the field also specifies default_factory, then the default factory function will always be called from the generated __init__() function. This happens because there is no other way to give the field an initial value.

可变的默认值

Python 在类属性中存储默认成员变量值。思考这个例子,不使用数据类:

class C:
    x = []
    def add(self, element):
        self.x.append(element)

o1 = C()
o2 = C()
o1.add(1)
o2.add(2)
assert o1.x == [1, 2]
assert o1.x is o2.x

请注意,类 C 的两个实例共享相同的类变量 x ,如预期的那样。

使用数据类,如果 此代码有效:

@dataclass
class D:
    x: list = []      # This code raises ValueError
    def add(self, element):
        self.x.append(element)

它生成的代码类似于:

class D:
    x = []
    def __init__(self, x=x):
        self.x = x
    def add(self, element):
        self.x.append(element)

assert D().x is D().x

This has the same issue as the original example using class C. That is, two instances of class D that do not specify a value for x when creating a class instance will share the same copy of x. Because dataclasses just use normal Python class creation they also share this behavior. There is no general way for Data Classes to detect this condition. Instead, the @dataclass decorator will raise a ValueError if it detects an unhashable default parameter. The assumption is that if a value is unhashable, it is mutable. This is a partial solution, but it does protect against many common errors.

使用默认工厂函数是一种创建可变类型新实例的方法,并将其作为字段的默认值:

@dataclass
class D:
    x: list = field(default_factory=list)

assert D().x is not D().x

在 3.11 版本发生变更: 现在不再是寻找并阻止使用类型为 list, dictset 的对象,而是不允许使用不可哈希的对象作为默认值。 就是将不可哈希性当作是不可变性的等价物。

描述器类型的字段

当字段被 描述器对象 赋值为默认值时会遵循以下行为:

  • The value for the field passed to the dataclass's __init__() method is passed to the descriptor's __set__() method rather than overwriting the descriptor object.

  • Similarly, when getting or setting the field, the descriptor's __get__() or __set__() method is called rather than returning or overwriting the descriptor object.

  • To determine whether a field contains a default value, @dataclass will call the descriptor's __get__() method using its class access form: descriptor.__get__(obj=None, type=cls). If the descriptor returns a value in this case, it will be used as the field's default. On the other hand, if the descriptor raises AttributeError in this situation, no default value will be provided for the field.

class IntConversionDescriptor:
    def __init__(self, *, default):
        self._default = default

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, obj, type):
        if obj is None:
            return self._default

        return getattr(obj, self._name, self._default)

    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))

@dataclass
class InventoryItem:
    quantity_on_hand: IntConversionDescriptor = IntConversionDescriptor(default=100)

i = InventoryItem()
print(i.quantity_on_hand)   # 100
i.quantity_on_hand = 2.5    # calls __set__ with 2.5
print(i.quantity_on_hand)   # 2

若一个字段的类型是描述器,但其默认值并不是描述器对象,那么该字段只会像普通的字段一样工作。