dataclasses
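--- Data Classes¶
This module provides a decorator and functions for automatically adding generated special methods such as __init__() and __repr__() to user-defined classes. It was originally described in PEP 557.
The member variables to use in these generated methods are defined using PEP 526 type annotations. For example, this code: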
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
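will add, among other things, a __init__() that looks like:
def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0):
    self.name = name
    self.unit_price = unit_price
    self.quantity_on_hand = quantity_on_hand
Note that this method is automatically added to the class: it is not directly specified in the InventoryItem definition shown above.
New in version 3.7.
Module contents¶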
- @dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)¶
This function is a decorator that is used to add generated special methods to classes, as described below.
The @dataclass decorator examines the class to find fields. A field is defined as a class variable that has a type annotation. With two exceptions described below, nothing in @dataclass examines the type specified in the variable annotation. The order of the fields in all of the generated methods is the order in which they appear in the class definition.
The @dataclass decorator will add various "dunder" methods to the class, described below. If any of the added methods already exist in the class, the behavior depends on the parameter, as documented below. The decorator returns the same class that it is called on; no new class is created.
If @dataclass is used just as a simple decorator with no parameters, it acts as if it has the default values documented in this signature. That is, these three uses of @dataclass are equivalent:
@dataclass
class C:
    ...

@dataclass()
class C:
    ...

@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False,
           match_args=True, kw_only=False, slots=False, weakref_slot=False)
class C:
    ...
The parameters to @dataclass are:
init: If true (the default), a __init__() method will be generated.
If the class already defines __init__(), this parameter is ignored.
repr: If true (the default), a __repr__() method will be generated. The generated repr string will have the class name and the name and repr of each field, in the order they are defined in the class. Fields that are marked as being excluded from the repr are not included. For example: InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10).
If the class already defines __repr__(), this parameter is ignored.
eq: If true (the default), an __eq__() method will be generated. This method compares the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type.
If the class already defines __eq__(), this parameter is ignored.
order: If true (the default is False), __lt__(), __le__(), __gt__(), and __ge__() methods will be generated. These compare the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type. If order is true and eq is false, a ValueError is raised.
If the class already defines any of __lt__(), __le__(), __gt__(), or __ge__(), then TypeError is raised.
unsafe_hash: If False (the default), a __hash__() method is generated according to how eq and frozen are set.
__hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer's intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the @dataclass decorator.
By default, @dataclass will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.
If __hash__() is not explicitly defined, or if it is set to None, then @dataclass may add an implicit __hash__() method. Although not recommended, you can force @dataclass to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can still be mutated. This is a specialized use case and should be considered carefully.
Here are the rules governing implicit creation of a __hash__() method. Note that you cannot both have an explicit __hash__() method in your dataclass and set unsafe_hash=True; this will result in a TypeError.
If eq and frozen are both true, by default @dataclass will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched, meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
frozen: If true (the default is False), assigning to fields will generate an exception. This emulates read-only frozen instances. If __setattr__() or __delattr__() is defined in the class, then TypeError is raised. See the discussion below.
match_args: If true (the default is True), the __match_args__ tuple will be created from the list of parameters to the generated __init__() method (even if __init__() is not generated, see above). If false, or if __match_args__ is already defined in the class, then __match_args__ will not be generated.
New in version 3.10.
kw_only: If true (the default value is False), then all fields will be marked as keyword-only. If a field is marked as keyword-only, then the only effect is that the __init__() parameter generated from a keyword-only field must be specified with a keyword when __init__() is called. There is no effect on any other aspect of dataclasses. See the parameter glossary entry for details. Also see the KW_ONLY section.
New in version 3.10.
slots: If true (the default is False), a __slots__ attribute will be generated and a new class will be returned instead of the original one. If __slots__ is already defined in the class, then TypeError is raised.
weakref_slot: If true (the default is False), add a slot named "__weakref__", which is required to make an instance weakref-able. It is an error to specify weakref_slot=True without also specifying slots=True.
New in version 3.11.
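The rules for implicit __hash__() creation can be checked directly. The following is a minimal sketch; the class names FrozenPoint, MutablePoint, and IdentityPoint are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)          # eq=True (default) and frozen=True
class FrozenPoint:
    x: int
    y: int


@dataclass                       # eq=True (default) and frozen=False
class MutablePoint:
    x: int
    y: int


@dataclass(eq=False)             # eq=False: __hash__ is left untouched
class IdentityPoint:
    x: int
    y: int


# A field-based __hash__ is generated, so equal instances hash equally.
frozen_hashes_match = hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2))

# __hash__ is set to None, which marks instances as unhashable.
mutable_hash_attr = MutablePoint.__hash__

# The superclass (object) hash is inherited unchanged.
identity_uses_object_hash = IdentityPoint.__hash__ is object.__hash__
```

Only the frozen variant can safely be used as a dictionary key or set member based on its field values.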
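Fields may optionally specify a default value, using normal Python syntax:
@dataclass
class C:
    a: int       # 'a' has no default value
    b: int = 0   # assign a default value for 'b'
In this example, both a and b will be included in the added __init__() method, which will be defined as:
def __init__(self, a: int, b: int = 0):
TypeError will be raised if a field without a default value follows a field with a default value. This is true whether this occurs in a single class, or as a result of class inheritance.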
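As a small sketch of the ordering rule, the hypothetical class Bad below declares a field without a default after one with a default, so the decorator raises TypeError while the class is being defined. The exact wording of the message may differ between Python versions, so the example only stores it.

```python
from dataclasses import dataclass

error_message = None
try:
    @dataclass
    class Bad:
        a: int = 0   # field with a default value
        b: int       # field without a default value: not allowed here
except TypeError as exc:
    # Raised at class-definition time, before any instance is created.
    error_message = str(exc)
```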
- dataclasses.field(*, default=MISSING, default_factory=MISSING, init=True, repr=True, hash=None, compare=True, metadata=None, kw_only=MISSING)¶
For common and simple use cases, no other functionality is required. There are, however, some dataclass features that require additional per-field information. To satisfy this need for additional information, you can replace the default field value with a call to the provided field() function. For example:
@dataclass
class C:
    mylist: list[int] = field(default_factory=list)

c = C()
c.mylist += [1, 2, 3]
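As shown above, the MISSING value is a sentinel object used to detect if some parameters are provided by the user. This sentinel is used because None is a valid value for some parameters with a distinct meaning. No code should directly use the MISSING value.
The parameters to field() are:
default: If provided, this will be the default value for this field. This is needed because the field() call itself replaces the normal position of the default value.
default_factory: If provided, it must be a zero-argument callable that will be called when a default value is needed for this field. Among other purposes, this can be used to specify fields with mutable default values, as discussed below. It is an error to specify both default and default_factory.
init: If true (the default), this field is included as a parameter to the generated __init__() method.
repr: If true (the default), this field is included in the string returned by the generated __repr__() method.
hash: This can be a bool or None. If true, this field is included in the generated __hash__() method. If None (the default), use the value of compare: this would normally be the expected behavior. A field should be considered in the hash if it is used for comparisons. Setting this value to anything other than None is discouraged.
One possible reason to set hash=False but compare=True would be if a field is expensive to compute a hash value for, that field is needed for equality testing, and there are other fields that contribute to the type's hash value. Even if a field is excluded from the hash, it will still be used for comparisons.
compare: If true (the default), this field is included in the generated equality and comparison methods (__eq__(), __gt__(), et al.).
metadata: This can be a mapping or None. None is treated as an empty dict. This value is wrapped in MappingProxyType() to make it read-only, and exposed on the Field object. It is not used at all by Data Classes, and is provided as a third-party extension mechanism. Multiple third parties can each have their own key, to use as a namespace in the metadata.
kw_only: If true, this field will be marked as keyword-only. This is used when the generated __init__() method's parameters are computed.
New in version 3.10.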
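The following sketch combines several field() parameters. The class Reading, its field names, and the "unit" metadata key are assumptions made up for this illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Reading:
    sensor: str
    # metadata is ignored by dataclasses itself; it is only stored on the Field.
    value: float = field(metadata={"unit": "celsius"})
    # Excluded from the generated __repr__() and from comparisons.
    raw_bytes: bytes = field(default=b"", repr=False, compare=False)


a = Reading("s1", 21.5, b"\x01")
b = Reading("s1", 21.5, b"\x02")

same = (a == b)                          # raw_bytes does not take part in __eq__
text = repr(a)                           # raw_bytes does not appear in the repr
unit = fields(Reading)[1].metadata["unit"]

# The metadata mapping is exposed read-only.
metadata_is_read_only = False
try:
    fields(Reading)[1].metadata["unit"] = "kelvin"
except TypeError:
    metadata_is_read_only = True
```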
If the default value of a field is specified by a call to field(), then the class attribute for this field will be replaced by the specified default value. If no default is provided, then the class attribute will be deleted. The intent is that after the @dataclass decorator runs, the class attributes will all contain the default values for the fields, just as if the default value itself were specified. For example, after:
@dataclass
class C:
    x: int
    y: int = field(repr=False)
    z: int = field(repr=False, default=10)
    t: int = 20
The class attribute C.z will be 10, the class attribute C.t will be 20, and the class attributes C.x and C.y will not be set.
- class dataclasses.Field¶
Field objects describe each defined field. These objects are created internally, and are returned by the fields() module-level function (see below). Users should never instantiate a Field object directly. Its documented attributes are:
name: The name of the field.
type: The type of the field.
default, default_factory, init, repr, hash, compare, metadata, and kw_only have the identical meaning and values as they do in the field() function.
Other attributes may exist, but they are private and must not be inspected or relied on.
- dataclasses.fields(class_or_instance)¶
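Returns a tuple of Field objects that define the fields for this dataclass. Accepts either a dataclass, or an instance of a dataclass. Raises TypeError if not passed a dataclass or instance of one. Does not return pseudo-fields which are ClassVar or InitVar.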
- dataclasses.asdict(obj, *, dict_factory=dict)¶
Converts the dataclass obj to a dict (by using the factory function dict_factory). Each dataclass is converted to a dict of its fields, as name: value pairs. dataclasses, dicts, lists, and tuples are recursed into. Other objects are copied with copy.deepcopy().
Example of using asdict() on nested dataclasses:
@dataclass
class Point:
    x: int
    y: int

@dataclass
class C:
    mylist: list[Point]

p = Point(10, 20)
assert asdict(p) == {'x': 10, 'y': 20}

c = C([Point(0, 0), Point(10, 4)])
assert asdict(c) == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
To create a shallow copy, the following workaround may be used:
dict((field.name, getattr(obj, field.name)) for field in fields(obj))
asdict() raises TypeError if obj is not a dataclass instance.
- dataclasses.astuple(obj, *, tuple_factory=tuple)¶
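Converts the dataclass obj to a tuple (by using the factory function tuple_factory). Each dataclass is converted to a tuple of its field values. dataclasses, dicts, lists, and tuples are recursed into. Other objects are copied with copy.deepcopy().
Continuing from the previous example:
assert astuple(p) == (10, 20)
assert astuple(c) == ([(0, 0), (10, 4)],)
To create a shallow copy, the following workaround may be used:
tuple(getattr(obj, field.name) for field in dataclasses.fields(obj))
astuple() raises TypeError if obj is not a dataclass instance.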
- dataclasses.make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)¶
Creates a new dataclass with name cls_name, fields as defined in fields, base classes as given in bases, and initialized with a namespace as given in namespace. fields is an iterable whose elements are each either name, (name, type), or (name, type, Field). If just name is supplied, typing.Any is used for type. The values of init, repr, eq, order, unsafe_hash, frozen, match_args, kw_only, slots, and weakref_slot have the same meaning as they do in @dataclass.
This function is not strictly required, because any Python mechanism for creating a new class with __annotations__ can then apply the @dataclass function to convert that class to a dataclass. This function is provided as a convenience. For example:
C = make_dataclass('C',
                   [('x', int),
                    'y',
                    ('z', int, field(default=5))],
                   namespace={'add_one': lambda self: self.x + 1})
Is equivalent to:
@dataclass
class C:
    x: int
    y: 'typing.Any'
    z: int = 5

    def add_one(self):
        return self.x + 1
- dataclasses.replace(obj, /, **changes)¶
Creates a new object of the same type as obj, replacing fields with values from changes. If obj is not a Data Class, raises TypeError. If a key in changes is not a field name of the dataclass, raises TypeError.
The newly returned object is created by calling the __init__() method of the dataclass. This ensures that __post_init__(), if present, is also called.
Init-only variables without default values, if any exist, must be specified on the call to replace() so that they can be passed to __init__() and __post_init__().
It is an error for changes to contain any fields that are defined as having init=False. A ValueError will be raised in this case.
Be forewarned about how init=False fields work during a call to replace(). They are not copied from the source object, but rather are initialized in __post_init__(), if they're initialized at all. It is expected that init=False fields will be rarely and judiciously used. If they are used, it might be wise to have alternate class constructors, or perhaps a custom replace() (or similarly named) method which handles instance copying.
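A minimal sketch of this behaviour, assuming a hypothetical Rect class whose area field is excluded from __init__() and computed in __post_init__():

```python
from dataclasses import dataclass, field, replace


@dataclass
class Rect:
    width: float
    height: float
    area: float = field(init=False)   # not an __init__() parameter

    def __post_init__(self):
        # Runs again for every object that replace() creates.
        self.area = self.width * self.height


r1 = Rect(2.0, 3.0)
r2 = replace(r1, width=4.0)           # new object; r1 is left unchanged

# A key that is not a field name is rejected with TypeError.
unknown_field_rejected = False
try:
    replace(r1, depth=1.0)
except TypeError:
    unknown_field_rejected = True
```

Because area is recomputed rather than copied, r2.area reflects the new width.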
- dataclasses.is_dataclass(obj)¶
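Return True if its parameter is a dataclass or an instance of one, otherwise return False.
If you need to know whether an object is an instance of a dataclass (and not a dataclass itself), then add a further check for not isinstance(obj, type):
def is_dataclass_instance(obj):
    return is_dataclass(obj) and not isinstance(obj, type)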
- dataclasses.MISSING¶
A sentinel value signifying a missing default or default_factory.
- dataclasses.KW_ONLY¶
A sentinel value used as a type annotation. Any fields after a pseudo-field with the type of KW_ONLY are marked as keyword-only fields. Note that a pseudo-field of type KW_ONLY is otherwise completely ignored. This includes the name of such a field. By convention, a name of _ is used for a KW_ONLY field. Keyword-only fields signify __init__() parameters that must be specified as keywords when the class is instantiated.
In this example, the fields y and z will be marked as keyword-only fields:
@dataclass
class Point:
    x: float
    _: KW_ONLY
    y: float
    z: float

p = Point(0, y=1.5, z=2.0)
In a single dataclass, it is an error to specify more than one field whose type is KW_ONLY.
New in version 3.10.
- exception dataclasses.FrozenInstanceError¶
Raised when an implicitly defined __setattr__() or __delattr__() is called on a dataclass which was defined with frozen=True. It is a subclass of AttributeError.
Post-init processing¶
The generated __init__() code will call a method named __post_init__(), if __post_init__() is defined on the class. It will normally be called as self.__post_init__(). However, if any InitVar fields are defined, they will also be passed to __post_init__() in the order they were defined in the class. If no __init__() method is generated, then __post_init__() will not automatically be called.
For example, a field excluded from __init__() can be computed from other fields:
@dataclass
class C:
    a: float
    b: float
    c: float = field(init=False)

    def __post_init__(self):
        self.c = self.a + self.b
The __init__() method generated by @dataclass does not call base class __init__() methods. If the base class has an __init__() method that has to be called, it is common to call this method in a __post_init__() method:
class Rectangle:
    def __init__(self, height, width):
        self.height = height
        self.width = width

@dataclass
class Square(Rectangle):
    side: float

    def __post_init__(self):
        super().__init__(self.side, self.side)
Note, however, that in general the dataclass-generated __init__() methods don't need to be called, since the derived dataclass will take care of initializing all fields of any base class that is a dataclass itself.
See the section below on init-only variables for ways to pass parameters to __post_init__(). Also see the warning about how replace() handles init=False fields.
Class variables¶
One of the few places where @dataclass actually inspects the type of a field is to determine if a field is a class variable as defined in PEP 526. It does this by checking if the type of the field is typing.ClassVar. If a field is a ClassVar, it is excluded from consideration as a field and is ignored by the dataclass mechanisms. Such ClassVar pseudo-fields are not returned by the module-level fields() function.
Init-only variables¶
Another place where @dataclass inspects a type annotation is to determine if a field is an init-only variable. It does this by seeing if the type of a field is of type dataclasses.InitVar. If a field is an InitVar, it is considered a pseudo-field called an init-only field. As it is not a true field, it is not returned by the module-level fields() function. Init-only fields are added as parameters to the generated __init__() method, and are passed to the optional __post_init__() method. They are not otherwise used by dataclasses.
For example, suppose a field will be initialized from a database, if a value is not provided when creating the class:
@dataclass
class C:
    i: int
    j: int | None = None
    database: InitVar[DatabaseType | None] = None

    def __post_init__(self, database):
        if self.j is None and database is not None:
            self.j = database.lookup('j')

c = C(10, database=my_database)
Frozen instances¶
It is not possible to create truly immutable Python objects. However, by passing frozen=True to the @dataclass decorator you can emulate immutability. In that case, dataclasses will add __setattr__() and __delattr__() methods to the class. These methods will raise a FrozenInstanceError when invoked.
There is a tiny performance penalty when using frozen=True: __init__() cannot use simple assignment to initialize fields, and must use object.__setattr__().
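The sketch below illustrates both halves of this description: ordinary assignment is blocked, yet the immutability is only emulated. The Point class is a hypothetical example.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)

# Normal attribute assignment goes through the generated __setattr__().
blocked = False
try:
    p.x = 10
except FrozenInstanceError:
    blocked = True

# Calling object.__setattr__() directly bypasses the generated method,
# which is why frozen instances are not truly immutable.
object.__setattr__(p, "x", 10)
```

Bypassing the check this way is shown only to make the limitation concrete; doing so on a hashed object can break dictionaries and sets that already contain it.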
Inheritance¶
When the dataclass is being created by the @dataclass decorator, it looks through all of the class's base classes in reverse MRO (that is, starting at object) and, for each dataclass that it finds, adds the fields from that base class to an ordered mapping of fields. After all of the base class fields are added, it adds its own fields to the ordered mapping. All of the generated methods will use this combined, calculated ordered mapping of fields. Because the fields are in insertion order, derived classes override base classes. An example:
@dataclass
class Base:
    x: Any = 15.0
    y: int = 0

@dataclass
class C(Base):
    z: int = 10
    x: int = 15
The final list of fields is, in order, x, y, z. The final type of x is int, as specified in class C.
The generated __init__() method for C will look like:
def __init__(self, x: int = 15, y: int = 0, z: int = 10):
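Re-ordering of keyword-only parameters in __init__()¶
After the parameters needed for __init__() are computed, any keyword-only parameters are moved to come after all regular (non-keyword-only) parameters. This is a requirement of how keyword-only parameters are implemented in Python: they must come after non-keyword-only parameters.
In this example, Base.y, Base.w, and D.t are keyword-only fields, and Base.x and D.z are regular fields:
@dataclass
class Base:
    x: Any = 15.0
    _: KW_ONLY
    y: int = 0
    w: int = 1

@dataclass
class D(Base):
    z: int = 10
    t: int = field(kw_only=True, default=0)
The generated __init__() method for D will look like:
def __init__(self, x: Any = 15.0, z: int = 10, *, y: int = 0, w: int = 1, t: int = 0):
Note that the parameters have been re-ordered from how they appear in the list of fields: parameters derived from regular fields are followed by parameters derived from keyword-only fields.
The relative ordering of keyword-only parameters is maintained in the re-ordered __init__() parameter list.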
Default factory functions¶
If a field() specifies a default_factory, it is called with zero arguments when a default value for the field is needed. For example, to create a new instance of a list, use:
mylist: list = field(default_factory=list)
If a field is excluded from __init__() (using init=False) and the field also specifies default_factory, then the default factory function will always be called from the generated __init__() function. This happens because there is no other way to give the field an initial value.
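As a sketch of the init=False case, the hypothetical Ticket class below draws an identifier from a module-level counter. The factory runs on every instantiation because callers cannot supply the value themselves.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical id source; any zero-argument callable can serve as a factory.
_next_id = itertools.count(1)


@dataclass
class Ticket:
    title: str
    # Not an __init__() parameter, so the factory is always called.
    ticket_id: int = field(init=False, default_factory=lambda: next(_next_id))


first = Ticket("login fails")
second = Ticket("typo in docs")
```

Each instance receives its own freshly produced value, so second.ticket_id is one greater than first.ticket_id.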
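Mutable default values¶
Python stores default member variable values in class attributes. Consider this example, not using dataclasses:
class C:
    x = []
    def add(self, element):
        self.x.append(element)

o1 = C()
o2 = C()
o1.add(1)
o2.add(2)
assert o1.x == [1, 2]
assert o1.x is o2.x
Note that the two instances of class C share the same class variable x, as expected.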
Using dataclasses, if this code was valid:
@dataclass
class D:
    x: list = []      # This code raises ValueError
    def add(self, element):
        self.x.append(element)
it would generate code similar to:
class D:
    x = []
    def __init__(self, x=x):
        self.x = x
    def add(self, element):
        self.x.append(element)

assert D().x is D().x
This has the same issue as the original example using class C. That is, two instances of class D that do not specify a value for x when creating a class instance will share the same copy of x. Because dataclasses just use normal Python class creation they also share this behavior. There is no general way for Data Classes to detect this condition. Instead, the @dataclass decorator will raise a ValueError if it detects an unhashable default parameter. The assumption is that if a value is unhashable, it is mutable. This is a partial solution, but it does protect against many common errors.
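Using default factory functions is a way to create new instances of mutable types as default values for fields:
@dataclass
class D:
    x: list = field(default_factory=list)

assert D().x is not D().x
Changed in version 3.11: Instead of looking for and disallowing objects of type list, dict, or set, unhashable objects are now not allowed as default values. Unhashability is used to approximate mutability.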
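The check described in this section fires when the class is defined, not when an instance is created. The sketch below uses two hypothetical classes, Broken and Fixed, to contrast the rejected form with the default_factory form; the error text is stored rather than compared exactly, since its wording is an implementation detail.

```python
from dataclasses import dataclass, field

error_message = None
try:
    @dataclass
    class Broken:
        items: list = []          # mutable (unhashable) default: rejected
except ValueError as exc:
    error_message = str(exc)


@dataclass
class Fixed:
    items: list = field(default_factory=list)   # a new list per instance


a = Fixed()
b = Fixed()
a.items.append(1)                 # does not affect b.items
```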
Descriptor-typed fields¶
Fields that are assigned descriptor objects as their default value have the following special behaviors:
- The value for the field passed to the dataclass's __init__() method is passed to the descriptor's __set__() method rather than overwriting the descriptor object.
- Similarly, when getting or setting the field, the descriptor's __get__() or __set__() method is called rather than returning or overwriting the descriptor object.
- To determine whether a field contains a default value, @dataclass will call the descriptor's __get__() method using its class access form: descriptor.__get__(obj=None, type=cls). If the descriptor returns a value in this case, it will be used as the field's default. On the other hand, if the descriptor raises AttributeError in this situation, no default value will be provided for the field.
class IntConversionDescriptor:
    def __init__(self, *, default):
        self._default = default

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, obj, type):
        if obj is None:
            return self._default

        return getattr(obj, self._name, self._default)

    def __set__(self, obj, value):
        setattr(obj, self._name, int(value))

@dataclass
class InventoryItem:
    quantity_on_hand: IntConversionDescriptor = IntConversionDescriptor(default=100)

i = InventoryItem()
print(i.quantity_on_hand)   # 100
i.quantity_on_hand = 2.5    # calls __set__ with 2.5
print(i.quantity_on_hand)   # 2
Note that if a field is annotated with a descriptor type, but is not assigned a descriptor object as its default value, the field will act like a normal field.