10.10. shutil — 高阶文件操作

源代码: Lib/shutil.py


shutil 模块提供了一系列对文件和文件集合的高阶操作。 特别是提供了一些支持文件拷贝和删除的函数。 对于单个文件的操作,请参阅 os 模块。

警告

Even the higher-level file copying functions (shutil.copy(), shutil.copy2()) can’t copy all file metadata.

在 POSIX 平台上,这意味着将丢失文件所有者和组以及 ACL 数据。 在 Mac OS 上,资源钩子和其他元数据不被使用。 这意味着将丢失这些资源并且文件类型和创建者代码将不正确。 在 Windows 上,将不会拷贝文件所有者、ACL 和替代数据流。

10.10.1. 目录和文件操作

shutil.copyfileobj(fsrc, fdst[, length])

将文件类对象 fsrc 的内容拷贝到文件类对象 fdst。 整数值 length 如果给出则为缓冲区大小。 特别地, length 为负值表示拷贝数据时不对源数据进行分块循环处理;默认情况下会分块读取数据以避免不受控制的内存消耗。 请注意如果 fsrc 对象的当前文件位置不为 0,则只有从当前文件位置到文件末尾的内容会被拷贝。

shutil.copyfile(src, dst)

Copy the contents (no metadata) of the file named src to a file named dst. dst must be the complete target file name; look at shutil.copy() for a copy that accepts a target directory path. If src and dst are the same files, Error is raised. The destination location must be writable; otherwise, an IOError exception will be raised. If dst already exists, it will be replaced. Special files such as character or block devices and pipes cannot be copied with this function. src and dst are path names given as strings.

shutil.copymode(src, dst)

Copy the permission bits from src to dst. The file contents, owner, and group are unaffected. src and dst are path names given as strings.

shutil.copystat(src, dst)

Copy the permission bits, last access time, last modification time, and flags from src to dst. The file contents, owner, and group are unaffected. src and dst are path names given as strings.

shutil.copy(src, dst)

Copy the file src to the file or directory dst. If dst is a directory, a file with the same basename as src is created (or overwritten) in the directory specified. Permission bits are copied. src and dst are path names given as strings.

shutil.copy2(src, dst)

类似于 copy(),区别在于 copy2() 还会尝试保留文件的元数据。

copy2() uses copystat() to copy the file metadata. Please see copystat() for more information.

shutil.ignore_patterns(*patterns)

这个工厂函数会创建一个函数,它可被用作 copytree()ignore 可调用对象参数,以忽略那些匹配所提供的 glob 风格的 patterns 之一的文件和目录。 参见以下示例。

2.6 新版功能.

shutil.copytree(src, dst, symlinks=False, ignore=None)

Recursively copy an entire directory tree rooted at src. The destination directory, named by dst, must not already exist; it will be created as well as missing parent directories. Permissions and times of directories are copied with copystat(), individual files are copied using shutil.copy2().

If symlinks is true, symbolic links in the source tree are represented as symbolic links in the new tree, but the metadata of the original links is NOT copied; if false or omitted, the contents and metadata of the linked files are copied to the new tree.

如果给出了 ignore,它必须是一个可调用对象,该对象将接受 copytree() 所访问的目录以及 os.listdir() 所返回的目录内容列表作为其参数。 由于 copytree() 是递归地被调用的,ignore 可调用对象对于每个被拷贝目录都将被调用一次。 该可调用对象必须返回一个相对于当前目录的目录和文件名序列(即其第二个参数的子集);随后这些名称将在拷贝进程中被忽略。 ignore_patterns() 可被用于创建这种基于 glob 风格模式来忽略特定名称的可调用对象。

如果发生了异常,将引发一个附带原因列表的 Error

The source code for this should be considered an example rather than the ultimate tool.

在 2.3 版更改: Error is raised if any exceptions occur during copying, rather than printing a message.

在 2.5 版更改: Create intermediate directories needed to create dst, rather than raising an error. Copy permissions and times of directories using copystat().

在 2.6 版更改: Added the ignore argument to be able to influence what is being copied.

shutil.rmtree(path[, ignore_errors[, onerror]])

删除一个完整的目录树;path 必须指向一个目录(但不能是一个目录的符号链接)。 如果 ignore_errors 为真值,删除失败导致的错误将被忽略;如果为假值或是省略,此类错误将通过调用由 onerror 所指定的处理程序来处理,或者如果此参数被省略则将引发一个异常。

If onerror is provided, it must be a callable that accepts three parameters: function, path, and excinfo. The first parameter, function, is the function which raised the exception; it will be os.path.islink(), os.listdir(), os.remove() or os.rmdir(). The second parameter, path, will be the path name passed to function. The third parameter, excinfo, will be the exception information return by sys.exc_info(). Exceptions raised by onerror will not be caught.

在 2.6 版更改: Explicitly check for path being a symbolic link and raise OSError in that case.

shutil.move(src, dst)

Recursively move a file or directory (src) to another location (dst).

如果目标是已存在的目录,则 src 会被移至该目录下。 如果目标已存在但不是目录,它可能会被覆盖,具体取决于 os.rename() 的语义。

If the destination is on the current filesystem, then os.rename() is used. Otherwise, src is copied (using shutil.copy2()) to dst and then removed.

2.3 新版功能.

exception shutil.Error

此异常会收集在多文件操作期间所引发的异常。 对于 copytree(),此异常参数将是一个由三元组 (srcname, dstname, exception) 构成的列表。

2.3 新版功能.

10.10.1.1. copytree 示例

这个示例就是上面所描述的 copytree() 函数的实现,其中省略了文档字符串。 它还展示了此模块所提供的许多其他函数。

def copytree(src, dst, symlinks=False, ignore=None):
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                copy2(srcname, dstname)
            # XXX What about devices, sockets etc.?
        except (IOError, os.error) as why:
            errors.append((srcname, dstname, str(why)))
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error as err:
            errors.extend(err.args[0])
    try:
        copystat(src, dst)
    except WindowsError:
        # can't copy file access times on Windows
        pass
    except OSError as why:
        errors.extend((src, dst, str(why)))
    if errors:
        raise Error(errors)

另一个使用 ignore_patterns() 辅助函数的例子:

from shutil import copytree, ignore_patterns

copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))

这将会拷贝除 .pyc 文件和以 tmp 打头的文件或目录以外的所有条目.

另一个使用 ignore 参数来添加记录调用的例子:

from shutil import copytree
import logging

def _logpath(path, names):
    logging.info('Working in %s' % path)
    return []   # nothing will be ignored

copytree(source, destination, ignore=_logpath)

10.10.2. 归档操作

本模块也提供了用于创建和读取压缩和归档文件的高层级工具。 它们依赖于 zipfiletarfile 模块。

shutil.make_archive(base_name, format[, root_dir[, base_dir[, verbose[, dry_run[, owner[, group[, logger]]]]]]])

Create an archive file (eg. zip or tar) and returns its name.

base_name is the name of the file to create, including the path, minus any format-specific extension. format is the archive format: one of “zip” (if the zlib module or external zip executable is available), “tar”, “gztar” (if the zlib module is available), or “bztar” (if the bz2 module is available).

root_dir is a directory that will be the root directory of the archive; ie. we typically chdir into root_dir before creating the archive.

base_dir is the directory where we start archiving from; ie. base_dir will be the common prefix of all files and directories in the archive.

root_dirbase_dir 默认均为当前目录。

ownergroup 将在创建 tar 归档文件时被使用。 默认会使用当前的所有者和分组。

logger 必须是一个兼容 PEP 282 的对象,通常为 logging.Logger 的实例。

2.7 新版功能.

shutil.get_archive_formats()

返回支持的归档格式列表。 所返回序列中的每个元素为一个元组 (name, description)

默认情况下 shutil 提供以下格式:

  • zip: ZIP file (if the zlib module or external zip executable is available).

  • tar: 未压缩的 tar 文件。

  • gztar: gzip 压缩的 tar 文件(如果 zlib 模块可用)。

  • bztar: bzip2 压缩的 tar 文件(如果 bz2 模块可用)。

你可以通过使用 register_archive_format() 注册新的格式或为任何现有格式提供你自己的归档程序。

2.7 新版功能.

shutil.register_archive_format(name, function[, extra_args[, description]])

Register an archiver for the format name. function is a callable that will be used to invoke the archiver.

If given, extra_args is a sequence of (name, value) that will be used as extra keywords arguments when the archiver callable is used.

description is used by get_archive_formats() which returns the list of archivers. Defaults to an empty list.

2.7 新版功能.

shutil.unregister_archive_format(name)

从支持的格式中移除归档格式 name

2.7 新版功能.

10.10.2.1. 归档程序示例

在这个示例中,我们创建了一个 gzip 压缩的 tar 归档文件,其中包含用户的 .ssh 目录下的所有文件:

>>> from shutil import make_archive
>>> import os
>>> archive_name = os.path.expanduser(os.path.join('~', 'myarchive'))
>>> root_dir = os.path.expanduser(os.path.join('~', '.ssh'))
>>> make_archive(archive_name, 'gztar', root_dir)
'/Users/tarek/myarchive.tar.gz'

结果归档文件中包含有:

$ tar -tzvf /Users/tarek/myarchive.tar.gz
drwx------ tarek/staff       0 2010-02-01 16:23:40 ./
-rw-r--r-- tarek/staff     609 2008-06-09 13:26:54 ./authorized_keys
-rwxr-xr-x tarek/staff      65 2008-06-09 13:26:54 ./config
-rwx------ tarek/staff     668 2008-06-09 13:26:54 ./id_dsa
-rwxr-xr-x tarek/staff     609 2008-06-09 13:26:54 ./id_dsa.pub
-rw------- tarek/staff    1675 2008-06-09 13:26:54 ./id_rsa
-rw-r--r-- tarek/staff     397 2008-06-09 13:26:54 ./id_rsa.pub
-rw-r--r-- tarek/staff   37192 2010-02-06 18:23:10 ./known_hosts