bisect --- 陣列二分演算法 (Array bisection algorithm)

原始碼: Lib/bisect.py


這個模組維護一個已經排序過的 list ,當我們每次做完插入後不需要再次排序整個 list 。一個很長的 list 的比較操作很花費時間,為了改進這點,這個模組是其中一個常用的方法。這個模組被命名為 bisect 來自他使用一個基礎的 bisection 演算法實作。模組的原始碼是這個演算法的一個完善的實作(邊界條件已經是正確的了)。

此模組提供下面的函式

bisect.bisect_left(a, x, lo=0, hi=len(a))

a 當中找到一個位置,讓 x 插入後 a 仍然是排序好的。參數 lohi 用來指定 list 中應該被考慮的子區間,預設是考慮整個 list 。如果 a 裡面已經有 x 出現,插入的位置會在所有 x 的前面(左邊)。回傳值可以被當作 list.insert() 的第一個參數,但列表 a 必須先排序過。

回傳的插入位置 i 將陣列 a 分為兩半,使得 all(val < x for val in a[lo:i]) 都在左側且 all(val >= x for val in a[i:hi]) 都在右側。

bisect.bisect_right(a, x, lo=0, hi=len(a))
bisect.bisect(a, x, lo=0, hi=len(a))

類似 bisect_left() ,但回傳的插入位置會在所有 a 當中的 x 的後面(右邊)。

回傳的插入位置 i 將陣列 a 分為兩半,使得 all(val <= x for val in a[lo:i]) 都在左側且 all(val > x for val in a[i:hi]) 都在右側。

bisect.insort_left(a, x, lo=0, hi=len(a))

不破壞 a 的排序下插入 x ,這等價於 a.insert(bisect.bisect_left(a, x, lo, hi), x) ,注意 a 必須是已經排序過的 list 。注意搜尋只需要 O(log n) 時間而插入需要很慢的 O(n) 時間,這使得插入操作主導了需要的花費時間。

bisect.insort_right(a, x, lo=0, hi=len(a))
bisect.insort(a, x, lo=0, hi=len(a))

類似 insort_left() ,但插入的位置會在所有 a 當中的 x 的後面(右邊)。

也參考

SortedCollection recipe that uses bisect to build a full-featured collection class with straight-forward search methods and support for a key-function. The keys are precomputed to save unnecessary calls to the key function during searches.

Searching Sorted Lists

The above bisect() functions are useful for finding insertion points but can be tricky or awkward to use for common searching tasks. The following five functions show how to transform them into the standard lookups for sorted lists:

def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

def find_lt(a, x):
    'Find rightmost value less than x'
    i = bisect_left(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_le(a, x):
    'Find rightmost value less than or equal to x'
    i = bisect_right(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_gt(a, x):
    'Find leftmost value greater than x'
    i = bisect_right(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

def find_ge(a, x):
    'Find leftmost item greater than or equal to x'
    i = bisect_left(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

Other Examples

The bisect() function can be useful for numeric table lookups. This example uses bisect() to look up a letter grade for an exam score (say) based on a set of ordered numeric breakpoints: 90 and up is an 'A', 80 to 89 is a 'B', and so on:

>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']

Unlike the sorted() function, it does not make sense for the bisect() functions to have key or reversed arguments because that would lead to an inefficient design (successive calls to bisect functions would not "remember" all of the previous key lookups).

Instead, it is better to search a list of precomputed keys to find the index of the record in question:

>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])
>>> keys = [r[1] for r in data]         # precomputed list of keys
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
('blue', 1)
>>> data[bisect_left(keys, 5)]
('red', 5)
>>> data[bisect_left(keys, 8)]
('yellow', 8)