Binary search (bisection) in Python
Is there a library function that performs binary search on a list/tuple and returns the position of the item if found and 'False' (-1, None, etc.) if not?
I found the functions bisect_left/right in the bisect module, but they still return a position even if the item is not in the list. That's perfectly fine for their intended usage, but I just want to know if an item is in the list or not (don't want to insert anything).
I thought of using bisect_left and then checking if the item at that position is equal to what I'm searching, but that seems cumbersome (and I also need to do bounds checking if the number can be larger than the largest number in my list). If there is a nicer method I'd like to know about it.
Edit: To clarify what I need this for: I'm aware that a dictionary would be very well suited for this, but I'm trying to keep the memory consumption as low as possible. My intended usage would be a sort of two-way look-up table. I have a list of values in the table and I need to be able to access the values based on their index. I also want to be able to find the index of a particular value, or None if the value is not in the list.
Using a dictionary for this would be the fastest way, but would (approximately) double the memory requirements.
I was asking this question thinking that I may have overlooked something in the Python libraries. It seems I'll have to write my own code, as Moe suggested.
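For the two-way look-up described in the question, a single sorted list can serve both directions. The following is only a minimal sketch, assuming the values are unique and already sorted; the names `table`, `value_at` and `index_of` are made up for illustration:

```python
from bisect import bisect_left

table = [3, 8, 15, 23, 42]  # example data: sorted, unique values

def value_at(i):
    """Index -> value: plain list indexing."""
    return table[i]

def index_of(value):
    """Value -> index via binary search, or None if the value is absent."""
    i = bisect_left(table, value)
    if i < len(table) and table[i] == value:
        return i
    return None

print(value_at(2))    # 15
print(index_of(23))   # 3
print(index_of(4))    # None
```

No second container is needed, at the cost of O(log n) instead of O(1) for the value-to-index direction.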
    from bisect import bisect_left

    def binary_search(a, x, lo=0, hi=None):  # can't use a to specify default for hi
        hi = hi if hi is not None else len(a)  # hi defaults to len(a)
        pos = bisect_left(a, x, lo, hi)  # find insertion position
        return pos if pos != hi and a[pos] == x else -1  # don't walk off the end
Why not look at the code for bisect_left/right and adapt it to suit your purpose?
    def binary_search(a, x, lo=0, hi=None):
        if hi is None:
            hi = len(a)
        while lo < hi:
            mid = (lo+hi)//2
            midval = a[mid]
            if midval < x:
                lo = mid+1
            elif midval > x:
                hi = mid
            else:
                return mid
        return -1
This is a little off-topic (since Moe's answer seems complete to the OP's question), but it might be worth looking at the complexity of your whole procedure from end to end. If you're storing things in a sorted list (which is where a binary search would help) and then just checking for existence, you're incurring (worst-case, unless specified):
- O(n log n) to initially create the list (if the data is unsorted; O(n) if it's already sorted)
- O(log n) lookups (this is the binary search part)
- O(n) insert / delete (might be O(1) or O(log n) average case, depending on your pattern)
Whereas with a set(), you're incurring (average case):
- O(n) to create
- O(1) lookup
- O(1) insert / delete
What a sorted list really gets you is "next", "previous", and "ranges" (including inserting or deleting ranges), which are O(1) or O(|range|), given a starting index. If you aren't using those sorts of operations often, then storing as sets and sorting for display might be a better deal overall. set() incurs very little additional overhead in Python.
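To make the "next", "previous" and "ranges" operations concrete, here is a rough sketch on top of the bisect module. The helper names `next_greater`, `prev_smaller` and `in_range` are hypothetical, and the list is assumed to be sorted already:

```python
from bisect import bisect_left, bisect_right

data = [2, 3, 5, 7, 11, 13, 17]  # must already be sorted

def next_greater(a, x):
    """Smallest element strictly greater than x, or None."""
    i = bisect_right(a, x)
    return a[i] if i < len(a) else None

def prev_smaller(a, x):
    """Largest element strictly smaller than x, or None."""
    i = bisect_left(a, x)
    return a[i - 1] if i > 0 else None

def in_range(a, lo, hi):
    """All elements e with lo <= e <= hi, as a slice of the list."""
    return a[bisect_left(a, lo):bisect_right(a, hi)]

print(next_greater(data, 7))   # 11
print(prev_smaller(data, 7))   # 5
print(in_range(data, 4, 12))   # [5, 7, 11]
```

A set has no ordering, so it offers none of these without sorting first.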
It might be worth mentioning that the bisect docs now provide searching examples: http://docs.python.org/library/bisect.html#searching-sorted-lists
(Raising ValueError instead of returning -1 or None is more pythonic – list.index() does it, for example. But of course you can adapt the examples to your needs.)
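Either convention can be built on the other. As a sketch, the raising style below mirrors list.index(), and a thin wrapper turns it into a sentinel-returning lookup; `find` and its `default` parameter are names invented here, not part of the bisect module:

```python
from bisect import bisect_left

def index(a, x):
    """Leftmost position of x in sorted list a; raises ValueError if absent."""
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError(f"{x!r} is not in list")

def find(a, x, default=None):
    """Same lookup, but returns `default` instead of raising."""
    try:
        return index(a, x)
    except ValueError:
        return default

squares = [0, 1, 4, 9, 16]
print(find(squares, 9))       # 3
print(find(squares, 5))       # None
print(find(squares, 5, -1))   # -1
```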
Simplest is to use bisect and check one position back to see if the item is there:
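    from bisect import bisect

    def binary_search(a, x, lo=0, hi=None):
        if hi is None:
            hi = len(a)  # search to the end of the list by default
        i = bisect(a, x, lo, hi)  # insertion point to the right of any x
        if i > lo and a[i-1] == x:  # x, if present, sits just left of i
            return i-1
        return -1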
This is right from the manual:
8.5.1. Searching Sorted Lists
The above bisect() functions are useful for finding insertion points but can be tricky or awkward to use for common searching tasks. The following five functions show how to transform them into the standard lookups for sorted lists:
    def index(a, x):
        'Locate the leftmost value exactly equal to x'
        i = bisect_left(a, x)
        if i != len(a) and a[i] == x:
            return i
        raise ValueError
So with a slight modification your code should be:

    from bisect import bisect_left

    def index(a, x):
        'Locate the leftmost value exactly equal to x'
        i = bisect_left(a, x)
        if i != len(a) and a[i] == x:
            return i
        return -1
I agree that @DaveAbrahams's answer using the bisect module is the correct approach. He did not mention one important detail in his answer.
From the docs bisect.bisect_left(a, x, lo=0, hi=len(a))
The bisection module does not require the search array to be precomputed ahead of time. You can just present the endpoints to the bisect.bisect_left instead of it using the defaults of 0 and len(a).
Even more important for my use: I was looking for a value X such that the error of a given function is minimized. To do that, I needed a way to have bisect_left's algorithm call my computation instead. This is really simple.
Just provide an object that defines __getitem__ as the a argument.
For example, we could use the bisect algorithm to find a square root with arbitrary precision!
    import bisect

    class sqrt_array(object):
        def __init__(self, digits):
            self.precision = float(10**(digits))
        def __getitem__(self, key):
            return (key/self.precision)**2.0

    sa = sqrt_array(4)

    # "search" in the range of 0 to 10 with a "precision" of 0.0001
    index = bisect.bisect_left(sa, 7, 0, 10*10**4)
    print(7**0.5)
    print(index/(10**4.0))
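The same idea generalizes to any non-decreasing function over the integers. The sketch below is an illustration only: `MonotoneView` and `first_square_at_least` are hypothetical names, and both lo and hi are passed explicitly so bisect never needs the real length of anything:

```python
from bisect import bisect_left

class MonotoneView:
    """Read-only pseudo-sequence whose i-th item is func(i), computed on demand."""

    def __init__(self, func, length):
        self.func = func
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        return self.func(i)

def first_square_at_least(target, upper=10**6):
    """Smallest n in [0, upper) with n*n >= target; returns upper if there is none."""
    squares = MonotoneView(lambda n: n * n, upper)
    return bisect_left(squares, target, 0, upper)

print(first_square_at_least(50))   # 8, since 7*7 = 49 < 50 <= 8*8
print(first_square_at_least(49))   # 7
```

Nothing is precomputed: only about log2(upper) values of the function are ever evaluated.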
If you just want to see if it's present, try turning the list into a dict:
    # Generate a list
    l = [n*n for n in range(1000)]

    # Convert to dict - doesn't matter what you map values to
    d = dict((x, 1) for x in l)

    count = 0
    for n in range(1000000):
        # Compare with "if n in l"
        if n in d:
            count += 1
On my machine, "if n in l" took 37 seconds, while "if n in d" took 0.4 seconds.
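For comparison, a small harness along these lines can time a linear list scan, a bisect-based check and a set side by side. This is a sketch with smaller sizes than the run above so it finishes quickly; `in_sorted` and `count_hits` are made-up helper names, and the timings will differ from machine to machine:

```python
import time
from bisect import bisect_left

values = [n * n for n in range(200)]   # sorted list of squares
as_set = set(values)

def in_sorted(a, x):
    """Membership test on a sorted list via binary search."""
    i = bisect_left(a, x)
    return i < len(a) and a[i] == x

def count_hits(contains, probes):
    """Count the probes for which contains(p) is true, and time the loop."""
    start = time.perf_counter()
    hits = sum(1 for p in probes if contains(p))
    return hits, time.perf_counter() - start

probes = range(40000)
results = {
    "list scan": count_hits(lambda p: p in values, probes),
    "bisect": count_hits(lambda p: in_sorted(values, p), probes),
    "set": count_hits(lambda p: p in as_set, probes),
}
for name, (hits, seconds) in results.items():
    print(f"{name}: {hits} hits in {seconds:.4f}s")
```

All three approaches should report the same number of hits; only the elapsed time differs.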
This one is:
- not recursive (which makes it more memory-efficient than most recursive approaches)
- actually working
- fast since it runs without any unnecessary if's and conditions
- based on the mathematical fact that the floor of (low + high)/2 is always smaller than high whenever low < high, where low is the lower limit and high is the upper limit.
- tested :D
    def binsearch(t, key, low=0, high=None):
        if high is None:  # a default can't refer to t, so resolve it here
            high = len(t) - 1
        # bisecting the range
        while low < high:
            mid = (low + high)//2
            if t[mid] < key:
                low = mid + 1
            else:
                high = mid
        # at this point 'low' should point at the place
        # where the value of 'key' is possibly stored.
        return low if t and t[low] == key else -1
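One way to back up a claim like "tested" is to compare a hand-written loop against the library on random inputs. The sketch below assumes a loop in the same style (the name `lower_bound` is hypothetical) and checks it against bisect_left, including empty lists and keys outside the stored range:

```python
import random
from bisect import bisect_left

def lower_bound(t, key):
    """Hand-written loop in the same style: first index i with t[i] >= key."""
    low, high = 0, len(t)
    while low < high:
        mid = (low + high) // 2   # low <= mid < high, so the range always shrinks
        if t[mid] < key:
            low = mid + 1
        else:
            high = mid
    return low

random.seed(0)  # fixed seed so any failure is reproducible
for _ in range(1000):
    t = sorted(random.randint(0, 50) for _ in range(random.randint(0, 20)))
    key = random.randint(-5, 55)
    assert lower_bound(t, key) == bisect_left(t, key)
print("all checks passed")
```

The comment on the midpoint line is the invariant from the list above: because mid is strictly less than high, both branches shrink the range and the loop terminates.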
Dave Abrahams' solution is good, although I would have written it more minimally:

    import bisect

    def binary_search(L, x):
        i = bisect.bisect_left(L, x)
        if i == len(L) or L[i] != x:
            return -1
        return i
While there's no explicit binary search algorithm in Python, there is a module - bisect - designed to find the insertion point for an element in a sorted list using a binary search. This can be "tricked" into performing a binary search. The biggest advantage of this is the same advantage most library code has - it's high-performing, well-tested and just works (binary searches in particular can be quite difficult to implement successfully - particularly if edge cases aren't carefully considered).
For basic types like strings or ints it's pretty easy: all you need is the bisect module and a sorted list:

    >>> import bisect
    >>> names = ['bender', 'fry', 'leela', 'nibbler', 'zoidberg']
    >>> bisect.bisect_left(names, 'fry')
    1
    >>> keyword = 'fry'
    >>> x = bisect.bisect_left(names, keyword)
    >>> names[x] == keyword
    True
    >>> keyword = 'arnie'
    >>> x = bisect.bisect_left(names, keyword)
    >>> names[x] == keyword
    False
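One caveat with the session above: `names[x]` raises IndexError when the keyword sorts after every name, because bisect_left then returns len(names). A guarded version might look like this sketch, where `contains` is a hypothetical helper name:

```python
import bisect

names = ['bender', 'fry', 'leela', 'nibbler', 'zoidberg']

def contains(sorted_items, keyword):
    """True if keyword is in sorted_items; safe for keywords past the end."""
    x = bisect.bisect_left(sorted_items, keyword)
    return x < len(sorted_items) and sorted_items[x] == keyword

print(contains(names, 'fry'))     # True
print(contains(names, 'arnie'))   # False
print(contains(names, 'zzz'))     # False; names[x] alone would raise IndexError here
```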
You can also use this to find duplicates:
    ...
    >>> names = ['bender', 'fry', 'fry', 'fry', 'leela', 'nibbler', 'zoidberg']
    >>> keyword = 'fry'
    >>> leftIndex = bisect.bisect_left(names, keyword)
    >>> rightIndex = bisect.bisect_right(names, keyword)
    >>> names[leftIndex:rightIndex]
    ['fry', 'fry', 'fry']
Obviously you could just return the index rather than the value at that index if desired.
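As a sketch of the index-returning variant, the pair of insertion points can be returned directly; their difference is the number of occurrences. The name `index_range` is made up for this example:

```python
import bisect

names = ['bender', 'fry', 'fry', 'fry', 'leela', 'nibbler', 'zoidberg']

def index_range(sorted_items, keyword):
    """Half-open (left, right) index range of the entries equal to keyword."""
    left = bisect.bisect_left(sorted_items, keyword)
    right = bisect.bisect_right(sorted_items, keyword)
    return left, right

left, right = index_range(names, 'fry')
print(left, right)                 # 1 4
print(right - left)                # 3 occurrences
print(index_range(names, 'amy'))   # (0, 0): an empty range means the keyword is absent
```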
For custom types or objects, things are a little bit trickier: you have to make sure to implement rich comparison methods to get bisect to compare correctly.
    >>> import bisect
    >>> class Tag(object):  # a simple wrapper around strings
    ...     def __init__(self, tag):
    ...         self.tag = tag
    ...     def __lt__(self, other):
    ...         return self.tag < other.tag
    ...     def __gt__(self, other):
    ...         return self.tag > other.tag
    ...
    >>> tags = [Tag('bender'), Tag('fry'), Tag('leela'), Tag('nibbler'), Tag('zoidberg')]
    >>> key = Tag('fry')
    >>> leftIndex = bisect.bisect_left(tags, key)
    >>> rightIndex = bisect.bisect_right(tags, key)
    >>> print([tag.tag for tag in tags[leftIndex:rightIndex]])
    ['fry']
This should work in at least Python 2.7 -> 3.3
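If adding comparison methods to the class is not an option, another approach is to bisect a parallel list of keys instead. A minimal sketch, assuming the objects are already sorted by that key (on Python 3.10 and later the bisect functions also accept a key argument, which avoids building the extra list):

```python
from bisect import bisect_left, bisect_right

class Tag:
    """Plain record type with no comparison methods defined."""
    def __init__(self, tag):
        self.tag = tag

tags = [Tag('bender'), Tag('fry'), Tag('leela'), Tag('nibbler'), Tag('zoidberg')]
keys = [t.tag for t in tags]   # parallel list of sort keys, in the same order as tags

left = bisect_left(keys, 'fry')
right = bisect_right(keys, 'fry')
matches = tags[left:right]
print([t.tag for t in matches])   # ['fry']
```

The search value here is the plain key ('fry'), not a wrapped Tag object.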
Using a dict wouldn't double your memory usage unless the objects you're storing are really tiny, since the values are only pointers to the actual objects:

    >>> a = 'foo'
    >>> b = [a]
    >>> c = [a]
    >>> b[0] is c[0]
    True
In that example, 'foo' is only stored once. Does that make a difference for you? And exactly how many items are we talking about anyway?
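The container overhead itself can be measured rather than guessed. The sketch below uses sys.getsizeof, which reports only the container and not the objects it refers to; the exact numbers are specific to the interpreter and version, so treat them as indicative only:

```python
import sys

values = [str(n) for n in range(1000)]          # the stored objects themselves
as_list = list(values)                          # index -> value
as_dict = {v: i for i, v in enumerate(values)}  # value -> index

# getsizeof measures the container only, not the strings it points to.
print("list container:", sys.getsizeof(as_list), "bytes")
print("dict container:", sys.getsizeof(as_dict), "bytes")

# Both containers hold references to the same string objects.
print(as_list[0] is next(iter(as_dict)))   # True
```

Whether the extra container matters then depends on how large the stored objects are compared with the per-entry overhead.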
This code works on sorted integer lists recursively. The base case is a range of at most two elements (highValue - lowValue < 2), where both endpoints are tested directly. Otherwise the middle value is tested; if it is not the target, the function calls itself on the left or right half, using the position just before or just after the middle as the new upper or lower limit.

    def binary_search(intList, intValue, lowValue, highValue):
        if (highValue - lowValue) < 2:
            return intList[lowValue] == intValue or intList[highValue] == intValue
        middleValue = lowValue + ((highValue - lowValue) // 2)  # integer division
        if intList[middleValue] == intValue:
            return True
        if intList[middleValue] > intValue:
            return binary_search(intList, intValue, lowValue, middleValue - 1)
        return binary_search(intList, intValue, middleValue + 1, highValue)
Check out the examples on Wikipedia http://en.wikipedia.org/wiki/Binary_search_algorithm
    def binary_search(a, key, imin=0, imax=None):
        if imax is None:
            # if max amount not set, get the total
            imax = len(a) - 1

        while imin <= imax:
            # calculate the midpoint
            mid = (imin + imax)//2
            midval = a[mid]

            # determine which subarray to search
            if midval < key:
                # change min index to search upper subarray
                imin = mid + 1
            elif midval > key:
                # change max index to search lower subarray
                imax = mid - 1
            else:
                # return index number
                return mid
        raise ValueError
    ''' Only used if you keep the position as a global '''
    position = -1  # set global

    def bst(array, target):  # just pass the array and target
        global position
        low = 0
        high = len(array) - 1
        while low <= high:
            mid = (low + high)//2
            if array[mid] == target:
                position = mid
                return mid
            elif array[mid] < target:
                low = mid + 1   # target is in the upper half
            else:
                high = mid - 1  # target is in the lower half
        position = -1
        return -1

I think this is better and more efficient. Please correct me if I'm wrong. Thank you.
- s is a list.
- binary(s, 0, len(s) - 1, find) is the initial call.
Function returns an index of the queried item. If there is no such item it returns -1.
    def binary(s, p, q, find):
        mid = (p+q)//2  # integer division so mid is a valid index
        if find == s[mid]:
            return mid
        elif p == q-1 or p == q:
            if find == s[q]:
                return q
            else:
                return -1
        elif find < s[mid]:
            return binary(s, p, mid, find)
        elif find > s[mid]:
            return binary(s, mid+1, q, find)
    def binary_search_length_of_a_list(single_method_list):
        index = 0
        first = 0
        last = 1

        while True:
            mid = ((first + last) // 2)
            if not single_method_list.get(index):
                break
            index = mid + 1
            first = index
            last = index + 1
        return mid
Binary search:

    # values - sorted values to search
    # searchItem - item to search for
    # size - size of the list
    # upperBound - highest index of the range
    # lowerBound - lowest index of the range
    def binarySearch(values, searchItem, size, upperBound, lowerBound):
        if lowerBound > upperBound:  # empty range: the item is not present
            return "value does not exist"
        print(values)
        print(upperBound)
        print(lowerBound)
        mid = (upperBound + lowerBound) // 2
        print(mid)
        if int(values[mid]) == searchItem:
            return "value exist"
        elif int(values[mid]) < searchItem:
            return binarySearch(values, searchItem, size, upperBound, mid + 1)
        else:
            return binarySearch(values, searchItem, size, mid - 1, lowerBound)

To call the above function, use:

    values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    searchItem = 1
    print(binarySearch(values, searchItem, len(values), len(values) - 1, 0))
I needed a generic binary search in Python for Django models. In Django, one model can have a foreign key to another model, and I wanted to search the retrieved model objects. I wrote the following function, which you can use.
    def binary_search(values, key, lo=0, hi=None, length=None, cmp=None):
        """
        This is a binary search function which searches for the given key in values.
        It is generic, since values and key can be of different types.
        If they are of different types, the caller must specify a `cmp` function to
        perform a comparison between key and the items of values.
        :param values: sorted list of items in which key has to be searched
        :param key: search key
        :param lo: start index to begin search
        :param hi: end index where search will be performed (inclusive)
        :param length: length of values
        :param cmp: a comparator function which can be used to compare key and values
        :return: -1 if key is not found else index
        """
        # compare the item type (not the list type) with the key type
        assert cmp or not values or type(values[0]) == type(key), "can't be compared"
        assert hi is None or length is None, "`hi`, `length` both can't be specified at the same time"

        if not lo:
            lo = 0
        if hi is None:
            hi = (length if length is not None else len(values)) - 1

        while lo <= hi:
            mid = lo + (hi - lo) // 2
            if not cmp:
                if values[mid] == key:
                    return mid
                if values[mid] < key:
                    lo = mid + 1
                else:
                    hi = mid - 1
            else:
                val = cmp(values[mid], key)
                # 0 -> a == b
                # > 0 -> a > b
                # < 0 -> a < b
                if val == 0:
                    return mid
                if val < 0:
                    lo = mid + 1
                else:
                    hi = mid - 1
        return -1
There are many good solutions above, but I haven't seen a simple (KISS: keep it simple (cause I'm) stupid) use of Python's built-in, generic bisect function to do a binary search. With a bit of code around the bisect function, I think I have an example below where I have tested all cases for a small string array of names. Some of the above solutions allude to or say this, but hopefully the simple code below will help anyone confused like I was.
Python's bisect is used to indicate where to insert a new value/search item into a sorted list. The code below uses bisect_left, which returns the index of the hit if the search item is found in the list/array. (Note that bisect and bisect_right return the index of the element after the hit or match as the insertion point.) If the item is not found, bisect_left returns the index of the next item in the sorted list, which will not == the search value. The only other case is where the search item would go at the end of the list: the index returned is then beyond the end of the list/array, which the code below handles through Python's short-circuiting "and" (if the first condition is False, Python does not check subsequent conditions).
    #Code
    from bisect import bisect_left
    names=["Adam","Donny","Jalan","Zach","Zayed"]
    search=""
    lenNames = len(names)
    while search !="none":
        search =input("Enter name to search for or 'none' to terminate program:")
        if search == "none":
            break
        i = bisect_left(names,search)
        print(i) # show index returned by Python bisect_left
        if i < (lenNames) and names[i] == search:
            print(names[i],"found") #return True - if function
        else:
            print(search,"not found") #return False - if function

    ##Exhaustive test cases:
    ##Enter name to search for or 'none' to terminate program:Zayed
    ##4
    ##Zayed found
    ##Enter name to search for or 'none' to terminate program:Zach
    ##3
    ##Zach found
    ##Enter name to search for or 'none' to terminate program:Jalan
    ##2
    ##Jalan found
    ##Enter name to search for or 'none' to terminate program:Donny
    ##1
    ##Donny found
    ##Enter name to search for or 'none' to terminate program:Adam
    ##0
    ##Adam found
    ##Enter name to search for or 'none' to terminate program:Abie
    ##0
    ##Abie not found
    ##Enter name to search for or 'none' to terminate program:Carla
    ##1
    ##Carla not found
    ##Enter name to search for or 'none' to terminate program:Ed
    ##2
    ##Ed not found
    ##Enter name to search for or 'none' to terminate program:Roger
    ##3
    ##Roger not found
    ##Enter name to search for or 'none' to terminate program:Zap
    ##4
    ##Zap not found
    ##Enter name to search for or 'none' to terminate program:Zyss
    ##5
    ##Zyss not found