November 23, 2020 02:38 pm GMT

How to find an impostor binary search implementation in Python! :-)

Recently I have been working on implementing the C++ STL algorithms in Python (here). Along the way I ran into a typical problem: how do you test an implementation of the binary search algorithm? Let us write some tests first.
You can write tests using any Python testing framework, such as pytest or unittest. Here I am using unittest, which is part of the Python standard library.

    import random
    import unittest

    from binary_search import binary_search


    class BinarySearchTestCase(unittest.TestCase):
        def test_empty(self):
            arr = []
            self.assertFalse(binary_search(arr, 5))

        def test_true(self):
            arr = [1,2,3,4,5]
            self.assertTrue(binary_search(arr, 4))

        def test_false(self):
            arr = [1,2,3,4,5]
            self.assertFalse(binary_search(arr, 99))

        def test_on_random_list_false(self):
            arr = [random.randint(-500, 500) for _ in range(500)]
            arr.sort()
            self.assertFalse(binary_search(arr, 999))


    if __name__ == '__main__':
        unittest.main()

The test cases are divided as follows:

  • Searching for any element in an empty list should return False.
  • Searching for an element present in the list should return True.
  • Searching for an element not present in the list should return False.

The above test cases seem reasonable. To make the tests more robust we can use Hypothesis, a property-based testing library for Python inspired by Haskell's QuickCheck. You can install it with pip install hypothesis.
The tests using Hypothesis are below:

    import random
    import unittest

    from hypothesis import given
    import hypothesis.strategies as st

    from binary_search import binary_search


    class BinarySearchTestCase(unittest.TestCase):
        @given(st.integers())
        def test_empty(self, target):
            arr = []
            arr.sort()
            self.assertFalse(binary_search(arr, target))

        @given(st.lists(st.integers(), min_size=1))
        def test_binary_search_true(self, arr):
            arr.sort()
            target = random.choice(arr)
            self.assertTrue(binary_search(arr, target))

        @given(st.lists(st.integers(), min_size=1))
        def test_binary_search_false(self, arr):
            arr.sort()
            target = arr[-1] + 1
            self.assertFalse(binary_search(arr, target))


    if __name__ == '__main__':
        unittest.main()
test.py

Hypothesis automatically generates many different inputs from the given specification, which in this case is a list of integers (a non-empty one for the last two tests).
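To make the idea concrete, here is a hand-rolled sketch, using only the standard library, of what a property-based run amounts to: generate many inputs from a specification and check a property on each. It is not how Hypothesis works internally (there is no shrinking and no deliberate edge-case generation), and `random_sorted_list` and `check_property` are hypothetical helper names used only for this illustration.

```python
import random


def random_sorted_list(rng, max_size=20):
    """Build a non-empty sorted list of random integers (hypothetical helper)."""
    size = rng.randint(1, max_size)
    return sorted(rng.randint(-100, 100) for _ in range(size))


def check_property(search, runs=200, seed=0):
    """Return the first counterexample found, or None if every run passes."""
    rng = random.Random(seed)
    for _ in range(runs):
        arr = random_sorted_list(rng)
        present = rng.choice(arr)   # taken from the list, so it must be found
        absent = arr[-1] + 1        # larger than every element, so it must not be found
        if not search(arr, present):
            return arr, present
        if search(arr, absent):
            return arr, absent
    return None


# Any correct search function satisfies both properties.
print(check_property(lambda arr, target: target in arr))  # prints None
```

Note that these two properties only describe a correct search, so any correct search function will satisfy them.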

Now the fun part is the binary search code:

    def binary_search(arr, target):
        return target in arr
binary_search.py

Let us run the test now.

    $ python test.py
    ...
    ----------------------------------------------------------------------
    Ran 3 tests in 0.380s

    OK

The above code is nowhere near a binary search implementation, yet it passes all the tests! A linear search passes the binary search test cases! What? How can we rule out this impostor code?

The problem with these tests is that they do not use any property specific to the binary search algorithm. They only check the properties of a search algorithm in general.

We know one property of binary search: it inspects at most floor(log2(n)) + 1 elements, because it discards half of the search space at every iteration.
Here n is the total number of elements in the array.
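As a quick check of this bound, the sketch below computes floor(log2(n)) + 1 for a few list sizes. `max_probes` is a hypothetical helper name introduced here for illustration only.

```python
import math


def max_probes(n):
    """Worst-case number of elements binary search inspects in a list of size n >= 1."""
    return int(math.log2(n)) + 1


for n in (1, 2, 3, 8, 1000):
    # For positive integers the bound coincides with n.bit_length().
    print(n, max_probes(n), n.bit_length())
# 1 1 1
# 2 2 2
# 3 2 2
# 8 4 4
# 1000 10 10
```

So a list of 1000 elements needs at most 10 inspections, while a linear scan may need all 1000.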

So we write a class that behaves like a list by implementing the __iter__, __getitem__ and __len__ special methods.

    class Node:
        def __init__(self, arr):
            self.arr = arr
            self.count = 0

        def __iter__(self):
            for x in self.arr:
                self.count += 1
                yield x

        def __getitem__(self, key):
            self.count += 1
            return self.arr[key]

        def __len__(self):
            return len(self.arr)

We now have a Node class that behaves like a list but additionally has a count attribute, which is incremented every time an element is accessed. This lets us keep track of how many elements the binary search code inspects.
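A small usage sketch of the counter, assuming the Node class exactly as defined above (it is repeated here so the snippet runs on its own). When a class defines no `__contains__`, the `in` operator falls back to `__iter__`, so a membership test on a Node is counted element by element.

```python
class Node:
    """List-like wrapper that counts element accesses (same as above)."""

    def __init__(self, arr):
        self.arr = arr
        self.count = 0

    def __iter__(self):
        for x in self.arr:
            self.count += 1
            yield x

    def __getitem__(self, key):
        self.count += 1
        return self.arr[key]

    def __len__(self):
        return len(self.arr)


scanned = Node([10, 20, 30, 40, 50])
found = 50 in scanned          # linear scan, routed through __iter__
print(found, scanned.count)    # True 5

indexed = Node([10, 20, 30, 40, 50])
middle = indexed[2]            # a single indexed read, routed through __getitem__
print(middle, indexed.count, len(indexed))  # 30 1 5
```

Calling len() does not touch any element, so it leaves the counter unchanged.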

In Python there is a saying: if it walks like a duck and quacks like a duck, it is a duck.

We add this extra test case to BinarySearchTestCase, using the Node class above.

    import math

    @given(st.lists(st.integers(), min_size=1))
    def test_binary_search_with_node(self, arr):
        arr.sort()
        target = arr[-1]
        max_count = int(math.log2(len(arr))) + 1
        arr = Node(arr)
        ans = binary_search(arr, target)
        self.assertTrue(ans)
        self.assertTrue(arr.count <= max_count)

Let us run the tests again now:

    $ python test.py
    ..
    Falsifying example: test_binary_search_with_node(
        self=<__main__.BinarySearchTestCase testMethod=test_binary_search_with_node>,
        arr=[0, 0, 1],
    )
    F.
    ======================================================================
    FAIL: test_binary_search_with_node (__main__.BinarySearchTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "code.py", line 48, in test_binary_search_with_node
        def test_binary_search_with_node(self, arr):
      File "/home/tmp/venv/lib/python3.6/site-packages/hypothesis/core.py", line 1162, in wrapped_test
        raise the_error_hypothesis_found
      File "code.py", line 54, in test_binary_search_with_node
        self.assertTrue(arr.count <= math.log2(len(arr)) + 1)
    AssertionError: False is not true

    ----------------------------------------------------------------------
    Ran 4 tests in 0.435s

    FAILED (failures=1)

The test fails because the impostor checks the elements one by one: searching for the last element of a list of size n touches all n elements, whereas binary search discards half of the search space at every iteration. Hypothesis also shrinks the failure to a minimal falsifying example, which in this case is a list of size 3 (3 accesses against an allowed maximum of 2).
Impostor code found!
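
For contrast, here is a minimal sketch of a search that does satisfy the counting test. It is one possible implementation and not necessarily the one in the author's library. The important detail is that `arr[mid]` is read once per iteration and stored in a local variable, because reading it twice would double the count and break the bound. The Node class is repeated in a trimmed form so the snippet is self-contained.

```python
import math


def binary_search(arr, target):
    """Return True if target is in the sorted sequence arr."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        value = arr[mid]        # exactly one element access per iteration
        if value == target:
            return True
        if value < target:
            lo = mid + 1        # discard the left half
        else:
            hi = mid - 1        # discard the right half
    return False


class Node:
    """Trimmed access-counting wrapper; only indexing and len() are needed here."""

    def __init__(self, arr):
        self.arr = arr
        self.count = 0

    def __getitem__(self, key):
        self.count += 1
        return self.arr[key]

    def __len__(self):
        return len(self.arr)


# Exhaustively check the bound for small sizes and every present target.
for n in range(1, 200):
    for target in range(n):
        wrapped = Node(list(range(n)))
        assert binary_search(wrapped, target)
        assert wrapped.count <= int(math.log2(n)) + 1
print("bound holds")
```

A variant that first locates an insertion point and then compares once more at the end is still a binary search, but it may use one access more than this bound, so the limit in the test would need to be relaxed for it.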

Complete test code

    import random
    import math
    import unittest

    from hypothesis import given
    import hypothesis.strategies as st

    from binary_search import binary_search


    class Node:
        def __init__(self, arr):
            self.arr = arr
            self.count = 0

        def __iter__(self):
            for x in self.arr:
                self.count += 1
                yield x

        def __getitem__(self, key):
            self.count += 1
            return self.arr[key]

        def __len__(self):
            return len(self.arr)


    class BinarySearchTestCase(unittest.TestCase):
        @given(st.integers())
        def test_empty(self, target):
            arr = []
            arr.sort()
            self.assertFalse(binary_search(arr, target))

        @given(st.lists(st.integers(), min_size=1))
        def test_binary_search_true(self, arr):
            arr.sort()
            target = random.choice(arr)
            self.assertTrue(binary_search(arr, target))

        @given(st.lists(st.integers(), min_size=1))
        def test_binary_search_false(self, arr):
            arr.sort()
            target = arr[-1] + 1
            self.assertFalse(binary_search(arr, target))

        @given(st.lists(st.integers(), min_size=1))
        def test_binary_search_with_node(self, arr):
            arr.sort()
            target = arr[-1]
            arr = Node(arr)
            max_count = int(math.log2(len(arr))) + 1
            ans = binary_search(arr, target)
            self.assertTrue(ans)
            self.assertTrue(arr.count <= max_count)


    if __name__ == '__main__':
        unittest.main()

test.py

Where to go from here?

  • Check out this awesome talk by John Hughes, Testing the hard stuff and staying sane, where he talks about how he used QuickCheck to find and fix bugs for different companies.
  • Check out this talk on Hypothesis, the QuickCheck-inspired library for Python, by Zac Hatfield-Dodds.
  • Read more on the unittest framework here.

Happy learning!


Original Link: https://dev.to/geekypandey/how-to-find-an-impostor-binary-search-implementation-in-python-56d0
