July 17, 2021 10:00 am GMT

The Best Way to Compare Two Dictionaries in Python

Python's dictionary is a very versatile data structure. We can use it for many tasks that require a key->value mapping. A dict is not only efficient in terms of performance but also easy to use. Despite these strengths, some operations are somewhat tricky. For example, Python has no built-in feature allowing us to:

Compare two dictionaries and check how many pairs are equal
Get the difference between two dicts
Check whether two dictionaries are equal, especially when they have nested structures
Compare dicts that have floating-point numbers as values

In this article, I will show how you can do those operations and many more, so let's go.

Why You Need a Robust Way to Compare Dictionaries
Using the Right Tool for the Job
Getting a Simple Difference
Ignoring String Case
Comparing Float Values
Comparing numpy Values
Comparing Dictionaries With datetime Objects
Comparing String Values
Excluding Fields
Conclusion

Why You Need a Robust Way to Compare Dictionaries

Let's imagine the following scenario: you have two simple dictionaries. How can we assert if they match? Easy, right?

Yeah! You could use the == operator, of course!

>>> a = {
    'number': 1,
    'list': ['one', 'two']
}
>>> b = {
    'list': ['one', 'two'],
    'number': 1
}
>>> a == b
True

That's expected, since the dictionaries hold the same data. But what if one value is different? The result will be False, but can we tell where they differ?

>>> a = {
    'number': 1,
    'list': ['one', 'two']
}
>>> b = {
    'list': ['one', 'two'],
    'number': 2
}
>>> a == b
False

Hum... Just False doesn't tell us much...

What about the strings inside the list? Let's say that we want to ignore their case.

>>> a = {
    'number': 1,
    'list': ['ONE', 'two']
}
>>> b = {
    'list': ['one', 'two'],
    'number': 1
}
>>> a == b
False

Oops...

What if the number were a float, and we wanted to treat two floats as equal when their first 3 digits after the decimal point match?

>>> a = {
    'number': 1,
    'list': ['one', 'two']
}
>>> b = {
    'list': ['one', 'two'],
    'number': 1.00001
}
>>> a == b
False

You might also want to exclude some fields from the comparison. As an example, we might want to remove the list key->value pair from the check. Unless we create a new dictionary without it, there's no built-in method that does that for us.
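For top-level keys, that manual workaround is short. Here is a minimal sketch, where without_keys is a hypothetical helper name of my own and not part of any library. It only drops top-level keys, so nested fields would need more work.

```python
def without_keys(d, excluded):
    """Return a shallow copy of d without the keys listed in excluded."""
    return {key: value for key, value in d.items() if key not in excluded}


a = {'number': 1, 'list': ['one', 'two']}
b = {'number': 1, 'list': ['three', 'four']}

# The dicts differ only in 'list', so dropping that key makes them equal.
print(a == b)                                                    # False
print(without_keys(a, {'list'}) == without_keys(b, {'list'}))    # True
```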

Can it get any worse?

Yes, what if a value is a numpy array?

>>> import numpy as np
>>> a = {
    'number': 1,
    'list': ['one', 'two'],
    'array': np.ones(3)
}
>>> b = {
    'list': ['one', 'two'],
    'number': 1,
    'array': np.ones(3)
}
>>> a == b
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-eeadcaeab874> in <module>
----> 1 a == b

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

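Oh no, it raises an exception right in our faces! The dict comparison calls == on the two arrays, which returns an element-wise boolean array, and Python cannot reduce that array to a single True or False.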

Damn it, what can we do then?

Using the Right Tool for the Job

Since the == operator cannot perform these advanced comparisons, there are only two ways of achieving them. You can either implement the functionality yourself or use a third-party library. At some point in your life you have probably heard the advice not to reinvent the wheel, and that's precisely the advice we'll follow in this tutorial.

We'll adopt a library called DeepDiff, from zepworks. DeepDiff can pick up the differences between dictionaries, iterables, strings, and other objects. It accomplishes that by searching for changes recursively.

DeepDiff is not the only kid on the block; there's also Dictdiffer, developed by the folks at CERN. Dictdiffer is also cool but lacks a lot of the features that make DeepDiff so interesting. In any case, I encourage you to look at both and determine which one works best for you.

Getting a Simple Difference

In this example, we'll solve the first problem I showed you: finding the key whose value differs between the two dicts. Consider the same two dictionaries as before, this time compared with DeepDiff.

In [1]: from deepdiff import DeepDiff

In [2]: a = {
   ...:     'number': 1,
   ...:     'list': ['one', 'two']
   ...: }

In [3]: b = {
   ...:     'list': ['one', 'two'],
   ...:     'number': 2
   ...: }

In [4]: diff = DeepDiff(a, b)

In [5]: diff
Out[5]: {'values_changed': {"root['number']": {'new_value': 2, 'old_value': 1}}}

Awesome! It tells us that the key 'number' had value 1 but the new dict, b, has a new value, 2.
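To get a feel for what "searching recursively" means, here is a hand-rolled sketch. It is not DeepDiff's implementation: simple_diff and its 'added'/'removed' report keys are hypothetical names of my own, and it only walks dicts and equal-length lists. It borrows the root[...] path style from the output above.

```python
def simple_diff(old, new, path="root"):
    """Return {path: change} for every spot where old and new differ."""
    changes = {}
    if isinstance(old, dict) and isinstance(new, dict):
        for key in old.keys() | new.keys():
            child = f"{path}[{key!r}]"
            if key not in old:
                changes[child] = {"added": new[key]}
            elif key not in new:
                changes[child] = {"removed": old[key]}
            else:
                # Same key on both sides: descend into the values.
                changes.update(simple_diff(old[key], new[key], child))
    elif isinstance(old, list) and isinstance(new, list) and len(old) == len(new):
        for index, (old_item, new_item) in enumerate(zip(old, new)):
            changes.update(simple_diff(old_item, new_item, f"{path}[{index}]"))
    elif old != new:
        # Leaves (and lists of different length) are reported as a whole.
        changes[path] = {"old_value": old, "new_value": new}
    return changes


a = {'number': 1, 'list': ['one', 'two']}
b = {'list': ['one', 'two'], 'number': 2}
print(simple_diff(a, b))
# {"root['number']": {'old_value': 1, 'new_value': 2}}
```

Even this toy version needs a decision for every container type, which is a good argument for letting a library handle the real thing.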

Ignoring String Case

In our second example, one element of the list was in uppercase, but we didn't care about that. We wanted to ignore case and treat "one" as equal to "ONE".

You can solve that by setting ignore_string_case=True.

In [10]: a = {
    ...:     'number': 1,
    ...:     'list': ['ONE', 'two']
    ...: }

In [11]: b = {
    ...:     'list': ['one', 'two'],
    ...:     'number': 1
    ...: }

In [12]: diff = DeepDiff(a, b, ignore_string_case=True)

In [13]: diff
Out[13]: {}

If we don't do that, a very helpful message is printed.

In [14]: diff = DeepDiff(a, b)

In [15]: diff
Out[15]: 
{'values_changed': {"root['list'][0]": {'new_value': 'one',
   'old_value': 'ONE'}}}
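If you ever had to do this without the library, one way is to lowercase every string before comparing. Below is a rough sketch. normalize_case is a hypothetical helper, it leaves dict keys untouched, and I'm not claiming this is how DeepDiff does it internally.

```python
def normalize_case(obj):
    """Return a copy of obj with every string value lowercased (dict keys untouched)."""
    if isinstance(obj, str):
        return obj.lower()
    if isinstance(obj, dict):
        return {key: normalize_case(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [normalize_case(item) for item in obj]
    return obj


a = {'number': 1, 'list': ['ONE', 'two']}
b = {'list': ['one', 'two'], 'number': 1}

print(a == b)                                    # False
print(normalize_case(a) == normalize_case(b))    # True
```

Unlike the DeepDiff call, this only answers yes or no; it does not say where the remaining differences are.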

Comparing Float Values

We also saw a case with a float where we only wanted to check that the first 3 digits after the decimal point were equal. DeepDiff's significant_digits parameter takes exactly that: the number of digits AFTER the decimal point. Also, since a float differs from an int, we might want to ignore the type change as well. We can do that by setting ignore_numeric_type_changes=True.

In [16]: a = {
    ...:     'number': 1,
    ...:     'list': ['one', 'two']
    ...: }

In [17]: b = {
    ...:     'list': ['one', 'two'],
    ...:     'number': 1.00001
    ...: }

In [18]: diff = DeepDiff(a, b)

In [19]: diff
Out[19]: 
{'type_changes': {"root['number']": {'old_type': int,
   'new_type': float,
   'old_value': 1,
   'new_value': 1.00001}}}

In [24]: diff = DeepDiff(a, b, significant_digits=3, ignore_numeric_type_changes=True)

In [25]: diff
Out[25]: {}
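For comparison, here is a stdlib-only sketch of a tolerant comparison built on math.isclose. Note the assumption: it uses an absolute tolerance of 10**-places, which is a different rule from DeepDiff's digit-based one and may give different answers near rounding boundaries. nearly_equal and _is_number are hypothetical names.

```python
import math


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def nearly_equal(x, y, places=3):
    """Compare nested dicts/lists, letting numbers differ by up to 10**-places."""
    if isinstance(x, dict) and isinstance(y, dict):
        return x.keys() == y.keys() and all(
            nearly_equal(x[key], y[key], places) for key in x
        )
    if isinstance(x, list) and isinstance(y, list):
        return len(x) == len(y) and all(
            nearly_equal(i, j, places) for i, j in zip(x, y)
        )
    if _is_number(x) and _is_number(y):
        # int vs float is fine here; only the numeric distance matters.
        return math.isclose(x, y, rel_tol=0.0, abs_tol=10 ** -places)
    return x == y


a = {'number': 1, 'list': ['one', 'two']}
b = {'list': ['one', 'two'], 'number': 1.00001}

print(a == b)               # False
print(nearly_equal(a, b))   # True
```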

Comparing numpy Values

When we tried comparing two dictionaries containing a numpy array, we failed miserably. Fortunately, DeepDiff has our backs here. It supports numpy objects by default!

In [27]: import numpy as np

In [28]: a = {
    ...:     'number': 1,
    ...:     'list': ['one', 'two'],
    ...:     'array': np.ones(3)
    ...: }

In [29]: b = {
    ...:     'list': ['one', 'two'],
    ...:     'number': 1,
    ...:     'array': np.ones(3)
    ...: }

In [30]: diff = DeepDiff(a, b)

In [31]: diff
Out[31]: {}

What if the arrays are different?

No problem!

In [28]: a = {
    ...:     'number': 1,
    ...:     'list': ['one', 'two'],
    ...:     'array': np.ones(3)
    ...: }

In [32]: b = {
    ...:     'list': ['one', 'two'],
    ...:     'number': 1,
    ...:     'array': np.array([1, 2, 3])
    ...: }

In [33]: diff = DeepDiff(a, b)

In [34]: diff
Out[34]: 
{'type_changes': {"root['array']": {'old_type': numpy.float64,
   'new_type': numpy.int64,
   'old_value': array([1., 1., 1.]),
   'new_value': array([1, 2, 3])}}}

It shows that not only the values are different but also the types!

Comparing Dictionaries With datetime Objects

Another common use case is comparing datetime objects. This kind of object has the following signature:

class datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0)

If we have a dict with datetime objects, DeepDiff allows us to compare only certain parts of them. For instance, if we only care about year, month, and day, then we can truncate them with truncate_datetime='day'.

In [1]: import datetime

In [2]: from deepdiff import DeepDiff

In [3]: a = {
            'list': ['one', 'two'],
            'number': 1,
            'date': datetime.datetime(2020, 6, 17, 22, 45, 34, 513371)
        }

In [4]: b = {
            'list': ['one', 'two'],
            'number': 1,
            'date': datetime.datetime(2020, 6, 17, 12, 12, 51, 115791)
        }

In [5]: diff = DeepDiff(a, b, truncate_datetime='day')

In [6]: diff
Out[6]: {}
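Conceptually, truncating to the day means zeroing every field smaller than a day before comparing. Here is a stdlib-only sketch of that idea. truncate and _SMALLER_FIELDS are hypothetical names, and this illustrates the concept and is not DeepDiff's code.

```python
import datetime

# For each unit, the datetime fields that are smaller than it.
_SMALLER_FIELDS = {
    "day": ("hour", "minute", "second", "microsecond"),
    "hour": ("minute", "second", "microsecond"),
    "minute": ("second", "microsecond"),
    "second": ("microsecond",),
}


def truncate(dt, unit):
    """Return dt with every field smaller than unit set to zero."""
    return dt.replace(**{field: 0 for field in _SMALLER_FIELDS[unit]})


a_date = datetime.datetime(2020, 6, 17, 22, 45, 34, 513371)
b_date = datetime.datetime(2020, 6, 17, 12, 12, 51, 115791)

print(truncate(a_date, "day") == truncate(b_date, "day"))    # True
print(truncate(a_date, "hour") == truncate(b_date, "hour"))  # False
```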

Comparing String Values

We've looked at interesting examples so far, and storing string values in dicts is a common use case. Having a better way of contrasting them can help us a lot! In this section I'm going to show you another lovely feature, the str diff.

In [13]: from pprint import pprint

In [17]: b = {
    ...:     'number': 1,
    ...:     'text': 'hi,\n my awesome world!'
    ...: }

In [18]: a = {
    ...:     'number': 1,
    ...:     'text': 'hello, my\n dear\n world!'
    ...: }

In [20]: ddiff = DeepDiff(a, b, verbose_level=2)

In [21]: pprint(ddiff, indent=2)
{ 'values_changed': { "root['text']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,3 +1,2 @@\n'
                                                '-hello, my\n'
                                                '- dear\n'
                                                '- world!\n'
                                                '+hi,\n'
                                                '+ my awesome world!',
                                        'new_value': 'hi,\n my awesome world!',
                                        'old_value': 'hello, my\n'
                                                     ' dear\n'
                                                     ' world!'}}}

That's nice! We can see the exact lines where the two strings differ.
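The diff text above is in unified diff format, the same format Python's standard difflib module produces. If you only need a line-by-line string diff, a small sketch with difflib.unified_diff looks like this. I'm making no claim about DeepDiff's internals here; the variable names are my own.

```python
import difflib

old_text = 'hello, my\n dear\n world!'
new_text = 'hi,\n my awesome world!'

# unified_diff works on sequences of lines; lineterm='' keeps the
# header lines free of trailing newlines so we can join them ourselves.
diff_lines = list(
    difflib.unified_diff(old_text.splitlines(), new_text.splitlines(), lineterm='')
)
print('\n'.join(diff_lines))
# --- 
# +++ 
# @@ -1,3 +1,2 @@
# -hello, my
# - dear
# - world!
# +hi,
# + my awesome world!
```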

Excluding Fields

In this last example, I'll show you yet another common use case, excluding a field. We might want to exclude one or more items from the comparison. For instance, using the previous example, we might want to leave out the text field.

In [17]: b = {
    ...:     'number': 1,
    ...:     'text': 'hi,\n my awesome world!'
    ...: }

In [18]: a = {
    ...:     'number': 1,
    ...:     'text': 'hello, my\n dear\n world!'
    ...: }

In [26]: ddiff = DeepDiff(a, b, verbose_level=2, exclude_paths=["root['text']"])

In [27]: ddiff
Out[27]: {}

If you want even more advanced exclusions, DeepDiff also allows you to pass regular expressions. Check this out: https://zepworks.com/deepdiff/current/exclude_paths.html#exclude-regex-paths.
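To see why matching paths with a regex is handy, here is a stdlib-only sketch of the idea. It flattens each dict into root[...]-style paths, drops every path that matches a pattern, and compares what is left. flatten and equal_ignoring are hypothetical helpers, and the matching rule (re.search on the path string) is my assumption for this sketch, not a description of DeepDiff's behavior.

```python
import re


def flatten(obj, path="root"):
    """Map each leaf value of nested dicts/lists to a root[...]-style path."""
    if isinstance(obj, dict) and obj:
        children = [(f"{path}[{key!r}]", value) for key, value in obj.items()]
    elif isinstance(obj, list) and obj:
        children = [(f"{path}[{index}]", value) for index, value in enumerate(obj)]
    else:
        # Scalars and empty containers are treated as leaves.
        return {path: obj}
    flat = {}
    for child_path, value in children:
        flat.update(flatten(value, child_path))
    return flat


def equal_ignoring(a, b, patterns):
    """Compare a and b after dropping every path matched by any pattern."""
    compiled = [re.compile(pattern) for pattern in patterns]

    def kept(obj):
        return {
            path: value
            for path, value in flatten(obj).items()
            if not any(regex.search(path) for regex in compiled)
        }

    return kept(a) == kept(b)


a = {'number': 1, 'text': 'hello', 'meta': {'text_id': 7}}
b = {'number': 1, 'text': 'hi', 'meta': {'text_id': 9}}

# One pattern covers both root['text'] and root['meta']['text_id'].
print(equal_ignoring(a, b, [r"\['text"]))            # True
# An anchored pattern excludes only the top-level 'text' key.
print(equal_ignoring(a, b, [r"^root\['text'\]$"]))   # False
```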

Conclusion

That's it for today, folks! I really hope you've learned something new and useful. Comparing dicts is a common use case since they can be used to store almost any kind of data. As a result, having a proper tool to ease this effort is indispensable. DeepDiff has many features and can do reasonably advanced comparisons. If you ever need to compare dicts, go check it out.

Dev To
