Your Web News in One Place

Help Webnuz

Referal links:

Sign up for GreenGeeks web hosting
April 26, 2021 01:26 am GMT

Distance Measures: Levenshtein Distance

From exact search to in-exact search to phonetics, to change and edits in text, we use these algorithms on a daily basis. How about we explore how they really work?
We will be exploring one particular algorithm that was probably used by your device when you were searching for this post. You also use it every day when you text your family and friends. Sometimes, you blame it for mistakes in your spelling. Sit back and enjoy this ride.

The Levenshtein Distance.

This algorithm was designed in 1965 by a Russian Mathematician, Vladimir Levenshtein. It measures the minimum number of insertions, deletions, or replacements that are required to change one string to another. The higher the distance value, the greater the difference between the strings.
A simple example looks like this. The Levenshtein distance between the word run and run is 0 because both words are identical. No insertion, deletion, or replacement is needed. In contrast hand, the Levenshtein distance between the words run and ran is 1. We need one substitution to change the word run to ran i.e. u to a.

Implementation in Python

To check how many operations we need to transform the word MONDAY to FRIDAY, we will draw a dynamic programming table to visualize.

This is our initial table:
Alt Text
All values are currently initialized to zero. Notice some changes in the second column of the table.
The reason for that will be explained using the second row because the same process occurs.
Alt Text
The next task is to start comparing substrings and calculate distances. To fill up the second row of the table, perform the following operations:
Alt Text
Alt Text
Alt Text
Alt Text
Notice the way the numbers and colors are changing for each element.
This key will be used as a reference when traversing the table.
Alt Text
This means at any particular location table[x,y] where source[x]!=target[y], a replacement will cost table[x-1,y-1]+1, an insertion table[x,y-1]+1 and a deletion table[x-1,y] +1. The solution table[x,y] is the minumum of either of those values. When source[x]==target[y], table[x,y] will be min(table[x-1,y-1],table[x,y-1]+1,table[x-1,y] +1 ). The intuition behind previous statements will become clearer in the code and subsequent tables. See how to apply them in the table.
Alt Text
Intuition :
Alt Text

After all the iterations are completed, the global solution to the problem can be found at the bottom right coner of the table.
Alt Text

import numpy as npdef levenshtein(token_1,token_2):    len_token_1 =len(token_1) +1    len_token_2 = len(token_2) +1    matrix  = np.zeros((len_token_1, len_token_2))    for i in range(len_token_1):        matrix[i,0] = i    for j in range(len_token_2):        matrix[0,j] = j    for x in range(1,len_token_1):        for y in range(1,len_token_2):            if token_1[x-1] == token_2[y-1]:                matrix [x,y] = min(                    matrix[x-1,y-1],                    matrix[x,y-1]+1,                    matrix[x-1,y]+1                )              else:                matrix [x,y] = min(                    matrix[x-1,y-1]+1,                    matrix[x,y-1]+1,                    matrix[x-1,y]+1                )    return matrix[len_token_1-1, len_token_2-1]print(levenshtein("monday","friday"))

Time complexity : O(x*y)
Space complexity: O(x*y)

Dynamic programming helps us break big problems into smaller problems. There are other approaches to solving this problem. This algorithm can also be enhanced to give more information about the operation to be performed on either character.

If you have any questions, contact me @forthecode__. This can also help you understand better.


Original Link: https://dev.to/bamimoretomi/distance-measures-levenshtein-distance-i1d

Share this article:    Share on Facebook
View Full Article

Dev To

An online community for sharing and discovering great ideas, having debates, and making friends

More About this Source Visit Dev To