Saturday, February 28, 2015

Levenshtein Distance in R

Edit distance, and Levenshtein distance in particular, is a distance measure defined on strings. In that respect it differs from well-known measures such as Euclidean or Mahalanobis distance, which operate on numeric vectors.

Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.
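
To make the definition concrete, here is a minimal sketch of the usual dynamic-programming way to compute it. The function name lev_dist is made up for illustration and is not part of R; in practice the built-in adist() from the utils package does this job, and the last line only checks that the two agree on one classic example.

```r
# Illustrative helper (hypothetical name, not part of base R):
# Levenshtein distance via the classic dynamic-programming table.
lev_dist <- function(a, b) {
  x <- strsplit(a, "")[[1]]   # characters of the first string
  y <- strsplit(b, "")[[1]]   # characters of the second string
  n <- length(x)
  m <- length(y)

  # d[i + 1, j + 1] holds the distance between the first i characters
  # of a and the first j characters of b.
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n               # i deletions turn a prefix into ""
  d[1, ] <- 0:m               # j insertions turn "" into a prefix

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,   # deletion
                             d[i + 1, j] + 1L,   # insertion
                             d[i, j] + cost)     # substitution or match
    }
  }
  d[n + 1, m + 1]
}

lev_dist("kitten", "sitting")                              # 3
lev_dist("kitten", "sitting") == adist("kitten", "sitting")
```

Going from "kitten" to "sitting" takes two substitutions (k to s, e to i) and one insertion (g), hence a distance of 3.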

In the example below, the user is asked to enter a string at the console. The input is then compared with the colour names built into R, and the closest colour names are reported:



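# Ask the user for a word (readline() only prompts in an interactive session)
user.string <- readline("Enter a word: ")
# The built-in colour names serve as the dictionary
wordlist <- colours()
# Levenshtein distance from the input to every colour name
dists <- adist(user.string, wordlist)
mindist <- min(dists)
# Indices of all names tied for the smallest distance
best.ones <- which(dists == mindist)

for (index in best.ones){
    cat("Did you mean: ", wordlist[index],"\n")
}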



Here are the results:

Enter a word: turtoise
Did you mean:  turquoise 


Enter a word: turtle
Did you mean:  purple 


Enter a word: night blue
Did you mean:  lightblue 


Enter a word: parliament
Did you mean:  darkmagenta 


Enter a word: marooon
Did you mean:  maroon
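
Since readline() only prompts in an interactive session, the same lookup can also be wrapped in a function so that it works in scripts. This is just a sketch under that assumption: the name suggest_colour and its arguments are made up for illustration, while adist() and colours() are the same built-in functions used above.

```r
# Hypothetical wrapper around the lookup above; the function name and
# its arguments are illustrative, not an existing R API.
suggest_colour <- function(word, wordlist = colours()) {
  dists <- adist(word, wordlist)          # 1 x length(wordlist) matrix
  wordlist[which(dists == min(dists))]    # every name tied for the minimum
}

suggest_colour("marooon")
suggest_colour("turtle", wordlist = c("purple", "turquoise"))
```

The function returns all names that share the smallest distance, so the result can contain more than one suggestion.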



Have a nice read




