2

I am attempting to calculate the Levenshtein distance between two strings using dynamic programming. This is being done through Hackerrank, so I have timing constraints. I used a techenique I saw in: How are Dynamic Programming algorithms implemented in idiomatic Haskell? and it seems to be working. Unfortunaly, it is timing out in one test case. I do not have access to the specific test case, so I don't know the exact size of the input.

import Control.Monad
import Data.Array.IArray
import Data.Array.Unboxed

main = do
  n <- readLn
  replicateM_ n $ do
    s1 <- getLine
    s2 <- getLine
    print $ editDistance s1 s2

editDistance :: String -> String -> Int
editDistance s1 s2 = dynamic editDistance' (length s1, length s2)
  where
    s1' :: UArray Int Char
    s1' = listArray (1,length s1) s1
    s2' :: UArray Int Char
    s2' = listArray (1,length s2) s2
    editDistance' table (i,j)
      | min i j == 0 = max i j
      | otherwise = min' (table!((i-1),j) + 1) (table!(i,(j-1)) + 1) (table!((i-1),(j-1)) + cost)
      where
        cost =  if s1'!i == s2'!j then 0 else 1
        min' a b = min (min a b)

dynamic :: (Array (Int,Int) Int -> (Int,Int) -> Int) -> (Int,Int) -> Int
dynamic compute (xBnd, yBnd) = table!(xBnd,yBnd)
  where
    table = newTable $ map (\coord -> (coord, compute table coord)) [(x,y) | x<-[0..xBnd], y<-[0..yBnd]]
    newTable xs = array ((0,0),fst (last xs)) xs

I've switched to using arrays, but that speed up was insufficient. I cannot use Unboxed arrays, because this code relies on laziness. Are there any glaring performance mistakes I have made? Or how else can I speed it up?

1
  • 1
    Strings are very inefficient. For a competition program I would read the input in as a ByteString if you can get away with that - other use Text. Commented Sep 30, 2016 at 19:44

1 Answer 1

2

The backward equations for edit distance calculations are:

f(i, j) = minimum [
  1 + f(i + 1, j), -- delete from the 1st string
  1 + f(i, j + 1), -- delete from the 2nd string 
  f(i + 1, j + 1) + if a(i) == b(j) then 0 else 1 -- substitute or match
]

So within each dimension, you need nothing more than the very next index: + 1. This is a sequential access pattern, not random access to require arrays; and can be implemented using lists and nested right folds:

editDistance :: Eq a => [a] -> [a] -> Int
editDistance a b = head . foldr loop [n, n - 1..0] $ zip a [m, m - 1..]
  where
  (m, n) = (length a, length b)
  loop (s, l) lst = foldr go [l] $ zip3 b lst (tail lst)
    where
    go (t, i, j) acc@(k:_) = inc `seq` inc:acc
      where inc = minimum [i + 1, k + 1, if s == t then j else j + 1]

You may test this code in Hackerrank Edit Distance Problem as in:

import Control.Applicative ((<$>))
import Control.Monad (replicateM_)
import Text.Read (readMaybe)

editDistance :: Eq a => [a] -> [a] -> Int
editDistance a b = ... -- as implemented above

main :: IO ()
main = do
  Just n <- readMaybe <$> getLine
  replicateM_ n $ do
    a <- getLine
    b <- getLine
    print $ editDistance a b

which passes all tests with a decent performance.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.