A diff tool generates an output given two input files. This output can be thought of as a script which when applied to first input returns the second input. Diff tool is nothing but a straight forward implementation for finding edit distance at its core. In addition to finding minimum edit distance we also find actions which when applied first input would lead to second input. These actions constitute diff tools output. Lets first define some classes which denote these actions

trait EditAction
case class Delete(line:Int) extends EditAction
case class Insert(line:Int) extends EditAction
case class Copy(line1:Int,line2:Int) extends EditAction

Above we declare three forms of edit actions which can be performed on first input to arrive at second input. Delete action contains line number of the line which is deleted from first input. Insert action contains line number from second input which is inserted into first. Copy action contains two line numbers. First is the line number from first input which is copied and second is line number in second input where it is copied. Although, there would be many ways to do this but we would like a sequence of these action with minimum cost. Now, we assume that cost of Delete and Insert is more that Copy action since in case of Copy line in first and second input is same and we did not make a change. And since while finding minimum cost only relative cost matters we can assign cost 1 to Delete and Insert Actions and cost of 0 to Copy action. Below is cost function for these edit actions

def cost(action:EditAction) = {
    case _:Delete => 1
    case _:Insert => 1
    case _:Copy => 0
}

Now, lets code up the actual diff method which takes line numbers in first and second input at which starts calculating the diff. The actual diff of two file can be found by invoking this function at position 0,0. This method also assumes that lines from first input are available in lines1 and lines from second input are available in lines2 variables.

def diff(pos1:Int,pos2:Int):List[EditAction]={
   // all lines processed, return empty list
   if(pos1>=lines1.length && pos2>=lines2.length)  
       List[EditAction]()
   // all input 1 lines processed, insert lines from input 2
   else if(pos1 >= lines1.length)
       Insert(pos2)::diff(pos1,pos2+1)
   // all input 2 lines processed, delete lines from input 1
   else if(pos2 >= lines2.length)
       Delete(pos1)::diff(pos1+1,pos2)
   else if(lines1(pos1) == lines2(pos2))
   // both lines from input1,input2 are same, we can copy from 
      // input 1 to input 2 
       Copy(pos1,pos2)::diff(pos1+1,pos2+1)
   else {
       val diff1 = Insert(pos2)::diff(pos1,pos2+1)
       val diff2 = Delete(pos1)::diff(pos1+1,pos2)
       val diff1Cost = (diff1 map cost).sum
       val diff2Cost = (diff2 map cost).sum
       if(diff1Cost < diff2Cost) diff1
       else diff2       
   }
}

Above is a naive implementation of our diff algorithm where in first if we just return empty list of EditAction since pos1 and pos2 are both equal to or greater than number of lines in files, which means that we have already processed all lines. In second if we see that we have already processed all lines in first file so we have no option but to insert lines from second file. Third if handles the case when we have processed all lines in second file, in this case we have no option but to delete remaining lines from first input. Next we handle case when current line in input 1 and input 2 is same, in this case it would be best to simply copy this line from file one to file two. Next we handle more complex case by calculating cost of both performing Insert and Delete and then comparing the diff cost and returning the diff with minimum cost. Now lets also add code to read input files and call diff method.

object DiffTool extends App {
  val lines1 = io.Source.fromFile(args(0)).getLines.toArray
  val lines2 = io.Source.fromFile(args(1)).getLines.toArray

  trait EditAction
  case class Delete(line:Int) extends EditAction
  case class Insert(line:Int) extends EditAction
  case class Copy(line1:Int,line2:Int) extends EditAction

  def cost(action:EditAction)= action match {
     case _:Delete => 1
     case _:Insert => 1
     case _:Copy => 0
  }

 def diff(pos1:Int,pos2:Int):List[EditAction]={
   if(pos1>=lines1.length && pos2>=lines2.length)
       List[EditAction]()
   else if(pos1 >= lines1.length)
       Insert(pos2)::diff(pos1,pos2+1)
   else if(pos2 >= lines2.length)
       Delete(pos1)::diff(pos1+1,pos2)
   else if(lines1(pos1) == lines2(pos2))
       Copy(pos1,pos2)::diff(pos1+1,pos2+1)
   else {
       val diff1 = Insert(pos2)::diff(pos1,pos2+1)
       val diff2 = Delete(pos1)::diff(pos1+1,pos2)
       val diff1Cost = (diff1 map cost).sum
       val diff2Cost = (diff2 map cost).sum
       if(diff1Cost < diff2Cost) diff1
       else diff2
   }
 }



  println(diff(0,0))
}

When running above program two files, the program literally prints a list of actions to be performed to convert input one into input two. Now this list of actions can be easily translated to either text patch format or html diff or html split view diff. Now, let us add a method to convert these actions to html.

def toHtml(diffActions:List[EditAction]) = {
 <html>
  <body>
   <table border="0" cellspacing="0" >
    {(diffActions map {
       case Delete(line)=>{<tr style="background-color:pink;"><td>{line+", -"}</td><td>{lines1(line)}</td></tr>}
       case Insert(line)=>{<tr style="background-color:lightgreen;"><td>{"- ,"+line}</td><td>{lines2(line)}</td></tr>}
       case Copy(line1,line2)=>{<tr><td>{line1+","+line2}</td><td>{lines1(line1)}</td></tr>}
    })}
   </table></body></html>
}

println(toHtml(diff(0,0)))

Although, this is conceptually complete implementation for generating HTML diff of two files, this implementation is usable only for creating diffs of small files. For large files this implementation is likely to fail with Stack Overflow Exception, since the definition is recursive. One way to fix this is to have an implementation which is tail recursive, such implementations are converted to iteration by Scala compiler. Following is such an implementation of diff routine. We also define a type alias DiffScript for List[EditAction] type

Tail Recursive Diff implementation

// type alias
 type DiffScript = List[EditAction]
 // an empty diff script
 val emptyDiffScript = List[EditAction]()
 // an empty list of diff scripts
 val empty = List(emptyDiffScript)

 // return diff script with minimum cost
 def minDiff(a:DiffScript,b:DiffScript)={
   val aCost = (a map cost).sum
   val bCost = (b map cost).sum
   if(aCost < bCost) a
   else b
 }
 def diff():DiffScript={
   val emptyList = Seq.fill(lines2.length+1)(emptyDiffScript).toList
   diff(0,0,emptyList)
 }
 /**
  * @param pos1 current line being processed in file1
  * @param pos2 current line being processed in file2
  * @param last last calculated DiffScripts
  * @param curr list of DiffScripts being created
  * @return final optimal DiffScript 
  */
 @tailrec
 def diff(pos1:Int,pos2:Int,last:List[DiffScript],curr:List[DiffScript]=empty):DiffScript={
   if(pos1 >= lines1.length)
       last.reverse.head.reverse
   else if(pos2 < lines2.length){
       val newCurrent = if(lines1(pos1) == lines2(pos2)){
         Copy(pos1,pos2)::last.head
       }else {
         val diff1 = Insert(pos2)::current.head
         val diff2 = Delete(pos1)::last.tail.head
         minDiff(diff1,diff2)
       }
       diff(pos1,pos2+1,last.tail,newCurrent::curr)
   } else
       diff(pos1+1,0,curr.reverse)
 }

Above diff method is tail recursive implementation of iterative lcs implementation. Instead of calculating diff(a,b) in terms of diff(a-1,b),diff(a,b-1) and diff(a-1,b-1), it sequentially calculates diff(a,0),diff(a,1)….diff(a,lines2.length) and stores it in last variable. It then uses this list to calculate diff(a+1,0),diff(a+1,1)…diff(a+1,lines2.length). Proceeding this way it calculates diff(lines1.length,lines2.length) which is the final result.