Skip to content

DiffViewer's LCS Algorithm: O(N×M) Table to Grouped Hunks

4/8/2026Web DevelopmentKUIreact11 min read

DiffViewer's LCS Algorithm: O(N×M) Table to Grouped Hunks

Git diff doesn't just compare lines. It finds the longest common subsequence, then groups changes into hunks with context. Here's how a React hook does the same thing.

When you open a pull request, the diff view shows additions in green and deletions in red — with a few unchanged "context" lines around each change. That grouping is deliberate: it collapses unchanged sections and surfaces only the relevant neighborhoods of change.

kui-react's DiffViewer computes this in useDiff, a hook in modules/ui/DiffViewer/hooks/useDiff.ts. Three steps: LCS table, change list, hunk grouping.

The LCS Table

LCS (Longest Common Subsequence) answers: what's the longest sequence of lines that appears in both versions, in the same order? Lines in the LCS are unchanged. Everything else is either added or deleted.

function buildLcsTable(
  oldLines: string[],
  newLines: string[],
): number[][] {
  const m = oldLines.length;
  const n = newLines.length;

  // Initialize (m+1) × (n+1) table with zeros
  const table: number[][] = Array.from(
    { length: m + 1 },
    () => new Array(n + 1).fill(0),
  );

  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      if (oldLines[i - 1] === newLines[j - 1]) {
        table[i][j] = table[i - 1][j - 1] + 1;
      } else {
        table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
      }
    }
  }

  return table;
}

The recurrence:

  • If oldLines[i-1] === newLines[j-1]: these lines match, extend the LCS by 1 from the diagonal
  • Otherwise: take the max of skipping the old line (table[i-1][j]) or skipping the new line (table[i][j-1])

This is O(N×M) time and space — for two 500-line files, that's a 250,000-cell table. For most code review scenarios this is instantaneous. For very large files (10,000+ lines), a memory-optimized LCS (Hirschberg's algorithm) would be needed.

Walking the Table Back

The LCS table tells you the length at each position. To get the actual changes, walk from table[m][n] back to [0][0]:

type Change = {
  type: 'equal' | 'insert' | 'delete';
  oldLine?: number;
  newLine?: number;
  content: string;
};

function buildChanges(
  oldLines: string[],
  newLines: string[],
  table: number[][],
): Change[] {
  const changes: Change[] = [];
  let i = oldLines.length;
  let j = newLines.length;

  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && oldLines[i - 1] === newLines[j - 1]) {
      // Lines match — part of LCS, unchanged
      changes.push({
        type: 'equal',
        oldLine: i,
        newLine: j,
        content: oldLines[i - 1],
      });
      i--;
      j--;
    } else if (j > 0 && (i === 0 || table[i][j - 1] >= table[i - 1][j])) {
      // New line not in old — insertion
      changes.push({
        type: 'insert',
        newLine: j,
        content: newLines[j - 1],
      });
      j--;
    } else {
      // Old line not in new — deletion
      changes.push({
        type: 'delete',
        oldLine: i,
        content: oldLines[i - 1],
      });
      i--;
    }
  }

  return changes.reverse();  // walkback produces reverse order
}

The decision at each step:

  1. Lines match → equal, move diagonally
  2. table[i][j-1] >= table[i-1][j] → the new line isn't in the LCS, mark as insert, move left
  3. Otherwise → the old line isn't in the LCS, mark as delete, move up

.reverse() at the end: the walkback goes from bottom-right to top-left, producing changes in reverse order.

Line numbers (oldLine, newLine) are 1-indexed and used for the line number gutter in the rendered diff.

Grouping Changes into Hunks

A diff doesn't show every unchanged line — it collapses unchanged sections and shows only a few lines of context around each group of changes. This is hunk grouping.

function groupIntoHunks(
  changes: Change[],
  context: number = 3,
): Hunk[] {
  // Identify which indices have actual changes
  const changeIndices = new Set<number>();
  changes.forEach((c, idx) => {
    if (c.type !== 'equal') changeIndices.add(idx);
  });

  if (changeIndices.size === 0) return [];  // no changes, no hunks

  // Expand each change index by ±context lines
  const includedIndices = new Set<number>();
  for (const idx of changeIndices) {
    for (let d = -context; d <= context; d++) {
      const neighbor = idx + d;
      if (neighbor >= 0 && neighbor < changes.length) {
        includedIndices.add(neighbor);
      }
    }
  }

  // Cluster included indices into contiguous hunks
  const sortedIndices = Array.from(includedIndices).sort((a, b) => a - b);
  const hunks: Hunk[] = [];
  let currentHunk: Change[] = [];
  let prevIdx = -1;

  for (const idx of sortedIndices) {
    if (prevIdx !== -1 && idx > prevIdx + 1) {
      // Gap — close current hunk, start new one
      if (currentHunk.length > 0) hunks.push({ changes: currentHunk });
      currentHunk = [];
    }
    currentHunk.push(changes[idx]);
    prevIdx = idx;
  }

  if (currentHunk.length > 0) hunks.push({ changes: currentHunk });

  return hunks;
}

context = 3 is the default (matching git diff's -U3 flag). Each change expands to include 3 lines before and after. Overlapping neighborhoods merge into a single hunk — this is why two nearby changes appear in the same hunk even if they're 5 lines apart.

The "gap" detection: if idx > prevIdx + 1, there's a skip. A collapsed section separator ("@@ -10,7 +10,6 @@") goes between hunks.

The Hook

export function useDiff(
  oldText: string,
  newText: string,
  options: { context?: number } = {},
): Hunk[] {
  return useMemo(() => {
    const oldLines = oldText.split('\n');
    const newLines = newText.split('\n');

    if (oldText === newText) return [];

    const table = buildLcsTable(oldLines, newLines);
    const changes = buildChanges(oldLines, newLines, table);
    return groupIntoHunks(changes, options.context ?? 3);
  }, [oldText, newText, options.context]);
}

useMemo with [oldText, newText, options.context] — recomputes only when the text or context setting changes. The LCS computation is deterministic; the same inputs always produce the same output.

Early exit: if (oldText === newText) return [] — skip the O(N×M) work entirely for unchanged content. This matters for live-update scenarios where the diff rerenders as the user types.

Rendering

function DiffViewer({ oldText, newText }: { oldText: string; newText: string }) {
  const hunks = useDiff(oldText, newText);

  if (hunks.length === 0) {
    return <p className="text-text-secondary text-sm">No changes</p>;
  }

  return (
    <div className="font-mono text-sm">
      {hunks.map((hunk, hunkIdx) => (
        <div key={hunkIdx}>
          {hunkIdx > 0 && (
            <div className="text-text-disabled px-4 py-1 bg-surface-sunken">
              ··· collapsed ···
            </div>
          )}
          {hunk.changes.map((change, changeIdx) => (
            <div
              key={changeIdx}
              className={cn(
                'px-4 py-0.5',
                change.type === 'insert' && 'bg-success/10 text-success',
                change.type === 'delete' && 'bg-error/10 text-error line-through',
                change.type === 'equal' && 'text-text-secondary',
              )}
            >
              <span className="select-none w-8 inline-block text-text-disabled">
                {change.type === 'insert' ? '+' : change.type === 'delete' ? '-' : ' '}
              </span>
              {change.content}
            </div>
          ))}
        </div>
      ))}
    </div>
  );
}

The collapsed separator between hunks signals to the reader that there are unchanged lines between them — mirroring git diff output.

Why LCS and Not Myers Diff?

Myers diff algorithm (used by git) is O(N+M+D) where D is the edit distance — faster than O(N×M) for files with few changes. For small-to-medium diffs, the performance difference is negligible. LCS is simpler to implement and debug.

The trade-off: LCS can produce non-minimal diffs. Two blocks of changed lines might be attributed as deletions then insertions in a suboptimal order. Myers diff minimizes the total edit count, producing the shortest diff. For a DiffViewer in a UI component, correctness matters more than minimality — both algorithms produce correct results, Myers just produces "nicer" diffs by convention.

Large File Consideration

For a 2000-line file, buildLcsTable allocates a 2001 × 2001 array — about 4 million cells. Each cell is a number (8 bytes in V8), so ~32MB per diff computation. For a code review tool that might render many diffs simultaneously, this adds up.

Optimization paths: compute only the diagonal band of the table (if edit distance is small), or use a rolling two-row approach that keeps only the current and previous rows in memory. Neither is implemented here — for the component's target use case (configuration review, template comparison), the simple 2D table is sufficient.

Business Impact

DiffViewer shows up in configuration editors, version history panels, template editors, and code review tools. Showing what changed — not just new and old states — reduces cognitive load when reviewing updates.

For client projects involving document workflows (contracts, specifications, policy documents), a DiffViewer turns "what changed in revision 3?" from a manual comparison into a one-glance answer.

Something to Try

If you need a simple diff in your own project, the LCS core is about 30 lines:

function diff(oldLines: string[], newLines: string[]) {
  const m = oldLines.length, n = newLines.length;
  const table = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = 1; i <= m; i++)
    for (let j = 1; j <= n; j++)
      table[i][j] = oldLines[i-1] === newLines[j-1]
        ? table[i-1][j-1] + 1
        : Math.max(table[i-1][j], table[i][j-1]);

  const changes: Array<{ type: 'equal'|'insert'|'delete'; content: string }> = [];
  let i = m, j = n;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && oldLines[i-1] === newLines[j-1]) {
      changes.push({ type: 'equal', content: oldLines[i-1] }); i--; j--;
    } else if (j > 0 && (i === 0 || table[i][j-1] >= table[i-1][j])) {
      changes.push({ type: 'insert', content: newLines[j-1] }); j--;
    } else {
      changes.push({ type: 'delete', content: oldLines[i-1] }); i--;
    }
  }
  return changes.reverse();
}

No dependency. Feed it two string arrays, get a change list. Build the hunk grouping on top if you need context collapsing.

Related Articles

Same Category

Comments (0)

Newsletter

Stay updated! Get all the latest and greatest posts delivered straight to your inbox