DiffViewer's LCS Algorithm: O(N×M) Table to Grouped Hunks
DiffViewer's LCS Algorithm: O(N×M) Table to Grouped Hunks
Git diff doesn't just compare lines. It finds the longest common subsequence, then groups changes into hunks with context. Here's how a React hook does the same thing.
When you open a pull request, the diff view shows additions in green and deletions in red — with a few unchanged "context" lines around each change. That grouping is deliberate: it collapses unchanged sections and surfaces only the relevant neighborhoods of change.
kui-react's DiffViewer computes this in useDiff, a hook in modules/ui/DiffViewer/hooks/useDiff.ts. Three steps: LCS table, change list, hunk grouping.
The LCS Table
LCS (Longest Common Subsequence) answers: what's the longest sequence of lines that appears in both versions, in the same order? Lines in the LCS are unchanged. Everything else is either added or deleted.
function buildLcsTable(
oldLines: string[],
newLines: string[],
): number[][] {
const m = oldLines.length;
const n = newLines.length;
// Initialize (m+1) × (n+1) table with zeros
const table: number[][] = Array.from(
{ length: m + 1 },
() => new Array(n + 1).fill(0),
);
for (let i = 1; i <= m; i++) {
for (let j = 1; j <= n; j++) {
if (oldLines[i - 1] === newLines[j - 1]) {
table[i][j] = table[i - 1][j - 1] + 1;
} else {
table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
}
}
}
return table;
}
The recurrence:
- If
oldLines[i-1] === newLines[j-1]: these lines match, extend the LCS by 1 from the diagonal - Otherwise: take the max of skipping the old line (
table[i-1][j]) or skipping the new line (table[i][j-1])
This is O(N×M) time and space — for two 500-line files, that's a 250,000-cell table. For most code review scenarios this is instantaneous. For very large files (10,000+ lines), a memory-optimized LCS (Hirschberg's algorithm) would be needed.
Walking the Table Back
The LCS table tells you the length at each position. To get the actual changes, walk from table[m][n] back to [0][0]:
type Change = {
type: 'equal' | 'insert' | 'delete';
oldLine?: number;
newLine?: number;
content: string;
};
function buildChanges(
oldLines: string[],
newLines: string[],
table: number[][],
): Change[] {
const changes: Change[] = [];
let i = oldLines.length;
let j = newLines.length;
while (i > 0 || j > 0) {
if (i > 0 && j > 0 && oldLines[i - 1] === newLines[j - 1]) {
// Lines match — part of LCS, unchanged
changes.push({
type: 'equal',
oldLine: i,
newLine: j,
content: oldLines[i - 1],
});
i--;
j--;
} else if (j > 0 && (i === 0 || table[i][j - 1] >= table[i - 1][j])) {
// New line not in old — insertion
changes.push({
type: 'insert',
newLine: j,
content: newLines[j - 1],
});
j--;
} else {
// Old line not in new — deletion
changes.push({
type: 'delete',
oldLine: i,
content: oldLines[i - 1],
});
i--;
}
}
return changes.reverse(); // walkback produces reverse order
}
The decision at each step:
- Lines match → equal, move diagonally
table[i][j-1] >= table[i-1][j]→ the new line isn't in the LCS, mark as insert, move left- Otherwise → the old line isn't in the LCS, mark as delete, move up
.reverse() at the end: the walkback goes from bottom-right to top-left, producing changes in reverse order.
Line numbers (oldLine, newLine) are 1-indexed and used for the line number gutter in the rendered diff.
Grouping Changes into Hunks
A diff doesn't show every unchanged line — it collapses unchanged sections and shows only a few lines of context around each group of changes. This is hunk grouping.
function groupIntoHunks(
changes: Change[],
context: number = 3,
): Hunk[] {
// Identify which indices have actual changes
const changeIndices = new Set<number>();
changes.forEach((c, idx) => {
if (c.type !== 'equal') changeIndices.add(idx);
});
if (changeIndices.size === 0) return []; // no changes, no hunks
// Expand each change index by ±context lines
const includedIndices = new Set<number>();
for (const idx of changeIndices) {
for (let d = -context; d <= context; d++) {
const neighbor = idx + d;
if (neighbor >= 0 && neighbor < changes.length) {
includedIndices.add(neighbor);
}
}
}
// Cluster included indices into contiguous hunks
const sortedIndices = Array.from(includedIndices).sort((a, b) => a - b);
const hunks: Hunk[] = [];
let currentHunk: Change[] = [];
let prevIdx = -1;
for (const idx of sortedIndices) {
if (prevIdx !== -1 && idx > prevIdx + 1) {
// Gap — close current hunk, start new one
if (currentHunk.length > 0) hunks.push({ changes: currentHunk });
currentHunk = [];
}
currentHunk.push(changes[idx]);
prevIdx = idx;
}
if (currentHunk.length > 0) hunks.push({ changes: currentHunk });
return hunks;
}
context = 3 is the default (matching git diff's -U3 flag). Each change expands to include 3 lines before and after. Overlapping neighborhoods merge into a single hunk — this is why two nearby changes appear in the same hunk even if they're 5 lines apart.
The "gap" detection: if idx > prevIdx + 1, there's a skip. A collapsed section separator ("@@ -10,7 +10,6 @@") goes between hunks.
The Hook
export function useDiff(
oldText: string,
newText: string,
options: { context?: number } = {},
): Hunk[] {
return useMemo(() => {
const oldLines = oldText.split('\n');
const newLines = newText.split('\n');
if (oldText === newText) return [];
const table = buildLcsTable(oldLines, newLines);
const changes = buildChanges(oldLines, newLines, table);
return groupIntoHunks(changes, options.context ?? 3);
}, [oldText, newText, options.context]);
}
useMemo with [oldText, newText, options.context] — recomputes only when the text or context setting changes. The LCS computation is deterministic; the same inputs always produce the same output.
Early exit: if (oldText === newText) return [] — skip the O(N×M) work entirely for unchanged content. This matters for live-update scenarios where the diff rerenders as the user types.
Rendering
function DiffViewer({ oldText, newText }: { oldText: string; newText: string }) {
const hunks = useDiff(oldText, newText);
if (hunks.length === 0) {
return <p className="text-text-secondary text-sm">No changes</p>;
}
return (
<div className="font-mono text-sm">
{hunks.map((hunk, hunkIdx) => (
<div key={hunkIdx}>
{hunkIdx > 0 && (
<div className="text-text-disabled px-4 py-1 bg-surface-sunken">
··· collapsed ···
</div>
)}
{hunk.changes.map((change, changeIdx) => (
<div
key={changeIdx}
className={cn(
'px-4 py-0.5',
change.type === 'insert' && 'bg-success/10 text-success',
change.type === 'delete' && 'bg-error/10 text-error line-through',
change.type === 'equal' && 'text-text-secondary',
)}
>
<span className="select-none w-8 inline-block text-text-disabled">
{change.type === 'insert' ? '+' : change.type === 'delete' ? '-' : ' '}
</span>
{change.content}
</div>
))}
</div>
))}
</div>
);
}
The collapsed separator between hunks signals to the reader that there are unchanged lines between them — mirroring git diff output.
Why LCS and Not Myers Diff?
Myers diff algorithm (used by git) is O(N+M+D) where D is the edit distance — faster than O(N×M) for files with few changes. For small-to-medium diffs, the performance difference is negligible. LCS is simpler to implement and debug.
The trade-off: LCS can produce non-minimal diffs. Two blocks of changed lines might be attributed as deletions then insertions in a suboptimal order. Myers diff minimizes the total edit count, producing the shortest diff. For a DiffViewer in a UI component, correctness matters more than minimality — both algorithms produce correct results, Myers just produces "nicer" diffs by convention.
Large File Consideration
For a 2000-line file, buildLcsTable allocates a 2001 × 2001 array — about 4 million cells. Each cell is a number (8 bytes in V8), so ~32MB per diff computation. For a code review tool that might render many diffs simultaneously, this adds up.
Optimization paths: compute only the diagonal band of the table (if edit distance is small), or use a rolling two-row approach that keeps only the current and previous rows in memory. Neither is implemented here — for the component's target use case (configuration review, template comparison), the simple 2D table is sufficient.
Business Impact
DiffViewer shows up in configuration editors, version history panels, template editors, and code review tools. Showing what changed — not just new and old states — reduces cognitive load when reviewing updates.
For client projects involving document workflows (contracts, specifications, policy documents), a DiffViewer turns "what changed in revision 3?" from a manual comparison into a one-glance answer.
Something to Try
If you need a simple diff in your own project, the LCS core is about 30 lines:
function diff(oldLines: string[], newLines: string[]) {
const m = oldLines.length, n = newLines.length;
const table = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
for (let i = 1; i <= m; i++)
for (let j = 1; j <= n; j++)
table[i][j] = oldLines[i-1] === newLines[j-1]
? table[i-1][j-1] + 1
: Math.max(table[i-1][j], table[i][j-1]);
const changes: Array<{ type: 'equal'|'insert'|'delete'; content: string }> = [];
let i = m, j = n;
while (i > 0 || j > 0) {
if (i > 0 && j > 0 && oldLines[i-1] === newLines[j-1]) {
changes.push({ type: 'equal', content: oldLines[i-1] }); i--; j--;
} else if (j > 0 && (i === 0 || table[i][j-1] >= table[i-1][j])) {
changes.push({ type: 'insert', content: newLines[j-1] }); j--;
} else {
changes.push({ type: 'delete', content: oldLines[i-1] }); i--;
}
}
return changes.reverse();
}
No dependency. Feed it two string arrays, get a change list. Build the hunk grouping on top if you need context collapsing.
Related Articles
Same CategoryComments (0)
Newsletter
Stay updated! Get all the latest and greatest posts delivered straight to your inbox