Tamper-evident session archives for KVKK/GDPR: the compliance trail you can hand a regulator
A log a regulator trusts is one nobody could have quietly edited. Session records, an archive, and an access log of who read what — three tables that turn "we comply" into something you can prove.
The decision this framework is for
If you run a network service that has to retain connection logs — a VPN concentrator, a RADIUS gateway, anything that touches Turkish 5651 or EU GDPR obligations — you face a decision that most teams answer badly: how do you store logs so that, two years later, you can convince an auditor that nobody tampered with them?
The bad answer is "we write them to a table and trust our admins." That fails the first question a serious reviewer asks: what stops a DBA from running an UPDATE? If the honest reply is "nothing," then the log proves nothing. It is a record of what your database currently says, not a record of what actually happened.
This framework is the pattern I shipped in a TypeScript / Node control plane that bridges an OpenVPN management interface. The compliance module lives in src/compliance and splits the problem across three Prisma models — ComplianceSession, ComplianceArchive, and ComplianceAccessLog — plus a daily sealing job. It is opinionated. It assumes you would rather defend a hash chain to a regulator than defend the trustworthiness of your own ops team.
The framework
Call it Record, Seal, Account — three layers, each answering a different auditor question.
- Record every session as an append-friendly row the moment it happens, and never let that write break the thing it observes.
- Seal those rows once per day into an immutable file with a SHA-256 hash that chains to the previous day, optionally timestamped by an RFC 3161 authority.
- Account for every read of that data in a separate access log, because under KVKK m.12 "who looked at this" is itself a question you must answer.
A fourth piece — Purge — closes the loop: once the retention window lapses, the data is deleted, because keeping it forever is its own violation. Retention is not "store as long as possible." It is "store exactly as long as the law requires, then prove you destroyed it."
Each step with one paragraph of explanation
Record is about capture discipline. The session row is written at connect and updated at disconnect, and the cardinal rule is that a logging failure must never take down a connection. In ComplianceLogger, the class doc states it plainly: "Writes never throw to the caller. A SQLite stall must not break a connect / disconnect." Every write is wrapped so a database stall degrades the log, not the service.
Seal is where tamper-evidence actually lives. A row in a table is editable by anyone with write access; a file whose SHA-256 is recorded and chained to yesterday's hash is not — change one byte and the chain breaks visibly. ArchiveSealer runs daily, streams each day's rows to a JSONL file, hashes it, links the hash to the prior day, and marks the rows as archived in a single transaction.
Account treats reads as events. Most teams audit writes and forget that, under a data-protection regime, reading personal data is the regulated act. The ComplianceAccessLog model records actor, action, target, and timestamp for every download or verification of a compliance archive.
Purge enforces proportionality. KVKK m.4 requires that data be deleted once its legal basis expires — holding 5651 connection logs past the 6–24 month window is itself non-compliant. Purger deletes archived rows and their files past the retention cutoff and writes the deletion to the audit log.
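The purge decision can be sketched as two small pure functions. This is a minimal illustration, not the repo's Purger: the names `retentionCutoff` and `isPurgeable`, the `retentionDays` parameter, the epoch-seconds unit, and measuring age from `startedAt` are all assumptions made for the example.

```typescript
// Hypothetical view of a session row: only the fields the purge decision needs.
interface SessionRetentionView {
  startedAt: number;         // epoch seconds (assumed unit; the schema only says Int)
  archivedAt: number | null; // null = not yet sealed
}

const SECONDS_PER_DAY = 86_400;

// Cutoff = now minus the retention window. Rows that started before it may be purged.
export function retentionCutoff(nowSec: number, retentionDays: number): number {
  if (!Number.isInteger(retentionDays) || retentionDays <= 0) {
    throw new RangeError("retentionDays must be a positive integer");
  }
  return nowSec - retentionDays * SECONDS_PER_DAY;
}

// A row is purgeable only if it has been sealed AND it is older than the cutoff.
export function isPurgeable(row: SessionRetentionView, cutoffSec: number): boolean {
  return row.archivedAt !== null && row.startedAt < cutoffSec;
}
```

Requiring `archivedAt !== null` keeps the purge from ever deleting a row that was never sealed, which would otherwise destroy data with no archive to show for it.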
Walk the framework through a real artifact in the target repo
Start with Record. The session model is deliberately flat — the columns are exactly the fields 5651 m.6/1-b asks for, nothing more:
model ComplianceSession {
  id          Int     @id @default(autoincrement())
  cn          String
  insideIp    String  @map("inside_ip")
  outsideIp   String  @map("outside_ip")
  outsidePort Int?    @map("outside_port")
  instance    String?
  startedAt   Int     @map("started_at")
  endedAt     Int?    @map("ended_at")
  endReason   String? @map("end_reason")
  bytesRx     Int     @default(0) @map("bytes_rx")
  bytesTx     Int     @default(0) @map("bytes_tx")
  archivedAt  Int?    @map("archived_at")

  @@index([cn, startedAt], map: "idx_comp_sess_cn_ts")
  @@index([startedAt], map: "idx_comp_sess_ts")
  @@index([archivedAt], map: "idx_comp_sess_archived")
  @@map("compliance_sessions")
}
The archivedAt column is the hinge between the three layers: NULL means "still live, not yet sealed," a value means "frozen into a sealed archive and safe to purge later." The idx_comp_sess_archived index exists precisely so the sealer and the purger can scan by that flag cheaply.
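The scan that flag enables can be sketched as a day-window filter. This is a hedged illustration, not the sealer's actual query: it assumes `dateYmd` is a `YYYY-MM-DD` string, that `startedAt` is in epoch seconds, and that a session belongs to the day it started on. `utcDayBounds` and `isSealCandidate` are hypothetical names.

```typescript
// Assumed row shape: just the two columns the scan depends on.
interface SessionScanView {
  startedAt: number;         // epoch seconds (assumed unit)
  archivedAt: number | null; // null = live, not yet sealed
}

const SECONDS_PER_DAY = 86_400;

// Half-open window [startSec, endSec) for one UTC calendar day.
export function utcDayBounds(dateYmd: string): { startSec: number; endSec: number } {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(dateYmd);
  if (!m) throw new RangeError(`expected YYYY-MM-DD, got "${dateYmd}"`);
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  const start = new Date(Date.UTC(year, month - 1, day));
  // Date.UTC silently rolls over impossible dates (e.g. 2024-02-31), so re-check.
  if (
    start.getUTCFullYear() !== year ||
    start.getUTCMonth() !== month - 1 ||
    start.getUTCDate() !== day
  ) {
    throw new RangeError(`not a real calendar date: "${dateYmd}"`);
  }
  const startSec = start.getTime() / 1000;
  return { startSec, endSec: startSec + SECONDS_PER_DAY };
}

// True for rows that belong to the given day and have not been sealed yet.
export function isSealCandidate(row: SessionScanView, dateYmd: string): boolean {
  const { startSec, endSec } = utcDayBounds(dateYmd);
  return row.archivedAt === null && row.startedAt >= startSec && row.startedAt < endSec;
}
```

The half-open window means a session that starts exactly at midnight UTC lands in one day only, so no row is sealed twice or skipped at the boundary.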
The write itself is split into openSession and closeSession so the connect and disconnect handlers stay shallow. Open validates with Zod and swallows its own failures:
async openSession(input: OpenSessionInput): Promise<void> {
  const parsed = OpenSessionInputSchema.safeParse(input);
  if (!parsed.success) {
    log.warn({ err: parsed.error.message }, 'compliance: openSession invalid input');
    return;
  }
  const v = parsed.data;
  try {
    // ...assemble data...
    const row = await this.db.complianceSession.create({ data, select: { id: true } });
    this.openByCn.set(v.cn, row.id);
    metrics.complianceSessionsOpen.set(this.openByCn.size);
  } catch (err) {
    log.error({ err, cn: v.cn }, 'compliance: openSession write failed');
  }
}
Note what does not happen here: no re-throw. A failed insert logs an error and returns. That is the "writes never throw" rule made concrete — the connection lives even when the database hiccups, and the in-memory openByCn map is explicitly treated as best-effort while the on-disk row is the canonical artifact.
There is a subtlety in why a session ended. A disconnect can be a clean client exit or the result of a quota kill, a DDoS kick, or an admin action. The EndReason enum captures the cause, and the order is load-bearing:
export const EndReasonSchema = z.enum([
  'client_exit',
  'idle_timeout',
  'quota',
  'admin_kick',
  'ddos_kick',
  'anomaly_kick',
  'tls_review_kick',
  'posture_kick',
  'shutdown',
  'unknown',
]);
The kicker code paths call annotateKill(cn, reason) just before issuing the kill, and closeSession drains that annotation so the persisted end_reason reflects the real cause rather than a generic "client disconnected." That detail matters when an auditor asks why a specific subject was cut off at a specific time.
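The annotate-then-drain handoff described above can be sketched with a small in-memory map. Only `annotateKill(cn, reason)` and the reason values come from the text; the `KillAnnotations` class, the `drain` method, and the `client_exit` fallback are assumptions for illustration.

```typescript
// Reason values copied from the EndReason enum in the article.
type EndReason =
  | 'client_exit' | 'idle_timeout' | 'quota' | 'admin_kick' | 'ddos_kick'
  | 'anomaly_kick' | 'tls_review_kick' | 'posture_kick' | 'shutdown' | 'unknown';

// Hypothetical holder for "why we are about to kill this client".
export class KillAnnotations {
  private readonly pending = new Map<string, EndReason>();

  // Called by a kicker just before it issues the kill.
  annotateKill(cn: string, reason: EndReason): void {
    this.pending.set(cn, reason);
  }

  // Called while closing the session: returns the annotated reason exactly once,
  // then forgets it so a later clean disconnect is not mislabeled.
  drain(cn: string, fallback: EndReason = 'client_exit'): EndReason {
    const reason = this.pending.get(cn);
    this.pending.delete(cn);
    return reason ?? fallback;
  }
}
```

Deleting on read is the important part: an annotation that outlived its disconnect would stamp the wrong cause on the subject's next session.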
Now Seal. Once per UTC day the ArchiveSealer collects a day's rows and writes them as JSONL, header line first so the file is self-describing in isolation:
const header = JSON.stringify({
  schema: 'vpn.compliance.session.v1',
  date: dateYmd,
  rowCount: rows.length,
  legalBasis: '5651 m.6/1-b',
});
const lines: string[] = [header];
for (const r of rows) {
  const out: ArchiveRow = {
    id: r.id, cn: r.cn, insideIp: r.insideIp, outsideIp: r.outsideIp,
    outsidePort: r.outsidePort, instance: r.instance,
    startedAt: r.startedAt, endedAt: r.endedAt,
    endReason: (r.endReason ?? null) as ArchiveRow['endReason'],
    bytesRx: r.bytesRx, bytesTx: r.bytesTx,
  };
  lines.push(JSON.stringify(out));
}
const body = lines.join('\n') + '\n';
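The file is written to a temp path, renamed into place so a reader never sees a half-written archive, then chmod'd to 0o440 (read-only). A rename makes the swap atomic but does not itself flush data to disk, so durability still depends on an fsync before the rename. The hash is computed over the full body including the header, and it chains backward: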
const hash = createHash('sha256').update(body).digest('hex');
const prev = await this.db.complianceArchive.findFirst({
  where: { date: { lt: dateYmd } },
  orderBy: { date: 'desc' },
  select: { fileSha256: true },
});
const prevHash = prev?.fileSha256 ?? null;
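That prevHash is the whole point. Each day's archive row stores both its own fileSha256 and the previous day's as prevSha256. To forge day N, you would have to rewrite the file, replace its stored fileSha256, and then patch the prevSha256 held by day N+1, so a quiet one-row edit is no longer enough. In the code as excerpted here, fileSha256 covers only the file body, so that repair stops at day N+1; it would cascade to every later day only if prevSha256 were folded into the hashed content. The hash also goes to an RFC 3161 timestamp authority when configured, and the DER TimeStampResp is base64'd into the archive row, so the time of sealing is attested by a third party, not just your server clock. A forged hash would no longer match that token.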
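A small in-memory model shows how a re-hashed day is exposed by the next day's stored link. It is a sketch under assumptions, not the sealer itself: `sealChain` and `brokenLinks` are hypothetical helpers, and the row shape keeps only the `date`, `fileSha256`, and `prevSha256` fields named in the article.

```typescript
import { createHash } from "node:crypto";

// Minimal stand-in for an archive row: just the chain-relevant columns.
interface ArchiveLink {
  date: string;
  fileSha256: string;
  prevSha256: string | null;
}

export const sha256Hex = (body: string): string =>
  createHash("sha256").update(body).digest("hex");

// Seal days in order; each link records the hash of the day before it.
export function sealChain(days: { date: string; body: string }[]): ArchiveLink[] {
  const links: ArchiveLink[] = [];
  let prev: string | null = null;
  for (const day of days) {
    const fileSha256 = sha256Hex(day.body);
    links.push({ date: day.date, fileSha256, prevSha256: prev });
    prev = fileSha256;
  }
  return links;
}

// Dates whose stored prevSha256 no longer matches the preceding row's hash.
export function brokenLinks(links: ArchiveLink[]): string[] {
  const broken: string[] = [];
  let prev: ArchiveLink | undefined;
  for (const link of links) {
    if (prev && link.prevSha256 !== prev.fileSha256) broken.push(link.date);
    prev = link;
  }
  return broken;
}
```

If someone rewrites one day's file and updates that day's `fileSha256` to match, `brokenLinks` reports the following day, because its `prevSha256` still points at the original hash.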
The integrity model is enforced at two more layers. At the filesystem, tryChattrAppendOnly runs chattr +a so the file becomes append-only even to root-adjacent admins — and it fails benignly where the filesystem does not support it, because the database hash is the legal artifact regardless:
export async function tryChattrAppendOnly(filePath: string): Promise<void> {
  await new Promise<void>((res) => {
    const p = spawn('chattr', ['+a', filePath], { stdio: 'ignore' });
    p.on('error', () => res());
    p.on('exit', (code) => {
      if (code !== 0) log.debug({ filePath, code }, 'compliance: chattr +a non-zero (filesystem may not support it)');
      res();
    });
  });
}
And the whole thing is verifiable on demand. verifyDay re-reads the file, recomputes the SHA-256, compares it to the stored hash, and re-walks the chain link — returning a list of issues rather than a boolean so the failure is legible:
const computed = createHash('sha256').update(body).digest('hex');
if (computed !== row.fileSha256) issues.push('file SHA-256 mismatch');
if (row.prevSha256) {
  const prev = await this.db.complianceArchive.findFirst({
    where: { date: { lt: dateYmd } },
    orderBy: { date: 'desc' },
    select: { fileSha256: true },
  });
  if (!prev) issues.push('prev row missing but prev_sha256 set');
  else if (prev.fileSha256 !== row.prevSha256) issues.push('chain link broken');
}
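Because the header line carries `date` and `rowCount`, a sealed file can also be checked for internal consistency without touching the database. The sketch below is a complementary check, not part of the `verifyDay` excerpt above; `checkSealedBody` is a hypothetical name, and it follows the same convention of returning a list of issues.

```typescript
// Structural check on a sealed JSONL body: header parses, date matches,
// and the declared rowCount equals the number of row lines that follow.
export function checkSealedBody(body: string, expectedDate: string): string[] {
  const issues: string[] = [];
  if (!body.endsWith("\n")) issues.push("body does not end with a newline");

  const lines = body.split("\n").filter((line) => line.length > 0);
  const first = lines[0];
  if (first === undefined) return [...issues, "file has no header line"];

  let parsed: unknown;
  try {
    parsed = JSON.parse(first);
  } catch {
    return [...issues, "header is not valid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null) {
    return [...issues, "header is not a JSON object"];
  }
  const header = parsed as Record<string, unknown>;

  if (header["date"] !== expectedDate) issues.push("header date mismatch");
  if (header["rowCount"] !== lines.length - 1) {
    issues.push("header rowCount does not match row lines");
  }
  return issues;
}
```

This catches truncation or a file filed under the wrong day, but it is no substitute for the hash comparison: a forger who edits a row without changing the line count passes this check and fails only the SHA-256 one.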
Finally Account. Every read of a compliance archive — a download, a .tsr export, a verify, an NTP check — passes through one recorder that writes a ComplianceAccessLog row:
async function recordAccess(action: string, target: string, req: Request): Promise<void> {
  metrics.complianceAccess.inc({ action });
  const actor = ((req as Request & { session?: { user?: { name?: string } } }).session?.user?.name) || 'bearer';
  try {
    const db = await getDb();
    await db.complianceAccessLog.create({ data: { actor, action, target, ts: Date.now() } });
  } catch (err) {
    log.warn({ err }, 'compliance: access log write failed');
  }
}
The model is intentionally tiny — actor, action, target, ts — and indexed by timestamp and by actor. When the regulator asks "who accessed the records for this individual, and when," the answer is a single indexed query, not a forensic reconstruction from web-server logs.
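The regulator's question can be expressed as one set of query arguments. This is a hedged sketch: `accessQueryFor` is a hypothetical helper, the field names come from the model described above, and `ts` is treated as epoch milliseconds to match the `Date.now()` in `recordAccess` (adjust if the column is stored as a BigInt). What a `target` value looks like is not shown in the article, so the example treats it as an opaque string.

```typescript
// Plain-object shape that mirrors a Prisma-style findMany argument.
interface AccessLogQuery {
  where: { target: string; ts: { gte: number; lt: number } };
  orderBy: { ts: "asc" };
  select: { actor: true; action: true; target: true; ts: true };
}

// Build the arguments for "who touched this target between fromMs and toMs".
export function accessQueryFor(target: string, fromMs: number, toMs: number): AccessLogQuery {
  if (!(fromMs < toMs)) throw new RangeError("fromMs must be earlier than toMs");
  return {
    where: { target, ts: { gte: fromMs, lt: toMs } },
    orderBy: { ts: "asc" },
    select: { actor: true, action: true, target: true, ts: true },
  };
}
```

Assuming the Prisma client from the article, the result would be passed as `db.complianceAccessLog.findMany(accessQueryFor(target, fromMs, toMs))`. Keeping the builder pure makes the time-window logic testable without a database.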
Where the framework fails
Be honest about the edges. The hash chain proves a file was not altered after it was sealed. It proves nothing about what happened before — between the live compliance_sessions row and the daily seal, the data sits in a mutable SQLite table. Someone with database write access in that window can still edit a session before it is frozen. Shortening the seal interval narrows the window but never closes it.
The chattr +a layer is real but soft: it works only on filesystems that implement the append-only attribute (ext4 and XFS among them), and a determined root can clear the attribute. The code knows this: the comment is explicit that the DB hash, not the filesystem flag, is the authoritative artifact. So the append-only flag raises the cost of tampering; it does not make it impossible.
The RFC 3161 timestamp is best-effort. When the TSA is unreachable, sealing continues with tsaResponseB64 left null — the chain still works, but you lose the third-party time attestation for that day. If your threat model includes "the operator backdated the whole server clock," a missing TSA token is a gap you need to monitor.
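Monitoring that gap can be as simple as listing the sealed days that have no token. A minimal sketch, assuming archive rows expose `date` and the `tsaResponseB64` field mentioned above; `daysMissingTsaToken` is a hypothetical helper, not something the repo is known to contain.

```typescript
// Only the two archive columns the check needs.
interface ArchiveTsaView {
  date: string;
  tsaResponseB64: string | null; // null when the TSA was unreachable at seal time
}

// Returns the dates of sealed archives that lack a third-party timestamp token.
export function daysMissingTsaToken(rows: ArchiveTsaView[]): string[] {
  return rows
    .filter((r) => r.tsaResponseB64 === null || r.tsaResponseB64.length === 0)
    .map((r) => r.date)
    .sort();
}
```

Feeding the result into an alert or a daily report turns a silent degradation into a finding with a date attached.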
And the access log has the same bootstrapping problem every audit trail has: it is a table, and the recorder swallows its own write failures by design. It tells you who read the data through the application. It does not catch someone reading the SQLite file directly off disk.
CTA
The single prompt that triggers this framework, the one to run before your next compliance review: "Show me the query a developer would run to silently change a session record — and tell me what would reveal it." If the honest answer is "nothing would reveal it," you have a log, not evidence.
Trade-off
This pattern trades operational simplicity for defensibility. You are running a daily job, managing JSONL files alongside a database, maintaining a hash chain, and depending on a filesystem feature and an external timestamp service that both fail in their own ways. A plain table is simpler to build and to reason about. The honest position: if no regulator will ever read your logs, a table is enough and this is over-engineering. The moment retention is a legal obligation and a hostile party might allege you doctored the record, the table becomes a liability and the chain becomes the cheapest insurance you can buy.
Business impact
For a CTO, the difference is between "we have logs" and "we can survive an audit." When an auditor or opposing counsel challenges a connection record, a mutable table invites the question of who could have changed it — and you have no answer. A chained, timestamped, append-only archive with a documented access trail lets you hand over a file, a hash, and a third-party timestamp, and move on. That converts a multi-day legal scramble into a one-command verification, and it converts compliance from a recurring fire drill into a property of the system that holds while you sleep.
What to do next
Run verifyDay against your oldest sealed archive right now and read the issues array. An empty array means your chain has held since day one; anything else is a finding you want to discover before a regulator does. If you do not have an equivalent of that function, that is the first thing to build — recompute-and-compare is maybe forty lines, and it is the difference between claiming integrity and proving it.