File Integrity — SHA-256 Hashing
Every file uploaded to Kliper is cryptographically hashed using SHA-256 at the moment of upload, before it is written to storage.
How It Works
- The file buffer is received in memory via the upload endpoint.
- A SHA-256 hash is computed immediately using Node.js `crypto.createHash('sha256')`.
- The resulting 64-character lowercase hex digest is stored alongside the file record in the database (`hash_sha256` column).
- The file is then written to encrypted storage (Supabase Storage / S3).
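The hashing step can be sketched as follows. This is a minimal illustration, assuming the upload arrives as a Node.js `Buffer`; the helper name `sha256Hex` is hypothetical and not taken from Kliper's code.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: returns the 64-character lowercase hex
// SHA-256 digest of an in-memory file buffer.
function sha256Hex(fileBuffer) {
  return createHash('sha256').update(fileBuffer).digest('hex');
}

// Example: hash the bytes of the string "abc".
const digest = sha256Hex(Buffer.from('abc'));
console.log(digest.length); // 64
```

Hashing the in-memory buffer before any storage write means the stored digest describes exactly the bytes that were received.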
On-Demand Integrity Verification
At any point after upload, an authorized user can trigger an integrity verification check via the Verify action. The platform:
- Re-downloads the file from storage.
- Recomputes the SHA-256 hash of the downloaded content.
- Compares the recomputed hash against the stored original.
- Records the result (`verified` or `tampered`) along with a timestamp in the file’s metadata.
Integrity verification is non-destructive and read-only. It does not modify the file. The verification timestamp is recorded in the file’s metadata for audit purposes.
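The verification steps above could look roughly like this. It is a sketch under stated assumptions: the function name `verifyIntegrity` and the shape of the returned object are hypothetical, and the re-download from storage is assumed to have already produced a `Buffer`.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: recompute the hash of the re-downloaded bytes and
// compare it with the digest stored at upload time. The file is only read.
function verifyIntegrity(downloadedBuffer, storedHash, now = new Date()) {
  const currentHash = createHash('sha256').update(downloadedBuffer).digest('hex');
  return {
    storedHash,
    currentHash,
    status: currentHash === storedHash.toLowerCase() ? 'verified' : 'tampered',
    verifiedAt: now.toISOString(), // ISO 8601 timestamp
  };
}
```

The returned fields are chosen to line up with the stored hash, current hash, integrity status, and verification timestamp that the assessor sees.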
What the Assessor Sees
| Field | Value |
|---|---|
| Stored Hash | a3f2b8c1d4e5... (64-char hex) |
| Current Hash | Recomputed on verification |
| Integrity Status | Verified or Tampered |
| Verified At | ISO 8601 timestamp |
Malware Scanning — Dual-Engine Pipeline
Every file passes through a dual-engine malware scanning pipeline before it is accepted into the platform. No file reaches an assessor’s workbench without being scanned.
Scan Pipeline
The pipeline runs three checks in sequence, plus two scanners in parallel:
Static Analysis
Before any scanning, the file undergoes static validation:
- MIME type check: blocked types include executables (`.exe`, `.dll`), scripts (`.bat`, `.sh`, `.ps1`), Java archives (`.jar`, `.class`), and 30+ other dangerous MIME types.
- Extension check: 40+ blocked extensions including `.exe`, `.dll`, `.bat`, `.cmd`, `.vbs`, `.ps1`, `.msi`, `.scr`, `.lnk`, `.hta`, and more.
- Magic bytes inspection: the first bytes of the file are compared against known executable signatures (PE/MZ headers, ELF binaries, Mach-O binaries, Java class files). This catches files that have been renamed with a safe extension but contain executable content.
- MIME mismatch detection: warns when the declared MIME type does not match the file’s actual content signature.
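The magic bytes inspection can be illustrated with a short sketch. The signature values are the well-known file-format magic numbers; the table layout and the function name `detectExecutableSignature` are hypothetical, and Kliper's actual signature list may be longer.

```javascript
// Well-known leading byte sequences for executable formats.
const EXECUTABLE_SIGNATURES = [
  { name: 'PE/MZ', bytes: [0x4d, 0x5a] },
  { name: 'ELF', bytes: [0x7f, 0x45, 0x4c, 0x46] },
  { name: 'Mach-O (32-bit)', bytes: [0xfe, 0xed, 0xfa, 0xce] },
  { name: 'Mach-O (64-bit)', bytes: [0xfe, 0xed, 0xfa, 0xcf] },
  { name: 'Mach-O (32-bit, byte-swapped)', bytes: [0xce, 0xfa, 0xed, 0xfe] },
  { name: 'Mach-O (64-bit, byte-swapped)', bytes: [0xcf, 0xfa, 0xed, 0xfe] },
  // Java class files and Mach-O universal binaries share this prefix.
  { name: 'Java class / Mach-O universal', bytes: [0xca, 0xfe, 0xba, 0xbe] },
];

// Hypothetical helper: returns the matching signature name, or null
// when the buffer does not start with any known executable signature.
function detectExecutableSignature(fileBuffer) {
  for (const sig of EXECUTABLE_SIGNATURES) {
    const longEnough = fileBuffer.length >= sig.bytes.length;
    if (longEnough && sig.bytes.every((byte, i) => fileBuffer[i] === byte)) {
      return sig.name;
    }
  }
  return null;
}
```

Because the check reads content rather than the file name, a PE binary renamed to `report.pdf` still matches the `PE/MZ` entry. The two-byte `MZ` prefix is short, so a production check would likely inspect more of the header to avoid false positives on text that happens to start with those letters.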
Files that fail static analysis are rejected with a `422` status and a `quarantined` scan status. They are never written to storage.
ClamAV Antivirus Scan
The file is written to a secure temporary location and scanned by ClamAV (`clamdscan`) with a 30-second timeout. ClamAV is an open-source antivirus engine with regularly updated virus definitions.
- Clean result: no threats detected.
- Infected result: one or more virus signatures identified. The specific virus names are captured and stored.
- Error result: scanner unavailable (logged as a warning; does not block upload if VirusTotal is available).
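A sketch of how the three ClamAV outcomes might be derived from a `clamdscan` run. It relies on the documented `clamdscan` exit codes (0 for no virus, 1 for virus found, 2 for error) and its `<path>: <name> FOUND` output lines. The function names and the result shape are hypothetical.

```javascript
const { execFile } = require('node:child_process');

// Hypothetical interpreter: map a clamdscan exit code and stdout
// to one of the three outcomes described above.
function interpretClamdscan(exitCode, stdout) {
  if (exitCode === 0) {
    return { engine: 'clamav', status: 'clean', threats: [] };
  }
  if (exitCode === 1) {
    const threats = stdout
      .split('\n')
      .map((line) => /: (.+) FOUND$/.exec(line.trim()))
      .filter(Boolean)
      .map((match) => match[1]);
    return { engine: 'clamav', status: 'infected', threats };
  }
  return { engine: 'clamav', status: 'error', threats: [] };
}

// Hypothetical runner: scans a temporary file with a 30-second timeout.
// A missing binary or a timeout yields a non-numeric code, treated as an error.
function scanWithClamav(tempPath) {
  return new Promise((resolve) => {
    execFile('clamdscan', ['--no-summary', tempPath], { timeout: 30_000 }, (err, stdout) => {
      const exitCode = err ? (typeof err.code === 'number' ? err.code : 2) : 0;
      resolve(interpretClamdscan(exitCode, stdout || ''));
    });
  });
}
```

Keeping the interpretation separate from the process call makes the outcome mapping easy to test without a running ClamAV daemon.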
VirusTotal Hash Lookup
In parallel with ClamAV, the file’s SHA-256 hash is checked against the VirusTotal database via API. This cross-references the file against 70+ antivirus engines without uploading the file itself — only the hash is sent.
- Clean result: `0/N` engines flagged the hash.
- Infected result: one or more engines flagged the hash. Detection count, engine names, and threat names are recorded.
- Not found: hash not in VirusTotal’s database (file has never been scanned globally). This is treated as clean: absence of evidence is not evidence of malice.
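The hash lookup could be sketched as below, assuming the VirusTotal API v3 file report endpoint (`GET /api/v3/files/{hash}` with an `x-apikey` header); the source does not say which API version Kliper uses. The function names and result shape are hypothetical, and counting both `malicious` and `suspicious` categories as flagged is an assumption.

```javascript
// Hypothetical interpreter for a VirusTotal v3 file report response.
function interpretVirusTotal(httpStatus, body) {
  const empty = { engine: 'virustotal', detections: 0, total: 0, threats: [] };
  if (httpStatus === 404) return { ...empty, status: 'not_found' };
  if (httpStatus !== 200) return { ...empty, status: 'error' };

  const results = body?.data?.attributes?.last_analysis_results ?? {};
  const entries = Object.entries(results);
  // Assumption: "flagged" means a malicious or suspicious verdict.
  const threats = entries
    .filter(([, r]) => r.category === 'malicious' || r.category === 'suspicious')
    .map(([engineName, r]) => ({ engine: engineName, threat: r.result }));

  return {
    engine: 'virustotal',
    status: threats.length > 0 ? 'infected' : 'clean',
    detections: threats.length,
    total: entries.length,
    threats,
  };
}

// Hypothetical lookup: only the SHA-256 hash leaves the platform.
// Uses the global fetch available in Node.js 18 and later.
async function lookupHash(sha256, apiKey) {
  const res = await fetch(`https://www.virustotal.com/api/v3/files/${sha256}`, {
    headers: { 'x-apikey': apiKey },
  });
  const body = res.status === 200 ? await res.json() : null;
  return interpretVirusTotal(res.status, body);
}
```

A `not_found` status is kept distinct from `clean` in this sketch so that the stored result records why no detections were reported, even though both allow the upload.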
Scan Statuses
| Status | Meaning |
|---|---|
| `clean` | File passed all scan engines with no detections. |
| `quarantined` | File was blocked by static analysis or flagged by one or more scan engines. File is rejected and not stored. |
| `pending` | Scan is in progress (brief transitional state). |
Scan Results Storage
Full scan results are persisted in the `scan_result` JSON column on the file record. Each engine’s individual result is stored.
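As an illustration of how per-engine results might be combined into one status and one JSON document, consider the sketch below. The field names are hypothetical and do not reflect the actual `scan_result` schema. Two behaviours are assumptions not stated above: failing closed when neither engine returns a verdict, and accepting a clean ClamAV result when VirusTotal is unavailable.

```javascript
// Hypothetical aggregation of per-engine results into a file-level status.
function buildScanResult(staticAnalysis, clamav, virustotal) {
  let status;
  if (staticAnalysis.blocked) {
    status = 'quarantined';
  } else if (clamav.status === 'infected' || virustotal.status === 'infected') {
    status = 'quarantined';
  } else if (clamav.status === 'error' && virustotal.status === 'error') {
    // Assumption: fail closed when no engine produced a verdict.
    status = 'quarantined';
  } else {
    // Includes VirusTotal "not_found", which is treated as clean.
    status = 'clean';
  }
  return {
    status,
    engines: { static: staticAnalysis, clamav, virustotal },
    scannedAt: new Date().toISOString(),
  };
}

const example = buildScanResult(
  { blocked: false },
  { status: 'error', threats: [] },
  { status: 'not_found', detections: 0, total: 0, threats: [] },
);
console.log(example.status); // clean
```

The example shows the documented case where ClamAV is unavailable but VirusTotal responds, so the upload is not blocked.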
File Type Restrictions
Kliper accepts common evidence file types and blocks anything that could execute code:
Accepted file types
- Documents — PDF, DOCX, DOC, XLSX, XLS, PPTX, PPT, VSDX
- Images — PNG, JPG, JPEG, GIF, BMP, TIFF, SVG, WebP
- Text — TXT, CSV, JSON, XML, YAML, LOG, MD, HTML, SQL, CONF, INI
- Certificates — PEM, CRT, CER, KEY, PUB, CSR, P12, PFX
- Archives — ZIP (inspected for Java archives)
- Logs — EVTX (Windows Event Logs)
Blocked file types
- Executables: `.exe`, `.dll`, `.com`, `.scr`, `.pif`, `.so`, `.dylib`
- Scripts: `.bat`, `.cmd`, `.vbs`, `.vbe`, `.js`, `.jse`, `.ps1`, `.sh`, `.bash`, `.csh`
- Windows system: `.msi`, `.msp`, `.mst`, `.cpl`, `.hta`, `.inf`, `.reg`, `.scf`, `.lnk`
- Java: `.jar`, `.class`
- Shell: `.ws`, `.wsf`, `.wsc`, `.wsh`
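The extension lists above can be turned into a simple lookup. This sketch uses only the extensions enumerated in this section; the function name `classifyExtension` is hypothetical, and returning `unlisted` for anything in neither list is an assumption, since the handling of unlisted types is not described.

```javascript
const path = require('node:path');

const ACCEPTED = new Set([
  '.pdf', '.docx', '.doc', '.xlsx', '.xls', '.pptx', '.ppt', '.vsdx',
  '.png', '.jpg', '.jpeg', '.gif', '.bmp', '.tiff', '.svg', '.webp',
  '.txt', '.csv', '.json', '.xml', '.yaml', '.log', '.md', '.html',
  '.sql', '.conf', '.ini',
  '.pem', '.crt', '.cer', '.key', '.pub', '.csr', '.p12', '.pfx',
  '.zip', '.evtx',
]);

const BLOCKED = new Set([
  '.exe', '.dll', '.com', '.scr', '.pif', '.so', '.dylib',
  '.bat', '.cmd', '.vbs', '.vbe', '.js', '.jse', '.ps1', '.sh', '.bash', '.csh',
  '.msi', '.msp', '.mst', '.cpl', '.hta', '.inf', '.reg', '.scf', '.lnk',
  '.jar', '.class',
  '.ws', '.wsf', '.wsc', '.wsh',
]);

// Hypothetical helper: classify a file name by its final extension.
function classifyExtension(filename) {
  const ext = path.extname(filename).toLowerCase();
  if (BLOCKED.has(ext)) return 'blocked';
  if (ACCEPTED.has(ext)) return 'accepted';
  return 'unlisted'; // assumption: the caller decides how to treat these
}
```

Only the final extension is examined, so `invoice.pdf.exe` is classified as blocked. An extension check alone is easy to bypass by renaming, which is why it is paired with content inspection.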
Automatic Metadata Extraction
On upload, Kliper automatically extracts metadata from supported file types to provide assessors with a preview before opening the file:
| File Type | Extracted Metadata |
|---|---|
| PDF | Page count, word count, text preview (first 500 chars), document metadata |
| Word (DOCX/DOC) | Word count, text preview |
| Excel (XLSX/XLS) | Sheet names, row/column counts, header names |
| PowerPoint (PPTX) | Slide count, text preview from first slide |
| Visio (VSDX) | Page count, text labels extracted from diagram elements |
| CSV | Column headers, row count, first 3 rows as preview |
| Images | Format, file size |
| Text/Config files | Line count, word count, text preview |
| Certificates (PEM) | Certificate type (certificate, private key, public key, CSR) |
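For the text and config file row, extraction might look like the sketch below. The function name `extractTextMetadata` is hypothetical, and the 500-character preview length is borrowed from the PDF row as an assumption; the table does not state a preview length for text files.

```javascript
// Hypothetical extractor for plain-text files: line count, word count,
// and a short preview of the leading characters.
function extractTextMetadata(text, previewLength = 500) {
  // Ignore a single trailing newline so "a\n" counts as one line.
  const body = text.replace(/\r?\n$/, '');
  const lines = text.length === 0 ? [] : body.split(/\r?\n/);
  const trimmed = text.trim();
  const words = trimmed.length === 0 ? [] : trimmed.split(/\s+/);
  return {
    lineCount: lines.length,
    wordCount: words.length,
    preview: text.slice(0, previewLength),
  };
}

const meta = extractTextMetadata('listen_port = 443\nlog_level = info\n');
console.log(meta.lineCount, meta.wordCount); // 2 6
```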
Cortex AI — Privacy and Data Handling
Cortex is Kliper’s built-in AI assistant. It uses OpenAI’s API (model: `gpt-4o-mini`) to provide PCI DSS guidance, validate evidence, and generate ROC findings text.
What Cortex Can Access
Cortex operates within strict boundaries:
- Assessment context only: Cortex can only see data from the assessment the user is currently working in. It cannot access other assessments, other clients, or other organizations.
- Requirement-scoped: when invoked on a specific testing procedure, Cortex receives only the relevant reporting instructions, the assessor’s existing responses for that requirement, and the names of uploaded evidence files tagged to that requirement.
- No raw file content by default: Cortex receives file names and AI-generated summaries of file content, not the raw file contents. The only exception is the Document Validation feature, which sends extracted text content (up to 30,000 characters) to the AI for criteria-based validation.
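The scoping rules above can be pictured as a small context builder. Everything here is a hypothetical sketch: the function names, the input object shapes (`requirementId`, `requirementIds`, `summary`), and the output fields are illustrative assumptions, not Kliper's implementation. Only the 30,000-character limit comes from the text.

```javascript
const MAX_VALIDATION_CHARS = 30_000; // limit stated for Document Validation

// Hypothetical: assemble the requirement-scoped context. Evidence files are
// represented by name and AI-generated summary only, never raw content.
function buildRequirementContext(requirement, responses, evidenceFiles) {
  return {
    reportingInstructions: requirement.reportingInstructions,
    assessorResponses: responses.filter((r) => r.requirementId === requirement.id),
    evidence: evidenceFiles
      .filter((f) => f.requirementIds.includes(requirement.id))
      .map((f) => ({ name: f.name, summary: f.summary })),
  };
}

// Hypothetical: Document Validation is the one path that sends extracted
// text, truncated to the stated limit.
function buildValidationText(extractedText) {
  return extractedText.slice(0, MAX_VALIDATION_CHARS);
}
```

Mapping each file to an explicit `{ name, summary }` pair, rather than passing the file record through, keeps any raw content field from reaching the prompt by accident.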
What Cortex Cannot Do
- Cannot access data across organizations — tenant isolation is enforced at the API level.
- Cannot store or learn from your data — Kliper uses the OpenAI API with no training on customer data. Your assessment data is not used to train or fine-tune any model.
- Cannot make assessment decisions — Cortex generates draft text and validation results. The assessor retains full control over all findings, status selections, and the final report content.
- Cannot modify assessment answers directly — all AI-generated content is presented as suggestions that the assessor must explicitly accept or modify.
Data Flow
AI Transparency
Every AI-generated output in Kliper includes:
- Warnings: if data is missing (e.g., no assessor responses yet, no evidence files uploaded), Cortex explicitly flags what is incomplete rather than fabricating content.
- Placeholders: for data that does not yet exist, Cortex uses `[PENDING_RESPONSE]` or `[TAG]` markers instead of generating plausible-sounding but unverified text.
- Model attribution: the AI model used and token counts are recorded for every validation and autofill operation.
Multi-Tenant Isolation
Kliper enforces strict tenant isolation at every layer:
- Database: every query is scoped to the user’s current organization via the `org_id` foreign key. There is no mechanism to query across organizations.
- Storage: file paths are prefixed with the organization ID. Signed download URLs are scoped per organization.
- API: the `x-organization` header is validated on every authenticated request. Requests without a valid organization context are rejected.
- RBAC: four roles (Admin, Manager, Contributor, Viewer) with fine-grained permissions per resource type and action.
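The API and storage layers described above might be sketched like this. The function names, the membership check against the user's organization list, the `403` status code, and the exact path layout are all assumptions made for illustration; the text only states that the header is validated and that paths are prefixed with the organization ID.

```javascript
// Hypothetical: resolve the organization context for a request.
// Node.js lowercases incoming header names, hence 'x-organization'.
function resolveOrganization(headers, userOrgIds) {
  const orgId = headers['x-organization'];
  if (typeof orgId !== 'string' || !userOrgIds.includes(orgId)) {
    const err = new Error('Invalid organization context');
    err.statusCode = 403; // assumption: the status code is not specified
    throw err;
  }
  return orgId;
}

// Hypothetical: build a storage key prefixed with the organization ID.
function storagePath(orgId, fileId, filename) {
  return `${orgId}/${fileId}/${filename}`;
}
```

Resolving the organization once per request and passing the result to every query and storage call keeps the `org_id` scope from being chosen independently at each layer.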
Audit Trail
Every state change in Kliper is recorded in the audit log:
- Who: user ID, name, email.
- What: action type (created, updated, deleted, reviewed, approved).
- When: ISO 8601 timestamp.
- Where: IP address, user agent string.
- Detail: old value and new value snapshots for data modifications.
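An audit entry carrying the five parts listed above could be assembled as in the sketch below. The function name `buildAuditEntry` and the field names are hypothetical; the action types are the ones listed in the text, and the real list may be longer.

```javascript
const ACTION_TYPES = ['created', 'updated', 'deleted', 'reviewed', 'approved'];

// Hypothetical builder for one audit log record.
function buildAuditEntry({ user, action, request, oldValue = null, newValue = null }, now = new Date()) {
  if (!ACTION_TYPES.includes(action)) {
    throw new RangeError(`Unknown action type: ${action}`);
  }
  return {
    who: { id: user.id, name: user.name, email: user.email },
    what: action,
    when: now.toISOString(), // ISO 8601 timestamp
    where: { ip: request.ip, userAgent: request.userAgent },
    detail: { oldValue, newValue }, // snapshots for data modifications
  };
}
```

Rejecting unknown action types at write time keeps the log queryable by a fixed vocabulary.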