A live, end-to-end run of the anomaly detection pipeline against synthetic federal financial system records. Each invocation generates a fresh batch, normalizes it to OCSF, scores every record with IsolationForest, and writes the result to S3 for inspection.
The pipeline runs three stages in sequence. Each stage is implemented in the
inference harness Lambda and writes its output to a versioned S3 location
that downstream stages read from. No real data is generated, normalized, or
stored at any point. Synthetic records are tagged
_synthetic: true on creation and that flag follows the record
through normalization and scoring.
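Below is a minimal sketch of one way to make the `_synthetic` flag follow a record through the stages. It assumes each stage is a plain Python function that takes one record dict and returns a new one. The wrapper `propagate_synthetic_flag`, the stand-in `normalize` and `score` stages, and `run_pipeline` are all hypothetical. They are not the harness's actual code, and the versioned S3 hand-off between stages is left out.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]

def propagate_synthetic_flag(stage: Stage) -> Stage:
    """Wrap a stage so its output carries the input record's _synthetic flag."""
    def wrapped(record: Record) -> Record:
        out = stage(record)
        # A missing flag becomes False (not synthetic), never silently True.
        out["_synthetic"] = record.get("_synthetic", False)
        return out
    return wrapped

# Hypothetical stand-ins: each stage builds a *new* dict, which is exactly
# where a flag would be dropped without the wrapper.
@propagate_synthetic_flag
def normalize(record: Record) -> Record:
    return {"class_uid": 6003, "actor": {"user": {"name": record["user"]}}}

@propagate_synthetic_flag
def score(record: Record) -> Record:
    return {"event": record, "anomaly_score": 0.0, "is_anomaly": False}

def run_pipeline(record: Record, stages: List[Stage]) -> Record:
    """Apply stages in order, feeding each one the previous stage's output."""
    for stage in stages:
        record = stage(record)
    return record

generated = {"_synthetic": True, "user": "user007", "system": "GFEBS"}
result = run_pipeline(generated, [normalize, score])
print(result["_synthetic"])  # True
```

The stages themselves stay unaware of the flag, so a new stage cannot forget to copy it. A record that arrives untagged comes out explicitly marked `False`.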
The generate stage builds synthetic federal financial system records that mirror the PBIS, STARS-FL, FFMS, GFEBS, and EBS log structures. A configurable anomaly rate seeds a known proportion of suspicious patterns (permission escalation, bulk export, off-hours config changes, external-IP origin) for the scoring stage to recover.
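The following sketch shows how a configurable anomaly rate can seed a known, labeled proportion of the four suspicious patterns. The record fields (`user`, `hour`, `src_ip`, `action`, `bytes`, `label`), the value ranges, and `generate_batch` itself are hypothetical. They do not reproduce the real PBIS, STARS-FL, FFMS, GFEBS, or EBS log layouts.

```python
import random
from typing import Any, Dict, List

SYSTEMS = ["PBIS", "STARS-FL", "FFMS", "GFEBS", "EBS"]
PATTERNS = ["permission_escalation", "bulk_export",
            "off_hours_config_change", "external_ip_origin"]

def generate_batch(n: int, anomaly_rate: float, seed: int = 0) -> List[Dict[str, Any]]:
    """Build n fictitious records; exactly round(n * anomaly_rate) carry a pattern label."""
    if not 0.0 <= anomaly_rate <= 1.0:
        raise ValueError("anomaly_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    anomalous = set(rng.sample(range(n), round(n * anomaly_rate)))
    batch = []
    for i in range(n):
        record = {
            "_synthetic": True,                    # tagged on creation
            "system": rng.choice(SYSTEMS),
            "user": f"user{rng.randint(1, 50):03d}",
            "hour": rng.randint(8, 17),            # business hours by default
            "src_ip": f"10.0.{rng.randint(0, 255)}.{rng.randint(1, 254)}",
            "action": "read",
            "bytes": rng.randint(200, 5_000),
            "label": None,                         # ground truth for evaluation
        }
        if i in anomalous:
            pattern = rng.choice(PATTERNS)
            record["label"] = pattern
            if pattern == "permission_escalation":
                record["action"] = "grant_role"
            elif pattern == "bulk_export":
                record["action"] = "export"
                record["bytes"] = rng.randint(500_000, 5_000_000)
            elif pattern == "off_hours_config_change":
                record["action"] = "update_config"
                record["hour"] = rng.choice([0, 1, 2, 3, 4, 22, 23])
            else:  # external_ip_origin: TEST-NET-3 documentation range
                record["src_ip"] = f"203.0.113.{rng.randint(1, 254)}"
        batch.append(record)
    return batch

batch = generate_batch(200, 0.05, seed=42)
```

Because every seeded anomaly carries a `label`, recall can be measured directly once the batch has been scored.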
The normalize stage maps each source-system record into the
OCSF API Activity (class_uid 6003) schema. Actor, source endpoint,
activity ID, time, and metadata fields are populated consistently across
source systems so the scorer can run on a uniform feature space.
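Here is a partial sketch of mapping one source record onto an OCSF API Activity event. It uses `class_uid` 6003, `category_uid` 6, `type_uid` = `class_uid` × 100 + `activity_id`, and epoch-millisecond `time`. The source field names, the action-to-`activity_id` table, and the `metadata.version` value are assumptions for illustration. Several attributes that OCSF requires (for example `severity_id`) are omitted.

```python
from datetime import datetime, timezone
from typing import Any, Dict

API_ACTIVITY_CLASS_UID = 6003  # OCSF API Activity, Application Activity category (6)

# Hypothetical mapping from source "action" values to OCSF API Activity
# activity_id (1 Create, 2 Read, 3 Update, 4 Delete, 99 Other).
ACTIVITY_IDS = {"create": 1, "read": 2, "export": 2,
                "update": 3, "update_config": 3, "grant_role": 3, "delete": 4}

def to_ocsf_api_activity(rec: Dict[str, Any], event_time: datetime) -> Dict[str, Any]:
    """Map one (hypothetical) source record to a partial OCSF API Activity event."""
    activity_id = ACTIVITY_IDS.get(rec["action"], 99)  # unmapped actions -> Other
    return {
        "class_uid": API_ACTIVITY_CLASS_UID,
        "category_uid": 6,
        "activity_id": activity_id,
        "type_uid": API_ACTIVITY_CLASS_UID * 100 + activity_id,
        "time": int(event_time.timestamp() * 1000),   # epoch milliseconds
        "actor": {"user": {"name": rec["user"]}},
        "src_endpoint": {"ip": rec["src_ip"]},
        "api": {"operation": rec["action"]},
        "metadata": {"product": {"name": rec["system"]}, "version": "1.1.0"},  # assumed schema version
        "unmapped": {"bytes": rec["bytes"]},          # kept for the size feature
        "_synthetic": rec["_synthetic"],              # flag carried through
    }

event = to_ocsf_api_activity(
    {"action": "read", "user": "user001", "src_ip": "10.0.3.17",
     "system": "GFEBS", "bytes": 512, "_synthetic": True},
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```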
The score stage runs IsolationForest with a configurable contamination
parameter over a feature vector (cyclical time, IP last octet, activity ID,
record size). Scores are normalized to [0, 1], and any record above the
anomaly threshold is flagged with is_anomaly=true and a CVSS-style
severity. The output is written to S3 as Parquet and JSON.
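The helpers below sketch the steps around the model: feature extraction, score normalization, thresholding, and severity assignment. They assume several things. Raw scores follow scikit-learn's `IsolationForest.score_samples` convention, where lower means more abnormal, so they are negated and then min-max scaled over the batch. The default threshold of 0.7 is a placeholder. Severity reuses the CVSS v3.x qualitative bands (None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0) applied to the normalized score × 10. None of these choices is confirmed by the pipeline.

```python
import math
from typing import List, Sequence, Tuple

def features(hour: float, src_ip: str, activity_id: int, size_bytes: int) -> List[float]:
    """Feature vector: sin/cos of hour-of-day, IP last octet, activity ID, record size."""
    angle = 2 * math.pi * hour / 24  # cyclical: 23:00 and 01:00 land close together
    last_octet = int(src_ip.rsplit(".", 1)[1])
    return [math.sin(angle), math.cos(angle), float(last_octet),
            float(activity_id), float(size_bytes)]

def normalize_scores(raw: Sequence[float]) -> List[float]:
    """Map raw scores (lower = more anomalous) onto [0, 1] (higher = more anomalous)."""
    inverted = [-s for s in raw]
    lo, hi = min(inverted), max(inverted)
    if hi == lo:  # degenerate batch: nothing stands out
        return [0.0 for _ in inverted]
    return [(s - lo) / (hi - lo) for s in inverted]

def cvss_style_severity(score01: float) -> str:
    """Bucket a [0, 1] score using CVSS v3.x qualitative rating bands on a 0-10 scale."""
    cvss = round(score01 * 10, 1)
    if cvss == 0.0:
        return "None"
    if cvss < 4.0:
        return "Low"
    if cvss < 7.0:
        return "Medium"
    if cvss < 9.0:
        return "High"
    return "Critical"

def flag(raw: Sequence[float], threshold: float = 0.7) -> List[Tuple[float, bool, str]]:
    """Return (normalized score, is_anomaly, severity) for each record."""
    return [(s, s > threshold, cvss_style_severity(s)) for s in normalize_scores(raw)]

results = flag([-0.40, -0.45, -0.70])
```

Per-batch min-max scaling keeps the output in [0, 1] but makes scores relative to the batch. A fixed calibration would be needed to compare scores across runs.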
Every invocation creates new fictitious records. The harness rejects any attempt to feed it real agency data. This page is for capability demonstration before formal accreditation; production use against client data requires the post-ATO promotion path.
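Here is one minimal way such a rejection could look, assuming the check keys off the `_synthetic` tag at the harness entry point. `require_synthetic` and `NonSyntheticInputError` are hypothetical names. A tag check like this is only a tripwire for mislabeled input and is not proof of provenance.

```python
from typing import Any, Dict, Iterable, List

class NonSyntheticInputError(ValueError):
    """Raised when a batch contains any record not tagged _synthetic: true."""

def require_synthetic(batch: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Accept the batch only if every record carries _synthetic set to the boolean True."""
    records = list(batch)
    # `is not True` also rejects look-alikes such as the string "true" or 1.
    bad = [i for i, r in enumerate(records) if r.get("_synthetic") is not True]
    if bad:
        raise NonSyntheticInputError(
            f"{len(bad)} record(s) lack _synthetic: true, first at index {bad[0]}")
    return records
```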
Choose your parameters and run the pipeline against the live sandbox Lambda. Results are returned inline with the run ID and S3 keys for follow-up inspection in CloudWatch or SageMaker Studio.
A 200-record run typically completes in 6–12 seconds.
After a successful run the response includes the S3 keys for both the raw generated batch and the scored OCSF output. The raw batch goes to the synthetic-data bucket; the scored output goes to the model-artifacts bucket as both Parquet (for Athena and SageMaker) and JSON (for human inspection). Both buckets are KMS-encrypted with the sandbox CMK and accessible from the provisioned developer SageMaker Studio profiles.
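The sketch below summarizes a scored JSON output by severity once the object has been fetched. With boto3, that fetch would be `s3.get_object(Bucket=..., Key=...)["Body"].read()`, and it needs `kms:Decrypt` on the sandbox CMK. The sketch assumes the JSON file is an array of scored records with `is_anomaly` and a string `severity` field. The actual layout of the output is not specified here.

```python
import json
from collections import Counter
from typing import Dict

def summarize_scored_json(payload: str) -> Dict[str, int]:
    """Count flagged records by severity in a scored-output JSON document."""
    records = json.loads(payload)  # assumed: a JSON array of scored OCSF records
    flagged = [r for r in records if r.get("is_anomaly") is True]
    return dict(Counter(r.get("severity", "unknown") for r in flagged))

sample = json.dumps([
    {"class_uid": 6003, "anomaly_score": 0.12, "is_anomaly": False, "severity": "Low"},
    {"class_uid": 6003, "anomaly_score": 0.93, "is_anomaly": True, "severity": "Critical"},
    {"class_uid": 6003, "anomaly_score": 0.81, "is_anomaly": True, "severity": "High"},
])
print(summarize_scored_json(sample))  # {'Critical': 1, 'High': 1}
```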
The pipeline exists to validate detection approaches on a known, labeled dataset before any contact with real client systems. Internal reviewers can inspect any run end-to-end: the inputs (raw synthetic records), the intermediate output (normalized OCSF records), and the final output (scored records with model attribution). This is the path a detection capability follows from experimentation, through internal review, into formal accreditation.
The live demo calls a JWT-protected endpoint. Sign in with your Kearney sandbox account to continue.
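The next sketch builds an authenticated run request against a JWT-protected endpoint using only the standard library. It assumes a bearer token in the `Authorization` header and a JSON POST body. The URL is a placeholder, and the parameter names (`record_count`, `anomaly_rate`, `contamination`, `threshold`) are hypothetical rather than the demo's actual request schema.

```python
import json
import urllib.request
from typing import Any, Dict

def build_run_request(endpoint: str, jwt: str, params: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST carrying run parameters and a bearer JWT."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(params).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {jwt}", "Content-Type": "application/json"},
    )

req = build_run_request(
    "https://example.invalid/pipeline/run",  # placeholder, not the real endpoint
    "<your-jwt>",                            # token from the sandbox sign-in
    {"record_count": 200, "anomaly_rate": 0.05, "contamination": 0.05, "threshold": 0.7},
)
# urllib.request.urlopen(req, timeout=30) would send it. A 30 s timeout leaves
# headroom over the typical 6-12 s run for a 200-record batch.
```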