
🧠 AI Training: Option Periodic

A stable and auditable retraining strategy: collect continuously, review labels, then train on a fixed schedule with quality gates and rollback.

Stable · Reviewed Labels · Rollback-ready

1. Overview

Periodic Training (Option Periodic) is designed to keep the model stable and the system fast while still learning from new data.
Instead of retraining immediately after each new URL, the system:

  • collects URL samples continuously,
  • requires admin review before samples affect training,
  • retrains on a fixed schedule (daily/weekly),
  • releases new models only if they pass quality gates,
  • keeps a fallback model to roll back safely.

2. Why Periodic?

Training immediately after every new URL can:

  • make models unstable (drift from noisy updates),
  • introduce noisy labels (wrong/uncertain ground truth),
  • slow down the system (training is expensive and can block operations).

Periodic training solves this by separating data collection from model updates, with a controlled release process.


3. Data States

During collection and review, each sample moves through a simple state machine:

Pending → Approved / Rejected
  • Pending: collected but not reviewed yet
  • Approved: trusted sample that can be used for training
  • Rejected: not used for training (invalid / duplicate / low confidence)
Recommended rule

Only Approved samples are eligible for training.
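A minimal sketch of this state machine in Python (the `SampleState` enum and transition table are illustrative, not the project's actual API):

```python
from enum import Enum

class SampleState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

# Only Pending samples can be reviewed; Approved/Rejected are terminal.
_ALLOWED = {
    SampleState.PENDING: {SampleState.APPROVED, SampleState.REJECTED},
    SampleState.APPROVED: set(),
    SampleState.REJECTED: set(),
}

def transition(current: SampleState, target: SampleState) -> SampleState:
    if target not in _ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target

def trainable(state: SampleState) -> bool:
    # Recommended rule: only Approved samples are eligible for training.
    return state is SampleState.APPROVED
```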


4. Pipeline

Step 1: Collect

Collect the URL dataset. For each sample, the extension/API records:

  • the normalized URL
  • the predicted label (Adult/Gambling/Phishing/Benign)
  • the score/confidence
  • a timestamp + source metadata

Saved as: Pending
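The collected fields could be modeled as a small dataclass. `UrlSample` and `normalize_url` are illustrative names, and the normalization shown (lowercasing scheme and host, dropping the fragment) is a minimal assumption, not the project's actual rule:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from urllib.parse import urlsplit, urlunsplit

def normalize_url(raw: str) -> str:
    # Minimal normalization: lowercase scheme and host, drop the fragment.
    parts = urlsplit(raw.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, ""))

@dataclass
class UrlSample:
    url: str                  # normalized URL
    predicted_label: str      # Adult / Gambling / Phishing / Benign
    score: float              # model confidence in [0, 1]
    source: str               # e.g. "extension" or "api"
    state: str = "pending"    # every new sample starts as Pending
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

sample = UrlSample(url=normalize_url("HTTPS://Example.COM/login#top"),
                   predicted_label="Phishing", score=0.87, source="extension")
```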

Step 2: Review

The admin reviews each Pending sample and decides:

  • Approve if the label is correct
  • Reject if the sample is noise, a duplicate, or unclear
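The review step amounts to a single guarded state change; this helper is a sketch, and field names like `reviewed_by` and `reject_reason` are assumptions:

```python
def review_sample(sample: dict, approve: bool, reviewer: str,
                  reason: str = "") -> dict:
    # Only Pending samples may be reviewed; re-reviewing is an error.
    if sample.get("state") != "pending":
        raise ValueError("only pending samples can be reviewed")
    reviewed = dict(sample)  # keep the original record untouched
    reviewed["state"] = "approved" if approve else "rejected"
    reviewed["reviewed_by"] = reviewer
    if not approve and reason:
        reviewed["reject_reason"] = reason  # e.g. "duplicate", "unclear"
    return reviewed
```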

Step 3: Training

The periodic training job:

  • merges the baseline dataset with the approved dataset
  • runs feature extraction
  • trains the Random Forest + NLP pipeline
  • evaluates metrics
  • registers the new model version
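A minimal sketch of the training step, assuming scikit-learn with character n-gram TF-IDF as the "NLP pipeline" feature extractor; the sample data and hyperparameters are illustrative:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

def build_pipeline() -> Pipeline:
    # Character n-grams pick up URL tokens like "login", "casino", odd TLDs.
    return Pipeline([
        ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))),
        ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ])

# Merge baseline dataset + approved dataset, then train (toy data).
baseline = [("https://casino-win.example", "Gambling"),
            ("https://docs.python.org", "Benign")]
approved = [("https://secure-login.bank.example.xyz", "Phishing"),
            ("https://news.example.com", "Benign")]
urls, labels = zip(*(baseline + approved))
model = build_pipeline().fit(urls, labels)
```

In the real job this would be followed by metric evaluation on a held-out set and registration of the resulting model version.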

Step 4: Deploy

Safe deployment:

  • keep the previous model as a fallback
  • deploy only if quality gates pass
  • roll back if drift is detected


5. Suggested Schedule

Pick a schedule depending on your environment:

  • Daily: best for demo / rapid iteration
  • Weekly: best for stable production
Practical suggestion

During competitions/demos: train daily, but only when you have enough Approved samples (avoid training on tiny batches).
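That suggestion reduces to a guard the scheduler checks before each run; the threshold of 50 is purely illustrative:

```python
MIN_APPROVED = 50  # illustrative threshold; tune per environment

def should_train(approved_count: int, min_approved: int = MIN_APPROVED) -> bool:
    # Skip a scheduled run when the Approved batch is too small:
    # tiny batches add noise without meaningfully improving the model.
    return approved_count >= min_approved
```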


6. Quality Gates (Release Checklist)

A new model version is released only if it satisfies:

  1. High precision for block labels
  • Adult / Gambling / Phishing must have high precision
  • goal: avoid blocking safe websites (trust is critical)
  2. Low false-positive rate on allowlisted domains
  • allowlisted/trusted domains should rarely be blocked
  • monitor allowlist incidents as a priority
  3. No major regression vs the previous model
  • compare key metrics to the last released version
  • if regression exceeds the threshold → do not deploy
  4. Operational sanity
  • inference latency stays acceptable
  • model size and loading time remain stable
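The checklist can be encoded as one boolean gate function run before deployment; the metric names and thresholds below are illustrative assumptions, not the project's actual values:

```python
BLOCK_LABELS = ("Adult", "Gambling", "Phishing")

def passes_gates(candidate: dict, released: dict,
                 min_precision: float = 0.95,
                 max_allowlist_fp: float = 0.001,
                 max_regression: float = 0.02,
                 max_latency_ms: float = 50.0) -> bool:
    # Gate 1: high precision on block labels (never block safe sites lightly).
    if any(candidate["precision"][label] < min_precision
           for label in BLOCK_LABELS):
        return False
    # Gate 2: low false-positive rate on allowlisted domains.
    if candidate.get("allowlist_fp_rate", 0.0) > max_allowlist_fp:
        return False
    # Gate 3: no major regression vs the previously released model.
    if released["macro_f1"] - candidate["macro_f1"] > max_regression:
        return False
    # Gate 4: operational sanity (inference latency stays acceptable).
    if candidate.get("latency_ms", 0.0) > max_latency_ms:
        return False
    return True
```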

7. Minimal Admin Workflow (Fast Demo)

  1. Open Admin → Review Pending
  2. Approve a small set of correct samples
  3. Run training job (Periodic)
  4. Check metrics summary
  5. Publish model version
  6. Test: scan a few URLs again and verify decisions/logs