Automated segmentation pipeline for AFM T-cell images — fine-tuned Cellpose with probe-aware selection logic, achieving 0.889 mean Dice across 216 labeled frames with zero empty outputs.
Solo Researcher (Undergraduate) · Fall 2025 · Team of 1
What I Owned
Owned the full pipeline end-to-end: dataset curation and quality control, stratified train/test split design, Cellpose fine-tuning, automatic cantilever tip detection, probe-aware cell selection logic, failure handling with structured fallback paths, per-frame QC artifact generation, and a complete verification framework with per-subset acceptance metrics and failure logging. AFM images were provided by a graduate researcher at the Sulchek Lab (Georgia Tech).
Dataset curation, stratified 80/20 train/test split, and Cellpose fine-tuning on a Google Colab GPU
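The stratified 80/20 split can be sketched as a per-subset shuffle. This is illustrative only; the function name, frame-ID scheme, and exact rounding rule are assumptions, not the pipeline's actual code:

```python
import random
from collections import defaultdict

def stratified_split(frames, subset_of, test_frac=0.2, seed=0):
    """Split frame IDs into train/test while preserving per-subset proportions.

    frames:    list of frame IDs (hypothetical naming scheme)
    subset_of: dict mapping frame ID -> subset label (e.g. "DN2")
    """
    rng = random.Random(seed)
    by_subset = defaultdict(list)
    for f in frames:
        by_subset[subset_of[f]].append(f)
    train, test = [], []
    for _, members in sorted(by_subset.items()):  # deterministic order
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_frac))  # at least 1 test frame per subset
        test += members[:n_test]
        train += members[n_test:]
    return train, test
```

Stratifying by subset keeps each AFM acquisition condition represented in both halves, so per-subset acceptance metrics remain meaningful on the held-out frames.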
Requirement: The pipeline must produce exactly one non-empty tip-cell mask per frame. Downstream mechanical analysis needs a mask for every frame, so zero pred-empty frames is the hard acceptance criterion.
Result: Global mean Dice 0.889, global mean IoU 0.813, pred-empty frames 0/216. 7 of 8 subsets cluster at Dice 0.88–0.91.
Automatic cantilever tip detection, geometry-aware selection rule, per-subset threshold overrides, easy-retry and local fallback paths
Requirement: The pipeline must select the correct tip-adjacent cell even when larger background cells dominate the frame; the selection logic must be deterministic, inspectable, and free of silent failures.
Result: Probe-aware logic eliminated the wrong-cell picks that baseline largest-cell selection made on crowded DN2–DN4 frames. Zero pred-empty frames; all fallback-triggered frames are flagged and logged.
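A minimal sketch of the closest-to-probe rule, assuming a plain centroid-distance criterion. The actual geometry-aware rule, retry paths, and per-subset threshold overrides are more involved; names here are illustrative:

```python
import numpy as np

def select_tip_cell(labels, probe_xy):
    """Pick the Cellpose instance whose centroid lies closest to the probe tip.

    labels:   (H, W) integer label map (0 = background), e.g. from Cellpose
    probe_xy: (x, y) tip coordinates from the auto-detector or fallback map
    Returns (instance_id, binary_mask); instance_id is 0 only when the
    label map contains no instances.
    """
    px, py = probe_xy
    best_id, best_d = 0, np.inf
    for inst in np.unique(labels):
        if inst == 0:                                 # skip background
            continue
        ys, xs = np.nonzero(labels == inst)
        d = np.hypot(xs.mean() - px, ys.mean() - py)  # centroid-to-probe distance
        if d < best_d:
            best_id, best_d = inst, d
    mask = labels == best_id if best_id != 0 else np.zeros_like(labels, dtype=bool)
    return best_id, mask
```

Selecting on probe distance rather than instance area is what makes the rule robust to large background cells: a dominant off-tip cell never wins simply by being big.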
Output pipeline: binary masks, color overlays with probe crosshair, JSON metadata records per frame, aggregated Dice/IoU evaluation with per-subset breakdown and worst-case identification
Requirement: Outputs must be reviewable by lab collaborators without running code; failures must be identifiable and categorized from logs alone; evaluation must match the selection logic used at inference (closest-to-probe, not best-instance)
Result: 100% of frames produce a reviewable overlay and mask. Worst-scoring frame stems are identifiable from aggregated logs without re-running inference. The DN2-rate outlier was identified and its root cause documented.
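The per-frame scoring compares the single selected tip-cell mask against ground truth, matching the closest-to-probe inference behavior rather than best-instance matching. A sketch of the two metrics (function name and the empty-vs-empty convention are assumptions):

```python
import numpy as np

def dice_iou(pred, gt):
    """Dice and IoU between two binary masks.

    pred, gt: boolean (or 0/1) arrays of the same shape.
    Empty-vs-empty scores (1.0, 1.0) by convention in this sketch.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    psum = pred.sum() + gt.sum()
    dice = 1.0 if psum == 0 else 2.0 * inter / psum
    iou = 1.0 if union == 0 else inter / union
    return float(dice), float(iou)
```

Aggregating these per frame, then grouping by subset, yields the per-subset breakdown and lets the worst-scoring stems be read directly off the logs.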
| From | To | Type | Notes |
|---|---|---|---|
| Raw AFM frame (.tif) | Preprocessing module | data | Grayscale load + percentile [2,98] intensity normalization; optional contrast normalization and morphological closing per subset config |
| Preprocessing module | Cellpose model | data | Full AFM frame passed without cropping; per-subset cellprob_threshold and flow_threshold overrides applied at inference time |
| Cellpose model | Probe-aware selection logic | software | Multi-instance integer label map passed to geometry rule; probe coordinates sourced from auto-detector or PROBE_MAP fallback |
| Probe-aware selection logic | QC artifact writer | software | Single binary tip-cell mask + selected instance metadata; path taken (normal, retry, fallback) logged per frame to JSON |
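The preprocessing stage's percentile [2, 98] intensity normalization can be sketched as follows. This is a minimal version under assumed conventions (output scaled to [0, 1]); the per-subset contrast normalization and morphological closing options are omitted:

```python
import numpy as np

def normalize_frame(img, p_lo=2, p_hi=98):
    """Percentile-window intensity normalization for a grayscale AFM frame.

    Clips intensities to the [p_lo, p_hi] percentile window, then rescales
    to [0, 1]. Flat frames (degenerate window) map to all zeros to avoid
    divide-by-zero.
    """
    lo, hi = np.percentile(img, [p_lo, p_hi])
    if hi <= lo:                       # flat or near-constant frame
        return np.zeros_like(img, dtype=np.float32)
    out = np.clip(img.astype(np.float32), lo, hi)
    return (out - lo) / (hi - lo)
```

Percentile clipping rather than min/max scaling keeps a few saturated probe pixels or dark dropouts from compressing the usable intensity range before the frame reaches Cellpose.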

6-panel grid showing ground truth vs predicted masks across three qualitative performance classes

5-panel strip showing each stage: raw AFM frame → detected cantilever tip → Cellpose multi-instance masks → geometry-aware selection → final tip-cell mask
Risks & Limitations
Lessons & Next Steps