
Contributions

Fine-tuned Cellpose on AFM T-cell images to meet a no-empty-output requirement across 8 experimental subsets

Scope: Dataset curation, stratified 80/20 train/test split, Cellpose fine-tuning on a Google Colab GPU

Requirement: The pipeline must produce exactly one non-empty tip-cell mask per frame. Downstream mechanical analysis requires a mask for every frame, so zero pred-empty frames is the hard acceptance criterion.

Result: Global mean Dice 0.889, global mean IoU 0.813, pred-empty frames 0/216. 7 of 8 subsets cluster at Dice 0.88–0.91.
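The Dice and IoU numbers above are standard overlap metrics on binary masks. A minimal sketch of a per-frame computation (the helper name is illustrative, not the project's evaluation code):

```python
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """Dice and IoU between two binary masks.

    Hypothetical helper for illustration; the project's actual
    evaluation code may differ.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    # Convention: two empty masks count as a perfect match
    dice = 2 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dice), float(iou)
```

Aggregating this per frame, then averaging per subset, yields the global means and the per-subset breakdown reported above.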

Designed probe-aware selection logic to enforce domain-specific constraints and eliminate wrong-cell picks in crowded frames

Scope: Automatic cantilever tip detection, geometry-aware selection rule, per-subset threshold overrides, easy-retry and local fallback paths

Requirement: Must select the correct tip-adjacent cell even when larger background cells dominate the frame; selection logic must be deterministic, inspectable, and produce no silent failures

Result: Probe-aware logic eliminated the wrong-cell picks that baseline largest-cell selection produced on crowded DN2–DN4 frames. Zero pred-empty frames; all fallback-triggered frames are flagged and logged.
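The core of a probe-aware rule like the one described is choosing the instance nearest the probe tip instead of the largest one. A sketch under assumed names (the distance criterion, threshold, and return schema are illustrative, not the project's exact code):

```python
import numpy as np

def select_tip_cell(labels: np.ndarray, probe_xy: tuple[float, float],
                    max_dist: float = 50.0):
    """Pick the segmented instance closest to the probe tip.

    `labels` is a multi-instance integer label map (0 = background),
    `probe_xy` the detected cantilever tip. Hypothetical sketch: a
    nearest-pixel distance stands in for the project's geometry rule.
    """
    px, py = probe_xy
    best_id, best_dist = 0, np.inf
    for inst_id in np.unique(labels):
        if inst_id == 0:
            continue
        ys, xs = np.nonzero(labels == inst_id)
        d = np.sqrt((xs - px) ** 2 + (ys - py) ** 2).min()  # nearest pixel
        if d < best_dist:
            best_id, best_dist = inst_id, d
    # Deterministic, inspectable outcome: the chosen path is recorded
    path = "normal" if best_dist <= max_dist else "fallback"
    mask = (labels == best_id) if best_id else np.zeros_like(labels, bool)
    return mask, {"instance": int(best_id), "dist": float(best_dist), "path": path}
```

A largest-cell baseline would pick the dominant background cell in a crowded frame; keying the choice to probe distance is what removes that failure mode.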

Built a structured verification framework with per-frame QC artifacts, failure logging, and subset-level acceptance metrics

Output pipeline: binary masks, color overlays with probe crosshair, JSON metadata records per frame, aggregated Dice/IoU evaluation with per-subset breakdown and worst-case identification

Requirement: Outputs must be reviewable by lab collaborators without running code; failures must be identifiable and categorized from logs alone; evaluation must match the selection logic used at inference (closest-to-probe, not best-instance)

Result: 100% of frames produce reviewable overlay and mask. Worst-scoring stems identifiable from aggregated logs without re-running inference. DN2-rate outlier identified and root cause documented.
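Per-frame JSON metadata is what makes the outputs reviewable without running code. A minimal sketch of such a writer (function name and field layout are assumptions, not the project's schema):

```python
import json
from pathlib import Path

def write_frame_record(out_dir: Path, stem: str, record: dict) -> Path:
    """Write one JSON metadata record per frame, keyed by filename stem.

    Illustrative sketch only; sorted keys and indentation keep the
    records diffable and human-reviewable.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stem}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path
```

Because every frame gets a record regardless of which path it took, worst-scoring stems and fallback frames can be found by scanning the JSON files alone.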

Interfaces

  • Raw AFM frame (.tif) → Preprocessing module (data): Grayscale load + percentile [2, 98] intensity normalization; optional contrast normalization and morphological closing per subset config
  • Preprocessing module → Cellpose model (data): Full AFM frame passed without cropping; per-subset cellprob_threshold and flow_threshold overrides applied at inference time
  • Cellpose model → Probe-aware selection logic (software): Multi-instance integer label map passed to geometry rule; probe coordinates sourced from auto-detector or PROBE_MAP fallback
  • Probe-aware selection logic → QC artifact writer (software): Single binary tip-cell mask + selected instance metadata; path taken (normal, retry, fallback) logged per frame to JSON
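The percentile [2, 98] normalization in the preprocessing step can be sketched as follows (a minimal version; the per-subset contrast and morphological-closing options are omitted, and the function name is an assumption):

```python
import numpy as np

def normalize_frame(img: np.ndarray, lo_pct: float = 2.0,
                    hi_pct: float = 98.0) -> np.ndarray:
    """Rescale intensities to [0, 1] using robust percentile bounds.

    Clipping at the 2nd/98th percentiles suppresses outlier pixels
    (e.g. cantilever shadow, hot pixels) before Cellpose inference.
    Illustrative sketch, not the project's exact preprocessing code.
    """
    img = img.astype(np.float32)
    lo, hi = np.percentile(img, [lo_pct, hi_pct])
    if hi <= lo:                    # guard against flat frames
        return np.zeros_like(img)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
```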

Documents

  • REPO: Ground truth mask generation pipeline
  • REPO: Cellpose fine-tuning notebook
  • RPT: Project one-pager
  • RPT: Research report

Risks & Limitations

  • DN2-rate is the weakest subset (Dice 0.816). A subset of frames contains faint cell rims or overlapping cells near the probe tip, causing the geometry rule to select the wrong cell. Resolving this requires geometry-aware multi-channel prediction at the model level, not post-processing improvements.
  • Thresholds are manually tuned per imaging subset. Generalization to a new donor, imaging session, or microscope would require full re-tuning. Domain-aware training is the correct long-term fix.
  • Auto tip detection relies on a static rightmost-point heuristic and assumes the cantilever is darker than the background. Performance is consistent for DN2–DN4 but degrades on DN1 and would fail if cantilever geometry changed significantly.
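The rightmost-dark-point heuristic mentioned above can be sketched roughly as follows (the darkness threshold and function name are assumptions; the real detector may differ):

```python
import numpy as np

def detect_tip(img: np.ndarray, dark_pct: float = 1.0):
    """Locate the cantilever tip as the rightmost very-dark pixel.

    Assumes the cantilever is darker than the background, which is why
    this static rule degrades when cantilever geometry or contrast
    changes. Returns (x, y), or None to trigger the PROBE_MAP fallback.
    """
    thresh = np.percentile(img, dark_pct)   # darkest fraction of pixels
    ys, xs = np.nonzero(img <= thresh)
    if xs.size == 0:
        return None
    i = xs.argmax()                         # rightmost dark pixel
    return int(xs[i]), int(ys[i])
```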

Lessons & Next Steps

  • The dominant failure mode was cell selection, not segmentation — the model segmented correctly but chose the wrong cell. Identifying this shifted the design direction from backbone improvement to geometry-aware post-processing.
  • Per-frame path logging (normal, retry, fallback) made failure patterns immediately diagnosable without re-running inference. Structured logging is what separates a debuggable pipeline from a black box.
  • Per-subset threshold overrides were effective but are a structural workaround — they compensate for a model with no domain understanding. The correct long-term architecture is domain conditioning or geometry-aware training, not additional manual knobs.
