---
license: other
license_name: lfm1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-VL-450M/blob/main/LICENSE
base_model: LiquidAI/LFM2.5-VL-450M
library_name: transformers.js
pipeline_tag: image-text-to-text
tags:
  - vision-language
  - earth-observation
  - sentinel-2
  - galamsey
  - illegal-mining
  - lfm2-vl
  - lfm2.5-vl
  - object-detection
  - grounding
  - onnx
  - webgpu
  - transformers.js
language:
  - en
datasets:
  - ellaampy/SmallMinesDS
---

# galamsey-v9-e3-onnx

![GalamseyWatch: agentic Earth observation for galamsey detection in Sentinel-2 imagery](galamseywatch_thumb.png)

ONNX (fp16) export of [`samwell/galamsey-v9-e3`](https://huggingface.co/samwell/galamsey-v9-e3), a fine-tune of [LiquidAI/LFM2.5-VL-450M](https://huggingface.co/LiquidAI/LFM2.5-VL-450M) for detecting illegal small-scale gold mining ("**galamsey**") in Sentinel-2 imagery over Ghana. Submitted as part of [GalamseyWatch](https://github.com/samadon1/GalamseyWatch) to the Liquid AI × DPhi Space "AI in Space" hackathon.

This is the **browser/WebGPU build** used by the GalamseyWatch dashboard: an enforcement officer clicks a point on the map of Ghana and the model runs locally to detect mining in that tile. The PyTorch checkpoint, [`samwell/galamsey-v9-e3`](https://huggingface.co/samwell/galamsey-v9-e3), is the one used in the on-orbit agentic loop, alongside an [LFM2-2.6B tool-calling policy](https://huggingface.co/LiquidAI/LFM2-2.6B) that picks what to downlink under a bandwidth budget. Its model card has the full training and evaluation details, plus the architecture diagram for the two-layer agentic system. This card covers only the ONNX-specific deployment notes.

## Live demo

[galamseywatch.vercel.app](https://galamseywatch.vercel.app). Click anywhere over Ghana and the page pulls a Sentinel-2 tile, then runs this ONNX model entirely in your browser via WebGPU and `transformers.js`. The ~1 GB model download happens once and is then cached. Nothing leaves the device.

## Performance

The numbers are the same as for the PyTorch checkpoint: the ONNX export reproduces the PyTorch outputs at fp16 precision. They were measured on the SmallMinesDS test split with the two-image RGB + SWIR prompt.

**Lift over base model:**

| Metric | Base LFM2.5-VL-450M | galamsey-v9-e3 | Δ |
|---|---:|---:|---:|
| Pixel IoU | 0.069 | **0.332** | **+0.263** (~4.8×) |

**Full evaluation, galamsey-v9-e3:**

| Metric | Value |
|---|---:|
| Pixel IoU | 0.332 |
| Pixel recall | 0.592 |
| Pixel SDC F1 | 0.499 |
| Patch accuracy | 0.795 |

v9-e3 reaches **71% of the bbox ceiling**: the best pixel IoU achievable by any axis-aligned-bbox method on this benchmark is 0.469.
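
The headline ratios follow directly from the table values. A small sketch for re-checking the arithmetic (the variable names are illustrative, not taken from the evaluation code):

```javascript
// Values copied from the tables above.
const baseIoU = 0.069;     // base LFM2.5-VL-450M
const tunedIoU = 0.332;    // galamsey-v9-e3
const bboxCeiling = 0.469; // best pixel IoU reachable with axis-aligned boxes

const absoluteLift = tunedIoU - baseIoU;       // difference in pixel IoU
const relativeLift = tunedIoU / baseIoU;       // multiplicative gain over base
const shareOfCeiling = tunedIoU / bboxCeiling; // fraction of the bbox ceiling

console.log(
  absoluteLift.toFixed(3),                    // "0.263"
  relativeLift.toFixed(1) + "x",              // "4.8x"
  Math.round(shareOfCeiling * 100) + "%",     // "71%"
);
```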

## Why ONNX

This export enables **on-device, no-cloud** deployment:

- **Browser inference via WebGPU + `transformers.js`.** The model loads once (~1 GB), is cached in the browser (`transformers.js` uses the Cache API by default), and runs every subsequent click without a server.
- **Cross-platform edge.** ONNX Runtime runs the same checkpoint on Apple Silicon, Linux, and embedded SBC targets without provider-specific glue.
- **Privacy by design** for enforcement / journalism use cases. Sensitive imagery never leaves the device.
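
Because the browser path depends on WebGPU being available, a page would normally check for it before loading the model. The helpers below are a minimal sketch, not code from the dashboard: `pickDevice` and `supportsFp16` are hypothetical names, and the sketch assumes that fp16 weights on WebGPU need the adapter's `shader-f16` feature. Whether this fp16 export runs acceptably on the `wasm` fallback is untested here.

```javascript
// Pick a transformers.js device string from a navigator-like object.
// Taking the object as a parameter (instead of reading the global)
// keeps the function usable outside a browser.
function pickDevice(nav) {
  return nav && nav.gpu ? "webgpu" : "wasm";
}

// True if a WebGPU adapter advertises 16-bit float shader support.
// Assumption: fp16 weights on WebGPU rely on the "shader-f16" feature.
function supportsFp16(adapter) {
  return Boolean(adapter && adapter.features && adapter.features.has("shader-f16"));
}

// Browser usage (not executed here):
//   const device = pickDevice(navigator);
//   const adapter = device === "webgpu" ? await navigator.gpu.requestAdapter() : null;
//   if (!supportsFp16(adapter)) console.warn("fp16 shaders unavailable on this GPU");
```

The returned string can be passed as the `device` option to `from_pretrained`.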

## Inference (browser, transformers.js)

The browser dashboard wires this up via `transformers.js`. The integration code lives in [`app/src/lib/inference.ts`](https://github.com/samadon1/GalamseyWatch/blob/main/app/src/lib/inference.ts), including the prompts, NMS, min-bbox-area filter, and area estimation. A minimal self-contained example:

```javascript
import {
  AutoModelForImageTextToText,
  AutoProcessor,
  RawImage,
} from "@huggingface/transformers";

const MODEL_ID = "samwell/galamsey-v9-e3-onnx";

const model = await AutoModelForImageTextToText.from_pretrained(MODEL_ID, {
  device: "webgpu",
  dtype: {
    vision_encoder: "fp16",
    embed_tokens: "fp16",
    decoder_model_merged: "fp16",
  },
});
const processor = await AutoProcessor.from_pretrained(MODEL_ID);

// Force 3-channel RGB (browsers decode PNGs as RGBA by default; the alpha
// channel silently corrupts the input tensor and flips detections to []).
const rgb = (await RawImage.fromURL("tile_rgb.png")).rgb();
const swir = (await RawImage.fromURL("tile_swir.png")).rgb();

const GROUNDING_PROMPT =
  "You are viewing two images of the same Sentinel-2 patch: a natural-color RGB " +
  "composite and a SWIR false-color composite. Using both views, detect any " +
  "illegal small-scale gold mining pits. Include any exposed soil, excavation, " +
  "or sediment-laden water even if you are uncertain, err toward detection. " +
  'Provide result as a valid JSON: [{"label": str, "bbox": [x1,y1,x2,y2]}, ...]. ' +
  "Coordinates must be normalized to 0-1. Only return [] if the scene is entirely " +
  "pristine forest, clean water, or urban built-up area with no disturbance.";

const messages = [{
  role: "user",
  content: [
    { type: "image" },
    { type: "image" },
    { type: "text", text: GROUNDING_PROMPT },
  ],
}];

const chatPrompt = processor.apply_chat_template(messages, { add_generation_prompt: true });
const inputs = await processor([rgb, swir], chatPrompt, { add_special_tokens: false });

const outputs = await model.generate({
  ...inputs,
  do_sample: false,
  max_new_tokens: 256,
});
const inputLength = inputs.input_ids.dims.at(-1);
const generated = outputs.slice(null, [inputLength, null]);
const decoded = processor.batch_decode(generated, { skip_special_tokens: true })[0];
console.log(decoded);
```
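
The decoded string still has to be turned into boxes. `inference.ts` in the GalamseyWatch repo is the reference for how the dashboard does this; the block below is only an independent sketch of the steps named above (JSON parsing, a min-bbox-area filter, NMS, area estimation). All function names, the default thresholds (`minArea`, `iouThreshold`), and the largest-box-first NMS ordering are assumptions, not values taken from the dashboard code.

```javascript
// Extract the JSON array from the model's text and keep only well-formed boxes.
function parseDetections(text) {
  const match = text.match(/\[[\s\S]*\]/);
  if (!match) return [];
  let parsed;
  try {
    parsed = JSON.parse(match[0]);
  } catch (err) {
    return []; // truncated or malformed generation
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(
    (d) =>
      d && Array.isArray(d.bbox) && d.bbox.length === 4 &&
      d.bbox.every((v) => Number.isFinite(v) && v >= 0 && v <= 1) &&
      d.bbox[2] > d.bbox[0] && d.bbox[3] > d.bbox[1]
  );
}

// Area of a normalized [x1, y1, x2, y2] box, as a fraction of the tile.
const boxArea = ([x1, y1, x2, y2]) => (x2 - x1) * (y2 - y1);

// Intersection over union of two boxes.
function iou(a, b) {
  const w = Math.min(a[2], b[2]) - Math.max(a[0], b[0]);
  const h = Math.min(a[3], b[3]) - Math.max(a[1], b[1]);
  if (w <= 0 || h <= 0) return 0;
  const inter = w * h;
  return inter / (boxArea(a) + boxArea(b) - inter);
}

// Drop tiny boxes, then run greedy NMS. The JSON format carries no confidence
// score, so this sketch (arbitrarily) prefers larger boxes.
function postprocess(dets, { minArea = 0.001, iouThreshold = 0.5 } = {}) {
  const sorted = dets
    .filter((d) => boxArea(d.bbox) >= minArea)
    .sort((a, b) => boxArea(b.bbox) - boxArea(a.bbox));
  const kept = [];
  for (const d of sorted) {
    if (kept.every((k) => iou(k.bbox, d.bbox) < iouThreshold)) kept.push(d);
  }
  return kept;
}

// Ground area of a box in hectares, given the tile's footprint in metres.
function areaHectares(bbox, tileWidthM, tileHeightM) {
  return (boxArea(bbox) * tileWidthM * tileHeightM) / 10000;
}
```

With these helpers, `postprocess(parseDetections(decoded))` gives the filtered boxes. The tile footprint passed to `areaHectares` has to come from the caller, since it depends on how the Sentinel-2 tile was cut.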

### Description prompt

Same chat template, different prompt:

```text
You are analyzing two views of the same Sentinel-2 patch of southwestern Ghana:
the first image is a natural-color RGB composite, and the second is a SWIR
false-color composite (SWIR2, SWIR1, NIR) where bright areas indicate exposed
soil and mining disturbance. Using both views, describe any signs of illegal
small-scale gold mining (galamsey) activity: exposed soil, excavation pits,
sediment plumes, vegetation loss, and proximity to water bodies. If no mining
is visible, say so.
```

The dashboard runs both prompts back-to-back and combines the structured boxes with the natural-language description.
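
A sketch of how the two passes could be wired together, assuming the grounding output has already been parsed into an array of boxes. `buildMessages`, `combineResults`, and the shape of the combined object are hypothetical and not the dashboard's actual schema.

```javascript
// Build the chat message for either prompt: N image slots followed by the text.
function buildMessages(promptText, imageCount = 2) {
  return [{
    role: "user",
    content: [
      ...Array.from({ length: imageCount }, () => ({ type: "image" })),
      { type: "text", text: promptText },
    ],
  }];
}

// Merge the parsed grounding boxes with the free-text description.
// The field names here are illustrative only.
function combineResults(boxes, description) {
  return {
    miningDetected: boxes.length > 0,
    boxes,
    description: description.trim(),
  };
}
```

Each prompt goes through the same `apply_chat_template`, `processor`, and `generate` calls as the example above: once with `buildMessages(GROUNDING_PROMPT)` and once with the description prompt text.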

## What's in this repo

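ONNX files for the vision encoder, token embeddings, and merged decoder (the `vision_encoder`, `embed_tokens`, and `decoder_model_merged` sessions loaded above), plus the processor and tokenizer config carried over from the upstream LFM2.5-VL-450M. Precision: **fp16** throughout.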

## Citation and training details

Identical to the parent checkpoint. See [`samwell/galamsey-v9-e3`](https://huggingface.co/samwell/galamsey-v9-e3) for the full model card, dataset, training recipe, intended use, and known failure modes.

## License

[LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2.5-VL-450M/blob/main/LICENSE), inherited from the base model.