---
license: other
license_name: lfm1.0
license_link: https://huggingface.co/LiquidAI/LFM2.5-VL-450M/blob/main/LICENSE
base_model: LiquidAI/LFM2.5-VL-450M
library_name: transformers.js
pipeline_tag: image-text-to-text
tags:
- vision-language
- earth-observation
- sentinel-2
- galamsey
- illegal-mining
- lfm2-vl
- lfm2.5-vl
- object-detection
- grounding
- onnx
- webgpu
- transformers.js
language:
- en
datasets:
- ellaampy/SmallMinesDS
---

# galamsey-v9-e3-onnx

![GalamseyWatch: agentic Earth observation for galamsey detection in Sentinel-2 imagery](galamseywatch_thumb.png)

ONNX (fp16) export of [`samwell/galamsey-v9-e3`](https://huggingface.co/samwell/galamsey-v9-e3), a fine-tune of [LiquidAI/LFM2.5-VL-450M](https://huggingface.co/LiquidAI/LFM2.5-VL-450M) for detecting illegal small-scale gold mining ("**galamsey**") in Sentinel-2 imagery over Ghana. Submitted as part of [GalamseyWatch](https://github.com/samadon1/GalamseyWatch) to the Liquid AI × DPhi Space "AI in Space" hackathon.

This is the **browser/WebGPU build** used by the GalamseyWatch dashboard, where an enforcement officer clicks a point on the Ghana map and the model runs locally to detect mining at that tile.

For the PyTorch checkpoint (used in the on-orbit agentic loop alongside an [LFM2-2.6B tool-calling policy](https://huggingface.co/LiquidAI/LFM2-2.6B) that picks what to downlink under a bandwidth budget), see [`samwell/galamsey-v9-e3`](https://huggingface.co/samwell/galamsey-v9-e3); the model card there has the full training and evaluation details, plus the architecture diagram for the two-layer agentic system. This card focuses on the ONNX-specific deployment notes.

## Live demo

[galamseywatch.vercel.app](https://galamseywatch.vercel.app). Click anywhere over Ghana: the page pulls a Sentinel-2 tile and runs this ONNX model fully in your browser via WebGPU and `transformers.js`. The model is a one-time download of about 1 GB and is cached afterwards. Nothing leaves the device.

## Performance

Same numbers as the PyTorch checkpoint (the ONNX export reproduces the PyTorch outputs at fp16 precision). Evaluated on the SmallMinesDS test split, RGB + SWIR two-image prompt.

**Lift over base model:**

| Metric | Base LFM2.5-VL-450M | galamsey-v9-e3 | Δ |
|---|---:|---:|---:|
| Pixel IoU | 0.069 | **0.332** | **+0.263** (~4.8×) |

**Full evaluation, galamsey-v9-e3:**

| Metric | Value |
|---|---:|
| Pixel IoU | 0.332 |
| Pixel recall | 0.592 |
| Pixel SDC F1 | 0.499 |
| Patch accuracy | 0.795 |

v9-e3 sits at **71% of the achievable bbox ceiling** (0.469) for any axis-aligned-bbox method on this benchmark.

## Why ONNX

This export is what unlocks the **on-device, no-cloud** deployment story:

- **Browser inference via WebGPU + `transformers.js`.** The model loads once (~1 GB), caches in IndexedDB, and runs every subsequent click without a server.
- **Cross-platform edge.** ONNX Runtime runs the same checkpoint on Apple Silicon, Linux, and embedded SBC targets without provider-specific glue.
- **Privacy by design** for enforcement / journalism use cases. Sensitive imagery never leaves the device.

## Inference (browser, transformers.js)

The browser dashboard wires this up via `transformers.js`. The integration code lives in [`app/src/lib/inference.ts`](https://github.com/samadon1/GalamseyWatch/blob/main/app/src/lib/inference.ts), including the prompts, NMS, min-bbox-area filter, and area estimation.

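
The post-processing steps named above (min-bbox-area filter and NMS) might look roughly like the sketch below. It is not the code from `inference.ts`: the function names, both threshold values, and the choice to rank boxes by area (the model output carries no confidence score) are assumptions made for illustration.

```javascript
// Hypothetical thresholds; the values the dashboard actually uses are not stated in this card.
const MIN_BBOX_AREA = 0.001;   // fraction of the tile, in normalized 0-1 coordinates
const NMS_IOU_THRESHOLD = 0.5; // boxes overlapping a kept box at or above this IoU are dropped

// Extract the outermost JSON array from the decoded text; return [] if parsing fails.
function parseDetections(decoded) {
  const start = decoded.indexOf("[");
  const end = decoded.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  try {
    const parsed = JSON.parse(decoded.slice(start, end + 1));
    if (!Array.isArray(parsed)) return [];
    // Keep only entries with a well-formed [x1, y1, x2, y2] bbox.
    return parsed.filter(
      (d) =>
        d &&
        Array.isArray(d.bbox) &&
        d.bbox.length === 4 &&
        d.bbox.every((v) => Number.isFinite(v)),
    );
  } catch {
    return [];
  }
}

// Area of an axis-aligned box; degenerate boxes have area 0.
const area = ([x1, y1, x2, y2]) => Math.max(0, x2 - x1) * Math.max(0, y2 - y1);

// Intersection over union of two axis-aligned boxes.
function iou(a, b) {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const union = area(a) + area(b) - inter;
  return union > 0 ? inter / union : 0;
}

// Drop tiny boxes, then apply greedy NMS with larger boxes taking priority (an assumption).
function postprocess(decoded, minArea = MIN_BBOX_AREA, iouThreshold = NMS_IOU_THRESHOLD) {
  const candidates = parseDetections(decoded)
    .filter((d) => area(d.bbox) >= minArea)
    .sort((a, b) => area(b.bbox) - area(a.bbox));
  const kept = [];
  for (const det of candidates) {
    if (kept.every((k) => iou(k.bbox, det.bbox) < iouThreshold)) kept.push(det);
  }
  return kept;
}
```

Called as `postprocess(decoded)` on the string returned by the grounding prompt, this yields a de-duplicated list of `{label, bbox}` objects. Returning `[]` on malformed output keeps a single bad generation from breaking the calling code.
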
A minimal self-contained example:

```javascript
import {
  AutoModelForImageTextToText,
  AutoProcessor,
  RawImage,
} from "@huggingface/transformers";

const MODEL_ID = "samwell/galamsey-v9-e3-onnx";

const model = await AutoModelForImageTextToText.from_pretrained(MODEL_ID, {
  device: "webgpu",
  dtype: {
    vision_encoder: "fp16",
    embed_tokens: "fp16",
    decoder_model_merged: "fp16",
  },
});
const processor = await AutoProcessor.from_pretrained(MODEL_ID);

// Force 3-channel RGB (browsers decode PNGs as RGBA by default; the alpha
// channel silently corrupts the input tensor and flips detections to []).
const rgb = (await RawImage.fromURL("tile_rgb.png")).rgb();
const swir = (await RawImage.fromURL("tile_swir.png")).rgb();

const GROUNDING_PROMPT =
  "You are viewing two images of the same Sentinel-2 patch: a natural-color RGB " +
  "composite and a SWIR false-color composite. Using both views, detect any " +
  "illegal small-scale gold mining pits. Include any exposed soil, excavation, " +
  "or sediment-laden water even if you are uncertain, err toward detection. " +
  'Provide result as a valid JSON: [{"label": str, "bbox": [x1,y1,x2,y2]}, ...]. ' +
  "Coordinates must be normalized to 0-1. Only return [] if the scene is entirely " +
  "pristine forest, clean water, or urban built-up area with no disturbance.";

const messages = [{
  role: "user",
  content: [
    { type: "image" },
    { type: "image" },
    { type: "text", text: GROUNDING_PROMPT },
  ],
}];

const chatPrompt = processor.apply_chat_template(messages, { add_generation_prompt: true });
const inputs = await processor([rgb, swir], chatPrompt, { add_special_tokens: false });

const outputs = await model.generate({
  ...inputs,
  do_sample: false,
  max_new_tokens: 256,
});

const inputLength = inputs.input_ids.dims.at(-1);
const generated = outputs.slice(null, [inputLength, null]);
const decoded = processor.batch_decode(generated, { skip_special_tokens: true })[0];
console.log(decoded);
```

### Description prompt

Same chat template, different prompt:

```text
You are analyzing two views of the same Sentinel-2 patch of southwestern Ghana: the first image is a natural-color RGB composite, and the second is a SWIR false-color composite (SWIR2, SWIR1, NIR) where bright areas indicate exposed soil and mining disturbance. Using both views, describe any signs of illegal small-scale gold mining (galamsey) activity: exposed soil, excavation pits, sediment plumes, vegetation loss, and proximity to water bodies. If no mining is visible, say so.
```

The dashboard runs both prompts back-to-back and combines the structured boxes with the natural-language description.

## What's in this repo

ONNX Runtime files for the encoder, decoder, and embedding heads, plus the processor and tokenizer config carried over from the upstream LFM2.5-VL-450M. Quantization: **fp16**.

## Citation, license, training details

Identical to the parent checkpoint. See [`samwell/galamsey-v9-e3`](https://huggingface.co/samwell/galamsey-v9-e3) for the full model card, dataset, training recipe, intended use, and known failure modes.

## License

[LFM Open License v1.0](https://huggingface.co/LiquidAI/LFM2.5-VL-450M/blob/main/LICENSE), inherited from the base model.