Jul 31, 2025

Adithya Srinivasan
Co-Founder
Congratulations! You just trained a Small Language Model (SLM) with Radal. How do you test it out and see how well your model is performing? This guide walks you through all the required steps, from downloading the GGUF weights to chatting with your model offline in two popular sandboxes: LM Studio (macOS) and PocketPal AI (iOS / Android).
1 Download Your GGUF From Radal
Watch for a “Model Ready” email from Radal and hit View Model.
On the model page, click Download GGUF and save the file (e.g. ~/Downloads/my-model-f16.gguf).
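Before opening any app, you can sanity-check the download: every GGUF file begins with the four ASCII magic bytes GGUF, so a four-byte read is enough to catch a truncated or mislabeled file. A minimal Python sketch (the path below is the example filename from the step above; adjust it to wherever you saved your file):

```python
from pathlib import Path

# Example path from the step above; point this at your actual download.
MODEL_PATH = Path.home() / "Downloads" / "my-model-f16.gguf"

def looks_like_gguf(path: Path) -> bool:
    """Quick sanity check: every GGUF file starts with the ASCII magic 'GGUF'."""
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(4) == b"GGUF"

if looks_like_gguf(MODEL_PATH):
    print(f"{MODEL_PATH} looks like a valid GGUF file")
else:
    print(f"{MODEL_PATH} is missing or not a GGUF file")
```

This only validates the header, not the weights themselves, but it catches the most common failure mode: an interrupted download.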
2 Test in LM Studio (macOS)
LM Studio is a one-stop desktop workbench with Metal/CUDA acceleration for near-cloud generation speeds. Its built-in lms CLI and OpenAI-compatible REST server let you script benchmarks or plug the model into existing tools instantly. Fine-grained sliders for the context window and sampling settings make prompt engineering fast, and everything runs 100% offline, which is perfect for secure evaluation loops.
2.1 Install LM Studio
Download LM Studio.dmg from https://lmstudio.ai.
Drag it into Applications and open it once; this also installs the lms CLI.
2.2 Import your model with the CLI
Run lms import with the path to your downloaded GGUF file. LM Studio will verify the checksum, copy the file under ~/.lmstudio/models/, and register it in the UI.
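The import itself is a single lms import command. Here is a small Python wrapper that shells out to it, with defensive checks so it degrades gracefully when the CLI or the file is missing (the model path is the example filename from step 1):

```python
import shutil
import subprocess
from pathlib import Path

# Example path from step 1; point this at your actual download.
MODEL_PATH = Path.home() / "Downloads" / "my-model-f16.gguf"

def import_cmd(path: Path) -> list[str]:
    """Build the `lms import` invocation that registers a GGUF with LM Studio."""
    return ["lms", "import", str(path)]

if shutil.which("lms") is None:
    print("lms CLI not found; open LM Studio once to install it")
elif not MODEL_PATH.is_file():
    print(f"No model file at {MODEL_PATH}")
else:
    # Hands off to LM Studio, which verifies and copies the file.
    subprocess.run(import_cmd(MODEL_PATH), check=True)
```

You can of course just run lms import ~/Downloads/my-model-f16.gguf directly in a terminal; the wrapper is only useful if the import is one step in a larger evaluation script.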
2.3 Load the model and run your tests!
In the LM Studio window choose Local Models.
Click Load next to your model and wait for VRAM allocation.
Switch to Chat and start prompting. (Use ⌘↩ to send.)
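Beyond the chat window, the OpenAI-compatible REST server mentioned above makes scripted testing easy: start it from LM Studio (or with lms server start) and POST chat completions to it. A standard-library-only sketch, assuming LM Studio's default port 1234; the model name is a placeholder for whatever identifier LM Studio shows for your import:

```python
import json
from urllib import error, request

# LM Studio's local server defaults to this address once started
# (Developer tab -> Start Server, or `lms server start`).
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt: str, model: str = "my-model-f16") -> dict:
    """Assemble an OpenAI-compatible chat-completion payload.
    The model name is a placeholder; use the identifier LM Studio shows."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,  # sampling settings mirror the UI sliders
        "max_tokens": 256,
    }

payload = build_chat_request("Summarize what a GGUF file is in one sentence.")
req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with request.urlopen(req, timeout=10) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
except (error.URLError, OSError) as exc:
    print(f"Could not reach LM Studio's server: {exc}")
```

Because the endpoint speaks the OpenAI wire format, the same payload works from any OpenAI-compatible client library; swapping temperature and max_tokens here is the scripted equivalent of moving the UI sliders.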
3 Test in PocketPal AI (iOS / Android)
PocketPal AI runs your quantized GGUF entirely on-device, so you can chat even without an internet connection while keeping your data private. The unified iOS/Android app streams tokens in real time and stores favorite prompts for quick reuse. It is lightweight enough for everyday phone use, yet capable enough for on-the-go demos whenever you need to show off your model.
3.1 Install PocketPal
iOS: App Store → PocketPal AI
Android: Google Play → PocketPal AI
3.2 Add your GGUF
Launch the app; tap the ☰ Menu then Models.
Tap the + floating button at bottom-right.
Choose Add Local Model and pick your .gguf file.
3.3 Load the model and run your tests!
The model appears in the list—tap Load.
Head to Chat and start talking. PocketPal streams tokens in real time.
4 Troubleshooting Cheat Sheet

| Symptom | Likely cause | Fix |
|---|---|---|
| LM Studio says "CUDA/Metal OOM" | Model too big for GPU VRAM | Use a lower-bit quantization or CPU mode |
| PocketPal import looks frozen | File unpacking in background | Wait up to 90 s or force-quit and retry |
| Weird prompt formatting | Mismatched chat template | Pick the correct template before loading |
| Slow generation | Large context / high repetition penalty | Reduce tokens or adjust sampling parameters |
Happy Building!