Guide

Jul 31, 2025

Adithya Srinivasan

Co-Founder

Congratulations! You just trained a Small Language Model (SLM) with Radal. How do you test it out and see how well your model performs? This guide walks you through every step, from downloading the GGUF weights to chatting with your model offline in two popular sandboxes: LM Studio (macOS) and PocketPal AI (iOS / Android).

1  Download Your GGUF From Radal

  1. Watch for a “Model Ready” email from Radal and hit View Model.

  2. On the model page, click Download GGUF and save the file (e.g. ~/Downloads/my‑model‑f16.gguf).

2  Test in LM Studio (macOS)

A one-stop desktop workbench with Metal/CUDA acceleration for near-cloud generation speeds. The built-in lms CLI and OpenAI-compatible REST server let you script benchmarks or plug the model into existing tools instantly. Fine-grained sliders for the context window and sampling make prompt engineering fast, and everything runs 100% offline, making it well suited to secure evaluation loops.

2.1  Install LM Studio

  1. Download LM Studio.dmg from https://lmstudio.ai.

  2. Drag it into Applications and open it once. This also installs the lms CLI.

2.2  Import your model with the CLI

Point the lms CLI at your downloaded GGUF file. LM Studio will verify the checksum, copy the file under ~/.lmstudio/models/, and register it in the UI.
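
A minimal import command sketch, assuming the lms CLI is on your PATH (it is installed the first time you open LM Studio) and that the file is still in ~/Downloads; adjust the path to match your actual download:

```shell
# Import the downloaded GGUF into LM Studio's model directory
lms import ~/Downloads/my-model-f16.gguf
```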

2.3  Load the model and run your tests!

  1. In the LM Studio window choose Local Models.

  2. Click Load next to your model—wait for VRAM allocation.

  3. Switch to Chat and start prompting. (Use ⌘↩ to send.)
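
Beyond the chat window, you can script your tests against LM Studio's OpenAI-compatible local server. A minimal sketch using only the standard library, assuming the server is running on its default port 1234 and that the model identifier (`my-model-f16` here) matches the name shown in the Local Models list:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server address


def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat-completion payload for the local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def chat(model: str, prompt: str) -> str:
    """POST the prompt to the local server and return the assistant's reply."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server started from LM Studio's Developer tab, `chat("my-model-f16", "Summarize GGUF in one sentence.")` returns the model's reply as a string, so you can loop over a list of evaluation prompts and log the outputs.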

3  Test in PocketPal AI (iOS / Android)

Runs your quantized GGUF entirely on-device, so you can chat without an internet connection while keeping your data private. The unified iOS/Android app streams tokens in real time and stores favorite prompts for quick reuse. It is lightweight enough for everyday phone use, yet capable enough for on-the-go demos.

3.1  Install PocketPal

  • iOS  App Store → PocketPal AI

  • Android  Google Play → PocketPal AI

3.2  Add your GGUF

  1. Launch the app; tap the ☰ Menu then Models.

  2. Tap the floating button at the bottom-right.

  3. Choose Add Local Model and pick your .gguf.

3.3  Load the model and run your tests!

  • The model appears in the list—tap Load.

  • Head to Chat and start talking. PocketPal streams tokens in real time.

4  Troubleshooting Cheat‑Sheet

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| LM Studio says “CUDA/Metal OOM” | Model too big for GPU VRAM | Use a lower-bit quantization or CPU mode |
| PocketPal import looks frozen | File unpacking in background | Wait up to 90 s, or force-quit and retry |
| Weird prompt formatting | Mismatched chat template | Pick the correct template before loading |
| Slow generation | Large context / high repetition penalty | Reduce tokens or adjust parameters |
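
For the OOM case, one way to produce a lower-bit quantization is llama.cpp's quantize tool. A sketch, assuming you have built llama.cpp locally and the input/output filenames match your own (Q4_K_M is one common 4-bit quantization type):

```shell
# Re-quantize an F16 GGUF down to 4-bit so it fits in less VRAM
./llama-quantize ~/Downloads/my-model-f16.gguf ~/Downloads/my-model-q4_k_m.gguf Q4_K_M
```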

Happy Building!

Drag. Drop. Ask. Train.

Now In Beta

Radal gives you everything. Just drag, drop, and ask.
Train your model today.

© 2025 Radal AI
