Jul 31, 2025

Adithya Srinivasan
Co-Founder
Congratulations! You just trained a Small Language Model (SLM) with Radal. How do you test it out and see how well your model is performing? This guide walks you through all the required steps, from downloading the GGUF weights to chatting with your model offline in two popular sandboxes: LM Studio (macOS) and PocketPal AI (iOS / Android).
1 Download Your GGUF From Radal
Watch for a “Model Ready” email from Radal and hit View Model.
On the model page, click Download GGUF and save the file (e.g. ~/Downloads/my-model-f16.gguf).
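Before opening any app, you can sanity-check the download: every GGUF file begins with the four ASCII magic bytes GGUF, so a four-byte read is enough to catch a truncated or mislabeled file. A minimal Python sketch (the path below is the example filename from the step above; adjust it to wherever you saved your file):

```python
from pathlib import Path

# Example path from the step above; point this at your actual download.
MODEL_PATH = Path.home() / "Downloads" / "my-model-f16.gguf"

def looks_like_gguf(path: Path) -> bool:
    """Quick sanity check: every GGUF file starts with the ASCII magic 'GGUF'."""
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(4) == b"GGUF"

if looks_like_gguf(MODEL_PATH):
    print(f"{MODEL_PATH} looks like a valid GGUF file")
else:
    print(f"{MODEL_PATH} is missing or not a GGUF file")
```

This only validates the header, not the weights themselves, but it catches the most common failure mode: an interrupted download.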
2 Test in LM Studio (macOS)
LM Studio is a one-stop desktop workbench with Metal/CUDA acceleration for near-cloud generation speeds. Its built-in lms CLI and OpenAI-compatible REST server let you script benchmarks or plug the model into existing tools instantly. Fine-grained sliders for the context window and sampling settings make prompt engineering fast, and everything runs 100% offline, which is perfect for secure evaluation loops.
2.1 Install LM Studio
Download LM Studio.dmg from https://lmstudio.ai.
Drag it into Applications and open it once; this also installs the lms CLI.
2.2 Import your model with the CLI
Run lms import with the path to your downloaded GGUF file. LM Studio will verify the checksum, copy the file under ~/.lmstudio/models/, and register it in the UI.
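The import itself is a single lms import command. Here is a small Python wrapper that shells out to it, with defensive checks so it degrades gracefully when the CLI or the file is missing (the model path is the example filename from step 1):

```python
import shutil
import subprocess
from pathlib import Path

# Example path from step 1; point this at your actual download.
MODEL_PATH = Path.home() / "Downloads" / "my-model-f16.gguf"

def import_cmd(path: Path) -> list[str]:
    """Build the `lms import` invocation that registers a GGUF with LM Studio."""
    return ["lms", "import", str(path)]

if shutil.which("lms") is None:
    print("lms CLI not found; open LM Studio once to install it")
elif not MODEL_PATH.is_file():
    print(f"No model file at {MODEL_PATH}")
else:
    # Hands off to LM Studio, which verifies and copies the file.
    subprocess.run(import_cmd(MODEL_PATH), check=True)
```

You can of course just run lms import ~/Downloads/my-model-f16.gguf directly in a terminal; the wrapper is only useful if the import is one step in a larger evaluation script.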
2.3 Load the model and run your tests!
In the LM Studio window choose Local Models.
Click Load next to your model and wait for VRAM allocation.
Switch to Chat and start prompting. (Use ⌘↩ to send.)
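Beyond the chat window, the OpenAI-compatible REST server mentioned above makes scripted testing easy: start it from LM Studio (or with lms server start) and POST chat completions to it. A standard-library-only sketch, assuming LM Studio's default port 1234; the model name is a placeholder for whatever identifier LM Studio shows for your import:

```python
import json
from urllib import error, request

# LM Studio's local server defaults to this address once started
# (Developer tab -> Start Server, or `lms server start`).
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt: str, model: str = "my-model-f16") -> dict:
    """Assemble an OpenAI-compatible chat-completion payload.
    The model name is a placeholder; use the identifier LM Studio shows."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,  # sampling settings mirror the UI sliders
        "max_tokens": 256,
    }

payload = build_chat_request("Summarize what a GGUF file is in one sentence.")
req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with request.urlopen(req, timeout=10) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
except (error.URLError, OSError) as exc:
    print(f"Could not reach LM Studio's server: {exc}")
```

Because the endpoint speaks the OpenAI wire format, the same payload works from any OpenAI-compatible client library; swapping temperature and max_tokens here is the scripted equivalent of moving the UI sliders.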
3 Test in PocketPal AI (iOS / Android)
PocketPal AI runs your quantized GGUF entirely on-device, so you can chat even without an internet connection while keeping your data private. The unified iOS/Android app streams tokens in real time and stores favorite prompts for quick reuse. It is lightweight enough for everyday phone use, yet capable enough for on-the-go demos whenever you need to show off your model.
3.1 Install PocketPal
iOS: App Store → PocketPal AI
Android: Google Play → PocketPal AI
3.2 Add your GGUF
Launch the app; tap the ☰ Menu then Models.
Tap the + floating button at bottom-right.
Choose Add Local Model and pick your .gguf file.
3.3 Load the model and run your tests!
The model appears in the list—tap Load.
Head to Chat and start talking. PocketPal streams tokens in real time.
4 Troubleshooting Cheat Sheet

| Symptom | Likely cause | Fix |
|---|---|---|
| LM Studio says "CUDA/Metal OOM" | Model too big for GPU VRAM | Use a lower-bit quantization or CPU mode |
| PocketPal import looks frozen | File unpacking in background | Wait up to 90 s or force-quit and retry |
| Weird prompt formatting | Mismatched chat template | Pick the correct template before loading |
| Slow generation | Large context / high repetition penalty | Reduce tokens or adjust sampling parameters |
Happy Building!