Introducing Qwen Image Support
The most powerful open-source image generation model is now broadly available across the Apple ecosystem.
Qwen Image is the most powerful open-source image generation model to date, released by the Qwen team at Alibaba. We’ve been working hard to support it across the Apple ecosystem, and we’re happy to announce that it is now broadly available through the Draw Things app. From iPhone to Mac, Apple devices released within the past five years can run this state-of-the-art model directly on-device.
The Model
Draw Things provides Qwen Image 1.0 in the following variants:
8-bit quantized model — ~16 GiB peak runtime VRAM; suitable for devices with 24 GiB or more total RAM.
6-bit quantized model — ~11 GiB peak runtime VRAM; suitable for devices with 16 GiB or more total RAM.
FP16 model — ~30 GiB peak runtime VRAM; suitable for devices with 48 GiB or more total RAM.
BF16 model — ~30 GiB peak runtime VRAM; suitable for M3 and later devices with 48 GiB or more total RAM.
The quantization schemes have been meticulously tested against the reference implementation and show virtually no perceptible loss in quality. On devices with less than 16 GiB total RAM, Draw Things offloads part of the weights, cutting peak runtime VRAM requirements by more than a further 50% without speed penalties.
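To make the variant table above concrete, here is a minimal sketch of how a device could be matched to a variant. The RAM thresholds come from the list above; everything else is an assumption for illustration: the `Variant` type, the `pick_variant` function, and the rule of preferring higher precision when RAM allows are hypothetical and do not describe how Draw Things actually selects a model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    peak_vram_gib: int        # approximate peak runtime VRAM from the list above
    min_total_ram_gib: int    # minimum total RAM from the list above
    requires_m3_or_later: bool = False


# Ordered from highest to lowest precision (the ordering is an assumption).
VARIANTS = [
    Variant("BF16", 30, 48, requires_m3_or_later=True),
    Variant("FP16", 30, 48),
    Variant("8-bit quantized", 16, 24),
    Variant("6-bit quantized", 11, 16),
]


def pick_variant(total_ram_gib: int, m3_or_later: bool = False) -> Optional[Variant]:
    """Return the first variant whose stated requirements the device meets.

    Returns None below 16 GiB, where the post says the app falls back to
    partial weight offloading rather than a fully resident model.
    """
    for variant in VARIANTS:
        if variant.requires_m3_or_later and not m3_or_later:
            continue
        if total_ram_gib >= variant.min_total_ram_gib:
            return variant
    return None
```

For example, `pick_variant(24)` yields the 8-bit variant, while `pick_variant(8)` yields `None`, which corresponds to the offloading case described above.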
Prompt: A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197".




The Speed
In general, Qwen Image runs about 10% slower than FLUX.1 in apples-to-apples comparisons. For example, on iPhone 16 Pro, a 2-step generation (with 4-step Lightning LoRA) at 768×768 takes roughly 100 seconds.
Superior Prompt Adherence
Qwen Image excels at prompt adherence. It can separate complex entity descriptions more accurately than HiDream-I1 in our tests, and far better than FLUX.1.
For fairness, cross-model comparisons use Qwen Image 1.0 (6-bit) with 8-Step Lightning LoRA and Text Guidance set to 1.0, comparable to other CFG-distilled models (e.g., HiDream I1 [dev], FLUX.1 Krea [dev]).
Prompt: a smiling indian man with a google t-shirt next to a frowning asian man with a shirt saying nexus at a meeting table facing each other, Ultra HD, 4k, cinematic composition.




FLUX.1 [dev] — fails to separate facial expressions.
FLUX.1 Krea [dev] — separates expressions, but composition doesn’t follow the prompt (“facing each other”).
HiDream I1 [dev] — captures subtler expressions, but less precisely; the Google logo color is also wrong.
Qwen Image — succeeds at both expression and composition.
The Resolutions
Qwen Image keeps its composition well from small resolutions (512×512) up to large ones (2048×2048). On less powerful devices, this means you can generate at a smaller resolution and give up nothing except the resolution itself.
Prompt: 35mm analogue full-body portrait of a beautiful woman wearing black sheer dress, catwalking in a busy market, soft colour grading, infinity cove, shadows, kodak, contax t2
From left to right: 512x512, 768x768, 1024x1024, 1280x1280, 1536x1536, 2048x2048
























FLUX.1 [dev] — loses quality at 1536×1536 and degrades comprehension at 2048×2048.
FLUX.1 Krea [dev] — maintains quality but fails on composition at 2048×2048.
HiDream I1 [dev] — struggles at 512×512.
Qwen Image — minor prompt confusion at low resolutions (“catwalk” misread as “cat ears”), but holds composition and quality even at 2048×2048.
The Hero
Qwen Image 1.0 excels in “wall of text” situations.
Prompt: Ultra HD, 4k, cinematic composition. A photograph of an anthropomorphic polar bear in a navy suit with a red bow-tie, in front of a blackboard, in what appears to be a college class-room. On the blackboard, colorful chalks are used to write "Two households, both alike in dignity,
In fair Verona, where we lay our scene,
From ancient grudge break to new mutiny,
Where civil blood makes civil hands unclean.
From forth the fatal loins of these two foes
A pair of star-cross'd lovers take their life;
Whose misadventured piteous overthrows
Do with their death bury their parents' strife.




Qwen Image 1.0 still produces some spelling errors, but among the four models tested, it comes closest to reproducing the text accurately.
The Bad
While Qwen Image 1.0 demonstrates exceptional prompt adherence, it can be unpredictable aesthetically. It supports a wide range of styles, but with a simple prompt, it sometimes defaults to random stylistic preferences. Adjustments are often needed for precise aesthetics.
Prompt: A baby yoda wearing a halloween pajama, holding a sign says "Qwen Image 1.0 ❤️ Draw Things".
This prompt locked the model into a 3D / illustration style, and switching to another style required targeted prompt tuning.
Adjusted prompt: A amateur photograph of baby yoda wearing a halloween pajama, holding a sign says "Qwen Image 1.0 ❤️ Draw Things". Home video.


Engineering Note
The Qwen Image model is a 60-layer MMDiT model. As we’ve discussed in BF16 and Image Generation Models, deep MMDiT architectures produce progressively larger activations during training. For Qwen Image, activations in the final layers reach magnitudes of around 60 million, far beyond the FP16 maximum of 65,504, so the model needs BF16 to run unmodified. To make the model usable in FP16, activations must be scaled down in more places. We’ll publish a separate write-up detailing our findings.
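The arithmetic behind that note can be checked with a short sketch. It derives the largest finite value of FP16 and BF16 from their bit layouts and estimates the smallest power-of-two factor that would bring a 60-million activation back into FP16 range. The helper names are hypothetical, and a single global power-of-two scale is only an illustrative assumption, not the scheme Draw Things uses.

```python
import math


def max_finite(exponent_bits: int, mantissa_bits: int) -> float:
    """Largest finite value of an IEEE-style binary floating-point format."""
    bias = 2 ** (exponent_bits - 1) - 1
    # The all-ones exponent code is reserved for inf/NaN, so the largest
    # usable unbiased exponent equals the bias.
    return (2 - 2.0 ** -mantissa_bits) * 2.0 ** bias


FP16_MAX = max_finite(exponent_bits=5, mantissa_bits=10)  # 65504.0
BF16_MAX = max_finite(exponent_bits=8, mantissa_bits=7)   # about 3.39e38


def min_power_of_two_downscale(peak: float, limit: float) -> int:
    """Smallest power-of-two divisor that brings `peak` to at most `limit`."""
    if peak <= limit:
        return 1
    return 2 ** math.ceil(math.log2(peak / limit))


PEAK_ACTIVATION = 6e7  # "around 60 million", as stated in the note above

overflows_fp16 = PEAK_ACTIVATION > FP16_MAX   # True: not representable in FP16
fits_bf16 = PEAK_ACTIVATION < BF16_MAX        # True: well within BF16 range
scale = min_power_of_two_downscale(PEAK_ACTIVATION, FP16_MAX)  # 1024
```

Under these assumptions, dividing by 1024 leaves a peak of roughly 58,600, just under the FP16 limit. That leaves very little headroom, which is consistent with the note that scaling is needed in more than one place.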