Hugging Face and Stable Diffusion: Leading a New Era of Generative AI

Post author: darwin
Post published: November 1, 2025

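Hugging Face is a well-known AI and machine learning platform that provides developers and researchers with open-source tools and models. Stable Diffusion is an image generation model developed by Stability AI that creates high-quality images from text descriptions. Its availability on the Hugging Face platform has further broadened the use of generative AI.
Stable Diffusion is a diffusion model: it generates an image by starting from random noise and removing that noise step by step. It supports artistic creation and game design, and is also used in advertising, education, and scientific research. Hugging Face provides an easy-to-use API and interface, so developers can integrate Stable Diffusion models into their own applications without a deep technical background.
The sections below show how to use this model to create fun and interesting images.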
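The step-by-step denoising idea can be illustrated with a toy sketch. It is not Stable Diffusion itself: it works on a short list of numbers rather than image data, the linear noise schedule and the function names (`make_alpha_bars`, `add_noise`, `predict_x0`) are assumptions chosen for illustration, and the true noise stands in for the estimate that a trained neural network would supply.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return the cumulative signal fraction per step for an assumed linear schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, eps, alpha_bar):
    """Forward process: mix the clean signal x0 with Gaussian noise eps."""
    keep, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [keep * x + noise * e for x, e in zip(x0, eps)]


def predict_x0(xt, eps_estimate, alpha_bar):
    """Invert the forward step using a noise estimate (here: the true noise)."""
    keep, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - noise * e) / keep for x, e in zip(xt, eps_estimate)]


random.seed(0)
alpha_bars = make_alpha_bars(1000)
x0 = [0.5, -1.0, 0.25]                       # stand-in for clean image data
eps = [random.gauss(0.0, 1.0) for _ in x0]   # Gaussian noise
xt = add_noise(x0, eps, alpha_bars[500])     # noisy sample at step 500
recovered = predict_x0(xt, eps, alpha_bars[500])
```

With a perfect noise estimate the clean signal is recovered exactly (up to floating-point error). A real model only approximates the noise, which is why generation proceeds over many small denoising steps.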

Environment setup

pip install diffusers transformers accelerate scipy safetensors
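After installing, it can help to confirm that each package is actually present in the active environment. The following is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the package list simply mirrors the `pip install` command above.

```python
from importlib import metadata

# Package names taken from the pip command in this section
REQUIRED = ["diffusers", "transformers", "accelerate", "scipy", "safetensors"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```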

Usage

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

model_id = "stabilityai/stable-diffusion-2-1"

# Swap the default scheduler for DPMSolverMultistepScheduler (DPM-Solver++)
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# Move the pipeline to the GPU (requires a CUDA-capable device)
pipe = pipe.to("cuda")

# Generate one image from the text prompt and save it to disk
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")
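The example above hardcodes `"cuda"` and `torch.float16`, so it fails on a machine without a CUDA GPU. One possible workaround is sketched below, assuming a fallback to the CPU with `float32` is acceptable (CPU inference is likely to be much slower). The helper names `pick_device_and_dtype` and `load_pipeline` are hypothetical.

```python
def pick_device_and_dtype(cuda_available):
    """Return (device, dtype name): half precision on GPU, full precision on CPU."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


def load_pipeline(model_id="stabilityai/stable-diffusion-2-1"):
    """Load the pipeline on whichever device is available."""
    # Imports are deferred so the helper above can be used without torch installed
    import torch
    from diffusers import StableDiffusionPipeline

    device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=getattr(torch, dtype_name)
    )
    return pipe.to(device)
```

Calling `load_pipeline()` downloads the model on first use, so it is kept out of the module's top level here.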

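Output
Weight storage: the model weights downloaded on the first run are saved in the Hugging Face cache folder. Recent versions of diffusers default to ~/.cache/huggingface/hub/, while older versions used ~/.cache/huggingface/diffusers/.
- ls ~/.cache/huggingface/hub/ (older versions of diffusers: ls ~/.cache/huggingface/diffusers/)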
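The same check can be done from Python. This is a small sketch with a hypothetical helper, `list_cached_entries`. The two candidate paths are assumptions about where the cache lives, since the location depends on the installed library versions and on environment settings.

```python
from pathlib import Path


def list_cached_entries(cache_dir):
    """Return the sorted names of entries in cache_dir, or [] if it does not exist."""
    path = Path(cache_dir).expanduser()
    if not path.is_dir():
        return []
    return sorted(entry.name for entry in path.iterdir())


if __name__ == "__main__":
    # Candidate locations (assumed); only one of them may exist on a given machine
    for candidate in ("~/.cache/huggingface/diffusers", "~/.cache/huggingface/hub"):
        print(candidate, list_cached_entries(candidate))
```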
Parameters
- prompt (str or List[str], optional) — The prompt or prompts to guide image generation. If not defined, you need to pass prompt_embeds.
- height (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The height in pixels of the generated image.
- width (int, optional, defaults to self.unet.config.sample_size * self.vae_scale_factor) — The width in pixels of the generated image.
- num_inference_steps (int, optional, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
- timesteps (List[int], optional) — Custom timesteps to use for the denoising process with schedulers which support a timesteps argument in their set_timesteps method. If not defined, the default behavior when num_inference_steps is passed will be used. Must be in descending order.
- sigmas (List[float], optional) — Custom sigmas to use for the denoising process with schedulers which support a sigmas argument in their set_timesteps method. If not defined, the default behavior when num_inference_steps is passed will be used.
- guidance_scale (float, optional, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality. Guidance scale is enabled when guidance_scale > 1.
- negative_prompt (str or List[str], optional) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass negative_prompt_embeds instead. Ignored when not using guidance (guidance_scale < 1).
- num_images_per_prompt (int, optional, defaults to 1) — The number of images to generate per prompt.
- eta (float, optional, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to the DDIMScheduler, and is ignored in other schedulers.
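To show how several of these parameters fit together, here is a hedged sketch that collects them into a dictionary before calling the pipeline. The helper `build_generation_kwargs` is hypothetical and not part of diffusers. The defaults mirror the list above, and the divisible-by-8 check reflects a constraint the Stable Diffusion pipeline places on `height` and `width`.

```python
def build_generation_kwargs(prompt, negative_prompt=None, num_inference_steps=50,
                            guidance_scale=7.5, height=None, width=None,
                            num_images_per_prompt=1):
    """Collect pipeline arguments, leaving unset sizes to the pipeline defaults."""
    for name, value in (("height", height), ("width", width)):
        # The pipeline rejects image sizes that are not multiples of 8
        if value is not None and value % 8 != 0:
            raise ValueError(f"{name} must be divisible by 8, got {value}")
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")

    kwargs = {
        "prompt": prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "num_images_per_prompt": num_images_per_prompt,
    }
    # Only pass optional values that were explicitly provided
    if negative_prompt is not None:
        kwargs["negative_prompt"] = negative_prompt
    if height is not None:
        kwargs["height"] = height
    if width is not None:
        kwargs["width"] = width
    return kwargs


kwargs = build_generation_kwargs(
    "a photo of an astronaut riding a horse on mars",
    negative_prompt="blurry, low quality",
    num_inference_steps=25,
)
# With a loaded pipeline named `pipe` (see the usage section):
# images = pipe(**kwargs).images
```

Fewer steps (25 here) trade some quality for speed, which is a common choice with the DPM-Solver++ scheduler used earlier. The right value is best found by experiment.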
Text to image arena
- The figure below shows the current global ranking. The stable-diffusion-2-1 model introduced above has an Elo score of roughly 749.
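To give a rough sense of what an Elo score implies, the sketch below applies the standard Elo expected-score formula. Whether the arena computes its ratings exactly this way is an assumption, and the opponent rating of 1149 is a hypothetical value chosen only for illustration.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo formula (ties count as half)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Two models with equal ratings are expected to split their matchups evenly
even = elo_expected_score(749, 749)

# Against a hypothetical model rated 400 points higher, the expected score is 1/11
underdog = elo_expected_score(749, 1149)
```

Under this formula, a 400-point gap corresponds to the lower-rated model being preferred in about 9% of head-to-head comparisons.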
References