With the system flashed and everything configured, it's time to deploy ollama.

1. Install ollama

On the ollama website, click Download, choose Linux, and you'll be shown an install command:

curl -fsSL https://ollama.com/install.sh | sh

Wait for the script to finish, then check the version.
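
The CLI provides a version flag for this:

ollama --version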

Start the service:

OLLAMA_HOST=192.168.x.x:11434 ollama serve

The port 11434 can be changed as well.
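
For example, to serve on port 8000 instead (an arbitrary choice; any free port works), just change the value in OLLAMA_HOST:

OLLAMA_HOST=192.168.x.x:8000 ollama serve

Every client command below must then use the same host:port.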

2. Download qwen2.5 0.5B

OLLAMA_HOST=192.168.x.x:11434 ollama pull qwen2.5:0.5b

More available models are listed on the ollama site: https://ollama.com/search

For example:

OLLAMA_HOST=192.168.x.x:11434 ollama pull qwen2.5:0.5b-instruct-q8_0

Or the 1.5b, 3b variants, and so on. The 3b model feels a bit sluggish, though (slow as it is, it's still much, much faster than deploying directly via modelscope).
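
To confirm what has been downloaded, the list subcommand works with the same OLLAMA_HOST prefix:

OLLAMA_HOST=192.168.x.x:11434 ollama list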

3. Test

The following script chats with the model through ollama's native /api/chat streaming endpoint:

# -*- coding: utf-8 -*-
import json
import requests

# NOTE: ollama must be running for this to work, start the ollama app or run `ollama serve`
model = "qwen2.5:0.5b"  # TODO: update this for whatever model you wish to use

def chat(messages):
    r = requests.post(
        "http://192.168.x.x:11434/api/chat",
        json={"model": model, "messages": messages,
              "options": {
                  "seed": 101,
                  "top_p": 0.76,
                  "temperature": 0
              },
              "stream": True},
        stream=True
    )
    r.raise_for_status()
    output = ""

    for line in r.iter_lines():
        body = json.loads(line)
        if "error" in body:
            raise Exception(body["error"])
        if body.get("done") is False:
            message = body.get("message", {})
            content = message.get("content", "")
            output += content
            # the response streams one token at a time, print that as we receive it
            print(content, end="", flush=True)

        if body.get("done", False):
            # the final chunk carries no new content; return the assembled reply
            return {"role": "assistant", "content": output}

def main():
    messages1 = [{
        "role": "system",
        "content": "你是人工智能助手纳西妲,请简短的回答我的问题,如果是代码问题,需要详细写出。"
    }]

    while True:
        user_input = input("Input: ")
        if not user_input:
            break  # empty input ends the session
        print()
        messages1.append({"role": "user", "content": user_input})

        # send the conversation and stream back the assistant's reply
        message1 = chat(messages1)
        messages1.append(message1)

        # the reply was already streamed above; uncomment to reprint it
        #print("\nOutput:\n", message1['content'])
        print("\n\n")

if __name__ == "__main__":
    main()
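
Before running the script, a quick sanity check from the shell confirms the endpoint is reachable (this is the same /api/chat route the script uses; the prompt text is arbitrary):

curl http://192.168.x.x:11434/api/chat -d '{"model": "qwen2.5:0.5b", "messages": [{"role": "user", "content": "hello"}]}'

A healthy server streams back one JSON object per line, the last of which has "done": true.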

4. OpenAI-compatible API

ollama also exposes an OpenAI-compatible endpoint under /v1, so the official openai Python client can talk to it directly:

from openai import OpenAI

# model = "qwen2.5:0.5b-instruct-q8_0"
# model = "qwen2.5:0.5b"
model = "qwen2.5:1.5b-instruct-q8_0"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key="qwen",  # a key is required by the client but ignored by ollama; any string works
    base_url="http://192.168.1.7:11434/v1",
)
print("client:",client)

# non-streaming response
def qwen_25_api(messages: list):
    """Create a reply for the given conversation messages.

    Args:
        messages (list): the full conversation history
    """
    # completion = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    completion = client.chat.completions.create(
        model=model,
        messages=messages)
    print(completion.choices[0].message.content)

def qwen_25_api_stream(messages: list):
    """Create a reply for the given conversation messages (streamed).

    Args:
        messages (list): the full conversation history
    """
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="")

if __name__ == '__main__':
    # note: the question goes in with role 'system' rather than 'user'
    messages = [{'role': 'system', 'content': 'Who are you?'}]
    # non-streaming call
    # qwen_25_api(messages)
    # streaming call
    qwen_25_api_stream(messages)

Why did it go off the rails? hhhh
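
A likely culprit: the snippet sends the question with role 'system' rather than 'user', and small instruct models tend to handle that badly. A minimal tweak, reusing qwen_25_api_stream from above, usually yields a coherent reply:

messages = [{'role': 'user', 'content': 'Who are you?'}]
qwen_25_api_stream(messages)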

5. Chat web UI (untested)

There are several web front ends for ollama, but I haven't tried any of them.
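
One widely used option is Open WebUI. Its README suggests a Docker one-liner along these lines (untested here; the flags and image tag may have changed, so treat this as a sketch):

docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

Once it is up, the UI should be reachable on port 3000 and can be pointed at the ollama endpoint configured above.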

6. System status

While a request is being handled, the CPU is essentially at full load.
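
To see this for yourself, run a process monitor while a request is in flight (top is available on essentially any Linux image):

top

Expect the ollama process near the top of the list, pinned close to full CPU while tokens are being generated.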
