
Orange Pi OPI4A: deploying ollama + qwen2.5 (the 0.5B version)
I deployed qwen2.5 on an Orange Pi 4A with ollama and talked to the local model in two different ways.
With the system image flashed and everything set up, it's time to deploy ollama.
1. Install ollama
On the ollama website, hit Download and pick Linux; it shows you a one-line install command:
curl -fsSL https://ollama.com/install.sh | sh
Wait for it to finish, then check the version:
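For example, with the CLI's standard version flag:
ollama -v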
Start the service:
OLLAMA_HOST=192.168.x.x:11434 ollama serve
Setting OLLAMA_HOST to the board's LAN address makes the API reachable from other machines on the network (by default it only listens on 127.0.0.1). The port 11434 can be changed too.
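To confirm the server is reachable, hit the API from any machine on the LAN; /api/tags is ollama's endpoint for listing locally installed models (same address and port as above):
curl http://192.168.x.x:11434/api/tags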
2. Pull qwen2.5 0.5B
OLLAMA_HOST=192.168.x.x:11434 ollama pull qwen2.5:0.5b
More models can be browsed on the ollama site: https://ollama.com/search
For example:
OLLAMA_HOST=192.168.x.x:11434 ollama pull qwen2.5:0.5b-instruct-q8_0
Or the 1.5b, 3b variants, and so on. The 3b model feels a bit sluggish on this board (even so, it is still far faster than deploying it directly through modelscope). Once a model is pulled, you can verify it as shown below.
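ollama's list subcommand shows what is installed locally, with tags and sizes:
OLLAMA_HOST=192.168.x.x:11434 ollama list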
3. Test
Method one: talk to the model over ollama's native /api/chat endpoint with a small Python script:
# -*- coding: utf-8 -*-
import json
import requests

# NOTE: ollama must be running for this to work; start the ollama app or run `ollama serve`
model = "qwen2.5:0.5b"  # TODO: update this for whatever model you wish to use


def chat(messages):
    r = requests.post(
        "http://192.168.x.x:11434/api/chat",
        json={"model": model, "messages": messages,
              "options": {
                  "seed": 101,
                  "top_p": 0.76,
                  "temperature": 0
              },
              "stream": True},
        stream=True
    )
    r.raise_for_status()
    output = ""
    message = {"role": "assistant", "content": ""}  # fallback in case the stream ends immediately
    for line in r.iter_lines():
        body = json.loads(line)
        if "error" in body:
            raise Exception(body["error"])
        if body.get("done") is False:
            message = body.get("message", {})
            content = message.get("content", "")
            output += content
            # the response streams one token at a time; print each as we receive it
            print(content, end="", flush=True)
        if body.get("done", False):
            message["content"] = output
            return message


def main():
    messages1 = [{
        "role": "system",
        "content": "你是人工智能助手纳西妲,请简短的回答我的问题,如果是代码问题,需要详细写出。"
    }]
    while True:
        user_input = input("Input: ")
        if not user_input:
            exit()
        print()
        messages1.append({"role": "user", "content": user_input})
        # send the whole conversation and collect the streamed reply
        message1 = chat(messages1)
        messages1.append(message1)
        # the reply has already been streamed to the console above
        # print("\nOutput:\n", message1['content'])
        print("\n\n")


if __name__ == "__main__":
    main()
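As a side note, the same endpoint also works non-streaming: setting "stream" to False makes ollama return one complete JSON object instead of one line per token. A minimal sketch, assuming the same address and model as above (the prompt here is just a placeholder):

import requests

resp = requests.post(
    "http://192.168.x.x:11434/api/chat",
    json={
        "model": "qwen2.5:0.5b",
        "messages": [{"role": "user", "content": "你好"}],
        "stream": False,  # one complete JSON response instead of an NDJSON stream
    },
)
resp.raise_for_status()
# the assistant's reply sits under the "message" key of the response body
print(resp.json()["message"]["content"])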
4. OpenAI-compatible interface
Method two: ollama also exposes an OpenAI-compatible endpoint under /v1, so the official openai Python client works against it:
from openai import OpenAI

# model = "qwen2.5:0.5b-instruct-q8_0"
# model = "qwen2.5:0.5b"
model = "qwen2.5:1.5b-instruct-q8_0"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key="qwen",  # required by the client but ignored by ollama; any value works
    base_url="http://192.168.1.7:11434/v1"
)
print("client:", client)


# non-streaming response
def qwen_25_api(messages: list):
    """Create a reply for the provided conversation messages.

    Args:
        messages (list): the full conversation history
    """
    # completion = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    completion = client.chat.completions.create(
        model=model,
        messages=messages)
    print(completion.choices[0].message.content)


def qwen_25_api_stream(messages: list):
    """Create a reply for the provided conversation messages (streaming).

    Args:
        messages (list): the full conversation history
    """
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="")


if __name__ == '__main__':
    messages = [{'role': 'system', 'content': '你是谁'}, ]
    # non-streaming call
    # qwen_25_api(messages)
    # streaming call
    qwen_25_api_stream(messages)
Why did the reply come out so unhinged? Haha. (Most likely because the question '你是谁' was sent as the system message, so the model had no actual user turn to answer; see the corrected call below.)
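A likely fix, assuming the intent was simply to ask the model who it is: put the question in a user message, and keep the system role for an actual instruction (the system prompt text here is just an illustrative placeholder):

messages = [
    {'role': 'system', 'content': '你是人工智能助手,请简短回答。'},  # real instruction in the system slot
    {'role': 'user', 'content': '你是谁'},                            # the question goes in a user turn
]
qwen_25_api_stream(messages)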
5. Chat web UI (not tested)
There are several web front ends for ollama (Open WebUI is a popular one), but I haven't tried any.
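For reference only, untested here and assuming Docker is already installed on the board: Open WebUI's documented quick-start is a single container pointed at the ollama address used above.

docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.x.x:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main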
6. System status
While a request is being generated, the CPU sits at essentially full load.
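An easy way to see this for yourself is to keep a monitor running in a second SSH session while you chat (htop may need to be installed first):
htop    # or plain `top`; the cores sit near 100% during generation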