Runhouse

Runhouse allows remote compute and data across environments and users. See the Runhouse docs.

This example goes over how to use LangChain and Runhouse to interact with models hosted on your own GPU, or on on-demand GPUs on AWS, GCP, Azure, or Lambda.

Note: the code below uses SelfHosted rather than Runhouse in the class names.

!pip install runhouse
 
from langchain.llms import SelfHostedPipeline, SelfHostedHuggingFaceLLM
from langchain import PromptTemplate, LLMChain
import runhouse as rh
 
INFO | 2023-04-17 16:47:36,173 | No auth token provided, so not using RNS API to save and load configs
 
# For an on-demand A100 with GCP, Azure, or Lambda
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)
 
# For an on-demand A10G with AWS (no single A100s on AWS)
# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')
 
# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'], 
# ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
# name='rh-a10x')
 
template = """Question: {question}
 
Answer: Let's think step by step."""
 
prompt = PromptTemplate(template=template, input_variables=["question"])
 
llm = SelfHostedHuggingFaceLLM(model_id="gpt2", hardware=gpu, model_reqs=["pip:./", "transformers", "torch"])
 
llm_chain = LLMChain(prompt=prompt, llm=llm)
 
question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"
 
llm_chain.run(question)
 
INFO | 2023-02-17 05:42:23,537 | Running _generate_text via gRPC
INFO | 2023-02-17 05:42:24,016 | Time to send message: 0.48 seconds
 
"  Let's say we're talking sports teams who won the Super Bowl in the year Justin Beiber"
 

You can also load more custom models through the SelfHostedHuggingFaceLLM interface:

llm = SelfHostedHuggingFaceLLM(
    model_id="google/flan-t5-small",
    task="text2text-generation",
    hardware=gpu,
)
 
llm("What is the capital of Germany?")
 
INFO | 2023-02-17 05:54:21,681 | Running _generate_text via gRPC
INFO | 2023-02-17 05:54:21,937 | Time to send message: 0.25 seconds
 
'berlin'
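As a quick usage sketch (not part of the original notebook), the flan-t5 model loaded above can be dropped into the same LLMChain and prompt defined earlier:

llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain.run("What is the capital of Germany?")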
 

Using a custom load function, we can load a custom pipeline directly on the remote hardware:

def load_pipeline():
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline  # Need to be inside the fn in notebooks
    model_id = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    pipe = pipeline(
        "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10
    )
    return pipe
 
def inference_fn(pipeline, prompt, stop = None):
    return pipeline(prompt)[0]["generated_text"][len(prompt):]
 
llm = SelfHostedHuggingFaceLLM(model_load_fn=load_pipeline, hardware=gpu, inference_fn=inference_fn)
 
llm("Who is the current US president?")
 
INFO | 2023-02-17 05:42:59,219 | Running _generate_text via gRPC
INFO | 2023-02-17 05:42:59,522 | Time to send message: 0.3 seconds
 
'john w. bush'
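Note that the inference_fn above ignores its stop argument. A minimal sketch (an assumption, not from the original example) of a variant that truncates the completion at the first stop sequence, assuming stop is a list of strings:

def inference_fn_with_stop(pipeline, prompt, stop=None):
    # Strip the prompt from the generated text, as in the inference_fn above
    text = pipeline(prompt)[0]["generated_text"][len(prompt):]
    # Truncate at the earliest occurrence of any stop string, if provided
    if stop:
        for s in stop:
            idx = text.find(s)
            if idx != -1:
                text = text[:idx]
    return text

llm = SelfHostedHuggingFaceLLM(model_load_fn=load_pipeline, hardware=gpu, inference_fn=inference_fn_with_stop)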
 

You can send your pipeline in directly over the wire to your model, but this will only work for small models (<2 GB), and will be pretty slow:

pipeline = load_pipeline()
llm = SelfHostedPipeline.from_pipeline(
    pipeline=pipeline, hardware=gpu, model_reqs=["pip:./", "transformers", "torch"]
)
 

Instead, we can also send the pipeline to the hardware's filesystem, which will be much faster:

import pickle

rh.blob(pickle.dumps(pipeline), path="models/pipeline.pkl").save().to(gpu, path="models")
 
llm = SelfHostedPipeline.from_pipeline(pipeline="models/pipeline.pkl", hardware=gpu)
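As a final usage sketch (assuming the gpu cluster and the prompt defined earlier in this example), the pipeline loaded from the cluster's filesystem behaves like any other LLM:

llm_chain = LLMChain(prompt=prompt, llm=llm)
llm_chain.run("Who is the current US president?")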