Preface
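These are my study notes for the video course "LangChain for LLM Application Development" by Harrison Chase (creator of LangChain) and Andrew Ng. The original course is delivered entirely in English video and is slow to access from mainland China, so I have reorganized it and replaced parts of the content to make it easier to follow locally. Reading these notes is a quick way to work through the course material.
Course introduction
The course introduces LangChain, a powerful and easily extensible open-source framework for building large language model (LLM) applications. It simplifies LLM application development through prompts, memory, chains, agents, and related components. Because LangChain is still evolving quickly, some of its APIs are unstable and some of the course code is outdated. I use v0.2, the latest version at the time of writing, and all code here runs under v0.2. In addition, OpenAI, which the course uses, is hard to access from mainland China, so I replaced it with the Kimi model and a self-hosted open-source Ollama deployment. This does not affect the learning content.
See this article for how to obtain a Kimi API token. See this article for how to deploy your own model with Ollama.
The course is divided into five parts:
Part 1, Part 2, Part 3, Part 4, Part 5. Course link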
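Part 4
Evaluation
Building a question-answering application
When building a complex LLM application, an important but difficult task is judging how well the application performs. The same question comes up when we switch to a different LLM: which model is better? And when we change the vector database or its parameters, did the results get better or worse? Next, we look at how to evaluate whether the results of an LLM application are correct.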
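The comparison described above can be framed as scoring each configuration against the same set of test cases. Below is a minimal sketch of that idea, not LangChain code: `accuracy`, `app_a`, `app_b`, and `exact_match` are hypothetical names, and the two "applications" are plain dictionary lookups standing in for real chains.

```python
from typing import Callable, Dict, List

# Hypothetical test set: each case has a query and the expected answer.
examples: List[Dict[str, str]] = [
    {"query": "2 + 2", "answer": "4"},
    {"query": "capital of France", "answer": "Paris"},
    {"query": "3 * 3", "answer": "9"},
]


def accuracy(
    answer_fn: Callable[[str], str],
    cases: List[Dict[str, str]],
    is_correct: Callable[[str, str], bool],
) -> float:
    """Return the fraction of cases whose predicted answer is judged correct."""
    hits = sum(is_correct(c["answer"], answer_fn(c["query"])) for c in cases)
    return hits / len(cases)


def app_a(query: str) -> str:
    # Stand-in for one configuration (for example, one model or vector store).
    known = {"2 + 2": "4", "capital of France": "Paris"}
    return known.get(query, "I don't know")


def app_b(query: str) -> str:
    # Stand-in for a second configuration that knows one more answer.
    known = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}
    return known.get(query, "I don't know")


def exact_match(expected: str, predicted: str) -> bool:
    # Simplest possible judge; an LLM-based judge could be swapped in here.
    return expected == predicted


score_a = accuracy(app_a, examples, exact_match)
score_b = accuracy(app_b, examples, exact_match)
print(f"app_a: {score_a:.2f}, app_b: {score_b:.2f}")
```

Keeping the judge as a separate function means the same harness works whether answers are compared by string equality or graded by another model.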
First, we create the question-answering chain we used earlier.
from langchain.chains import RetrievalQA
from langchain_ollama import ChatOllama
from langchain_community.document_loaders import CSVLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain.indexes import VectorstoreIndexCreator
from langchain.evaluation.qa import QAGenerateChain

# Ollama service address
base_url = "http://localhost:11434"
# Model name
llm_model = "qwen2"
# Test file
file_path = "product.csv"
# Create the model
llm = ChatOllama(base_url=base_url, model=llm_model)
# Load the test data
loader = CSVLoader(file_path=file_path)
data = loader.load()
# Create the embeddings
embeddings = OllamaEmbeddings(base_url=base_url, model=llm_model)
# Create the vector index
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])
# Create the question-answering chain
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=index.vectorstore.as_retriever(),
    verbose=True,
    # The separator value is missing from the source text; an empty string
    # is assumed here because it matches the debug output shown later.
    chain_type_kwargs={"document_separator": ""},
)
Adding test data
We can add some test data by picking a few rows from product.csv. For example, rows 11 and 12 look like this:
11,高清投影仪,高亮度高对比度支持高清视频播放适合家庭影院和商务演示。
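12,智能手环,监测心率、计步、睡眠智能提醒是健康生活的好伴侣。
Because the data was generated by an LLM, your rows may differ. Next we write questions and supply their answers as a list of dictionaries, where each dictionary contains a query and an answer.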
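examples = [
    # "Does the HD projector support HD video playback?" -> "Yes"
    {"query": "高清投影仪支持高清视频播放吗", "answer": "是"},
    # "Which product can monitor heart rate?" -> "Smart band"
    {"query": "哪一款产品能监测心率", "answer": "智能手环"},
]
We have created two test cases here, but that is not enough, and writing them by hand is time-consuming. Is there a more automated way? We can have the LLM generate them. In LangChain, QAGenerateChain asks the LLM to produce a test question and answer for each document.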
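# Create the test-set generation chain
example_gen_chain = QAGenerateChain.from_llm(llm)
# Generate and parse the results; each item calls the LLM, so we only take the first 5 documents
new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)
print(new_examples[0])
The first generated test case looks roughly like this. We can inspect every generated case to check that it is correct and appropriate.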
{'qa_pairs': {'query': 'What features does the high-definition smart television have?', 'answer': 'The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience.'}}
We can also turn on debug mode to see how it works.
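import langchain
langchain.debug = True
Put the two lines above at the top of the script and run it again. The output below is long. Looking at the main part at the start, we can see that QAGenerateChain starts a sub-run for each document and builds a prompt asking the LLM to act as a teacher and write a question and answer based on the document. The reply follows a specific format, which LangChain can then parse into a dictionary.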
[chain/start] [chain:QAGenerateChain] Entering Chain run with input:
[inputs]
[llm/start] [chain:QAGenerateChain llm:ChatOllama] Entering LLM run with input:
{prompts: [Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\nBegin Document\n...\nEnd Document\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\nBegin Document\npage_contentno: 1\nname: 高清智能电视\ndescription: 这款高清智能电视拥有4K超高清分辨率内置智能系统支持语音控制提供丰富的娱乐体验。 metadata{source: product.csv, row: 0}\nEnd Document]
}
[llm/start] [chain:QAGenerateChain llm:ChatOllama] Entering LLM run with input:
{prompts: [Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\nBegin Document\n...\nEnd Document\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\nBegin Document\npage_contentno: 2\nname: 多功能料理机\ndescription: 集搅拌、打蛋、榨汁等多种功能于一身操作简便是厨房里的得力助手。 metadata{source: product.csv, row: 1}\nEnd Document]
}
[llm/start] [chain:QAGenerateChain llm:ChatOllama] Entering LLM run with input:
{prompts: [Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\nBegin Document\n...\nEnd Document\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\nBegin Document\npage_contentno: 3\nname: 无线蓝牙耳机\ndescription: 轻巧舒适音质清晰支持长时间续航适合运动和日常使用。 metadata{source: product.csv, row: 2}\nEnd Document]
}
[llm/start] [chain:QAGenerateChain llm:ChatOllama] Entering LLM run with input:
{prompts: [Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\nBegin Document\n...\nEnd Document\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\nBegin Document\npage_contentno: 4\nname: 智能扫地机器人\ndescription: 自动规划清扫路线智能避障解放双手保持家中清洁。 metadata{source: product.csv, row: 3}\nEnd Document]
}
[llm/start] [chain:QAGenerateChain llm:ChatOllama] Entering LLM run with input:
{prompts: [Human: You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\nBegin Document\n...\nEnd Document\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\nBegin Document\npage_contentno: 5\nname: 便携式榨汁机\ndescription: 小巧便携操作简便快速榨汁适合健康生活需求。 metadata{source: product.csv, row: 4}\nEnd Document]
}
[llm/end] [chain:QAGenerateChain llm:ChatOllama] [75.50s] Exiting LLM run with output:
{generations: [[{text: QUESTION: What features does the high-definition smart television have?\nANSWER: The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience.,generation_info: {model: qwen2,created_at: 2024-09-12T02:27:28.132404919Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 15075322258,load_duration: 4068642657,prompt_eval_count: 146,prompt_eval_duration: 3419985000,eval_count: 48,eval_duration: 7545190000},type: ChatGeneration,message: {lc: 1,type: constructor,id: [langchain,schema,messages,AIMessage],kwargs: {content: QUESTION: What features does the high-definition smart television have?\nANSWER: The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience.,response_metadata: {model: qwen2,created_at: 2024-09-12T02:27:28.132404919Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 15075322258,load_duration: 4068642657,prompt_eval_count: 146,prompt_eval_duration: 3419985000,eval_count: 48,eval_duration: 7545190000},type: ai,id: run-e2282df6-a2bb-4b75-bd94-c6ee8338b339-0,usage_metadata: {input_tokens: 146,output_tokens: 48,total_tokens: 194},tool_calls: [],invalid_tool_calls: []}}}]],llm_output: null,run: null
}
[llm/end] [chain:QAGenerateChain llm:ChatOllama] [75.51s] Exiting LLM run with output:
{generations: [[{text: QUESTION: What is the multifunctional kitchen appliance mentioned in the document capable of doing?\nANSWER: The multifunctional kitchen appliance, known as 多功能料理机, can perform various tasks such as blending, whisking eggs, and juicing. Its designed for ease of operation and serves as a helpful tool in the kitchen.,generation_info: {model: qwen2,created_at: 2024-09-12T02:27:40.655928024Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 12512086251,load_duration: 62599702,prompt_eval_count: 145,prompt_eval_duration: 1594234000,eval_count: 69,eval_duration: 10853358000},type: ChatGeneration,message: {lc: 1,type: constructor,id: [langchain,schema,messages,AIMessage],kwargs: {content: QUESTION: What is the multifunctional kitchen appliance mentioned in the document capable of doing?\nANSWER: The multifunctional kitchen appliance, known as 多功能料理机, can perform various tasks such as blending, whisking eggs, and juicing. Its designed for ease of operation and serves as a helpful tool in the kitchen.,response_metadata: {model: qwen2,created_at: 2024-09-12T02:27:40.655928024Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 12512086251,load_duration: 62599702,prompt_eval_count: 145,prompt_eval_duration: 1594234000,eval_count: 69,eval_duration: 10853358000},type: ai,id: run-db59bd5a-e8c5-4ce4-be93-477b1f7beeeb-0,usage_metadata: {input_tokens: 145,output_tokens: 69,total_tokens: 214},tool_calls: [],invalid_tool_calls: []}}}]],llm_output: null,run: null
}
[llm/end] [chain:QAGenerateChain llm:ChatOllama] [75.51s] Exiting LLM run with output:
{generations: [[{text: QUESTION: What are the features of the product with the name \无线蓝牙耳机\ (Wireless Bluetooth Earphones)?\n\nANSWER: The product named \无线蓝牙耳机\ offers several features including:\n1. **Lightweight and Comfortable**: The earphones are designed to be lightweight, ensuring comfort during use.\n2. **Crystal Clear Sound Quality**: It provides clear sound quality for an enjoyable listening experience.\n3. **Long Battery Life**: The headphones support a long duration of battery usage, making them suitable for both sports activities and everyday use.\n4. **Versatile Use**: They can be used while exercising or in daily routines without any issues due to their versatile design and functionality.\n\nThese features highlight the products suitability for users who value convenience, comfort, and audio quality in their listening devices.,generation_info: {model: qwen2,created_at: 2024-09-12T02:28:06.427487738Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 25761127075,load_duration: 63109381,prompt_eval_count: 139,prompt_eval_duration: 1397453000,eval_count: 162,eval_duration: 24259968000},type: ChatGeneration,message: {lc: 1,type: constructor,id: [langchain,schema,messages,AIMessage],kwargs: {content: QUESTION: What are the features of the product with the name \无线蓝牙耳机\ (Wireless Bluetooth Earphones)?\n\nANSWER: The product named \无线蓝牙耳机\ offers several features including:\n1. **Lightweight and Comfortable**: The earphones are designed to be lightweight, ensuring comfort during use.\n2. **Crystal Clear Sound Quality**: It provides clear sound quality for an enjoyable listening experience.\n3. **Long Battery Life**: The headphones support a long duration of battery usage, making them suitable for both sports activities and everyday use.\n4. 
**Versatile Use**: They can be used while exercising or in daily routines without any issues due to their versatile design and functionality.\n\nThese features highlight the products suitability for users who value convenience, comfort, and audio quality in their listening devices.,response_metadata: {model: qwen2,created_at: 2024-09-12T02:28:06.427487738Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 25761127075,load_duration: 63109381,prompt_eval_count: 139,prompt_eval_duration: 1397453000,eval_count: 162,eval_duration: 24259968000},type: ai,id: run-3dc06185-da4e-4b56-b615-dcf831157fb2-0,usage_metadata: {input_tokens: 139,output_tokens: 162,total_tokens: 301},tool_calls: [],invalid_tool_calls: []}}}]],llm_output: null,run: null
}
[llm/end] [chain:QAGenerateChain llm:ChatOllama] [75.51s] Exiting LLM run with output:
{generations: [[{text: QUESTION: What is the product being described and what are its main features?\n\nANSWER: The product being described is a \智能扫地机器人\ (intelligent floor sweeping robot). Its main features include automatic planning of cleaning routes, intelligent obstacle avoidance, freeing up hands, and maintaining home cleanliness.,generation_info: {model: qwen2,created_at: 2024-09-12T02:28:17.028896159Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 10589442660,load_duration: 26054599,prompt_eval_count: 139,prompt_eval_duration: 1401741000,eval_count: 61,eval_duration: 9159878000},type: ChatGeneration,message: {lc: 1,type: constructor,id: [langchain,schema,messages,AIMessage],kwargs: {content: QUESTION: What is the product being described and what are its main features?\n\nANSWER: The product being described is a \智能扫地机器人\ (intelligent floor sweeping robot). Its main features include automatic planning of cleaning routes, intelligent obstacle avoidance, freeing up hands, and maintaining home cleanliness.,response_metadata: {model: qwen2,created_at: 2024-09-12T02:28:17.028896159Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 10589442660,load_duration: 26054599,prompt_eval_count: 139,prompt_eval_duration: 1401741000,eval_count: 61,eval_duration: 9159878000},type: ai,id: run-a489b5fe-7798-41f0-8380-e9bde0e8a889-0,usage_metadata: {input_tokens: 139,output_tokens: 61,total_tokens: 200},tool_calls: [],invalid_tool_calls: []}}}]],llm_output: null,run: null
}
[llm/end] [chain:QAGenerateChain llm:ChatOllama] [75.51s] Exiting LLM run with output:
{generations: [[{text: QUESTION: What is the product described in this document?\nANSWER: The product described in this document is a portable juicer named \便携式榨汁机\ (portable juice extractor). Its characterized as compact, easy to operate, and capable of quickly extracting juice, making it suitable for needs related to health living.,generation_info: {model: qwen2,created_at: 2024-09-12T02:28:28.529352164Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 11484086566,load_duration: 62195060,prompt_eval_count: 140,prompt_eval_duration: 1362653000,eval_count: 68,eval_duration: 10018610000},type: ChatGeneration,message: {lc: 1,type: constructor,id: [langchain,schema,messages,AIMessage],kwargs: {content: QUESTION: What is the product described in this document?\nANSWER: The product described in this document is a portable juicer named \便携式榨汁机\ (portable juice extractor). Its characterized as compact, easy to operate, and capable of quickly extracting juice, making it suitable for needs related to health living.,response_metadata: {model: qwen2,created_at: 2024-09-12T02:28:28.529352164Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 11484086566,load_duration: 62195060,prompt_eval_count: 140,prompt_eval_duration: 1362653000,eval_count: 68,eval_duration: 10018610000},type: ai,id: run-5709894f-ab18-4a1e-9e7b-0b8acd1eeb6a-0,usage_metadata: {input_tokens: 140,output_tokens: 68,total_tokens: 208},tool_calls: [],invalid_tool_calls: []}}}]],llm_output: null,run: null
}
[chain/end] [chain:QAGenerateChain] [75.51s] Exiting Chain run with output:
{outputs: [{qa_pairs: {query: What features does the high-definition smart television have?,answer: The high-definition smart television has a 4K ultra-high definition resolution, an integrated intelligent system, and supports voice control. It provides a rich entertainment experience.}},{qa_pairs: {query: What is the multifunctional kitchen appliance mentioned in the document capable of doing?,answer: The multifunctional kitchen appliance, known as 多功能料理机, can perform various tasks such as blending, whisking eggs, and juicing. Its designed for ease of operation and serves as a helpful tool in the kitchen.}},{qa_pairs: {query: What are the features of the product with the name \无线蓝牙耳机\ (Wireless Bluetooth Earphones)?,answer: The product named \无线蓝牙耳机\ offers several features including:}},{qa_pairs: {query: What is the product being described and what are its main features?,answer: The product being described is a \智能扫地机器人\ (intelligent floor sweeping robot). Its main features include automatic planning of cleaning routes, intelligent obstacle avoidance, freeing up hands, and maintaining home cleanliness.}},{qa_pairs: {query: What is the product described in this document?,answer: The product described in this document is a portable juicer named \便携式榨汁机\ (portable juice extractor). Its characterized as compact, easy to operate, and capable of quickly extracting juice, making it suitable for needs related to health living.}}]
}
Next, we merge the manually created test data with the automatically generated data.
all_examples = examples + [ex["qa_pairs"] for ex in new_examples]
Manual evaluation
We let the LLM answer the questions in our test set, starting with the first manually added question.
response = qa.run(examples[0]["query"])
print(response)
In debug mode, the output looks similar to the following.
[chain/start] [chain:RetrievalQA] Entering Chain run with input:
{query: 高清投影仪支持高清视频播放吗
}
[chain/start] [chain:RetrievalQA chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQA chain:StuffDocumentsChain chain:LLMChain] Entering Chain run with input:
{question: 高清投影仪支持高清视频播放吗,context: no: 11\nname: 高清投影仪\ndescription: 高亮度高对比度支持高清视频播放适合家庭影院和商务演示。no: 22\nname: 智能跑步机\ndescription: 多种运动模式智能记录运动数据适合家庭健身。no: 12\nname: 智能手环\ndescription: 监测心率、计步、睡眠智能提醒是健康生活的好伴侣。no: 4\nname: 智能扫地机器人\ndescription: 自动规划清扫路线智能避障解放双手保持家中清洁。
}
[llm/start] [chain:RetrievalQA chain:StuffDocumentsChain chain:LLMChain llm:ChatOllama] Entering LLM run with input:
{prompts: [System: Use the following pieces of context to answer the users question. \nIf you dont know the answer, just say that you dont know, dont try to make up an answer.\n----------------\nno: 11\nname: 高清投影仪\ndescription: 高亮度高对比度支持高清视频播放适合家庭影院和商务演示。no: 22\nname: 智能跑步机\ndescription: 多种运动模式智能记录运动数据适合家庭健身。no: 12\nname: 智能手环\ndescription: 监测心率、计步、睡眠智能提醒是健康生活的好伴侣。no: 4\nname: 智能扫地机器人\ndescription: 自动规划清扫路线智能避障解放双手保持家中清洁。\nHuman: 高清投影仪支持高清视频播放吗]
}
[llm/end] [chain:RetrievalQA chain:StuffDocumentsChain chain:LLMChain llm:ChatOllama] [6.70s] Exiting LLM run with output:
{generations: [[{text: 是的高清投影仪支持高清视频播放。,generation_info: {model: qwen2,created_at: 2024-09-12T02:45:31.841247748Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 6682410396,load_duration: 25734266,prompt_eval_count: 211,prompt_eval_duration: 5067573000,eval_count: 12,eval_duration: 1532113000},type: ChatGeneration,message: {lc: 1,type: constructor,id: [langchain,schema,messages,AIMessage],kwargs: {content: 是的高清投影仪支持高清视频播放。,response_metadata: {model: qwen2,created_at: 2024-09-12T02:45:31.841247748Z,message: {role: assistant,content: },done_reason: stop,done: true,total_duration: 6682410396,load_duration: 25734266,prompt_eval_count: 211,prompt_eval_duration: 5067573000,eval_count: 12,eval_duration: 1532113000},type: ai,id: run-6ef3e8d8-425e-4f61-9c1d-9925a2277e8f-0,usage_metadata: {input_tokens: 211,output_tokens: 12,total_tokens: 223},tool_calls: [],invalid_tool_calls: []}}}]],llm_output: null,run: null
}
[chain/end] [chain:RetrievalQA chain:StuffDocumentsChain chain:LLMChain] [6.70s] Exiting Chain run with output:
{text: 是的高清投影仪支持高清视频播放。
}
[chain/end] [chain:RetrievalQA chain:StuffDocumentsChain] [6.71s] Exiting Chain run with output:
{output_text: 是的高清投影仪支持高清视频播放。
}
[chain/end] [chain:RetrievalQA] [7.29s] Exiting Chain run with output:
{result: 是的高清投影仪支持高清视频播放。
}
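是的高清投影仪支持高清视频播放。
We can see that the stuff chain is used here: it builds a prompt and submits our data to the LLM along with the question. The LLM answers "是的高清投影仪支持高清视频播放。" ("Yes, the HD projector supports HD video playback."). This is not word-for-word identical to the expected answer "是" ("Yes"), but the meaning is the same.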
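To make the "stuff" step concrete, here is a minimal sketch of how retrieved documents could be joined and placed into a single prompt. It is an illustration only: `build_stuff_prompt` is a hypothetical helper, and the template is a shortened imitation of the prompt visible in the debug output, not LangChain's actual implementation.

```python
from typing import List


def build_stuff_prompt(documents: List[str], question: str, separator: str = "") -> str:
    """Join all retrieved documents and 'stuff' them into one prompt."""
    context = separator.join(documents)
    return (
        "System: Use the following pieces of context to answer the user's question.\n"
        "----------------\n"
        f"{context}\n"
        f"Human: {question}"
    )


# Two documents copied from the debug output above.
documents = [
    "no: 11\nname: 高清投影仪\ndescription: 高亮度高对比度支持高清视频播放适合家庭影院和商务演示。",
    "no: 12\nname: 智能手环\ndescription: 监测心率、计步、睡眠智能提醒是健康生活的好伴侣。",
]
question = "高清投影仪支持高清视频播放吗"

prompt = build_stuff_prompt(documents, question)
print(prompt)
```

With an empty separator the documents run together, as they do in the log. Passing something like `separator="\n\n"` would make the document boundaries easier for both people and models to see.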
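Letting the LLM evaluate itself
What if we want to test all of the data? Do we have to compare the answers one by one? We can have the LLM help with this too. LangChain provides QAEvalChain to grade the results automatically.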
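Conceptually, LLM-based grading comes down to two steps: show a grader model the question, the reference answer, and the predicted answer, then read a verdict out of its reply. The sketch below illustrates only that idea. The prompt wording and the helpers `build_grading_prompt` and `parse_grade` are hypothetical, and this is not the prompt QAEvalChain actually uses.

```python
def build_grading_prompt(query: str, real_answer: str, predicted_answer: str) -> str:
    """Assemble a grading request for a judge model (hypothetical wording)."""
    return (
        "You are grading a quiz answer.\n"
        f"QUESTION: {query}\n"
        f"REFERENCE ANSWER: {real_answer}\n"
        f"STUDENT ANSWER: {predicted_answer}\n"
        "Reply with exactly one word: CORRECT or INCORRECT."
    )


def parse_grade(reply: str) -> str:
    """Extract the verdict from the judge's reply."""
    text = reply.strip().upper()
    # Check INCORRECT first, because "INCORRECT" contains "CORRECT".
    if "INCORRECT" in text:
        return "INCORRECT"
    if "CORRECT" in text:
        return "CORRECT"
    return "UNKNOWN"


grading_prompt = build_grading_prompt(
    "哪一款产品能监测心率", "智能手环", "智能手环能监测心率。"
)
print(grading_prompt)
print(parse_grade("GRADE: CORRECT"))
```

The order of the two checks in `parse_grade` matters: testing for "CORRECT" first would misread every "INCORRECT" reply.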
We can first turn debug mode off with langchain.debug = False to avoid excessive output.
from langchain.evaluation.qa import QAEvalChain

# Get the predicted answers for all test cases
predictions = qa.apply(all_examples)
# We can reuse the previous LLM or use a different model as the grader
llm = ChatOllama(base_url=base_url, model=llm_model)
# Create the evaluation chain
eval_chain = QAEvalChain.from_llm(llm)
# Get the grading results
graded_outputs = eval_chain.evaluate(all_examples, predictions)
# Print each result
for i, eg in enumerate(all_examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]["query"])
    print("Real Answer: " + predictions[i]["answer"])
    print("Predicted Answer: " + predictions[i]["result"])
    print("Predicted Grade: " + graded_outputs[i]["results"])
    print()
The output looks similar to the following.
Example 0:
Question: 高清投影仪支持高清视频播放吗
Real Answer: 是
Predicted Answer: 是的高清投影仪支持高清视频播放。
Predicted Grade: CORRECT

Example 1:
Question: 哪一款产品能监测心率
Real Answer: 智能手环
Predicted Answer: 智能手环能监测心率。
Predicted Grade: CORRECT

Example 2:
Question: What features does the high-definition smart TV have according to the document?
Real Answer: The high-definition smart TV mentioned in the document has several notable features. It boasts a 4K ultra-high definition resolution, indicating an exceptionally clear picture quality. Additionally, it is equipped with an internal smart system which allows for various interactive functionalities. One of these capabilities includes voice control, suggesting users can operate or navigate through its features using their voice commands. Lastly, the TV offers a rich entertainment experience, implying that it may include access to streaming services, internet connectivity, and other multimedia content options to ensure users enjoy a varied range of programming.
Predicted Answer: I'm sorry, but I don't know the answer because the provided context doesn't mention a high-definition smart TV. The context includes information about a high-definition projector, an automatic coffee machine, and an intelligent treadmill.
Predicted Grade: INCORRECT

Example 3:
Question: What is the product described in this document?
Real Answer: The product described in this document is a multifunctional kitchen appliance which combines various functions such as mixing, beating eggs and juicing. It's noted for its ease of use, making it a helpful tool in the kitchen.
Predicted Answer: The document describes several different products:
1. 高清投影仪 - A high-definition projector with high brightness and contrast, suitable for home cinema and business presentations.
2. 无线蓝牙耳机 - Wireless Bluetooth headphones that are lightweight, comfortable to wear, have clear sound quality, and offer long battery life, suitable for sports and daily use.
3. 全自动咖啡机 - An automated coffee machine that allows one-button operation and offers multiple coffee flavor choices, providing a professional coffee experience.
4. 智能跑步机 - A smart treadmill with various exercise modes and the ability to record workout data automatically, suitable for home fitness routines.
Each product has been characterized by its unique features and application scenarios as detailed in their descriptions.
Predicted Grade: INCORRECT

Example 4:
Question: What are the features of the product described in the document?
Real Answer: The product, named wireless bluetooth headphones, is characterized by being lightweight and comfortable to wear. It offers clear sound quality and supports long-lasting battery life, making it suitable for both sports activities and everyday use.
Predicted Answer: The product described is an 高清投影仪 (High Definition Projector), which features high brightness, high contrast ratio, and support for high-definition video playback. It's suitable for both 家庭影院 (home cinema) and 商务演示 (business presentations).
Another product mentioned is an 全自动咖啡机 (Fully Automatic Coffee Machine). This machine allows for one-touch operation with a variety of coffee taste choices, providing a professional coffee experience.
A third item highlighted is the 智能跑步机 (Smart Treadmill), which offers various exercise modes and can intelligently record workout data. It's ideal for 家庭健身 (home fitness).
Lastly, there's an 智能扫地机器人 (Smart Vacuum Cleaning Robot) that autonomously plans its cleaning routes, has intelligent obstacle avoidance, frees up hands, and helps keep the home clean.
Predicted Grade: INCORRECT

Example 5:
Question: What is the description of the product 智能扫地机器人?
Real Answer: The description of the product 智能扫地机器人 is that it automatically plans cleaning routes, has intelligent obstacle avoidance, frees up your hands, and keeps the house clean.
Predicted Answer: The description of the product 智能扫地机器人 is: 自动规划清扫路线智能避障解放双手保持家中清洁。
Predicted Grade: CORRECT

Example 6:
Question: What is the description of the product 便携式榨汁机?
Real Answer: The 便携式榨汁机 is described as being small, portable, easy to operate, fast at juicing and suitable for health living needs.
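Predicted Answer: I don't know the answer to that question because there is no specific context provided for a 便携式榨汁机 (portable juicer).
Predicted Grade: CORRECT
From the output above we can see that there should be 7 test cases, and each one prints four lines: Question, Real Answer, Predicted Answer, and Predicted Grade. Real Answer is the reference answer from the test set (written by hand for the first two cases, generated by QAGenerateChain for the rest). Predicted Answer is the answer returned by the RetrievalQA chain, and Predicted Grade is QAEvalChain's verdict on whether the two agree. Some of the cases passed, but not all of them. The grader is not infallible either: Example 6 was graded CORRECT even though the predicted answer was "I don't know", so the grades are worth spot-checking.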
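Once grades are available, a quick tally shows the overall pass rate and which cases need a closer look. The sketch below assumes `graded_outputs` is a list of dictionaries with a "results" key, as in the loop above. The values are copied by hand from the seven grades printed in this run rather than produced by a live chain.

```python
from collections import Counter

# Grades copied from the sample output above (Examples 0 to 6).
graded_outputs = [
    {"results": "CORRECT"},
    {"results": "CORRECT"},
    {"results": "INCORRECT"},
    {"results": "INCORRECT"},
    {"results": "INCORRECT"},
    {"results": "CORRECT"},
    {"results": "CORRECT"},
]

# Count each verdict; strip() guards against stray whitespace in model replies.
counts = Counter(g["results"].strip() for g in graded_outputs)
pass_rate = counts["CORRECT"] / len(graded_outputs)

# Indices of the cases that did not pass, for manual review.
failed = [i for i, g in enumerate(graded_outputs) if g["results"].strip() != "CORRECT"]

print(f"passed {counts['CORRECT']} of {len(graded_outputs)} ({pass_rate:.0%})")
print("needs review:", failed)
```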
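Because the two answers come from two independent chain calls, they do not influence each other. Our questions are often open-ended with no single fixed answer, so we also need an LLM to help judge whether the two answers agree.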
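A small sketch shows why plain string comparison is not enough for open-ended answers. The first pair is taken from the manual test above. The second predicted answer is a made-up counterexample, labeled hypothetical, to show that a simple substring check can be fooled as well.

```python
# Pair from the manual evaluation above.
expected = "是"
predicted = "是的高清投影仪支持高清视频播放。"

exact = expected == predicted      # strict equality rejects a correct answer
contains = expected in predicted   # substring check happens to accept it

# Hypothetical wrong answer that still contains the expected text
# ("It is not the smart band, it is the smart treadmill.").
expected_2 = "智能手环"
wrong_prediction = "不是智能手环,而是智能跑步机。"
misleading = expected_2 in wrong_prediction  # substring check accepts a wrong answer

print("exact match:", exact)
print("substring match:", contains)
print("substring match on a wrong answer:", misleading)
```

Neither heuristic captures meaning, which is the gap an LLM grader is meant to fill.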
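Here we learned how to use an LLM to build an automated test pipeline that generates test data and grades the answers automatically. This makes it easy to produce test data in bulk and evaluate results quickly.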
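The generation step depends on the model replying in the QUESTION: / ANSWER: format shown in the debug output. Below is a minimal sketch of parsing such a reply with a regular expression. The pattern and `parse_qa` are assumptions for illustration, not necessarily what LangChain uses internally. The sketch also shows one plausible reason a multi-line answer can end up cut short after its first line, as happened to the earphones answer in the parsed output earlier.

```python
import re
from typing import Dict

# "." does not match newlines by default, so the answer group stops at the
# end of the first answer line.
LINE_PATTERN = re.compile(r"QUESTION: (.*?)\n+ANSWER: (.*)")
# With DOTALL, "." also matches newlines, so the whole answer is captured.
FULL_PATTERN = re.compile(r"QUESTION: (.*?)\n+ANSWER: (.*)", re.DOTALL)


def parse_qa(text: str, pattern: "re.Pattern[str]" = LINE_PATTERN) -> Dict[str, str]:
    """Turn a 'QUESTION: ... ANSWER: ...' reply into a query/answer dict."""
    match = pattern.search(text)
    if match is None:
        raise ValueError("reply does not follow the QUESTION/ANSWER format")
    return {"query": match.group(1), "answer": match.group(2)}


single_line = (
    "QUESTION: What is the product described in this document?\n"
    "ANSWER: The product is a portable juicer."
)
multi_line = (
    "QUESTION: What are the features?\n\n"
    "ANSWER: The product offers several features including:\n"
    "1. Lightweight\n"
    "2. Clear sound"
)

parsed_single = parse_qa(single_line)
parsed_truncated = parse_qa(multi_line)
parsed_full = parse_qa(multi_line, FULL_PATTERN)

print(parsed_single)
print(parsed_truncated)
print(parsed_full)
```

This is one more reason to review generated test cases before relying on them: a truncated reference answer makes fair grading harder.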
To be continued
Next: Part 5