1. Prerequisites

Before you begin, make sure that:

  • redisvl is installed and the corresponding Python environment is activated.
  • A Redis instance is running; Redis Stack is recommended.
  • An OpenAI API key is configured (or substitute another LLM service).

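For the first two items, a typical setup looks like the following (assuming pip and Docker are available; the package list and image tag are illustrative, adjust to your environment):

```shell
# Install redisvl plus the other Python packages used in this tutorial
pip install redisvl openai sentence-transformers

# Start a local Redis Stack instance on the default port
docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest
```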
The following example initializes the OpenAI client:

import os
import getpass
from openai import OpenAI

os.environ["TOKENIZERS_PARALLELISM"] = "False"
api_key = os.getenv("OPENAI_API_KEY") or getpass.getpass("Enter your OpenAI API key: ")
client = OpenAI(api_key=api_key)

def ask_openai(question: str) -> str:
    response = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=question,
        max_tokens=200
    )
    return response.choices[0].text.strip()

# Test
print(ask_openai("What is the capital of France?"))  # Output: The capital of France is Paris.

2. Initializing SemanticCache

When initialized, SemanticCache automatically creates an index in Redis for storing semantic cache entries. Initialization code:

from redisvl.extensions.cache.llm import SemanticCache
from redisvl.utils.vectorize import HFTextVectorizer

llmcache = SemanticCache(
    name="llmcache",                                          # index name
    redis_url="redis://localhost:6379",                       # Redis connection URL
    distance_threshold=0.1,                                   # maximum semantic distance for a cache hit
    vectorizer=HFTextVectorizer("redis/langcache-embed-v1"),  # embedding model
)

Inspect the index:

rvl index info -i llmcache

Output:

Index Information:
╭───────────────┬───────────────┬───────────────┬───────────────┬───────────────╮
│ Index Name    │ Storage Type  │ Prefixes      │ Index Options │ Indexing      │
├───────────────┼───────────────┼───────────────┼───────────────┼───────────────┤
│ llmcache      │ HASH          │ ['llmcache']  │ []            │ 0             │
╰───────────────┴───────────────┴───────────────┴───────────────┴───────────────╯
Index Fields:
╭─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────┬─────────────────╮
│ Name            │ Attribute       │ Type            │ Field Option    │ Option Value    │ Field Option    │ Option Value    │ Field Option    │ Option Value    │ Field Option    │ Option Value    │
├─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┼─────────────────┬─────────────────┤
│ prompt          │ prompt          │ TEXT            │ WEIGHT          │ 1               │                 │                 │                 │                 │                 │                 │                 │
│ response        │ response        │ TEXT            │ WEIGHT          │ 1               │                 │                 │                 │                 │                 │                 │                 │
│ inserted_at     │ inserted_at     │ NUMERIC         │                 │                 │                 │                 │                 │                 │                 │                 │                 │
│ updated_at      │ updated_at      │ NUMERIC         │                 │                 │                 │                 │                 │                 │                 │                 │                 │
│ prompt_vector   │ prompt_vector   │ VECTOR          │ algorithm       │ FLAT            │ data_type       │ FLOAT32         │ dim             │ 768             │ distance_metric │ COSINE          │                 │
╰─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────┴─────────────────╯

3. Basic Cache Usage

The following shows how to store and retrieve responses with SemanticCache:

question = "What is the capital of France?"

# Check the cache (initially empty)
if response := llmcache.check(prompt=question):
    print(response)
else:
    print("Empty cache")  # Output: Empty cache

# Store the question, answer, and metadata
llmcache.store(
    prompt=question,
    response="Paris",
    metadata={"city": "Paris", "country": "france"}
)

# Check the cache again
if response := llmcache.check(prompt=question, return_fields=["prompt", "response", "metadata"]):
    print(response)

Output:

[{'prompt': 'What is the capital of France?', 'response': 'Paris', 'metadata': {'city': 'Paris', 'country': 'france'}, 'key': 'llmcache:115049a298532be2f181edb03f766770c0db84c22aff39003fec340deaec7545'}]

Query with a semantically similar prompt:

question = "What actually is the capital of France?"
print(llmcache.check(prompt=question)[0]['response'])  # Output: Paris

4. Adjusting the Distance Threshold

The semantic distance threshold can be adjusted at runtime to suit different embedding models or business requirements:

llmcache.set_threshold(0.5)  # relax the threshold

question = "What is the capital city of the country in Europe that also has a city named Nice?"
print(llmcache.check(prompt=question)[0]['response'])  # Output: Paris
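distance_threshold is compared against the cosine distance (1 − cosine similarity) between prompt embeddings, so a smaller value demands closer semantic matches. A toy 2-D illustration of why relaxing the threshold turns a miss into a hit:

```python
import math

def cosine_distance(u, v):
    # cosine distance = 1 - cosine similarity
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return 1.0 - dot / (norm_u * norm_v)

a = [1.0, 0.0]  # embedding of the cached prompt (toy 2-D example)
b = [0.8, 0.6]  # embedding of the incoming prompt
d = cosine_distance(a, b)
print(round(d, 2))  # 0.2: a miss at threshold 0.1, a hit at threshold 0.5
```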

Clear the cache:

llmcache.clear()
print(llmcache.check(prompt=question))  # Output: []

5. Using a TTL Policy

Redis supports TTL (time-to-live) policies, which let cache entries expire automatically. The following example sets a 5-second TTL:

import time

llmcache.set_ttl(5)  # set a 5-second TTL
llmcache.store("This is a TTL test", "This is a TTL test response")
time.sleep(6)
result = llmcache.check("This is a TTL test")
print(result)  # 输出:[]

# Reset TTL to None (entries persist indefinitely)
llmcache.set_ttl()
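Under the hood this relies on Redis key expiry. Conceptually, a TTL cache behaves like the following minimal in-memory sketch (illustrative only, not redisvl's implementation):

```python
import time

class TTLCache:
    """Illustrative sketch of TTL-based expiry, not redisvl's implementation."""
    def __init__(self, ttl=None):
        self.ttl = ttl    # None means entries never expire
        self._store = {}  # key -> (value, expiry timestamp or None)

    def set(self, key, value):
        expiry = None if self.ttl is None else time.monotonic() + self.ttl
        self._store[key] = (value, expiry)

    def get(self, key):
        value, expiry = self._store.get(key, (None, None))
        if expiry is not None and time.monotonic() > expiry:
            del self._store[key]  # lazily evict the expired entry
            return None
        return value

cache = TTLCache(ttl=0.1)
cache.set("q", "a")
time.sleep(0.2)
print(cache.get("q"))  # None: the entry has expired
```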

6. Performance Testing

Comparing response times with and without the cache demonstrates SemanticCache's performance benefit:

import time
import numpy as np

def answer_question(question: str) -> str:
    results = llmcache.check(prompt=question)
    if results:
        return results[0]["response"]
    else:
        answer = ask_openai(question)
        return answer

# Measure response time without the cache
start = time.time()
question = "What was the name of the first US President?"
answer = answer_question(question)
end = time.time()
print(f"Without caching, a call to OpenAI took {end-start} seconds.")

# Store the answer in the cache
llmcache.store(prompt=question, response="George Washington")

# Measure average response time with the cache
times = []
for _ in range(10):
    cached_start = time.time()
    cached_answer = answer_question(question)
    cached_end = time.time()
    times.append(cached_end-cached_start)

avg_time_with_cache = np.mean(times)
print(f"Avg time with cache: {avg_time_with_cache}")
print(f"Time saved: {round(((end - start) - avg_time_with_cache) / (end - start) * 100, 2)}%")

Output:

Without caching, a call to OpenAI took 0.8826751708984375 seconds.
Avg time with cache: 0.0463670015335083
Time saved: 94.75%
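The benchmark above populates the cache manually with llmcache.store; in production code the cache is usually populated on a miss instead (read-through caching). The pattern, sketched with a plain dict standing in for SemanticCache and a stub standing in for the LLM call:

```python
def make_cached(llm_call):
    cache = {}  # exact-match dict stand-in for SemanticCache, for illustration only
    def answer(question):
        if question in cache:
            return cache[question]  # hit: skip the LLM call
        result = llm_call(question)
        cache[question] = result    # write back on a miss
        return result
    return answer

calls = []
ask = make_cached(lambda q: calls.append(q) or "George Washington")
print(ask("Who was the first US President?"))  # miss: invokes the LLM
print(ask("Who was the first US President?"))  # hit: served from the cache
print(len(calls))                              # 1: the LLM was called only once
```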

Check index statistics:

rvl stats -i llmcache

7. Cache Access Control and Tag Filtering

In multi-user or complex-workflow scenarios, custom filterable_fields enable data isolation and precise queries:

private_cache = SemanticCache(
    name="private_cache",
    filterable_fields=[{"name": "user_id", "type": "tag"}]
)

# Store data for different users
private_cache.store(
    prompt="What is the phone number linked to my account?",
    response="The number on file is 123-555-0000",
    filters={"user_id": "abc"}
)
private_cache.store(
    prompt="What's the phone number linked in my account?",
    response="The number on file is 123-555-1111",
    filters={"user_id": "def"}
)

# Query with a tag filter
from redisvl.query.filter import Tag
user_id_filter = Tag("user_id") == "abc"
response = private_cache.check(
    prompt="What is the phone number linked to my account?",
    filter_expression=user_id_filter,
    num_results=2
)
print(f"found {len(response)} entry \n{response[0]['response']}")

Output:

found 1 entry 
The number on file is 123-555-0000

Clean up:

private_cache.delete()

8. Complex Filter Examples

Multiple filterable fields and compound filter expressions are supported:

complex_cache = SemanticCache(
    name="account_data",
    filterable_fields=[
        {"name": "user_id", "type": "tag"},
        {"name": "account_type", "type": "tag"},
        {"name": "account_balance", "type": "numeric"},
        {"name": "transaction_amount", "type": "numeric"}
    ]
)

# Store several records
complex_cache.store(
    prompt="what is my most recent checking account transaction under $100?",
    response="Your most recent transaction was for $75",
    filters={"user_id": "abc", "account_type": "checking", "transaction_amount": 75}
)
complex_cache.store(
    prompt="what is my most recent savings account transaction?",
    response="Your most recent deposit was for $300",
    filters={"user_id": "abc", "account_type": "savings", "transaction_amount": 300}
)
complex_cache.store(
    prompt="what is my most recent checking account transaction over $200?",
    response="Your most recent transaction was for $350",
    filters={"user_id": "abc", "account_type": "checking", "transaction_amount": 350}
)
complex_cache.store(
    prompt="what is my checking account balance?",
    response="Your current checking account is $1850",
    filters={"user_id": "abc", "account_type": "checking"}
)

# Query with a compound filter
from redisvl.query.filter import Num, Tag
value_filter = Num("transaction_amount") > 100
account_filter = Tag("account_type") == "checking"
complex_filter = value_filter & account_filter
complex_cache.set_threshold(0.3)
response = complex_cache.check(
    prompt="what is my most recent checking account transaction?",
    filter_expression=complex_filter,
    num_results=5
)
print(f'found {len(response)} entry')
print(response[0]["response"])

Output:

found 1 entry
Your most recent transaction was for $350
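The `&` operator composes filter predicates conjunctively: a record must satisfy both conditions to match. The selection above is equivalent to this plain-Python filtering over the stored records (illustrative only):

```python
records = [
    {"account_type": "checking", "transaction_amount": 75,
     "response": "Your most recent transaction was for $75"},
    {"account_type": "savings", "transaction_amount": 300,
     "response": "Your most recent deposit was for $300"},
    {"account_type": "checking", "transaction_amount": 350,
     "response": "Your most recent transaction was for $350"},
]

# Num("transaction_amount") > 100  &  Tag("account_type") == "checking"
matches = [r for r in records
           if r["transaction_amount"] > 100 and r["account_type"] == "checking"]
print(len(matches), matches[0]["response"])
```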

Clean up:

complex_cache.delete()

9. Summary

RedisVL's SemanticCache offers an efficient way to cache LLM responses, using vector search for semantic matching to significantly reduce request latency and cost. With support for dynamic threshold adjustment, TTL policies, tag filtering, and compound queries, it fits multi-user and complex-workflow scenarios. Whether for simple Q&A caching or complex applications with access control, RedisVL provides a flexible, high-performance solution. For more details, see the official RedisVL documentation.
