OpenAI Assistants with citations like 【4:2†source】 and citeturnXfileY

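When streaming a response from an OpenAI Assistant with code like this:

```python
import openai

# Wrapper function (name is illustrative): `yield` needs a generator function
def stream_answer(payload, thread_id, assistant_id):
    # Add the user's question to the thread
    openai.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=payload.question,
    )
    # Start a streaming run that is forced to use the file_search tool
    run = openai.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
        stream=True,
        tool_choice={"type": "file_search"},
    )
    streamed_text = ""
    for event in run:
        if event.event == "thread.message.delta":
            delta_content = event.data.delta.content
            if delta_content and delta_content[0].type == "text":
                text_fragment = delta_content[0].text.value
                streamed_text += text_fragment
                yield {"data": text_fragment}
        if event.event == "thread.run.completed":
            break
```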
The citations show up in the text as raw markers such as 【4:2†source】 or citeturnXfileY. How can I fix this?
Answer
The approach I used was to fetch the final message once streaming had finished:

```python
messages = openai.beta.threads.messages.list(thread_id=thread_id)
```
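The answer does not show how the message text (`raw_text`) is pulled out of `messages`. A minimal sketch, assuming `messages` is the page returned by `messages.list`, that its default `order="desc"` puts the newest message first, and that the text lives in the message's `text` content parts; `latest_assistant_text` is a hypothetical helper name:

```python
def latest_assistant_text(messages) -> str:
    """Return the text of the newest assistant message in a messages page.

    Assumes `messages` comes from openai.beta.threads.messages.list(...),
    whose default order="desc" lists the newest message first.
    """
    for message in messages.data:
        if message.role != "assistant":
            continue  # skip the user's own messages
        # Join all text parts; non-text parts (e.g. images) are skipped
        return "".join(
            part.text.value for part in message.content if part.type == "text"
        )
    return ""  # no assistant message found
```

With that helper, `raw_text = latest_assistant_text(messages)` gives the string that the regex below operates on.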
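Then I applied the following regex to the message text (`raw_text`):

```python
import re

def clean_citations(raw_text):
    # Matches both marker styles, e.g. 【4:2†source】 and citeturn0file1
    pattern = r"(citeturn\d+file\d+|【\d+:\d+†source】)"
    citation_index = 0

    def replace_placeholder(match):
        nonlocal citation_index  # update the counter in the enclosing function
        citation_index += 1
        return f"[{citation_index}]"

    assistant_reply_cleaned = re.sub(pattern, replace_placeholder, raw_text)
    return assistant_reply_cleaned
```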
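This replaces each placeholder (such as 【4:2†source】 or citeturnXfileY) with a sequential reference: [1], [2], and so on. Every match gets the next number, so a source cited twice receives two different numbers.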
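As an alternative to matching the markers with a regex, the Assistants API attaches `annotations` to the final message text; for file_search results these are `file_citation` annotations whose `text` field holds the exact marker as it appears in the message (for example 【4:2†source】). The sketch below numbers citations from those annotations. It assumes the shape of `message.content[0].text` returned by `messages.list` (a `.value` string plus an `.annotations` list), and `replace_annotations` is a hypothetical helper name. Whether the citeturnXfileY form is also reported as an annotation is not confirmed here, so the regex may still be needed for that variant.

```python
def replace_annotations(text_content):
    """Replace each annotation marker with [n]; return (text, cited file ids).

    `text_content` is assumed to look like message.content[0].text: it has
    a `.value` string and `.annotations`, each with `.text` (the marker as
    it appears in `.value`) and, for file citations, `.file_citation.file_id`.
    """
    value = text_content.value
    file_ids = []
    for index, annotation in enumerate(text_content.annotations, start=1):
        # Replace only the first remaining occurrence of this exact marker
        value = value.replace(annotation.text, f"[{index}]", 1)
        file_citation = getattr(annotation, "file_citation", None)
        file_ids.append(file_citation.file_id if file_citation else None)
    return value, file_ids
```

The returned file IDs can be turned into a reference list, for example with `openai.files.retrieve(file_id).filename`.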