
Although RAG can be overwhelming, searching across PDF files shouldn't be complicated. One of the most common approaches today is to parse your PDFs, define a chunking strategy, upload those chunks to a storage provider, run embeddings on those chunks of text, and store the embeddings in a vector database. And that's only the setup: retrieving content in our LLM workflow also requires multiple steps.

This is where file search, a hosted tool you can use in the Responses API, comes in. It allows you to search your knowledge base and generate an answer based on the retrieved content. In this cookbook, we'll first create a small set of questions based on PDFs extracted from OpenAI's blog (openai.com/news). We'll then upload those PDFs to a vector store on OpenAI and use file search to fetch additional context from this vector store to answer the questions we generated in the first step.

File search was previously available on the Assistants API. It's now available in the new Responses API, an API that can be stateful or stateless, and it comes with new features like metadata filtering.
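
As a preview, here is a minimal sketch of what a filtered file search call could look like once the client and a vector store exist (both are set up below). The attribute key and value are made-up placeholders, and the exact filter schema may differ depending on your SDK version.

response = client.responses.create(
    model="gpt-4o-mini",
    input="What's Deep Research?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_id],  # placeholder: created later in this cookbook
        # hypothetical attribute filter: only search files tagged with category="research"
        "filters": {"type": "eq", "key": "category", "value": "research"},
    }]
)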

!pip install PyPDF2 pandas tqdm openai -q
from openai import OpenAI
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm
import concurrent
import PyPDF2
import os
import pandas as pd
import base64

client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
dir_pdfs = 'openai_blog_pdfs' # have those PDFs stored locally here
pdf_files = [os.path.join(dir_pdfs, f) for f in os.listdir(dir_pdfs)]

We will create a Vector Store on the OpenAI API and upload our PDFs to that Vector Store. OpenAI will read those PDFs, split the content into multiple chunks of text, run embeddings on those chunks, and store both the embeddings and the text in the Vector Store. This will enable us to query the Vector Store and get back relevant content for a given query.

def upload_single_pdf(file_path: str, vector_store_id: str):
    file_name = os.path.basename(file_path)
    try:
        file_response = client.files.create(file=open(file_path, 'rb'), purpose="assistants")
        attach_response = client.vector_stores.files.create(
            vector_store_id=vector_store_id,
            file_id=file_response.id
        )
        return {"file": file_name, "status": "success"}
    except Exception as e:
        print(f"Error with {file_name}: {str(e)}")
        return {"file": file_name, "status": "failed", "error": str(e)}

def upload_pdf_files_to_vector_store(vector_store_id: str):
    pdf_files = [os.path.join(dir_pdfs, f) for f in os.listdir(dir_pdfs)]
    stats = {"total_files": len(pdf_files), "successful_uploads": 0, "failed_uploads": 0, "errors": []}

    print(f"{len(pdf_files)} PDF files to process. Uploading in parallel...")

    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        futures = {executor.submit(upload_single_pdf, file_path, vector_store_id): file_path for file_path in pdf_files}
        for future in tqdm(concurrent.futures.as_completed(futures), total=len(pdf_files)):
            result = future.result()
            if result["status"] == "success":
                stats["successful_uploads"] += 1
            else:
                stats["failed_uploads"] += 1
                stats["errors"].append(result)

    return stats
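
By default, OpenAI decides how each file is chunked. If you want control over chunk size and overlap, the attach call in upload_single_pdf can optionally pass a chunking_strategy. A minimal sketch, where the token values are purely illustrative:

attach_response = client.vector_stores.files.create(
    vector_store_id=vector_store_id,
    file_id=file_response.id,
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 800,   # maximum tokens per chunk
            "chunk_overlap_tokens": 400     # tokens shared between consecutive chunks
        }
    }
)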

def create_vector_store(store_name: str) -> dict:
    try:
        vector_store = client.vector_stores.create(name=store_name)
        details = {
            "id": vector_store.id,
            "name": vector_store.name,
            "created_at": vector_store.created_at,
            "file_count": vector_store.file_counts.completed
        }
        print("Vector store created:", details)
        return details
    except Exception as e:
        print(f"Error creating vector store: {e}")
        return {}
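
We can now create the vector store and upload our PDFs to it:

store_name = "openai_blog_store"
vector_store_details = create_vector_store(store_name)
upload_pdf_files_to_vector_store(vector_store_details["id"])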
Vector store created: {'id': 'vs_67d06b9b9a9c8191bafd456cf2364ce3', 'name': 'openai_blog_store', 'created_at': 1741712283, 'file_count': 0}
21 PDF files to process. Uploading in parallel...

100%|███████████████████████████████| 21/21 [00:09<00:00,  2.32it/s]

{'total_files': 21,
 'successful_uploads': 21,
 'failed_uploads': 0,
 'errors': []}

Now that our vector store is ready, we can query it directly and retrieve relevant content for a specific query. Using the new vector search API, we can find relevant items from our knowledge base without necessarily integrating the search into an LLM query.

query = "What's Deep Research?"
search_results = client.vector_stores.search(
    vector_store_id=vector_store_details['id'],
    query=query
)
for result in search_results.data:
    print(str(len(result.content[0].text)) + ' characters of content from ' + result.filename + ' with a relevance score of ' + str(result.score))
3502 characters of content from Introducing deep research _ OpenAI.pdf with a relevance score of 0.9813588865322393
3493 characters of content from Introducing deep research _ OpenAI.pdf with a relevance score of 0.9522476825143714
3634 characters of content from Introducing deep research _ OpenAI.pdf with a relevance score of 0.9397930296526796
2774 characters of content from Introducing deep research _ OpenAI.pdf with a relevance score of 0.9101975747303771
3474 characters of content from Deep research System Card _ OpenAI.pdf with a relevance score of 0.9036647613464299
3123 characters of content from Introducing deep research _ OpenAI.pdf with a relevance score of 0.887120981288272
3343 characters of content from Introducing deep research _ OpenAI.pdf with a relevance score of 0.8448454849432881
3262 characters of content from Introducing deep research _ OpenAI.pdf with a relevance score of 0.791345286655509
3271 characters of content from Introducing deep research _ OpenAI.pdf with a relevance score of 0.7485530025091963
2721 characters of content from Introducing deep research _ OpenAI.pdf with a relevance score of 0.734033360849088

We can see that chunks of different sizes (and, under the hood, different texts) have been returned by the search query. They each have a different relevance score, calculated by our ranker, which uses hybrid search.
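
If you want to trim low-relevance chunks at the search layer itself, the search endpoint also accepts a result cap and ranking options. A minimal sketch, assuming the max_num_results and ranking_options parameters are available in your SDK version (treat the exact names and shape as assumptions and check the API reference):

search_results = client.vector_stores.search(
    vector_store_id=vector_store_details['id'],
    query=query,
    max_num_results=5,                          # assumed parameter: cap the number of returned chunks
    ranking_options={"score_threshold": 0.8}    # assumed parameter: drop chunks below this relevance score
)
for result in search_results.data:
    print(result.filename, result.score)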

However, instead of querying the vector store ourselves and then passing the retrieved data into a Responses or Chat Completions API call, an even more convenient way to use these search results in an LLM query is to plug the file_search tool into the OpenAI Responses API.

query = "What's Deep Research?"
response = client.responses.create(
    input=query,
    model="gpt-4o-mini",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_details['id']],
    }]
)

# Extract annotations from the response
annotations = response.output[1].content[0].annotations

# Get top-k retrieved filenames
retrieved_files = set([result.filename for result in annotations])

print(f'Files used: {retrieved_files}')
print('Response:')
print(response.output[1].content[0].text)  # output[0] is the file search call; output[1] is the model's message
Files used: {'Introducing deep research _ OpenAI.pdf'}
Response:
Deep Research is a new capability introduced by OpenAI that allows users to conduct complex, multi-step research tasks on the internet efficiently. Key features include:

1. **Autonomous Research**: Deep Research acts as an independent agent that synthesizes vast amounts of information across the web, enabling users to receive comprehensive reports similar to those produced by a research analyst.

2. **Multi-Step Reasoning**: It performs deep analysis by finding, interpreting, and synthesizing data from various sources, including text, images, and PDFs.

3. **Application Areas**: Especially useful for professionals in fields such as finance, science, policy, and engineering, as well as for consumers seeking detailed information for purchases.

4. **Efficiency**: The output is fully documented with citations, making it easy to verify information, and it significantly speeds up research processes that would otherwise take hours for a human to complete.

5. **Limitations**: While Deep Research enhances research capabilities, it is still subject to limitations, such as potential inaccuracies in information retrieval and challenges in distinguishing authoritative data from unreliable sources.

Overall, Deep Research marks a significant advancement toward automated general intelligence (AGI) by improving access to thorough and precise research outputs.

We can see that gpt-4o-mini was able to answer a query that required more recent, specialised knowledge about OpenAI's Deep Research. It used content from the file Introducing deep research _ OpenAI.pdf, which contained the chunks of text that were most relevant. If we want to go even deeper in our analysis of the retrieved chunks, we can also inspect the different texts returned by the search engine by adding include=["output[*].file_search_call.search_results"] to our query.
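
For example, a minimal sketch of the same request with that include flag set; the retrieved chunks are then returned alongside the file search call in the response output:

response = client.responses.create(
    input=query,
    model="gpt-4o-mini",
    tools=[{
        "type": "file_search",
        "vector_store_ids": [vector_store_details['id']],
    }],
    include=["output[*].file_search_call.search_results"]
)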