LlamaParseReader with TypeScript and Express.js
Parsing documents like PDFs can be a real pain. Thankfully, LlamaIndex has you covered. LlamaParseReader uses LlamaCloud to give you a simple way to parse documents and prepare them for use in your RAG applications.
I'll assume that if you are reading this article, you have already worked with LlamaIndex; this guide focuses on the document parsing that LlamaCloud offers.
If not, there is a phenomenal free video course on DeepLearning.ai, which you can watch here.
Before we start, make sure you have an API key from LlamaIndex Cloud, which we will need later (stored as LLAMA_CLOUD_API_KEY in your .env file). We also use OpenAI as our LLM, so you will need an OpenAI API key as well. You can get a key for that here.
This is an Express.js example, but the core logic could be lifted for any Node.js environment.
Install Dependencies
First, install the necessary packages:
npm install express llamaindex dotenv
npm install --save-dev @types/express typescript
Setup .env File
Create a .env file in the root of your project and add your API keys:
LLAMA_CLOUD_API_KEY=your_api_key_here
OPENAI_API_KEY=your_api_key_here
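A missing or blank key usually only shows up later as a confusing authentication error, so it can help to check the environment right after dotenv has loaded it. The following is a minimal sketch; missingEnvVars and assertEnv are hypothetical helper names of my own, not part of dotenv or LlamaIndex.

```typescript
// Hypothetical helper (not part of dotenv or LlamaIndex): returns the names
// of required variables that are missing or blank in the given environment.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = ["LLAMA_CLOUD_API_KEY", "OPENAI_API_KEY"],
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Hypothetical helper: throws one readable error listing everything missing.
function assertEnv(env: Record<string, string | undefined>): void {
  const missing = missingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

Calling assertEnv(process.env) straight after config() would make the server fail at startup with a clear message instead of failing on the first request.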
Write Some Code
Create a new TypeScript file, for example, server.ts. Below is the complete code to use LlamaParseReader with Express.js:
import express, { Request, Response } from "express";
import { config } from "dotenv";
import { VectorStoreIndex, OpenAI, Settings } from "llamaindex";
import { LlamaParseReader } from "llamaindex/readers/LlamaParseReader";

// Load environment variables from .env file
config();

// Set up LLM settings. I tend to use 3.5 because it's cheap and works well
// with all the use cases I've thrown at it with local docs.
Settings.llm = new OpenAI({ model: "gpt-3.5-turbo" });

const app = express();
const port = process.env.PORT || 3000;

app.use(express.json());

app.post("/query", async (req: Request, res: Response) => {
  try {
    const { query } = req.body;
    if (!query) {
      // A missing query is the caller's mistake, so answer with 400
      res.status(400).send("Input is required.");
      return;
    }

    // Initialize the LlamaParseReader
    const reader = new LlamaParseReader({ resultType: "markdown" });

    // Load and parse the document
    const documents = await reader.loadData(
      "./src/data/writing-effectively.pdf"
    );

    // Create embeddings and store them in a VectorStoreIndex
    const index = await VectorStoreIndex.fromDocuments(documents);

    // Create a query engine
    const queryEngine = index.asQueryEngine();

    // Query the document using the query engine
    const { response } = await queryEngine.query({ query });

    // Return the response
    res.json({ response });
  } catch (err) {
    // Parsing, embedding, or LLM failures are server-side problems, so use 500
    console.error(err);
    res.status(500).send("Something went wrong.");
  }
});

app.listen(port, () => {
  console.log(`Server is running on port ${port}`);
});
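One thing to keep in mind is that the handler above parses the PDF and rebuilds the index on every request, which costs time and API calls. If the document does not change, you could build the index once and reuse it. Here is a minimal sketch of that idea; memoizeAsync is a hypothetical helper of my own, not a LlamaIndex API, and the commented usage simply reuses the calls from the code above.

```typescript
// Hypothetical helper: wraps an async factory so it runs at most once and
// every caller shares the same promise. If the factory fails, the cache is
// cleared so the next call retries.
function memoizeAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = factory().catch((err) => {
        cached = undefined;
        throw err;
      });
    }
    return cached;
  };
}

// Possible usage with the code above (left as a comment so this block stays
// self-contained):
//
// const getIndex = memoizeAsync(async () => {
//   const reader = new LlamaParseReader({ resultType: "markdown" });
//   const documents = await reader.loadData(
//     "./src/data/writing-effectively.pdf"
//   );
//   return VectorStoreIndex.fromDocuments(documents);
// });
//
// Inside the handler: const index = await getIndex();
```

Caching the promise rather than the resolved value means that two requests arriving at the same time share one parse instead of starting two.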
Compile and Run the Code
Compile the project with the TypeScript compiler (installed earlier as a dev dependency), then run the compiled output:
npx tsc && node dist/server.js
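Make sure your TypeScript compiler is configured correctly in tsconfig.json. In particular, outDir needs to point at dist so that the compiled file ends up at dist/server.js, and esModuleInterop should be enabled so that the default import of express compiles.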
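With the server running, you can try the endpoint from any HTTP client. Below is a minimal client sketch, assuming the server listens locally on port 3000 and that you are on Node.js 18 or later, where fetch is available globally. buildQueryRequest and askServer are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: builds the fetch options for a POST to /query.
function buildQueryRequest(query: string): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  };
}

// Hypothetical client: sends the question and returns the "response" field
// that the /query handler puts in its JSON reply.
async function askServer(
  query: string,
  baseUrl = "http://localhost:3000",
): Promise<string> {
  const res = await fetch(`${baseUrl}/query`, buildQueryRequest(query));
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

For example, askServer("What is this document about?") resolves with the answer text, or rejects if the server replies with an error status.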
