By teammate March 30, 2025
Load-Balanced RAG Using Qdrant
Introduction to Qdrant Vector Store
Qdrant is a high-performance vector database designed specifically for AI applications, offering advanced features such as similarity search and vector management. Its unique selling points (USPs) compared to other choices include:
- Scalability: Supports horizontal scaling through sharding and replication.
- Efficiency: Built in Rust, Qdrant ensures fast performance even under high load conditions.
- Flexibility: Integrates with various frameworks and allows for distributed deployment.
- Advanced Search Capabilities: Offers hybrid search combining dense and sparse vectors for nuanced similarity searches.
These features make it an ideal choice for building robust backends for Retrieval Augmented Generation (RAG) systems - like ours ;)
Step-by-Step Installation Guide
Our aim is to set up a native install of Qdrant on three separate machines (nodes), each holding an identical copy of every collection, so we can read and write load-balanced against any server and use the cluster as our RAG backend. Because all three nodes hold the same vectors and Qdrant handles replication automatically, it does not matter which server you query for reads or writes. The setup described here is the smallest one that provides high availability. With more nodes and different settings you can scale the cluster out for performance by also spreading vectors across multiple machines (sharding), which increases read throughput.
1. Download the Binary from GitHub
Visit the Qdrant GitHub release page and download the latest binary suitable for your server’s operating system.
wget https://github.com/qdrant/qdrant/releases/download/vX.Y.Z/qdrant_vX_Y_Z_x86_64-unknown-linux-gnu.tar.gz
tar -xzf qdrant_vX_Y_Z_x86_64-unknown-linux-gnu.tar.gz
2. Install Qdrant on Your Server
Copy the binary to a directory in your PATH and set execute permissions.
sudo cp qdrant /usr/local/bin/
sudo chmod +x /usr/local/bin/qdrant
3. Write the Configuration File
Create a configuration file, config.yml, save it to /etc/qdrant/config.yml, and adjust the following settings:
# config.yml
storage:
  collection:
    # Number of replicas of each shard that network tries to maintain
    replication_factor: 3
    # How many replicas should apply the operation for us to consider it successful
    write_consistency_factor: 3

service:
  api_key: YourTotalSecretKey

cluster:
  # Use `enabled: true` to run Qdrant in distributed deployment mode
  enabled: true
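The replication settings above are only cluster-wide defaults; replication can also be set per collection at creation time via the REST API. As a sketch, assuming a hypothetical collection named docs with 384-dimensional vectors and the API key from the config above, a fully replicated collection could be created like this:

```shell
# Create a collection "docs" (hypothetical name) whose single shard is
# replicated to all 3 nodes; writes must reach all replicas to succeed.
curl -X PUT 'http://localhost:6333/collections/docs' \
  --header 'api-key: YourTotalSecretKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "vectors": {"size": 384, "distance": "Cosine"},
    "replication_factor": 3,
    "write_consistency_factor": 3
  }'
```

Per-collection values override the defaults from config.yml, which is handy when some collections need less redundancy than others.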
4. Setup Systemd Service
Create a systemd service file to manage Qdrant processes.
sudo nano /etc/systemd/system/qdrant.service
Add the following content:
[Unit]
Description=qdrant vector store
After=network.target
[Service]
User=YOURusername
Group=YOURgroupname
WorkingDirectory=/working/dir
ExecStart=/usr/local/bin/qdrant --uri 'http://1xx.xxx.xxx.xxx:6335' --config-path /etc/qdrant/config.yml
Restart=always
LimitNOFILE=10000
[Install]
WantedBy=multi-user.target
Depending on your system, Qdrant will run massively parallel and store vectors in files on disk. It therefore makes sense to place the data directory on a fast NVMe drive and to raise the OS file-descriptor limit. Note the LimitNOFILE=10000 line in the systemd unit above: it allows Qdrant to keep 10,000 files open at the same time. You may need to adjust this for your specific setup, and remember to leave enough file descriptors for the other services on that machine. You can fine-tune the value during performance testing.
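Before tuning LimitNOFILE, it helps to know what limits you are starting from. Both the shell's soft limit and the kernel-wide maximum can be checked directly:

```shell
# Soft limit on open files for the current shell
ulimit -n

# Kernel-wide maximum number of open file handles
cat /proc/sys/fs/file-max
```

Whatever you set in the unit file must stay below the kernel maximum and leave headroom for everything else running on the box.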
In the ExecStart line, the --uri parameter points to the local system's IP address, which must be reachable by the two other nodes we will set up later!
Finally reload systemd and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable qdrant.service
sudo systemctl start qdrant.service
This brings up the head node. 1/3 done, 2 more to go: repeat the above procedure on the other two nodes, but be sure to make a small adjustment in the qdrant.service file:
ExecStart=/usr/local/bin/qdrant --bootstrap 'http://1xx.xxx.xxx.xxx:6335' --config-path /etc/qdrant/config.yml
The --bootstrap parameter must now point to the IP address of the head node we set up first!
5. Bootstrap the Cluster
Initialize the Qdrant cluster by starting the other nodes via systemd with the configuration file you created.
sudo systemctl daemon-reload
sudo systemctl enable qdrant.service
sudo systemctl start qdrant.service
6. Join Nodes to the Cluster
You can add more nodes by simply repeating the above procedure. Depending on your setup, you may want to adjust the configuration in that case.
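Once you grow beyond three nodes, sharding is what spreads a collection's vectors across machines. As an illustrative sketch (the collection name and numbers are hypothetical, not part of this guide's setup), a scaled-out collection might split its vectors into six shards with two replicas each:

```shell
# Hypothetical scaled-out collection: 6 shards distributed over the
# cluster, each shard kept on 2 nodes for redundancy.
curl -X PUT 'http://anynodefromthecluster:6333/collections/docs_sharded' \
  --header 'api-key: YourTotalSecretKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "vectors": {"size": 384, "distance": "Cosine"},
    "shard_number": 6,
    "replication_factor": 2
  }'
```

The trade-off: more shards increase parallelism and capacity, while the replication factor controls how many node failures a collection can survive.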
7. Test Service with Curl
Finally, verify that Qdrant is running and accessible by using curl from your own machine:
curl -X GET http://anynodefromthecluster:6333/cluster/ --header 'api-key: YourTotalSecretKey'
If everything is set up correctly, you should receive a similar response indicating the cluster is alive:
{
  "result": {
    "status": "enabled",
    "peer_id": 7145995093486746,
    "peers": {
      "1560423695854934": {"uri": "https://1.0.0.1:6335/"},
      "7145995093486746": {"uri": "https://1.1.0.2:6335/"},
      "415815538142464": {"uri": "https://1.1.1.3:6335/"}
    },
    "raft_info": {
      "term": 2383,
      "commit": 1271,
      "pending_operations": 0,
      "leader": 415815538142464,
      "role": "Follower",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2025-03-30T15:01:26.732384037Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 0.000022931
}
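To convince yourself that reads and writes really are node-agnostic, you can upsert a point through one node and search for it through another. This sketch assumes a collection named docs already exists (hypothetical, with 4-dimensional vectors for brevity) and that node1/node2 are placeholder hostnames for two of your cluster machines:

```shell
# Write a point through node 1 and wait until the operation is applied
curl -X PUT 'http://node1:6333/collections/docs/points?wait=true' \
  --header 'api-key: YourTotalSecretKey' \
  --header 'Content-Type: application/json' \
  --data '{"points": [{"id": 1, "vector": [0.1, 0.2, 0.3, 0.4], "payload": {"text": "hello cluster"}}]}'

# Read it back through node 2 - replication makes it visible on every node
curl -X POST 'http://node2:6333/collections/docs/points/search' \
  --header 'api-key: YourTotalSecretKey' \
  --header 'Content-Type: application/json' \
  --data '{"vector": [0.1, 0.2, 0.3, 0.4], "limit": 1}'
```

With write_consistency_factor set to 3, a successful write has already reached all replicas, so the follow-up read can go to any node without a stale result.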
Conclusion
By following these steps, you have successfully deployed a native, distributed Qdrant vector store with full replication across three nodes, ready to be extended with sharding as you scale out. This setup can serve as a load-balanced backend for RAG applications, providing efficient and scalable vector search. Enjoy exploring the full potential of your new AI infrastructure!