By teammate March 30, 2025
Load-Balanced RAG Using Qdrant
Introduction to Qdrant Vector Store
Qdrant is a high-performance vector database designed specifically for AI applications, offering advanced features such as similarity search and vector management. Its unique selling points (USPs) compared to other choices include:
- Scalability: Supports horizontal scaling through sharding and replication.
- Efficiency: Built in Rust, Qdrant ensures fast performance even under high load conditions.
- Flexibility: Integrates with various frameworks and allows for distributed deployment.
- Advanced Search Capabilities: Offers hybrid search combining dense and sparse vectors for nuanced similarity searches.
These features make it an ideal choice for building robust backends for Retrieval Augmented Generation (RAG) systems - like ours ;)
Step-by-Step Installation Guide
Our aim is to set up a native install of Qdrant on three separate machines (nodes), each holding an identical copy of every collection, so we can read and write load-balanced against any server and use the cluster as our RAG backend. Because all three nodes hold the same vectors and Qdrant handles replication automatically, it does not matter which server you query for reads or writes. The setup described here is the smallest one that provides high availability. With more nodes and different settings you can scale the cluster out for performance by also spreading vectors across multiple machines (sharding), which increases read throughput.
1. Download the Binary from GitHub
Visit the Qdrant GitHub release page and download the latest binary suitable for your server’s operating system.
wget https://github.com/qdrant/qdrant/releases/download/vX.Y.Z/qdrant_vX_Y_Z_x86_64-unknown-linux-gnu.tar.gz
tar -xzf qdrant_vX_Y_Z_x86_64-unknown-linux-gnu.tar.gz
2. Install Qdrant on Your Server
Copy the binary to a directory in your PATH and set execute permissions.
sudo cp qdrant /usr/local/bin/
sudo chmod +x /usr/local/bin/qdrant
3. Write the Configuration File
Create a configuration file, config.yml, save it to /etc/qdrant/config.yml, and adjust the following settings:
# config.yml
storage:
  collection:
    # Number of replicas of each shard that network tries to maintain
    replication_factor: 3
    # How many replicas should apply the operation for us to consider it successful
    write_consistency_factor: 3

service:
  api_key: YourTotalSecretKey

cluster:
  # Use `enabled: true` to run Qdrant in distributed deployment mode
  enabled: true
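The replication settings above are only cluster-wide defaults; replication can also be set per collection at creation time via the REST API. As a sketch, assuming a hypothetical collection named docs with 384-dimensional vectors and the API key from the config above, a fully replicated collection could be created like this:

```shell
# Create a collection "docs" (hypothetical name) whose single shard is
# replicated to all 3 nodes; writes must reach all replicas to succeed.
curl -X PUT 'http://localhost:6333/collections/docs' \
  --header 'api-key: YourTotalSecretKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "vectors": {"size": 384, "distance": "Cosine"},
    "replication_factor": 3,
    "write_consistency_factor": 3
  }'
```

Per-collection values override the defaults from config.yml, which is handy when some collections need less redundancy than others.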
4. Setup Systemd Service
Create a systemd service file to manage Qdrant processes.
sudo nano /etc/systemd/system/qdrant.service
Add the following content:
[Unit]
Description=qdrant vector store
After=network.target
[Service]
User=YOURusername
Group=YOURgroupname
WorkingDirectory=/working/dir
ExecStart=/usr/local/bin/qdrant --uri 'http://1xx.xxx.xxx.xxx:6335' --config-path /etc/qdrant/config.yml
Restart=always
LimitNOFILE=10000
[Install]
WantedBy=multi-user.target
Depending on your system, Qdrant will run massively parallel and store vectors in files on disk. It therefore makes sense to place the data directory on a fast NVMe drive and to raise the OS file-descriptor limit. Note the LimitNOFILE=10000 line in the systemd unit above: it allows Qdrant to keep 10,000 files open at the same time. You may need to adjust this for your specific setup, and remember to leave enough file descriptors for the other services on that machine. You can fine-tune the value during performance testing.
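Before tuning LimitNOFILE, it helps to know what limits you are starting from. Both the shell's soft limit and the kernel-wide maximum can be checked directly:

```shell
# Soft limit on open files for the current shell
ulimit -n

# Kernel-wide maximum number of open file handles
cat /proc/sys/fs/file-max
```

Whatever you set in the unit file must stay below the kernel maximum and leave headroom for everything else running on the box.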
In the ExecStart line, the --uri parameter points to the local system's IP address, which must be reachable by the two other nodes we will set up later!
Finally reload systemd and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable qdrant.service
sudo systemctl start qdrant.service
This brings up the head node. 1/3 done, 2 more to go: repeat the above procedure on the other two nodes, but be sure to make a small adjustment in the qdrant.service file:
ExecStart=/usr/local/bin/qdrant --bootstrap 'http://1xx.xxx.xxx.xxx:6335' --config-path /etc/qdrant/config.yml
The --bootstrap parameter must now point to the IP address of the head node we set up first!
5. Bootstrap the Cluster
Initialize the Qdrant cluster by starting the other nodes via systemd with the configuration file you created.
sudo systemctl daemon-reload
sudo systemctl enable qdrant.service
sudo systemctl start qdrant.service
6. Join Nodes to the Cluster
You can add more nodes by simply repeating the above procedure. Depending on your setup, you may want to adjust the configuration in that case.
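Once you grow beyond three nodes, sharding is what spreads a collection's vectors across machines. As an illustrative sketch (the collection name and numbers are hypothetical, not part of this guide's setup), a scaled-out collection might split its vectors into six shards with two replicas each:

```shell
# Hypothetical scaled-out collection: 6 shards distributed over the
# cluster, each shard kept on 2 nodes for redundancy.
curl -X PUT 'http://anynodefromthecluster:6333/collections/docs_sharded' \
  --header 'api-key: YourTotalSecretKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "vectors": {"size": 384, "distance": "Cosine"},
    "shard_number": 6,
    "replication_factor": 2
  }'
```

The trade-off: more shards increase parallelism and capacity, while the replication factor controls how many node failures a collection can survive.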
7. Test Service with Curl
Finally, verify that Qdrant is running and accessible by using curl from your own machine:
curl -X GET http://anynodefromthecluster:6333/cluster/ --header 'api-key: YourTotalSecretKey'
If everything is set up correctly, you should receive a similar response indicating the cluster is alive:
{
  "result": {
    "status": "enabled",
    "peer_id": 7145995093486746,
    "peers": {
      "1560423695854934": {"uri": "https://1.0.0.1:6335/"},
      "7145995093486746": {"uri": "https://1.1.0.2:6335/"},
      "415815538142464": {"uri": "https://1.1.1.3:6335/"}
    },
    "raft_info": {
      "term": 2383,
      "commit": 1271,
      "pending_operations": 0,
      "leader": 415815538142464,
      "role": "Follower",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2025-03-30T15:01:26.732384037Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 0.000022931
}
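To convince yourself that reads and writes really are node-agnostic, you can upsert a point through one node and search for it through another. This sketch assumes a collection named docs already exists (hypothetical, with 4-dimensional vectors for brevity) and that node1/node2 are placeholder hostnames for two of your cluster machines:

```shell
# Write a point through node 1 and wait until the operation is applied
curl -X PUT 'http://node1:6333/collections/docs/points?wait=true' \
  --header 'api-key: YourTotalSecretKey' \
  --header 'Content-Type: application/json' \
  --data '{"points": [{"id": 1, "vector": [0.1, 0.2, 0.3, 0.4], "payload": {"text": "hello cluster"}}]}'

# Read it back through node 2 - replication makes it visible on every node
curl -X POST 'http://node2:6333/collections/docs/points/search' \
  --header 'api-key: YourTotalSecretKey' \
  --header 'Content-Type: application/json' \
  --data '{"vector": [0.1, 0.2, 0.3, 0.4], "limit": 1}'
```

With write_consistency_factor set to 3, a successful write has already reached all replicas, so the follow-up read can go to any node without a stale result.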
Conclusion
By following these steps, you have successfully deployed a native, distributed Qdrant vector store with full replication across three nodes, ready to be extended with sharding as you scale out. This setup can serve as a load-balanced backend for RAG applications, providing efficient and scalable vector search. Enjoy exploring the full potential of your new AI infrastructure!