Local LLM via Flask Streaming Proxy


About This Chat

This chat allows you to communicate with a local LLM (Llama 3.1) running on the edge. The website connects to a locally hosted Flask server, which acts as a proxy to route chat requests to the LLM. This means that any website can leverage the capabilities of your local LLM to offer a dynamic chat experience tailored to the website content.

Enable the “Chat with the Website” toggle switch to include additional context from the website in your first message. The LLM can then respond with knowledge of the website's content and provide interactive assistance, all while the model runs locally on your edge device.
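
To make the request shape concrete, here is a minimal sketch of what that first message might look like when the toggle is enabled. The field names follow Ollama's /api/generate request format; how the page text is folded into the prompt (and the page_text and user_message names) is purely illustrative.

# Hypothetical first request with website context enabled.
# page_text and user_message are placeholders for the page content and the
# user's typed question; the page text is simply prepended as extra context.
payload = {
    "model": "llama3.1",
    "prompt": "Context from this website:\n" + page_text + "\n\nUser question: " + user_message,
    "stream": True,
}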

Flask Server Code: The Flask server acts as a proxy that facilitates communication between this website and your locally running LLM. Here’s the Python code to set up the Flask server:

import sys
from flask import Flask, request, jsonify, Response
from flask_cors import CORS
import requests

app = Flask(__name__)

# Enable CORS for all routes
CORS(app)

# Default values for LLM server URL and model name
DEFAULT_LLM_SERVER_URL = "http://localhost:11434/api/generate"
DEFAULT_MODEL_NAME = "llama3.1"

# Get the LLM server URL and model name from command-line arguments, or use defaults
LLM_SERVER_URL = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LLM_SERVER_URL
MODEL_NAME = sys.argv[2] if len(sys.argv) > 2 else DEFAULT_MODEL_NAME

# Define the endpoint that the HTML will call
@app.route('/api/generate', methods=['POST'])
def generate():
    payload = request.json

    # Log the incoming payload
    print(f"Received request payload: {payload}")

    # Override the model in the payload with the model name from the command-line argument
    payload["model"] = MODEL_NAME

    # Ensure we're passing the 'stream' flag as True to the LLM server
    payload["stream"] = True

    # Make a request to the LLM server with the same payload
    try:
        response = requests.post(LLM_SERVER_URL, json=payload, stream=True)

        # Check if the response is successful
        if response.status_code == 200:
            def generate_stream():
                # Relay the raw bytes unchanged; decoding each chunk here could
                # fail if a chunk boundary splits a multi-byte UTF-8 character
                for chunk in response.iter_content(chunk_size=1024):
                    if chunk:
                        yield chunk

            return Response(generate_stream(), content_type='text/plain; charset=utf-8')
        else:
            return jsonify({"error": "LLM server response error"}), response.status_code
    except Exception as e:
        print(f"Error: {e}")
        return jsonify({"error": "Internal server error"}), 500

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)  # Listen on all interfaces, port 5000
        

This code sets up a simple Flask server that receives requests from the website, forwards them to your local LLM server (e.g., Ollama or any other LLM running locally), and streams the response back to the client.
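
To try the proxy end to end, start the script with Python, optionally passing the LLM server URL and model name as the first and second command-line arguments (for example, python server.py http://localhost:11434/api/generate llama3.1, where server.py is whatever you named the file above). The sketch below then sends a test prompt to the proxy and prints the reply as it streams in; it assumes the proxy is running on port 5000 and that the upstream LLM server emits Ollama-style newline-delimited JSON objects with a "response" field, so adjust the parsing if your server streams a different format.

import json
import requests

# Hypothetical test client for the Flask proxy above
PROXY_URL = "http://localhost:5000/api/generate"

payload = {
    "model": "llama3.1",  # the proxy overrides this with its own model name
    "prompt": "In one sentence, what does a streaming proxy do?",
    "stream": True,
}

with requests.post(PROXY_URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    # Each non-empty line is expected to be one JSON object from the stream
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        data = json.loads(line)
        # Print the incremental text as it arrives, without extra newlines
        print(data.get("response", ""), end="", flush=True)
print()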
