# Deerwalk Stop-Process Resolution Analyst — RAG-Powered Flask Application

## Architecture summary of the AI-powered resolution assistant
This application is a Retrieval-Augmented Generation (RAG) system built to help teams query historical IT stop-process resolution tickets. It ingests structured Excel data, encodes it into a searchable vector index, and answers natural-language questions using a large language model.
| Component | Technology | Role | Provider |
|---|---|---|---|
| Language Model | LLaMA 3.3 70B Versatile | Answer generation | Groq (cloud API) |
| Embedding Model | BAAI/bge-small-en-v1.5 | Semantic encoding | HuggingFace (local) |
| Vector Store | FAISS | Similarity search | Meta (local) |
| Orchestration | LangChain | Chain & prompt management | LangChain OSS |
| Web Framework | Flask | REST API server | Pallets |
| Data Ingestion | pandas + openpyxl | Excel parsing | PyData |
## Local sentence-embedding model for semantic vector generation
| Parameter | Value | Purpose |
|---|---|---|
| Model Name | BAAI/bge-small-en-v1.5 | Pre-trained checkpoint from HuggingFace Hub |
| Device | cpu | Shared hosting compatibility — no GPU required |
| Normalize Embeddings | True | Produces unit vectors for cosine distance comparisons |
| Batch Size | 8 | Low memory footprint for constrained environments |
| Vector Dimensions | 384 | Compact embedding size; fast index lookups |
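The table above maps onto a `HuggingFaceEmbeddings` configuration along these lines (a sketch; the variable name is illustrative, and instantiating it downloads the checkpoint on first use):

```python
from langchain_huggingface import HuggingFaceEmbeddings

# CPU-only, normalized 384-dim embeddings with a small batch size,
# matching the parameters in the table above.
embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True, "batch_size": 8},
)
```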
## Facebook AI Similarity Search — persistent local index
FAISS (Facebook AI Similarity Search) stores and retrieves document embeddings using approximate nearest-neighbor search. The index is persisted to disk, so it is only built once from the Excel data and reloaded on subsequent starts.
| Setting | Value | Impact |
|---|---|---|
| Index Path | /home/techqbcv/ai.techmauri.com/faiss_index | Persisted to disk, no re-indexing on restart |
| Retrieval k | 10 | Top 10 most-similar tickets sent to LLM as context |
| Search Type | Approximate NN (L2 / cosine) | Fast sub-linear lookup even at scale |
| Deserialization | allow_dangerous_deserialization=True | Required for loading the pickle-based FAISS index |
`allow_dangerous_deserialization=True` is required to load persisted FAISS indexes (they use Python pickle). Ensure the `faiss_index` directory is not writable by untrusted users.
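Reloading the persisted index then looks roughly like this (a sketch, not the exact source; it assumes an `embeddings` object matching the one used at build time):

```python
from langchain_community.vectorstores import FAISS

FAISS_PATH = "/home/techqbcv/ai.techmauri.com/faiss_index"

# Reload the on-disk index; pickle-based storage requires the explicit opt-in flag.
vectorstore = FAISS.load_local(
    FAISS_PATH,
    embeddings,  # the HuggingFaceEmbeddings instance used at index-build time
    allow_dangerous_deserialization=True,
)

# Top-10 most-similar tickets per query, as configured in the table above.
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
```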
## Full dependency inventory with roles

PyTorch is capped at a single thread via `torch.set_num_threads(1)` for shared-server stability. Flask exposes the `/`, `/chat`, and `/health` endpoints, and ProxyFix handles reverse-proxy headers from Passenger/cPanel. pandas parses the Excel sheets into LangChain `Document` objects before indexing.

| Library | Package | Role |
|---|---|---|
| LangChain Core | langchain-core | Runnables, prompts, output parsers |
| LangChain Groq | langchain-groq | ChatGroq client integration |
| LangChain HuggingFace | langchain-huggingface | HuggingFaceEmbeddings wrapper |
| LangChain Community | langchain-community | FAISS vector store integration |
| FAISS | faiss-cpu | Similarity search index |
| PyTorch | torch | Tensor computation for embeddings |
| Transformers | transformers | HuggingFace model loading (transitive) |
| pandas | pandas | Excel data ingestion |
| Flask | flask | HTTP REST API |
| Werkzeug | werkzeug | WSGI middleware (ProxyFix) |
## Cloud-hosted inference for answer generation
| Parameter | Value | Rationale |
|---|---|---|
| Model | llama-3.3-70b-versatile | High reasoning capability for technical IT analysis |
| Temperature | 0.2 | Near-deterministic — reduces hallucination for factual Q&A |
| Max Tokens | 2048 | Enough for multi-step resolutions with bullet points |
| API Key Source | Hardcoded (env var recommended) | Set via GROQ_API_KEY environment variable |
| Inference Hardware | Groq LPU (cloud) | No local GPU required; offloaded to Groq |
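The Groq client configuration implied by the table (a sketch; read the key from the environment rather than hardcoding it):

```python
import os

from langchain_groq import ChatGroq

# Low temperature for near-deterministic, factual answers; 2048 tokens
# leaves room for multi-step resolutions (values from the table above).
llm = ChatGroq(
    model="llama-3.3-70b-versatile",
    temperature=0.2,
    max_tokens=2048,
    groq_api_key=os.environ["GROQ_API_KEY"],  # never commit the key
)
```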
The system prompt constrains the LLM to act strictly as a resolution analyst, grounding every answer in the retrieved ticket context to reduce hallucination.
Move the hardcoded API key to an environment variable (`GROQ_API_KEY`) immediately to prevent key leakage via version control or log files.
## LangChain Expression Language pipeline composition
The application uses LangChain Expression Language (LCEL) to compose the retrieval and generation steps into a single chainable pipeline. Each component is a Runnable that passes output to the next step.
| LCEL Component | Class / Source | Input → Output |
|---|---|---|
| Retriever | VectorStoreRetriever (FAISS) | Query string → List[Document] |
| format_docs | Custom lambda | List[Document] → formatted string |
| RunnablePassthrough | langchain_core.runnables | Question string → same string |
| ChatPromptTemplate | langchain_core.prompts | Dict → PromptValue |
| ChatGroq | langchain_groq | PromptValue → AIMessage |
| StrOutputParser | langchain_core.output_parsers | AIMessage → str |
## Step-by-step data flow from user query to response

| Step | Action | Component | Output |
|---|---|---|---|
| 1 | User submits question via `POST /chat` | Flask route | JSON `{ "question": "..." }` |
| 2 | Question is embedded into a vector | BGE-Small (local) | 384-dim float vector |
| 3 | Top-10 nearest documents retrieved | FAISS retriever | List of 10 `Document` objects |
| 4 | Documents concatenated with `---` separator | `format_docs()` | Plain text context block |
| 5 | Context + question injected into prompt template | ChatPromptTemplate | Formatted chat messages |
| 6 | Prompt sent to Groq API for LLaMA inference | ChatGroq (cloud) | AIMessage with resolution text |
| 7 | Response parsed to plain string and returned | StrOutputParser + Flask | JSON `{ "answer": "..." }` |
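Steps 1 and 7 map onto a Flask route like the following (a minimal sketch with a stand-in for the real chain; names are illustrative):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def rag_chain_invoke(question: str) -> str:
    # Stand-in for rag_chain.invoke(question); wire the LCEL chain in here.
    return f"Suggested resolution for: {question}"

@app.route("/chat", methods=["POST"])
def chat():
    # Validate the JSON body before invoking the chain.
    payload = request.get_json(silent=True) or {}
    question = (payload.get("question") or "").strip()
    if not question:
        return jsonify({"error": "question is required"}), 400
    return jsonify({"answer": rag_chain_invoke(question)})
```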
## Environment variables, paths, and threading constraints

| Constant | Value | Purpose |
|---|---|---|
| `EXCEL_FILE` | `.../Teamwise_Stop_Issues_Resolution_2023.xlsx` | Multi-sheet knowledge base source |
| `FAISS_PATH` | `.../faiss_index` | Persistent vector index directory |
| `EMBEDDING_MODEL` | `BAAI/bge-small-en-v1.5` | HuggingFace model identifier |
| `LLM_MODEL` | `llama-3.3-70b-versatile` | Groq model identifier |
| `APPLICATION_ROOT` | `/spartan` | cPanel URL prefix for all routes |
| Variable | Value | Effect |
|---|---|---|
| `OMP_NUM_THREADS` | 1 | Prevents OpenMP from spawning extra threads |
| `MKL_NUM_THREADS` | 1 | Intel MKL thread cap for shared hosting |
| `RAYON_NUM_THREADS` | 1 | Rust-based Rayon thread limit (tokenizers) |
| `TOKENIZERS_PARALLELISM` | false | Prevents HuggingFace tokenizer deadlock |
| `TQDM_DISABLE` | 1 | Suppresses tqdm progress bars (BrokenPipeError fix) |
| `HF_HUB_DISABLE_PROGRESS_BARS` | 1 | Suppresses HuggingFace Hub download bars |
| `TMPDIR` / `TEMP` / `TMP` | /home/techqbcv/tmp | Redirects temp files to writable home directory |
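These variables must be set before torch, tokenizers, or tqdm are imported; a sketch of the top-of-module setup:

```python
import os

# Cap native thread pools and silence progress bars before any
# heavyweight imports (torch, transformers, tqdm) initialize.
for var, value in {
    "OMP_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "RAYON_NUM_THREADS": "1",
    "TOKENIZERS_PARALLELISM": "false",
    "TQDM_DISABLE": "1",
    "HF_HUB_DISABLE_PROGRESS_BARS": "1",
}.items():
    os.environ[var] = value

# Redirect temp files to a writable home directory on shared hosting.
for var in ("TMPDIR", "TEMP", "TMP"):
    os.environ[var] = "/home/techqbcv/tmp"
```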
On import, the standard streams are redirected to `/home/techqbcv/tmp/app_startup.log` to prevent the BrokenPipeError caused by Passenger closing them before tqdm can write.
| Endpoint | Method | Description |
|---|---|---|
| `/spartan/` | GET | Serves the frontend HTML UI with doc count and status |
| `/spartan/chat` | POST | Accepts `{ "question": "..." }`, returns `{ "answer": "..." }` |
| `/spartan/health` | GET | Returns system status, doc count, and KB load state |
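A client can build the chat request as follows (the host is assumed from the index path earlier in this document; a sketch using only the standard library):

```python
import json
from urllib import request

def build_chat_request(question: str,
                       base: str = "https://ai.techmauri.com") -> request.Request:
    # POST /spartan/chat with a JSON body, matching the endpoint table above.
    return request.Request(
        f"{base}/spartan/chat",
        data=json.dumps({"question": question}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it against a live deployment:
# with request.urlopen(build_chat_request("How was the stop process resolved?")) as resp:
#     print(json.load(resp)["answer"])
```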