AIVSP: Vector Search in Practice
Production Patterns for Real-World Vector Search
Available with 1-year access or lifetime access (no expiration)
Course Overview
This online course builds on AIVSE: Vector Search Essentials to cover production patterns for vector search in SQL Server 2025. Designed for SQL Server DBAs and developers who have completed that prerequisite course, it teaches you how to select embedding models, design schemas, measure search quality, handle edge cases, and implement advanced patterns like reranking and RAG. Through detailed explanations and hands-on demonstrations, you’ll gain the practical knowledge needed to deploy and maintain vector search in production environments.
The course primarily uses SemanticSonarDB, a sample database with 100,000 Minneapolis property listings and pre-generated embeddings. Module 6 also uses SemanticShoresDB, which includes a DiskANN vector index for demonstrating VECTOR_SEARCH behavior. Both databases are available on GitHub. Demonstrations use local embedding generation via Ollama (bge-m3) and a local LLM (qwen2.5:3b) for RAG patterns.
Instructor: Joe Sack
Prerequisite: AIVSE: Vector Search Essentials
Level: 200 (intermediate)
Need Help Justifying Training? Here’s a letter to your manager explaining why SQLskills training is worthwhile and a list of community blog posts about our classes.
What You’ll Learn
- Embedding model selection and evaluation
- Schema design patterns for long-term maintainability
- Measuring search quality with precision, recall, MRR, and nDCG metrics
- Diagnosing vector search failure modes
- Solving the post-filtering problem with candidate pool sizing
- Reranking strategies including Reciprocal Rank Fusion
- Document chunking approaches
- RAG pipeline implementation with local LLMs
- Embedding freshness maintenance
- Vector database selection criteria
Platform Compatibility
The course content applies to vector search across the SQL Server family. Here’s what you need to know for each platform:
- SQL Server 2025: Full support. The GitHub repo includes a .bak file you can restore directly to any local or VM-based instance.
- Azure SQL Managed Instance: Full support when using the 2025 update policy. You can restore the .bak file directly.
- Azure SQL Database: Supported with minor setup differences. Azure SQL Database doesn’t support .bak restores, so you’ll import the lab database via dacpac/bacpac or scripted deployment instead. Once set up, all queries and concepts work the same.
The way you write and reason about vector queries is effectively the same across all platforms. All theory around embeddings, schema design, RAG, and hybrid search is cloud-agnostic.
Curriculum
Module 1: Course Introduction
Sets the stage for the course: introduces the instructor, states the prerequisites, and outlines what you’ll learn across all modules.
- Course objectives and target audience
- Prerequisite: Vector Search Essentials
- Instructor background and experience
- Module overview and learning path
- Demo environment setup (Ollama, SemanticSonarDB, SemanticShoresDB)
Module 2: Choosing Your Embedding Model
Covers how to evaluate and select embedding models for your specific use case, beyond relying solely on benchmark scores.
- The MTEB leaderboard: what it measures and misses
- OpenAI vs. local models (Ollama with bge-m3)
- Dimension reduction and its surprises
- Domain-specific considerations
- Cost modeling: API vs. local inference
- Demo: The same query run against different models (bge-m3 vs. all-minilm) produces different results
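To give you a feel for the local-model path, here’s a minimal sketch of wiring SQL Server 2025 to a local Ollama endpoint and generating an embedding in T-SQL. The CREATE EXTERNAL MODEL and AI_GENERATE_EMBEDDINGS syntax reflects the SQL Server 2025 preview and is worth verifying against your build; the endpoint URL and the OllamaBgeM3 name are illustrative.

```sql
-- Minimal sketch (preview syntax; endpoint and names illustrative).
CREATE EXTERNAL MODEL OllamaBgeM3
WITH (
    LOCATION   = 'https://localhost:11434/api/embed',  -- local Ollama endpoint
    API_FORMAT = 'Ollama',
    MODEL_TYPE = EMBEDDINGS,
    MODEL      = 'bge-m3'
);
GO

-- The same text embedded by a different model yields a different vector,
-- which is exactly what the module's demo compares.
SELECT AI_GENERATE_EMBEDDINGS(N'charming lakefront bungalow' USE MODEL OllamaBgeM3) AS Embedding;
```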
Module 3: Embedding Schema Design Patterns
Describes DBA-focused patterns for storing and managing embeddings in SQL Server, with an eye toward long-term maintainability.
- Inline vs. normalized: where to store embeddings
- Key tradeoffs: query simplicity, schema flexibility, backup strategy
- Foreign key patterns: 1:1 vs. 1:many
- Model versioning: tracking provenance
- Change tracking: detecting stale embeddings
- Demo: Hash-based change detection with triggers, batch re-embedding patterns
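As a flavor of the inline pattern, here’s a hypothetical table sketch: the embedding sits next to the row it describes, with provenance columns and a text hash so stale embeddings are detectable. Table and column names are illustrative; bge-m3 produces 1024-dimensional vectors.

```sql
-- Hypothetical inline pattern: embedding stored next to the row it describes,
-- with provenance columns and a hash for change detection.
CREATE TABLE dbo.Listings
(
    ListingId       int IDENTITY PRIMARY KEY,
    Description     nvarchar(max)  NOT NULL,
    DescriptionHash varbinary(32)  NOT NULL,  -- HASHBYTES('SHA2_256', ...) at write time
    Embedding       vector(1024)   NULL,      -- bge-m3 emits 1024 dimensions
    EmbeddingModel  varchar(100)   NULL,      -- provenance: which model produced it
    EmbeddedAtUtc   datetime2      NULL
);

-- Stale-embedding check: the text changed since the row was last embedded.
SELECT ListingId
FROM dbo.Listings
WHERE Embedding IS NULL
   OR DescriptionHash <> HASHBYTES('SHA2_256', Description);
```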
Module 4: Measuring Search Quality
Introduces four key metrics for evaluating vector search quality, moving beyond “it seems to work” to measurable outcomes.
- Precision@K: Are the returned results actually relevant?
- Recall@K: Are you missing results that should be found?
- MRR (Mean Reciprocal Rank): How soon does the first relevant result appear?
- nDCG: Is the ordering optimal (especially with graded relevance)?
- Choosing the right metric for your use case
- Demo: Calculate all four metrics against property search ground truth
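For a concrete feel, here’s a sketch of Precision@K in plain T-SQL against two hypothetical tables: SearchResults(QueryId, ListingId, Rank) holding the top-K hits per test query, and GroundTruth(QueryId, ListingId) holding the judged-relevant documents.

```sql
DECLARE @K int = 10;

-- Precision@K = (relevant results in the top K) / K, per query.
SELECT r.QueryId,
       CAST(SUM(CASE WHEN g.ListingId IS NOT NULL THEN 1 ELSE 0 END) AS float) / @K
           AS PrecisionAtK
FROM dbo.SearchResults AS r
LEFT JOIN dbo.GroundTruth AS g
       ON g.QueryId   = r.QueryId
      AND g.ListingId = r.ListingId
WHERE r.[Rank] <= @K
GROUP BY r.QueryId;
-- MRR follows the same shape: AVG(1.0 / rank of the first relevant hit) across queries.
```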
Module 5: When Vector Search is Wrong
Sets realistic expectations by exploring cases where vector search fails and when alternative approaches are needed.
- Exact match requirements
- Rare terms and specialized vocabulary
- Negation problems (“no fireplace” returns fireplace properties)
- Proper nouns and identifiers
- When full-text search or hybrid approaches win
- Demo: Three failure modes and their fixes (negation, exact match, numerical constraints)
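One recurring fix, sketched below: let the vector handle the semantic part of the query and enforce the negation as a hard relational predicate. This uses exact (non-indexed) KNN via VECTOR_DISTANCE; the @q declaration assumes the external model from the Module 2 sketch, and names are illustrative.

```sql
DECLARE @q vector(1024) =
    AI_GENERATE_EMBEDDINGS(N'cozy bungalow' USE MODEL OllamaBgeM3);

SELECT TOP (10) ListingId, Description
FROM dbo.Listings
WHERE Description NOT LIKE '%fireplace%'            -- the part embeddings handle poorly
ORDER BY VECTOR_DISTANCE('cosine', Embedding, @q);  -- the semantic part
```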
Module 6: The Post-Filtering Problem
Explains why filtering after ANN retrieval can produce fewer results than expected and how to size candidate pools appropriately.
- Exact search vs. ANN search: when pre-filtering works
- VECTOR_SEARCH post-filter behavior
- The app-side filtering trap
- Sizing candidate pools for filter survival
- Detecting and diagnosing filter failures
- Demo: Why small candidate pools fail, calculating pool size from selectivity
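The pool-sizing arithmetic is simple: to survive a post-filter with selectivity s and still return K rows, request roughly K / s candidates, so 10 results through a 5%-selective filter needs TOP_N around 200. Below is a sketch using the preview VECTOR_SEARCH syntax, which requires a DiskANN vector index (hence SemanticShoresDB in the demo); table and column names are illustrative, and @q is assumed declared as in earlier sketches.

```sql
-- Want 10 survivors through a ~5%-selective filter: 10 / 0.05 = 200 candidates.
SELECT TOP (10) t.ListingId, t.Description, s.distance
FROM VECTOR_SEARCH(
         TABLE      = dbo.Listings AS t,
         COLUMN     = Embedding,
         SIMILAR_TO = @q,
         METRIC     = 'cosine',
         TOP_N      = 200          -- candidate pool, sized for filter survival
     ) AS s
WHERE t.Bedrooms >= 4              -- post-filter applied after ANN retrieval
ORDER BY s.distance;
```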
Module 7: Reranking and Result Quality
Shows how to improve result ordering beyond embedding similarity using cross-encoders, RRF, and weighted scoring.
- Why embedding similarity isn’t enough
- Cross-encoder models: query-document pairs
- The oversample-and-curate pattern
- Reciprocal Rank Fusion (RRF)
- Weighted scoring approaches
- When to use RRF vs. weighted scoring
- Demo: RRF combining vector and full-text search rankings
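RRF itself is only a few lines of T-SQL: each document scores the sum of 1 / (k + rank) across the rankings it appears in, with k conventionally set to 60. The sketch below assumes a full-text index on Description plus the model and table names from earlier sketches; in practice you would cap each input list at a top-N before fusing.

```sql
DECLARE @q vector(1024) =
    AI_GENERATE_EMBEDDINGS(N'lakefront bungalow' USE MODEL OllamaBgeM3);

WITH VectorRanked AS (
    SELECT ListingId,
           ROW_NUMBER() OVER (ORDER BY VECTOR_DISTANCE('cosine', Embedding, @q)) AS rnk
    FROM dbo.Listings
),
FullTextRanked AS (
    SELECT [KEY] AS ListingId,
           ROW_NUMBER() OVER (ORDER BY [RANK] DESC) AS rnk
    FROM FREETEXTTABLE(dbo.Listings, Description, N'lakefront bungalow')
)
SELECT TOP (10)
       COALESCE(v.ListingId, f.ListingId) AS ListingId,
       -- RRF score: documents found by both rankings float to the top.
       COALESCE(1.0 / (60 + v.rnk), 0) + COALESCE(1.0 / (60 + f.rnk), 0) AS RrfScore
FROM VectorRanked AS v
FULL OUTER JOIN FullTextRanked AS f
    ON f.ListingId = v.ListingId
ORDER BY RrfScore DESC;
```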
Module 8: Chunking Strategies
Covers how to break documents into searchable pieces that respect embedding model limits while preserving context.
- Token limits and why chunking matters
- Tokens vs. characters: the measurement gap
- AI_GENERATE_CHUNKS: SQL Server’s approach
- Overlap strategies to preserve context
- Parent-child retrieval patterns
- Demo: Fixed vs. semantic chunking on inspection reports
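Here’s roughly what the built-in chunker looks like, using the AI_GENERATE_CHUNKS shape from the SQL Server 2025 preview documentation; the parameter names, unit semantics, and output columns are worth verifying against your build, and the Inspections table is hypothetical.

```sql
DECLARE @report nvarchar(max) =
    (SELECT ReportText FROM dbo.Inspections WHERE InspectionId = 42);  -- hypothetical table

SELECT c.*                      -- one row per chunk
FROM AI_GENERATE_CHUNKS(
         source     = @report,
         chunk_type = FIXED,    -- fixed-size chunking
         chunk_size = 500,
         overlap    = 50        -- shared span between neighboring chunks, preserving context
     ) AS c;
```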
Module 9: RAG Patterns
Covers Retrieval-Augmented Generation architecture, using SQL Server for retrieval and the application tier for LLM orchestration.
- The RAG architecture: retrieve, augment, generate
- Our approach: SQL Server for retrieval, application tier for LLM orchestration
- RAG vs. Fine-Tuning for enterprise knowledge
- Context window management
- Citation and source attribution
- Local LLM setup with Ollama
- Demo: “Inspection translator” RAG advisor for home buyers
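The SQL half of that split fits in one query: retrieve the best-matching chunks along with the ids the application tier needs for citation, then hand them to the LLM outside the database. A sketch assuming a hypothetical InspectionChunks table and the external model from the Module 2 sketch:

```sql
DECLARE @question nvarchar(max) = N'Is the roof damage in this report serious?';
DECLARE @q vector(1024) =
    AI_GENERATE_EMBEDDINGS(@question USE MODEL OllamaBgeM3);

SELECT TOP (5)
       c.ChunkId,       -- returned to the app for source attribution
       c.InspectionId,
       c.ChunkText,     -- pasted into the LLM prompt as context
       VECTOR_DISTANCE('cosine', c.Embedding, @q) AS Distance
FROM dbo.InspectionChunks AS c
ORDER BY Distance;
```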
Module 10: Keeping Embeddings Current
Addresses the operational challenge of maintaining embedding freshness as source data and models change over time.
- When embeddings go stale
- Detecting quality degradation
- Re-embedding strategies: full vs. incremental
- Version control and model transitions
- Monitoring, costs, and automation patterns
- Demo: Zero-downtime model migration via config table
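The config-table idea behind the demo can be as small as this hypothetical sketch: queries resolve the active embedding column (or model name) from a row instead of hard-coding it, so cutover after a backfill is a single UPDATE rather than a code deploy. All names are illustrative.

```sql
CREATE TABLE dbo.EmbeddingConfig
(
    ConfigKey   varchar(50)  PRIMARY KEY,
    ConfigValue varchar(128) NOT NULL
);
INSERT dbo.EmbeddingConfig VALUES ('ActiveEmbeddingColumn', 'Embedding_bge_m3');

-- During migration both embedding columns exist side by side; once the new
-- column is fully backfilled and validated, cut over with one statement:
UPDATE dbo.EmbeddingConfig
SET ConfigValue = 'Embedding_new_model'      -- illustrative column name
WHERE ConfigKey = 'ActiveEmbeddingColumn';
```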
Module 11: Use Case Patterns
Explores real-world applications of vector search across different business scenarios.
- “Find cases like this”: support ticket similarity
- “What caused this?”: root cause analysis
- Duplicate detection and deduplication
- Semantic code search
- Recommendation patterns
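Duplicate detection, for instance, reduces to a distance threshold. A brute-force sketch follows (fine for a demo; at scale you would lean on the ANN index rather than an O(n²) self-join), with a cutoff that is data- and model-dependent and needs tuning:

```sql
SELECT a.ListingId AS ListingA,
       b.ListingId AS ListingB,
       VECTOR_DISTANCE('cosine', a.Embedding, b.Embedding) AS Distance
FROM dbo.Listings AS a
JOIN dbo.Listings AS b
    ON b.ListingId > a.ListingId   -- evaluate each pair once
WHERE VECTOR_DISTANCE('cosine', a.Embedding, b.Embedding) < 0.05  -- tune per model/data
ORDER BY Distance;
```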
Module 12: Future Look – Beyond Text
Looks ahead to multi-modal embeddings that enable search across images, documents, and audio.
- Multi-modal embeddings: CLIP and beyond
- Image search patterns
- ColPali: documents as images
- Audio embeddings
Module 13: Vector Database Selection
Helps you decide when SQL Server’s native vector capabilities are sufficient and when dedicated vector databases make sense.
- Five production realities
- SQL Server native vs. dedicated vector databases
- When SQL Server is enough
- Hybrid architectures
- Testing recall on your own data
Questions?
If you have any questions not answered by our Immersion Events F.A.Q., please contact us.