Azure AI Search semantic/hybrid search: conversational query returns results influenced by generic terms ("books") instead of the topic ("anatomy")

Summary

Conversational queries in Azure AI Search return results that are pulled toward generic terms in the query (e.g., “books”) rather than the topic the user is asking about (e.g., “anatomy”). This happens even though the query combines semantic ranking with vector search.

Root Cause

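The root cause is that the whole conversational query is used for retrieval, not just the topic being asked for. In a hybrid query the full text goes to keyword search, where every term, including generic ones such as “books”, can match and score documents; with the default searchMode of any, a document only has to match one of the terms. The semantic ranker then re-orders only the top results that retrieval has already produced, so documents that got in on a generic term can still be returned even when the user intent is clear.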

Why This Happens in Real Systems

This issue occurs in real systems for the following reasons:

Conversational filler in the query: Natural-language questions carry filler and generic words. They are sent verbatim to keyword search and, if the embedding is computed from the same sentence, they shape the vector query as well.
Inadequate semantic configuration: The semantic configuration may not list the fields that actually carry the topic (title, content, and keyword fields), which leaves the ranker little topical text to work with.
Overemphasis on generic terms: In a domain-specific index, a generic term such as “books” can appear in many documents, so it produces matches that say nothing about the topic.
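
Azure AI Search merges the keyword and vector rankings of a hybrid query with Reciprocal Rank Fusion (RRF). The sketch below is a simplified illustration of that idea, not the service's implementation: the class name `RrfSketch`, the constant k = 60, and the document ids used afterwards are assumptions made for the example. It shows how a document that ranks first on the keyword side because it matches “books” can end up ahead of the topical document after fusion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: a simplified Reciprocal Rank Fusion over ranked lists
// of document ids. This is not Azure AI Search's actual implementation.
public static class RrfSketch
{
    // k is a smoothing constant; 60 is an assumed value for this sketch.
    public static List<KeyValuePair<string, double>> Fuse(
        IEnumerable<IReadOnlyList<string>> rankedLists, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in rankedLists)
        {
            for (int i = 0; i < list.Count; i++)
            {
                // Ranks are 1-based: the top hit contributes 1 / (k + 1).
                scores.TryGetValue(list[i], out var current);
                scores[list[i]] = current + 1.0 / (k + i + 1);
            }
        }

        // Highest fused score first; ties are broken by id for determinism.
        return scores
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

With hypothetical keyword results `book-club-guide, history-of-books, grays-anatomy` and vector results `grays-anatomy, book-club-guide, atlas-of-anatomy`, the generic `book-club-guide` (ranks 1 and 2) fuses to a higher score than `grays-anatomy` (ranks 3 and 1), so it reaches the re-ranking stage in first place.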

Real-World Impact

The impact of this issue is that users are not getting relevant results, which can lead to:

Frustration: Users may become frustrated with the search functionality and stop using it.
Loss of trust: Users may lose trust in the search functionality and the overall system.
Decreased adoption: The system may not be adopted as widely as it could be due to the poor search functionality.

Example or Code

// Requires: using Azure.Search.Documents; using Azure.Search.Documents.Models;
public async Task<(long? TotalCount, List<ScoredBookResult> Results)> OptimizedSearchAsync(
    string userQuery,
    float[] vector,
    int top = 5)
{
    // Vector side of the hybrid query: 50 nearest neighbors on the "embedding" field.
    var vectorQuery = new VectorizedQuery(vector.AsMemory())
    {
        KNearestNeighborsCount = 50,
        Fields = { "embedding" }
    };
    var options = new SearchOptions
    {
        Size = top,
        IncludeTotalCount = true,
        QueryType = SearchQueryType.Semantic,
        SemanticSearch = new SemanticSearchOptions
        {
            SemanticConfigurationName = "semantic-trial",
            // Captions and answers are not needed here, only the re-ranking.
            QueryCaption = new QueryCaption(QueryCaptionType.None),
            QueryAnswer = new QueryAnswer(QueryAnswerType.None)
        },
        VectorSearch = new VectorSearchOptions
        {
            Queries = { vectorQuery }
        }
    };
    options.Select.Add(nameof(BookDocument.Id));
    options.Select.Add(nameof(BookDocument.Title));
    options.Select.Add(nameof(BookDocument.Author));

    // Keyword side: userQuery is sent as-is, so every term in it
    // (including generic ones such as "books") takes part in matching.
    var response = await _searchClient.SearchAsync<BookDocument>(userQuery, options);
    var list = new List<ScoredBookResult>();
    await foreach (var r in response.Value.GetResultsAsync())
    {
        list.Add(new ScoredBookResult
        {
            Document = r.Document,
            // Prefer the semantic reranker score; fall back to the search score.
            Score = r.SemanticSearch?.RerankerScore ?? r.Score ?? 0
        });
    }
    return (response.Value.TotalCount, list);
}

How Senior Engineers Fix It

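Senior engineers can fix this issue by:

Reconfiguring the semantic search: Checking that the title, content, and keyword fields in the semantic configuration point at the fields that describe each document's topic.
Rewriting the query: The semantic ranker is a managed model that cannot be fine-tuned, so the practical lever is the query itself. Extract the topic from the conversational text, with rules, natural language processing (NLP), or an LLM, and use that as the search text and as the input for the embedding.
Tightening keyword matching: Setting searchMode to all, limiting searchFields to topical fields, or boosting those fields with a scoring profile so that generic terms carry less influence.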
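
As one way to picture the query-rewriting step, here is a minimal rule-based sketch. It is a naive stand-in for a proper NLP or LLM rewrite: the class `QueryRewriter`, the method `ExtractTopic`, and both word lists are hypothetical and would need tuning for a real corpus.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: strips conversational filler and corpus-generic terms
// so that the remaining text is closer to the topic the user asked about.
public static class QueryRewriter
{
    // Hypothetical filler words often found in conversational queries.
    private static readonly HashSet<string> FillerTerms =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "can", "could", "you", "please", "suggest", "recommend", "show",
            "find", "me", "i", "want", "need", "some", "any", "a", "an",
            "the", "on", "about", "for", "of"
        };

    // Hypothetical terms that are generic for a book index.
    private static readonly HashSet<string> GenericTerms =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "book", "books" };

    private static readonly char[] Separators =
        { ' ', '\t', ',', '.', '?', '!', ';', ':' };

    public static string ExtractTopic(string userQuery)
    {
        if (string.IsNullOrWhiteSpace(userQuery))
        {
            return string.Empty;
        }

        var kept = userQuery
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(t => !FillerTerms.Contains(t) && !GenericTerms.Contains(t))
            .ToList();

        // If everything was stripped, fall back to the original query
        // rather than sending an empty search text.
        return kept.Count > 0 ? string.Join(" ", kept) : userQuery.Trim();
    }
}
```

The rewritten text would then replace the raw query as the search text, and ideally as the input to the embedding model too, so that both sides of the hybrid query focus on the topic. A fixed word list is brittle, which is why an LLM-based rewrite is often the more robust choice.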

Why Juniors Miss It

Juniors may miss this issue due to:

Lack of experience: Little hands-on exposure to conversational queries and semantic search.
Insufficient understanding: Not knowing how retrieval and semantic re-ranking fit together, for example assuming the semantic ranker interprets the query before any documents are retrieved.
Overreliance on default settings: Accepting the defaults instead of configuring the search for the workload.