Model Serving

At serving time, the two-stage recommender system applies the following procedure.

1. Query Processing – Preprocess the text query by lowercasing it, removing punctuation and stopwords, and applying lemmatization or stemming. Tokenize the result, then encode the tokens into an embedding with a text encoding model such as BERT or Word2Vec.

2. Embedding Model – Compute the user embedding from the search query, the user ID, and a subset of user features (as seen in the retrieval model tower). Then find the nearest product embeddings using the approximate nearest neighbor index.

3. Approximate Nearest Neighbor Index – The ANN index is a vector database optimized for fast nearest-neighbor retrieval. It indexes vectors with techniques such as product quantization, vector quantization, or hashing, so that lookups trade a small amount of accuracy for speed. Querying it with the user embedding returns roughly 500 product candidates.

4. Ranking Model – The ranking model combines the embeddings with additional features from a feature store to estimate the probability that the user will purchase each candidate product. The candidates are then sorted by this probability and sent to the client.
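
The four steps can be sketched end to end in a few lines of Python. This is a toy illustration under loud assumptions, not a production design: `embed` is a hypothetical hash-based stand-in for a real text encoder such as BERT or Word2Vec, the user embedding is built from the query alone (no user ID or user features), `retrieve` does an exact brute-force search where a real system would query an ANN index, and the weights in `rank` are set by hand rather than learned. All function names, the stopword list, and the sample catalog are made up for the example.

```python
import hashlib
import math
import string

STOPWORDS = {"a", "an", "the", "for", "of", "and", "in", "to"}  # tiny illustrative list
DIM = 8  # toy embedding size; real encoders use hundreds of dimensions


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def preprocess(query):
    """Step 1: lowercase, strip punctuation, tokenize, drop stopwords."""
    cleaned = query.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def embed(tokens):
    """Hypothetical encoder: sum of hashed token vectors, L2-normalized."""
    vec = [0.0] * DIM
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        for i in range(DIM):
            vec[i] += digest[i] / 255.0 - 0.5
    norm = math.sqrt(dot(vec, vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(user_vec, product_vecs, k):
    """Steps 2-3: exact top-k by dot product (stand-in for an ANN lookup)."""
    scored = sorted(
        ((dot(user_vec, vec), pid) for pid, vec in product_vecs.items()),
        reverse=True,
    )
    return [pid for _, pid in scored[:k]]


def rank(user_vec, candidates, product_vecs, extra_features):
    """Step 4: similarity plus one extra feature, squashed to a probability."""
    results = []
    for pid in candidates:
        # Hand-set weights, purely illustrative; a real ranker is a trained model.
        logit = 2.0 * dot(user_vec, product_vecs[pid]) + extra_features.get(pid, 0.0)
        results.append((pid, 1.0 / (1.0 + math.exp(-logit))))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Made-up catalog; in practice product embeddings are precomputed offline.
catalog = {
    "p1": embed(preprocess("red running shoes")),
    "p2": embed(preprocess("blue running shoes")),
    "p3": embed(preprocess("stainless steel kettle")),
}
user_vec = embed(preprocess("The red shoes for running!"))
candidates = retrieve(user_vec, catalog, k=2)
ranked = rank(user_vec, candidates, catalog, {"p2": 0.5})
print(ranked)
```

The split mirrors the two-stage design: `retrieve` is cheap and only narrows the catalog, while `rank` can afford richer features because it scores only the retrieved candidates. With a real catalog, `k` would be on the order of the ~500 candidates mentioned in step 3.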