Vector-Based Recall Algorithms and Their Practical Application in Personalized Ads and News (Liu Zheng)


Embedding Based Recall: Practice, Progress and Perspectives
Zheng Liu, Jianxun Lian, Xing Xie
Social Computing Group, MSRA
Aug 15th, 2021

Related work: Reinforced Anchor Knowledge Graph Generation for News Recommendation Reasoning, Liu et al., KDD 2021

Outline
- Overview: Multi-Stage Pipeline; EBR: Pros and Cons
- Embedding learning algorithms: Negative Augmentation; Hard Negative Sampling; Diversified Representation; Training as Knowledge Distillation
- Things beyond learning algorithms: Efficiency issues; Combo of sparse and dense

Overview: Multi-Stage Pipeline
- L1 Stage (Recall) -> L2 Stage (Rank) -> L3 Stage (Re-rank), with the candidate set shrinking by orders of magnitude at each stage
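The L1 recall stage above can be sketched as inner-product scoring over precomputed candidate embeddings. This is a minimal NumPy sketch: `recall_top_k` and the random toy embeddings are illustrative stand-ins for a trained two-tower encoder, and the brute-force scan stands in for the ANN index a production system would use.

```python
import numpy as np

def recall_top_k(query_emb, ad_embs, k=3):
    """L1-stage recall: score every candidate ad by inner product with
    the query embedding and return the indices of the top-k candidates."""
    scores = ad_embs @ query_emb          # (num_ads,) relevance scores
    top_k = np.argsort(-scores)[:k]       # indices, highest score first
    return top_k, scores[top_k]

# Toy 4-dimensional embeddings standing in for a trained encoder's output.
rng = np.random.default_rng(0)
ad_embs = rng.normal(size=(1000, 4))
query_emb = rng.normal(size=4)

indices, scores = recall_top_k(query_emb, ad_embs, k=5)
```

In production the brute-force `ad_embs @ query_emb` scan is replaced by an approximate index (PQ, HNSW) so the scan cost does not grow linearly with the ad corpus.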

Overview: Multi-Stage Pipeline
- Recall: fast, accurate, comprehensive
- Rank: high-precision, KPI-oriented

Overview: Multi-Stage Pipeline (example)
- Query: "Microsoft game console"
- Relevant ads: "Xbox 360 4GB Slim Console", "Microsoft Xbox 360", "Microsoft Xbox 360 E 250GB", "Xbox 360 Game System HDMI"
- Vocabulary mismatch: the query shares almost no terms with the relevant ads, so purely lexical recall misses them

Overview: EBR, Pros and Cons
- Pros: highly generalizable and relatively fast; dense vectors are served from an ANN index (PQ, HNSW)
- Cons: model training is data-intensive
- Cons: embeddings can be ambiguous; e.g., for the query "the Xbox 360 console hard drive", neighbors such as "Nintendo Switch console" may rank alongside "Xbox 360 Game System HDMI" and "Hard drive will be at least 20GB; model supports HD graphics in 16x9 wide-screen, with anti-aliasing"
- Two desired properties of the embedding space: Alignment (positive pairs stay close) and Uniformity (embeddings spread out evenly)
- [1] SimCSE, Gao et al. [2] Understanding Contrastive Learning, Wang et al., ICML 2020

Algos: Negative Augmentation
- In-batch negatives (DPR, Karpukhin et al.): within a batch of query-passage pairs, each query keeps its one positive, and the other queries' positives are used as its negatives; a larger batch size yields higher accuracy
- Cross-device negative sampling (RocketQA, Ding et al.): expand the number of negatives by sharing them across devices Dev-1 ... Dev-N
- Virtual Differentiable Cross-Device Sharing, V-DCS (SoPQ, Xiao and Liu et al.): makes the cross-device shared values differentiable:
  1. Generate embeddings for each batch, one batch per device
  2. Broadcast the embeddings to all devices
  3. Compute the global NCE loss symmetrically on all devices, based on the broadcasted embeddings
  4. Back-propagate and reduce the gradients on all devices

Algos: Hard Negative Sampling
- Get hard negatives by sampling from ANN search: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval (ANCE, Xiong et al.):
  1. Learn the embedding model with in-batch negatives
  2. Build an ANN index and retrieve hard negatives from it
  3. Update the embedding model with the hard negatives
  4. Repeat steps 2 and 3 until convergence

Algos: Training as Distillation
- Train a teacher model with labeled data
- Annotate unlabeled data with the teacher: keywords are sorted by estimated relevance to produce weak-annotated labels
- Train the student with both the labeled and the weak-annotated data
- Inputs: Query, Keyword, Query + Keyword
- RocketQA, Ding et al.; Weak Annotation, Li et al.

Algos: Diversified Representation
- A user's history (web browsings) may consist of highly diverse events, so a single user embedding can be ambiguous when matched against targets (news/ads)
- Multi-Embedding Retrieval with a Bloom-filter style interest extractor: comprehensive, elastic, parameter-efficient (Octopus, Liu et al.):
  1. Generate item embeddings
  2. Compute each item's membership via learned hash functions
  3. Group items by their binary codes (e.g., 010000101000); with 3 hash classes and 4 hash functions each, there are 3^4 latent partitions
  4. Aggregate the items sharing a binary code into one user embedding, and retrieve candidates from the ANN index with each user embedding

Things beyond: Efficiency
- Desired properties of an ANN index (HNSW, PQ, ANNOY): accurate (high recall), fast (low latency), light (low memory cost)
- In practice there is often not enough budget to host the whole index in memory
- FAISS, Facebook AI Research
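The in-batch negative objective from the Negative Augmentation section (the DPR-style setup) can be sketched as a softmax cross-entropy over the batch similarity matrix, whose diagonal holds the positive pairs. The helper `in_batch_nce_loss` and the toy embeddings below are illustrative, not the papers' code.

```python
import numpy as np

def in_batch_nce_loss(q_embs, p_embs, temperature=0.05):
    """DPR-style in-batch negatives: for query i, passage i is the positive
    and every other passage in the batch serves as a negative."""
    sim = (q_embs @ p_embs.T) / temperature        # (B, B) similarity matrix
    sim = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # NLL of the positive pairs

rng = np.random.default_rng(1)
B, d = 8, 16
q = rng.normal(size=(B, d))
q /= np.linalg.norm(q, axis=1, keepdims=True)

# Positives built near their queries, vs. completely random "positives".
p_good = q + 0.1 * rng.normal(size=(B, d))
p_good /= np.linalg.norm(p_good, axis=1, keepdims=True)
p_rand = rng.normal(size=(B, d))
p_rand /= np.linalg.norm(p_rand, axis=1, keepdims=True)

loss_good = in_batch_nce_loss(q, p_good)   # small: the diagonal dominates
loss_rand = in_batch_nce_loss(q, p_rand)   # near log(B): no useful signal
```

The batch-size effect described on the slide follows directly: with batch size B, each query sees B-1 negatives at no extra encoding cost, which is exactly what cross-device sharing and V-DCS scale further.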

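The memory cost raised on the efficiency slide is what product quantization (PQ) addresses: each vector is split into sub-vectors, and each sub-vector is stored as the one-byte id of its nearest codebook centroid. This is a minimal NumPy sketch assuming plain k-means codebooks; a production system would use a FAISS PQ index rather than these hypothetical helpers.

```python
import numpy as np

def train_pq(X, m=4, k=16, iters=10, seed=0):
    """Train PQ codebooks: split vectors into m sub-vectors and run a
    small k-means (k centroids) independently in every subspace."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ds = d // m                                   # sub-vector length
    codebooks = []
    for j in range(m):
        sub = X[:, j * ds:(j + 1) * ds]
        cents = sub[rng.choice(n, k, replace=False)]
        for _ in range(iters):
            # assign each sub-vector to its nearest centroid
            d2 = ((sub[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
            assign = d2.argmin(1)
            for c in range(k):                    # recompute centroids
                if (assign == c).any():
                    cents[c] = sub[assign == c].mean(0)
        codebooks.append(cents)
    return codebooks

def pq_encode(X, codebooks):
    """Compress each vector to m one-byte centroid ids."""
    m = len(codebooks)
    ds = X.shape[1] // m
    codes = np.empty((X.shape[0], m), dtype=np.uint8)
    for j, cents in enumerate(codebooks):
        sub = X[:, j * ds:(j + 1) * ds]
        codes[:, j] = ((sub[:, None, :] - cents[None, :, :]) ** 2).sum(-1).argmin(1)
    return codes

def pq_decode(codes, codebooks):
    """Approximate reconstruction from centroid ids."""
    return np.hstack([codebooks[j][codes[:, j]] for j in range(len(codebooks))])

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 32))
books = train_pq(X)
codes = pq_encode(X, books)        # 4 bytes/vector instead of 32 floats
X_hat = pq_decode(codes, books)    # lossy reconstruction
```

Here 32 floats (128+ bytes) shrink to 4 bytes per vector, which is the "light, low memory cost" property the slide asks of an ANN index; the trade-off is the reconstruction error visible in `X_hat`.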