000048574 000__ 02393nam\a2200409\i\4500
000048574 001__ 48574
000048574 005__ 20240610132248.0
000048574 006__ m eo d
000048574 007__ cr bn |||m|||a
000048574 008__ 231029s2023\\\\sz\\\\\\\\\\\\000\0\eng\d
000048574 0247_ $$2doi$$a10.1016/j.wpi.2023.102192
000048574 035__ $$a(OCoLC)1411269862
000048574 040__ $$aSzGeWIPO$$beng$$erda$$cSzGeWIPO$$dCaBNVSL
000048574 041__ $$aeng
000048574 24500 $$aSEARCHFORMER :$$bSemantic patent embeddings by siamese transformers for prior art search.
000048574 264_1 $$aOxford [England] :$$bElsevier Ltd.,$$c2023
000048574 300__ $$a1 volume.
000048574 336__ $$atext$$2rdacontent
000048574 337__ $$acomputer$$2rdamedia
000048574 338__ $$aonline resource$$bcr$$2rdacarrier
000048574 4901_ $$aWorld Patent Information ;$$v73, June, 2023
000048574 520__ $$aThe identification of relevant prior art for patent applications is of key importance for the work of patent examiners. The recent advancements in the field of natural language processing in the form of language models such as BERT enable the creation of the next generation of prior art search tools. These models can generate vectorial representations of input text, enabling the use of vector similarity as proxy for semantic text similarity. We fine-tuned a patent-specific BERT model for prior art search on a large set of real-world examples of patent claims, corresponding passages prejudicing novelty or inventive step, and random text fragments, creating the SEARCHFORMER. We show in retrospective ranking experiments that our model is a real improvement. For this purpose, we compiled an evaluation collection comprising 2014 pairs of patent application and related potential prior art documents. We employed two representative baselines for comparison: (i) an optimized combination of automatically built queries and the BM25 ranking function, and (ii) several state-of-the-art language models, including SentenceTransformers optimized for semantic retrieval. Ranking performance was measured as rank of the first relevant result. Using t-tests, we show that the achieved ranking improvements of the SEARCHFORMER over the baselines are statistically significant.
000048574 542__ $$fhttps://www.sciencedirect.com/science/article/abs/pii/S0172219023000108
000048574 588__ $$aCrossref
000048574 590__ $$aPublished online: 5-Apr-23
000048574 650_0 $$aPatents$$xResearch.
000048574 650_0 $$aPatent searching.
000048574 650_0 $$aPatents.
000048574 650_0 $$aPrior art (Patent law)
000048574 650_0 $$aCopyright.
000048574 7001_ $$aVowinckel, Konrad,$$eauthor.
000048574 7001_ $$aHähnke, Volker D.,$$eauthor.
000048574 7730_ $$tWorld Patent Information$$wWPI
000048574 830_0 $$aWorld Patent Information ;$$v73, June, 2023.
000048574 85641 $$uhttps://doi.org/10.1016/j.wpi.2023.102192$$yonline version
000048574 904__ $$aArticle
000048574 980__ $$aWPI