On the Download page, we provide the research community with:
A download link for the Baidu-ULTR dataset and the pretrained language models;
Instructions on how to load the dataset and use the pretrained models.
On the Description page, we describe all the user behaviors and page display features in detail.
We welcome researchers to share the performance of their methods on Baidu-ULTR and to contribute to a rich leaderboard backed by open-source code.
Advanced Semantic Feature
We provide the original text of queries and documents after desensitization.
This makes it possible to construct advanced semantic features with pretrained language models.
We also provide a series of large-scale pretrained language models trained with the masked language modeling (MLM) paradigm.
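As a rough illustration of the MLM paradigm, the sketch below hides a random subset of tokens and keeps the originals as prediction targets. The `[MASK]` symbol, the 15% masking rate, and the function name `mask_tokens` are assumptions borrowed from common BERT-style practice, not details confirmed for the released models.

```python
import random

MASK = "[MASK]"  # assumed mask symbol; the released models may use a different one


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels holds the original token at masked positions and None elsewhere,
    so a model is trained to predict only the hidden tokens.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # hide the token from the model input
            labels.append(tok)    # keep it as the prediction target
        else:
            masked.append(tok)
            labels.append(None)   # no loss is computed at this position
    return masked, labels


if __name__ == "__main__":
    query = "what is the weather in beijing today".split()
    print(mask_tokens(query, mask_prob=0.3, seed=42))
```

Full BERT-style masking also replaces some selected tokens with random tokens or leaves them unchanged; that refinement is omitted here for brevity.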
Diverse Display Information
Unlike other datasets, which provide only the ranking position, Baidu-ULTR provides richer display information, including the displayed URL, the displayed title of the document, the displayed abstract of
the document, the category of the document, and the height of the SERP, enabling advanced study of biases other than those related to ranking position.
A detailed description can be found on the Description page.
Rich User Behaviors on Search Result Pages (SERP)
Both real-world user clicks and other user behaviors, e.g., skips, dwell time, and display time, have been recorded,
making it possible to optimize user engagement and to explore multi-task learning in ULTR.
A detailed description can be found on the Description page.
Large Dataset Scale
Baidu-ULTR has 1.2 billion search sessions for training, which is enough to train large-scale language models.
7,008 expert-annotated queries are available for evaluation, which is large enough to provide reliable performance estimates.
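To show how expert annotations can be used for evaluation, the sketch below computes DCG@k and its normalized form for one query, assuming each annotated document carries an integer relevance grade. The choice of metric, the function names, and the example grades are assumptions for illustration and are not taken from the dataset description.

```python
import math


def dcg_at_k(grades, k):
    """Discounted cumulative gain over the top-k grades, in ranked order."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(grades[:k]))


def ndcg_at_k(grades, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance grades of documents in the order a model ranked them.
    ranked_grades = [3, 0, 1]
    print(dcg_at_k(ranked_grades, 3), ndcg_at_k(ranked_grades, 3))
```

Averaging such a per-query score over all annotated queries gives a single number for comparing ranking models.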