Home

Basic Content

Download

On the Download page, we provide the research community with:
Download links for the Baidu-ULTR dataset and the pretrained language models;
Instructions on how to load the dataset and use the pretrained models.

Description

On the Description page, we describe all the user behaviors and page display features in detail.

Leaderboard

We welcome researchers to share the performance of their methods on Baidu-ULTR and to contribute to a rich leaderboard with open-source code.

Dataset Characteristics

Advanced Semantic Feature
We provide the original text of queries and documents after desensitization, which makes it possible to construct advanced semantic features with pretrained language models. We also provide a series of large-scale pretrained language models trained with the masked language modeling (MLM) paradigm.
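As a rough illustration of the MLM paradigm mentioned above, the sketch below masks a random subset of tokens and keeps the originals as prediction targets. It is not the training code behind the released models: the `[MASK]` symbol, the masking rate, the `mask_tokens` helper, and the example query and title are all assumptions made for illustration, and the desensitized text in the dataset will look different from the plain words used here.

```python
import random

MASK = "[MASK]"  # assumed mask symbol; the released models may use a different one


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace tokens with MASK.

    Returns (masked_tokens, labels). labels[i] holds the original token
    where position i was masked, and None everywhere else.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # the model sees the mask symbol
            labels.append(tok)    # and is trained to recover the original token
        else:
            masked.append(tok)
            labels.append(None)   # unmasked positions carry no training target
    return masked, labels


# Hypothetical query/title pair, joined into one token sequence.
tokens = "cheap flights [SEP] book cheap flights online".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

A model trained this way learns to predict each masked token from its context, and its hidden representations of a query and document can then serve as semantic features for ranking.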

Diverse Display Information
Unlike other datasets, which provide only the ranking position, Baidu-ULTR also records the displayed URL, the displayed title and abstract of each document, the category of the document, and the height of the SERP. This supports advanced study of biases beyond those related to ranking position. A detailed description can be found on the Description page.

Rich User Behaviors on Search Result Pages (SERP)
Real-world user clicks and other user behaviors, e.g., skips, dwelling time, and displayed time, are recorded, which makes it possible to optimize user engagement and to explore multi-task learning in ULTR. A detailed description can be found on the Description page.

Large Dataset Scale
Baidu-ULTR has 1.2 billion search sessions for training, enough to support training large-scale language models. 7,008 expert-annotated queries are available for evaluation, which is large enough to provide reliable performance estimates.

Reference

If you use this dataset or our reproduced results, please cite:

A Large Scale Search Dataset for Unbiased Learning to Rank
Lixin Zou, Haitao Mao, Xiaokai Chu, Jiliang Tang, Wenwen Ye, Shuaiqiang Wang, and Dawei Yin. (*: equal contributions)

The BibTeX information is as follows:
@inproceedings{
    zou2022large,
    title={A Large Scale Search Dataset for Unbiased Learning to Rank},
    author={Lixin Zou and Haitao Mao and Xiaokai Chu and Jiliang Tang and Wenwen Ye and Shuaiqiang Wang and Dawei Yin},
    booktitle={Advances in Neural Information Processing Systems},
    year={2022}
}

Baidu-ULTR dataset

Large-scale search data with real-world user feedback