---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- feature-extraction
- sentence-similarity
- mteb
- transformers
- transformers.js
model-index:
results:
type: Classification
dataset:
type: mteb/amazon_counterfactual
name: MTEB AmazonCounterfactualClassification (en)
config: en
split: test
revision: e8379541af4e31359cca9fbcf4b00f2671dba205
metrics:
value: 75.20895522388058
value: 38.57605549557802
value: 69.35586565857854
type: Classification
dataset:
type: mteb/amazon_polarity
name: MTEB AmazonPolarityClassification
config: default
split: test
revision: e2d317d38cd51312af73b3d32a06d1a08b442046
metrics:
value: 91.8144
value: 88.65222882032363
value: 91.80426301643274
type: Classification
dataset:
type: mteb/amazon_reviews_multi
name: MTEB AmazonReviewsClassification (en)
config: en
split: test
revision: 1399c76144fd37290681b995c656ef9b2e06e26d
metrics:
value: 47.162000000000006
value: 46.59329642263158
type: Retrieval
dataset:
type: arguana
name: MTEB ArguAna
config: default
split: test
revision: None
metrics:
value: 24.253
value: 38.962
value: 40.081
value: 40.089000000000006
value: 33.499
value: 36.351
value: 24.609
value: 39.099000000000004
value: 40.211000000000006
value: 40.219
value: 33.677
value: 36.469
value: 24.253
value: 48.010999999999996
value: 52.756
value: 52.964999999999996
value: 36.564
value: 41.711999999999996
value: 24.253
value: 7.738
value: 0.98
value: 0.1
value: 15.149000000000001
value: 11.593
value: 24.253
value: 77.383
value: 98.009
value: 99.644
value: 45.448
value: 57.965999999999994
type: Clustering
dataset:
type: mteb/arxiv-clustering-p2p
name: MTEB ArxivClusteringP2P
config: default
split: test
revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
metrics:
value: 45.69069567851087
type: Clustering
dataset:
type: mteb/arxiv-clustering-s2s
name: MTEB ArxivClusteringS2S
config: default
split: test
revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
metrics:
value: 36.35185490976283
type: Reranking
dataset:
type: mteb/askubuntudupquestions-reranking
name: MTEB AskUbuntuDupQuestions
config: default
split: test
revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
metrics:
value: 61.71274951450321
value: 76.06032625423207
type: STS
dataset:
type: mteb/biosses-sts
name: MTEB BIOSSES
config: default
split: test
revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
metrics:
value: 86.73980520022269
value: 84.24649792685918
value: 85.85197641158186
value: 84.24649792685918
value: 86.26809552711346
value: 84.56397504030865
type: Classification
dataset:
type: mteb/banking77
name: MTEB Banking77Classification
config: default
split: test
revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
metrics:
value: 84.25324675324674
value: 84.17872280892557
type: Clustering
dataset:
type: mteb/biorxiv-clustering-p2p
name: MTEB BiorxivClusteringP2P
config: default
split: test
revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
metrics:
value: 38.770253446400886
type: Clustering
dataset:
type: mteb/biorxiv-clustering-s2s
name: MTEB BiorxivClusteringS2S
config: default
split: test
revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
metrics:
value: 32.94307095497281
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackAndroidRetrieval
config: default
split: test
revision: None
metrics:
value: 32.164
value: 42.641
value: 43.947
value: 44.074999999999996
value: 39.592
value: 41.204
value: 39.628
value: 48.625
value: 49.368
value: 49.413000000000004
value: 46.400000000000006
value: 47.68
value: 39.628
value: 48.564
value: 53.507000000000005
value: 55.635999999999996
value: 44.471
value: 46.137
value: 39.628
value: 8.856
value: 1.429
value: 0.191
value: 21.268
value: 14.649000000000001
value: 32.164
value: 59.609
value: 80.521
value: 94.245
value: 46.521
value: 52.083999999999996
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackEnglishRetrieval
config: default
split: test
revision: None
metrics:
value: 31.526
value: 41.581
value: 42.815999999999995
value: 42.936
value: 38.605000000000004
value: 40.351
value: 39.489999999999995
value: 47.829
value: 48.512
value: 48.552
value: 45.754
value: 46.986
value: 39.489999999999995
value: 47.269
value: 51.564
value: 53.53099999999999
value: 43.301
value: 45.239000000000004
value: 39.489999999999995
value: 8.93
value: 1.415
value: 0.188
value: 20.892
value: 14.865999999999998
value: 31.526
value: 56.76
value: 75.029
value: 87.491
value: 44.786
value: 50.254
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackGamingRetrieval
config: default
split: test
revision: None
metrics:
value: 40.987
value: 52.827
value: 53.751000000000005
value: 53.81
value: 49.844
value: 51.473
value: 46.833999999999996
value: 56.389
value: 57.003
value: 57.034
value: 54.17999999999999
value: 55.486999999999995
value: 46.833999999999996
value: 58.372
value: 62.068
value: 63.288
value: 53.400000000000006
value: 55.766000000000005
value: 46.833999999999996
value: 9.191
value: 1.192
value: 0.134
value: 23.448
value: 15.862000000000002
value: 40.987
value: 71.146
value: 87.035
value: 95.633
value: 58.025999999999996
value: 63.815999999999995
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackGisRetrieval
config: default
split: test
revision: None
metrics:
value: 24.587
value: 33.114
value: 34.043
value: 34.123999999999995
value: 30.45
value: 31.813999999999997
value: 26.554
value: 35.148
value: 35.926
value: 35.991
value: 32.599000000000004
value: 33.893
value: 26.554
value: 38.132
value: 42.78
value: 44.919
value: 32.833
value: 35.168
value: 26.554
value: 5.921
value: 0.8659999999999999
value: 0.109
value: 13.861
value: 9.605
value: 24.587
value: 51.690000000000005
value: 73.428
value: 89.551
value: 37.336999999999996
value: 43.047000000000004
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackMathematicaRetrieval
config: default
split: test
revision: None
metrics:
value: 16.715
value: 24.251
value: 25.326999999999998
value: 25.455
value: 21.912000000000003
value: 23.257
value: 20.274
value: 28.552
value: 29.42
value: 29.497
value: 26.14
value: 27.502
value: 20.274
value: 29.088
value: 34.293
value: 37.271
value: 24.708
value: 26.809
value: 20.274
value: 5.361
value: 0.915
value: 0.13
value: 11.733
value: 8.556999999999999
value: 16.715
value: 39.587
value: 62.336000000000006
value: 83.453
value: 27.839999999999996
value: 32.952999999999996
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackPhysicsRetrieval
config: default
split: test
revision: None
metrics:
value: 28.793000000000003
value: 38.582
value: 39.881
value: 39.987
value: 35.851
value: 37.289
value: 34.455999999999996
value: 43.909
value: 44.74
value: 44.786
value: 41.659
value: 43.010999999999996
value: 34.455999999999996
value: 44.266
value: 49.639
value: 51.644
value: 39.865
value: 41.887
value: 34.455999999999996
value: 7.843999999999999
value: 1.243
value: 0.158
value: 18.831999999999997
value: 13.147
value: 28.793000000000003
value: 55.68300000000001
value: 77.99000000000001
value: 91.183
value: 43.293
value: 48.618
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackProgrammersRetrieval
config: default
split: test
revision: None
metrics:
value: 25.907000000000004
value: 35.519
value: 36.806
value: 36.912
value: 32.748
value: 34.232
value: 31.621
value: 40.687
value: 41.583
value: 41.638999999999996
value: 38.527
value: 39.612
value: 31.621
value: 41.003
value: 46.617999999999995
value: 48.82
value: 36.542
value: 38.368
value: 31.621
value: 7.396999999999999
value: 1.191
value: 0.153
value: 17.39
value: 12.1
value: 25.907000000000004
value: 52.115
value: 76.238
value: 91.218
value: 39.417
value: 44.435
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackRetrieval
config: default
split: test
revision: None
metrics:
value: 25.732166666666668
value: 34.51616666666667
value: 35.67241666666666
value: 35.78675
value: 31.953416666666662
value: 33.333
value: 30.300166666666673
value: 38.6255
value: 39.46183333333334
value: 39.519999999999996
value: 36.41299999999999
value: 37.6365
value: 30.300166666666673
value: 39.61466666666667
value: 44.60808333333334
value: 46.91708333333334
value: 35.26558333333333
value: 37.220000000000006
value: 30.300166666666673
value: 6.837416666666667
value: 1.10425
value: 0.14875
value: 16.13716666666667
value: 11.2815
value: 25.732166666666668
value: 50.578916666666665
value: 72.42183333333334
value: 88.48766666666667
value: 38.41325
value: 43.515750000000004
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackStatsRetrieval
config: default
split: test
revision: None
metrics:
value: 23.951
value: 30.974
value: 31.804
value: 31.900000000000002
value: 28.762
value: 29.94
value: 26.534000000000002
value: 33.553
value: 34.297
value: 34.36
value: 31.391000000000002
value: 32.525999999999996
value: 26.534000000000002
value: 35.112
value: 39.28
value: 41.723
value: 30.902
value: 32.759
value: 26.534000000000002
value: 5.445
value: 0.819
value: 0.11
value: 12.986
value: 9.049
value: 23.951
value: 45.24
value: 64.12299999999999
value: 82.28999999999999
value: 33.806000000000004
value: 38.277
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackTexRetrieval
config: default
split: test
revision: None
metrics:
value: 16.829
value: 23.684
value: 24.683
value: 24.81
value: 21.554000000000002
value: 22.768
value: 20.096
value: 27.230999999999998
value: 28.083999999999996
value: 28.166000000000004
value: 25.212
value: 26.32
value: 20.096
value: 27.989000000000004
value: 32.847
value: 35.896
value: 24.116
value: 25.964
value: 20.096
value: 5
value: 0.8750000000000001
value: 0.131
value: 11.207
value: 8.08
value: 16.829
value: 37.407000000000004
value: 59.101000000000006
value: 81.024
value: 26.739
value: 31.524
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackUnixRetrieval
config: default
split: test
revision: None
metrics:
value: 24.138
value: 32.275999999999996
value: 33.416000000000004
value: 33.527
value: 29.854000000000003
value: 31.096
value: 28.450999999999997
value: 36.214
value: 37.134
value: 37.198
value: 34.001999999999995
value: 35.187000000000005
value: 28.450999999999997
value: 37.166
value: 42.454
value: 44.976
value: 32.796
value: 34.631
value: 28.450999999999997
value: 6.241
value: 0.9950000000000001
value: 0.133
value: 14.801
value: 10.280000000000001
value: 24.138
value: 48.111
value: 71.245
value: 88.986
value: 36.119
value: 40.846
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackWebmastersRetrieval
config: default
split: test
revision: None
metrics:
value: 23.244
value: 31.227
value: 33.007
value: 33.223
value: 28.924
value: 30.017
value: 27.668
value: 35.524
value: 36.699
value: 36.759
value: 33.366
value: 34.552
value: 27.668
value: 36.381
value: 43.062
value: 45.656
value: 32.501999999999995
value: 34.105999999999995
value: 27.668
value: 6.798
value: 1.492
value: 0.234
value: 15.152
value: 10.791
value: 23.244
value: 45.979
value: 74.822
value: 91.078
value: 34.925
value: 39.126
type: Retrieval
dataset:
type: BeIR/cqadupstack
name: MTEB CQADupstackWordpressRetrieval
config: default
split: test
revision: None
metrics:
value: 19.945
value: 27.517999999999997
value: 28.588
value: 28.682000000000002
value: 25.345000000000002
value: 26.555
value: 21.996
value: 29.845
value: 30.775999999999996
value: 30.845
value: 27.726
value: 28.882
value: 21.996
value: 32.034
value: 37.185
value: 39.645
value: 27.750999999999998
value: 29.805999999999997
value: 21.996
value: 5.065
value: 0.819
value: 0.11399999999999999
value: 12.076
value: 8.392
value: 19.945
value: 43.62
value: 67.194
value: 85.7
value: 32.15
value: 37.208999999999996
type: Retrieval
dataset:
type: climate-fever
name: MTEB ClimateFEVER
config: default
split: test
revision: None
metrics:
value: 18.279
value: 31.052999999999997
value: 33.125
value: 33.306000000000004
value: 26.208
value: 28.857
value: 42.671
value: 54.557
value: 55.142
value: 55.169000000000004
value: 51.488
value: 53.439
value: 42.671
value: 41.276
value: 48.376000000000005
value: 51.318
value: 35.068
value: 37.242
value: 42.671
value: 12.638
value: 2.045
value: 0.26
value: 26.08
value: 19.805
value: 18.279
value: 46.946
value: 70.97200000000001
value: 87.107
value: 31.147999999999996
value: 38.099
type: Retrieval
dataset:
type: dbpedia-entity
name: MTEB DBPedia
config: default
split: test
revision: None
metrics:
value: 8.573
value: 19.747
value: 28.205000000000002
value: 29.831000000000003
value: 14.109
value: 16.448999999999998
value: 71
value: 77.68599999999999
value: 77.995
value: 78.00200000000001
value: 76.292
value: 77.029
value: 59.12500000000001
value: 43.9
value: 47.863
value: 54.848
value: 49.803999999999995
value: 46.317
value: 71
value: 34.4
value: 11.063
value: 1.989
value: 52.333
value: 43.7
value: 8.573
value: 25.615
value: 53.385000000000005
value: 75.46000000000001
value: 15.429
value: 19.357
type: Classification
dataset:
type: mteb/emotion
name: MTEB EmotionClassification
config: default
split: test
revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
metrics:
value: 47.989999999999995
value: 42.776314451497555
type: Retrieval
dataset:
type: fever
name: MTEB FEVER
config: default
split: test
revision: None
metrics:
value: 74.13499999999999
value: 82.825
value: 83.096
value: 83.111
value: 81.748
value: 82.446
value: 79.553
value: 86.654
value: 86.774
value: 86.778
value: 85.981
value: 86.462
value: 79.553
value: 86.345
value: 87.32
value: 87.58200000000001
value: 84.719
value: 85.677
value: 79.553
value: 10.402000000000001
value: 1.1119999999999999
value: 0.11499999999999999
value: 32.413
value: 20.138
value: 74.13499999999999
value: 93.215
value: 97.083
value: 98.732
value: 88.79
value: 91.259
type: Retrieval
dataset:
type: fiqa
name: MTEB FiQA2018
config: default
split: test
revision: None
metrics:
value: 18.298000000000002
value: 29.901
value: 31.528
value: 31.713
value: 25.740000000000002
value: 28.227999999999998
value: 36.728
value: 45.401
value: 46.27
value: 46.315
value: 42.978
value: 44.29
value: 36.728
value: 37.456
value: 43.832
value: 47
value: 33.694
value: 35.085
value: 36.728
value: 10.386
value: 1.701
value: 0.22599999999999998
value: 22.479
value: 16.605
value: 18.298000000000002
value: 44.369
value: 68.098
value: 87.21900000000001
value: 30.215999999999998
value: 36.861
type: Retrieval
dataset:
type: hotpotqa
name: MTEB HotpotQA
config: default
split: test
revision: None
metrics:
value: 39.568
value: 65.061
value: 65.896
value: 65.95100000000001
value: 61.831
value: 63.849000000000004
value: 79.136
value: 84.58200000000001
value: 84.765
value: 84.772
value: 83.684
value: 84.223
value: 79.136
value: 72.622
value: 75.539
value: 76.613
value: 68.065
value: 70.58
value: 79.136
value: 15.215
value: 1.7500000000000002
value: 0.189
value: 44.011
value: 28.388999999999996
value: 39.568
value: 76.077
value: 87.481
value: 94.56400000000001
value: 66.01599999999999
value: 70.97200000000001
type: Classification
dataset:
type: mteb/imdb
name: MTEB ImdbClassification
config: default
split: test
revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
metrics:
value: 85.312
value: 80.36296867333715
value: 85.26613311552218
type: Retrieval
dataset:
type: msmarco
name: MTEB MSMARCO
config: default
split: dev
revision: None
metrics:
value: 23.363999999999997
value: 35.711999999999996
value: 36.876999999999995
value: 36.923
value: 32.034
value: 34.159
value: 24.04
value: 36.345
value: 37.441
value: 37.480000000000004
value: 32.713
value: 34.824
value: 24.026
value: 42.531
value: 48.081
value: 49.213
value: 35.044
value: 38.834
value: 24.026
value: 6.622999999999999
value: 0.941
value: 0.104
value: 14.909
value: 10.871
value: 23.363999999999997
value: 63.426
value: 88.96300000000001
value: 97.637
value: 43.095
value: 52.178000000000004
type: Classification
dataset:
type: mteb/mtop_domain
name: MTEB MTOPDomainClassification (en)
config: en
split: test
revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
metrics:
value: 93.0095759233926
value: 92.78387794667408
type: Classification
dataset:
type: mteb/mtop_intent
name: MTEB MTOPIntentClassification (en)
config: en
split: test
revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
metrics:
value: 75.0296397628819
value: 58.45699589820874
type: Classification
dataset:
type: mteb/amazon_massive_intent
name: MTEB MassiveIntentClassification (en)
config: en
split: test
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
metrics:
value: 73.45662407531944
value: 71.42364781421813
type: Classification
dataset:
type: mteb/amazon_massive_scenario
name: MTEB MassiveScenarioClassification (en)
config: en
split: test
revision: 7d571f92784cd94a019292a1f45445077d0ef634
metrics:
value: 77.07800941492937
value: 77.22799045640845
type: Clustering
dataset:
type: mteb/medrxiv-clustering-p2p
name: MTEB MedrxivClusteringP2P
config: default
split: test
revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
metrics:
value: 34.531234379250606
type: Clustering
dataset:
type: mteb/medrxiv-clustering-s2s
name: MTEB MedrxivClusteringS2S
config: default
split: test
revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
metrics:
value: 30.941490381193802
type: Reranking
dataset:
type: mteb/mind_small
name: MTEB MindSmallReranking
config: default
split: test
revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
metrics:
value: 30.3115090856725
value: 31.290667638675757
type: Retrieval
dataset:
type: nfcorpus
name: MTEB NFCorpus
config: default
split: test
revision: None
metrics:
at
1
value: 5.465
at
10
value: 13.03
at
100
value: 16.057
at
1000
value: 17.49
at
3
value: 9.553
at
5
value: 11.204
at
1
value: 43.653
at
10
value: 53.269
at
100
value: 53.72
at
1000
value: 53.761
at
3
value: 50.929
at
5
value: 52.461
at
1
value: 42.26
at
10
value: 34.673
at
100
value: 30.759999999999998
at
1000
value: 39.728
at
3
value: 40.349000000000004
at
5
value: 37.915
at
1
value: 43.653
at
10
value: 25.789
at
100
value: 7.754999999999999
at
1000
value: 2.07
at
3
value: 38.596000000000004
at
5
value: 33.251
at
1
value: 5.465
at
10
value: 17.148
at
100
value: 29.768
at
1000
value: 62.239
at
3
value: 10.577
at
5
value: 13.315
type: Retrieval
dataset:
type: nq
name: MTEB NQ
config: default
split: test
revision: None
metrics:
at
1
value: 37.008
at
10
value: 52.467
at
100
value: 53.342999999999996
at
1000
value: 53.366
at
3
value: 48.412
at
5
value: 50.875
at
1
value: 41.541
at
10
value: 54.967
at
100
value: 55.611
at
1000
value: 55.627
at
3
value: 51.824999999999996
at
5
value: 53.763000000000005
at
1
value: 41.541
at
10
value: 59.724999999999994
at
100
value: 63.38700000000001
at
1000
value: 63.883
at
3
value: 52.331
at
5
value: 56.327000000000005
at
1
value: 41.541
at
10
value: 9.447
at
100
value: 1.1520000000000001
at
1000
value: 0.12
at
3
value: 23.262
at
5
value: 16.314999999999998
at
1
value: 37.008
at
10
value: 79.145
at
100
value: 94.986
at
1000
value: 98.607
at
3
value: 60.277
at
5
value: 69.407
type: Retrieval
dataset:
type: quora
name: MTEB QuoraRetrieval
config: default
split: test
revision: None
metrics:
at
1
value: 70.402
at
10
value: 84.181
at
100
value: 84.796
at
1000
value: 84.81400000000001
at
3
value: 81.209
at
5
value: 83.085
at
1
value: 81.02000000000001
at
10
value: 87.263
at
100
value: 87.36
at
1000
value: 87.36
at
3
value: 86.235
at
5
value: 86.945
at
1
value: 81.01
at
10
value: 87.99900000000001
at
100
value: 89.217
at
1000
value: 89.33
at
3
value: 85.053
at
5
value: 86.703
at
1
value: 81.01
at
10
value: 13.336
at
100
value: 1.52
at
1000
value: 0.156
at
3
value: 37.14
at
5
value: 24.44
at
1
value: 70.402
at
10
value: 95.214
at
100
value: 99.438
at
1000
value: 99.928
at
3
value: 86.75699999999999
at
5
value: 91.44099999999999
type: Clustering
dataset:
type: mteb/reddit-clustering
name: MTEB RedditClustering
config: default
split: test
revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
metrics:
measure
value: 56.51721502758904
type: Clustering
dataset:
type: mteb/reddit-clustering-p2p
name: MTEB RedditClusteringP2P
config: default
split: test
revision: 282350215ef01743dc01b456c7f5241fa8937f16
metrics:
value: 61.054808572333016
type: Retrieval
dataset:
type: scidocs
name: MTEB SCIDOCS
config: default
split: test
revision: None
metrics:
value: 4.578
value: 11.036999999999999
value: 12.879999999999999
value: 13.150999999999998
value: 8.133
value: 9.559
value: 22.6
value: 32.68
value: 33.789
value: 33.854
value: 29.7
value: 31.480000000000004
value: 22.6
value: 18.616
value: 25.883
value: 30.944
value: 18.136
value: 15.625
value: 22.6
value: 9.48
value: 1.991
value: 0.321
value: 16.8
value: 13.54
value: 4.578
value: 19.213
value: 40.397
value: 65.2
value: 10.208
value: 13.718
type: STS
dataset:
type: mteb/sickr-sts
name: MTEB SICK-R
config: default
split: test
revision: a6ea5a8cab320b040a23452cc28066d9beae2cee
metrics:
value: 83.44288351714071
value: 79.37995604564952
value: 81.1078874670718
value: 79.37995905980499
value: 81.03697527288986
value: 79.33490235296236
type: STS
dataset:
type: mteb/sts12-sts
name: MTEB STS12
config: default
split: test
revision: a0d554a64d88156834ff5ae9920b964011b16384
metrics:
value: 84.95557650436523
value: 78.5190672399868
value: 81.58064025904707
value: 78.5190672399868
value: 81.52857930619889
value: 78.50421361308034
type: STS
dataset:
type: mteb/sts13-sts
name: MTEB STS13
config: default
split: test
revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
metrics:
value: 84.79128416228737
value: 86.05402451477147
value: 85.46280267054289
value: 86.05402451477147
value: 85.46278563858236
value: 86.08079590861004
type: STS
dataset:
type: mteb/sts14-sts
name: MTEB STS14
config: default
split: test
revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
metrics:
value: 83.20623089568763
value: 81.53786907061009
value: 82.82272250091494
value: 81.53786907061009
value: 82.78850494027013
value: 81.5135618083407
type: STS
dataset:
type: mteb/sts15-sts
name: MTEB STS15
config: default
split: test
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
metrics:
value: 85.46366618397936
value: 86.96566013336908
value: 86.62651697548931
value: 86.96565526364454
value: 86.58812160258009
value: 86.9336484321288
type: STS
dataset:
type: mteb/sts16-sts
name: MTEB STS16
config: default
split: test
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
metrics:
value: 82.51858358641559
value: 84.7652527954999
value: 84.23914783766861
value: 84.7652527954999
value: 84.22749648503171
value: 84.74527996746386
type: STS
dataset:
type: mteb/sts17-crosslingual-sts
name: MTEB STS17 (en-en)
config: en-en
split: test
revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
metrics:
value: 87.28026563313065
value: 87.46928143824915
value: 88.30558762000372
value: 87.46928143824915
value: 88.10513330809331
value: 87.21069787834173
type: STS
dataset:
type: mteb/sts22-crosslingual-sts
name: MTEB STS22 (en)
config: en
split: test
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
metrics:
value: 62.376497134587375
value: 65.0159550112516
value: 65.64572120879598
value: 65.0159550112516
value: 65.88143604989976
value: 65.17547297222434
type: STS
dataset:
type: mteb/stsbenchmark-sts
name: MTEB STSBenchmark
config: default
split: test
revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
metrics:
value: 84.22876368947644
value: 85.46935577445318
value: 85.32830231392005
value: 85.46935577445318
value: 85.30353211758495
value: 85.42821085956945
type: Reranking
dataset:
type: mteb/scidocs-reranking
name: MTEB SciDocsRR
config: default
split: test
revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
metrics:
value: 80.60986667767133
value: 94.29432314236236
type: Retrieval
dataset:
type: scifact
name: MTEB SciFact
config: default
split: test
revision: None
metrics:
value: 54.528
value: 65.187
value: 65.62599999999999
value: 65.657
value: 62.352
value: 64.025
value: 57.333
value: 66.577
value: 66.88
value: 66.908
value: 64.556
value: 65.739
value: 57.333
value: 70.275
value: 72.136
value: 72.963
value: 65.414
value: 67.831
value: 57.333
value: 9.5
value: 1.057
value: 0.11199999999999999
value: 25.778000000000002
value: 17.2
value: 54.528
value: 84.356
value: 92.833
value: 99.333
value: 71.283
value: 77.14999999999999
type: PairClassification
dataset:
type: mteb/sprintduplicatequestions-pairclassification
name: MTEB SprintDuplicateQuestions
config: default
split: test
revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
metrics:
value: 99.74158415841585
value: 92.90048959850317
value: 86.35650810245687
value: 90.4709748083242
value: 82.6
value: 99.74158415841585
value: 92.90048959850317
value: 86.35650810245687
value: 90.4709748083242
value: 82.6
value: 99.74158415841585
value: 92.90048959850317
value: 86.35650810245687
value: 90.4709748083242
value: 82.6
value: 99.74158415841585
value: 92.87344692947894
value: 86.38497652582159
value: 90.29443838604145
value: 82.8
value: 99.74158415841585
value: 92.90048959850317
value: 86.38497652582159
type: Clustering
dataset:
type: mteb/stackexchange-clustering
name: MTEB StackExchangeClustering
config: default
split: test
revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
metrics:
value: 63.191648770424216
type: Clustering
dataset:
type: mteb/stackexchange-clustering-p2p
name: MTEB StackExchangeClusteringP2P
config: default
split: test
revision: 815ca46b2622cec33ccafc3735d572c266efdb44
metrics:
value: 34.02944668730218
type: Reranking
dataset:
type: mteb/stackoverflowdupquestions-reranking
name: MTEB StackOverflowDupQuestions
config: default
split: test
revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
metrics:
value: 50.466386167525265
value: 51.19071492233257
type: Summarization
dataset:
type: mteb/summeval
name: MTEB SummEval
config: default
split: test
revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
metrics:
value: 30.198022505886435
value: 30.40170257939193
value: 30.198015316402614
value: 30.40170257939193
type: Retrieval
dataset:
type: trec-covid
name: MTEB TRECCOVID
config: default
split: test
revision: None
metrics:
value: 0.242
value: 2.17
value: 12.221
value: 28.63
value: 0.728
value: 1.185
value: 94
value: 97
value: 97
value: 97
value: 97
value: 97
value: 89
value: 82.30499999999999
value: 61.839999999999996
value: 53.381
value: 88.877
value: 86.05199999999999
value: 94
value: 87
value: 63.38
value: 23.498
value: 94
value: 92
value: 0.242
value: 2.302
value: 14.979000000000001
value: 49.638
value: 0.753
value: 1.226
type: Retrieval
dataset:
type: webis-touche2020
name: MTEB Touche2020
config: default
split: test
revision: None
metrics:
value: 3.006
value: 11.805
value: 18.146
value: 19.788
value: 5.914
value: 8.801
value: 40.816
value: 56.36600000000001
value: 56.721999999999994
value: 56.721999999999994
value: 52.041000000000004
value: 54.796
value: 37.755
value: 29.863
value: 39.571
value: 51.385999999999996
value: 32.578
value: 32.351
value: 40.816
value: 26.531
value: 7.796
value: 1.555
value: 32.653
value: 33.061
value: 3.006
value: 18.738
value: 48.058
value: 83.41300000000001
value: 7.166
value: 12.102
type: Classification
dataset:
type: mteb/toxic_conversations_50k
name: MTEB ToxicConversationsClassification
config: default
split: test
revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c
metrics:
value: 71.4178
value: 14.648781342150446
value: 55.07299194946378
type: Classification
dataset:
type: mteb/tweet_sentiment_extraction
name: MTEB TweetSentimentExtractionClassification
config: default
split: test
revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
metrics:
value: 60.919637804187886
value: 61.24122013967399
type: Clustering
dataset:
type: mteb/twentynewsgroups-clustering
name: MTEB TwentyNewsgroupsClustering
config: default
split: test
revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
metrics:
measure
value: 49.207896583685695
type: PairClassification
dataset:
type: mteb/twittersemeval2015-pairclassification
name: MTEB TwitterSemEval2015
config: default
split: test
revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
metrics:
value: 86.23114978840078
value: 74.26624727825818
value: 68.72377190817083
value: 64.56400742115028
value: 73.45646437994723
value: 86.23114978840078
value: 74.26624032659652
value: 68.72377190817083
value: 64.56400742115028
value: 73.45646437994723
value: 86.23114978840078
value: 74.26624714480556
value: 68.72377190817083
value: 64.56400742115028
value: 73.45646437994723
value: 86.16558383501221
value: 74.2091943976357
value: 68.64221520524654
value: 63.59135913591359
value: 74.5646437994723
value: 86.23114978840078
value: 74.26624727825818
value: 68.72377190817083
type: PairClassification
dataset:
type: mteb/twitterurlcorpus-pairclassification
name: MTEB TwitterURLCorpus
config: default
split: test
revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
metrics:
value: 89.3681841114604
value: 86.65166387498546
value: 79.02581944698774
value: 75.35796605434099
value: 83.06898675700647
value: 89.3681841114604
value: 86.65166019802056
value: 79.02581944698774
value: 75.35796605434099
value: 83.06898675700647
value: 89.3681841114604
value: 86.65166462876266
value: 79.02581944698774
value: 75.35796605434099
value: 83.06898675700647
value: 89.36624364497226
value: 86.65076471274106
value: 79.07408783532733
value: 76.41102972856527
value: 81.92947336002464
value: 89.3681841114604
value: 86.65166462876266
value: 79.07408783532733
license: apache-2.0
language:
- en
---
nomic-embed-text-v1.5: Resizable Production Embeddings with Matryoshka Representation Learning
Blog | Technical Report | AWS SageMaker | Nomic Platform
Exciting update: nomic-embed-text-v1.5 is now multimodal! nomic-embed-vision-v1.5 is aligned to the embedding space of nomic-embed-text-v1.5, so any nomic-embed-text-v1.5 text embedding can be compared directly with image embeddings from nomic-embed-vision-v1.5.
Usage
Important: the text prompt must include a task instruction prefix that tells the model which task is being performed.
For example, if you are implementing a RAG application, embed your documents as "search_document: <text here>" and your user queries as "search_query: <text here>".
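Because the prefixes are plain strings prepended to each input, it can help to add them in one place. The helper below is a hypothetical sketch (with_prefix and TASK_PREFIXES are illustrative names, not part of any library); it only builds the strings that you would then pass to model.encode.

```python
# Hypothetical helper (not part of nomic-embed or sentence-transformers):
# prepend the task instruction prefix that nomic-embed-text-v1.5 expects.
TASK_PREFIXES = ("search_document", "search_query", "clustering", "classification")


def with_prefix(task: str, texts: list[str]) -> list[str]:
    """Return each text formatted as '<task>: <text>'."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task prefix: {task!r}")
    return [f"{task}: {text}" for text in texts]


docs = with_prefix("search_document", ["TSNE is a dimensionality reduction algorithm"])
queries = with_prefix("search_query", ["What is TSNE?"])
print(docs)     # ['search_document: TSNE is a dimensionality reduction algorithm']
print(queries)  # ['search_query: What is TSNE?']
```

Validating the task name up front catches typos such as "search_documents", which would otherwise be embedded silently as ordinary text.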
Note: starting with transformers v5.5.0 and Sentence Transformers v5.3.0, trust_remote_code=True is no longer necessary. For now, this applies only to the text-only models in the series.
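One way to act on this note is to pass trust_remote_code=True only on older installs. The sketch below is hypothetical: needs_trust_remote_code and MIN_NATIVE_VERSION are illustrative names, and the thresholds are simply the versions quoted in the note (transformers for AutoModel, sentence-transformers for SentenceTransformer). It does not query the Hub or inspect the libraries themselves.

```python
# Hypothetical helper: minimum versions at which, per the note above,
# trust_remote_code=True is no longer needed for the text-only models.
MIN_NATIVE_VERSION = {
    "transformers": (5, 5, 0),
    "sentence-transformers": (5, 3, 0),
}


def parse_version(version: str) -> tuple[int, ...]:
    """Parse leading numeric components; parsing stops at the first non-numeric one."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def needs_trust_remote_code(library: str, version: str) -> bool:
    """True if `version` of `library` predates native support."""
    return parse_version(version) < MIN_NATIVE_VERSION[library]


print(needs_trust_remote_code("transformers", "4.44.2"))          # True
print(needs_trust_remote_code("sentence-transformers", "5.3.1"))  # False
```

For example, you might pass trust_remote_code=needs_trust_remote_code("sentence-transformers", sentence_transformers.__version__) to the SentenceTransformer constructor. Pre-release strings such as "5.3.0rc1" stop parsing at "0rc1" and are conservatively treated as older.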
Task instruction prefixes
search_document
Purpose: embed texts as documents from a dataset
This prefix is used for embedding texts as documents, for example as documents for a RAG index.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5")
# Documents are embedded with the "search_document: " prefix
sentences = ['search_document: TSNE is a dimensionality reduction algorithm created by Laurens van der Maaten']
embeddings = model.encode(sentences)
print(embeddings)
```
search_query
Purpose: embed texts as questions to answer
This prefix is used for embedding texts as questions that documents from a dataset could resolve, for example as queries to be answered by a RAG application.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5")
# Queries are embedded with the "search_query: " prefix
sentences = ['search_query: Who is Laurens van der Maaten?']
embeddings = model.encode(sentences)
print(embeddings)
```
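Once documents and queries are embedded with their respective prefixes, retrieval reduces to ranking documents by cosine similarity to the query embedding. The sketch below assumes you already have the vectors as NumPy arrays (for example from model.encode); rank_documents is a hypothetical helper, and the 3-dimensional vectors are toy stand-ins, not real model output.

```python
import numpy as np


def rank_documents(query_emb: np.ndarray, doc_embs: np.ndarray) -> np.ndarray:
    """Return document indices sorted by cosine similarity to the query (best first)."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q  # one cosine similarity per document
    return np.argsort(-scores)


# Toy vectors standing in for model.encode(...) output.
query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.0, 1.0, 0.0],  # unrelated
    [0.9, 0.1, 0.0],  # close to the query
    [0.5, 0.5, 0.0],  # partially related
])
print(rank_documents(query, docs))  # [1 2 0]
```

If the embeddings are already L2-normalized, the two normalization lines are redundant and the score is a plain dot product.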
clustering
Purpose: embed texts to group them into clusters
This prefix is used for embedding texts in order to group them into clusters, discover common topics, or remove semantic duplicates.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5")
sentences = ['clustering: the quick brown fox']
embeddings = model.encode(sentences)
print(embeddings)
```
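For the "remove semantic duplicates" use case, one simple approach (swapped in here for illustration; this card does not prescribe a method) is a greedy pass that drops any text whose embedding is too similar to one already kept. deduplicate and the 0.95 cosine threshold are assumptions to tune on your own data, and the 2-dimensional vectors are toy stand-ins.

```python
import numpy as np


def deduplicate(embeddings: np.ndarray, threshold: float = 0.95) -> list[int]:
    """Greedily keep items whose cosine similarity to every kept item is below threshold."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept: list[int] = []
    for i, vec in enumerate(unit):
        # all() over an empty list is True, so the first item is always kept
        if all(float(vec @ unit[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy vectors standing in for embeddings of "clustering: ..." texts.
embs = np.array([
    [1.0, 0.0],
    [0.99, 0.01],  # near-duplicate of item 0
    [0.0, 1.0],
])
print(deduplicate(embs))  # [0, 2]
```

This pass is quadratic in the number of kept items; for large corpora a clustering algorithm or an approximate nearest-neighbour index is the usual next step.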
classification
Purpose: embed texts to classify them
This prefix is used for embedding texts into vectors that will be used as features for a classification model.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5")
sentences = ['classification: the quick brown fox']
embeddings = model.encode(sentences)
print(embeddings)
```
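Embeddings produced with the classification prefix can feed any standard classifier. As a minimal, dependency-light sketch, the code below uses a nearest-centroid classifier (a swapped-in technique; logistic regression is a common alternative). fit_centroids, predict, and the toy vectors and labels are hypothetical.

```python
import numpy as np


def fit_centroids(embeddings: np.ndarray, labels: list[str]) -> dict[str, np.ndarray]:
    """Average the L2-normalized embeddings of each label into a unit-length centroid."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroids = {}
    for label in sorted(set(labels)):
        mask = np.array([lab == label for lab in labels])
        c = unit[mask].mean(axis=0)
        centroids[label] = c / np.linalg.norm(c)
    return centroids


def predict(centroids: dict[str, np.ndarray], embedding: np.ndarray) -> str:
    """Return the label whose centroid has the highest cosine similarity."""
    e = embedding / np.linalg.norm(embedding)
    return max(centroids, key=lambda label: float(centroids[label] @ e))


# Toy vectors standing in for embeddings of "classification: ..." texts.
train = np.array([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]])
labels = ["animal", "animal", "vehicle", "vehicle"]
centroids = fit_centroids(train, labels)
print(predict(centroids, np.array([0.8, 0.2])))  # animal
```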
Sentence Transformers
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
matryoshka_dim = 512
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5")
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
embeddings = model.encode(sentences, convert_to_tensor=True)
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
embeddings = embeddings[:, :matryoshka_dim]
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
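To make the three post-processing lines explicit, here is a NumPy sketch of the same computation. It assumes `F.layer_norm`'s defaults (no learned weight or bias, `eps=1e-5`, biased variance). Layer norm is applied to the full 768-dimensional vector before truncation, matching the order in the example; random numbers stand in for `model.encode` output.

```python
import numpy as np


def matryoshka_postprocess(embeddings, dim, eps=1e-5):
    """Layer norm without scale/shift, truncate to the first `dim` components, then L2-normalize."""
    mean = embeddings.mean(axis=1, keepdims=True)
    var = embeddings.var(axis=1, keepdims=True)  # biased variance, as in F.layer_norm
    normed = (embeddings - mean) / np.sqrt(var + eps)
    truncated = normed[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)


rng = np.random.default_rng(0)
full = rng.normal(size=(2, 768))  # stand-in for 768-d model output
small = matryoshka_postprocess(full, dim=512)
print(small.shape)                    # (2, 512)
print(np.linalg.norm(small, axis=1))  # each row has unit length
```

The final normalization is what makes dot products between truncated embeddings equal to cosine similarities.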
Transformers
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel
def mean_pooling(model_output, attention_mask):
    # Average token embeddings, ignoring padding positions (attention_mask == 0)
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
sentences = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?']
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5')
model.eval()
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Matryoshka: target embedding size (at most the model's 768 dimensions)
matryoshka_dim = 512
with torch.no_grad():
    model_output = model(**encoded_input)
embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
# Matryoshka: layer-normalize the full embedding, then keep the first matryoshka_dim components
embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
embeddings = embeddings[:, :matryoshka_dim]
# L2-normalize so dot products equal cosine similarities
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings)
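The `mean_pooling` function above keeps padding from affecting the embedding: token vectors are multiplied by the attention mask before averaging, and the divisor counts only real tokens. A NumPy version applied to a tiny hand-made batch makes this visible; the large values at the padded position do not leak into the result.

```python
import numpy as np


def mean_pooling_np(token_embeddings, attention_mask):
    """Average token vectors, counting only positions where attention_mask == 1."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                  # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                  # avoid division by zero
    return summed / counts


# One sequence of 3 tokens with hidden size 2; the last position is padding (mask 0).
tokens = np.array([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pooling_np(tokens, mask))  # [[2. 2.]]
```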
The model natively supports scaling the sequence length past 2048 tokens. To do so, raise the tokenizer's maximum length and pass RoPE scaling parameters when loading the model, replacing the tokenizer and model lines in the Transformers example above with:
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', model_max_length=8192)
rope_parameters = {"rope_theta": 1000.0, "rope_type": "dynamic", "factor": 2.0}
model = AutoModel.from_pretrained('nomic-ai/nomic-embed-text-v1.5', rope_parameters=rope_parameters)
Transformers.js
import { pipeline, layer_norm } from '@huggingface/transformers';
// Create a feature extraction pipeline
const extractor = await pipeline('feature-extraction', 'nomic-ai/nomic-embed-text-v1.5');
// Define sentences
const texts = ['search_query: What is TSNE?', 'search_query: Who is Laurens van der Maaten?'];
// Compute sentence embeddings
let embeddings = await extractor(texts, { pooling: 'mean' });
console.log(embeddings); // Tensor of shape [2, 768]
const matryoshka_dim = 512;
embeddings = layer_norm(embeddings, [embeddings.dims[1]])
.slice(null, [0, matryoshka_dim])
.normalize(2, -1);
console.log(embeddings.tolist());
Nomic API
The easiest way to use Nomic Embed is through the Nomic Embedding API.
Generating embeddings with the nomic Python client is as simple as:
from nomic import embed
output = embed.text(
texts=['Nomic Embedding API', '#keepAIOpen'],
model='nomic-embed-text-v1.5',
task_type='search_document',
dimensionality=256,
)
print(output)
For more information, see the API reference.
Infinity
To serve the model with the Infinity embedding server, run:
docker run --gpus all -v $PWD/data:/app/.cache -e HF_TOKEN=$HF_TOKEN -p "7997":"7997" \
michaelf34/infinity:0.0.70 \
v2 --model-id nomic-ai/nomic-embed-text-v1.5 --revision "main" --dtype float16 --batch-size 8 --engine torch --port 7997 --no-bettertransformer
Adjusting Dimensionality
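nomic-embed-text-v1.5 is an improvement upon Nomic Embed that uses Matryoshka Representation Learning, which gives developers the flexibility to trade embedding size for a small reduction in performance. As the table below shows, shrinking the embedding from 768 to 256 dimensions lowers the MTEB score by about 1.2 points, while the drop grows to about 3 and 6 points at 128 and 64 dimensions.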
| Name | SeqLen | Dimension | MTEB Score |
|------|--------|-----------|------------|
| nomic-embed-text-v1 | 8192 | 768 | 62.39 |
| nomic-embed-text-v1.5 | 8192 | 768 | 62.28 |
| nomic-embed-text-v1.5 | 8192 | 512 | 61.96 |
| nomic-embed-text-v1.5 | 8192 | 256 | 61.04 |
| nomic-embed-text-v1.5 | 8192 | 128 | 59.34 |
| nomic-embed-text-v1.5 | 8192 | 64 | 56.10 |
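To put the table in concrete terms, the sketch below computes per-vector storage alongside the MTEB drop relative to the full 768-dimensional embedding. It assumes uncompressed float32 storage (4 bytes per dimension); quantized indexes would change the byte counts but not the relative savings. The scores are copied from the table.

```python
# MTEB scores for nomic-embed-text-v1.5, copied from the table
MTEB_BY_DIM = {768: 62.28, 512: 61.96, 256: 61.04, 128: 59.34, 64: 56.10}
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed float32 vectors


def tradeoff(dim, full_dim=768):
    """Return (bytes per vector, fraction of full-size storage, MTEB points lost vs. full_dim)."""
    size = dim * BYTES_PER_FLOAT32
    return size, dim / full_dim, round(MTEB_BY_DIM[full_dim] - MTEB_BY_DIM[dim], 2)


for dim in MTEB_BY_DIM:
    size, frac, drop = tradeoff(dim)
    print(f"{dim:>4}d  {size:>5} bytes  {frac:6.1%} of storage  -{drop:.2f} MTEB")
```

For example, 256 dimensions need a third of the storage (1024 bytes per vector instead of 3072) for a 1.24-point drop.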

Training
A Nomic Atlas map visualizes a 5M sample of our contrastive pretraining data.

We train our embedder using a multi-stage training pipeline. Starting from a long-context BERT model, the first unsupervised contrastive stage trains on a dataset generated from weakly related text pairs, such as question-answer pairs from forums like StackExchange and Quora, title-body pairs from Amazon reviews, and summaries of news articles.
In the second finetuning stage, higher-quality labeled datasets, such as search queries and answers from web searches, are leveraged. Data curation and hard-example mining are crucial in this stage.
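This section does not spell out the contrastive objective. As an assumption, the sketch below uses a standard InfoNCE loss with in-batch negatives, a common choice for training on text pairs: each query should score its own paired text higher than every other text in the batch. The temperature is an illustrative value, not Nomic's training configuration, and the random vectors stand in for encoder outputs.

```python
import numpy as np


def info_nce_loss(query_emb, doc_emb, temperature=0.05):
    """InfoNCE with in-batch negatives: row i of doc_emb is the positive for
    row i of query_emb, and every other row in the batch is a negative."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    logits = (q @ d.T) / temperature              # (batch, batch) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # cross-entropy with targets on the diagonal


rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
aligned = queries + 0.01 * rng.normal(size=(4, 8))  # positives close to their queries
shuffled = aligned[[1, 2, 3, 0]]                    # positives deliberately mismatched
print(info_nce_loss(queries, aligned) < info_nce_loss(queries, shuffled))  # True
```

Hard-example mining fits into the same loss: mined hard negatives are appended as extra columns of the logits matrix, so the model must also rank the positive above texts that are deliberately similar to it.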
For more details, see the Nomic Embed Technical Report and the corresponding blog post.
The data used to train the models is released in its entirety. For more details, see the contrastors repository.
Citation
If you find the model, dataset, or training code useful, please cite our work:
@misc{nussbaum2024nomic,
title={Nomic Embed: Training a Reproducible Long Context Text Embedder},
author={Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
year={2024},
eprint={2402.01613},
archivePrefix={arXiv},
primaryClass={cs.CL}
}