Just sharing: after trying many things, I now get LB 0.96 from a single model every time. But ensembling does not help! Blending two or more models, each scoring 0.96, does not improve the score at all. It's very confusing!
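One possible explanation is that a blend only helps when the models make different mistakes; if their predictions are almost identical, averaging them changes nothing. A quick way to check is to measure how correlated the two prediction vectors are. Here is a minimal sketch (the file names and array shapes are placeholders, not anything from my actual pipeline):

```python
import numpy as np

# Hypothetical validation (or out-of-fold) predictions from two models,
# each scoring ~0.96 on its own. The file names are placeholders.
preds_a = np.load("model_a_val_preds.npy")  # shape: (n_samples,)
preds_b = np.load("model_b_val_preds.npy")  # shape: (n_samples,)

# Correlation between the two prediction vectors: values close to 1
# mean the models are making (nearly) the same mistakes, so averaging
# them cannot add much on top of either single model.
corr = np.corrcoef(preds_a, preds_b)[0, 1]
print(f"prediction correlation: {corr:.4f}")

# Simple mean blend -- only expected to help when errors are decorrelated.
blend = (preds_a + preds_b) / 2
```

If the correlation comes out near 1, that would at least explain why the blend sits at the same 0.96 as each single model.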
So, a question for theoretical discussion: what does it mean when different encoders all converge to the same result?