Label noise or just me?

Can someone briefly explain the differences in how the cut-out holes are labeled in different images? I’ve noticed that sometimes they’re labeled white, while other times they’re gray. Similarly, the cracks or black lines are sometimes marked white and other times simply included as part of the surrounding class label. What’s the reasoning behind these variations?

I would appreciate an explanation regarding this. Is there a way to tag the hosts or someone to get a response?

Hi @jonas and welcome to the ThinkOnward Challenges community :wave:

This might be related to different human interpreters labeling the data. One interpreter might not label the core plug holes (the cut-out holes in the core) as a separate class, while a second interpreter might be very interested in having them as one. You might investigate whether the gray-labeled core plug holes correlate with a certain facies (class) and appear only in tandem with that class.
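
A minimal sketch of that check, assuming the label masks are grayscale PNGs stored under a hypothetical `train/labels` directory and that the gray plug holes use a hypothetical pixel value of 128 (verify both against your own copy of the data), could look like this:

```python
import numpy as np
from collections import Counter
from pathlib import Path
from PIL import Image

# Hypothetical value and path -- verify both against your own copy of the data.
GRAY_PLUG_VALUE = 128              # assumed pixel value of gray-labeled plug holes
LABELS_DIR = Path("train/labels")  # assumed location of the label masks

cooccurrence = Counter()
for mask_path in sorted(LABELS_DIR.glob("*.png")):
    mask = np.array(Image.open(mask_path))
    if (mask == GRAY_PLUG_VALUE).any():
        # Tally every other label value present in a mask that contains gray plug holes.
        other_values = np.unique(mask[mask != GRAY_PLUG_VALUE])
        cooccurrence.update(int(v) for v in other_values)

print("Label values co-occurring with gray plug holes:", dict(cooccurrence))
```

If one class dominates that tally, the gray plug holes are probably tied to that facies; if the counts are spread evenly, the labeling is more likely just interpreter-dependent.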

ThinkOnward Team

Thank you for the reply!
I actually followed that approach and initially thought that the core plug holes in class 5 were consistently labeled white. However, I came across an image where the core plug hole was labeled gray in class 5. This level of inconsistency feels far too random, to be honest.

Are the core plug holes labeled as gray or white in the evaluation images? If the same randomness exists in the evaluation set as in the training set, the evaluation scores will be highly inconsistent as well. I don’t see how this would be beneficial for either the competition hosts or us as contestants.

2 Likes

Hi @jonas, thank you for sharing your insights and observations about the labeling of core plug holes in the training dataset. I understand your concern regarding the inconsistency in labeling, and it’s great that you’re thinking critically about this issue.

To address your question: after verifying the labeling conventions for the train, test, and evaluation datasets, I can confirm that the distribution of core plug hole labeling (white or gray) is indeed consistent across all three datasets.

That said, this inconsistency might not significantly impact your models’ evaluation scores. Nevertheless, it’s essential to be aware of this variation to ensure accurate model performance and fairness in the competition.

Regarding your concern about the randomness, here are some approaches you could consider:

  1. Use a robust labeling scheme: Standardize the annotations yourself, for example by remapping the inconsistent label values to a single class or by re-annotating with a third-party tool (see the sketch after this list).
  2. Use a semi-supervised learning approach: If the labeling inconsistencies are affecting your model’s performance, consider semi-supervised methods, which can leverage the reliably annotated instances while still taking advantage of the weaker, noisier annotations.
  3. Implement data augmentation techniques: Data augmentation can help reduce the impact of noisy labels and also improve your model’s robustness to label variations.
  4. Use a feature extraction approach: Focus on extracting relevant features from the images that can help your model generalize better to different label variations.
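
As a minimal sketch of option 1, assuming the masks are grayscale PNGs and that the white and gray core plug hole annotations use the hypothetical pixel values 255 and 128 (check both against the actual label masks first), you could merge the two encodings into one canonical value before training:

```python
import numpy as np
from PIL import Image

# Hypothetical pixel values -- check them against the actual label masks first.
WHITE_PLUG_VALUE = 255  # assumed value of white-labeled core plug holes
GRAY_PLUG_VALUE = 128   # assumed value of gray-labeled core plug holes

def standardize_mask(mask_path: str) -> np.ndarray:
    """Load a label mask and map both plug-hole encodings onto one canonical value."""
    mask = np.array(Image.open(mask_path))
    mask[mask == GRAY_PLUG_VALUE] = WHITE_PLUG_VALUE
    return mask
```

Whether this helps depends on how the evaluation masks encode the plug holes: if the scorer treats white and gray as distinct classes, remapping only your training labels could hurt rather than help.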

I hope these suggestions help you address your concerns and improve your model’s performance. If you have any further questions or need more assistance, please don’t hesitate to ask.

ThinkOnward Team

1 Like

So not just me. The labelling work in this one is simply horrible. Not sure I want to spend time on a task that the people making it didn’t want to spend time on in the first place.

The situation is real, bruv :ok_hand:. The gray/white labeling doesn’t simply correlate with the class assignments.