Concerns Regarding Dataset Labeling in Data Challenge

Hello @team,

After analyzing the images in the training dataset provided for the challenge, I found inconsistencies with the figures given in the starter notebook.

The starter notebook on Google Colab states: “Training data: 87 images with segmentation mask labels (42 Liptinite, 20 Vitrinite, 15 Inertinite)”. However, these per-class counts sum to 77, not 87.

In light of this discrepancy, I examined the images and their associated labels myself and found the following counts: 42 Liptinite, 27 Vitrinite, 22 Inertinite. This total (91) still does not match the number of images in the training dataset (87). The difference arises because some images contain more than one type of mineral (tya5k0.JPG, tpb83i.JPG, grqhu2.JPG, hsa12q.JPG).
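For reference, the arithmetic is 42 + 27 + 22 = 91 class labels against 87 images, a difference of 91 - 87 = 4, which would be consistent with each of the four listed images carrying two classes. Below is a minimal sketch of how such a tally could be made. It assumes the labels have already been reduced to a mapping from file name to the set of classes present; the function name `summarize_labels` and the toy file names are hypothetical and are not taken from the challenge data.

```python
from collections import Counter


def summarize_labels(image_classes):
    """Count class labels and list the images that carry more than one class.

    image_classes: dict mapping a file name to the set of class names
    found in that image's mask (an assumed, simplified representation).
    """
    # Each image contributes one count per class it contains.
    per_class = Counter(
        cls for classes in image_classes.values() for cls in classes
    )
    # Images with two or more classes explain labels exceeding images.
    multi_class = sorted(
        name for name, classes in image_classes.items() if len(classes) > 1
    )
    return per_class, multi_class


# Toy data invented for illustration only, not the real dataset.
toy = {
    "img_a.JPG": {"Liptinite"},
    "img_b.JPG": {"Vitrinite"},
    "img_c.JPG": {"Vitrinite", "Inertinite"},
}

per_class, multi_class = summarize_labels(toy)
n_images = len(toy)
n_labels = sum(per_class.values())

print("labels per class:", dict(per_class))
print("multi-class images:", multi_class)
print("labels minus images:", n_labels - n_images)
```

If every multi-class image holds exactly two classes, the gap between the label total and the image total equals the number of multi-class images.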

The way the challenge was presented led me to believe that an image could contain only one type of mineral. Could you please confirm that there are no errors in the labeling provided for the challenge, and that it is indeed possible to encounter images with multiple types of minerals?

Looking forward to your clarification on this matter.

Best regards,

Baptiste from RosIA Team

Hi @baptiste.u. Thanks for the question and for catching this error. We apologize for the confusion and will update the challenge description and starter notebook. 42 Liptinite, 27 Vitrinite, 22 Inertinite is the correct label distribution. There are, however, only 87 images, because some images contain multiple classes. While not typical, this can happen in the real world. For example, take a look at the last example image in the challenge description. It shows an inertinite maceral (the white blob) in a vitrinite maceral matrix (the grey matrix), so it is an example of an image with two classes.
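As a rough illustration of how a two-class image like that one might be detected from its segmentation mask, here is a minimal sketch. It assumes, purely for illustration, that a mask is a 2D grid of integer class IDs with 0 as background; the ID-to-name mapping and the helper `classes_in_mask` are hypothetical, and the actual mask encoding in the challenge data may differ.

```python
# Hypothetical ID-to-name mapping; the real encoding may differ.
ID_TO_NAME = {1: "Liptinite", 2: "Vitrinite", 3: "Inertinite"}


def classes_in_mask(mask, id_to_name=ID_TO_NAME, background=0):
    """Return the set of class names present in a 2D mask of integer IDs."""
    # Collect every distinct pixel value in the mask.
    ids = {value for row in mask for value in row}
    # Drop the background ID and translate the rest to class names.
    return {id_to_name[i] for i in ids if i != background}


# Toy 3x3 mask: one inertinite pixel surrounded by a vitrinite matrix.
toy_mask = [
    [2, 2, 2],
    [2, 3, 2],
    [2, 2, 2],
]

found = classes_in_mask(toy_mask)
print(sorted(found))
print("multi-class image:", len(found) > 1)
```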

Happy coding!

Onward Team
