Hi,
Has anyone checked the labels? How many classes are there? Loading the whole dataset was problematic on my machine.
thank you
Hi @Khariwa. The dataset will have a varying number of classes depending on the features present in a particular volume. You should generally expect between 5 and 25 classes in each volume.
Also, when you call numpy.load(), make sure that allow_pickle=True is set. If this argument is not included, attempts to load the volumes will result in an error.
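For example, a minimal loading sketch (the file name here is a placeholder, not an actual dataset file):

import numpy as np

# Placeholder path for illustration; substitute the actual volume file
volume = np.load("example_volume.npy", allow_pickle=True)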
Thanks!
Onward Team
Hi,
I would like to ask whether the value of the label matters. Can I assign different label values as long as the layer segmentation is correct from a positional point of view? For example, the first image is the ground truth and the second image is my prediction. The segmented regions in my prediction perfectly resemble the actual segmentation, but the class assignment is different. Will it get a perfect score?
Hi @Khariwa. Thanks for the question. You are free to assign different values for labels as long as the layer segmentation is correct. The scoring algorithm takes this into account when scoring.
Keep up the good work!
Onward Team
Thank you for the information. Would you mind sharing the code for the scoring? I wonder if it is the same as the dice score used in other competitions or a modified version.
Thank you
Regards,
Ramdhan
Hi @Khariwa. We’d be happy to share the code. We are calculating a dice score.
import numpy as np

def get_dice(gt_mask, pred_mask):
    # Overlap between the ground truth and prediction masks
    intersect = np.sum(gt_mask * pred_mask)
    # Total positive area in both masks
    total_sum = np.sum(gt_mask) + np.sum(pred_mask)
    dice = 2 * intersect / total_sum
    return dice
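For example, a quick sanity check on two toy binary masks (shapes and values here are illustrative):

gt = np.zeros((4, 4))
gt[:2, :2] = 1           # 4 positive pixels
pred = np.zeros((4, 4))
pred[:2, :] = 1          # 8 positive pixels, 4 of them overlapping
get_dice(gt, pred)       # 2 * 4 / (4 + 8) = 0.667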
Just to clarify, the dice function you provided does not credit the segmentation if the class is wrong. As in the image below, I predicted the same segmentation region but on a different class channel and got a score of 0.
Is this accounted for in the preprocessing of the scoring function? If so, what if my predictions are shifted by one channel due to the model’s fault? Will I get a score of 0?
Besides this, is the public/prediction leaderboard score hidden? I am asking because everyone has the same sample-notebook score.
Hi @Harshit_S, thanks for following up on this. We took a look at the code you shared along with the dice function we provided. When we run the following:
import numpy as np
def get_dice(gt_mask, pred_mask):
    # Overlap between the ground truth and prediction masks
    intersect = np.sum(gt_mask * pred_mask)
    # Total positive area in both masks
    total_sum = np.sum(gt_mask) + np.sum(pred_mask)
    dice = 2 * intersect / total_sum
    return dice
gt_mask = np.zeros((256,256,2))
gt_mask[:100, :100, 0] = 1
pred_mask = np.zeros((256,256,2))
pred_mask[:100, :100, 0] = 1
get_dice(gt_mask, pred_mask)
The output is a value of 1.0. If we instead change the masks so that they only partially overlap:
gt_mask = np.zeros((256,256,1))
gt_mask[:100, :100, 0] = 1
pred_mask = np.zeros((256,256,1))
pred_mask[50:150, 50:150, 0] = 1
get_dice(gt_mask, pred_mask)
The output is 0.25, as we would expect. Can you try this again in a fresh notebook and see if you can reproduce our results?
The predictive leaderboard is currently active, and as of 26 December 2023 the top score is 0.32.
Keep up the great work!
Onward Team
IMO, @Harshit_S’s question was about giving a layer a different label, because you mentioned that “You are free to assign different values for labels as long as the layer segmentation is correct.”
So, he used
gt_mask[:100, :100, 0] = 1
and
pred_mask[:100, :100, 1] = 1
While you are talking about
gt_mask[:100, :100, 0] = 1
and
pred_mask[50:150, 50:150, 0] = 1
which is not the same.
We are sorry for the misunderstanding around the previous question.
The get_dice() function is not intended to be used in the way you shared above. It constitutes an integral part of a larger scoring algorithm. The full evaluation pipeline takes the possible mismatch between ground truth and prediction label values into account and contains several preprocessing steps to address the problem.
Although we provided an overview of the scoring pipeline in the Evaluation part of the challenge description, to prevent further ambiguity we will describe the scoring logic here in more detail using a simple example.
As an example, let’s take two 3-D arrays (a Ground Truth volume and a Prediction volume) of the same shape: 2×10×8.
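For instance, such a pair of volumes could be constructed like this (the label values and layer boundaries below are illustrative placeholders, not the actual example data):

import numpy as np

# Two integer-labelled volumes of shape (slices, height, width) = (2, 10, 8)
gt_volume = np.zeros((2, 10, 8), dtype=int)
gt_volume[:, :4, :] = 1     # top layer
gt_volume[:, 4:7, :] = 2    # middle layer
gt_volume[:, 7:, :] = 3     # bottom layer

pred_volume = np.zeros((2, 10, 8), dtype=int)
pred_volume[:, :4, :] = 1
pred_volume[:, 4:7, :] = 5  # same layer, different label value
pred_volume[0, 7:, :] = 7   # bottom layer labelled inconsistently
pred_volume[1, 7:, :] = 2   # across the two slices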
Various ML/DL models may follow different labelling logic during the segmentation process. Let’s take a look at a few slices of the example data to identify possible corner cases:
As we can see, the predicted segmentation masks are quite similar to the ground truth ones, yet the segmentation model assigned labels that differ from the ground truth. In this example we also see that the bottom layer in the predictions has different labels on different slices (this can happen if a model predicts on 2-D slices).
To address these problems, the scoring pipeline goes through each pair of ground truth and prediction slices, maps the corresponding labels by computing the dice score for every pair of labels, and selects the pairs with the highest scores. This is where the above-mentioned get_dice() function is used.
Coming back to the example data, to map the label pairs for the first slice shown above, the scoring algorithm creates the following dice matrix:
To extract the first pair of matched labels, the combination with the highest dice score is taken (GT segment ‘3’ is mapped to predicted segment ‘7’ with a dice score of 1). After that, the mapped labels are removed from the matrix to avoid mapping the same segment twice.
Once the matrix is updated, the algorithm looks for the next highest dice score and takes that combination (GT segment ‘1’ is mapped to predicted segment ‘1’ with a dice score of 0.89).
The algorithm continues until all ground truth segments (labels) are mapped to their predicted equivalents for this slice.
At the end of this step, we have the following mapping:
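Here is a minimal sketch of this greedy matching step, reusing the get_dice() function from above (the real pipeline may differ in details such as tie-breaking and background handling):

def map_labels(gt_slice, pred_slice):
    # Non-zero labels present on each slice
    gt_labels = [lab for lab in np.unique(gt_slice) if lab != 0]
    pred_labels = [lab for lab in np.unique(pred_slice) if lab != 0]

    # Dice matrix: one score for every (GT label, predicted label) pair
    scores = {(g, p): get_dice(gt_slice == g, pred_slice == p)
              for g in gt_labels for p in pred_labels}

    mapping = {}
    while gt_labels and pred_labels:
        # Take the remaining pair with the highest dice score...
        g, p = max(scores, key=scores.get)
        mapping[g] = p
        # ...and drop both labels so no segment is mapped twice
        gt_labels.remove(g)
        pred_labels.remove(p)
        scores = {k: v for k, v in scores.items()
                  if k[0] != g and k[1] != p}
    return mapping

With the illustrative volumes above, map_labels(gt_volume[0], pred_volume[0]) would return {1: 1, 2: 5, 3: 7}.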
Since seismic layers (horizons) are three-dimensional objects, we evaluate each layer in 3-D. For this, we need to cache some intermediate metrics for each layer, which are later needed to calculate the 3-D dice score.
Using the mapping results from the previous step, we compute and cache the intermediate metrics for each ground truth layer on the current slice. At the end of this step, we have the following metrics stored:
After that, the algorithm is done with the first slice and repeats the same steps for the remaining slices. Let’s take a look at some key stages of processing the second slice.
As you can see, the intermediate metrics are updated after each slice to collect all the necessary information about how well each layer was segmented. If the algorithm comes across a new GT layer label (e.g. ‘4’) in one of the following slices, it adds it to the intermediate metrics dictionary and starts tracking its metrics in the same way.
Let’s say that after the final slice is processed, we have the following final metrics for each layer:
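In code, the accumulation could look roughly like this (a sketch assuming the cached metrics are, per GT layer, the running intersection and the running total mask size, and reusing the map_labels() sketch above):

from collections import defaultdict

def accumulate_metrics(gt_volume, pred_volume):
    # Per-layer running counts needed for the 3-D dice score
    intersect = defaultdict(int)
    total = defaultdict(int)

    for gt_slice, pred_slice in zip(gt_volume, pred_volume):
        mapping = map_labels(gt_slice, pred_slice)
        for g, p in mapping.items():
            gt_mask = gt_slice == g
            pred_mask = pred_slice == p
            intersect[g] += np.sum(gt_mask & pred_mask)
            total[g] += np.sum(gt_mask) + np.sum(pred_mask)
    return intersect, total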
We use these metrics to compute the dice score for each layer using the Dice coefficient formula shown above:
Dice scores
Segment 1: 0.889
Segment 2: 0.828
Segment 3: 0.913
Segment 4: 0.857
At the final stage of volume processing, the algorithm calculates the average dice score over all segments present in the volume:
Volume Avg Dice Score (VADS): (0.889 + 0.828 + 0.913 + 0.857) / 4 = 0.87
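Assuming the accumulate_metrics() sketch above, these last two stages could look like:

def compute_vads(gt_volume, pred_volume):
    intersect, total = accumulate_metrics(gt_volume, pred_volume)
    # 3-D dice per layer from the accumulated counts
    layer_dice = {g: 2 * intersect[g] / total[g] for g in intersect}
    # Volume Avg Dice Score: the mean over all layers present in the volume
    return sum(layer_dice.values()) / len(layer_dice)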
The scoring algorithm performs steps 1-4 for all 50 volumes in the test dataset. After that, the final submission score is obtained by averaging across all 50 VADSs.
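In rough terms, the final aggregation could be sketched as follows (gt_volumes and pred_volumes are assumed names for the 50 test volumes and the matching predictions):

vads_scores = [compute_vads(gt, pred)
               for gt, pred in zip(gt_volumes, pred_volumes)]
final_score = np.mean(vads_scores)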
Onward Team
Thank you for such a wonderful explanation. What would happen if I predict more classes than the ground truth contains? If the algorithm selects/maps the closest dice score based on the ground truth, that would mean I could make hundreds of “Predicted Segments” and it would only account for the prediction that best fits each “Ground Truth Segment”, ignoring the rest of my predictions. Wouldn’t that be a flaw in the metric?
Hey @Harshit_S. Your assumptions about how the scoring algorithm reacts to “over-segmentation” are correct. Let us explain the idea.
Say there are 5 ground truth segments on a slice. Some model (not a very robust one) may predict 100 different segments for the slice instead of just 5. In this case, the scoring algorithm will apply the mapping logic explained above to choose the top 5 predicted segments (out of 100) that fit the ground truth segments best. The remaining 95 predicted segments will be ignored.
The main idea behind this is that every extra (redundantly) predicted segment produces false negatives for one or more ground truth segments. As we explained above, false negatives are taken into account when we calculate the dice score for each ground truth - prediction segment pair, so each extra segment will decrease the corresponding dice scores proportionally to its size (area). Thus, there is no need to impose an additional penalty for “over-segmentation”.
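As a toy demonstration (illustrative shapes and labels, reusing the sketches above): one GT layer is split into two predicted halves, only one half gets mapped, and the unmatched GT pixels drag that pair’s dice score down:

gt = np.zeros((10, 8), dtype=int)
gt[:4, :] = 1                    # one GT layer: 32 pixels

pred = np.zeros((10, 8), dtype=int)
pred[:4, :4] = 1                 # left half: 16 pixels
pred[:4, 4:] = 2                 # right half: 16 pixels

map_labels(gt, pred)             # {1: 1} (a tie; {1: 2} scores the same)
get_dice(gt == 1, pred == 1)     # 2 * 16 / (32 + 16) = 0.667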
Onward Team