Training Data Links to Open S3 Bucket

Here are links to an open S3 bucket that contains the Patch the Planet training data. Depending on your machine setup, this may be a faster or better way to get the data for this challenge. Enjoy.

Team Onward

Train Data (synthetic)
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part1.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part2.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part3.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part4.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part5.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part6.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part7.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part8.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part9.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part10.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part11.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part12.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part13.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part14.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part15.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part16.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part17.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part18.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part19.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part20.zip
Train Data (real)
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-real-train-data.zip
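For convenience, all of the archives above can be fetched with a short script. This is only a sketch (sequential downloads; the archives are large, so you may prefer a parallel downloader or the AWS CLI):

```python
import os
from urllib.request import urlretrieve

BASE = "https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet"

# All 20 synthetic parts plus the real training archive, as listed above.
URLS = [f"{BASE}/patch-the-planet-train-data-part{i}.zip" for i in range(1, 21)]
URLS.append(f"{BASE}/patch-the-planet-real-train-data.zip")

def download_all(dest_dir="."):
    """Download every archive sequentially, skipping files that already exist."""
    for url in URLS:
        filename = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(filename):
            print(f"Downloading {filename} ...")
            urlretrieve(url, filename)
```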

Hello!
It seems that the real train data link is broken.

Could you please provide a link to the test data too? Thank you!

Thanks for alerting us to this issue @estrievich99. The link should be working now.

Onward Team

Is there a link to the test data?

Hi @blasscoc and @estrievich99

You can either download the test data set directly from the Patch the Planet page under the Data tab on the Onward website, or download it directly from the S3 bucket here

Onward Team

@team

I downloaded the test data from the S3 bucket link you provided; however, it seems the data is normalized differently from the training data. Is this a mistake, or do you expect us to pre-process/normalize the data ourselves?

That raises a follow-up question: if we do normalize, do we need to undo that normalization on the predicted missing sections to get the proper score?

Hi @tasansal, welcome to the Onward Community, and thanks for the question.

Some volumes are in the 0-255 range; others are not. Predictions should be rescaled to the 0-255 range.

Happy training!

Onward Team

"Predictions should be rescaled in the 0-255 range." - Can you explain or provide a function that will convert the seismic data to this range?

Just so we know how you need us to scale the data.

Hi @blasscoc

Sure thing, here is a function that will rescale the seismic volumes:

import numpy as np

def rescale_volume(seismic):
    """
    Rescale a 3D seismic volume to the 0-255 range,
    clipping the lowest 2% and highest 2% of values.
    """
    minval = np.percentile(seismic, 2)
    maxval = np.percentile(seismic, 98)
    seismic = np.clip(seismic, minval, maxval)
    seismic = ((seismic - minval) / (maxval - minval)) * 255

    return seismic

Onward Team

Clipping and normalizing are somewhat contextual; which of the following scenarios best suits your intentions?

  1. Based on the input test volume in its entirety, including the hard 0s.
  2. The entire test volume except the hard zeros.
  3. The prediction volume in its entirety.
  4. The part we predicted.

I agree that the scaling introduces a lot of confusion. It would be great to drop the "Predictions should be rescaled in the 0-255 range" rule, along with all the other normalization rules you introduced. The goal should be to complete the raw test data (the test data exactly as you provided it). If you want to use some kind of scaling to compute the score, you should provide the code for that score function. The function should take the unnormalized prediction and the unnormalized ground truth as input and return a score. That would solve all the issues.

The rules page also has the following line:

"The minimum and maximum SSI values will be dropped, and the mean SSI score across all predictions will be the final score."

while the main page does not state that. Please provide the code for the metric that is used on the backend to avoid all these questions.

Hi @blasscoc and @dmitry.ulyanov.msu

Please make your predictions, then scale them to 0-255 using the code snippet above.

For scoring, as stated in the Overview tab and Rules tab, "Similarity will be calculated for all predictions. The minimum and maximum SSI values will be dropped, and the mean SSI score across all predictions will be the final score". If you want to implement this locally, you can calculate the SSI for all your predictions and remove the max and min values from the list of SSI scores.
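That procedure could be approximated locally along these lines. This is a sketch only: it assumes SSI refers to the usual structural similarity index as implemented in scikit-image, and the exact backend parameters (window size, etc.) have not been published, so local values may differ slightly from the leaderboard:

```python
import numpy as np
from skimage.metrics import structural_similarity

def local_score(predictions, ground_truths):
    """
    Approximate the leaderboard metric locally: compute SSI for each
    prediction/ground-truth pair (both already rescaled to 0-255),
    drop the single lowest and highest scores, and average the rest.
    """
    scores = sorted(
        structural_similarity(pred, gt, data_range=255.0)
        for pred, gt in zip(predictions, ground_truths)
    )
    return float(np.mean(scores[1:-1]))  # drop min and max
```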

Onward Team

We might have a misunderstanding here: are you providing this code just for reference, or is this code actually used on the backend when scoring the submission? Can you confirm that the predictions we submit are not transformed on your end?

@dmitry.ulyanov.msu The data is not being transformed on the backend during live scoring.

Ok, now it’s clear, thanks!

And do you use the snippet above to create the test data? Basically, the train data differs from the test data in terms of range; is that because the train set contains raw data, while in the test set the snippet above has been applied to the raw data?

UPD: Ok, I see it now in the training_data_generator function in the provided notebook.

Finally, can we assume all volumes in the private test set will be of shape 300x300x1259?