Training Data Links to Open S3 Bucket

Here are links to an open S3 bucket that contains the Patch the Planet training data. Depending on your machine setup, this may be a faster or better way to get the data for this challenge. Enjoy.

Team Onward

Train Data (synthetic)
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part1.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part2.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part3.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part4.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part5.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part6.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part7.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part8.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part9.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part10.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part11.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part12.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part13.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part14.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part15.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part16.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part17.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part18.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part19.zip
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-train-data-part20.zip
Train Data (real)
https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet/patch-the-planet-real-train-data.zip
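For convenience, all of the archives above can be fetched with a short script. This is only a sketch (sequential downloads; the archives are large, so you may prefer a parallel downloader or the AWS CLI):

```python
import os
from urllib.request import urlretrieve

BASE = "https://xeek-public-287031953319-eb80.s3.amazonaws.com/patch-the-planet"

# All 20 synthetic parts plus the real training archive, as listed above.
URLS = [f"{BASE}/patch-the-planet-train-data-part{i}.zip" for i in range(1, 21)]
URLS.append(f"{BASE}/patch-the-planet-real-train-data.zip")

def download_all(dest_dir="."):
    """Download every archive sequentially, skipping files that already exist."""
    for url in URLS:
        filename = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(filename):
            print(f"Downloading {filename} ...")
            urlretrieve(url, filename)
```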

Hello!
It seems that the real train data link is broken.

Could you please provide a link to the test data too? Thank you!

Thanks for alerting us to this issue @estrievich99. The link should be working now.

Onward Team

Is there a link to the test data?

Hi @blasscoc and @estrievich99

You can either download the test data set directly from the Patch the Planet page under the Data tab on the Onward website, or download it directly from the S3 bucket here

Onward Team

@team

I downloaded the test data from the S3 bucket link you provided; however, it seems the data is normalized differently from the training data. Is this a mistake, or do you expect us to pre-process/normalize the data ourselves?

That raises a follow-up question: if we do normalize, do we need to undo that normalization on the predicted missing sections to get the proper score?

Hi @tasansal, welcome to the Onward Community, and thanks for the question.

Some volumes are in the 0-255 range; others are not. Predictions should be rescaled to the 0-255 range.

Happy training!

Onward Team

"Predictions should be rescaled in the 0-255 range." - Can you explain or provide a function that will convert the seismic data to this range?

Just so we know how you need us to scale the data.

Hi @blasscoc

Sure thing, here is a function that will rescale the seismic volumes:

import numpy as np

def rescale_volume(seismic):
    """
    Rescale a 3D seismic volume to the 0-255 range,
    clipping the lowest 2% and highest 2% of values.
    """
    minval = np.percentile(seismic, 2)
    maxval = np.percentile(seismic, 98)
    seismic = np.clip(seismic, minval, maxval)
    seismic = ((seismic - minval) / (maxval - minval)) * 255

    return seismic

Onward Team

Clipping and normalizing are somewhat contextual; which of the following scenarios best suits your intentions?

  1. Based on the input test volume in its entirety, including the hard 0s.
  2. The entire test volume except the hard zeros.
  3. The prediction volume in its entirety.
  4. The part we predicted.

I agree that the scaling introduces a lot of confusion. It would be great to drop the "Predictions should be rescaled in the 0-255 range" rule, along with all the other normalization rules you introduced. The goal should be to complete the raw test data (the test data exactly as you provided it). If you want to use some kind of scaling to compute the score, you should provide the code for that score function. The function should take the unnormalized prediction and the unnormalized ground truth as input and return a score. That would solve all the issues.

The rules page also has the following line:

"The minimum and maximum SSI values will be dropped, and the mean SSI score across all predictions will be the final score."

while the main page does not state that. Please provide the code for the metric that is used on the backend to avoid all these questions.

Hi @blasscoc and @dmitry.ulyanov.msu

Please make your predictions, then scale them to 0-255 using the code snippet above.

For scoring, as stated in the Overview tab and Rules tab, "Similarity will be calculated for all predictions. The minimum and maximum SSI values will be dropped, and the mean SSI score across all predictions will be the final score". If you want to implement this locally, you can calculate the SSI for all your predictions and remove the max and min values from the list of SSI scores.
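That procedure could be approximated locally along these lines. This is a sketch only: it assumes SSI refers to the usual structural similarity index as implemented in scikit-image, and the exact backend parameters (window size, etc.) have not been published, so local values may differ slightly from the leaderboard:

```python
import numpy as np
from skimage.metrics import structural_similarity

def local_score(predictions, ground_truths):
    """
    Approximate the leaderboard metric locally: compute SSI for each
    prediction/ground-truth pair (both already rescaled to 0-255),
    drop the single lowest and highest scores, and average the rest.
    """
    scores = sorted(
        structural_similarity(pred, gt, data_range=255.0)
        for pred, gt in zip(predictions, ground_truths)
    )
    return float(np.mean(scores[1:-1]))  # drop min and max
```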

Onward Team

We might have a misunderstanding here: are you providing this code just for reference, or is this code actually used on the backend when scoring the submission? Can you confirm that the predictions we submit are not transformed on your end?

@dmitry.ulyanov.msu The data is not being transformed on the backend during live scoring.

Ok, now it’s clear, thanks!

And do you use the snippet above to create the test data? Basically, the train data differs from the test data in terms of range; is that because the train set contains raw data, while in the test set the snippet above has been applied to the raw data?

UPD: Ok, I see it now in the training_data_generator function in the provided notebook.

Finally, can we assume all volumes in the private test set will be of shape 300x300x1259?