Some questions about the competition pipeline

Dear competition organizers,

After some cursory digging and preliminary experiments with the competition data, I have the following questions on which I would like further clarification:

  1. Some of the training data seems to be corrupted; when I read the affected files with np.load, the following error appears (see the loading-check sketch after this list):

ValueError: cannot reshape array of size 108494832 into shape (300,300,1259)

These are the corrupted volumes I have found so far:

seismicCubes_RFC_fullstack_2023.66546347.npy
seismicCubes_RFC_fullstack_2023.75513518.npy
seismicCubes_RFC_fullstack_2023.75532849.npy
seismicCubes_RFC_fullstack_2023.77702532.npy

  2. To my eye, the test sets provided so far all appear to be synthetically generated, so what is the point of providing 50 real datasets? Is it possible that the final hidden test set will be evaluated on a larger real dataset?

  3. Is the evaluation script used for the online leaderboard and the final evaluation the same as the SSIM script in utils.py from the official starter notebook? I noticed that the SSIM in utils.py uses 255 as the value range, but the data range for this contest is very wide, with values extending into the thousands. A mismatched value-range setting in SSIM would effectively truncate the data and could therefore produce very different evaluation scores.
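For reference, below is a minimal sketch of the loading check I used to flag the files above (the directory path is a placeholder for wherever the training volumes were unpacked):

```python
import glob
import numpy as np

# Placeholder: point this at your local copy of the training volumes.
DATA_DIR = "./train"

corrupted = []
for path in sorted(glob.glob(f"{DATA_DIR}/seismicCubes_RFC_fullstack_*.npy")):
    try:
        # np.load fails (e.g. with the reshape ValueError above) when the
        # file on disk is shorter than the shape in its header implies.
        np.load(path)
    except Exception as exc:
        corrupted.append(path)
        print(f"Failed to load {path}: {exc}")

print(f"{len(corrupted)} corrupted volume(s) found")
```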


Hi @salen, welcome to the Onward Challenges community, and thanks for the questions.

  1. For the issue with your corrupted volumes, please try re-downloading those files; the data was most likely corrupted during your download, as the volumes are not corrupted on our side. The remaining 496 of the 500 volumes are still a great place to start training, with plenty of diversity in the data.

  2. The 50 real datasets will help with training your model if you are targeting the honorable mention award. There will be two $1,500 honorable mentions for valid submissions that score highly on the real seismic data samples. You can use the 50 real volumes to evaluate your model for that task.

  3. The evaluation formula (SSIM) is exactly the same as in the original starter notebook's utils.py. Some training volumes are in the 0-255 range; others are not. Predictions should be rescaled to the 0-255 range, as in the sketch below.
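To illustrate why that rescaling matters, here is a minimal sketch on toy data using skimage's structural_similarity (the per-volume min-max normalization here is just one reasonable way to map values into 0-255; check utils.py for the exact preprocessing used in scoring):

```python
import numpy as np
from skimage.metrics import structural_similarity

def rescale_to_255(volume: np.ndarray) -> np.ndarray:
    """Min-max rescale a volume into the 0-255 range."""
    vmin, vmax = volume.min(), volume.max()
    return (volume - vmin) / (vmax - vmin) * 255.0

# Toy stand-ins for a ground-truth and a predicted volume; real seismic
# amplitudes can span thousands of units, hence the rescaling step.
rng = np.random.default_rng(0)
ground_truth = rng.normal(scale=3000.0, size=(64, 64, 64))
prediction = ground_truth + rng.normal(scale=300.0, size=(64, 64, 64))

score = structural_similarity(
    rescale_to_255(ground_truth),
    rescale_to_255(prediction),
    data_range=255,  # matches the value range assumed by utils.py
)
print(f"SSIM: {score:.4f}")
```

Without the rescaling, the internal SSIM stabilization constants (which are derived from data_range) would be mismatched to the actual amplitude range, which can shift the score substantially.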

Happy training and good luck!

Onward Team

I’ve confirmed that these 4 files are broken on my side too.

Hi @daisuke0530 @salen. We are working to get uncorrupted versions of these volumes. We will share a link here when they are ready.

Thanks,

Onward Team

Hi @salen @daisuke0530. Here is a link to the 4 volumes you were having trouble loading. We’ve confirmed that these versions of the volumes are not corrupt.

https://xeek-public-287031953319-eb80.s3.amazonaws.com/rigel-challenge-data/corrupt_training_data_redeploy.zip
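For convenience, here is a minimal sketch for fetching and verifying the redeployed archive (this assumes the zip contains the four .npy files at its top level; adjust the paths if the layout differs):

```python
import io
import urllib.request
import zipfile

import numpy as np

URL = (
    "https://xeek-public-287031953319-eb80.s3.amazonaws.com/"
    "rigel-challenge-data/corrupt_training_data_redeploy.zip"
)

# Download the redeployed archive and unpack it into a local directory.
with urllib.request.urlopen(URL) as response:
    archive = zipfile.ZipFile(io.BytesIO(response.read()))
archive.extractall("redeployed_volumes")

# Confirm that each redeployed volume now loads cleanly.
for name in archive.namelist():
    if name.endswith(".npy"):
        volume = np.load(f"redeployed_volumes/{name}")
        print(name, volume.shape)
```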

Happy coding!

Onward Team
