Clarifications regarding the holdout dataset

Hi,

Thank you for presenting this challenge! I have a few questions regarding the holdout dataset and the one-hour time limit for algorithm evaluation:

  1. Time Limit: Could you provide the dataset size? This would help me estimate more accurately how long the model will take to run.
  2. Time Measurement: Will the time measurement cover only the model’s inference, or will it also include loading the data into RAM and the preprocessing steps? What happens if I reuse precomputed/loaded data from the training process?
  3. Data Scope: Can you confirm whether the evaluation data is limited in time to no later than the last sample in the provided storm dataset?

I appreciate your help in clarifying these points.

Best regards,

Gabriel Gama

Hi @Gabriel_G, good to see you again!

Since this is an unstructured challenge, it is up to you how much data you want to split into the train, test, and holdout sets. That being said, the time limit applies to validating your model on the subset of data that you select for your holdout set. If you have a trained model that you want to load to save time, that’s a great way to work within the inference time limit. We strongly recommend benchmarking on your own compute to see how long your model might take to run.
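For the benchmarking suggestion above, a minimal sketch of timing a full validation run is shown below. This assumes a Python workflow; the `model` and `X_holdout` names in the usage comment are hypothetical placeholders for your own objects.

```python
import time


def benchmark(run_inference, n_trials=3):
    """Time repeated calls to run_inference and return the worst case.

    Timing the whole callable (loading + preprocessing + prediction)
    gives a conservative estimate against a wall-clock budget.
    """
    times = []
    for _ in range(n_trials):
        start = time.perf_counter()
        run_inference()
        times.append(time.perf_counter() - start)
    return max(times)


# Hypothetical usage: wrap everything the evaluation would execute.
# worst = benchmark(lambda: model.predict(X_holdout))
# print(f"worst case: {worst:.1f}s (budget: 3600s)")
```

Taking the worst of several trials, rather than the mean, leaves headroom for slower evaluation hardware.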

For the data scope, if you want to add more recent data with a tool like RTDIP, that’s also fair, but make sure you include the code you used to compile and organize the additional data.

We hope this helps you get started on the challenge.

ThinkOnward Team