Keep getting incorrect format

I have checked very carefully and wrote a function to compare the result file I submitted against the submission sample. They are clearly the same in structure and size, but I still get an incorrect format error when submitting. Has anyone encountered a similar issue, or can anyone share more details with me?

This is the function I use to compare my submission with the submission sample:

def compare_tensor_datasets_by_size(data1, data2):
    if len(data1) != len(data2):
        return False, "Different number of samples"
    for i in range(len(data1)):
        sample1 = data1[i]
        sample2 = data2[i]
        for j, (tensor1, tensor2) in enumerate(zip(sample1, sample2)):
            if tensor1.size() != tensor2.size():
                return False, f"Different tensor sizes at sample {i}, tensor {j}"
    return True, ""

result, message = compare_tensor_datasets_by_size(data_check, data)
if result:
    print("The datasets have the same tensor sizes.")
else:
    print(f"The datasets have different tensor sizes: {message}")

Hi :wave: and thanks for your question on the submission. Since the shape of your results matches the submission file, you should be good at that level. Can you double-check that your TensorDataset uses the float32 dtype, which is the dtype of the sample submission? For example:

assert data1.tensors[0].dtype == torch.float32
assert data2.tensors[0].dtype == torch.float32

Both assertions should pass. Make sure it is not torch.float64 or any other dtype.
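
If the dtype check fails, you can rebuild the dataset with everything cast to float32. A minimal sketch, assuming your results live in a TensorDataset (the helper name here is our own, not part of the challenge code):

import torch
from torch.utils.data import TensorDataset

def to_float32(dataset: TensorDataset) -> TensorDataset:
    # Rebuild the dataset with every underlying tensor cast to float32.
    return TensorDataset(*(t.to(torch.float32) for t in dataset.tensors))

data = to_float32(data)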

Hi @discourse-admin,

Thank you very much for organizing this challenge!
I have a few questions that I hope you can help me with, so as to avoid infringing any rules:

1) Would it be possible to have a reference equation for calculating the diversity metrics?
2) Would it be okay to have a model generate 100/200 samples and select, in a ‘post-processing’ stage, the 30 that would maximize the diversity metrics?
3) Would it be possible to encode, inside the NN, the information on the properties that x0/x1 must have (for example, by fitting the network to predict just slopes and biases and using a final, non-trainable layer to transform those 4 coefficients into 2 straight lines of 50 samples)?
4) Would it be possible to use the y0/y1 analytical functions inside the neural network, or should they be treated as observations without an available analytical formula?
5) Can we build an algorithm based on more than one NN or other ML strategies (also in recursive ways), or should we stick to only a final NN that takes some input?
6) Can we generate additional samples?

Thank you in advance!
Best regards,
Leonardo

Hi @discourse-admin, I am having the same issue.

Modifying @nghiaphamtrung2709's code:

import torch

def compare_tensor_datasets_by_size(data1, data2):
    if len(data1) != len(data2):
        return False, "Different number of samples"
    for i in range(len(data1)):
        sample1 = data1[i]
        sample2 = data2[i]

        # Check the dtype of each sample, per the advice above.
        assert sample1.dtype == torch.float32
        assert sample2.dtype == torch.float32

        for j, (tensor1, tensor2) in enumerate(zip(sample1, sample2)):
            if tensor1.size() != tensor2.size():
                return False, f"Different tensor sizes at sample {i}, tensor {j}"
    return True, ""

result, message = compare_tensor_datasets_by_size(x_outputs, y_outputs)
if result:
    print("The datasets have the same tensor sizes.")
else:
    print(f"The datasets have different tensor sizes: {message}")

The assertions passed. It also returns “The datasets have the same tensor sizes.”, meaning the structure and size of x_outputs and y_outputs are the same. But I still get an incorrect format error when I try to submit. Any fix for this?

Hi guys,
Thanks for the diagnostic. I ran it on my own program and it passed.

Clearly this is tricky. Look at all the entries on the leaderboard with scores of 0.27. Like the others, I got that score by submitting the submission sample as an experiment.
Cheers, Eric

Hi @leonardo.pulga
Thank you for your questions, please find answers below:

  1. Would it be possible to have a reference equation for calculating the diversity metrics?
    No, it is part of the scoring algorithm, which we usually keep confidential. The main idea behind the diversity metric is to ensure that your X0/X1 lines comprehensively cover the potential area with optimal density. I hope this is helpful.

  2. Would it be okay to have a model generate 100/200 samples and select in a ‘post-processing’ stage the 30 that would maximize the diversity metrics?
    Unfortunately, the use of any form of post-processing is not permissible in this context.

  3. Would it be possible to encode, inside the NN, the information on the properties that x0/x1 must have (for example by fitting the network to just predict slopes and biases and using a final, non-trainable layer, to transform those 4 coefficients into 2 straight lines of 50 samples)?
    It would be feasible to explore that approach (see the sketch after this reply for one way such a non-trainable final layer could be structured).

  4. Would it be possible to use the y0/y1 analytical functions inside the neural network, or they should be treated as observations without available analytical formula?
    Indeed, it would be most appropriate to treat y0/y1 as observations, without relying on an available analytical formula.

  5. Can we build an algorithm based on more than 1 NN or other ML strategies (also in recursive ways), or should we stick to only a final NN that takes some input?
    You are certainly encouraged to employ an ensemble of techniques.

  6. Can we generate additional samples?
    You may generate additional samples to gain insights. However, for the final evaluation, you should provide a reproducible solution based on the provided datasets.

Best regards,
Onward Team
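
For question 3, here is a minimal sketch of the kind of non-trainable final layer described there. All names, shapes, and the sampling grid are assumptions made for illustration, not part of the challenge code:

import torch
import torch.nn as nn

class LineDecoder(nn.Module):
    """Fixed, non-trainable layer that turns 4 predicted coefficients
    (slope and bias for each of x0/x1) into 2 straight lines of 50 points."""

    def __init__(self, n_points: int = 50):
        super().__init__()
        # A fixed sampling grid, registered as a buffer so the optimizer
        # never updates it.
        self.register_buffer("t", torch.linspace(0.0, 1.0, n_points))

    def forward(self, coeffs: torch.Tensor) -> torch.Tensor:
        # coeffs: (batch, 4) = [slope0, bias0, slope1, bias1]
        slope0, bias0, slope1, bias1 = coeffs.unbind(dim=-1)
        x0 = slope0[:, None] * self.t + bias0[:, None]
        x1 = slope1[:, None] * self.t + bias1[:, None]
        return torch.stack([x0, x1], dim=1)  # (batch, 2, n_points)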

Thank you very much, that was very helpful!

Hi @atolagbejoshua2

Maybe this topic will be helpful for resolving the incorrect format issue: Submission format checking

Kind regards,
Onward Team

Let me ask a question about what is considered post-processing. Take image classification, for example: a CNN outputs 1000 logit values, and then argmax is taken to get the predicted class. Is this argmax operation considered post-processing?

@discourse-admin

Another question about “Goodness of fit criteria”.

Goodness of fit criteria - if a submission doesn’t meet the sign, Pearson Coefficient, and RMSE thresholds, it is given a score of 0.

Does it mean that if even one sample has an RMSE greater than 0.1, the overall score becomes 0?
Or does it mean that only the score of that sample becomes 0, and the overall score is averaged?

Hi @daisuke0530, for the image classification example, the argmax would not be considered post-processing, since it simply selects the highest-value logit to choose the predicted class from the potential classes. As long as you can replicate the process of training the network and selecting the correct class, that should be fine.

Happy solving!

Onward Team
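
As a toy illustration of the decode step being discussed (the shapes are made up for the example):

import torch

logits = torch.randn(4, 1000)   # e.g., a batch of 4 CNN outputs over 1000 classes
preds = logits.argmax(dim=-1)   # selecting the class from the logits: decoding, not post-processing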

Hi @daisuke0530, for the RMSE goodness-of-fit criterion, the RMSE is calculated over all samples. If the RMSE over all predictions is above 0.1, then the score for the submission is 0.

Happy solving!

Onward Team
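
In code, the gate described above might look like the following sketch (the 0.1 threshold comes from the quoted criteria; the tensor shapes and function name are our own assumptions):

import torch

def passes_rmse_gate(pred: torch.Tensor, actual: torch.Tensor, thresh: float = 0.1) -> bool:
    # One global RMSE over all predictions, not one per sample.
    rmse = torch.sqrt(((pred - actual) ** 2).mean())
    return rmse.item() <= thresh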

@discourse-admin

Ah, that makes sense! Thanks for the answers to both questions!

@discourse-admin

What about the Pearson Coefficient for x? Unlike the goodness-of-fit criteria, I guess this cannot be calculated globally, so is it calculated per sample and then averaged?

Hi @daisuke0530, thanks for the question. As for the Pearson Coefficient, it is also calculated globally for all predictions against the actual values.

Happy solving!

Onward Team

@discourse-admin, thank you for your response. However, I’m still confused:

  • The variable ‘x’ consists of 1,500 pairs of two straight lines, amounting to 3,000 lines in total. Each pair has a different slope and intercept. How can I calculate a global Pearson Coefficient for this dataset?
  • What are “the actual values” referred to in your response? I assume there is no ground truth for the ‘x’ that I generated.

Hi @daisuke0530, sorry for the misunderstanding. The scoring for the Pearson Coefficient takes the average value over all pairs of lines, and the RMSE is calculated against the actuals.

Onward Team
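
Read together, the scoring described above could be sketched as follows. This is only one interpretation of the thread, since the actual scoring code is confidential; the shapes and pairing are assumptions:

import torch

def pearson(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Pearson correlation between two 1-D tensors.
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (a.norm() * b.norm() + 1e-12)

def mean_pairwise_pearson(pred: torch.Tensor, actual: torch.Tensor) -> torch.Tensor:
    # pred, actual: (n_pairs, n_points), one row per line; the score is the
    # average Pearson coefficient over all pairs of lines.
    return torch.stack([pearson(p, a) for p, a in zip(pred, actual)]).mean()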


Hello team, from the discussion above I understood that both the RMSE and the Pearson Coefficient are calculated globally. What about the signs of the slopes? In the 1,500 samples we generated, if one of the X’s has wrong slopes (positive for x1 and negative for x2), does this mean the criterion doesn’t hold and the live scoring algorithm returns 0?

Regards,
Vishwas

Hello, @chepurivishwas360
The slope criterion works in the following manner: at least 90% of each set of 30 samples of x_generated should have the correct sign. If this condition is not met, the score will be 0.
Hope this was helpful.

Best wishes,
Onward Team

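
A hedged sketch of the 90% slope-sign check described above (which line must have which sign, and how the slope is estimated, are our assumptions; the real scoring code is not public):

import torch

def slope_sign_ok(x0: torch.Tensor, x1: torch.Tensor, min_frac: float = 0.9) -> bool:
    # x0, x1: (30, n_points) lines for one sample set; the slope of each
    # line is estimated from its endpoints.
    slope0 = x0[:, -1] - x0[:, 0]
    slope1 = x1[:, -1] - x1[:, 0]
    frac0 = (slope0 > 0).float().mean()  # assumption: x0 should slope upward
    frac1 = (slope1 < 0).float().mean()  # assumption: x1 should slope downward
    return bool(frac0 >= min_frac) and bool(frac1 >= min_frac)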