📣 Challenge Structure 📣


The FH-PS-AOP 2023 grand challenge is divided into two phases (timeline):

1️⃣ Preliminary Test Phase (duration: 4 months):

Registration is open to anyone; the only requirement is that you register as a team (a single participant also counts as a team). Just click the Join button in the upper-right part of the main challenge website. The public training dataset is available for download on Zenodo. Teams participate by submitting their algorithms in the form of Docker containers; see the example algorithm docker image. We limit the number of submissions to one per week. Once submitted, each algorithm is executed on the grand-challenge.org platform and its performance is estimated on the 401 preliminary test cases. Team rankings are updated accordingly on the live public leaderboard.
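
For orientation, here is a minimal sketch of what a container entrypoint could look like. It assumes the common grand-challenge.org convention of reading cases from /input and writing results to /output; the file pattern, output file name, and predict() function are hypothetical placeholders rather than the official interface, so please follow the example algorithm docker image for the exact specification.

```python
# Minimal sketch of a container entrypoint (assumptions: cases are mounted
# under /input as .mha images and results are written to /output; the exact
# file layout is defined by the example algorithm docker image).
import json
from pathlib import Path

import SimpleITK as sitk  # commonly used for reading/writing medical images

INPUT_DIR = Path("/input")
OUTPUT_DIR = Path("/output")


def predict(image):
    """Hypothetical placeholder: run your model, return (segmentation, AoP in degrees)."""
    raise NotImplementedError


def main():
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    aop_results = []
    for case_path in sorted(INPUT_DIR.glob("**/*.mha")):
        image = sitk.ReadImage(str(case_path))
        segmentation, aop_degrees = predict(image)
        sitk.WriteImage(segmentation, str(OUTPUT_DIR / case_path.name))
        aop_results.append({"case": case_path.name, "aop": aop_degrees})
    # Hypothetical output file name; use whatever the official interface specifies.
    (OUTPUT_DIR / "aop-predictions.json").write_text(json.dumps(aop_results))


if __name__ == "__main__":
    main()
```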

2️⃣ Final Test Phase (duration: 1.5 months):

The final test phase will consist of 700 test cases. Teams can use this time for final method development and fine-tuning of hyperparameters. Only a single submission per week will be allowed in this phase. Once this phase is finished, the organizers will rank the teams based on the following protocol (an illustrative sketch of the ranking procedure is given after the list):

  • Statistical ranking will be applied for each of the metrics by pairwise comparison of the algorithms using the Wilcoxon signed-rank test, resulting in a significance score and metric-specific rank.

  • The final rank will be obtained by aggregating the ranks over both metrics.

  • Algorithms whose performance differs only marginally will be assigned identical ranks; in other words, only statistically significant differences between the results of participating teams will separate their ranks.

  • If the final rank is equal for multiple participating teams, ties will be broken by metric-based aggregation, i.e., by the mean of all metrics.
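
The sketch below illustrates how such a ranking can be computed. This is not the organizers' exact evaluation script; the significance threshold and the tie handling via rankdata are assumptions.

```python
# Illustrative ranking sketch: for each metric, every pair of algorithms is
# compared on per-case scores with the Wilcoxon signed-rank test; an
# algorithm's metric-specific rank reflects how many rivals it significantly
# outperforms, and the final rank aggregates the metric-specific ranks.
import numpy as np
from scipy.stats import wilcoxon, rankdata

ALPHA = 0.05  # assumed significance threshold


def metric_rank(scores, higher_is_better=True):
    """scores: dict {algorithm: per-case metric values, same case order}."""
    names = list(scores)
    wins = {n: 0 for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diff = np.asarray(scores[a], float) - np.asarray(scores[b], float)
            if not higher_is_better:
                diff = -diff
            if not np.any(diff):
                continue  # identical results, no significant difference
            _, p = wilcoxon(diff)  # paired, per-case comparison
            if p < ALPHA:  # only significant differences separate algorithms
                wins[a if diff.mean() > 0 else b] += 1
    # More significant "wins" means a better (lower) rank; ties share a rank.
    return dict(zip(names, rankdata([-wins[n] for n in names], method="min")))


def final_rank(metric_ranks):
    """Aggregate metric-specific ranks (list of dicts, one per metric)."""
    names = list(metric_ranks[0])
    mean_rank = [np.mean([r[n] for r in metric_ranks]) for n in names]
    return dict(zip(names, rankdata(mean_rank, method="min")))
```

Whether a higher or lower metric value is better depends on the metric (e.g. higher Dice is better, lower ΔAoP is better), hence the higher_is_better flag.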

🚓 Rules 🚓

We adopt rules similar to the PI-CAI grand challenge rules:

  • All participants must form teams (even if the team is composed of a single participant), and each participant can only be a member of a single team. 

  • Any individual participating with multiple or duplicate Grand Challenge profiles will be disqualified.

  • Anonymous participation is not allowed. To qualify for ranking on the validation/testing leaderboards, true names and affiliations [university, institute or company (if any), country] must be displayed accurately on verified Grand Challenge profiles, for all participants.

  • Members of sponsoring or organizing centers may participate in the challenge, but are not eligible for prizes or the final ranking in the Final Test Phase.

  • This challenge only supports the submission of fully automated methods in Docker containers. It is not possible to submit semi-automated or interactive methods.

  • All Docker containers submitted to the challenge will be run in an offline setting (i.e. they will not have access to the internet and cannot download/upload any resources). All necessary resources (e.g. pre-trained weights) must be encapsulated in the submitted containers a priori.

  • Participants competing for prizes may use pre-trained AI models based on computer vision and/or medical imaging datasets (e.g. ImageNet, Medical Segmentation Decathlon). They may also use external datasets to train their AI algorithms. However, such data and/or models must be published under a permissive license (within 3 months of the Preliminary Development Phase deadline) to give all other participants a fair chance at competing on equal footing. They must also clearly state the use of external data in their submission, using the algorithm name [e.g. "FH-PS-AOP Model (trained w/ private data)"], algorithm page and/or a supporting publication/URL.

  • Researchers and companies who are interested in benchmarking their institutional AI models or products, but not in competing for prizes, can freely use private or unpublished external datasets to train their AI algorithms. They must clearly state the use of external data in their submission, using the algorithm name [e.g. "FH-PS-AOP Model (trained w/ private data)"], algorithm page and/or a supporting publication/URL. They are not obligated to publish their AI models and/or datasets, before or at any time after the submission.

  • To participate in the Final Test Phase as one of the top 10 teams, participants must submit a short arXiv paper on their methodology (2–3 pages) and a public/private URL to their source code on GitHub (hosted under a permissive license). We take these measures to ensure the credibility and reproducibility of all proposed solutions, and to promote open-source AI development.

  • Participants of the FH-PS-AOP 2023 challenge may publish their own results separately; however, they must not submit their papers before April 1st, 2024. Papers published after April 1st, 2024 are requested to cite our dataset and challenge paper (once it has been published).

  • Organizers of the FH-PS-AOP 2023 challenge reserve the right to disqualify any participant or participating team, at any point in time, on grounds of unfair or dishonest practices.

  • All participants reserve the right to drop out of the FH-PS-AOP 2023 challenge and forgo any further participation. However, they will not be able to retract their prior submissions or any results published up to that point in time.

Computational Limitations (similar to those of the Shifts challenge):

We place a hard limit on computational resources: models must run within 5 minutes per input sample on the Grand Challenge backend. Submitted solutions that exceed this limit will not be considered for the leaderboard. This is done for several reasons. Firstly, to reduce costs, as every model evaluation on Grand Challenge costs money. Secondly, for real-world applicability: in many practical applications, significant limits are placed on the computational resources, memory budgets, and run times of algorithms. Finally, it levels the playing field for participants who do not have access to vast amounts of computational resources.
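
As a sanity check before submitting, you may want to time your pipeline locally on a few sample cases. A rough sketch of such a check, with assumed function and variable names, is shown below.

```python
# Rough local timing check (assumed setup): make sure per-case inference
# stays well below the 5-minute-per-sample limit before submitting.
import time

TIME_LIMIT_S = 5 * 60  # hard per-sample limit on the Grand Challenge backend


def check_runtime(run_case, cases):
    """run_case: your end-to-end inference function; cases: a few sample inputs."""
    for case in cases:
        start = time.perf_counter()
        run_case(case)
        elapsed = time.perf_counter() - start
        status = "OK" if elapsed < TIME_LIMIT_S else "TOO SLOW"
        print(f"{case}: {elapsed:.1f} s [{status}]")
```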

👩‍⚖️ Evaluation 👩‍⚖️

Evaluation metrics are based on the recommendations from the Metrics Reloaded framework:

  • Dice score, Hausdorff distance and Mean Surface Distance for target segmentation

  • The difference (ΔAoP) between the predicted and the manually measured angle of progression (AoP) for AoP prediction

Important note: To compute the overall mean metrics, we will aggregate results over all cases.
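
For reference, a rough sketch of how these per-case metrics can be computed is given below. These are assumed textbook definitions; the official evaluation code may differ in details such as pixel-spacing handling or whether ΔAoP is signed.

```python
# Rough per-case metric sketch (assumptions: binary masks for Dice, boundary
# point sets in physical units for the distance metrics, AoP in degrees).
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import directed_hausdorff


def dice(pred_mask, gt_mask):
    """Dice score between two binary masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0


def hausdorff(pred_pts, gt_pts):
    """Symmetric Hausdorff distance between two boundary point sets (N x 2)."""
    return max(directed_hausdorff(pred_pts, gt_pts)[0],
               directed_hausdorff(gt_pts, pred_pts)[0])


def mean_surface_distance(pred_pts, gt_pts):
    """Average symmetric surface distance between two boundary point sets."""
    d_pred_to_gt = cKDTree(gt_pts).query(pred_pts)[0]
    d_gt_to_pred = cKDTree(pred_pts).query(gt_pts)[0]
    return (d_pred_to_gt.mean() + d_gt_to_pred.mean()) / 2.0


def delta_aop(pred_aop, gt_aop):
    """AoP difference in degrees (assumed here to be the absolute difference)."""
    return abs(pred_aop - gt_aop)
```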