Hey HN! After the Car Wash Test post got quite a big discussion going (400+ comments, https://news.ycombinator.com/item?id=47128138), I spent the past few weeks building a tool so anyone can run these kinds of questions and get structured results. No signup and free to use.

You type a question, define answer options, pick up to 50 models at a time from a pool of 200+, and they all answer independently under identical conditions. There is no system prompt, the output is structured, and the setup is the same for every model.
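
To make the poll mode concrete, here is a minimal sketch of the idea. It is not the actual implementation: `run_poll` and the `ask_model` callable are hypothetical names that stand in for the real model routing, and only the 50-model cap comes from the description above.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

MAX_POLL_MODELS = 50  # cap mentioned in the post; everything else is assumed


def run_poll(
    question: str,
    options: List[str],
    models: List[str],
    ask_model: Callable[[str, str, List[str]], str],  # hypothetical stand-in
) -> Tuple[Dict[str, str], Counter]:
    """Ask every model the same question independently and tally the votes."""
    if not 1 <= len(models) <= MAX_POLL_MODELS:
        raise ValueError(f"pick between 1 and {MAX_POLL_MODELS} models")

    answers: Dict[str, str] = {}
    for model in models:
        # Each model gets the identical question and options, and never
        # sees what the other models answered.
        choice = ask_model(model, question, options)
        if choice not in options:
            raise ValueError(f"{model} returned an unknown option: {choice!r}")
        answers[model] = choice

    # Count how many models picked each option.
    return answers, Counter(answers.values())
```

Passing the model call in as a function keeps the sketch independent of any particular provider or SDK.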

You can also run a debate round where models see each other's reasoning and get a chance to change their minds. A reviewer model then summarizes the full transcript. All models are routed via my startup Opper.

Hope you enjoy it, and would love to hear what you think!

  • capitrane 7 hours ago |
    • felix089 7 hours ago |
      I actually asked this question before posting, just to be sure... Edit: their reply is actually quite funny: "In a display of absolute consensus, the AI Roundtable unanimously validated its own existence,"
  • felix089 6 hours ago |
  • totisjosema 6 hours ago |
    Which AI lab has higher ethical standards:

    https://opper.ai/ai-roundtable/questions/8f5b4f55-617

    Do you think it's alright that AI labs scraped the internet without respect for copyright and now sell closed models?

    https://opper.ai/ai-roundtable/questions/86864de8-251

    It's very interesting to read the transcripts and see how they manage to convince each other. Opus 4.6 seems to really get the others to change their minds.

    • jacquesm 2 hours ago |
      Good questions!
  • infosecphoenix 6 hours ago |
    This is very interesting! I wonder if we need that many models to join the discussion. Have you tried fewer models?
    • felix089 6 hours ago |
      Thanks, happy to hear. Yes, for debate mode the max number of models is actually only 6. More than that didn't really add anything in my preliminary tests. Only for direct comparison in poll mode can you choose up to 50; there it's kind of nice to see their individual responses side by side.
  • Cider9986 5 hours ago |
    What is the most important amendment in the constitution of the USA?

    https://opper.ai/ai-roundtable/questions/e4cb234e-be4

  • gsandahl 5 hours ago |
    Oh lord, imagine asking "serious" questions

    https://opper.ai/ai-roundtable/questions/you-are-standing-in...

    • sdwr 3 hours ago |
      Great question! Clean separation between Gemini Pro and the other answers
      • felix089 3 hours ago |
        Yeah, Gemini is the only model that chose based on the correct reason; the other ones got kind of lucky.
    • zipping1549 3 hours ago |
      > However, a clever minority led by Gemini 3.1 Pro and Gemini 3 Pro argued that if the sign is legible from the other side, it must be intended to lead people into the current room to find the exit, making the inscribed corridor the one leading deeper into the dungeon.

      This is quite impressive, really.

  • cdnsteve 5 hours ago |
    Cool project! This is also extremely useful to compare model bias across the board. There are some disturbing trends on certain topics.
    • felix089 4 hours ago |
      Thanks! Yes, bias is one of the most interesting things to look at, for sure.
    • chabes 2 hours ago |
      No surprise here, with Grok being the lone dissenter, defending Musk personally:

      Can billionaires and the planet co-exist long term?

      https://opper.ai/ai-roundtable/questions/b35daf0d-e82

  • Ancalagon 5 hours ago |
    Love this. I asked about climate change because that's been on my mind lately. The models look to be very split.
    • felix089 5 hours ago |
      Thanks! Yeah, I think the best questions are the ones where the science is actually quite clear but politics gets in the way, so you see their bias.
  • chabes 4 hours ago |
    Are there any dating apps that operate on incentives that favor the users?

    https://opper.ai/ai-roundtable/questions/e499206c-0c9

    • felix089 4 hours ago |
      This app cracked the GEO code
  • tonymet 3 hours ago |
    Great tool! I found it useful for challenging "lies my teacher told me".

    It would be nice to support collections of claims, with a table of summaries. I would love to list out a few dozen phony concepts from school and have a shareable chart of the rejections, with rows that expand.

    I really like the UI. It's nice to read the expanded results.

    But how do you afford the tokens?

    • felix089 3 hours ago |
      Thank you, and fun use case. Yeah, this is just v1. I have an open-question version, but the UI is not as sleek. What you can do for now is download the transcript, put it into Claude, and generate a chart. Which, when I think about it, would also be a nice UI idea for the page: custom charts based on the model output data. Will report back on this! And re costs, most questions are very cheap, so I created a credit pool anyone can use. If people keep having fun, I'll keep filling it up, and it looks good so far.
  • whattheheckheck 3 hours ago |
    Run it on the All Souls College Entry Exam
  • jacquesm an hour ago |
    Great idea. I'd love for there to be an 'open-ended answer' mode without multiple-choice options. As it is, the models are not debating the question itself but the validity of the possible answers, and the real answer to the question may not be contained within that set because the person asking is unaware of that option.
    • felix089 an hour ago |
      Happy to hear! Yes, very true. I have a version built for open questions already but wasn't too happy with the UI yet. It's not as straightforward as comparing based on answer options. But I'll release a first version of it shortly and let you know.
      • jacquesm an hour ago |
        Neat. Congrats on launching two interesting projects and looking forward to the third.
        • felix089 an hour ago |
          Thanks! :)
  • soared 25 minutes ago |
    Really cool! There's a surprising amount of value in seeing the models debate and disagree. I wish I had this at work to have models argue over whether the documentation they provided me is accurate.

    I would like to see a devil's advocate. It seems some of the models kind of repeat the same ideas rather than considering incorrect ideas.