ArXiv declares independence from Cornell

808 points by bookstore-romeo 4 days ago | 276 comments

adamnemecek 4 days ago |
Good call, ArXiv seems like one of the most important institutions out there right now.
p-e-w 4 days ago |
It’s so important, in fact, that there should be more than one such institution.
People keep falling into the same trap. They love monopolies, then are shocked when those monopolies jerk them around.
andbberger 4 days ago |
there is. bioarxiv.
auggierose 4 days ago |
I have been using Zenodo for a while now instead. It is more user-friendly as well.
mastermage 4 days ago |
Zenodo is more for IT papers and also datasets, isn't it?
auggierose 4 days ago |
It can host large datasets as well, yes. It is hosted by CERN, so it is not specifically IT in any way. It also allows you to restrict access to the files of your submission. It has no requirements to submit your LaTeX sources, any PDF will be fine. There are also no restrictions on who can publish. You'll get a DOI, of course.
Everything published on arXiv could also be published on Zenodo, but not the other way around.
mastermage 4 days ago |
oh interesting, I didn't know this
jruohonen 3 days ago |
Zenodo is great too, yes, but their meta-data management is somewhat problematic; i.e., it can be changed at whim, which makes indexing difficult.
Al-Khwarizmi 4 days ago |
I like it as well, it works great. But I wonder if it would scale if at some point there were a massive exodus from arXiv.
auggierose 4 days ago |
I think it already hosts much more data than arXiv, given that they also host large datasets.
freehorse 4 days ago |
It is just a preprint repository. It is pretty open (the stories where a preprint was rejected or delayed unreasonably are extremely rare). It offers the basic services for a math/compsci/physics themed preprint repository.
I don't see much of a monopoly, nor any "moat" apart from it being recognised. You can already post preprints on a personal website or on github, and there are "alternatives" such as researchgate that can also host preprints, or zenodo. There are also some lesser known alternatives even. I do not see anything special in hosting preprints online apart from the convenience of being able to have a centralised place to place them and search for them (which you call "monopoly"). If anything, the recognisability and centrality of arxiv helped a lot in the old, darker days to establish open access to papers. There was a time when many journals would not let you publish a preprint, or had all kinds of weird rules about when you could and when you couldn't. Some probably still do, to some degree.
koakuma-chan 4 days ago |
it just hosts pdfs, no?
aragilar 4 days ago |
It does do a fair amount of filtering of submissions, and it's a long term archive (e.g. for the next 100+ years). I suspect both (but with the former dominating) are the issue.
bonoboTP 4 days ago |
Just put out a torrent and people of the sort at r/DataHoarder will keep it alive for longer than bureaucrats.
pfortuny 4 days ago |
It also hosts the sources and has a very tame but useful pre-acceptance process.
freehorse 4 days ago |
Well, technically, it can also compile your tex file if you upload the tex file instead of the pdf directly, which helps a lot in standardizing the stylistic structure between preprints. Most other repositories are wild west and inconsistent. I really appreciate the similarity in style applied to most preprints there. Moreover, this means you can also download not just the pdf, but the source tex file too, which can be very useful.
bonoboTP 4 days ago |
The similarity in style comes from conference and journal templates, not from Arxiv. You can style your paper with latex in any style, Arxiv doesn't care. On Arxiv you mostly see preprints that people submit to conferences and journals and they enforce the style.
IshKebab 4 days ago |
Technically yes, socially no.
kergonath 4 days ago |
The French government put a bit of money on the table to help researchers fulfil their open science requirements for government and EU grants, and funded the HAL repository ( https://hal.science/ ). It’s much smaller than arXiv, but it exists. In other countries like the UK there are clusters of smaller repositories as well, but it’s not as well centralised.
dataflow 4 days ago |
This sounds terrible. Of course there's a huge risk of it becoming made for-profit. It almost makes you wonder if the academic publishers are behind this push somehow.
Could they not have made it into some legal structure that puts universities at the top? Say, with a bunch of universities owning shares that comprise the entirety of the ownership of arXiv, but that would allow arXiv to independently raise funds?
gucci-on-fleek 4 days ago |
> Of course there's a huge risk of it becoming made for-profit.
The article says that "it will become an independent nonprofit corporation", and as OpenAI's failed attempt showed, converting a non-profit to a for-profit organization is either really hard or impossible.
> Could they not have made it into some legal structure that puts universities at the top?
As a corporation (even a non-profit one), it will have a board of directors. I have no idea what their charter will look like, but I would be surprised if at least one seat wasn't reserved for a university representative, and more than that seems quite likely as well.
MostlyStable 4 days ago |
OpenAI didn't get everything that they wanted, but I very much disagree with calling it a "failed attempt". The non-profit went from owning the entirety of OpenAI to having ~25% stake.
gucci-on-fleek 4 days ago |
Ah, thanks for the correction.
ronsor 4 days ago |
Sam Altman is a special kind of person; not many could pull off the schemes he does.
gentleman11 4 days ago |
I doubt it was him who architected it. A team of lawful evil lawyers more likely
cbolton 4 days ago |
The non-profit still controls the board doesn't it?
weedhopper 4 days ago |
As shown by Altman, not really.
mort96 4 days ago |
Is your argument really that "OpenAI was an independent nonprofit corporation and it worked out great, Arxiv will remain just as non-profit as OpenAI"?
gucci-on-fleek 4 days ago |
No, my argument is that OpenAI could make billions of dollars if they converted from a non-profit to a for-profit, and they only succeeded after years of effort and because they had already structured the company into separate for-profit and non-profit entities. And even after all this, the non-profit still controls the majority of the for-profit entity.
So if OpenAI with billions of dollars only partially succeeded at converting to a for-profit business, then that suggests that organizations with fewer resources (like arXiv) have much worse odds.
halperter 4 days ago |
Statement by arXiv: https://tech.cornell.edu/arxiv/
reed1234 4 days ago |
Should be the main link. The original article is based on the CEO job posting.
tornikeo 4 days ago |
Now the question is, will arxiv wage a decade long bloody war with Cornell, using heavy infantry (PhD students), archers (reviewers) and field artillery (AI slop papers), or will the independence be mostly peaceful? Only time can tell.
alansaber 4 days ago |
PhD students are levy infantry at best with Postdocs being the armoured levies.
dmos62 4 days ago |
Is this Gondor or Mordor?
psalminen 4 days ago |
I might be missing something, but I still don't get the why. I don't see any "problem" that needs to be solved.
kolinko 4 days ago |
The article lists the reasons quite clearly.
binsquare 4 days ago |
For everyone else,
The reason is because arxiv is growing significantly leading to 297,000 deficit in operating costs for 2025 alone. Cornell has helped with donations, along with other organizations that pay membership fees.
As a result, donors + leaders of arxiv think it's best to spin off to increase funding.
vl 4 days ago |
What is unclear is why they need a staff of 27 and 6.7 million to operate an essentially static hosting website in 2026.
swiftcoder 4 days ago |
The "essentially static hosting" isn't the cost centre (although with 5 million MAU, it's nothing to sneeze at). The real costs are on the input side - they have an ingestion pipeline that ensures standardised paper formatting and so on, plus at least some degree of human review.
bonoboTP 4 days ago |
Do you mean that the CPU compute cost of turning latex into pdf/HTML is the main cost?
swiftcoder 4 days ago |
No, I mean that the pipeline requires software engineers to build/maintain, and salaries are (as in basically every tech organisation) the dominant cost
bonoboTP 4 days ago |
Then drop it and make people upload a pdf and a zip of the latex sources.
Most people I talk to hate that pipeline and spend a lot of debug hours on it when Arxiv can't compile what overleaf and your local latex install can.
domoritz 4 days ago |
Arxiv can recompile latex to support accessibility and html. Going to pdf submissions would be a major step backward.
bonoboTP 4 days ago |
Make it an external service then, and leave the thing that's already working great to just be.
The reason authors like and use arxiv is that it gives 1) a timestamp, 2) a standardized citable ID, and 3) stable hosting of the pdf. And readers like the no-nonsense single click download of the pdf and a barebones consistent website look.
All else is a side show.
OneDeuxTriSeiGo 4 days ago |
You have to keep in mind that an increasing portion of their time and labor is going towards moderation and filtering due to a mass influx of nonsensical AI generated papers, non-academic numerology-tier hackery, and other useless drivel.
Spinning the service off forces the labor out onto other universities rather than leaving it solely to Cornell.
bonoboTP 4 days ago |
Is the problem the storage cost for hosting them, the HDDs? I'm sure they can be offloaded to cold storage because most of that slop won't be opened by anyone.
Arxiv doesn't need moderation. Nobody is asking for Arxiv moderation. It needs minimal checks to remove overtly illegal content.
swiftcoder 4 days ago |
> Arxiv doesn't need moderation. Nobody is asking for Arxiv moderation
Seems like a lot of people are asking for moderation. And moderation is a pretty big part of the existing offering[1].
[1]: https://info.arxiv.org/help/moderation/index.html
OneDeuxTriSeiGo 4 days ago |
> Is the problem the storage cost for hosting them, the HDDs?
No. Around half the cost is infrastructure. The other half of the cost is people. i.e. engineers to maintain infra and build mod tools for moderators to operate.
> Arxiv doesn't need moderation. Nobody is asking for Arxiv moderation.
This is just not true. Tons of people ask for arxiv to have moderation. Especially since covid, etc when antivaxxers and alternative medicine peddlers started trying to pump the medical categories of arxiv with quack science preprints and then go on to use the arxiv preprint and its DOI to take advantage of non academics who don't really understand what arxiv is other than it looks vaguely like a journal.
And doubly so now that people keep submitting AI generated slop papers to the service trying to flood the different categories so they can pad their resumes or CVs. And on top of that people who don't actually understand the fields they are trying to write papers in using AI to generate "innovative papers" that are completely nonsensical but vaguely parroting the terms of art.
The only reason you don't see more people calling for arxiv moderation is because they already spend so much time on it. If they were to stop moderating the site it would overflow into an absolute nightmare of garbage near overnight. And people wouldn't be upset with the users uploading this of course, they'd be upset with arxiv for failing to take action.
Moderation is inherently unappreciated because in the ideal form it should be effectively invisible (which arxiv's mostly is).
If you want to see the type of stuff that arxiv keeps out, go over to viXra [1] or you can watch k-theory's video [2] having fun digging through some of the quality posts that live over on that site.
1. https://en.wikipedia.org/wiki/ViXra
2. https://www.youtube.com/watch?v=1at9BjQP8CI
efreak 3 days ago |
When you stop moderating input, that's when someone builds a fuse filesystem on top of it. We had those for discord (dsfs), twitterfs, redditfs, yt-media-storage, etc. It's also when someone starts using it to distribute malware, like websites built on a combination of GitHub and a cdn.
bonoboTP 2 days ago |
We are talking about a different kind of moderation. People want to filter out incorrect information that in their opinion damages the reputation of Arxiv, eg covid stuff. It's not about dumping binary data.
This is a motte and bailey fallacy. The real question is about moderation with the goal of checking truth and the scientific content. Obviously illegal content and ddos type overloading attacks need to be blocked.
Very different philosophies are clashing here. Arxiv came about in an age of different zeitgeist. We may never get back to that moment.
lou1306 4 days ago |
The PDF formatting is anything but standardised. They ingest LaTeX sources, which are formatted according to the authors' whims (most likely, according to whatever journal or conference they just submitted the manuscript to). I'll concede that the (relatively novel) HTML formatter gives papers a more uniform appearance. They also integrate a bunch of external services for, e.g., citation metrics and cross-references. Still hard to justify such a high cost to operate, but eh.
Also, the "human review" is a simple moderation process [1]. It usually does not dig into the submission's scientific merits.
[1] https://info.arxiv.org/help/moderation/index.html
OtherShrezzing 4 days ago |
I don't see it as an especially exuberant structure or budget. I've seen larger teams with bigger budgets struggle to maintain smaller applications.
I've contracted into some consultancy teams which you could uncharitably describe as "15 people and $4mn/yr to create one PDF per month".
planetoftofu 2 days ago |
https://info.arxiv.org/about/reports/2024_arXiv_annual_repor...
A critical component of the arXiv-CE project is moving our services entirely off of Cornell University’s infrastructure — this goal is also known as Milestone 1. Milestone 1 completion is projected for the end of fiscal year 2026.
Imagine you are a library, and every day 3000 half-baked so-called books are brought to the librarians, who have to make sure each one is meaningful, readable and printable. They accept them and put them on the right bookshelf, and then the entire internet (AI bots, search engines and researchers) reads every one of them on the shelf multiple times.
They are not only making a new library, they are also maintaining both and syncing two libraries because Cornell cannot handle the volume of access by bots.
It is not static. It is essentially running two ships side-by-side, and two ships need to appear as one from the outside. And, the new ship is still only half built. The new ship is being designed, and being built. 27 seems small to me.
sanex 4 days ago |
Now they're going to have a deficit of 600,000 in operating costs.
pessimizer 4 days ago |
> The reason is because arxiv is growing significantly leading to 297,000 deficit in operating costs for 2025 alone.
Dollars? So 300 people's cable bill? That's basically nothing. They're spending too much, and it's still nothing, and the solution is going to be to privatize it and eventually loot it.
You can't hand out a collection plate and get $300K for Arxiv? Your local neighborhood church can. Civilization is obviously collapsing.
u1hcw9nx 4 days ago |
I think the problem described in 6th paragraph needs to be solved.
davnicwil 4 days ago |
Very unrelated to the article, but I think 'arXiv' as a brand is bad, and really detrimental to what the institution aims to accomplish.
That is, it's not readily parseable, it really gives an insider term vibe - like this isn't for you if you don't already know what it means or how you should read or say it. It sort of reminds me of the overuse of Latin and Latinate terms generally in the old professions and, well, the academy.
Just always struck me as being somewhat at odds with the goal.
john-titor 4 days ago |
I wonder what makes you feel that. I've been publishing preprints on arxiv for close to a decade now and have never had any particular feelings about it.
To me it's just a way to get out your work fast, so that there is already a trace of it on the Internets - nothing more and nothing less.
> That is, it's not readily parseable, it really gives an insider term vibe...
Isn't that normal with highly specialized research fields? I agree many papers could benefit from clearer wording, but working in a niche means you sometimes don't reach a broader audience
davnicwil 4 days ago |
It's an opinion, and you feeling no particular way about it is equally valid.
But I did justify it, and to reword slightly: surely if one of the main drivers is opening up research, the brand name should be something that's less obscure and more accessible / understandable as to what it is on first sight?
Maybe arXiv evoking the word 'archive' with an ancient Greek twist does that for some, but it's clearly a bit cryptic for many, and if the point is to open up probably the brand should just be something much plainer.
aragilar 4 days ago |
No, it's to be a pre-print server. If someone doesn't know what that means, then they shouldn't be using arXiv.
davnicwil 4 days ago |
everyone has a first time they see a thing and don't yet know what it is.
Using a brand as a filter where you have to already know what it means to get it is exactly the opposite of what it's supposed to achieve.
Consider the most exclusive (successful) brands that exist. Even there, where exclusivity is a brand goal, none of them have this property of being obscure on first contact.
bonoboTP 4 days ago |
You usually get introduced to it by your academic supervisor or collaborators as a masters or PhD student. If you're a solo researcher who has made a significant contribution on the frontier of science, I'm sure you'll be able to understand how Arxiv works as well. Because I assume you have had some conversations with other experts in the field. If you're a full on autodidact with no contact to any other researchers in the field, well, maybe it's better if you chat with some other people in that field.
It's reasonable to have a tradeoff here to avoid cranks and now AI psychosis slop. You can still post on research gate and academia.edu or your own github page or webhosting.
Cordiali 4 days ago |
I've never even connected the 'X' to the Greek letter chi. I just kinda accepted it as one of many groovy web 2.0 misspellings in search of a domain and trademark.
matt-noonan 4 days ago |
This is particularly funny because arXiv doesn't just predate Web 2.0, it nearly predates the public web entirely (only missing it by about two weeks)
nixon_why69 4 days ago |
> like this isn't for you if you don't already know what it means
Isn't that actually kind of a good brand signal for a repo of very specialized papers? "Fun with learning" in Comic Sans wouldn't help credibility.
vasco 4 days ago |
This is the type of guy that will suggest paper.ly as a better name with a straight face and then we wonder why the internet is turning to shit
jltsiren 4 days ago |
It's a classic story of someone having to pick a name quickly, which then gets established long before anyone who cares about branding is aware of its existence.
The original service didn't even have a name, only a description, and it was amusingly hosted at xxx.lanl.gov. But LANL wasn't really interested in it, and the founder eventually left for Cornell. At that point, the service needed a domain name, but archive.org was already taken.
And besides, the name has Ancient Greek influences. A similar Latinate term might be something like "archive".
davnicwil 4 days ago |
Interesting, thanks for the context! Makes it more understandable as a choice.
bonoboTP 4 days ago |
I thought the X was an allusion to LaTeX.
jltsiren 4 days ago |
Usually, when you see "ch" in a Latin word, it represents a "χ" in the original Greek word. Both TeX and arXiv use "X" to represent it instead. TeX because Knuth chose to be fancy, and arXiv because "archive" was no longer available.
vulcan01 4 days ago |
By your criterion, Google, Apple, and Amazon are terrible names as well.
davnicwil 4 days ago |
> if you don't already know what it means or how you should read or say it
Google I'll grant you, though it's still pretty phonetic and easy to read. The other two not at all, they're incredibly well known instantaneously recognisable words.
spiralcoaster 4 days ago |
You're right. The name is just classic gatekeeping and elitist, clearly. I am 100% certain that's why they chose it. If they really cared about inclusion, they would have called it research.io
OutOfHere 4 days ago |
With 300K for the CEO, its enshittification will commence imminently. It will now serve to maximize revenue. Just wait and watch while they issue a premium membership, payment requirements for authors, and other revenue generators to please their investors.
exe34 4 days ago |
they'll just turn into a shitty journal at this point, they just need to introduce peer review and they can start competing with the real journals on price point.
another will need to rise to take its place.
OutOfHere 4 days ago |
> they'll just turn into a shitty journal at this point
To this end, they added an endorsement requirement this year: https://blog.arxiv.org/2026/01/21/attention-authors-updated-...
Peteragain 4 days ago |
.. and soon to be dependent on US military funding? Controlled by someone who has run-ins with universities? This'll end in tears.
Garlef 4 days ago |
Maybe they should implement a graph based trust system:
You need your favourite academic gatekeeper (= thesis advisor) to vouch for you in order to be allowed to upload.
Then AI slop gets flagged and the shame spreads through the graph. And flaggings need to have evidence attached that can again be flagged.
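A minimal sketch of what that could look like, purely as an illustration: the `TrustGraph` class, the `decay` factor, and the rule that shame halves at each step up the vouching chain are all made-up assumptions, not anything arXiv does. Flagging the evidence itself is left out to keep it short.

```python
from collections import defaultdict


class TrustGraph:
    """Hypothetical vouching tree: everyone except the roots has one voucher."""

    def __init__(self, roots, decay=0.5):
        # Maps each known author to whoever vouched for them (None for roots).
        self.voucher = {root: None for root in roots}
        self.shame = defaultdict(float)
        self.decay = decay  # assumed: shame shrinks by this factor per hop

    def vouch(self, voucher, newcomer):
        # Only someone already in the graph may vouch, and only for a newcomer,
        # so the structure stays a tree and can never contain a cycle.
        if voucher not in self.voucher:
            raise ValueError(f"{voucher} is not allowed to vouch")
        if newcomer in self.voucher:
            raise ValueError(f"{newcomer} already has a voucher")
        self.voucher[newcomer] = voucher

    def can_upload(self, author):
        return author in self.voucher

    def flag(self, author, evidence, weight=1.0):
        # A flag without evidence is rejected outright.
        if not evidence:
            raise ValueError("a flag must carry evidence")
        if author not in self.voucher:
            raise ValueError(f"{author} is unknown")
        # Walk up the vouching chain, spreading a decaying share of the shame.
        person = author
        while person is not None:
            self.shame[person] += weight
            weight *= self.decay
            person = self.voucher[person]
```

With an advisor vouching for a student who vouches for a friend, one flag against the friend also lands on the student and, more weakly, on the advisor.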
dmos62 4 days ago |
I've often thought that similar trust systems would work well in social media, web search, etc., but I've never seen it implemented in a meaningful way. I wonder what I'm missing.
IshKebab 4 days ago |
Lobsters has this I think. But it also means I've never posted there.
pred_ 4 days ago |
The endorsement system already works along that line: https://info.arxiv.org/help/endorsement.html
It's probably not perfect but in practice, it seems to have been enough to get rid of the worst crackpotty spam.
ryangibb 4 days ago |
You mean like endorsement? https://info.arxiv.org/help/endorsement.html
justinnk 4 days ago |
They've already had a basic form of this for a while [1]
> arXiv requires that users be endorsed before submitting their first paper to arXiv or a new category.
[1] https://info.arxiv.org/help/endorsement.html
ChrisGreenHeur 4 days ago |
Science reduced to people with a phd?
budman1 4 days ago |
not a bad first order filter.
can you think of a better one?
awesome_dude 4 days ago |
The whole point of the scientific method was that we could ignore the source of the information, and were instead expected to focus on the value of the information based on supporting evidence (data).
If we go back to "Only people that have been inducted into the community can publish science" we're effectively saying that only the high priests can accrue knowledge.
I say this knowing full well that we have a massive problem in science on sorting the wheat from the chaff, have had so for a VERY long time, and AI is flooding the zone (thank you political commentator I despise) with absolute dross.
frankling_ 4 days ago |
The recent announcement to reject review articles and position papers already smelled like a shift towards a more "opinionated" stance, and this move smells worse.
The vacuum that arXiv originally filled was one of a glorified PDF hosting service with just enough of a reputation to allow some preprints to be cited in a formally published paper, and with just enough moderation to not devolve into spam and chaos. It has also been instrumental in pushing publishers towards open access (i.e., to finally give up).
Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.
In my view, arXiv fulfills its function better the less power it has as an institution, and I thus have exactly zero trust that the split from Cornell is driven by that function. We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before, and it's now time for the usual routine of snapshotting the site to watch the inevitable amendments to the mission statement.
"What positive changes should users expect to see?" - I guess the negative ones we'll have to see for ourselves.
[1] https://tech.cornell.edu/arxiv/
hijodelsol 4 days ago |
I came here to say something similar. As someone who works in a field that applies machine learning but is not purely focused on it, I interact with people who think that arXiv is the only relevant platform and that they don't need to submit their work to any journal, as well as people who still think that preprints don't count at all and that data isn't published until it's printed in an academic journal. It can feel like a clash of worlds.
I think both sides could learn from the other. In the case of ML, I understand the desire to move fast and that average time to publication of 250-300 days in some of the top-tier journals can feel like an unnecessary burden. But having been on both sides of peer review, there is value to the system and it has made for better work.
Not doing any of it follows the same spirit as not benchmarking your approach against more than maybe one alternative and that already as an after-thought. Or benchmaxxing but not exploring the actual real-world consequences, time and cost trade offs, etc.
Now, is academic publishing perfect? Of course not, very very far from it. It desperately needs to be reformed to keep it economically accessible and time efficient for authors, editors and peer reviewers alike, to prevent the "hot topic of the day" from dominating journals, and to make sure that peer review aligns with the needs of the community and actually improves the quality of the work, rather than having "malicious peer review" to get some citations or pet peeves in.
Given the power that the ML field holds and the interesting experiments with open review, I would wish for the field to engage more with the scientific system at large and perhaps try to drive reforms and improve it, rather than completely abandoning it and treating a PDF hosting service as a journal (ofc, preprints would still be desirable and are important, but they can not carry the entire field alone).
bonoboTP 4 days ago |
Simply anticipating basic push backs from reviewers makes sure that you do a somewhat thorough job. Not 100% thorough and the reviews are sometimes frivolous and lazy and stupid. But just knowing that what you put out there has to pass the admittedly noisily gatekept gate of peer review overall improves papers in my estimation. There is also a negative side because people try to hide limitations and honest assessments and cherry pick and curate their tables more in anticipation of knee jerk reviewers but overall I think without any peer review, author culture would become much more lax and bombastic and generally trend toward engagement bait and social media attention optimized stuff.
The current arrangement, where people write a paper with reviewers in mind, upload it to Arxiv before the review concludes and keep it on Arxiv even if rejected, is a nice balance. People get to form their own opinion on it but there is also enough self-imposed quality control on it just due to wanting it to pass peer review, that even if it doesn't pass peer review, it is still better than if people write it in a way that doesn't care or anticipate peer review. And this works because people are somewhat incentivized to get peer reviewed official publications too. But being rejected is not the end of the world either because people can already read it and build on it based on Arxiv.
bjourne 4 days ago |
I really am not sure about that: https://biologue.plos.org/wp-content/uploads/sites/7/2020/05...
The problem is that "optimizing for peer-review" is not the same thing as optimizing for quality. E.g., I like to add a few tongue-in-cheeks to entertain the reader. But then I have to worry endlessly about anal-retentive reviewers who refuse to see the big picture.
bonoboTP 4 days ago |
Currently a kind of rule of thumb is that a PhD student can graduate after approximately 3 papers published in a good peer reviewed venue.
If peer review were to go away, this whole academic system would get into a crisis. It's dysfunctional and has many problems but it's kinda load bearing for the system to chug along.
DANmode 4 days ago |
No hard rule, no crisis.
Maybe we can go back to very opinionated “true” academia,
where there are institutional gatekeepers,
but they mostly get it right on who to award (and not),
vs the current game of
“whoever plays ball with funding sources the best = the best academic”,
which is obviously bullshit.
vkou 4 days ago |
You'll still need to convince the purseholders to pay you, and they'll want some objective metric to measure your output, and whatever metric they pick will be gamed.
DANmode 4 days ago |
The point of my comment was,
in much earlier institutions of knowledge and excellence,
the only transparent metric was whether or not they approved you.
vkou 4 days ago |
That ossifies intellectual monocultures, though. (Or, heaven forbid, if someone has a financial conflict of interest in the private sphere...)
DANmode 4 days ago |
The current solution doesn’t resist capture by capital either,
and indeed we’re already left with all of the things claimed - the worst of both worlds, really.
fc417fc802 4 days ago |
But this is already how the purse holders operate. A big group of experts get together and vote on which grant proposals within a given category to fund.
I think it comes down to how the system is structured and how many players there are. The more difficult it is for a small cult to capture control of the funding (or access to instrumentation or awarding of degrees or whatever) for a given area the less likely you are to end up with a monoculture.
Assuming the majority of the funding continues to come from governments then you have a centralized point of leverage that can shape the system. So it should be possible to impose constraints that result in a system that actively prevents monocultures from developing.
mitthrowaway2 4 days ago |
Maybe their institution should evaluate whether their papers pass muster? It's the one conferring the degree.
StableAlkyne 4 days ago |
I've noticed it's field dependent. Some fields don't really feel much need to publish in a real journal.
Others (at least in chemistry) will accept it, but it raises concern if a paper is only available as a preprint.
pie_flavor 4 days ago |
You may have delivered value in peer review, but on the whole, peer review delivers negative value. https://www.experimental-history.com/p/the-rise-and-fall-of-...
The arXiv vs journal debate seems a lot like 'should the work get done, or should the work get certified' that you see all over 'institutions', and if the certification does not actually catch frauds or errors, it's not making the foundations stronger, which is usually the only justification for the latter side.
fc417fc802 4 days ago |
Can't say I agree with that position.
Responding largely to the linked article, you can't just ignore the massive increase in funding and associated output that occurred. Scaling almost any system up will be expected to result in creative new failure modes. It's easy to observe that a system isn't great and suppose that removing it would improve things but this very often isn't the case. Democracy is one such example.
There's also the publishing ecosystem that developed around the increased funding. It isn't clear to me why any blame (if it's even valid, see preceding paragraph) should be laid at the feet of the practice of peer reviewing publications rather than such an obviously dysfunctional institution.
Even if we accept the way in which publications have been undergoing peer review to somehow be the root of all evil (as opposed to the for profit publication of taxpayer funded work) - there's more than one way to go about it! A glaringly obvious problem, mentioned in the linked article yet not meaningfully addressed that I saw, is that peer reviewers aren't paid. If this was a compensated task presumably it would be performed much more rigorously. Building inspectors aren't volunteers and they seem to do a good enough job.
observationist 4 days ago |
What's the value of academic publishing over the arxiv model of freely publishing, free access, and a global, vigorous discussion across a wide range of platforms, with experts, researchers, amateurs, institutions, and the peanut gallery all having the opportunity to participate?
What possible value does a journal like Nature, for example, bring to the table by claiming a paper for themselves and charging people for it, given the alternative?
I don't see any value there. Maintaining an exclusive clique by using artificial scarcity while coasting on the dregs of reputation remaining to a once prestigious institution is what a lot of these journals are doing.
The world has changed. There's no need for that sort of pay to play gatekeeping, and in fact, the model does tremendous damage to academic and intellectual integrity. It allows people to get away with fraud and it makes the institutions motivated to hide and cover it up so as to not damage their own reputations by admitting anything slipped by them.
If you contrast the damage done by journals, with regards to suppressed research, gatekept access, money taken from researchers and readers alike, against the value they might plausibly provide, the answer is clear.
They're not needed anymore. The AI era, since 2017, has thoroughly demonstrated that journals are materially incapable of keeping up, that they're unable to meaningfully contribute to the field, and that their curation or other involvement has no effective practical value. The same is true for other fields, but everyone involved wants to keep their piece of the grift going as long as possible.
We don't need them, anymore. I suspect we never did.
jltsiren 4 days ago |
The value is the ability to do science as a career without being independently wealthy.
Politicians, administrators, donors, and taxpayers don't want scientists deciding on their own how to spend the money. They want control over what gets funded. They want funding decisions with justifications they can understand. But they don't understand the science itself, so they need "objective" metrics to support the decisions. And because those metrics matter, people will inevitably game them.
ph4rsikal 4 days ago |
My observation is that research, especially in AI has left universities, which are now focusing their research to a lesser degree on STEM. It appears research is now done by companies like Meta, OpenAI, Anthropic, Tencent, Alibaba, among many others.
bonoboTP 4 days ago |
Universities (outside a few) just have much weaker PR machines so you never hear what they do. Also their work is not user facing products so regular people, even tech power users won't see them.
0x3f 4 days ago |
Not sure about that. How would a university test scaling hypotheses in AI, for example? The level of funding required is just not there, as far as I know.
rsfern 4 days ago |
This issue of accessibility is widely acknowledged in the academic literature, but it doesn’t mean that only large companies are doing good research.
Personally I think this resource mismatch can help drive creative choice of research problems that don’t require massive resources. To misquote Feynman, there’s plenty of room at the bottom
oscaracso 4 days ago |
Universities are also not suited to test which race car is the fastest, but that does not obviate the need for academic research in mechanical engineering.
0x3f 4 days ago |
Perhaps, but the fastest race car is not possibly ushering in the end of human involvement in science, so you might consider the two to merit considerably different levels of funding.
oscaracso 4 days ago |
>ushering in the end of human involvement in science
Good riddance! But not relevant in the least.
0x3f 4 days ago |
Impact size is not relevant to funding allocation?
oscaracso 4 days ago |
Your attempts to smuggle your conclusions into the conversation are becoming tiresome. Profiling a private company's computer program is not impactful research. The best-fit parameters AI people call scaling exponents are not properties like the proton lifetime or electron electric dipole moment. Rest assured, there remain scientists at universities producing important work on machine learning.
bonoboTP 4 days ago |
There are a million other research things to do besides running huge pretraining runs and hyperparam grid search on giant clusters. To see what, you can start with checking out the best paper and similar awards at neurips, cvpr, iccv, iclr, icml etc.
tzs 4 days ago |
I came across a good example of that a few years ago. Caltech had a page on their site listing Caltech startups.
There were quite a few of them--by number of startups per year per person Caltech was actually generating startups at a higher rate than Stanford. But almost none of those Caltech startups were doing anything that would bring them to the public's attention, or even to the average HN reader's attention.
For example one I remember was a company developing improved ion thrusters for spacecraft. Another was doing something to automate processing samples in medical labs.
Also almost none of them were the "undergraduates drop out to form a company" startup we often hear about, where the founders aren't actually using much that they actually learned at the school, with the school functioning more as a place that brought the founders together.
The Caltech startups were most often formed by professors and grad students, and sometimes undergraduates that were on their research team, and were formed to commercialize their research.
My guess is that this is how it is at a lot of universities.
Fomite 4 days ago |
Every university I've worked in has been dominated by this paradigm, has an office set up to support it, and a bunch of policies around what it means for your doctoral supervisor to also be your employer, etc.
PaulHoule 4 days ago |
That's a specific field at a very specific time. In general there is a difference between research and development, you're going to expect the early work to be done in academia but the work to turn that into a product is done by commercial organizations.
You get ahead as an academic computer scientist, for instance, by writing papers not by writing software. Now there really are brilliant software developers in academic CS but most researchers write something that kinda works and give a conference talk about it -- and that's OK because the work to make something you can give a talk about is probably 20% of the work it would take to make something you can put in front of customers.
Because of that there are certain things academic researchers really can't do.
As I see it my experience in getting a PhD and my experience in startups is essentially the same: "how do you make doing things nobody has ever done before routine?" Talk to people in either culture and you see the PhD students are thinking about either working in academia or a very short list of big prestigious companies and people at startups are sure the PhDs are too pedantic about everything.
It took me a long time of looking at other people's side projects that are usually "I want to learn programming language X", "I want to rewrite something from Software Tools in Rust" to realize just how foreign that kind of creative thinking is to people -- I've seen it for a long time that a side project is not worth doing unless: (1) I really need the product or (2) I can show people something they've never seen before or better yet both. These sound different, but if something doesn't satisfy (2) you can usually satisfy (1) off the shelf. It just amazes me how many type (2) things stay novel even after 20 years of waiting.
stared 4 days ago |
> arXiv fulfills its function better the less power it has as an institution
It is an interesting instance of the rule of least power, https://en.wikipedia.org/wiki/Rule_of_least_power.
fidotron 4 days ago |
The irony of the TBL quotes there being the entire problem with the semantic web is the ontological tarpit that results due to the excessive expressive power of a general triple store.
PaulHoule 4 days ago |
Well, I’d argue that many things in the semweb are not expressive enough and lead to the misunderstandings we have.
People think, for instance, that RDFS and OWL are meant to SHACL people into bad and over-engineered ontologies. The problem is these standards add facts and don’t subtract facts. At risk of sounding like ChatGPT: it’s a data transformation system not a validation system.
That is, you’re supposed to use RDFS to say something like
?s :myTermForLength ?o -> ?s :yourTermForLength ?o .
The point of the namespace system is not to harass you, it is to be able to suck in data from unlimited sources and transform it. Trouble is it can’t do the simple math required to do that for real, like
?s :lengthInFeet ?o -> ?s :lengthInInches 12*?o .
Because if you were trying OWL-style reasoning over arithmetic you would run into Kurt Gödel kinds of problems. Meanwhile you can’t subtract facts that fail validation, you can’t subtract facts that you just don’t need in the next round of processing. It would have made sense to promote SHACL first instead of OWL because garbage-in-garbage out, you are not going to reason successfully unless you have clean data… but what the hell do I know, I’m just an applications programmer who models business processes enough to automate them.
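To make the two rules above concrete, here is a minimal Python sketch of the "add facts, never subtract" transformation being described. It is not RDFS or OWL: the triple layout, the helper names (`rename`, `convert`, `apply_rules`), and the sample data are all made up for illustration. The first rule is roughly what an `rdfs:subPropertyOf` statement gives you; the second needs arithmetic, which is why it is written as an ordinary function here.

```python
# Triples are plain (subject, predicate, object) tuples.
# All predicate names and helpers below are hypothetical.

def rename(old, new):
    """Rule: ?s old ?o -> ?s new ?o (a pure property mapping)."""
    def rule(triple):
        s, p, o = triple
        if p == old:
            yield (s, new, o)
    return rule

def convert(old, new, fn):
    """Rule: ?s old ?o -> ?s new fn(?o) (a mapping that needs arithmetic)."""
    def rule(triple):
        s, p, o = triple
        if p == old:
            yield (s, new, fn(o))
    return rule

def apply_rules(triples, rules):
    """One pass: keep every input fact and add whatever the rules derive."""
    out = set(triples)
    for triple in triples:
        for rule in rules:
            out.update(rule(triple))
    return out

facts = {
    ("plank", "myTermForLength", 8),
    ("rope", "lengthInFeet", 3),
}
rules = [
    rename("myTermForLength", "yourTermForLength"),
    convert("lengthInFeet", "lengthInInches", lambda feet: 12 * feet),
]
derived = apply_rules(facts, rules)
```

Note that `apply_rules` only ever grows the set of facts, which mirrors the point about not being able to drop facts that fail validation or that the next processing round does not need.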
Similarly the problem of ordered collections has never been dealt with properly in that world. PostgreSQL, N1QL and other post-relational and document DB languages can write queries involving ordered collections easily. I can write rather unobvious queries by hand to handle a lot of cases (wrote a paper about it) but I can’t cover all the cases and I know back in the day I could write SPARQL queries much better than the average RDF postdoc or professor.
As for underengineering, Dublin Core came out when I worked at a research library and it just doesn’t come close in capability to MARC from 1970. Larry Masinter over at Adobe had to hack the standard to handle ordered collections because… the authors of a paper sure as hell care what order you write their names in. And it is all like that: RDF standards neglect basic requirements that they need to be useful and then all the complex/complicated stuff really stands out. If you could get the basics done maybe people would use them but they don’t.
light_hue_1 4 days ago |
> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.
This just isn't true. arXiv is not a venue. There's no place that gives you credit for arXiv papers. No one cares if you cite an arXiv paper or some random website. The vast vast majority of papers that have any kind of attention or citations are published in another venue.
contubernio 4 days ago |
A Fields medal was awarded based mainly on this paper never published elsewhere: https://arxiv.org/abs/math/0211159
auggierose 4 days ago |
I think there is a misunderstanding here. Does arXiv count as a publication? Yes, pretty much anything that gives you a DOI does, for example Zenodo. Does it function as a reputable anything? No.
The paper you link to counts as a publication, but its reputation stands on its own, it has nothing to do with arXiv as a venue. Ideally, that's how it is for all papers, but it isn't, just by publishing in certain venues your paper automatically gets a certain amount of reputation depending on the venue.
fc417fc802 4 days ago |
> Ideally, that's how it is for all papers, but it isn't
We require a method of filtering such that a given researcher doesn't have to personally vet in excruciating detail every paper he comes across because there simply isn't enough time in the day for that.
Ideally such a system would individually for each paper provide a multi-dimensional score that was reputable. How can those be calculated in a manner such that they're reputable? Who knows; that exercise is left for the reader.
In practice "well it got published in Nature" makes for a pretty decent spam filter followed by metrics such as how many times it's been cited since publication, checking that the people citing it are independent authors who actually built directly on top of the work, and checking how many of such citing authors are from a different field.
mitthrowaway2 4 days ago |
Can't we do better than that?
PageRank was a decent solution for websites. Can't we treat citations as a graph, calculate per-author and per-paper trustworthiness scores, update when a paper gets retracted, and mix in a dash of HN-style community upvotes/downvotes and openly-viewable commentary and Q&A by a community of experts and nonexperts alike?
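As a rough illustration of the citation-graph idea, here is a minimal PageRank-style sketch in plain Python. Everything in it is an assumption made for the example: the function name `citation_rank`, the dict-of-lists graph format, the damping factor of 0.85, and the choice to handle a retraction by simply dropping the paper and every citation to it before recomputing. It covers only per-paper scores and ignores per-author scores, votes, and commentary.

```python
def citation_rank(citations, retracted=frozenset(), damping=0.85, iterations=50):
    """Score papers by power iteration over a citation graph.

    citations maps each paper to the list of papers it cites.
    Retracted papers are removed as both citers and citees.
    """
    graph = {
        paper: [c for c in cited if c not in retracted and c != paper]
        for paper, cited in citations.items()
        if paper not in retracted
    }
    papers = sorted(set(graph) | {c for cited in graph.values() for c in cited})
    n = len(papers)
    score = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Every paper gets a small baseline score.
        new = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            targets = graph.get(p, [])
            if targets:
                # Pass this paper's score on to the papers it cites.
                share = damping * score[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A paper that cites nothing spreads its score evenly.
                share = damping * score[p] / n
                for q in papers:
                    new[q] += share
        score = new
    return score

citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["A"]}
scores = citation_rank(citations)
after_retraction = citation_rank(citations, retracted={"A"})
```

Calling the function again with a paper in `retracted` is the "update when a paper gets retracted" step: the scores are recomputed on the graph without that paper.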
auggierose 3 days ago |
You know that is what PageRank was originally for, right?
mitthrowaway2 3 days ago |
Sure. In that case I guess I'm just waiting for a couple of college kids in a garage to start a website that actually uses it for its intended purpose, so that we can finally deprecate PrestigiousPrivateJournalRank.
fc417fc802 3 days ago |
Of course we could! My tongue in cheek "exercise is left for the reader" comment was meant to convey that it's deceptively simple.
Just one example off the top of my head. How do you handle negative citations? For example a reputable author citing a known incorrect paper to refute it. You need more metadata than we currently have available.
tl;dr just draw the rest of the fucking owl.
Upvotes, downvotes, and commentary? That's extremely complicated. Long term data persistence? Moderation? Real names? Verification of lab affiliations? Who sets the rules? How do you cope with jurisdictional boundaries and related censorship requirements? The scientific literature is fundamentally an open and above all international collaboration. Any sort of closed, centralized, or proprietary implementation is likely to be a nonstarter.
Thus if your goal is a universal system then I'm fairly certain you need to solve the decentralized social networking problem as a more or less hard prerequisite to solving the decentralized scientific literature review problem. This is because you need to solve all the same problems but now with a much higher standard for data retention and replication.
Very topically I assume you'd need a federated protocol. It would need to be formally standardized. It would need a good story for data replication and archival which pretty much rules out ActivityPub and ATProto as they currently stand so you're back to the drawing board.
A nontrivial part of the above likely involves also solving the decentralized petname system problem that GNS attempts to address.
I think a fully generalized scoring or ranking system is exceedingly unlikely to be a realistic undertaking. There's no problem with isolated private venues (ie journals) we just need to rethink how they work. Services such as arxiv provide a DOI so there's nothing stopping "journals" that are actually nothing more than lightweight review platforms that don't actually host any papers themselves from being built.
auggierose 3 days ago |
> Upvotes, downvotes, and commentary? That's extremely complicated.
No, it is not. Don't throw the baby out with the bath water. Zenodo is centralized, and that is fine. A system hosted by CERN would be universal enough for most purposes.
The truth is, most papers cannot stand on their own, they need a reputable venue. While it is difficult to get into Nature, it is much more difficult to actually contribute something substantial to science. That's why we don't have a system like that.
fc417fc802 3 days ago |
I think you've misunderstood me. Did you read my final paragraph? I was agreeing with what you wrote there - that simply rethinking how centralized journals operate could accomplish the majority of the goal while sidestepping most of the complexity.
That said, I disagree that papers require a centralized venue in any fundamental sense. They currently need such a venue because we don't have a better process for vetting and filtering them at scale. The issue is that decentralizing such a process in an acceptable manner is a monstrously complicated prospect.
auggierose 3 days ago |
> We require a method of filtering such that a given researcher doesn't have to personally vet in excruciating detail every paper he comes across because there simply isn't enough time in the day for that.
We do require such a method. Isn't that what AI is for? Strictly working as a filter. You still need to personally vet in excruciating detail every paper you rely on for your work.
fc417fc802 3 days ago |
Maybe. I think that's still experimental and far too resource intensive to do on an individual basis. However an intensive LLM review performed by a centralized service once per paper as a sort of independent literature watchdog would likely be of value. I haven't heard of such a thing yet though.
light_hue_1 4 days ago |
It was not awarded because that paper is on arxiv. That paper could have been printed and sent out by mail. Or posted on 4chan. etc. It just so happens that it was on arxiv, which made no difference to anything.
queuebert 4 days ago |
> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, ...
In my experience as a publishing scientist, this is partly because publishing with "reputable" journals is an increasingly onerous process, with exorbitant fees, enshittified UIs, and useless reviews. The alternative is to upload to arXiv and move on with your life.
groundzeros2015 4 days ago |
That’s true. But that’s separate from its use in ML and blockchain circles as a form of marketing - using academic appearances.
jjk166 4 days ago |
That sounds more like an issue of certain fields having crappy standards because the people in those fields benefit from crappy standards than an issue with the site they happen to host papers on.
groundzeros2015 4 days ago |
I don’t buy “some fields are just more honorable”. Everyone uses publishing for personal gain.
But yes it’s a people problem, not an arxiv problem.
StableAlkyne 4 days ago |
Every field and every publisher has this issue though.
I've read papers in the chemical literature that were clearly thinly veiled case studies for whatever instrument or software the authors were selling. Hell, I've read papers that had interesting results, only to dig into the math and find something fundamentally wrong. The worst was an incorrect CFD equation that I traced through a telephone game of 4 papers only to find something to the effect of "We speculate adding $term may improve accuracy, but we have not extensively tested this"
Just because something passed peer review does not make it a good paper. It just means somebody* looked at it and didn't find any obvious problems.
If you are engaged in research, or in a position where you're using the scientific literature, it is vital that you read every paper with a critical lens. Contrary to popular belief, the literature isn't a stone tablet sent from God. It's messy and filled with contradictory ideas.
*Usually it's actually one of their grad students
groundzeros2015 3 days ago |
I completely agree. Sophisticated marketing campaigns include everything from academic literature to bikini-clad women.
Aurornis 4 days ago |
> and with just enough moderation to not devolve into spam and chaos
arXiv has become a target for grifters in other domains like health and supplements. I’ve seen several small scale health influencers who ChatGPT some “papers” and then upload them to arXiv, then cite arXiv as proof of their “published research”. It’s not fooling anyone who knows how research works but it’s very convincing to an average person who thinks that they’re doing the right thing when they follow sources that have done academic research.
I’ve been surprised at how bad and obviously grifty some of the documents I’ve seen on arXiv have become lately. Is there any moderation, or is it a free for all as long as you can get an invite?
PaulHoule 4 days ago |
Review papers are interesting.
Bibliometrics reveal that they are highly cited. Internal data we had at arXiv 20 years ago show they are highly read. Reading review papers is a big part of the way you go from a civilian to an expert with a PhD.
On the other hand, they fall through the cracks of the normal methods of academic evaluation.
They create a lot of value for people but they are not likely to advance your career that much as an academic, certainly not in proportion to the value they create, or at least the value they used to create.
One of the most fun things I did on the way to a PhD was writing a literature review on giant magnetoresistance for the experimentalist on my thesis committee. I went from knowing hardly anything about the topic to writing a summary that taught him a lot he didn't know. Given any random topic in any field you could task me with writing a review paper and I could go out and do a literature search and write up a summary. An expert would probably get some details right that I'd get wrong, might have some insights I'd miss, but it's actually a great job for a beginner, it will teach you the field much more effectively than reading a review paper!
How you regulate review papers is pretty tricky. If it is original research the criterion of "is it original research" is an important limit. There might already be 25 review papers on a topic, but maybe I think they all suck (they might) and I can write the 26th and explain it to people the way I wish it was explained to me.
Now you might say in the arXiv age there was not a limit on pages, but LLMs really do problematize things because they are pretty good at summarization. Send one off on the mission to write a review paper and in some ways they will do better than I do, in other ways will do worse. Plenty of people have no taste or sense of quality and they are going to miss the latter -- hypothetically people could do better as a centaur but I think usually they don't because of that.
One could make the case that LLMs make review papers obsolete since you can always ask one to write a review for you or just have conversations about the literature with them. I know I could have spent a very long time studying the literature on Heart Rate Variability and eventually made up my mind about which of the 20 or so metrics I want to build into my application and I did look at some review papers and can highlight sentences that support my decisions but I made those decisions based on a few weekends of experiments and talking to LLMs. The funny thing is that if you went to a conference and met the guy who wrote the review paper and gave them the hard question of "I can only display one on my consumer-facing HRV app, which one do I show?" they would give you that clear answer that isn't in the review paper and maybe the odds are 70-80% that it will be my answer.
jballanc 4 days ago |
I exited academia for industry 15 years ago, and since then I haven't had nearly as much time to read review papers as I would like. For that reason, my view may be a bit outdated, but one thing I remember finding incredibly useful about review papers is that they provided a venue for speculation.
In the typical "experimental report" sort of paper, the focus is typically narrowed to a knife's edge around the hypothesis, the methods, the results, and analysis. Yes, there is the "Introduction" and a "Discussion", but increasingly I saw "Introductions" become a venue to do citation bartering (I'll cite your paper in the intro to my next paper if you cite that paper in the intro to your next paper) and "Discussion" turn into a place to float your next grant proposal before formal scoring.
Review papers, on the other hand, were more open to speculation. I remember reading a number that were framed as "here's what has been reported, here's what that likely means...and here's where I think the field could push forward in meaningful ways". Since the veracity of a review is generally judged on how well it covers and summarizes what's already been reported, and since no one is getting their next grant from a review, there's more space for the author to bring in their own thoughts and opinions.
I agree that LLMs have largely removed the need for review papers as a reference for the current state of a field...but I'll miss the forward-looking speculation.
Science is staring down the barrel of a looming crisis that looks like an echo chamber of epic proportions, and the only way out is to figure out how to motivate reporting negative results and sharing speculative outsider thinking.
PaulHoule 4 days ago |
My feelings about that outsider thing are pretty mixed.
On one hand I'm the person who implemented the endorsement system for arXiv. I also got a PhD in physics, did a postdoc in physics, then left the field. I can't say that I was mistreated, but I saw one of the stars of the field today crying every night when he was a postdoc because he was so dedicated to his work and the job market was so brutal -- so I can say it really hurts when I see something that I think belittles that.
On the other hand I am very much an interested outsider when it comes to biosignals, space ISRU, climate change, synthetic biology and all sorts of things. With my startup and hackathon experience it is routine for me to go look at a lot of literature in a new field and cook it down and realize things are a lot simpler than they look and build a demo that knocks the socks off the postdocs because... that's what I do.
But Riemann Hypothesis, Collatz, dropping names of anyone who wrote a popular book, I don't do that. What drives me nuts about crackpots is that they are all interested in the same things whereas real scientists are interested in something different. [1] It was a big part of our thinking about arXiv -- crackpot submissions were a tiny fraction of submission to arXiv but they would have been half the submissions to certain fields like quantum gravity.
I've sat around campfires where hippies were passing a spliff around and talking about that kind of stuff and was really amused recently when we found out that Epstein did the thing with professors who would have known better -- I mean, I will use my seduction toolbox to get people like that to say more than they should but not to have the same conversation I could have at a music festival.
[1] e.g. I think Tolstoy got it backwards!
aleph_minus_one 4 days ago |
> crackpot submissions were a tiny fraction of submission to arXiv but they would have been half the submissions to certain fields like quantum gravity
Just some very outsider thought:
Could it be that this problem is rather self-inflicted by researchers and their marketing?
Physicists market all the time that resolving these questions about quantum gravity will give the answers to the deepest questions that plagued philosophers over millennia. Well, such marketing attracts crackpots who do believe that they have something to tell about such topics.
Relatedly, to improve their chances of getting research funding, a lot of researchers do outreach to the general public to show the importance of the questions that they work on. Of course this means that people from the general public who now get interested in such questions will make their own attempt to make a contribution because - well, this researcher just told me how important it is to think about such questions. Of course such a person from the general public typically does not have the deep scientific knowledge such that their contribution meets the high scientific standards.
abdullahkhalids 4 days ago |
> Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right.
This has been a common practice in physics, especially the more theoretical branches, since the inception of arXiv. Senior researchers write a paper draft, and then send copies to some of their peers, get and incorporate feedback, and just submit to arxiv.
godelski 4 days ago |
And this is really how it should be. Honestly the only thing I want arxiv to do is become more like open review. Allow comments by peers and some better linking to data and project pages.
It works for physics because physicists are very rigorous. So papers don't change very much. It also works for ML because everyone is moving so fast that it's closer to doing open research. Sloppier, but as long as the readers are other experts then it's generally fine.
I think research should really just be open. It helps everyone. The AI slop and mass publishing is exploiting our laziness; evaluating people on quantity rather than quality. I'm not sure why people are so resistant to making this change. Yes, it's harder, but it has a lot of benefits. And at the end of the day it doesn't matter if a paper is generated if it's actually a quality paper (not in just how it reads, but the actual research). Slop is slop and we shouldn't want slop regardless. But if we evaluate on quality and everything is open it becomes much easier to figure out who is producing slop, collusion rings, plagiarist rings, and all that. A little extra work for a lot of benefits. But we seem to be willing to put in a lot of work to avoid doing more work.
abdullahkhalids 4 days ago |
I don't agree actually that is how it should or can work for everyone. Senior researchers produce good quality research, and they have a network of high quality peers built over decades. Both those are necessary for them to reach out and ask for feedback, and get genuine and high quality feedback.
Junior researchers don't have these typically. They also benefit more from anonymous feedback, which enables the reviewers to bluntly identify wrong or close to wrong results. So I think open journals should continue to exist. They fill an essential role in the scientific ecosystem.
godelski 4 days ago |
Mostly I'm fine with journals and conferences but I think it's the prestige that has fucked everything over.
I want reviews of my papers! But I want reviews by people who care. I don't want reviews by people who don't want to review. I don't want reviews by people who think it's their job to reject or find flaws in the work. I want reviews by people who care. I want reviews by people who want to make my work better. I want reviews by people who understand all works are flawed and we can't tackle every one in every paper (the problem isn't solved, so there's always more!).
So low bars. Forget the prestige, citation count, novelty, and all the bullshit and just focus on the actual work and that the act of publishing is about communicating. Publishing is the main difference between private and public labs. Private labs do fine research, without all the formal review. It's just that nobody learns about it. They don't give back to the community.
So my ideal system still has reviewers, journals, and conferences but I think we'd get along just fine without them. I believe that if we can't recognize that then we can't use these other tools to make things better.
They aren't fundamental tools needed to make the process work, they're tools that can make the process work better. But I'm not convinced they're doing a good job of that right now.
lokar 4 days ago |
You could imagine separating the "publishing" part, which really should just be open with minimal anti-spam etc, from the "this was reviewed by a trusted group of people so you should give it more consideration" part. You could do the second without it being attached to the publishing.
godelski 4 days ago |
I think your phrasing was good. A lot of people conflate a work being published with it being peer reviewed, and assume that "peer reviewed" means "correct".
I think when you think about publishing as what it actually is, researchers communicating to researchers, what I said makes much more sense. I do think formal review does help reduce slop but I think anyone who has published anything is also very aware of how noisy the system is and how good works get rejected or delayed because they aren't "novel" enough.
Honestly, my ideal system is journals with low bars. We forget this prestige bullshit and silliness of novelty (often it's novel to niche experts but not to others) and basically check if it looks like due diligence was done, there's nothing obviously wrong, no obvious plagiarism, and then maybe a little back and forth to help communicate. But I think we've gotten too lost in this idea of needing to publish fast and that it has to be important. Important to who? Tons of stuff is only considered important later, we've got a long track record of not being so great at that. But we have a long track record of at least some people working on what we later find out is important.
nickpsecurity 4 days ago |
There's a lot of stuff with basic errors in peer reviewed journals. Things also can get rejected for anything from formatting to politics.
I like Arxiv better. I get the paper, know it's probably not reviewed (like in many journals), and review it if I want to. I used to use CiteSeerX, too, to get tons of CompSci papers. Even better, OpenReview might have some good observations.
fsckboy 4 days ago |
>We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before
what are you referring to, who is being appeased who shouldn't be? what are you worried about happening?
asimpleusecase 4 days ago |
I wonder if there are plans to licence the content for AI training
KellyCriterion 4 days ago |
I'd guess OAI & co have already copied without asking?
mkl 4 days ago |
No need to ask - the whole point is open access. https://info.arxiv.org/help/bulk_data.html
mkl 4 days ago |
It's been available all along: https://info.arxiv.org/help/bulk_data.html
shevy-java 4 days ago |
"Recently arXiv’s growth has accelerated. Since 2022, it has expanded its staff to 27, in large part to deal with a 50% increase in submitted manuscripts."
I am wary of that. IMO the business model is damaged therein. You can say in 2022 we had 27; bankrupt in 2030.
Aerolfos 4 days ago |
And they hired a LinkedIn business idiot to run the new organization - so the aim is for an infinite growth tech startup in terms of governance, despite the technical legal status of non-profit. It shows in the language they use in the announcement, too ("improved financial viability in the long run")
OpenAI shows exactly how well that works and what that kind of governance does to a company and to its support of science and the commons.
TL;DR, it's fucked.
swiftcoder 4 days ago |
> raised concerns about the proposed $300,000 salary for arXiv’s new CEO, saying it seemed high
Is a mid-to-high engineering salary outlandish for a CEO of what is likely to be a fairly major non-profit? Even non-profits have to be somewhat competitive when it comes to salary, and the ideal candidate is likely someone who would be balancing this against a tenured position at a major university
mort96 4 days ago |
Salaries in the US are so bonkers. Everywhere else outside of the US, $300,000 is an outlandishly high salary. To call it "mid to high" is insane.
HappyPanacea 4 days ago |
Yes the obvious play is to move human labor to cheaper countries like France (including CEO of course).
renewiltord 4 days ago |
The reason the French can’t build these things is the same reason they shouldn’t be allowed to be in charge. It’s a preprint PDF host. Just make your own if you can run this one.
magnio 4 days ago |
They do have their own: https://hal.science/
It is actually quite common to come across HAL in subfields of mathematics in my experience.
bjourne 4 days ago |
HAL is decidedly second-tier. Given the option, everyone would pick arXiv over HAL. Hence, HAL hosts lots of stuff that didn't (even) make it to arXiv => lots of subpar dredge.
Miraltar 4 days ago |
> HAL is decidedly second-tier. Given the option, everyone would pick arXiv over HAL.
Can you elaborate on that?
linhns 4 days ago |
I agree that dredge is a huge problem with HAL, but it's getting better, while arXiv is still stuck with an unfriendly UI.
renewiltord 4 days ago |
That’s great. People will use whichever one is better.
swiftcoder 4 days ago |
Turns out that "better" for many people means "better moderated", since static hosting is hard to differentiate. And at present Arxiv is winning that one (at the expense of considerably higher running costs due to said moderation)
0x3f 4 days ago |
The net salary in France might be low but the overall cost of hiring is quite high. Besides, why go to the middle when you can just find even cheaper places, if that's your prime metric?
swiftcoder 4 days ago |
Even in the states, it’s more a distortion caused by the big tech centres. A software engineer in Ohio doesn’t command that kind of salary, but in San Francisco or Seattle that’ll buy you a moderately-senior engineer.
And while academic salaries are generally not great, tenured professors at big universities tend to make a fair bit (plus a lot more vacation time and perks than is normal in the US)
philipallstar 4 days ago |
It's also caused by progressive tax rates. People take harder jobs based on net wage, not gross wage, so gross wage has to compensate.
justin66 4 days ago |
> A software engineer in Ohio doesn’t command that kind of salary, but in San Francisco or Seattle that’ll buy you a moderately-senior engineer.
On the other hand, a CEO of a well-known nonprofit might command that kind of salary in Ohio. People often underestimate how much the leaders of nonprofits pay themselves.
supern0va 4 days ago |
I'm not entirely convinced that this is some sort of widespread bad behavior. Many non-profit boards conduct research on salaries and essentially size their organization and pay something akin to a market rate for the given size and scope.
However, even a small percentage of bad actors finding a way to inflate their salaries will, as a side effect, inflate salaries across the board because it influences the process that sets the salaries for the honest organizations.
It's a fun problem.
justin66 4 days ago |
I suspect abuse is more prevalent at the low end, among nonprofits that don’t do much.
I stand by the point of my original post: People often underestimate how much the leaders of nonprofits pay themselves. These are figures you can look up and quiz your friends to test the hypothesis, if they’re into that sort of thing. For a good time include some nonprofit hospitals.
supern0va 4 days ago |
Outside of manipulating the board, they do not pay themselves, though. The board decides their comp package.
justin66 4 days ago |
That's fair, but the boards of nonprofits are as corruptible (I'm reluctant to use that word since we're talking about fairly standard practices, not outright crime, but whatever) as those in the corporate world. But I wouldn't want to keep talking about this situation as if it's all theoretical. In contrast with a lot of the corporate world, with nonprofits you can just go and look at what their officers are paid (it's public record) and decide for yourself what you feel about the figures.
dev_l1x_be 4 days ago |
So is the living cost. Insurance, housing, etc. A better comparison is PPP.
carlosjobim 4 days ago |
Living costs are similarly high in many places that have nowhere near the salaries of the US.
It's still the land of opportunities. It's easier to find ways to reduce your living costs than ways to increase your salary.
0x3f 4 days ago |
Not everywhere. Switzerland exists. Also cost of living is a thing so if anything US/CH just ramp up to match that. The rest of Europe has high CoL but terrible salaries. Asia has bad salaries but low CoL (on average).
mort96 4 days ago |
According to swissdevjobs.ch[1], the top 10% salary for a senior software developer in Switzerland is 135,000 Swiss francs; that's roughly $170,000 per year.
So if this is correct, then even in Switzerland, it seems like $300,000 per year would be an obscenely high salary for a senior developer.
[1]: https://swissdevjobs.ch/salaries/all/all/Senior
0x3f 4 days ago |
Well first of all it's a CEO position, not an SWE :)
Even if we scope it to SWE, I don't think that's far off the US percentiles.
In London I imagine the top 10% SWE is not even 100k GBP. In Germany even worse.
mort96 4 days ago |
I responded to the idea that $300,000/year is a "mid-to-high engineering salary". CEO salaries are absurdly high everywhere.
0x3f 4 days ago |
Oh right, well it depends on CoL doesn't it? You can reframe European salaries as 'obscene' by world standards too. Both the US and Europe have totally broken and unaffordable housing markets, for example, but at least the Bay Area compensates with salary. I would say that relative to costs it's more that other salaries are obscenely low, if anything. People in Europe should be rioting, but unfortunately only the home owners are politically active.
mort96 4 days ago |
Do cities like San Francisco not have janitors? Waiters? Food delivery drivers? Or do those jobs command a six-figure salary too? If they can live comfortably in the city on a five-figure salary, maybe the argument that "cost of living is so high in SF that you can't live without a $300,000/year salary" is just a little bit overblown?
I can not imagine what one could possibly need $300,000 per year for unless an apartment costs like $200,000 per year.
0x3f 4 days ago |
You get by on a low salary by living with multiple people in the same apartment. Or you live far away and commute. Or both.
Not really a tenable long-term situation for a senior employee with plans to start a family. Family homes of decent size and area are literally millions of dollars.
mort96 4 days ago |
I guess I don't understand why programmers somehow deserve a better life than other people. Janitors deserve to start families too, don't they?
0x3f 4 days ago |
It's not about deserving, programmers just have enough market power to be able to choose to go elsewhere. Janitors and other more fungible employees do not.
Besides, I did already say that everyone else was underpaid relative to costs. But that's not unique to the Bay Area. Cost of housing relative to income is terrible in almost all of the major European cities too.
Once cities become wealthy enough to develop a home owning class, they seem to cease being able to provision adequate housing supply in general.
throw-the-towel 4 days ago |
Usually this kind of argument leads to punishing the programmers, not lifting up the janitors.
mort96 4 days ago |
That's kind of two sides of the same coin, isn't it? The cost of living is so high in part because so many have ridiculously high salaries, isn't it?
swiftcoder 4 days ago |
> The cost of living is so high in part because so many have ridiculously high salaries
Bigger problem in the SF area is that a bunch of folks who owned property before the gold rush have ended up real-estate-rich, and formed a voting block that actively prevents the construction of new housing (on the basis that it might devalue their accidental real estate investment)
prepend 4 days ago |
It's about how the market values those skillsets, not about what people “deserve.”
No one is sitting around and setting salaries based on the intrinsic human dignity of the people working jobs.
throw-the-towel 4 days ago |
> I can not imagine what one could possibly need $300,000 per year for unless an apartment costs like $200,000 per year.
Being able to afford unpredictable expenses and not have it bankrupt you. In the US, that would include healthcare. Everywhere in the world, that would be useful if you were laid off.
mort96 4 days ago |
To build an emergency fund, you just need an income that's a bit higher than your expenses. If you earn $60,000 after tax per year, and spend $50,000 per year, you have a decent $10,000 emergency fund after one year and a massive $100,000 emergency fund after a decade. You don't need $300,000 per year to save.
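The arithmetic above is easy to check with a quick sketch. It assumes flat income and expenses, no interest, and no inflation, and the function name `emergency_fund` is made up for illustration:

```python
def emergency_fund(net_income: float, expenses: float, years: int) -> float:
    """Total saved after `years`, ignoring interest and inflation (a simplification)."""
    yearly_surplus = net_income - expenses  # what is left over each year
    return yearly_surplus * years


# Figures taken from the comment: $60,000 net income, $50,000 spending.
after_one_year = emergency_fund(60_000, 50_000, 1)
after_a_decade = emergency_fund(60_000, 50_000, 10)
print(after_one_year, after_a_decade)
```

The point of the sketch is only that the savings rate depends on the gap between income and expenses, not on the absolute size of the income.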
swiftcoder 4 days ago |
> Does cities like San Francisco not have janitors? Waiters?
When I used to visit the Meta campus in Menlo Park, the QA folk I worked with were commuting 2 hours each way just to be able to afford housing. I've no idea how far away the janitorial staff must have lived to do the same
jalla 4 days ago |
I worked at Redwood Shores. On a walk across the 101, I discovered where the cleaning staff and food workers lived. In cars, under the bridge or parked in a quiet corner of the street next to industrial or commercial property.
swiftcoder 4 days ago |
> Oh right, well it depends on CoL doesn't it?
To some extent, maybe, but often not. For example, London has similar cost of living to the Bay Area, and when I was at Meta experienced folks like Dan Abramov over in London were making about the same as fresh college hires in Menlo Park...
0x3f 4 days ago |
Yeah I was talking more about the definition of obscene. Like is it obscene to make 300k if housing is so expensive? I say no, and that London salaries are just bad. Although it would be preferable to fix the housing market.
To be fair though, Dan specifically is kind of notorious for messing up his comp negotiation. Did you not see the Twitter pile on at the time?
swiftcoder 4 days ago |
> Dan specifically is kind of notorious for messing up his comp negotiation
Indeed, but having seen the infamous spreadsheet, he didn't have all that much headroom (unless he agreed to move to the US)
groundzeros2015 4 days ago |
Note that you are seeing an explicit tradeoff of different economic systems.
ZpJuUuNaQ5 4 days ago |
>Salaries in the US are so bonkers.
Sure, but the cost of living there is significantly higher as well. Anyway, I can hardly even comprehend these kinds of sums, though I am a bit of an outlier, as I earn around $27,700 as an SWE in Europe, which is low even by the standards of companies in my own country.
nozzlegear 4 days ago |
> Sure, but the cost of living there is significantly higher as well.
The US is huge though, and the cost of living is astronomically lower outside of those big tech hub cities. I live in a tiny town in the midwest with a big house and a big yard that we bought for $89k USD in 2016[†]. I'm able to support myself and my wife comfortably on just my (self-employed) SWE salary.
[†] Real estate inflation index for our area says the house would have cost us around $130-$150k USD in 2026.
segmondy 4 days ago |
Everyone outside the US doesn't deal with USD. Your comment is bonkers. Read up on purchasing power. All locations are not equal.
jltsiren 4 days ago |
The traditional definition of high income starts at 2x the median. Looking at the US as a whole, anything above $125k should be considered high income. But it doesn't feel like that, because median wages are unusually low in the US relative to mean wages. Upper middle class salaries, on the other hand, have grown very high, and they have distorted people's perceptions. Even now, we are debating whether almost 5x the median should be considered high income.
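For anyone who wants to check the "almost 5x" figure, here is a rough sketch. It backs the median out of the $125k threshold quoted above rather than using an official statistic, so the implied median is an assumption, and the variable names are illustrative only:

```python
HIGH_INCOME_MULTIPLE = 2          # "high income starts at 2x the median"
high_income_threshold = 125_000   # threshold quoted in the comment
ceo_salary = 300_000              # proposed salary under discussion

# Median implied by the comment's own numbers (not an official figure).
implied_median = high_income_threshold / HIGH_INCOME_MULTIPLE

# How many medians the proposed salary represents.
salary_multiple = ceo_salary / implied_median
print(implied_median, round(salary_multiple, 1))
```

Under those assumptions the salary works out to a bit under five times the implied median, which matches the comment.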
MattDamonSpace 4 days ago |
The US has an enormous per capita GDP for that large a country
ryukoposting 4 days ago |
Silicon Valley is the only place in the United States where $300K is even close to the "middle" of anything.
I just moved to SV a few months ago from the Midwest (and not a particularly cheap part of it). Telling my coworkers who aren't from the US what a house costs in Wisconsin, you'd have thought I was the one who moved from a foreign country.
swiftcoder 4 days ago |
> Silicon Valley is the only place in the United States where $300K is even close to the "middle" of anything.
It does heavily cluster around SV, for sure, but Seattle/NewYork/Boston/Arlington will all get you there, and Chicago/Austin/etc aren't all that far behind at this point
ryukoposting a day ago |
I just left a position in Chicago because SV pays me about double.
Supermancho 3 days ago |
As a datapoint, I get paid just under 250k/yr and I'm an above average developer in his very late career, at a midwest company. 300k avg for SV is about right.
The local college and medical administrators are the ones that own the mansions in my city. I have a family, house and mortgage plus my large medical expenses (cardiac) I can handle...until I can't.
ryukoposting a day ago |
Holy moly, $250k in the midwest? Where do I get your job?
For reference, I just left a position in the Midwest for a job in SV that pays a little more than you're getting paid. $250k but with Midwestern rent would be life-changing. Sounds like we're in very different stages of our careers, though.
snovymgodym 4 days ago |
It's frankly not that crazy of a salary for an important executive position.
The city manager of a small city in Texas gets paid around that much and that's taxpayer money.
Now what collegiate football coaches are paid, that's pretty crazy.
mort96 3 days ago |
I didn't say it's a crazy salary for an important executive position, I said it's wild to call it a "mid-high engineering salary"
HappyPanacea 4 days ago |
arXiv's CEO doesn't need to be a tenured-professor equivalent; it is a preprint repository, ffs.
0x3f 4 days ago |
It's a bit more complex than an S3 bucket though because the value comes from the reputation network, which can't really be replicated easily.
Though, saying that, I suppose all the reputation data is kind of public. Apart from emails/accounts.
groundzeros2015 4 days ago |
> It's a bit more complex than an S3 bucket
It’s even less. I would bet if it’s not now, for the vast majority of its life it was a machine at someone’s desk at Cornell.
PaulHoule 4 days ago |
When I was involved it was an x86 machine in a rack in Rhodes Hall.
I had a copy of the whole thing under my desk though in Olin Library on a Pentium 3 machine from IBM that was built like a piece of military hardware. In April the sun would shine in the windows of my office, the HVAC system was unable to cool my office, and temperatures would soar above 100F and I'd be sitting there in a tank top and drinking a lot of water and sports drinks and visitors would ask me how I could stand it.
groundzeros2015 4 days ago |
Thanks for confirming. We need to stop marketing for AWS by talking about the ability to use the internet in AWS branded product terms.
0x3f 4 days ago |
The S3 API/UX/cost model is so seductively simple for static hosting though. I kind of think they deserve their ubiquity. Not on 90% of their products though.
PaulHoule 4 days ago |
It's great for some applications, like to serve up the QR codes for this system
https://mastodon.social/@UP8/116086491667959840
I could even make those cards tradeable like NFTs, use DynamoDB as the ledger, and not worry about the cost at all.
On the other hand if you are talking about something bandwidth heavy forget about AWS. Video hosting with Cloudfront doesn't seem that difficult, even developing a YouTube clone where anybody could upload a video and it gets hosted seems like a moderate sized project. But with the bandwidth meter always running that kind of system could put you into the poorhouse pretty quickly if it caught on. Much of why YouTube doesn't have competition is exactly that: Google's costs are very low and they have an established system of monetization.
I am keeping my photo albums on Behance rather than self-hosting because I lost enough money on a big photo site in AWS that it drove my wife furious and it took me a few years to pay off the debt.
groundzeros2015 4 days ago |
> I lost enough money on a big photo site in AWS
I’m sorry what. This is supposed to persuade me?
Hendrikto 4 days ago |
For anybody outside the SV, and especially outside the US, this seems high, yes.
arXiv does not need to and should not optimize for “shareholder value”, which is at least nominally the justification for outlandish CEO pay packages.
kingstnap 4 days ago |
arXiv doesn't need much. All they do is host static pdfs uploaded by someone else with free CDN services from Fastly [0]. I'm sure they could get academics to volunteer moderation services as well.
In reality you could host the entire thing for well under $50k/year in hardware and storage if someone else is providing a free CDN. Their costs could be incredibly low.
But just like Wikipedia, I see them very likely and very quickly becoming a money hole that pretends to be barely kept afloat by donations, when in reality what's actually happening is that a ridiculous number of rent seekers have managed to ride the coattails of being the de facto preprint server for AI papers to land themselves cushy jobs at a place that spends 90+% of its money on flights, hotels, and wages for its staff.
I'm already expecting their financial reports to look ridiculously headcount heavy with Personnel Expenses, Meetings and Travel blowing up. As well as the classic Wikipedia-style "we spend a ton of money on unclear costs" [1].
What's already sad is that they stopped publishing a real broken-down report that actually showed things. Like, look at this beautiful screenshot of an Excel sheet. Imagine if Wikipedia produced anything this clear. [2]
[0] https://blog.arxiv.org/2023/12/18/faster-arxiv-with-fastly/
[1] https://info.arxiv.org/about/reports/FY26_Budget_Public.pdf
[2] https://info.arxiv.org/about/reports/2020_arXiv_Budget.pdf
OneDeuxTriSeiGo 4 days ago |
> arXiv doesn't need much. All they do is host static pdfs uploaded by someone else with free CDN services from Fastly [0]. I'm sure they could get academics to volunteer moderation services as well.
This just isn't true. arXiv nowadays has to deal with major moderation demands due to the influx of absolute drivel, spam, and slop that non-academics and less-than-quality academics have been uploading to the site.
Moderation for arXiv isn't perfect or comprehensive but they put so much work into trying to keep the worst of the content off their site. At this point while they aren't doing full blown peer review, they are putting a lot of work into providing first pass moderation that ensures the content in their academic categories is of at least some level of respectable academic quality.
prepend 4 days ago |
Volunteer moderators are a valid option. And I think may work out better than paid employees.
OneDeuxTriSeiGo 4 days ago |
Volunteer moderators are a valid option; however, this is also the way peer review works, and the system is unfortunately very problematic and exploitative.
First pass sanity checks are also a lot less fun than proper peer review so paying moderators to do it is probably safer in the long run or else you end up with cliques of moderators who only keep moderating out of spite/personal vendettas against certain groups or fields.
weitendorf 3 days ago |
> In reality you could host the entire thing for well under $50k/year in hardware
I could pay Anthropic $400 to write more code than you have in your entire lifetime.
Sure, you're able to operate a website acting as essentially the most important and highest volume venue for sharing academic research in the world, but come on, why couldn't I just ask Claude Code or some web developer in a foreign country to do the same thing?
jjk166 4 days ago |
$300k for a top executive position isn't especially high for anywhere in the US. That's around what the administrative director of a hospital would be making, which seems like a much smaller scope than leading ArXiv. For comparison, my roommate works for a non-profit that serves Philadelphia whose CEO's salary is $1.1 million. The CEO of the wikimedia foundation, which is similar in terms of role, has a salary of $450k. General average for US CEOs including for profits is around $800k and for large organizations tens of millions is not atypical.
Non-profits aren't maximizing stock value, but they do need to optimize for stakeholder value - you want to maximize the amount of money being donated in and you want to make the most of the donations you receive, both to advance the primary mission of the non-profit and to instill confidence in donors. This demands competent leadership. The idea that just because something is not being done for profit means the value of the person's contributions is worth less is absurd. So long as the CEO provides more than $300k of value by leading the organization, which might include access to their personal connections, then the salary is sensible.
DonsDiscountGas 4 days ago |
Considering the value and prominence of arxiv to the world, this seems low to me. Although more importantly the rest of the staff needs to be well paid too, and if that's the ceiling it's a bit concerning. It's crazy to me that people thought this was too high.
prepend 4 days ago |
Yes, considering the workload and responsibility of the position.
Non-profits run into the problem of creating cushy jobs that just burn donor money.
Arxiv is basically a giant folder in the cloud and shouldn't have such high paying jobs. At least not if they want rational people to keep donating.
bonoboTP 4 days ago |
I fear their Mozilla-ification and Wikipedia-ification. Scope creep, various outreach feel-good programs, ballooning costs, lost focus etc. And other types of enshittification.
Any change to the basic premise will be a negative step.
They should just be boring quiet unopinionated neutral background infrastructure.
kergonath 4 days ago |
> They should just be quiet unopininionated neutral background infrastructure.
Exactly. It should be a utility. Not quite dumb pipe, but not too far either.
doctorwho42 4 days ago |
We don't do 'utility' in America. Everything has S.V. brain rot - it's mixed with wall street brain rot, and now if you aren't extracting wealth out of what you have access to - you are failing.
musicale 4 days ago |
I mean... someone needs to "unlock value" from ArXiv, right?
Hendrikto 4 days ago |
> Mozilla-ification
All the Mozilla executives have done for the last 15+ years is
* lay off developers
* spend lots of money on stupid side projects nobody asked for or wants
* increase their own salaries
and all that with the backdrop of falling quality, market share, and relevance.
I would happily donate to Firefox, but this fucked up organization will never see a single cent from me. They will spend it on anything but Firefox, which is the only thing anybody wants them to spend it on.
It might already be too late, and we will be left with a browser monopoly.
bonoboTP 4 days ago |
And it is a risk for Arxiv too: once they start to drink the Kool-Aid and start going to the same cocktail parties that these kinds of nonprofit board members and execs go to, they will feel the need to prance around with some fancy stuff.
"oh no, you see we are not a preprint server host anymore, our mission is a values driven blablabla to make a meaningful change in the blablabla, we have spent X dollars to promote the blablabla, take me seriously please I'm also fancy like you! "
musicale 4 days ago |
Well, maybe they don't need to be a nonprofit. How about a public benefit corporation?
And maybe that public benefit thing, well we don't really need it do we? Now that we're deep into AI you know.
For-profit has a nice ring to it. We're delivering value to founders and shareholders, where it belongs.
swed420 4 days ago |
> It might already be too late, and we will be left with a browser monopoly.
Ladybird continues to have the appearance of making progress, fwiw:
https://ladybird.org/newsletter/2026-02-28/
cge 4 days ago |
>They will spend it on anything but Firefox, which is the only thing anybody wants them to spend it on.
Mozilla certainly won’t spend it on Firefox, because the structure of the organization legally prohibits them from spending any of their donation money on Firefox. The ‘side projects’ are, at least officially, the real purpose of Mozilla.
bonoboTP 4 days ago |
They built the brand on Firefox then did a bait and switch. How many people who donate to Mozilla know that it's not helping Firefox?
But yeah, this is just how it works. Things can't stay good for too long. One must always be on the lookout for the new small thing that's not yet corrupted. Stay with it for a while until it rots, then jump to the next replacement.
musicale 4 days ago |
> They will spend it on anything but Firefox, which is the only thing anybody wants them to spend it on.
;_;
musicale 4 days ago |
My prediction exactly.
Maybe a bloated foundation (pursuing expensive objectives completely unrelated to ArXiv's core mission of hosting PDFs), new classes of unnecessary management staff, new and useless paid features that nobody wants, and obnoxious nag banners claiming "ArXiv is not for sale!" but demanding money anyway.
ACCount37 4 days ago |
Frankly, the only beef I have with arXiv as is: its insistence on blocking AI access.
I had to tell my AI to set up an MCP for "fetch while bypassing arXiv's rate limit" so that it doesn't burn 40k tokens looking for workarounds every time it wants to look at a paper and gets hit with a "sorry, meatbags only" wall.
Very annoying, given how relevant arXiv papers are for ML specifically, and how many papers there are. Can't "human flesh search" through all of them to pick the relevant ones for your work, and they just had to insist on making it harder for AIs to do it too.
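For what it's worth, the alternative to a bypass is a client that simply spaces out its requests. Below is a minimal sketch of such a throttle. The class name `Throttle` is hypothetical, and the 3-second default is an assumption based on my recollection of arXiv's API guidance, so check the current terms before relying on it. The clock and sleep functions are injectable so the logic can be tested without actually waiting:

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum gap between consecutive calls to wait()."""

    def __init__(
        self,
        min_interval: float = 3.0,  # assumed polite gap in seconds, not an official value
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call, if any

    def wait(self) -> float:
        """Block until min_interval has passed since the last call; return seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay
```

Usage would be to create one `Throttle()` per process and call `throttle.wait()` immediately before every request, so that bursts from an agent get flattened into a steady trickle.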
spiralcoaster 4 days ago |
I hope they ramp up their blocking of AI access. The last thing we need is providers like this getting hammered by AI
vedantxn 4 days ago |
we got this before gta 6
contubernio 4 days ago |
What is worrisome about this development, and corollary actions like the hiring of a CEO with a $300,000/year salary, is that the essentially independent and community based platform will disappear. The ArXiv exists because mathematicians and physicists, and later computer scientists and engineers, posted there, freely, their work, with minimal attention to licensing and other commercial aspects. It has thrived because it required no peer review and made interesting things accessible quickly to whomever cared to read them.
A setup as a US-based "non-profit" is worrisome, if only because 300K is an obscene salary even in a for-profit setting. That the US-based posters can't see this is evidence of the basic problem which is that the US, both left and right, has been taken over by a neoliberal feudal antidemocratic nativist mindset that is anathema to the sort of free interchange of ideas that underlay the ArXiv's development in the hands of mathematicians and physicists now swept aside and ignored by machine learning grifters and technicians who program computers.
doctorwho42 4 days ago |
As a US based academic, I have to say when I saw the salary I immediately gawked. I think it's not americans but silicon valley-ites and tech bros on here who have lived with inflated salary/net worth that think it's just a middle of the road salary. As I regularly interact with friends in engineering who make like $200k + benefits ($), and I wonder why I don't jump ship to that weird land.
juped 4 days ago |
>Cornell, for example, had a limited capacity to pay software developers to maintain and upgrade the site, which still has a very no-frills look and feel.
arXiv is doomed. It was nice while it lasted.
oscaracso 4 days ago |
I am not a software engineer, although I do write programs. What is it about digital infrastructure that requires maintenance? In the natural world, there is corrosion, thermal fluctuation, radiation, seismic activity, vandalism, whathaveyou. What are the issues facing the arxiv demanding the attention of multiple people 'round the clock?
bonoboTP 4 days ago |
They have to update the software stack, replace usage of deprecated APIs, support new latex packages etc. They could probably minimize these by limiting the scope but just keeping a small, tightly scoped software functional is always boring, people want to work on fun new features, they enjoy the brand recognition and feel like they should do more stuff.
I wonder when they will introduce the algorithmic feed and the social network features.
taormina 4 days ago |
Given that Cornell charges what, $50k a year as an Ivy League, $300k feels like almost nothing.
PaulHoule 4 days ago |
This is going to be in NYC where $300k does not go as far as it does in Ithaca.
peyton 4 days ago |
Heh, you might want to look up what they’re charging young people now.
taormina 4 days ago |
$71k?! Well, that’s 4, 4.5 students worth of tuition then.
losvedir 4 days ago |
arXiv is great. It's just a problem that there's so much slop. What if arXiv offered a subscription service that people in different fields could use to just see a curated selection of the top papers in their field each month. Established researchers in each field could then review some of the preprints for putting into the curated monthly list.
Oh, wait.
bonoboTP 4 days ago |
> see a curated selection of the top papers in their field
https://www.scholar-inbox.com
hereme888 4 days ago |
From my limited experience, arXiv appears to include many low-quality, unreproducible papers, and some are straight-up self-marketing rather than serious scientific work.
kingstnap 4 days ago |
If you get some more experience you will find normal journals are exactly like that as well.
whiplash451 4 days ago |
I'm not sure why we're so focused on filtering what gets into arxiv (which is an uphill battle and DOA at this point) vs fixing the indexing, i.e. the page rank of academia.
Google "sorted out" a messy web with PageRank. Academic papers link to each other. What prevents us from building a ranking from there?
I'm conscious I might be over-simplifying things, but curious to see what I am missing.
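For concreteness, here is a minimal sketch of what "PageRank over citations" could look like. It is a plain power-iteration version, not anything arXiv or Google actually runs; the function name `pagerank`, the damping value, and the toy paper IDs are all assumptions made up for illustration.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph.

    `citations` maps each paper ID to the list of paper IDs it cites.
    """
    # Collect every paper that appears as a citer or as a cited work.
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Papers that cite nothing spread their rank evenly over all papers.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        # Every other paper splits its damped rank among the works it cites.
        for citer, cited_list in citations.items():
            if not cited_list:
                continue
            share = damping * rank[citer] / len(cited_list)
            for cited in cited_list:
                new_rank[cited] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: these paper IDs are invented for the example.
toy_citations = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": ["paper_d"],
    "paper_d": [],
}
scores = pagerank(toy_citations)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

On this toy graph the top score goes to the paper at the end of the citation chain, which hints at one known weakness: raw citation PageRank tends to favor older work that everything eventually points back to.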
tokai 4 days ago |
Page rank was inspired by bibliometrics and evaluation of science publications. It's messed up now because of the rankings. Further fiddling with ranking will not fix the problem.
j2kun 4 days ago |
+1, PageRank was taken from academia. They even cited it in their original work. Funny how the origins of these things get forgotten.
krick 4 days ago |
I am of the same opinion, and ultimately ArXiv becoming a journal that can prevent one from publishing a paper, no matter how junk it is, would pretty much kill its purpose. But I suppose that now when flooding the internet with LLM-generated garbage is almost endorsed by some satanic people, it is pretty much a security issue to have some sort of filter on uploads.
Now, honestly, I have no idea why would one spend resources on uploading terabytes of LLM garbage to arXiv, but they sure can. Even if some crazy person is publishing like 2 nonsense papers daily, it is no harm and, if anything, valid data for psychology research. But if somebody actually floods it with non-human-generated content, well, I suppose it isn't even that expensive to make ArXiv totally unusable (and perhaps even unfeasible to host). So there has to be some filtering. But only to prevent the abuse.
Otherwise, I indeed think that proper ranking, linking and user-driven moderation (again, not to prevent anybody from posting anything, but to label papers as more interesting for the specific community) is the only right way to go.
muhneesh 4 days ago |
tangentially related: https://readabstracted.com/
Drblessing 4 days ago |
ArXiv is dead. Expect a paywall within three years, or other enshittification and slop added.
Apocryphon 4 days ago |
Maybe they'll do something like what Anna’s Archive did
hirako2000 4 days ago |
Do research papers published on Elsevier's sort of media remain more prestigious?
I read a dozen papers a month, typically on arxiv, never from paywalled journals. I find the quality on par. But maybe I'm missing something.
Fomite 4 days ago |
This is very variable based on field. HN is heavily biased toward ArXiv-friendly fields.
krick 4 days ago |
It's not that hard to make a mirror of arXiv. Basically, anybody who can pay for hosting can do it (which, I suppose, isn't very cheap now that the whole world uses it). It's a problem to make users switch, because academia seems to have this weird tradition of resisting all practices that, god forbid, might improve global research capabilities and move scientific progress forward. But then, if arXiv actually becomes unusable, I suppose they won't really have much choice but to switch?
And, FWIW, I do think that arXiv truly has vast potential to be improved. It is currently in a position to change the whole process of how research results are shared, yet it is still, as others have said, only PDF hosting. And since the universities couldn't break out of the whole Elsevier & co. scam despite the internet existing for 30 years, to me, breaking free from the university affiliation sounds like a good thing.
But, of course, I am talking only about the possibilities being out there. I know nothing about the people in charge of the whole endeavor, and ultimately it depends on them only, if it sails or sinks.
tokai 4 days ago |
This is exactly what happened last time when scientific publishing got cornered. Journals run by departments and research groups were spun out or sold off to publishers and independent orgs. And they continued to slowly boil the frog over 50 years with fees and gate keeping.
It's especially problematic because while ArXiv loves to claim to be working for open science, they don't default to open licensing. Many of the publications they host are not Open Access, and are read access only. So there is definitely the potential to close things off at some point in the future, when some CEO needs to increase value.
lifeisstillgood 4 days ago |
I am sure it’s a dumb idea but why is there a problem for say the National Science Foundation or something to run a website that replicates ArXiv - if you are from an accredited university or whatever you can publish papers, fulfilling the “pdf store” function.
Then getting peer reviewed is a harder process but one can see some form of credit on the site coming from doing a decent reviewers job.
I suspect I am missing a lot of nuance …
prepend 4 days ago |
The moderation is difficult but not unprecedented.
I think NIST hosts the CVE repo (through a contract to MITRE)
Fomite 4 days ago |
Given the last two years and what has been done to science funding, having a load bearing thing like ArXiv not housed with the U.S. government is, I think, pretty self-evidently a good idea.
MetaMonk 4 days ago |
https://youtu.be/4P5xSntVWQE
jeremie_strand 4 days ago |
ArXiv provides such an easy interface to navigate scientific papers; most are from computer science, of course. Hope they can grow bigger and solve the paywall pain in open research. Any implications for bioRxiv?
Fomite 4 days ago |
bioRxiv is already housed at Cold Spring Harbor Laboratory, which is an independent non-profit.
AccessScan 4 days ago |
Going independent makes sense for arXiv. But the more interesting part is what it tells us about how we fund the stuff that actually keeps research moving. arXiv runs on about seven million dollars a year and handles hundreds of thousands of papers. That's roughly twenty bucks a paper. This is the backbone of how physicists, computer scientists, and mathematicians share work. Traditional publishers charge thousands per article. The math is almost laughable. arXiv has never had an efficiency problem. The problem is that we've just accepted that something this important should survive on voluntary contributions and the occasional donation saving the day.
Look at what happened with bioRxiv and medRxiv when they spun off into openRxiv. That only happened about a year ago. Nobody knows yet if it actually works long-term or if it just kicks the money problems down the road. But both platforms, totally separately, came to the same conclusion. We need to leave the university. That says something. Universities aren't built to fund outside infrastructure forever. Their budgets follow enrollment, grants, and endowment performance. That doesn't line up with the steady, predictable funding arXiv needs to keep the lights on.
Ginsparg calling it a "Perils of Pauline" situation is probably the most honest thing anyone said about this. Everyone treats arXiv like it will always be there. But it's been one bad year away from serious trouble for most of its life. The real test for the nonprofit won't be the first few years. Cornell and Simons have that covered. It'll be five or ten years from now when the excitement fades and they're competing for donor money against whatever the next crisis in academic publishing turns out to be.
The worry about AI-generated junk is actually where independence could help. A university-hosted arXiv could only spend so much on moderation tools. An independent org with a focused mission can make that a real budget priority. Whether they can keep up with the flood of low-quality submissions is a different question entirely.
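The "twenty bucks a paper" figure can be sanity-checked with a few lines of arithmetic. This is only a sketch: the seven million dollar budget is the number quoted in the comment, while the yearly paper counts below are assumed stand-ins for "hundreds of thousands", not official arXiv statistics.

```python
def cost_per_paper(annual_budget_usd, papers_per_year):
    """Return the average yearly spend per paper in US dollars."""
    return annual_budget_usd / papers_per_year


# "About seven million dollars a year", as stated in the comment.
annual_budget_usd = 7_000_000

# Assumed volumes covering "hundreds of thousands of papers" (hypothetical).
for papers_per_year in (200_000, 300_000, 350_000):
    cost = cost_per_paper(annual_budget_usd, papers_per_year)
    print(f"{papers_per_year:>7} papers/year -> ${cost:.2f} per paper")
```

Under these assumptions, twenty dollars per paper corresponds to about 350,000 papers a year, and even the lowest assumed volume stays far below the "thousands per article" charged by traditional publishers.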
ide0666 4 days ago |
The endorsement system is a real barrier for independent researchers. I've been trying to get endorsed for cs.NE for weeks — the work is published on aiXiv with video results, but without an institutional email or personal connection to an existing author, you're stuck. Glad to see arXiv thinking about independence — hope they also rethink access for non-institutional researchers.
tamimy 3 days ago |
It's quite interesting to see that a lot of opinions here think ArXiv will turn to shit because it will go "corporate". Are there any examples where this has not been the case?
beezle 3 days ago |
I go back to xxx.lanl.gov days - that is, the beginning. Back then it was all physics, some math and a little quantitative finance (not bitcoin). And the quality was pretty good because it was a preprint archive. In fact, a headline from 2000:
APS and BNL Host XXX e-Print Archive Mirror Feb. 1, 2000
The APS is establishing, in cooperation with Brookhaven National Laboratory, the first electronic mirror in the United States for the Los Alamos e-Print Archive.
Today, from the landing page, it describes itself as "arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of [long list]. Materials on this site are not peer-reviewed by arXiv."
Well, that's a large part of the problem. A lot of the stuff there now will never see a journal (even of dubious quality) and there is limited filtering of what new submissions will be stored. GIGO.
Best thing ArXiv could do is go back to their roots - limit the fields and return to preprint only. Spin off the comp sci stuff for sure to someone else along with all its headaches.