I have also heard that some firms ban its use.
Why?
Because it makes it SO easy to set up a database for your app that you end up with a super critical component of your application that looks exactly like a file. A file that can have any extension. And that file can be copied around to other servers. Even if there is PII in that file. Multiply this by the number of applications in your firm and you can see how this could get a little nuts.
DevOps and DBA teams would prefer that the database be a big, heavy iron thing that is very obviously a database server. And when you connect to it, that's also very obvious etc etc.
I still love SQLite though.
It doesn't require you use all of that properly, but it's there.
The modern alternative is to use JavaScript/TypeScript, which makes such solutions cross-platform (including macOS, web, etc.):
https://learn.microsoft.com/en-us/office/dev/add-ins/overvie...
BTW sqlite can run SQL queries on CSV files with relatively simple one-liner command...
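For anyone who hasn't seen it: the sqlite3 shell has .mode csv and .import dot-commands for exactly this. A rough Python equivalent, just as a sketch (the file name, table name, and query are placeholders):

import csv
import sqlite3

# Load a CSV into an in-memory database and query it with SQL.
conn = sqlite3.connect(":memory:")
with open("data.csv", newline="") as f:
    rows = list(csv.reader(f))
header, body = rows[0], rows[1:]

conn.execute("CREATE TABLE t (%s)" % ", ".join('"%s"' % c for c in header))
conn.executemany("INSERT INTO t VALUES (%s)" % ", ".join("?" * len(header)), body)

print(conn.execute("SELECT COUNT(*) FROM t").fetchone())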
Access gets used for a shared DB and that is quite easy to corrupt. It is much more cost effective to have that in a proper central database (I suppose SQLite is better here as well).
This sounds like how we pass so many stupid laws. Nobody thinks about 2nd order effects.
The sheer volume of data that needs tending to may even grind certain departments to a halt! What a great opportunity! It'd appear I'm positively stellar at this!
For clarity, while I did have some rather perverse fun toying with the idea, I do not actually think it should be implemented, or at least certainly not in one fell swoop and as-is. Mostly for the aforelisted reasons. This is what I actually intended to convey.
Though for better or for worse, that doesn't mean I think the notion is completely meritless either, so I might still be deserving of at least some of your snark. But a lot of it was in jest from my side indeed.
This whole keeping an inventory, disabling items, and later taking them out completely is a chore I already do in other contexts. While it does work, it is anxiety inducing, doesn't really scale well, and it's quite miserable to go through with. It's the cost of keeping things organized as far as I'm concerned, with no real way around it in the general case. The best one can do is try and mitigate it, by monitoring for patterns and building out systems, to automate and streamline the tedium away.
I sincerely do not know of other ways to keep things in check though, in lieu of which you do get the makeshift shadow ops with all of its pitfalls. It's kind of just life.
Increase interoperability. Funnel data people from Excel into real DB technologies.
And if they did more to blur the lines between spreadsheets and databases, and make it seamless to work out of both Excel and Access, add more spreadsheet features to the data views, etc.
The only way that works is if the dev team is large enough to be responsive to business needs, which almost never happens because devs are expensive. The juniors who are tweaking business logic every day are functionally doing a role analysts can do if you just give them a sane API and data tools.
Ah so two teams nobody should listen to.
"Hey everyone, we need to chose the option that involves us the most and provides us the most job security"
Lots of great people in both devops and security. But when teams position themselves as the conscience of the org and the gatekeepers of production, the defensive victim mentality can get pretty strong.
In summary, companies use the bus-metric to see how viable a project is. Bus, as in, how many people can be hit by a bus before there is no one left to maintain the project.
Despite its ubiquity, SQLite is maintained by only 3 people. That bus-metric for SQLite is 3, which is way too low for some companies.
Give the link a watch; it was really interesting.
If the unfortunate bus incident happens to the SQLite developers, there is exactly ZERO chance that it will not be very well maintained, on account of all the users, many of whom already have support contracts going for decades, which would require the same level of support they have already enjoyed.
https://www.reddit.com/r/sysadmin/comments/eaphr8/a_dropbox_...
So read the magic number, you shouldn't trust file extensions anyway
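For what it's worth, that check is tiny; SQLite databases start with a documented 16-byte header string, regardless of what the file is named. A sketch (the path is a placeholder):

def looks_like_sqlite(path):
    # Per the documented file format, SQLite databases begin with the
    # 16-byte header "SQLite format 3\x00".
    with open(path, "rb") as f:
        return f.read(16) == b"SQLite format 3\x00"

print(looks_like_sqlite("whatever.bin"))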
> that file can be copied around to other servers
So can spreadsheets
I'm not discounting that having centralized data access is desirable but it doesn't sound like that particular reasoning is well thought out
Yes, databases could have any extension. No sane dev team would accept code that doesn't use an obvious extension for a sqlite database.
Yes, databases can contain PII but no sane product manager would go "yes, that's a good use of sqlite".
Yes, you can trivially copy database files, but no sane product needs to, in the same way that no sane product should require folks to clone the db just to do some work.
Pretty much every reason a company has for banning sqlite is a red flag for working there.
If I was their CTO and they told me this, and it is not a joke, I'd fire them on the spot.
What is the longest surviving paper medium?
The specification is publicly available
- It is widely adopted
- It is likely to remain readable in the future
- It has little dependency on specific operating systems or services
- It carries low patent risk
From the perspective of long-term continuity, avoiding dependence on any particular company or service is extremely important.
So this news is nearly <del>six</del> EIGHT years old. But I didn't happen to know about it until now, so that's not a complaint at all; rather, this is a thank-you for posting it.
(Thanks for the correction. Brief brain malfunction in the math department there).
I initially was writing a series of files and doing some quasi-append-only things with new files and compacting the old one to sort of reinvent journaling. What I did more or less worked but it was very ad hoc and bad and was probably hiding a lot of bugs I would eventually have to fix later.
And then I remembered SQLite. I realized that ACID was probably safe enough for my needs, and then all the hard parts I was reinventing were probably faster and less likely to break if I used something thoroughly audited and tested, so I reworked everything I was doing to SQLite and it worked fine.
I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere", but until it does I'm grateful SQLite exists.
Where exactly is everywhere? Win32? All of Linux? BSDs? macOS? iOS? ...
"I can use Linux because if I get stuck I can just switch to Windows and still access my data" is a comfort that probably keeps people from even trying Linux (or other OSes)?
Why else would MS not support BTRFS/ZFS/Ext or whatever?
{I'm not saying that I think this works.}
You seriously can’t think of another reason? File systems are complex. Maintenance is a huge burden. Getting them wrong is a liability. Reason enough to only support the bare minimum. And then, 99% of their users don’t care about any of those. NTFS is good enough
https://www.realworldtech.com/forum/?threadid=189711&curpost...
".. there is no way I can merge any of the ZFS efforts until I get an official letter from Oracle that is signed by their main legal counsel or preferably by Larry Ellison himself .. Don't use ZFS. It's that simple. It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me. .. The benchmarks I've seen do not make ZFS look all that great. And as far as I can tell, it has no real maintenance behind it either any more, so from a long-term stability standpoint, why would you ever want to use it in the first place?"
BTRFS: RedHat has removed all support for BTRFS and deprecated it: https://access.redhat.com/solutions/197643
BTRFS, "Linux's perpetually half-finished filesystem" by ArsTechnica: https://arstechnica.com/gadgets/2021/09/examining-btrfs-linu... with many problems still unaddressed in 2021 dating back to 2009.
I don't know what people on Solaris use, but I'm pretty sure everyone in the Linux and BSD community is running OpenZFS, which does get frequent updates and has been pretty stable as a kernel module for quite a while. My main server in my house is running a RAID-Z2 on Linux and has been for more than six years, and I haven't really had any issues. I run scrubs regularly and things seem to work just fine.
I do wish that Oracle would give written permission to let Linux include it into the kernel, since I think it would make it easier to run ZFS on root (which I don't bother with, I just use btrfs on root and that's fine for single-drive systems, like a laptop).
In practice, every OS has its preferred system and the rest has varying levels of "I guess this works", with FAT32 and exFAT being the only real cross-platform options.
To wit:
* NTFS is only really properly and fully supported on Windows. Apple mounts it read-only. Linux can certainly mount NTFS and do some basic reads and writes. Unfortunately, for whatever reason, the Linux fsck tools for NTFS are absolutely terrible, poorly designed and generally can't fix even the most basic of issues. At the same time, mount refuses to work with a partially corrupted filesystem, so if you're dealing with dirty unmounts (where the worst case usually is some unclosed file handle rather than data loss, but this also happens if you try to mount a suspended Windows partition, which isn't uncommon since Windows hibernates by default and calls it fast boot), that's a boot to Windows just to fix it.
* Apple filesystems basically only work on apple devices. It's technically possible to mount them on Linux, but you end up digging into the guts of a bunch of stuff that Apple usually just masks for you.
* ext4 is only properly read/write under Linux and requires external drivers under Windows (which may not work properly either, as corruption issues are common).
FAT32 is reliable in that any OS can fsck/chkdsk it and properly mount it without needing special drivers, but is hindered by ancient filesize limitations. exFAT, at least for most cases, is the only filesystem you can plug into most devices and expect more or less the same capabilities as FAT32 (read/write support, can fix filesystem corruption.)
Out of the OS-specific ones, NTFS seems like it has the most potential to be the one filesystem that works everywhere; it's modern and works good-ish on most devices, it's just that the fsck/chkdsk tooling is awful outside of Windows.
I think that is at too low a level for me to realistically solve, but with SQLite I'll at least do what little I can; the fact that it's been around for twenty years with extremely thorough testing and frequent updates means that it's more likely to be correct than some ad-hoc thing I come up with. I think I'm pretty clever sometimes and I could probably get something as good as SQLite if I really wanted to, but I don't think I'd surpass it, and at that point why not just use SQLite?
As long as they lie in order, or alternatively you have a way of verifying the write (e.g. by reading it back) then you should be able to make it work fairly easily.
If they just completely lie - the data is just cached but never actually written - then you're screwed. There's obviously no way to make a persistent storage device out of something that doesn't persist your data.
exFAT has the lovely feature of potentially not only corrupting the file, but also corrupting the metadata for the surrounding system as well. It's terrible.
SQLite is very good if you can fit into the single writer, multiple readers pattern; you'll never lose data if you use the correct settings, which takes a minute of Google search to figure out.
Today, most of my apps are simply go binary + SQLite + systemd service file.
I've yet to lose data. Performance is great and plenty for most apps
And even then, I've used a batch writer pattern to get 180k writes per second on a commodity vps.
"Batch writer pattern" is a good idea to get rid of expensive commits.
e.g. main.db + fts.db: reading and writing to main.db is always available, and updating the fts index can be done without blocking the main database; it only needs to read, and the reads can be chunked and delayed. fts.db keeps the index plus a cursor table (an id or last-change ts).
could also use a shard to handle tables for metrics, or simply move old data out of main.db
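roughly, as a sketch (table and column names are made up, and it assumes main.db has a docs(id, body) table):

import sqlite3

def sync_fts(main_path="main.db", fts_path="fts.db", chunk=500):
    # fts.db holds an FTS5 index plus a cursor table remembering the last
    # rowid copied; a periodic job pulls new rows over in chunks.
    fts = sqlite3.connect(fts_path)
    fts.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs_fts USING fts5(body)")
    fts.execute("CREATE TABLE IF NOT EXISTS cursor (id INTEGER)")
    if fts.execute("SELECT COUNT(*) FROM cursor").fetchone()[0] == 0:
        fts.execute("INSERT INTO cursor VALUES (0)")

    last = fts.execute("SELECT id FROM cursor").fetchone()[0]
    main = sqlite3.connect("file:%s?mode=ro" % main_path, uri=True)  # read-only
    rows = main.execute(
        "SELECT id, body FROM docs WHERE id > ? ORDER BY id LIMIT ?", (last, chunk)
    ).fetchall()

    with fts:
        for doc_id, body in rows:
            fts.execute("INSERT INTO docs_fts(rowid, body) VALUES (?, ?)", (doc_id, body))
        if rows:
            fts.execute("UPDATE cursor SET id = ?", (rows[-1][0],))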
* some examples:
import sqlite3

conn = sqlite3.connect("data.db")
conn.execute("PRAGMA journal_mode=WAL") # concurrent reads (see above)
conn.execute("PRAGMA synchronous=NORMAL") # fsync at checkpoint, not every commit
conn.execute("PRAGMA cache_size=-62500") # ~61 MB page cache (negative = KB)
conn.execute("PRAGMA temp_store=MEMORY") # temp tables and indexes in RAM
conn.execute("PRAGMA busy_timeout=5000") # wait 5s on lock instead of failing
edit: orms will obliterate your performance; use raw queries instead. just make sure to run static analysis on your code base to catch sqli bugs.

my replies are being ratelimited, so let me add this:
the heavy-duty server other databases have is doing the load-bearing work that folks tend to complain sqlite can't do
the real dbms's are doing mostly the same work that sqlite does, you just don't have to think about it once they're set up. behind that chunky server process the database is still dealing with writing your data to a filesystem, handling transaction locks, etc.
by default sqlite gives you a stable database file: when you see the transaction complete, the changes have been committed to storage and cannot be lost even if the machine were to crash right after.
you can decide to waive some, or all, of those guarantees in exchange for performance, and this doesn't even have to be an all-or-nothing situation.
https://the-php-bench.technex.us/
There's a huge performance difference between memory and file storage within sqlite itself. Not even getting into tuning specifics.
When i actually have something that requires multiple nodes, i just use postgres (with replica) or mongo (with replica).
But that's for those apps which are in an autoscaler.
For bulk data refresh I use a build artifact and hot-reload memory-mapped files, by checking a manifest on object storage and only fetching an update if it's newer.
I've used this pattern everywhere and never really needed anything more; occasionally i might use redis if something requires shared state across multiple nodes and needs to be fast.
For that matter if you write your system with the correct abstraction you can switch to Postgres _later_ if it becomes necessary. For every system that really did need to scale 10,000 are pointlessly overbuilt - worrying about scale when it just didn't matter.
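One hedged sketch of what "the correct abstraction" can look like: route all persistence through a small interface and keep the SQLite details behind one class, so a Postgres-backed implementation can be dropped in later. Names and schema here are invented for illustration.

import sqlite3
from typing import Optional, Protocol

class UserStore(Protocol):
    def add(self, name: str) -> int: ...
    def get(self, user_id: int) -> Optional[str]: ...

class SqliteUserStore:
    def __init__(self, path="app.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")

    def add(self, name):
        with self.conn:
            cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def get(self, user_id):
        row = self.conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None

# A PostgresUserStore implementing the same interface can replace this later
# without touching the rest of the code.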
(Hint: whatever your answer is it'll apply to SQLite too)
> As of this writing (2018-05-29) the only other recommended storage formats for datasets are XML, JSON, and CSV.
Preferred
1. Platform-independent, character-based formats are preferred over native or binary formats as long as data is complete, and retains full detail and precision. Preferred formats include well-developed, widely adopted, de facto marketplace standards, e.g.
a. Formats using well known schemas with public validation tool available
b. Line-oriented, e.g. TSV, CSV, fixed-width
c. Platform-independent open formats, e.g. .db, .db3, .sqlite, .sqlite3
2. Any proprietary format that is a de facto standard for a profession or supported by multiple tools (e.g. Excel .xls or .xlsx, Shapefile)
3. Character Encoding, in descending order of preference:
a. UTF-8, UTF-16 (with BOM),
b. US-ASCII or ISO 8859-1
c. Other named encoding
---
Acceptable
For data (in order of preference):
1. Non-proprietary, publicly documented formats endorsed as standards by a professional community or government agency, e.g. CDF, HDF
2. Text-based data formats with available schema
For aggregation or transfer:
1. ZIP, RAR, tar, 7z with no encryption, password or other protection mechanisms.
https://www.loc.gov/preservation/resources/rfs/data.html

[0]: https://7-zip.org/7z.html
[1]: CVE-2025-0411
What are the advantages or reasons to use zstd in a 7z container versus just .zst?
Why use it with 7-Zip though? 7-Zip archives multiple files/directories and supports encryption. It has the UI too. On Windows there is NanaZip, available in the Microsoft Store, which has been graced by corporate for user install (unlike zstd, which effectively needs WSL), and most folks won't be able to use the command line tool.
Of course using tar with zstd is always an option if you are on linux.
So I made a format that will never surpass SQLite, except that it's much lighter and faster and works on zstd-compressed files. It has really small indexes and can contain binaries or text just like SQLite.
The wasm part that decompresses, reads, and searches the databases is only 38kb uncompressed (maybe 16kb gzipped). Compare that to SQLite's 1.2mb of wasm and glue code: it's 3% the size, but searching and loading are much faster. My program isn't really column based and isn't suitable for managing spreadsheets, but it's great for dictionaries and file archives of images and audio.
I ported the jbig2 decoder as a 17kb wasm module, so I can load monochrome scans that are 8kb per page and still legible.
https://github.com/tnelsond/peakslab
SQLite is very well engineered, PeakSlab is very simple.
We should really consider eventually retiring memes because they just end up as thought-terminating cliches.
This is of course referring to xkcd #927. How do I know that?
Not everything needs to be real-time updated.
This is indeed not an optimal setup. A more careful design from first principles would not require seeking around the file as much as SQLite does, we'd do a better job on reading exactly the correct range of bytes for a given query since we know ahead of time what the access patterns are, and we could do reads in parallel. With SQLite we have to be very careful about the schema design to ensure it won't have to seek too many times to answer a query. But SQLite was expedient, and I'm confident I'll always be able to read the files. That's less certain for a custom file format.
DuckDB has built-in capability to read Parquet files with HTTP range requests.
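Something like this, assuming the duckdb Python package and a reachable Parquet URL (the URL is a placeholder); the httpfs extension fetches only the byte ranges the query needs rather than downloading the whole file:

import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")   # needed once for http(s) URLs
con.execute("LOAD httpfs")

result = con.execute(
    "SELECT count(*) FROM read_parquet('https://example.com/data.parquet')"
).fetchall()
print(result)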
I only reach for SQLite when it's big. Otherwise, I'd prefer to just use the existing SQL Server infrastructure, or .csv.gz files in S3. Internally, I use the term "bulk data" for data sets that are stored in S3 because they were too big for SQL Server.
https://www.igvita.com/2012/02/06/sstable-and-log-structured...
I'm storing null delimited sorted strings with an index that's an offset to where that entry begins, so if I wanted to do memtables for writing I would have to have two memtables and merge them on write? I've been considering this exact approach.
[0] https://cdb.cr.yp.to/ , https://en.wikipedia.org/wiki/Cdb_(software)
The current trunk is actually 1.7mb in its canonical unminified form (which includes very nearly as much docs as JS code), split almost evenly between the WASM and JS pieces :/. Edit: it is 1.2mb in minified form, though.
Disclosure: i'm its maintainer.
Edit: current trunk, for the sake of trivia:
sqlite3.wasm 896745
sqlite3.mjs 816270 # unminified w/ docs
sqlite3.mjs 431388 # unminified w/o docs
sqlite3.mjs 310975 # minified

"Right joins are just left joins in the wrong direction, you don't need that crap"
Of course it always gets simpler or more specialised. I think many apps using databases would run with SQLite just as well. And some would probably run just as well with a textfile instead of any db like SQLite.
SQLite has supported all types of joins since version 3.39 in 2022.
You probably just needed to create indexes over your data to speed things up.
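For example (schema invented for illustration), EXPLAIN QUERY PLAN shows whether SQLite can use an index for a lookup instead of scanning the whole table:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = ?", (42,)
).fetchall()
print(plan)  # should mention SEARCH ... USING INDEX idx_orders_customer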
Fwiw LocalStorage is a SQLite db on most browsers, with a kv api. It'd be interesting to have the actual API available.
That's why PeakSlab is written in c, because what's faster than casting the whole database to a struct? ;-P
SQLite is 1.2mb in combined wasm and JavaScript and not really designed for my use case, so I would have to add all the things I need anyway, like compression and HTML tag insertion. For my use case, which is just PWAs, SQLite takes too long to load, the files are too big, and the search isn't tailored. So I made something else in 38kb instead.
SQLite is not just a B-Tree+ file format manager. It's also a powerful CLI with tons of built in data manipulation and extraction tools which can all be scripted.
Finally, if your data is _relational_, then a format that preserves those relations, even if they aren't intended to be updated, is exceptionally powerful.
> This is a custom format built to be very fast to load (cast to a c-struct and done) and very fast to search with 3 binary search indexes built in.
What is the endianness of your format?
Instead of doing multiple columns, there's some of that with secondary and tertiary indexes, but mostly it's just key-value.
To make the data more relational I find it's more natural to just have additional "columns" in separate peak files, and when the peak files are queried they can be combined by headword. This is kinda like what Parquet does, putting all of one column together.
I could make PeakSlab have columns and have it query based off specific columns fairly easily, but it was a design decision to make things more tag based and less column based. I like that it's more flexible and less rigid in structure.
SQLite keeps everything relational and their raw filesizes are 50% larger than a .peak file, not sure what's exactly to blame, but I'm optimizing for small filesizes.
Will definitely have to look into your solution with JBIG2 next time. Though I may stick with PNG just for compatibility's sake.
[1] https://www.loc.gov/preservation/resources/rfs/data.html
This can require nuance: for example, PDF has profiles because the core format is widely supported but you could do things like embed plugin content from now-defunct vendors and they would only want the former for long-term preservation.
Also, the lack of enforced column data types was always a negative for me.
What caused that?
I really like the simplicity and speed of SQLite, I've used in both personal and professional projects. For day-to-day work I still end up in Excel, not because I like it more (I don't), but because its ubiquity makes it the lowest friction way to share & explore datasets with less technical stakeholders and execs.
You can't precompile your SQL at build time, unfortunately, but you _can_ precompile all your SQL at the very start of your program and then never touch the parser again. This might be a good middle ground for you. It is infra that you can centralize, write some unit tests against, and then not worry about forever.
It's not common because the SQLite parser is lightning fast and it's so convenient to just write out a new query as you need one, versus having one bucket of all queries. But it's an option!
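A rough sketch of that middle ground in Python (query names and schema are made up): EXPLAIN makes SQLite parse and plan each statement without running it, so a typo or a missing table fails at startup instead of deep inside a request handler. Python's sqlite3 module also caches prepared statements internally, so repeated execution avoids re-parsing.

import sqlite3

# All queries live in one place; nothing else in the codebase builds SQL strings.
QUERIES = {
    "get_user": "SELECT name FROM users WHERE id = ?",
    "add_user": "INSERT INTO users (name) VALUES (?)",
}

def check_queries(conn):
    for name, sql in QUERIES.items():
        try:
            # EXPLAIN prepares and plans the statement without executing it;
            # dummy NULL bindings satisfy the ? placeholders (naive count,
            # good enough for a startup check).
            conn.execute("EXPLAIN " + sql, [None] * sql.count("?"))
        except sqlite3.Error as exc:
            raise RuntimeError("query %r failed to compile: %s" % (name, exc)) from exc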
If your objection is to the SQL language itself then you might find Datalog interesting. Datalog is a logic-based language where you query by writing predicates rather than writing SQL statements. Check out Logica <https://logica.dev>. It's a language in the Datalog family that compiles to SQL.
In both cases, SQL is used only as a low-level IR for interfacing with the database engine.
Do you only use graphical programming languages or how is your "programmatic logic" typically expressed?
You can self-host it, and if all you care about is showing data in a digestible format to stakeholders, it's really simple. You can of course go overboard and regret all of your life's decisions with it, but I try to restrain myself.
Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite | 327 points | 94 comments | https://news.ycombinator.com/item?id=47874647
Live notifications was one of the big missing pieces to implement whole apps on a sqlite backend, and now there's a decent solution.
In fact, I've worked on several projects where I heavily advocated that even the primary app storage was SQLite, and that archival was simply copying the database after an event. Specifically, elections, petition verification, etc. It's kind of difficult coming up with complex schemas to handle multiple events as well as the state of data at those events... by separating the database itself, using SQLite, that simplifies a lot of things. Though it does practically limit scale a bit. The main thing would be to archive the application and the database after a given event. If the application is containerized, you could create an image of the source, the container and the database after the event.
I think this kind of structure would work well for a lot of things... especially if you're considering data sharding anyway.
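For what it's worth, the archive-after-an-event step can use the sqlite3 online backup API so the copy is consistent even if the app is still running; a sketch (paths are placeholders):

import sqlite3

def archive_event_db(live_path, archive_path):
    # The backup API copies the database page by page and produces a
    # consistent snapshot even while the live connection is in use.
    src = sqlite3.connect(live_path)
    dst = sqlite3.connect(archive_path)
    src.backup(dst)
    dst.close()
    src.close()

archive_event_db("election.db", "archive/election-final.db")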