> Be extremely portable
> sp.h is written in C99, and it compiles against any compiler and libc imaginable. It works on Linux, on Windows, on macOS. It works under a WASM host. It works in the browser. It works with MSVC, and MinGW, it works with or without libc, or with weird ones like Cosmopolitan. It works with the big compilers and it works with TCC.
> And, best of all, it does all of that because it’s small, not because it’s big.
vs
> Non-goals
> Obscure architectures and OSes
> I write code for x86_64 and aarch64. WASM is becoming more important, but is still secondary to native targets. I don’t care to bloat the library to support a tiny fraction of use cases.
> That being said, if you’re interested in using the library on an unsupported platform, I’m more than happy to help, and if we can make the patch reasonable, to merge it.
Those are contradictory. Either the code is extremely portable, or it can't support "obscure" platforms, but not both.
And he's already hit the hard targets. Many obscure OSes are UNIX-like and should be easy ports. Many obscure architectures are usually running Linux and should be easy ports.
It's stated as a non-goal simply because it's not the most valuable thing I can do with my time. My fundamental stance is that writing new Windows or Linux or macOS or WASM programs in C is a good idea, and those are the programs that I write, so that's where my focus is. But if someone would like to come along and write the ~30 syscalls needed to port the library to a new platform, or even register any interest in such, I'd be happy to look into it at that point.
I see no contradiction in the desire to support x64 (because it would be ridiculous not to), ARM, and likely RISC-V, but not the venerable but now-fringe architectures like MIPS or Sparc or 68040 or even x86.
For those not in the know, Microchip still produces MIPS microcontrollers:
https://www.microchip.com/en-us/products/microcontrollers/32...
It sounds like it has a goal of being "extremely portable" across compilers (although I'm curious how many compilers it is actually tested against), but only somewhat portable across architectures and operating systems, just hitting the most popular ones.
I think it's perfectly valid to call code 'extremely portable' without supporting every special snowflake architecture. There's a spectrum from assumptions that hold on everything that isn't some esoteric joke architecture or archaeology to something that I would probably consider required for 'extremely portable'.
I would personally consider something that failed to support anything on this list above big endian as still being extremely portable: you'll build for any serious modern architecture that isn't a DSP.
- non twos complement integers
- (int) nullptr != 0
- segmented addressing
- non-8 bit char
- big endian
- missing floating point
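To make the big-endian (and alignment) items on that list concrete: a common way to avoid depending on host byte order, or on the CPU tolerating unaligned loads, is to assemble multi-byte values from individual bytes with shifts. A minimal sketch; the function names are illustrative, not taken from sp.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a little-endian u32 from any byte address.
   Gives the same result on little- and big-endian hosts and never performs
   a 32-bit load from a possibly unaligned address. */
static uint32_t load_u32_le(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Encode a u32 as little-endian bytes, again independent of host order. */
static void store_u32_le(unsigned char *p, uint32_t v) {
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}
```

Modern GCC and Clang typically recognize this pattern and emit a single load (plus a byte swap on big-endian targets) wherever the target permits it, so the portable form usually costs nothing.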
ARM has done a good job of making sure you can't rely on the traditional x86 assumptions that any pointer can be accessed unaligned or that loads and stores are strongly ordered (and compilers getting better at reordering means you need proper ordering semantics on x86 as well).
Supporting obscure platforms is what makes portability "extreme", though.
If you have to write extensive patches to actually port the software, then it’s only “portable” in the same sense that any software can be ported with enough effort. I.e. “Foo is portable. You just have to write a whole new kernel to port it.”
The number might just be zero: did anyone check if this compiles? I am trying to track down where the function `sp_mem_allocator_alloc_type` is defined (used in three places), but it doesn't appear in the GH search results.
I'm not going to clone and build this (too dangerous).
A quick glance at the source on github and here you go: https://github.com/tspader/sp/blob/e64697aa649907ce3357a7dd0...
`sp_mem_allocator_alloc_type` goes through a couple of macro expansions, which end up at `sp_mem_allocator_alloc`.
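For readers who haven't met the pattern: a `_type` allocation macro usually just wraps an untyped allocation function with `sizeof` and a cast, which is why a search for a function definition comes up empty. The sketch below is a hypothetical reconstruction of that idea; every name in it is invented for illustration, and the real sp.h macros may expand differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator interface: a function pointer plus user context. */
typedef struct allocator {
    void *(*alloc)(void *ctx, size_t size);
    void *ctx;
} allocator_t;

/* Untyped entry point: zeroes the memory so callers get a known state. */
static void *allocator_alloc(allocator_t a, size_t size) {
    assert(a.alloc != NULL);
    void *p = a.alloc(a.ctx, size);
    if (p) memset(p, 0, size);
    return p;
}

/* Typed convenience macros: these are macros, not functions, so there is
   no definition to find; they expand to the untyped call. No overflow check on n. */
#define allocator_alloc_type(a, T) ((T *)allocator_alloc((a), sizeof(T)))
#define allocator_alloc_n(a, T, n) ((T *)allocator_alloc((a), sizeof(T) * (n)))

/* A trivial backing implementation that forwards to malloc. */
static void *malloc_backend(void *ctx, size_t size) {
    (void)ctx;
    return malloc(size);
}
```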
> I'm not going to clone and build this (too dangerous).
Your computer won't explode just from downloading and compiling some C code, don't worry ;)
The GitHub repo builds and the examples run just fine on macOS by just running `make` in the project directory, although with one warning:
warning: 'posix_spawn_file_actions_addchdir_np' is deprecated: first deprecated in macOS 26.0
Just create a disposable isolated environment, like a VM or container, and do it inside? And yes, it does compile.
For embedded, definitely; not against portable alternatives like this, though.
However, OP's post sure gets off on the wrong foot by saying this is "fixing C". The hubris of mankind on full display, yet again.
Yes, I know the author's writeup then goes on to say that it is not a libc, with a pile of questionable justification. This is a custom runtime, in a single header no less, which is admittedly impressive, especially considering it provides runtime and thread-safety primitives. This does not rise to the level of claiming the title of 'standard library', though, IMO. In that, I think the author misses the point.
TRIPLES = \
x86_64-linux-none x86_64-linux-gnu x86_64-linux-musl \
aarch64-linux-none aarch64-linux-gnu aarch64-linux-musl \
aarch64-macos \
x86_64-windows-gnu \
wasm32-freestanding wasm32-wasi
Or you could actually try the compliance suite on an architecture and report back to us if it works?
Zig, one of the giants upon whose shoulders this library stands, coined a name for this almost-but-not-quite-UTF encoding: WTF-8 and WTF-16. These encodings mean, simply, the same as their UTF counterparts but allow unpaired surrogates to pass through.
To give credit where credit is due, both WTF-8 and WTF-16 were devised by Simon Sapin [1] and Zig simply picked them up.
[1] https://wtf-8.codeberg.page/
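The core idea fits in a few lines: WTF-8 uses the ordinary UTF-8 encoding rules, except that a 16-bit unit which is a surrogate without a partner is encoded as its own three-byte sequence instead of being rejected. A minimal sketch of a WTF-16 to WTF-8 converter (function names hypothetical), assuming the output buffer holds at least 3 bytes per input unit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the generalized UTF-8 encoding of code point cp, which may be a
   lone surrogate in U+D800..U+DFFF. Returns the number of bytes written. */
static size_t wtf8_put(uint32_t cp, unsigned char *out) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) { /* includes lone surrogates: this is the "WTF" part */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert WTF-16 (possibly ill-formed UTF-16) to WTF-8.
   A high surrogate immediately followed by a low surrogate is combined into
   one supplementary code point; any other surrogate passes through as-is.
   out must have room for 3 * n bytes. Returns the number of bytes written. */
static size_t wtf16_to_wtf8(const uint16_t *in, size_t n, unsigned char *out) {
    size_t o = 0;
    assert((in != NULL && out != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < n &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            u = 0x10000 + ((u - 0xD800) << 10) + (in[i + 1] - 0xDC00u);
            i++;
        }
        o += wtf8_put(u, out + o);
    }
    return o;
}
```

Decoding in the other direction has to accept the three-byte surrogate sequences (such as `ED A0 80` for a lone U+D800) that a strict UTF-8 decoder rejects.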
sizeof((T){0} = $value)
Wait, is a compound literal an lvalue in that sense (as opposed to just being able to take its address)?!
Take a look at the C99 standard.
Oh my, it indeed is (C99 §6.5.2.5 p5). Good to know!
I have a wtf.c from 10+ years ago when I was re-implementing Windows-style Unicode handling for some project. You keep running into various quirks, which accumulate, and you inevitably arrive at your WTF moment. So WTF as a name comes up naturally; no special wit required.
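On the `sizeof((T){0} = $value)` idiom quoted above: it relies on two facts, that a compound literal is an lvalue (so it may appear on the left of `=`) and that the operand of `sizeof` is not evaluated unless it has a variable-length array type. Together they give a compile-time check that a value is assignable to `T`, at no runtime cost. A minimal sketch; the macro name is hypothetical:

```c
#include <assert.h>

static int side_effects = 0;

static int bump(void) {
    return ++side_effects;
}

/* Hypothetical macro: compiles only if V can be assigned to an object of
   type T. The assignment itself is never executed. */
#define CHECK_ASSIGNABLE(T, V) ((void)sizeof((T){0} = (V)))

static void demo(void) {
    (int){0} = 42;                   /* legal: compound literals are lvalues */
    CHECK_ASSIGNABLE(long, bump());  /* bump() is not called */
    CHECK_ASSIGNABLE(double, 1);     /* int converts to double: accepted */
    /* CHECK_ASSIGNABLE(int *, 3.0);    would be rejected: double to pointer */
}
```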
Pointer/length is not just for strings - but for all arrays.
See my proposal:
Could I pick your brain a little more on the design? I'm spader at spader.zone; if you have time, drop me an email. I promise not to take too much of your time and I'd love to hear from you.
This can get you started:
https://dlang.org/spec/arrays.html
Strings (and arrays) being length/ptr is a freaking enormous win, in simplicity, performance, and overflow bug elimination.
One of D's secret features is that string literals still have a 0 appended to them, even though the length of the string does not include the 0. This makes it super slick to call C functions, like printf, using a string literal for the format string.
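A rough C approximation of the D behaviour described above, assuming a hypothetical `str_t` pointer/length type: a macro over a string literal records the length without the terminator, while the literal itself still carries its `\0`, so `.ptr` can be handed to C functions that expect a NUL-terminated string. This only holds for slices built from literals; a slice taken from the middle of a buffer has no terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer/length string slice. */
typedef struct {
    const char *ptr;
    size_t len;
} str_t;

/* Only valid for string literals: sizeof counts the trailing '\0'. */
#define STR_LIT(lit) ((str_t){ (lit), sizeof(lit) - 1 })

/* Slicing is O(1) and never copies; the result is NOT NUL-terminated
   unless it happens to end where the original did. */
static str_t str_sub(str_t s, size_t start, size_t end) {
    assert(start <= end && end <= s.len);
    return (str_t){ s.ptr + start, end - start };
}

static int str_eq(str_t a, str_t b) {
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}
```

With this, `printf("%s\n", STR_LIT("hi").ptr)` works, while length queries never need `strlen`.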
I'm baffled why C spends its energy doing things like normalized Unicode identifiers (an abomination) instead of something incredibly useful like length/ptr arrays.
https://github.com/gritzko/libabc
In this day and age, the top problem is Claude bringing lots and lots of bad C into the code base. Takes a weekend to clear the week's mess.
https://github.com/gritzko/beagle
In a cleaned-up codebase, though, all the usual C memory bugs are virtually nonexistent. When did I last see a core dump? I do not remember. Thus I feel no urge to use Zig or Rust.
Interestingly, I rarely make a memory bug these days. Too much experience, I've just learned not to make them.
But I still prefer to use language features that make it easier to not make such errors.
It hasn't been much of an issue with decades of D code.
I had a hard time reading the wc code in the article. First I had to go to the GitHub repo to understand that "da" stands for dynamic array, and then understand that what the author calls wc is not at all the Linux wc command, which by default gives you the number of lines, words, and bytes in a file, not the count of occurrences of each word, which is what the proposed code computes.
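For comparison, the default output of the `wc` utility is three counts: newlines, words (maximal runs of non-whitespace), and bytes. A minimal in-memory sketch of those counts; the struct and function names are illustrative, and a real `wc` additionally handles files, locales, and multibyte characters under `-m`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct {
    size_t lines; /* number of '\n' bytes, as wc -l counts them */
    size_t words; /* maximal runs of non-whitespace bytes */
    size_t bytes;
} wc_counts_t;

static wc_counts_t wc_count(const unsigned char *buf, size_t len) {
    wc_counts_t c = { 0, 0, len };
    int in_word = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') c.lines++;
        if (isspace(buf[i])) {
            in_word = 0;          /* whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;          /* first byte of a new word */
            c.words++;
        }
    }
    return c;
}
```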
Also, since I had to read the GitHub README, another remark: it says that sp_io uses pthreads rather than fork and exec. Both of those approaches (but especially pthreads) contradict the explicit goal of programming against the lowest-level interfaces. I believe the lowest-level syscall is clone3 [1], which gives you finer-grained control over what is shared between the parent and child processes, allowing you to implement either fork or threads.
[1] https://manpages.debian.org/trixie/manpages-dev/clone3.2.en....
> Program directly against syscalls
It's the very first of the listed principles. The paragraph after this title even says it "must" be the case, in italics for emphasis, and there's a footnote defining what they mean, which makes it very clear that pthreads should be out according to this principle.
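To make "directly against syscalls" concrete on Linux: the kernel ABI is just a register convention, so a `write` can be issued without libc at all. A minimal sketch for x86_64 and aarch64 Linux only (where `write` is syscall number 1 and 64 respectively); the wrapper name is hypothetical. Creating threads this way via clone/clone3 additionally means setting up the child's stack and TLS yourself, which is the work pthreads normally does.

```c
#include <assert.h>
#include <stddef.h>

/* Issue write(2) directly via the kernel ABI, bypassing libc.
   Returns bytes written, or a negative errno value as the kernel reports it. */
static long raw_write(int fd, const void *buf, size_t len) {
    assert(buf != NULL || len == 0);
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* rax = syscall number (1 = write), arguments in rdi, rsi, rdx;
       the syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
#elif defined(__linux__) && defined(__aarch64__)
    /* x8 = syscall number (64 = write), arguments in x0..x2, result in x0. */
    register long x8 __asm__("x8") = 64;
    register long x0 __asm__("x0") = fd;
    register long x1 __asm__("x1") = (long)buf;
    register long x2 __asm__("x2") = (long)len;
    __asm__ volatile ("svc #0"
                      : "+r"(x0)
                      : "r"(x8), "r"(x1), "r"(x2)
                      : "memory");
    return x0;
#else
#error "raw_write sketch only covers x86_64 and aarch64 Linux"
#endif
}
```

Note that errors come back as negative values (for example `-EBADF`) rather than through `errno`; setting `errno` is something libc's wrapper does.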
I must suspect the author has insufficient understanding of the dynamic linker, TLS memory management, and the vDSO.
I agree that pointer and length is better than null-terminated strings (although it is difficult in C, and as they mention you will have to use a macro (or some additional functions) to work this in C).
Building the C standard library directly on syscalls is also a good idea; although in some cases an implementation might need to avoid this for some reason, it is generally better for the standard library to sit directly on syscalls.
A FILE object is sometimes useful, especially if you have functions such as fopencookie and open_memstream; but it might be useful (although probably not in C) to be able to optimize parts of a program that only use a single implementation of the FILE interface (or a subset of its functions, e.g. one that does not use seeking).
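As an example of FILE as an interface: POSIX `open_memstream` returns a `FILE *` whose writes land in a growable heap buffer, so code written against `fprintf` can target memory unchanged. A minimal sketch; the helper name is made up, it requires a POSIX.1-2008 libc, and `fopencookie` (a separate GNU extension) is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Format two fields through the ordinary FILE API into a heap buffer.
   Returns a NUL-terminated string the caller must free(), or NULL on error. */
static char *format_pair(int a, int b, size_t *out_len) {
    char *buf = NULL;
    size_t len = 0;
    assert(out_len != NULL);
    FILE *f = open_memstream(&buf, &len);
    if (f == NULL) return NULL;
    fprintf(f, "a=%d ", a);
    fprintf(f, "b=%d", b);
    /* buf and len are only guaranteed up to date after fflush or fclose. */
    if (fclose(f) != 0) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```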
It seems like one of the worst data structures ever: the lookup complexity of a linked list with the expansion complexity of an array list, with security problems added as a bonus.
String tables in most object file formats work like that, a concatenated series of ASCIIZ strings. One byte of overhead (NUL), requires only an offset into one to address a string and you can share strings with common suffixes. It's a very compact layout.
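A small sketch of the string-table layout described above (the table contents and helper names are illustrative). Lookup by offset is O(1) and lookup by name is a linear scan, which is exactly the trade-off the parent comment objects to; a suffix such as "array" can be addressed by pointing into the tail of "init_array" without storing it twice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Concatenated NUL-terminated strings; offset 0 is the empty string,
   as in ELF string tables. The literal's implicit '\0' ends the last entry. */
static const char strtab[] = "\0main\0_start\0init_array";

/* O(1): an offset is all you need to address a string. */
static const char *strtab_at(const char *tab, size_t size, size_t off) {
    return off < size ? tab + off : NULL;
}

/* O(n): finding a name means walking the entries one by one.
   Returns the offset, or -1 if the name is not present as a whole entry. */
static long strtab_find(const char *tab, size_t size, const char *name) {
    size_t off = 0;
    assert(tab != NULL && name != NULL);
    while (off < size) {
        if (strcmp(tab + off, name) == 0) return (long)off;
        off += strlen(tab + off) + 1;
    }
    return -1;
}
```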
Works nicely on Linux where the syscall interface is explicitly stable, but on many (most?) other platforms this is not the case.
> There Is No Heap
I don't understand what this means, when it's followed by the definition of a heap allocation interface. The paragraph after the code block conveys no useful information.
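One possible reading of "There Is No Heap", offered here only as an assumption, is that there is no global, implicit allocator: every allocation goes through an allocator object the caller passes in. A minimal bump (arena) allocator with hypothetical names shows why that can be useful: freeing is a single reset instead of per-object bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena: allocations bump an offset inside a caller-provided buffer. */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t used;
} arena_t;

static arena_t arena_make(void *buf, size_t cap) {
    arena_t a = { (unsigned char *)buf, cap, 0 };
    return a;
}

/* align must be a power of two. Returns NULL when the arena is exhausted. */
static void *arena_alloc(arena_t *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t cur = (uintptr_t)(a->base + a->used);
    size_t pad = (size_t)((align - (cur & (align - 1))) & (align - 1));
    /* Both checks are written to avoid unsigned overflow. */
    if (pad > a->cap - a->used || size > a->cap - a->used - pad) return NULL;
    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}

/* Releases everything at once; individual frees do not exist. */
static void arena_reset(arena_t *a) {
    a->used = 0;
}
```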
> Null-terminated strings are the devil’s work
Agreed! I also find the stance regarding perf optimization agreeable.
It looks like sp_log's string formatting is entirely unbuffered which results in lots of tiny write syscalls.
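To illustrate the difference being discussed: an unbuffered logger that emits each formatted fragment separately makes one `write` per fragment, while a small buffer in front of the sink coalesces them and only flushes when full or on request. A minimal sketch with a hypothetical callback-based sink; this is not sp.h's actual IO writer API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sink: in a real program this would call write(2). */
typedef void (*sink_fn)(void *ctx, const char *data, size_t len);

typedef struct {
    char buf[256];
    size_t used;
    sink_fn sink;
    void *ctx;
} buffered_writer_t;

static void bw_flush(buffered_writer_t *w) {
    if (w->used > 0) {
        w->sink(w->ctx, w->buf, w->used);
        w->used = 0;
    }
}

static void bw_write(buffered_writer_t *w, const char *data, size_t len) {
    assert(data != NULL || len == 0);
    if (len > sizeof w->buf - w->used) bw_flush(w);
    if (len >= sizeof w->buf) {
        w->sink(w->ctx, data, len); /* too big to buffer: pass straight through */
        return;
    }
    memcpy(w->buf + w->used, data, len);
    w->used += len;
}

/* Example sink that only counts calls and bytes, standing in for write(2). */
typedef struct { size_t calls; size_t bytes; } sink_stats_t;

static void counting_sink(void *ctx, const char *data, size_t len) {
    sink_stats_t *s = ctx;
    (void)data;
    s->calls++;
    s->bytes += len;
}
```

With this setup, ten 4-byte fragments become one sink call instead of ten.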
That does spin the meaning of "Sp.h is the standard library that C deserves"
... ... ... oh wow, the math functions are really bad implementations. The range reduction in the sin/cos functions is yikes-level. Like, the-wrong-input-gives-you-an-infinite-loop level of yikes.
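For context on how range reduction can hang in general: a naive `while (x > 2*pi) x -= 2*pi;` never terminates for infinity, and for large finite inputs it either takes an astronomical number of iterations or, beyond about 2^56, stops changing `x` at all. Whether that is the exact pattern in sp.h isn't shown in this thread. The hedged sketch below always terminates by computing the quotient directly and giving up beyond a bound; it is not a reference-quality (Payne-Hanek) reduction, and the bound is an arbitrary choice.

```c
#include <assert.h>
#include <math.h>

/* Reduce x into roughly [-pi, pi] in constant time.
   Accuracy degrades as |x| grows because two_pi is itself rounded.
   Non-finite or very large inputs return NaN instead of looping. */
static double reduce_angle(double x) {
    const double two_pi = 6.283185307179586;
    const double limit = 1e15; /* keeps the quotient well inside long long */
    if (isnan(x) || isinf(x) || x > limit || x < -limit) return NAN;
    double q = x / two_pi;
    /* Round to the nearest integer without calling into libm. */
    long long k = (long long)(q >= 0 ? q + 0.5 : q - 0.5);
    double r = x - (double)k * two_pi;
    assert(r > -4.0 && r < 4.0); /* loose sanity bound, not an accuracy claim */
    return r;
}
```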
It is not part of the core library. It is certainly not meant as a reference-level implementation of math functions. It's there so you can write an easing function for a game without pulling in libc. It seems like its existence has offended you. If that's the case...I'm sorry? At every possible point, I note as loudly as possible exactly what that library is. I found your tone extremely dismissive and disrespectful and I don't care to engage with that any more than I already have.
sp_log() writes directly to an IO writer. An IO writer can be buffered or unbuffered, but is unbuffered by default. This is a feature, not a bug. Have a look through the IO code!
Cheers and thanks for reading.
Why is unbuffered the default? Is there any reasoning behind this?
Claude probably wrote it.
As far as the syscall thing, it's actually quite interesting. NT is also extremely stable. Likewise for the stock Darwin syscalls on macOS. In practice, though, Windows loads kernel32.dll automatically, so there's no drawback in using it when appropriate. I still call directly into NT sometimes (mostly to skip complex userspace path translations that aren't useful). On macOS, you are likewise forced to link to libc (libSystem.dylib), and so I usually just end up using the syscall-wrapper libc functions there.
There is a footnote on this saying as much:
> 3. Where “syscall” means “the lowest level primitive available”. On Linux, it’s always actual syscalls. On Windows, that’s usually NT. On macOS, it’s usually the syscall-wrapper subset of libc because you’re forced to link libc and it’s not quite as open as Linux (although there is a rich “undocumented” set of APIs and syscalls that are very interesting).
A C++ programmer might describe this as "PMR, but not default-constructible. And std::stable_sort takes a PMR allocator parameter. And PMR is the default, and there's no implementation of std::allocator (or new or delete)."
Probably I would have made different choices. For example, I'd rather have many modules that can be individually included, than one giant file.
Also, from a purely aesthetic point of view, I would have opted for more readable function and type names: no sp_ prefix, recognizable names like dict instead of ht, vec instead of da, etc.
And I know there are compilers out there still stuck in the 90s, but I would have targeted C23, these days.
But that would be my highly opinionated library!
P.S. be aware that word frequency is not what the standard 'wc' does.
This is funny to me because just today some friends gave me a link to a C quiz:
From which I gathered that it is a much more cursed language than I remembered. Maybe we all just got used to C and just happen to use a minimal subset.
The problems with C are not mainly with the standard library, but any effort to improve things should be lauded.
If the only problem with C was that the stdlib is terrible that would be a very different situation.
There are much more fundamental problems with the language. Problems that are entirely understandable in K&R C but aren't acceptable half a century later. A "high quality" standard library can't fix these problems. In some cases it can paper over them though not others, and even then the actual problem wasn't fixed it's just not obvious with superficial examination any more.
First, the type system is crap. The array types don't work across function boundaries, there's no Empty type at all, you are provided with a user defined product type with names, but not one without names etc. There is no fat pointer type, slice reference, nothing like that.
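To illustrate "array types don't work across function boundaries": a parameter declared as `int a[4]` is silently adjusted to a pointer, so the length is lost, whereas a pointer to the whole array keeps the length in the type, at the cost of fixing it at compile time. A small sketch with illustrative function names:

```c
#include <assert.h>
#include <stddef.h>

/* The parameter is really `const int *a`; the 4 is documentation only,
   so the callee has to be told the length separately. */
static int sum_decayed(const int a[4], size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++) s += a[i];
    return s;
}

/* A pointer to an array of exactly 4 ints: sizeof *a is 4 * sizeof(int),
   and passing the address of an int[5] is diagnosed as an incompatible pointer. */
static int sum_fixed(int (*a)[4]) {
    int s = 0;
    for (size_t i = 0; i < sizeof *a / sizeof (*a)[0]; i++) s += (*a)[i];
    return s;
}
```

Neither form is a general fat pointer; that still has to be a hand-rolled struct of pointer and length.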
Second, naming is also crap. There's no namespacing feature provided so you're left with the convention of picking a few letters as a prefix and hoping it doesn't overlap and yet is succinct enough to not be annoying.
Third, everything coerces, all the coercions you could want if you like coercions, and then ten times that many on top. Some people really like coercions, C will see them learn that actually they don't like them that much.
FWIW, the standard library being stuck in the K&R era is an actual problem since it doesn't make use of more modern language features and some functions are downright footgun magnets, but nobody quite agrees what a modern stdlib should look like, so a stdlib2 probably will never happen.
Yesterday I would have agreed that C is a nice and simple language, today I believe it is a cursed one that we just happen to make work somehow.
Note: I've written a lot of C by profession and passion, yet I find parent's criticisms mostly valid. At least he did not mention rust ;)
That isn't really an option if you are working on an existing project written in c.
Of course nobody forces me to use C, which is why I stopped writing C a few years ago.
I've been using C on a daily basis for 30+ years and name collisions has just never been a problem.
Granted, it might be due to the lack of a package manager, so micro-dependencies à la `import is_even` are not a thing here, but still, in practice, no name collisions occur.
const sp_fs_entry_t* a = (const sp_fs_entry_t*)pa;
const sp_fs_entry_t* b = (const sp_fs_entry_t*)pb;
return sp_str_compare_alphabetical(a->name, b->name);
Instead it would just be: return sp_str_compare_alphabetical(pa->name, pb->name)
With the correct types declared for the parameters instead of void pointers.
Or if you do need a cast, not having to write "sp_fs_entry_t*" twice on the same line, because the local variable's type would be inferred.
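For reference, a complete `qsort` comparator in this style. One detail worth noting: in C (unlike C++), `const void *` converts implicitly to any object pointer type, so the explicit casts in the quoted code are optional, although the element type still has to be spelled out. The `entry_t` struct is a hypothetical stand-in for `sp_fs_entry_t`, and `strcmp` (byte order, so uppercase sorts first) stands in for `sp_str_compare_alphabetical`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;
} entry_t;

/* qsort hands the comparator pointers to elements as const void *. */
static int entry_compare(const void *pa, const void *pb) {
    const entry_t *a = pa; /* implicit conversion, no cast needed in C */
    const entry_t *b = pb;
    assert(a->name != NULL && b->name != NULL);
    return strcmp(a->name, b->name);
}
```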
Maybe after reading C for a while, you don't see all the noise anymore?
C is horribly and unfixably broken. We've known that for many decades. Just let it die already!
Let's move on.
I love that you are both making this argument and that you have a link to a boutique C compiler written in assembly on your home page.
While I'm commenting on your home page: I recognize that photo as being Red Rock. Possibly Pine Creek? Can I ask which route specifically?
First, (on unix) it's wrapping pthread mutex. That's part of libc! (Technically it might not be libc.so, but it's still the standard library.)
Also, none of the atomics talk about the memory model. You don't _have_ to use the C11 memory model (Linux, for example, doesn't). But if you're not using the C11 memory model and letting the compiler insert fences for you, you definitely need to emit fence instructions yourself.
While C11 atomics do rely on libgcc, so do the __sync* functions that this library uses (see https://godbolt.org/z/bW1f7xGas for an example).
Oops... apparently this is vibecoded. Welp, I just wasted ten minutes of my life reviewing slop that I'm not going to get back.
But regarding: "Oops... apparently this is vibecoded. Welp, I just wasted ten minutes of my life reviewing slop that I'm not going to get back."
Do not talk to people like this. I don't care if you don't like the library, or if you found a flaw in it. I am a regular person who wrote this code for no other reason than I thought it would be good to exist. It's unbelievably rude to call it vibecoded slop, or a waste of your life, and it makes me sad that someone who would write an otherwise thoughtful comment would say something like that.
Could you clarify how much of this code and blog post was written by an LLM?
You were interested in the project and spent some time researching it, but stopped once you decided it was vibe-coded (whether it is or not)?
Why care if it is interesting to you?
English language as she is spoke
Unfortunately though this particular header seems to include the system headers up in the declaration part of the header.
"Using a language other than C is like using a non-standard feature: it will cause trouble for users. Even if GCC supports the other language, users may find it inconvenient to have to install the compiler for that other language in order to build your program. So please write in C."
The GNU Coding Standard in 1994, http://web.mit.edu/gnu/doc/html/standards_7.html#SEC12
There is no language other than C and C++ that is mature enough that you can actually discard the implicit runtime stuff and still be able to code in the language. C++ is too complex in my opinion so I only get to use C as a minimal language.
Even in a language like Zig, you have implicit error-trace printing that is inaccurate in optimized builds, you get a very bad fuzz implementation that doesn't work properly, and you get comptime reflection, which will be insane in the hands of the people who are writing Rust now. Also, a bunch of features you would want to use to discard the runtime are not documented or stable.
You can't even use Odin without libc as far as I can understand.
Hare doesn't even have inline asm.
Contrast with using C with clang/gcc where you can do '-nostdinc' '-nostdlib', then implement memcpy etc. and you can do w/e you want after that.
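For the `-nostdlib` route, the compiler still assumes `memcpy`, `memset`, `memmove` and `memcmp` exist, because it may emit calls to them for struct copies and zeroing. A minimal byte-wise sketch is below; the functions are named `fs_memcpy`/`fs_memset` only so the example compiles next to a hosted libc (in a real freestanding build they would be called `memcpy`/`memset`). One known trap: GCC can recognize such loops and turn them back into a call to `memcpy` itself, which is why these files are commonly built with `-fno-tree-loop-distribute-patterns`.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise copy; the regions must not overlap (same contract as memcpy). */
static void *fs_memcpy(void *restrict dst, const void *restrict src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
    return dst;
}

/* Fill n bytes with the low byte of c (same contract as memset). */
static void *fs_memset(void *dst, int c, size_t n) {
    unsigned char *d = dst;
    while (n--) *d++ = (unsigned char)c;
    return dst;
}
```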
Rust, as another example, is trash for low-level projects if you don't want to pull in the 10 billion lines of code that come with using Rust, like libc/stdlib/binding libraries etc.
You can use libraries that other people built in Rust but doing it yourself takes much more time than doing it in a language like C or Zig.
Another thing is, C is easy to implement. Implementing Rust/C++/Zig or any of the other languages is basically impossible in comparison.
Also, I found that C is the only language where you can go into a very big project, open a random file, and roughly understand what is going on. This is not possible in any of these other languages except Zig, and I suspect it will get very bad in Zig when (if) the lower-skill-level people who are currently writing Rust start moving to Zig.
In 1994 even dialup internet connections were rare and most software distribution occurred by floppy disk (encased in hardshell plastic). _storage_ space was also at a major premium with internal hard disk size indexed in CHS rather than LBA and new (rarely seen by most end consumers) models barely passing 1GB in capacity. https://en.wikipedia.org/wiki/Seagate_Barracuda
Even in the early 'dot com' era as DSL and early cable modem became common downloading software updates could still be painful, though far less so than hours or days on dialup.
"In the 1990s, 'hackers' would 'dial up' their flip phones to local BBSes (called 'phreaking'), where they played and exchanged small Flash games (the 'demo scene')." /s
c8 buf [SP_PATH_MAX] = sp_zero;
sp_cstr_copy_to_n(path, len, buf, SP_PATH_MAX);
since #define SP_PATH_MAX 4096
There should be a fallback for very long paths.
https://learn.microsoft.com/en-us/windows/win32/fileio/maxim...
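One way to add the fallback asked for above: keep the fixed-size stack buffer for the common case and switch to a heap allocation only when the path does not fit, instead of truncating. A minimal sketch; `path_buf_t` and its functions are hypothetical and not part of sp.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *ptr;   /* NUL-terminated copy of the path */
    int on_heap; /* nonzero if ptr must be freed */
} path_buf_t;

/* Copy a path of len bytes (not necessarily NUL-terminated).
   Uses the caller's stack buffer when it fits, otherwise malloc.
   Returns 0 on success, -1 if the path cannot be stored. */
static int path_buf_init(path_buf_t *pb, char *stack, size_t stack_cap,
                         const char *path, size_t len) {
    assert(pb != NULL && stack != NULL && (path != NULL || len == 0));
    if (len < stack_cap) {
        pb->ptr = stack;
        pb->on_heap = 0;
    } else {
        if (len == (size_t)-1) return -1; /* len + 1 would overflow */
        pb->ptr = malloc(len + 1);
        if (pb->ptr == NULL) return -1;
        pb->on_heap = 1;
    }
    if (len > 0) memcpy(pb->ptr, path, len);
    pb->ptr[len] = '\0';
    return 0;
}

static void path_buf_free(path_buf_t *pb) {
    if (pb->on_heap) free(pb->ptr);
    pb->ptr = NULL;
    pb->on_heap = 0;
}
```

A caller would declare something like `char stack[4096];` (the SP_PATH_MAX size from the quoted snippet) and call `path_buf_free` when done; only unusually long paths pay for an allocation.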
How does it deal with code executing before main? Libc does a bunch of necessary stuff, like calling initializers for global variables.
What sp.h does not do is reimplement all of libc's initialization code. If you want to build a freestanding binary, there are a few utilities in there for defining a _start so the loader can actually jump to your code. But it's not, and isn't meant to be, a libc replacement in this sense.
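On the "code before main" question: plain C has no dynamic initialization of globals (file-scope initializers must be constant expressions), so the pre-`main` work is largely libc's own setup plus running functions registered in `.init_array`, such as GCC/Clang `constructor` functions and C++ global constructors. A minimal sketch of the latter; it relies on a GCC/Clang extension and on the startup code (libc's, or a hand-written `_start`) actually walking `.init_array`.

```c
#include <assert.h>

static int config_ready = 0;

/* GCC/Clang extension: the startup code calls this before main(). */
__attribute__((constructor))
static void init_config(void) {
    config_ready = 1;
}

/* File-scope initializers must be constant, so this needs no startup code. */
static const int default_level = 3;
```

A freestanding `_start` that skips `.init_array` would leave `config_ready` at 0, while `default_level` would still be 3 because it is baked into the binary's data.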
The functions strncpy, snprintf, strncat, are fountains of bugs.
I still use snprintf, though, because it is so darned useful. But I wrap it up in another function after carefully ensuring it is called correctly.
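A sketch of the kind of wrapper described (the name and error convention are assumptions, not the commenter's code): it treats truncation as an error instead of silently producing a shortened string, which is the usual `snprintf` bug.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 0 if the full result fit in buf (NUL-terminated).
   Returns -1 on truncation (buf then holds a NUL-terminated prefix)
   or on a formatting error. */
static int format_checked(char *buf, size_t cap, const char *fmt, ...) {
    va_list ap;
    int n;
    assert(buf != NULL && cap > 0);
    va_start(ap, fmt);
    n = vsnprintf(buf, cap, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= cap) return -1;
    return 0;
}
```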
When one is competent to work at this level, strong opinions are in order.
Their correctness is something I cannot gauge. I'm barely competent to follow the conversation.
Please describe...
P.S. Sad to see HN becoming a witch-hunting place.
> Designing software and data structures for performance against unknown use cases on unknown hardware is extremely difficult and the resulting code is much more complicated. Even then, it’s often better to use code written against your actual use case and hardware when performance is that critical.
> Things that are off the table might be:
> - SIMD
> - A highly optimized hash table rewrite
> - Figuring out where inlining or LIKELY causes the compiler to produce better code.
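For readers unfamiliar with the `LIKELY` mentioned in that quote: it is conventionally a thin wrapper over the GCC/Clang builtin `__builtin_expect`, a hint about which way a branch usually goes so the compiler can lay out the hot path first. A minimal sketch with a fallback for compilers without the builtin; the macro names follow common convention, not necessarily sp.h's.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x) (x)
#define UNLIKELY(x) (x)
#endif

/* The error path is marked unlikely; program behavior is the same either way,
   only code layout and branch prediction hints change. */
static int sum_checked(const int *v, size_t n, long *out) {
    long s = 0;
    if (UNLIKELY(v == NULL || out == NULL)) return -1;
    for (size_t i = 0; i < n; i++) s += v[i];
    *out = s;
    return 0;
}
```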
LOL...
Classic vibe coder.
> Only couple of languages not affected are those that don't have a culture of downloading third party code, like C and C++
> Ex js and python developer publishes a 'library'
> Library is vibe coded
> Published on github amidst GitHub being hit by supply chain attacks, had their source code leaked.
The timing is terrible, for starters, and I don't trust the vibe-coded code at all. Imagine a pandemic, the cities are on fire, and you arrive in a rural town asking to kiss people.
Yet another slop coded library.
What could possibly go wrong...
Why do standard library headers always have to be insane?
You are not just parsing it. The header includes the implementation too which will get compiled. Then the linker has to do extra work in order to deduplicate all of this extra code that was made.
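The usual mitigation in single-header libraries is the stb-style implementation guard: every file that includes the header sees only declarations, and exactly one file defines a macro before including it so the implementation is compiled once. That also keeps implementation-only system headers out of the declaration part. Whether sp.h offers such a switch isn't established in this thread; the sketch below uses hypothetical names and inlines the "header" so it compiles as a single file.

```c
/* In the one .c file chosen to hold the implementation: */
#define MYLIB_IMPLEMENTATION

/* ---- everything below is mylib.h (hypothetical), inlined here ---- */
#ifndef MYLIB_H
#define MYLIB_H
#include <stddef.h> /* only what the declarations need */

size_t mylib_count_char(const char *s, char c);

#endif /* MYLIB_H */

#ifdef MYLIB_IMPLEMENTATION
#ifndef MYLIB_IMPLEMENTATION_DONE
#define MYLIB_IMPLEMENTATION_DONE
#include <assert.h> /* implementation-only dependencies live here */

size_t mylib_count_char(const char *s, char c) {
    size_t n = 0;
    assert(s != NULL);
    for (; *s; s++) {
        if (*s == c) n++;
    }
    return n;
}

#endif /* MYLIB_IMPLEMENTATION_DONE */
#endif /* MYLIB_IMPLEMENTATION */
```

Other translation units include the header without the define and only pay for parsing the declarations, and the linker has no duplicate code to discard.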
>Because C is so simple, this is virtually free.
It is not virtually free to compile a standard library for every file in one's project.
>In any case, calling it insane makes me feel disrespected
I would recommend you not take it personally. Fortunately, for better or worse software ends up being pretty robust so it can tolerate a lot. Even if you have to recompile the same standard library hundreds of times, it will eventually compile.