This might have been mildly interesting if there had been the assembly for a few different architectures (x86, MIPS, ARM, PowerPC, etc.) showing how the C code was translated to assembler for each. And it could have been very interesting with an additional discussion of memory barriers and atomic operations in C and their relation to assignments and pointers.
As someone who has had difficulty picking up real programming languages, and has only found some marginal success due to Obj-C's ARC feature, I can tell you this puts everything I've read into much better perspective.
Try not to be so negative, man, I think it's clear you weren't even the intended target anyways.
It is almost always the first language ported to any system; almost every computer science program at least covers the basics; it has been in 1st or 2nd place on the TIOBE index for over a decade; it's the 5th most popular language on GitHub by commits; and it is over 40 years old.
But I'm willing to accept there might be people on Hacker News who don't know C. That's why I gave the author suggestions for expanding the content and making it interesting to a wider audience. That was the point of my post.
So a better formulation might be: "C provides an abstraction layer on top of a computer's memory model and instruction set that will allow your code to be portable between different machine architectures, but only if you play strictly by the rules."
By the way, the classic K&R book explains the fundamentals of C pretty well. If you really want to understand C, I'd recommend reading it cover to cover (it's pretty short).
I surely hope not.
If you read the C standard, you'll notice it doesn't talk much about "memory" (the word only appears 13 times in C99); it mostly talks about "objects" (mentioned 735 times in C99). These objects aren't OO-objects -- obviously C doesn't have OOP built in -- but rather all the basic types like int, float, struct, etc are objects. When you declare a variable like "int x", you are creating an object.
C's aliasing rules dictate that, with a few exceptions (character types being the main one), you can only access an object via a pointer of that object's actual type. This is why it is dangerous to think of the assignment operator as a simple memory-copying operation. If assignment were a simple memcpy, you could do something like this:
int x = 5;
// BAD: undefined behavior, violates aliasing.
short y = *(short*)&x;
If a variable were just a memory address and assignment were just a memory copy, this would be a valid operation. But the right way to think of it is that a variable is a storage object whose address can be taken, and a dereference is an operation that reads a storage object.
A pointer isn't a generic memory-reading facility; it must actually point to a valid storage object of the pointer's type (or to NULL).
If you do want to read and write arbitrary objects in memory, you can always use memcpy():
int x = 5;
short y;
// This is fine, and smart C compilers optimize away the
// function call.
memcpy(&y, &x, sizeof(y));
It's a valid operation regardless of whether a standards body says it's not.
uint32 x = 5;
uint16 y = *(uint16*)&x;
The effect is to set y to the first two bytes of memory from x. Values assigned to x are serialized into memory in either big endian or little endian order. Those are the only two cases you have to account for. The Quake 3 engine has a macro for the above operation which produces the same value of y on all platforms. This is useful for serializing x to disk, then loading it later (and possibly on a different architecture).
One source of confusion is that int and short are essentially, for all intents and purposes, undefined -- they are of course defined by the standards, but their implementation is allowed to vary so much that no programmer can make any assumptions about their size (in bytes) at runtime.
int8, int16, int32, int64 are all explicit and force the compiler (and the hardware) to obey the wishes of the programmer. This is, I think, the right approach. People make much ado about the fact that "a byte isn't necessarily 8 bits" and "the only assumption you can make about a short is that it's smaller than an int, and larger than a char", etc, which is probably unnecessary mental effort.
"Bytes are 8 bits. Here are four bytes. Here's the value that the four bytes store. Copy two of the four bytes to this other spot (adjusting for endianness appropriately via a macro)."
You typically don't want a memcpy in situations like this due to endianness.
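For what it's worth, one way to get the same bytes on every platform without a pointer cast is to build them with shifts. A minimal sketch, assuming 8-bit bytes and a little-endian on-disk format; store_le32, load_le32 and low16 are names made up for illustration, not the Quake 3 macro:

```c
#include <stdint.h>

/* Hypothetical helpers: write/read a 32-bit value as 4 little-endian bytes.
   Shifts operate on values, not on memory, so host byte order is irrelevant. */
void store_le32(uint8_t out[4], uint32_t v) {
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)((v >> 24) & 0xFFu);
}

uint32_t load_le32(const uint8_t in[4]) {
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* The low 16 bits of x, independent of how x is laid out in memory. */
uint16_t low16(uint32_t x) {
    return (uint16_t)(x & 0xFFFFu);
}
```

The buffer written by store_le32 can go to disk as-is and be read back with load_le32 on a machine of either endianness.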
The reason it's useful to explicitly "break the rules" like this is because it's important to know what assumptions you in fact can rely on, regardless of what standards bodies have to say about it. Because at that point you can do incredible things such as http://www.codercorner.com/RadixSortRevisited.htm
inline float fabs(float x){
// C++: clears the sign bit by treating the float's storage as an unsigned int.
(unsigned int&)x &= 0x7fffffff;
return x;
}
The reason this is incredible and awesome (rather than horrible and dangerous) is because it enabled game developers to achieve a more impressive product for end users, because they were able to do more with the CPU resources that were available at the time.
It's of course not so relevant nowadays, since it's reasonable to assume that most gamers have at least a core 2 duo. But it's one of those things that isn't relevant until suddenly it is -- you're in some situation that requires sorting millions of floats, and your dataset simply demands more performance than your compiler typically gives you. Then suddenly you find you can do amazing things like this, and surprise people with how effectively you can use a modern CPU.
(Although, the modern antidote to "I need to sort millions of floats quickly" is to use SSE, not to sort floats as integers. Yet that's even more evidence that it's better to understand the capabilities of the hardware.)
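For comparison, the same sign-bit trick can be written in plain C without aliasing a float as an integer. This is only a sketch: it assumes float is a 32-bit IEEE 754 value with the sign in the top bit, and fabs_bits is a made-up name to avoid clashing with the library fabs:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (IEEE 754 single precision). */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical name; clears the sign bit by copying the bits out and back. */
float fabs_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the object representation */
    bits &= 0x7FFFFFFFu;              /* clear the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

As noted earlier in the thread, smart compilers tend to optimize away fixed-size memcpy calls like these, so this need not cost more than the cast version.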
Whoa there, cowboy. You may not feel personally beholden to standards bodies, but compiler vendors are following their lead. The major compilers are getting more and more aggressive about optimizing away undefined behavior every year.
> The effect is to set y to the first two bytes of memory from x.
No, it's really not. It's undefined behavior and the compiler is free to do absolutely whatever it wants.
> One source of confusion is that int and short are essentially, for all intents and purposes, undefined -- they are of course defined by the standards, but their implementation is allowed to vary so much that no programmer can make any assumptions about their size (in bytes) at runtime.
I agree with this, and have made this argument before: http://blog.reverberate.org/2013/03/cc-gripe-1-integer-types...
But this is an entirely separate issue.
Given that compilers do break when programmers violate aliasing rules, you should recheck what assumptions you think you can rely on. Non-strict aliasing is not one of them. Unless you want to slow everything down with compiler-specific flags like -fno-strict-aliasing.
uint8_t foo[4]; *(uint32_t*)foo = 0;
Besides, even without strict aliasing, the above is not at all guaranteed to work, since not all architectures support unaligned loads. (And if you think "well but no one uses them, just like no one uses 1's complement architectures anymore", keep in mind that this includes ARM.) Also, use stdint types already.
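A sketch of the alignment-safe and aliasing-safe alternative to that cast, using memcpy. The helper names put_u32 and get_u32 are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: access a uint32_t at any byte offset in a buffer.
   memcpy has no alignment requirement and does not violate aliasing rules. */
void put_u32(uint8_t *dst, uint32_t v) {
    memcpy(dst, &v, sizeof v);
}

uint32_t get_u32(const uint8_t *src) {
    uint32_t v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

With these, `put_u32(foo, 0)` does what the cast was trying to do. The bytes land in host order, so this is not a serialization format by itself.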
At least in C99, the compiler doesn't need to support exact-width integer types.
>People make much ado about the fact that "a byte isn't necessarily 8 bits"
Well, POSIX.1-2004 requires that CHAR_BIT == 8.
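If code does depend on those assumptions, it can say so and fail the build where they don't hold. A minimal sketch, assuming a C11 compiler for static_assert; u32_width_bits is a hypothetical helper:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Fail the build instead of silently assuming 8-bit bytes. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* UINT32_MAX is only defined when the optional uint32_t type exists. */
#ifndef UINT32_MAX
#error "no exact-width 32-bit unsigned type on this implementation"
#endif

/* Hypothetical helper: width in bits of the exact-width type. */
unsigned u32_width_bits(void) {
    return (unsigned)(sizeof(uint32_t) * CHAR_BIT);
}
```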
All the world's a VAX, sure. Don't mind the next generation of hardware coming down the pike and the next wave of compiler optimizations.
The size of a type is just one of its many attributes. Even if, for example, "long", "float", and "void* " happen to have the same size, they're still very distinct types.
"Integer data types are defined in the limits.h file. Float data types are defined via macros in the floats.h file."
Integer and floating-point types are defined by the compiler, guided by the hardware and the ABI for the platform. <limits.h> and <float.h> document the characteristics of the predefined numeric types.
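To illustrate the "document, not define" point: a program can ask those headers what the implementation chose rather than assume it. A small sketch; both function names are made up:

```c
#include <float.h>
#include <limits.h>

/* Hypothetical helper: count the value bits of unsigned int from UINT_MAX
   instead of assuming a size. */
int uint_value_bits(void) {
    int bits = 0;
    for (unsigned int v = UINT_MAX; v != 0; v >>= 1) {
        bits++;
    }
    return bits;
}

/* Decimal digits a float can round-trip, as reported by <float.h>. */
int float_decimal_digits(void) {
    return FLT_DIG;
}
```

The standard only promises minimums here (at least 16 value bits and at least 6 digits); the actual numbers come from the compiler and platform.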
"A pointer doesn’t hold a memory address, it holds a number that represents a memory address."
Sure, and a floating-point object is ultimately just a collection of bits -- but that's hardly the best way to think about either of them. Integers and pointers (addresses) are logically very distinct things, even if they happen to have similar representations. For example, the addresses of two distinct variables have no defined relationship to each other (other than being unequal); just evaluating (&x < &y) has undefined behavior.
C lets you get away with a lot of type-unsafe stuff, particularly if you resort to pointer casts, but it's fundamentally much more strongly typed than the author seems to think it is.
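A sketch of where pointer arithmetic and comparison are defined and where they are not. The function names are hypothetical, and uintptr_t is an optional type, so its availability is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Defined only when both pointers point into (or one past) the same array. */
ptrdiff_t elements_between(const int *first, const int *last) {
    return last - first;
}

/* For unrelated objects, compare the converted integers, not the pointers.
   The resulting order is implementation-defined, but evaluating it is not
   undefined behavior. */
int address_before(const void *a, const void *b) {
    return (uintptr_t)a < (uintptr_t)b;
}
```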
2 &x = 20; // this doesn't work
3 * (&x) = 20; // this does work
Why does line 2 &x not work but line 3 does? Because &x returns a pointer, a number representing a memory address. This is an important distinction. A pointer doesn’t hold a memory address, it holds a number that represents a memory address.
=======
No, that is not why. Note that the following does work:
int * x = 0;
and the following works, though typically yields a warning:
int * x = 20;
Line 2 fails because & doesn't give back an l-value.
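Spelled out as a small sketch (the function name is made up for illustration):

```c
/* &x is a value of type int*, not an lvalue, so there is nothing to assign to:
       &x = 20;        // does not compile: lvalue required
   Dereferencing that value names the object x again, which is an lvalue. */
int assign_through_address(void) {
    int x = 10;
    *(&x) = 20;      /* same object as x */
    int *p = &x;     /* p is itself an lvalue, so p could be reassigned */
    *p += 1;
    return x;        /* 21 */
}
```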
Definitely not true. More like, "it will have an address, if you take the address with the & operator". Otherwise, the compiler is quite free to store locals in registers.
As stated in the post.
"In most assembly languages, data types don’t exist. You operate on bytes and offsets."
This is just not true.
Most assembly languages (I learned on PDP-11 assembler, which I remember best, but what I say is true of 68000 and x86 too) have a notion of a byte, but also integers of various word lengths, and floating point numbers.
In fact, some registers are in effect designated as "pointers" for various kinds of conventional indirect addressing (the instruction pointer, the register holding the stack pointer, and others).
In this sense, C is even closer to assembly than you indicate, because the data types are so analogous.
What exactly is the "syntactic sugar" that hides the idea that names can have addresses? Structs? Some specific kind of expression? Array index syntax? The names themselves?
If you keep track of which boxes are and are not runtime memory cells, that should be enough to work out any particular C pointer problem except the pointer-array almost-equivalence mess.
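For the pointer-array almost-equivalence, a short sketch of where the two agree and where they differ. All three function names are hypothetical:

```c
#include <stddef.h>

/* An array is the storage itself: sizeof sees all 8 elements. */
size_t array_bytes(void) {
    int a[8];
    return sizeof a;             /* 8 * sizeof(int) */
}

/* A pointer is a separate object that holds the address of that storage. */
size_t pointer_bytes(void) {
    int a[8];
    int *p = a;                  /* a decays to &a[0] here */
    return sizeof p;             /* sizeof(int *) */
}

/* Indexing behaves the same through either name. */
int same_element(void) {
    int a[3] = {10, 20, 30};
    int *p = a;
    return p[1] == *(a + 1) && &a[0] == p;
}
```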
When it comes to understanding memory in C, another important aspect is understanding how linkers and loaders work. Also, it's good to know something about calling conventions.
Also, when you get to manually allocated heap data (which this article doesn't cover) you don't have to worry about deallocations... usually.
Variables? What state? Everything is puuuuuuuuure.
In Python:
Everything is an object (numbers, true/false values, strings, etc), some are mutable and some are not. Variables are temporary labels on objects (think of them as hard links).
In Rust/C++:
There are various types of boxes / smart pointers (shared, unique, heap, etc), and unsafe / raw pointers should be avoided when possible.
In C:
Not every variable has a data type, e.g. void or function pointers.
A void pointer has type "void* "; a function pointer also has some appropriate type.
Not every object has a type (e.g., a chunk of memory allocated by `malloc()`), but if "variable" means "object created by a declaration", then yes, every object has a type.
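A minimal sketch of the malloc case; roundtrip_through_heap is a made-up name:

```c
#include <stdlib.h>

/* Memory from malloc has no declared type. Storing an int through an int*
   gives that storage the effective type int for the reads that follow. */
int roundtrip_through_heap(int value) {
    int *p = malloc(sizeof *p);
    if (p == NULL) {
        return value;   /* allocation failed; nothing to demonstrate */
    }
    *p = value;
    int result = *p;
    free(p);
    return result;
}
```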
As an example, an IPv4 address is 32 bits. Don't convert it to a string and put it in a varchar(64) in your database when you are optimizing for space (I actually saw this once). And yes, the DB had an inet type, but no one knew how to use it, what it was or why it mattered.
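To make the size point concrete, here is a rough sketch that packs a dotted quad into one 32-bit value. ipv4_to_u32 is a hypothetical helper, and sscanf is lenient about whitespace and signs, so this is not a strict validator:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse "a.b.c.d" into one 32-bit value, a in the top byte.
   Returns 1 on success, 0 on malformed input. */
int ipv4_to_u32(const char *text, uint32_t *out) {
    unsigned a, b, c, d;
    char extra;
    /* Exactly four numbers must convert; a fifth match means trailing junk. */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4) {
        return 0;
    }
    if (a > 255 || b > 255 || c > 255 || d > 255) {
        return 0;
    }
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16)
         | ((uint32_t)c << 8) | (uint32_t)d;
    return 1;
}
```

Four bytes in the packed form, versus up to 15 characters for the string.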
int r = ((int (*)())startAddress)(); // Wheeee!
http://en.wikipedia.org/wiki/Lie-to-children
> A lie-to-children, sometimes referred to as a Wittgenstein's ladder (see below), is an expression that describes the simplification of technical or difficult-to-understand material for consumption by children. The word "children" should not be taken literally, but as encompassing anyone in the process of learning about a given topic, regardless of age. [snip] Because life and its aspects can be extremely difficult to understand without experience, to present a full level of complexity to a student or child all at once can be overwhelming. Hence elementary explanations tend to be simple, concise, or simply "wrong" — but in a way that attempts to make the lesson more understandable.
OK, the very first sentence of this piece falls flat on its face when you begin to think about how a computer actually handles getting data into and out of the parts of the CPU that actually do the work of modifying data according to the opcodes in flight.
Specifically, C is meant to be a pleasant syntax to sling data around a large, flat address space, where the assumption is that every part of the address space can be treated like any other, with no special consideration given to some locations being faster than others. (The 'register' keyword mucked with this a bit, but approximately nobody uses it anymore in new code. Just as well, because good compilers ignore it anyway; more below.)
This is horribly, hilariously wrong when you learn about cache hierarchy, and becomes even more wrong when you throw an OS implementing virtual memory and a disk cache into the picture. C doesn't have any way to refer to cache; you can't tell the compiler 'store this in cache' because that would break the abstraction C enforces.
So we loop back around: C enforces the abstraction for a good reason; namely, compilers are better than humans at scheduling memory use in practically every case, and in the few cases they aren't, you're doing something hardware-specific enough you'll need to drop into assembly anyway. This is also the reason the 'register' keyword is a no-op and has been for decades. Compilers can schedule registers better than humans because compilers know more about all of the optimizations in play, and when they can't, you'll have to drop into assembly anyway.
TL;DR: This is a basic introductory post. Nitpicking it for things that compilers take care of for you anyway is pointless.