This reminds me of an article by Ted Unangst[1], in which he flattens the various libraries and abstractions to show how xterm (to cite one of many culprits) in one place is effectively doing:
if (poll() || poll())
    while (poll()) {
        /* ... */
    }
In other words, if you don't know what your library/abstraction is doing, you can end up accidentally duplicating its work.

Reminds me of some aphorism: "Those who do not learn from history..." ;)
[1] http://www.tedunangst.com/flak/post/accidentally-nonblocking
Well yes, this is the very definition and goal of abstraction.
I ran an experiment where I timed the runtime of the sample program provided in the OP, except I changed the number of calls to localtime() from ten times to a million. I then timed the difference with and without export TZ=:/etc/localhost. The net savings was .6 seconds. So for a single call to localtime(3), the net savings is 0.6 microseconds.
That's non-zero, but it's likely in the noise compared to everything else that your program might be doing.
"fast" is a relative term, and is somewhat orthogonal to "efficient".
There's a reason why certain functions use a vDSO. If you're just going to use a syscall anyway, there's kind of no point.
> formatting dates and times
This shouldn't require a call to localtime; the article needs more explanation here. Breaking seconds-since-epoch out into year/mo/day/etc. is "simple" math and shouldn't require a filesystem access. Something else is amiss here.
> for everything from log messages
You're about to hit disk; a cached stat() isn't going to matter.
> to SQL queries.
You're about to hit the network; a cached stat() isn't going to matter.
(Now, I'm not saying you shouldn't set TZ; if it saves some syscalls, fine, and it might be the only sane value anyways.)
¹one of my old teams had an informal rule that any invocation of datetime.datetime.now() was a bug.
One example: the folks over at Slack record every syscall for security auditing. https://slack.engineering/syscall-auditing-at-scale-e6a3ca8a...
This can actually be a problem, since there are applications like git which assume stat is fast, and so it aggressively stat's all of the working files in the repository to check the mod times to see if anything has changed. That's fine on Linux, but it's a disaster on Windows, where the stat system call is dog-slow. Still, I'd call that a Windows bug, not a git bug.
On x86_64, syscalls only use SYSCALL. It's very fast if audit and such are off and reasonably fast otherwise. (I extensively rewrote this code recently. Older teardowns of the syscall path are dated.)
If that's true, then one test where you have a single process spinning into and out of a single syscall will have very different performance characteristics than a test where you have more processes than processor cores, because context switches flush the TLB.
Somebody who knows actual things about x86 and so forth please tell me if I'm spouting 90s-era comp sci architecture textbook stuff that no longer applies.
Obviously if you care about performance then you wouldn't be running your program on a Raspberry Pi in the first place. But for everything else there's this free speed up.
OTOH, I've never encountered an issue like this on those systems... (yet)
But it's neat information to have in the back of your head.
As just one example, what if you're stat()ing over NFS with a busy, flaky, and/or distant server? A bit of thought and you'll come up with a bunch of other cases where it suddenly starts to matter.
$ time ./tz
./tz 2,24s user 6,28s system 98% cpu 8,612 total
$ export TZ=:/etc/localtime
$ time ./tz
./tz 1,35s user 0,00s system 98% cpu 1,364 total
So 0.7 microseconds on my machine.

Hope this is just a typo in your comment, not the actual test ;)
See `man timezone` on a Linux system[1]. Specifically, see the passage that I've quoted below. Note that this is the third of three different formats that the man page describes that you can use in TZ:
> The second format specifies that the timezone information should be read from a file:
:[filespec]
> If the file specification filespec is omitted, or its value cannot be interpreted, then Coordinated Universal Time (UTC) is used. If filespec is given, it specifies another tzfile(5)-format file to read the timezone information from. If filespec does not begin with a '/', the file specification is relative to the system timezone directory. If the colon is omitted each of the above TZ formats will be tried.

http://mail-archives.apache.org/mod_mbox/httpd-dev/201111.mb...
https://github.com/apache/httpd/blob/trunk/server/util_time....
The internals of glibc can often be pretty surprising. I'd really encourage people to go spelunking in the glibc source when they're profiling applications.
If you enjoyed this post, you may also enjoy our deep dive explaining exactly how system calls work on Linux[1].
[1]: https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g...
TZ=:/etc/localtime
I've set TZ sometimes without the colon and it seems to work. I did a quick online search and didn't find anything relevant.
However, the reason it works without the colon is that the implementation is lazy: it just ignores the : delimiter and falls back to parsing out a filename either way:
https://sourceware.org/git/?p=glibc.git;a=blob;f=time/tzset....
https://www.gnu.org/software/libc/manual/html_node/TZ-Variab...
The third format looks like this:
:characters
Each operating system interprets this format differently; in the GNU C
Library, characters is the name of a file which describes the time zone.
The other formats specify the timezone directly, such as EST+5EDT. Interestingly, it seems to work okay without the colon. Perhaps the leading slash implies a filename?

My favorite part:
> WTF?? Why is ls(1) running stat() on /etc/localtime for every line of output?
[1] http://www.brendangregg.com/blog/2014-05-11/strace-wow-much-...
- Why does glibc check /etc/localtime every time localtime is called? Wild guess: so that new values of /etc/localtime are picked at runtime without restarting programs.
- Corollary: why does glibc not check /etc/localtime every time localtime is called, when TZ is set to :/etc/localtime? Arguably the reason above should still apply when TZ is set to a file name, shouldn't it?
https://sourceware.org/git/?p=glibc.git;a=commit;h=68dbb3a69...
I'd argue it should cache the same when both old_tz and tz are NULL (but start with an old_tz that is not NULL).
I was about to file an upstream bug, but found https://sourceware.org/bugzilla/show_bug.cgi?id=5184 and https://sourceware.org/bugzilla/show_bug.cgi?id=5186
The latter actually implies the opposite should be happening: files given in TZ should be stat()ed just as much as /etc/localtime.
/etc/localtime is set by the administrator. It may change without notice to the user.
TZ is part of the user's environment and the user sets it. All applications run by the user should honor the user's wishes if the user's not falling back to system defaults.
If you're setting TZ for yourself, your libc can update things when you update the variable and restart any applications you're running under the old value. It can therefore save cycles. If you're falling back to the system default that's not under the same user's control, then it must be ready to deal with unexpected changes.
First:
> What’s going on here is that the first call to localtime in glibc opens and reads the contents of /etc/localtime. All subsequent calls to localtime internally call stat, but they do this to ensure that the timezone file has not changed.
and second: read the section titled "Preventing extraneous system calls" for the answer to your second question.
In general, the timezone is set during OS setup, and the system is left in a state where it's up to the applications to figure out what to do. For example, you might configure Apache (yes, I am old, leave me alone) to use a particular timezone. But if Apache senses an env var it may choose to override the configured value with what's in the env var. Or SSH might be configured to pass along all env vars, including TZ, which in all honesty it probably won't even if you tried, but it could, and then the destination server's application has the wrong timezone.
Point is, it's safer not to mess with env vars unless you need to.
Though PeterWillis makes a good point akin to yours, and your (plural) point does make sense.
(Edit: added mention of comment with additional background on why to avoid hardcoding the variable)
And for what benefit? A few hundred syscalls per second? Linux syscalls are fast enough that something of that magnitude shouldn't matter much. Given that /etc/localtime will certainly be in cache with that frequency of access, a stat() should do little work in the kernel to return, so that won't be slow either.
It's good that they did some benchmarking to look at the differences, but this feels like a premature optimization to me. I can't imagine that this did anything but make their application a tiny fraction of a percent faster. Was it worth the time to dig into that for this increase? Was it worth the maintenance cost I mention in my first paragraph? I wouldn't think so.
I'm really trying not to take a crap on what they did; as I said, it's really cool to dig into these sorts of abstractions and find out where they're inefficient or leak (or just great as a learning exercise; we all depend on a mountain of code that most people don't understand at all). But, when looked at from a holistic systems approach, a grab bag of little "tweaks" like this can become harmful in the long run.
I would definitely like to see a before/after real-world metric on impact here though.
Since most embedded systems can directly translate the amount of processing needed to achieve the product goal into a real dollar cost of the hardware, any savings, even a small one, is worth investigating. Since the hardware and system are generally well understood, implementing something like this is much more reasonable.
But I agree, on a general purpose OS doing general purpose thing, an optimization like what's proposed by the article may not be worth the other tradeoffs.
/* Update internal database according to current TZ setting.
   POSIX.1 8.3.7.2 says that localtime_r is not required to set tzname.
   This is a good idea since this allows at least a bit more parallelism. */
tzset_internal (tp == &_tmbuf && use_localtime, 1);
mktime also does the tzset call every time, though:

time_t
mktime (struct tm *tp)
{
#ifdef _LIBC
  /* POSIX.1 8.1.1 requires that whenever mktime() is called, the
     time zone names contained in the external variable 'tzname' shall
     be set as if the tzset() function had been called.  */
  __tzset ();
#endif

and I don't see any way around that other than setting TZ=: or some such.

$ ldd test
linux-vdso.so.1 => (0x00007ffd80baf000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8844bf7000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8844fbc000)
Any thoughts why the behavior would be different?

The former is the "traditional" function which returns a pointer to a statically allocated, global "struct tm". The latter is the thread-safe version receiving a pointer to a user-supplied "struct tm" as its second argument.
do {
    t = time(NULL);
    localtime_r(&t, &tm);
    printf("The time is now %02d:%02d:%02d.\n",
           tm.tm_hour, tm.tm_min, tm.tm_sec);
    sleep(1);
} while (--N);
with TZ set to Europe/Berlin, set to :/etc/localtime, or unset I never get a stat on anything:

write(1, "The time is now 07:23:33.\n", 26) = 26
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd9e798470) = 0
write(1, "The time is now 07:23:34.\n", 26) = 26
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd9e798470) = 0
write(1, "The time is now 07:23:35.\n", 26) = 26
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd9e798470) = 0

If I change it to tm = localtime()...

stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
write(1, "The time is now 07:30:56.\n", 26) = 26
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc868c3010) = 0
One more reason to switch to the reentrant/thread-safe versions of those ugly library functions :-).

Note, this is using glibc 2.24 under Arch.
$ /lib/libc.so.6
GNU C Library (GNU libc) stable release version 2.24, by Roland McGrath et al.
(...)
Compiled by GNU CC version 6.1.1 20160802.

Now I want to know what other configs reduce the number of system calls. This all adds up, becoming more significant the more hosts there are in your environment.
Rails instrumentation code calls current time before and after any instrumentation block. So when I looked at the trace there were a lot of `stat` calls for `/etc/localtime`, and as stat is an IO operation, I thought I had discovered the cause of the slowness (which I attributed to the high number of IO ops). But surprisingly, when I saw the strace method summary, while the call count was high, the time taken by the calls in total was not significant (<1% if I remember correctly). So I decided to set TZ with the next AMI update 15 months back, but forgot about it totally. I guess I should add it to my Trello list this time.
Also, I think he should have printed the aggregate summary of CPU clock time (`-c`) as well, since that is usually very low.
Yes, on ordinary filesystems if you run stat() over and over again on the same file then it's just copying from the in-memory inode into your struct stat, there's no IO.
[1] https://github.com/systemd/systemd/blob/master/src/timedate/...
The way you'd do this is to open an inotify (or platform equivalent) file descriptor on the first call to localtime(), and on future calls, only bother statting /etc/localtime again if that file descriptor reports something has changed. But checking if an FD has data requires a system call; you'd do a non-blocking read on the FD (or a non-blocking select or poll, or an ioctl(FIONREAD)). It's possible that system call is faster than stat, but it's still a system call.
You could do it with a thread that does a blocking read, but that's a mistake under the current UNIX architecture: lots of stuff (signal delivery and masks, for instance) gets weird as soon as you have threads, so unconditionally sticking a thread into every process using libc is a bad plan.
You could do it with fcntl(F_SETSIG), which would send you a signal (SIGIO by default) when the inotify file descriptor is readable, but you couldn't actually use SIGIO, since the user's process might have a handler. You'd need to steal one of the real-time signals and set SIGRTMIN = old SIGRTMIN + 1. (See https://github.com/geofft/enviable for a totally-non-production-suitable example of this approach.) This would probably mostly work, except changing SIGRTMIN is technically an ABI break (since programs communicate with each other with numerical signal values). You could maybe use one of the real-time signals that pthread already steals. Also, using signals is sort of a questionable idea in general; user programs now risk getting EINTR when /etc/localtime changes, which they're likely not to be prepared for. A single-threaded glibc program that sets no signal handlers never gets any EINTRs, which is nice.
In an ideal world, every program would have a standard event loop / mechanism for waiting on FDs, and libc could just register the file descriptor with that event loop. Then you'll get notified of /etc/localtime changing after the next run of the event loop, which is good enough, and it would take zero additional system calls. But unfortunately, UNIX wasn't designed that way, so something at libc's level can't assume the existence of any message loop. A higher-level library like GLib or Qt or libuv could probably do this, though.
(Better yet, if it were possible to memory-map the directory entry for /etc/localtime, parsing could be avoided as well).
I agree the best solution is to provide a more sensible API. Why should an application be limited to only _one_ timezone?
$ whois packagecloud.io|grep Owner
Owner Name : JOSEPH DAMATO
Owner OrgName : COMPUTOLOGY, LLC
Owner Addr : 359 FILLMORE ST!12
Owner Addr : SAN FRANCISCO
Owner Addr : CA
Owner Addr : US

I'm under the impression that he may also write a lot of these himself? Not entirely certain, though.
(man tzset for more info and examples of different values TZ can be set to. Mine is set to "Europe/London" which handles the DST switches automatically.)
Convert to local time only on the edges, and only for end users.
For example, the timezone America/New_York contains information not only about the UTC offset, but the offset during and outside DST, and when DST starts and ends. (And historical starts and ends too, so that UTC → local conversions (and vice versa) use the rules that were in force at that time, not the rules that are present now, which may be different.)
E.g., my home desktop runs in the timezone America/Los_Angeles all year around. Most of my servers run in the timezone UTC all year. Both always have the appropriately correct time.
Although the hour offset can change at extremely short notice (e.g. discontinuing Daylight Savings during Ramadan [1]), the timezone declaration (e.g. "Africa/Casablanca"), shouldn't need to change, just the underlying timezone database.
[1] https://en.wikipedia.org/wiki/Daylight_saving_time_in_Morocc...
TZ=CET

or TZ=:Europe/Copenhagen

but not TZ=Europe/Copenhagen

echo 'TZ=:/etc/localtime' > /etc/env.d/00localtime
echo 'TZ=:/etc/localtime' >> /etc/environment

I see one reference to Apache below, but not whether it actually made a measurable difference.
write(1,"Greetings!\n",11) = 11 (0xb)
access("/etc/localtime",R_OK) = 0 (0x0)
open("/etc/localtime",O_RDONLY,037777777600) = 3 (0x3)
fstat(3,{ mode=-r--r--r-- ,inode=11316113,size=2819,blksize=32768 }) = 0 (0x0)
read(3,"TZif20000000000000"...,41448) = 2819 (0xb03)
close(3) = 0 (0x0)
issetugid() = 0 (0x0)
open("/usr/share/zoneinfo/posixrules",O_RDONLY,00) = 3 (0x3)
fstat(3,{ mode=-r--r--r-- ,inode=327579,size=3519,blksize=32768 }) = 0 (0x0)
read(3,"TZif20000000000000"...,41448) = 3519 (0xdbf)
close(3) = 0 (0x0)
write(1,"Godspeed, dear friend!\n",23) = 23 (0x17)
(FreeBSD caches the database on the first call: https://svnweb.freebsd.org/base/head/contrib/tzcode/stdtime/... )

It looks like musl, which is used on Alpine Linux images for example, will only read it once, and then cache it:
https://github.com/esmil/musl/blob/master/src/time/__tz.c#L1...
It has a mutex/lock around the use of the TZ info, but avoids re-stat'ing the localtime file.