Here is the thing he doesn't seem to understand: all of us who are sysadmins absolutely understand the value of placing large, complex log files into a database so that we can query them efficiently. We also understand why having multi-terabyte text log files is not useful.
But what we find totally unacceptable is log files being shoved into binary repositories as the primary storage location. Because, you know what, everyone has their own idea of what that primary storage location should be, and those ideas are mostly incompatible with each other.
The nice thing about text - for the last 40 years it's been universally readable, and will be for the next 40 years. Many of these binary repositories will be unreadable within a short period, and will be immediately unreadable to those people who don't know the magic tool to open them.
Uh, I don't know what world you live in but I'd like the address because mine sucks in comparison.
Text logs are definitely not a "universal format". Easily accessible, sure. Human readable most of the time? Okay. Universal? Ten times nope.
I'll give you an example: uwsgi logs don't even have timestamps, and contain whatever crap the program's stdout outputs, so you often end up with three different types of your "universal format" in there. I'm not giving this example because it's contrived, but because I was dealing with it the very moment I read your comment.
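If you're stuck with timestamp-less stdout like that, one stopgap (a sketch, not uwsgi-specific; uwsgi also has its own logging options) is to stamp each line as it's captured:

```shell
# Prepend a UTC ISO-8601 timestamp to every line read on stdin, so the
# resulting file at least sorts and filters by time. Slow (one date(1)
# call per line) but portable; a stand-in for proper logger config.
printf 'worker spawned\nGET /\n' |
while IFS= read -r line; do
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
done
```

The input lines here are invented; in practice you'd pipe the daemon's stdout through the loop (or through a real tool like `ts` from moreutils, if available).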
Originally, you had a problem - the data wasn't formatted in a manner that you could parse cleanly.
Now, you have a new problem - not only is the data not formatted properly, it's now in some opaque binary file.
Saying that there are poorly formatted text files isn't a hit against text files, it's a hit against poor formatting. The exact same problem exists if the file is in binary form, and not formatted properly.
I think it actually solves most of the problems text logs have that binary don't (inability to easily present structured data, etc.) yet keeps the advantages of a text log (human readable, resistant to file corruption, future-proof).
Despite that, I can be pretty sure when I walk in to a foreign system there will be nginx logs, just where I expect them, almost certainly in the format I'm used to. And even if the format differs, it's not much of a problem. Binary logs, big problem.
The way I read his article, he's not really opposed to additionally keeping your logs around as text. But you make a good point about using text as the primary storage location, since you can always easily feed it to some binary system for further analysis.
Would the best practice then be to keep your logs around as (compressed) text, but additionally feed it to your log analysis system of choice for greater querying capabilities?
But don't cripple me by shoving your primary log files into binary format so I can't quickly pull data out of them with awk/grep/sed when I need to quickly diagnose a local issue.
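That kind of quick local diagnosis looks like this; the log lines below are made up, but the shape is ordinary combined-format access logging:

```shell
# Count HTTP 500s per request path straight out of a text access log.
# In combined log format, $9 is the status code and $7 the path.
printf '%s\n' \
  '203.0.113.7 - - [10/Oct/2015:13:55:36 +0000] "GET /a HTTP/1.1" 500 0' \
  '203.0.113.7 - - [10/Oct/2015:13:55:37 +0000] "GET /a HTTP/1.1" 500 0' \
  '203.0.113.7 - - [10/Oct/2015:13:55:38 +0000] "GET /b HTTP/1.1" 200 0' \
  > access.log
awk '$9 == 500 { n[$7]++ } END { for (p in n) print n[p], p }' access.log | sort -rn
# → 2 /a
```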
Our product stores all the logs raw in flat files on the file system; we don't use databases for keeping the logs, which allows you to scale massively (the ingestion limit is that of the correlation engine and disk bandwidth). You then just need an efficient search crawler and good use of metadata so search performance is good too.
The issue is that if you ever need to pull the logs for court and you have messed with them (i.e. normalized them and stuffed them into a DB), then your chain of custody is broken.
Best of both worlds means parsed-out normalisation, so I don't have to remember that Juniper calls the source IP srcIP and Cisco calls it SourceIP, but with the original logs under the covers for grepping if you need it.
Then a punch in the face is a universal form of communication, too. Also, EBCDIC is the only encoding the future will recognize!
Should I submit patches to jawstats so that it'll support google-log-format 1.0 beta, or the newer Amazon Cloud Storage 5 format? Or both? Or just go with the older Microsoft Log Storage Format? Or wait until Gruber releases Fireball Format? Has he decided yet whether to store dates as little-endian Unix 64 bit int timestamps, or is he still thinking about going with the Visual FoxPro date format, y'know, where the first 4 bytes are a 32-bit little-endian integer representation of the Julian date (so Oct. 15, 1582 = 2299161) and the last 4 bytes are the little-endian integer time of day represented as milliseconds since midnight? (True story, I had to figure that one out once. Without documentation.)
Should I write a new plugin for Sublime Text to handle the binary log formats? Or write something that will read the binary storage format and spit out text? Or is that too inefficient? Or should I give up on reading logs in a text form at all and write a GUI for it (maybe in Visual Basic)?
Do you know when I should expect suexec to start writing the same binary log format as Apache, or should I give up waiting on that and just write a daemon to read the suexec binary logs and translate them to the Apache binary logs?
Should I take the time to write a natural language parsing search engine for my custom binary log format? Do you think that's worth the time investment? I would really like to be able to search for common misspellings when users ask about a missing email, you know, like "/[^\s]+@domain.com/" does now.
I look forward to your guidance. I've been eagerly awaiting the day that I can have an urgent situation on my hands and I can dig through server logs with all of the ease and convenience of the Windows system logs.
I doubt my Linux (including webOS & Android), FreeBSD, and OS X boxes are going to settle on a single binary format in the next couple of decades or even a single API & toolset. In your brave new world the very first thing I'm going to need to do if I have to combine logs across them is to extract data from at least three formats and the most convenient format is often going to be text - i.e. right back where we started, but with extra work for each OS. More likely you'll get a mix of things using the system APIs, custom binary formats, custom text formats, and syslog. Adding more steps to get at the same data doesn't help.
More importantly, binary logs are unreliable when you're dealing with a system that's completely trashed. You can often get usable text logs off a disk that's throwing I/O errors every few dozen bytes or even from a corrupted raw disk image. They may not be cryptographically "sealed", but I'd rather have them than an error message about the binary format being corrupt. That should be an implementation detail, but I haven't seen much interest from the binary logs camp in making the file formats resilient.
And that's the nub of it: text logs are for when you may have many varied, complex reader use-cases, and you don't understand all those cases well enough yet to lock them down forever, and you have a thousand excellent tools at your disposal that you would like to be able to continue to use.
Recent log spelunking for me included: cat log.? | grep fail | sed 's/^.worker_id$//g' | awk '{ print $5, $4 }' | sort -n -r | sed 30q
There's no analogue in any binary logging system I've ever found.
This is really the important point here. For small systems, grep works fine. The number of people administering small systems is much greater than the number of people administering large systems. The systemd controversy has caused people to fear that change they don't want will be imposed on them and their objections insultingly dismissed: a consequence of incredibly bad social "change management" by its proponents.
They are therefore deploying pre-emptive rhetorical covering fire against the day when greppable logs will be removed from the popular Linux distributions. Plain text is the lingua franca; binary formats bind you to their tools with a particular set of design choices, bugs and disadvantages. My adhoc log grepping workflow has a different set of bugs and disadvantages, but they're mine.
That's really the key for me. My go-to example is searching for IP addresses across different logs. If I have just one machine, and I want to find an IP in the SSH, web and mail logs, I shouldn't have to use multiple tools to get that data.
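With text, one tool is exactly what you get. A sketch, with made-up file names and a documentation-range IP:

```shell
# Same grep across three different services' logs: -F for a literal
# match (the dots in an IP are regex metacharacters), -H to prefix each
# hit with the log file it came from.
printf 'sshd: Accepted password for root from 203.0.113.7\n' > auth.log
printf '203.0.113.7 - - "GET / HTTP/1.1" 200\n'              > access.log
printf 'connect from unknown[203.0.113.7]\n'                 > mail.log
grep -FH '203.0.113.7' auth.log access.log mail.log
```

One command, three unrelated formats, and the -H prefix tells you which subsystem saw the address.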
Logstash, Splunk and other tools store stuff in binary, as he writes, and that's perfectly valid, the only solution in fact. But I don't want to be forced to run a centralized logging server if I have just one or two servers.
If it's okay to claim that binary logging is the only way to go, because you have hundreds of servers, it's also okay to claim that text files are the only solution, because I just have one server.
Finally, aren't those binary logs (the ones that come from individual services) going to be transformed into text when I transmit them to something like Splunk, only to be transformed back into some internal binary format when received? It seems we could save a transformation in that process.
(If you're shunting _all_ of your log data off at that scale, you're crazy, and you'll melt your switches if you aren't careful.)
The name of the game is to think of the problems you're solving and how they relate to the business bottom line. No more, no less. What's most troubling is that we've turned this exercise into an emotional one, not one with any sort of scientific perspective.
I can personally say with conviction that I'd like to sit down and actually collect data on, e.g., how many instructions it takes to store logs to disk in plain text versus a binary format, how many it takes to retrieve logs from disk in both situations, and how much search latency I incur when trying to retrieve said logs from disk in the same. At scale, which is where most of my attention lies these days, that's the kind of thing that matters because those effects get amplified automatically—often to operators' and capacity planners' horrors—by the number of machines you have.
If you're dealing with smaller systems, it won't matter as much, but at that point, you're probably dealing with the other side of this, which is having information on how many requests you get for historical log data and what sort of criteria were used in that search. If you're getting requests less frequently than, say, once per quarter, it likely wouldn't be worth your time to invest in what Mr. Nagy is evangelizing.
tl;dr: Continue using your ad hoc grep-fu, but be mindful of how much time it takes you to get the data you're looking for. That alone will be your decision criterion for adopting something like this.
But even still - I like to have the text files as journals of original entry - so I can occasionally do a tail -f incoming.log | egrep -i "somedevice".
And having the original files in text format is zero impediment to getting them into handy binary database form.
Do you have any evidence for this statement? Because it sounds all kinds of wrong.
There are a lot of hobbyists, a vast number of people with a Linux box in the corner of the office or a few cloud instances, a smaller number of people running IT for multinationals and one or two people who have whole datacenters to themselves. The larger the system, the lower the computer/human ratio.
If I read it correctly there are about 250 million active sites (roughly). It seems unlikely that they are all massive corporate sites.
As an aside, the idea that systemd is a good thing is hilarious to me, not least because it is so brash about making an important change to a huge chunk of the system. Yes, the bugs will eventually get ironed out, but in the meantime? Count me out! I have work to do and am not interested in being a free tester for Red Hat on my live systems.
The downside to this is that now you don't have a set of global tools which can easily operate across these separate datasets without writing code against an API. I hear PowerShell tackles this; I don't know how well. The general principle, though, harms velocity at getting something simple done, to the benefit of being able to do extremely complex things more easily. See Event Viewer for a good example of this.
Logs don't exist in isolation. I want to use generally global tooling to access and manipulate everything. I don't want to have to write (non-shell) code, recall a logging-specific API or to have to take the extra step of converting my logs back to the text domain in order to manipulate data from them against text files I have exported from elsewhere for a one-off job. An example might be if I have a bunch of mbox files and need to process them against log files that have message IDs in them. I could have an API to read the emails, and an API to read the logs, or I could just use textutils because I know an exact, validating regexp is not necessary and log format injection would have no consequence in this particular task.
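The mbox-versus-logs job from that paragraph really is a one-liner in the text domain. Everything below (file names, formats) is invented for illustration:

```shell
# Extract Message-IDs from a mail dump, then find the log lines that
# mention any of them. grep -o emits each <...> token on its own line;
# grep -F -f - treats those tokens as literal patterns read from stdin.
printf 'Message-ID: <1@example>\nMessage-ID: <2@example>\n' > mbox.txt
printf 'delivered <1@example>\nbounced <3@example>\n'       > mail.log
grep -o '<[^>]*>' mbox.txt | grep -F -f - mail.log
```

As the comment says, an exact, validating parser would be overkill here; the sloppy token match is good enough for a one-off job.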
I do see the benefits of having logs be better structured data, but I also see downsides of taking plain text logs away. Claiming that there are no downsides, and therefore no trade-off to be made, is futile. It's like playing whack-a-mole, because nobody is capable of covering every single use case.
Actually any non-UNIX OS clone out there, including mainframe and embedded OSes.
If you run any sort of distributed system, this is vital. And while that counts as binary logs, I would argue that on the local boxes it should stay text.
I would agree: if you are running any sort of complex queries on your data, go to logstash and do it there - it's much nicer than regexes.
If, on the other hand, you just want to see how a development environment is getting on, or to troubleshoot a known bad component, tailing into grep (or just tailing, depending on the verbosity of your logs) is fine.
I don't have to remember some weird incantation to see the local logs, worry about corruption etc.
One problem I will point out with the setup described is that syslog-ng can be blocking. If the user is disconnected from the central logstash and their local one dies, then as soon as the FIFO queue in syslog-ng fills, good luck writing to /dev/log, which means things like 'sudo' and 'login' have... issues.
Instead, if you have text files being written out, and something like beaver collecting them and sending them to logstash, you have the best of both worlds.
For administering Unix like systems, the ability to use a variety of tools to process streams of text is an advantage and valuable capability.
That said, your needs do change when you're talking about managing 10 vs 10,000 vs 100,000 hosts. I think what you're really seeing here is a movement to "industrialize" the operations of these systems and push capabilities from paid management tools into the OS.
Freeform text logs usually contain more detail as to what exactly happened.
For me, the main reason to access plaintext logs is that they seldom fail, and they are simple. They are a bore to analyse, but they CAN be analysed.
Anyway, this discussion only makes sense if the task at hand involves heavy log analysis, don't complicate what is simple when it isn't needed.
As for the razor analogy, you're right, however I wouldn't change my beard to be "razor compatible only". In the software world I'd say it is still not uncommon to find yourself "stranded in a desert island".
* log file corruption - text parsing would still work,
* tooling gets deleted - there's a million ways you can still render plain text even when you've lost half your POSIX/GNU userland,
* network connection problems, breaking push to a centralised database - local text copies would still be readable.
In his previous blog post he commented that there's no point running both a local text version and a binary version, but since the entirety of his rant is really about tooling rather than log file format, I'm yet to see a convincing argument against running the two paradigms in parallel. So this really is dependent on the file format of your log data, rather than an inherent difference between text and binary logging.
You do have a bigger concern, but one that needs to be addressed by consulting the log files.
I fully accept that most of the situations I gave as examples are rare fringe cases, but log files are the go-to when all else fails, and thus there needs to be a copy that's readable if and when everything else does fail.
The more likely situation would be that the logs are stored on a shared storage server, and the machine you are using to look at the logs doesn't have the logging system installed.
That’s a straw man. If you’re grepping logs, you don’t need a regular expression that matches only valid dates because you can assume that the timestamps on the log records are valid dates. But I suppose
2013-12-(2[4-9]|3.)|2014-..-..|2015-0([123]-..|4-(0.|1[01]))
doesn’t look so bad. The whole thing is similarly exaggerated.
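For what it's worth, when timestamps are ISO-8601 they sort lexically, so the range check needs no regex at all. A sketch, assuming the date is the first whitespace-delimited field:

```shell
# Plain string comparison on ISO dates covers the 2013-12-24 .. 2015-04-11
# range from the regex above, with no date arithmetic and no regex.
printf '%s\n' '2013-11-01 too old' '2014-06-01 in range' '2015-05-01 too new' > app.log
awk '$1 >= "2013-12-24" && $1 <= "2015-04-11"' app.log
# → 2014-06-01 in range
```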
That's the thing about having simple text log files - the cognitive load required to pull data out of them, often into a format that can then be manipulated by another tool (awk, being one of the more well known), is so low that you can perform them without a context switch.
If you have a problem, you can reach into the log files, pull out the data you need, possibly massage/sum/count particular records with awk, all without missing a beat.
This is particularly important for sysadmins who may be managing dozens of different applications and subsystems. Text files pull all of them together.
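A typical no-context-switch massage looks like this, over an invented three-field transfer log:

```shell
# Sum the bytes column of a transfer log and count records, in one awk pass.
printf '%s\n' 'GET /a 512' 'GET /b 2048' 'GET /a 100' > xfer.log
awk '{ bytes += $3 } END { print NR " requests, " bytes " bytes" }' xfer.log
# → 3 requests, 2660 bytes
```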
But, and here is the most important thing that people need to realize - for scenarios in which complex searching is required, by all means move it into a binary format - that just makes sense if you really need to do so.
The argument isn't all text instead of binary, it is at least text and then use binary where it makes sense.
Even _if_ I agreed with your assumption[1], are you actually suggesting that
2013-12-(2[4-9]|3.)|2014-..-..|2015-0([123]-..|4-(0.|1[01]))
is a serious solution? I admit that it is shorter than the author's solution, _but it still proves his point_. And then what about multi-line log lines? `grep` can't tell where the next line is; sure, I can -A, but there's no number I can plug in that's going to just work: I need to guess, and if I get a truncated result or too much output, adjust. Worse, I get too much output _and_ a truncated record where I need it…
log-cat --from 2013-12-24 --to 2015-04-11 | grep <further processing>
[1] most log file formats I've run across do not guarantee the date to appear in a given location.

The example with the timestamps is also strange. No matter how you store the timestamps, parsing a humanly reasonable query like "give me 10 hours starting from last Friday 2am" into an actual filter is a complex problem. The problem is complex no matter how you store your timestamp. You can choose to do the complexity up front and create complex index structures. You can choose to have complex algorithms to parse simple timestamps in binary or text form, or you can build complex regexes. But something needs to be complex, because the problem space is. Just being binary doesn't help you.
And that's really the point here, isn't it? Just being binary in itself is not an advantage. It doesn't even mean by itself that it will save disk space. But text in itself is an advantage, always, because text can be read by humans without help (and in some instances without any training or IT education), binary not.
Yesterday I was thinking there might be something to binary logs. Now I'm convinced there isn't. The only disadvantage of text seems to be that you lose disk space if you store it in the clear. But disk space isn't an issue in most situations (and in many situations where it is an issue, you might have resources and tools at hand to handle that as well). Binary is added complexity for no real advantage. Thanks for clearing that up.
When applied widely throughout a system, this leads to the internationalisation of log messages. Thus lessening the anglocentric bias in systems software. Windows has done this for years, at least with its own system logging (other applications can still put free-form text into the event logs if they wish.)
About what you put in the log message: You can also put different fields in a line of text. Not getting the advantage or trade-off here.
About the internationalisation: as non-English developers, we force all our systems that have internationalised logging to use English as the system language, so we have a common ground for the messages. Understanding the English message is hardly a burden. Log messages are event triggers, either in code or in a developer's/admin's mind. If I get a log message in my native language, I don't know which event it triggers, which actually makes it harder.
Really, I don't know any non-English person who considers log internationalisation a good thing. Fighting anglocentrism is a very anglocentric topic; outside of the UK/US it's a non-topic. We (non-English people) are happy that there is a language we can use to talk to each other, and we don't really care how it came to be so widely known.
And even if you don't speak English, I don't see the advantage of parsing \x03 instead of "Error:.*". Both are strings that have a meaning which is rather independent of its encoding.
journalctl --since="$(date -d'last friday 2am' '+%F %X')" --until="$(date -d'last friday 2am + 10 hours' '+%F %X')"
Now I'm no systemd apologist but maybe some of the hate towards systemd, journald and pals is unwarranted. If one gives these newer tools a chance, they actually have some nice features. Despite the Internet's opinion, seems like they were not actually created to make Linux users' lives difficult.
If binary logs turn out to be the wrong technological decision, I'm sure we'll figure that out and change over to text logs again. All it would take is a few key savvy users losing their logs to journald corruption and the change in the wider "ecosystem" would be made. But if all goes well... then what's to complain about? :-D
Text logs can be corrupted, text logs can be made unusable, you need a ton of domain-specific knowledge to even begin to make sense of text logs, etc.
But there's always a sense that, if you had the time, you could still personally extract meaning from them. With binary logs, you couldn't personally sit there and read them out line by line.
The issue is psychology, not pragmatism, and that's why text logs have been so sticky for so long.
Again, if the binary log is simply better-compressed data, well, we already have ways of compressing text as an afterthought. This really, fundamentally, seems to be a conflict in how people want to administer their systems and, for the most part, it seems to be about creating a "tool" that people then have to pay money for to better understand.
This guy is a first class idiot who knows enough to reformulate a decided issue into yet another troll article. "a database (which then goes and stores the data in a binary format)". How about a text file IS a database. It's encoded 1s and 0s in a universal format instead of the binary DB format which can be corrupted with the slightest modification or hardware failure.
* Journal is just terrible.
* Some text logs are perfectly fine.
* When you are in rescue mode, you want text logs.
* Some people use text logs as a way to compile metrics.
I think the most annoying thing for me about journald is that it forces you to do things their way. However, it's optional, and in CentOS 7 it's turned off, or it's beaten into such a shape that I haven't noticed it's there... (If that is the case, I've not really bothered to look; I poked about to see if logs still live in /var/log/, they did, and that was the end of it. Yes, I know that if this is the case, I've just undermined my case. Shhhhh.)
/var/log/messages for kernel oopses, auth for logins, and all the traditional systemy type things are good for text logs. Mainly because 99.9% of the time you get less than 10 lines a minute.
Being able to sed, grep, tee and pipe text files is brilliant on a slow connection with limited time/mental capacity, i.e. a rescue situation. I'm sure a multitude of stable tools will pop up to deal with a standardised binary log format, in about ten years.
The last point is the big kicker here. This is where, quite correctly, it's time to question the use of grep. Regex is terrible. It's a force/problem amplifier. If you get it correct, well done. Wrong? You might not even know.
Unless you don't have a choice, you need to make sure that your app emits metrics directly, or as close to directly as possible. Failing that, you need to use something like Elasticsearch. However, because you're getting the metrics as an afterthought, you have to do much more work to make sure they are correct. (Although forcing metrics into an app is often non-trivial.)
If you're starting from scratch, writing custom software, and think that log diving is a great way to collect metrics, you've failed.
If you are using off-the-shelf parts, it's worth spending the time interrogating the API to gather stats directly. You never know, collectd might have already done the hard work for you.
The basic argument he puts forth is this: text logs are a terrible way to interchange and store metrics. And yes, he is correct.
Just type journalctl and you should see the data there.
log-cat <binary-log-file>
… that just outputs it in text. Then you can attack the problem with whatever text-based tools you want. But to me, having a utility that could do things like get a range of log lines in sorted order, or grep on just the message, would be amazing. These are all things that proponents of grep will surely say "you can!" do with grep… but you can't.
The dates example was a good one. I'd much rather:
log-cat <bin-log> --from 2014-12-14 --to 2015-01-27
Also, my log files are not "sorted". They are, but they're sorted _per-process_, and I might have multiple instances of some daemon running (perhaps on this VM, perhaps across many VMs), and it's really useful to see their logs merged together[2]. For this, you need to understand the notion of where a record starts and ends, because you need to re-order whole records. (And log records' messages are _going_ to contain newlines. I'm not logging a backtrace on one line.) grep doesn't sort. |sort doesn't know enough about a text log to adequately sort, but:

$ log-cat logs/*.log --from 2014-12-14 --to 2015-01-27
<sorted output!>
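To be fair to the text side: for single-line records with a leading sortable timestamp, stock sort(1) can do the merge; it's multi-line records that sink it, which is exactly the complaint above. A sketch with invented per-process logs:

```shell
# Merge two already-sorted per-process logs by timestamp. -m merges
# without re-sorting; -k1,1 keys on the leading ISO timestamp field.
# This breaks as soon as one record spans multiple lines.
printf '%s\n' '2015-01-01T00:00:01Z a-first' '2015-01-01T00:00:03Z a-second' > proc-a.log
printf '%s\n' '2015-01-01T00:00:02Z b-first' > proc-b.log
sort -m -k1,1 proc-a.log proc-b.log
```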
Binary files offer the opportunity for structured data. It's really annoying to try to find all the 5xx's in a log when your grep matches the process ID, the line number, the time of day… I've seen some well-meaning attempts at JSON logs, s.t. each line is a JSON object[1]. (I've also seen it attempted where all that was available was a rudimentary format string, and the first " broke everything.)
Lastly, log files sometimes go into metrics (I don't really think this is a good idea, personally, but we need better libraries here too…). Is your log format even parseable? I've yet to run across one that had an unambiguous grammar: a newline in the middle of a log message, with the right text on the second line, can easily get picked up as a date, and suddenly it's a new record. Every log file "parser" I've seen was a heuristic matcher, and I've seen almost all of them make mistakes. With the simple "log-cat" above, you can instantly turn a binary log into a text one. The reverse, if possible at all, is likely to be a "best-effort" transformation.
[1]: the log writer is forbidden to output a newline inside the object. This doesn't diminish what you can output in JSON, and allows newline to be the record separator.
[2]: I get requests from mobile developers telling me that the server isn't acting correctly all the time. In order to debug the situation, I first need to _find_ their request in the log. I don't know what process on what VM handled their request, but I often have a _very_ narrow time-range that it occurred in.
Not that the log files on Linux are all entirely text-based anyway. The wtmp and btmp files are in a binary format, with specialised tools (last, lastb) for querying. I don't see anyone complaining about these and insisting that they be converted to a text-only format.