0 points · edmack · 11y ago · 0 comments

Getting loads of core services out into third parties is really wonderful for logging. E.g. if email sending happens in Mandrill, then you never need to write decent logging calls for that and you have a reliable source of truth!

8 comments · 1 top-level

tlack · 11y ago · 7 in thread

Except you won't know if your server ever sent it to Mandrill. :) Always be extremely verbose with logging!
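As a rough illustration of that point, here is a minimal Python sketch that logs the hand-off on your own side, before and after calling the provider. The `send` callable is hypothetical: it stands in for whatever client wraps the third-party API, and it is assumed to return a message id and raise on failure.

```python
import logging

logger = logging.getLogger("mailer")


def send_with_logging(send, recipient, subject):
    """Log the hand-off to a third-party mail API before and after the call.

    `send` is a hypothetical callable wrapping the provider's client; it is
    assumed to return a message id and to raise an exception on failure.
    """
    logger.info("handing off email: to=%s subject=%r", recipient, subject)
    try:
        message_id = send(recipient, subject)
    except Exception:
        # logger.exception records the traceback along with the message.
        logger.exception("hand-off failed: to=%s subject=%r", recipient, subject)
        raise
    logger.info("provider accepted email: to=%s id=%s", recipient, message_id)
    return message_id
```

With this in place, the provider's dashboard tells you what happened after the hand-off, and your own log tells you whether the hand-off was ever attempted.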

porker · 11y ago

This brings up a tangential problem I've yet to solve: how do you warn that something didn't happen when it should?

E.g. you have a script that does backups. You log the script's output, but one day something fails and the script is no longer executed.

Some form of dead man's handle is needed; the only way I can think of is to set up a monitoring service to check your log store for these entries every X hours.

Any alternatives?
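For reference, the log-store check described above could be sketched roughly as follows. This is only an illustration: the `(timestamp, message)` pairs stand in for whatever a real log store would return, and `missing_heartbeat` and its parameters are made-up names.

```python
from datetime import datetime, timedelta


def missing_heartbeat(entries, marker, max_age, now=None):
    """Return True if no entry containing `marker` is newer than `max_age`.

    `entries` is an iterable of (timestamp, message) pairs, a stand-in for
    the result of querying a real log store.
    """
    now = now or datetime.now()
    cutoff = now - max_age
    return not any(ts >= cutoff and marker in msg for ts, msg in entries)
```

A scheduler would call this every X hours and alert when it returns True. The checker itself has to run somewhere other than the machine it is watching, otherwise the same failure can silence both.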

tlack · 11y ago

I've had this same issue over and over again in my career.

I've toyed with the idea of writing a daily "sanity checker" in crontab that verifies various concepts of system health.

Examples: Did the latest batch of data transfer to S3? Did we delete old customer accounts today? Did we get any signups (because if not, something may be broken without triggering an exception report, etc.)? Did we send out daily report emails?

But I could see this easily becoming a pointless exercise, and I doubt I'd have the time to keep the sanity checker updated with the latest requirements. In fact, the sanity checker would probably become insane pretty quickly.

Perhaps the platform itself should do this for you, in some way. Idea: while coding, indicate that this procedure should be running periodically, i.e.:

    Monitor.registerPeriodicTask('email-reports', 'daily')

and then the system would log every time it occurs, while a generic task would run periodically and scan for things that should have occurred but haven't in a while.
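A minimal in-memory sketch of that idea, in Python. Everything here is hypothetical: the `Monitor` class, the method names (snake_case versions of the call above), the `record_run` hook, and the two supported interval names are assumptions for illustration, and a real version would need to persist its state somewhere that survives restarts.

```python
import time


class Monitor:
    """Tracks tasks that are expected to run at least once per interval."""

    INTERVALS = {"hourly": 3600, "daily": 86400}  # seconds

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so the class can be tested
        self._tasks = {}     # name -> [interval_seconds, last_seen_timestamp]

    def register_periodic_task(self, name, interval):
        # Registration counts as "last seen", so a new task gets one full
        # interval before it can be reported as overdue.
        self._tasks[name] = [self.INTERVALS[interval], self._clock()]

    def record_run(self, name):
        """Call this from the task itself each time it completes."""
        self._tasks[name][1] = self._clock()

    def overdue(self):
        """Return the names of tasks not seen within their interval."""
        now = self._clock()
        return sorted(
            name
            for name, (interval, last_seen) in self._tasks.items()
            if now - last_seen > interval
        )
```

The generic scanning task then only has to call `overdue()` on a schedule and alert on anything it returns.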

rwbhn · 11y ago

Monitor that the newest backup is less than N hours old.
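A small Python sketch of that check, assuming the backups are plain files in a single directory and that file modification time is a fair proxy for backup age. The function name and parameters are made up for illustration.

```python
import os
import time


def newest_backup_is_fresh(backup_dir, max_age_hours, now=None):
    """Return True if the newest file in `backup_dir` is younger than the limit.

    An empty directory counts as not fresh, since there is no backup at all.
    """
    now = time.time() if now is None else now
    mtimes = [
        entry.stat().st_mtime
        for entry in os.scandir(backup_dir)
        if entry.is_file()
    ]
    if not mtimes:
        return False
    return now - max(mtimes) < max_age_hours * 3600
```

This checks the outcome rather than the process, so it also catches the case where the script is never executed at all.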

reymus · 11y ago

I have always heard the opposite, that too much logging is as bad as no logging. I see the point of having the logs to be able to find out what happened, but what happens when there's so much logging that the information you need is buried in a huge amount of noise?

msielski · 11y ago

This is true, without the right tools. I am moving to logstash with kibana to do this, and it's looking very promising. See http://www.elasticsearch.org/videos/kibana-logstash/

exelius · 11y ago

This was true before Splunk. If you logged too much, your logs could start to outstrip the assumptions behind your log rotations and cause trouble. Now the common wisdom is to just log everything so you can Splunk it later if you have a problem. Verbose logging + Splunk have made production incident identification so much easier than it used to be.

Splunk DOES charge by the GB, but it's not very expensive in the long run.

gizmo686 · 11y ago

My favorite systems to work with are the ones with overly verbose logs, where the overly verbose parts were clearly tagged and could be filtered out. Generally, we would never look at the verbose lines, and even when we did, we would normally have some idea what we were looking for, and be able to filter somewhat for it.
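One way to get that kind of tagging with Python's standard `logging` module is sketched below. The `verbose` attribute name and the `DropVerbose` class are assumptions made for this example, not a convention of the library.

```python
import logging


class DropVerbose(logging.Filter):
    """Hide records that were tagged with extra={"verbose": True}."""

    def filter(self, record):
        # Returning False drops the record; untagged records pass through.
        return not getattr(record, "verbose", False)


logger = logging.getLogger("app")
console = logging.StreamHandler()
console.addFilter(DropVerbose())  # the console shows only untagged lines
logger.addHandler(console)
logger.setLevel(logging.INFO)

logger.info("order shipped")
logger.info("raw carrier response follows", extra={"verbose": True})
```

A second handler without the filter, for example one writing to a file, would keep the verbose lines available for the rare occasions when someone needs them.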
