I have been adding Riemann alerts for high CPU, disk and memory usage and have run into two issues that need fixing. That’s the downside. The upside is that my Clojure is improving.
Anyway, error 1:
ERROR [2017-09-16 17:09:59,617] main - riemann.bin - Couldn't start clojure.lang.Compiler$CompilerException: java.lang.RuntimeException: Invalid token: /percent_bytes-used, compiling:(/etc/riemann/riemann.config:58:52)
No idea what is going on there. Can be hard to google this stuff. Might be a typo somewhere.
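For what it is worth, the `compiling:(/etc/riemann/riemann.config:58:52)` suffix is the Clojure compiler's way of pointing at line 58, column 52. The Clojure reader rejects a bare token that starts with a slash, such as `/percent_bytes-used` (a lone `/` is fine). One plausible cause is a string whose closing quote lands just before the slash, which leaves that fragment outside the string, but that is an assumption rather than a diagnosis. The minimal sketch below uses a hypothetical `show_pos` helper to print the reported line with a caret under the reported column. Clojure may report a column at or just after the bad token, so look either side of the caret.

```shell
# Hypothetical helper: print line LINE of FILE, then a caret under column COL.
# Columns are treated as 1-based, as in Clojure's "file:line:col" messages.
show_pos() {
  local file="$1" line="$2" col="$3"
  # Print only the requested line.
  sed -n "${line}p" "$file"
  # Pad with COL-1 spaces, then mark the column.
  printf '%*s^\n' "$((col - 1))" ''
}
```

Running `show_pos /etc/riemann/riemann.config 58 52` should put the caret near the offending character.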
And error 2:
WARN [2017-09-16 17:56:17,775] defaultEventExecutorGroup-2-1 - riemann.config - riemann.email$mailer$make_stream__9273$stream__9275@552dcd79 threw com.sun.mail.util.MailConnectException: Couldn't connect to host, port: smtp.gmail.com:587, 25; timeout -1
Wonder what is going on there? The same setup works on the other two instances. Another argument for Dockerising the whole thing. I did wonder whether this was due to one of my nodes spamming Gmail. Tomorrow’s problem. Made some progress, but now it’s time for a beer.
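One detail in that WARN line may be worth a second look. JavaMail formats this message as `host, port`, so it appears to report a host of `smtp.gmail.com:587` and a port of `25`. That would happen if the port had been folded into the host string in the mailer options instead of being given separately. This is an inference from the message format, not a confirmed diagnosis. The minimal sketch below defines two hypothetical helpers. `mail_target` uses `sed` to pull the reported host and port out of a log line. `port_open` checks whether a TCP port answers, using bash's `/dev/tcp` pseudo-device and coreutils `timeout`.

```shell
# Hypothetical helper: read MailConnectException log lines on stdin and
# print the host and port exactly as JavaMail reported them.
mail_target() {
  sed -n 's/.*connect to host, port: \([^,]*\), \([0-9]*\);.*/host=\1 port=\2/p'
}

# Hypothetical helper: succeed if HOST accepts a TCP connection on PORT
# within SECS seconds (default 5). Relies on bash's /dev/tcp support.
port_open() {
  local host="$1" port="$2" secs="${3:-5}"
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

Piping the WARN line through `mail_target` shows what the mailer was actually aiming at. `port_open smtp.gmail.com 587 && echo reachable`, run from the failing instance, separates a network or firewall problem from a configuration one.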
I have decided to blog about some of the books I have been reading. Reading is a vital part of my mission.
I am not going to “review” books. I am going to comment on them and how they relate to my education.
I am about to finish Release It! (the exclamation mark is the author’s, not mine). When the same book is referenced in several other books you have been reading, it is probably worth a look. I think I may have tackled this one a bit too early. It definitely taught me a few things, but it went into some technical depths that I am not yet equipped to deal with. It is also one of the few tech books that has made me laugh. More than once. It has also helped me make the case for failing fast, timeouts, bulkheads and circuit breakers on a current assignment.
Next up, I may continue with Site Reliability Engineering. Another option is re-reading The DevOps Handbook. I finished it in April, but given it was such an easy read, I am going to try it again to see if it offers any insights I missed the first time around. I am also considering re-reading The Docker Book for similar reasons.
My full disks and Riemann logging issues continued over the past few days but appear to have calmed down. Sadly, I am not sure why. I have a couple of theories, though.
Firstly, after working through section 6.2 of The Art of Monitoring (checking that processes are running), I pasted in new riemann.config files. I am not 100% sure, but I wonder whether that corrected a previous error. All the more reason for automation/Puppet/Docker, eh?
Which brings me on to theory two. I wonder whether I stopped midway between sections and needed to do further work to stop this happening. This has happened to me before with The Docker Book, when I exposed the Docker API publicly. That’s a subject for another blog.
Of course, I may not have fixed this issue at all. If I have, I would like to know what the fix was. I believe the coming chapters will graph disk usage.
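Until those chapters arrive, a crude stopgap is to check usage from the shell. The minimal sketch below is an assumption and does not come from the book. A hypothetical `disk_pct` helper reads the used percentage for a mount point with GNU `df --output=pcent`, and a hypothetical `disk_over` helper succeeds once that figure reaches a threshold. That makes the check easy to drop into cron.

```shell
# Hypothetical helper: print the used-space percentage (digits only)
# for the filesystem holding PATH. Needs GNU coreutils df (--output).
disk_pct() {
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

# Hypothetical helper: succeed if PATH's filesystem is at or above LIMIT percent.
disk_over() {
  local pct
  pct=$(disk_pct "$1")
  [ "$pct" -ge "$2" ]
}
```

For example, `disk_over / 90 && echo "root filesystem is at least 90% full"` gives an early warning before a log file eats the last of the space.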
On another note, it has occurred to me that there is probably huge value in revisiting the books I have gone through recently, now that I know more. That kind of makes my heart sink, given my mission’s target date. I have a continuing sense that I am learning different things from what the books intend. Still, learning in this space is surely going to be useful.
Finally, this pulled up outside my house this week. Someone is trying to tell me something. Insert your own Unikernel joke here.
Over the past two weeks I have become a bit like Richard Burton in The Medusa Touch. In this film he plays a character who has visions of disasters before they happen.
It seems that I only have to read in Sam Newman’s Building Microservices about how unwise it is to share a database between customer and reporting traffic before a slow-running reporting query creates issues for customers.
On another occasion, I read about circuit breakers and fail-fast timeouts in Release It! and almost immediately afterwards hit an issue that would have been avoidable if circuit breakers and fail-fast timeouts had been in place.
And then, shortly after hearing three principles of CI discussed on this podcast (whilst dog-walking, naturally), I ran into issues with devs checking in code whilst the build pipeline was down.
For the time being, my team have asked me to stop reading about things that can go wrong, or at least to warn them in advance.
I am realising that I am learning a lot about Unix and infrastructure, but I could learn all of this without ever learning anything about DevOps. I can see where DevOps could make my life easier, but the books I am following leave out automation and repeat configuration by hand in order to explain it.
Anyway, my Riemann Mission Control host is still in a state even when I free up disk space. In today’s lesson I was setting up the collectd write_riemann plugin and hit the following errors on my problem-child host:
/etc/collectd.d$ sudo service collectd start
 * Starting statistics collection and monitoring daemon collectd
ERROR: lt_dlopen ("/usr/lib/collectd/write_riemann.so") failed: file not found. The most common cause for this problem is missing dependencies. Use ldd(1) to check the dependencies of the plugin / shared object.
This led to my first contact with the ldd command, which revealed:
libprotobuf-c.so.0 => not found
Finally, after a bit of research, the following fixed my issue:
sudo apt-get install protobuf-c-compiler protobuf-compiler libprotobuf-c0 libprotobuf-c0-dev
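The ldd step generalises into a small reusable check. In the minimal sketch below, a hypothetical `missing_libs` helper runs `ldd` on a plugin and keeps only the libraries marked `not found`. On Debian and Ubuntu, `apt-file search <library>` can then suggest which package ships each one, provided the apt-file package is installed. Judging by the library name, the runtime package libprotobuf-c0 probably supplied libprotobuf-c.so.0 on its own, and the compiler and -dev packages were likely not needed just to load the plugin. That is an assumption.

```shell
# Keep only the library names that ldd marks as "not found" (reads stdin).
filter_missing() {
  awk '/=> not found/ { print $1 }'
}

# Hypothetical helper: list unresolved shared libraries for a binary or plugin.
missing_libs() {
  ldd "$1" | filter_missing
}
```

Here that would be `missing_libs /usr/lib/collectd/write_riemann.so`. Empty output means every dependency resolved.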
Having configured this plugin on four Ubuntu hosts, I wonder why this dependency was missing on only one of them. I have two theories. One: whatever is busting my disk space may be related to the missing dependency. Two: I may have missed a step on one of the four hosts.
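The second theory, a missed step on one host, could be tested by comparing installed packages. The minimal sketch below assumes you have captured each host's package list with `dpkg-query -W -f='${Package}\n'` into a file, either over ssh or by hand. A hypothetical `only_in_good` helper then uses `comm -13` to print packages that are present on a healthy host but absent from the problem one. The file names are placeholders.

```shell
# Hypothetical helper: packages listed in GOOD but missing from PROBLEM.
# Both lists are sorted first, because comm expects sorted input.
# Uses bash process substitution.
only_in_good() {
  local problem="$1" good="$2"
  comm -13 <(sort -u "$problem") <(sort -u "$good")
}

# Example capture, run on each host (file name is illustrative):
#   dpkg-query -W -f='${Package}\n' | sort -u > "$(hostname).pkgs"
```

If libprotobuf-c0 showed up in the output for a healthy host, that would point towards a missed step rather than something the disk-space issue caused.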
Either way, if I had configuration management tools or containerised, automated builds, I suspect this error would not have occurred at all, or at the very least would have been fixable before I built four hosts from the same image/container/scripts.
One thought I had was that it would be great to rebuild riemannmc. Of course, retracing my steps would take a while unless I had, for example, a Docker image to help me out.
The other issue I had today is that collectd on my Red Hat hosts is not logging. One for tomorrow.
Quiet-ish couple of days on the mission front. Rugby and work filling a lot of time. However, in recent months I have learnt to regain time lost on dog-walking by listening to the following wonderful podcasts:
All brilliant in their own ways. My introduction to James Turnbull and his books came from DevOps Cafe. The Kelsey Hightower episode was a classic too. The list of reminders I add to my phone to follow up on post-walk grows every time.
If anyone can recommend any more podcasts, please let me know.
Riemann mystery continues
I have not had time to look at this in any detail, but the issue remains. I clear out the huge files and then get an email the following day when the next log file grows to consume all of my space.
In The Art of Monitoring speak, the issue appears to occur on my Riemann Mission Control host, so maybe the other Riemann hosts are spamming it. We will see. The other two Riemann hosts and all the Carbon hosts are performing well.
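When a log file keeps regrowing overnight, it helps to see which files are biggest before deleting anything. The minimal sketch below assumes GNU find, for `-printf`. A hypothetical `biggest_files` helper lists the N largest files under a directory, largest first. Running it against /var/log on the Mission Control host on consecutive days should show which file is doing the growing.

```shell
# Hypothetical helper: list the N largest regular files under DIR
# (default 10), largest first, as "<bytes><TAB><path>".
# Requires GNU find for -printf; unreadable paths are silently skipped.
biggest_files() {
  local dir="$1" n="${2:-10}"
  find "$dir" -type f -printf '%s\t%p\n' 2>/dev/null | sort -nr | head -n "$n"
}
```

For example, `biggest_files /var/log 5` from a root shell lists the five largest log files.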
LCD Soundsystem are back
Finally, it is always a good week when this lot put out new music. It is not related to my mission, but I am sure it will soundtrack much of my studying in the coming weeks.