Deploying Logstash with Puppet

As a follow-up to my last post, here's an initial recipe for deploying Logstash with Puppet. You can grab my gist for the init script here: https://gist.github.com/2620449

class logstash {
  file { "/opt/logstash-1.1.0-monolithic.jar":
    ensure => present,
    source => "puppet:///modules/logstash/logstash-1.1.0-monolithic.jar",
  }

  file { "/etc/init.d/logstash":
    ensure => present,
    source => "puppet:///modules/logstash/logstash",
    mode   => "0755",
    owner  => "root",
    group  => "root",
  }

  file { "/etc/logstash.conf":
    ensure  => present,
    source  => "puppet:///modules/logstash/logstash.conf",
    replace => false,
  }

  service { "logstash":
    ensure    => running,
    subscribe => File["/etc/logstash.conf"],
    require   => [
      File["/etc/init.d/logstash"],
      File["/etc/logstash.conf"],
      File["/opt/logstash-1.1.0-monolithic.jar"],
    ],
  }
}

 

You'll probably need different logging configurations per application. In that case the best approach is to define the config per application, or modify it on each host where needed. That's why replace is set to false: Puppet will only create the file once and won't overwrite local changes afterwards.
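One way to sketch the per-application approach is a defined type that drops one config fragment per application. This is a hypothetical illustration, not part of the module above: the logstash::config name, the /etc/logstash/conf.d layout, and sourcing the fragment from each application's own Puppet module are all assumptions (Logstash's -f flag can point at a directory of config files).

```puppet
# Hypothetical sketch: one Logstash config fragment per application,
# sourced from that application's own Puppet module.
define logstash::config ($app = $title) {
  file { "/etc/logstash/conf.d/${app}.conf":
    ensure  => present,
    source  => "puppet:///modules/${app}/logstash.conf",
    replace => false,
    notify  => Service["logstash"],
  }
}

# Usage, e.g. from an application's manifest:
# logstash::config { "myapp": }
```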

Centralized logging with Graylog2

Design considerations

Centralized logging isn't an easy task. You need to handle very large amounts of data, with a lot of write operations and heavy indexing, which translates into ample CPU and memory usage. That makes a scalable text indexing and storage backend extremely important, as is a decoupled architecture. Over the course of a month we evaluated various tools and architectures at Praekelt to find one that worked well (never mind the many that just don't work at all). The final configuration uses Ubuntu 12.04 (Precise) for the central server; Graylog2 to receive logs and do analysis; RabbitMQ to queue non-syslog logs from Logstash for Graylog2; and Elasticsearch and MongoDB, which are used by Graylog2 to store logs and stats.

Logging setup

Logstash

Logstash is a good tool. I'm somewhat annoyed by its version dependencies when writing directly to Elasticsearch, but no matter: we can use Graylog2 to fill the gap.

Syslog

Ubuntu has shipped rsyslog by default for quite a while now. While we do want to use remote syslog to collect the usual system logs and ship them to the log server, what we absolutely don't want to do is pipe noisy application logs (like HTTP access logs) into syslog. Keeping them separate has a lot of benefits when you try to troubleshoot the system later, and it avoids flooding the local syslog facilities. So while rsyslog's imfile module is tempting and easy, it becomes difficult to manage later.
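Forwarding the standard system logs with rsyslog is a one-line change. A sketch, assuming the central server hostname logger.acme.com used later in this post and the default UDP syslog port 514:

```
# /etc/rsyslog.d/50-remote.conf
# Forward all facilities to the central log server over UDP
# (use @@ instead of @ for TCP forwarding).
*.* @logger.acme.com:514
```

Drop the file in place and restart rsyslog on each client.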

Graylog2

Where Logstash falls short on the centralized side of things, Graylog2 is substantially easier to deploy and works with the latest versions of Elasticsearch. It also employs MongoDB as a key store for aggregating statistics, which is a very good idea. Graylog2 has some setup guides (and packages) for Ubuntu Lucid; unfortunately Lucid support has just ended, so we'll configure it from scratch on Precise.

Setup

Install some necessary packages

root@logger:~# aptitude install build-essential rabbitmq-server openjdk-6-jre-headless mongodb rubygems

Grab the deb packages for Elasticsearch from http://www.elasticsearch.org/download/ and both the Graylog2 server and web interface from http://graylog2.org/download.

root@logger:~# dpkg -i elasticsearch-0.19.3.deb

Take note of any errors or missing dependencies, and make sure it starts up. You don't need to configure anything else for Elasticsearch. Now get RabbitMQ running a bit more securely.

root@logger:~# rabbitmqctl add_user logging mypassword
Creating user "logging" ...done.
root@logger:~# rabbitmqctl set_permissions logging ".*" ".*" ".*"
Setting permissions for user "logging" in vhost "/" ...done.
root@logger:~# rabbitmqctl delete_user guest
Deleting user "guest" ...done.

The first thing to do is to set up Logstash to ship logs over AMQP.

Client configuration

input {
  file {
    type => "syslog"
    path => [ "/var/log/messages", "/var/log/syslog", "/var/log/*.log" ]
  }
  file {
    type => "apache-access"
    path => "/var/log/nginx/access.log"
  }
}
 
output {
  amqp {
    host => "logger.acme.com"
    exchange_type => "fanout"
    name => "rawlogs"
    user => "logging"
    password => "mypassword"
  }
}

Server configuration

input {
  amqp {
    type => "all"
    host => "localhost"
    exchange => "rawlogs"
    name => "rawlogs_consumer"
    user => "logging"
    password => "mypassword"
  }
}
 
output {
  stdout { }
  gelf {
    facility => "logstash-gelf"
    host => "127.0.0.1"
  }
}

We leave the stdout output enabled just to check that things are working; it's a good idea to disable it once everything is running. We essentially use Logstash as a broker to move messages from RabbitMQ into Graylog2 via GELF. Graylog2 does support AMQP directly, but it doesn't support it in the same way Logstash does, which is why we take this route.

Kickstart both of them the same way, assuming you stored the config as logstash.conf:

# java -jar logstash-1.1.0-monolithic.jar agent -f logstash.conf

Next, get Graylog2 going. Extract both the server and the web interface into /opt and configure the server: copy graylog2.conf.example to /etc/graylog2.conf and make the relevant changes. You can use MongoDB with or without authentication if it's not accessible externally; we're using authentication here. For setting up MongoDB authentication read more here, which has a bunch of other info on configuring Graylog2 that is worth reading.
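For reference, creating the MongoDB user that the config below expects looks roughly like this in the mongo shell. This is a sketch for the MongoDB 2.0 series that Precise ships (db.addUser was replaced by db.createUser in later releases), using the graylog/graylog credentials from the config below:

```
root@logger:~# mongo
> use graylog2
> db.addUser("graylog", "graylog")
> exit
```

Remember to enable auth in /etc/mongodb.conf and restart MongoDB afterwards.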

syslog_listen_port = 514
syslog_protocol = udp
 
elasticsearch_url = http://localhost:9200/
elasticsearch_index_name = graylog2
 
force_syslog_rdns = false
 
mongodb_useauth = true
mongodb_user = graylog
mongodb_password = graylog
mongodb_host = localhost
 
#mongodb_replica_set = localhost:27017,localhost:27018,localhost:27019
mongodb_database = graylog2
mongodb_port = 27017
 
mq_batch_size = 4000
mq_poll_freq = 1
 
mq_max_size = 0
 
mongodb_max_connections = 100
mongodb_threads_allowed_to_block_multiplier = 5
 
use_gelf = true
gelf_listen_address = 0.0.0.0
gelf_listen_port = 12201
 
# AMQP
amqp_enabled = false
amqp_subscribed_queues = gqueue:gelf,squeue:syslog
amqp_host = localhost
amqp_port = 5672
amqp_username = guest
amqp_password = guest
amqp_virtualhost = /
 
forwarder_loggly_timeout = 3

You can now start Graylog2 with 'bin/graylog2ctl start'.

To get the web interface working, do the following:

root@logger:/opt/graylog2-web-interface-0.9.6# gem install bundler
root@logger:/opt/graylog2-web-interface-0.9.6# bundle install
root@logger:/opt/graylog2-web-interface-0.9.6# script/rails server -e production -p 80

You should now have a running logging system. Next, go back and put the web interface behind Passenger and nginx, running as an unprivileged user.
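A minimal nginx server block for that last step might look like the following. This is a sketch, assuming nginx was built with Passenger support and the web interface lives in /opt/graylog2-web-interface-0.9.6; the server_name is the hypothetical hostname used earlier:

```
# /etc/nginx/sites-enabled/graylog2
server {
  listen 80;
  server_name logger.acme.com;

  # Passenger serves the Rails app from its public/ directory
  root /opt/graylog2-web-interface-0.9.6/public;
  passenger_enabled on;
  rails_env production;
}
```

Passenger spawns the app as the owner of config/environment.rb, so chown the extracted web interface to an unprivileged user rather than running it as root.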