DataDog (Modern monitoring & analytics)

Part-1

Reference Links:

https://www.datadoghq.com/

Note: Installation performed on Ubuntu 16.10

What is DataDog?

– Datadog is a SaaS application.

– Datadog is a monitoring service for cloud-scale applications that lets you monitor servers, databases, tools, and services.

– These capabilities are provided on a SaaS-based data analytics platform.

– Datadog is a monitoring service for IT, operations, and development teams who write and run applications at scale and want to turn the massive amounts of data produced by their apps, tools, and services into actionable insights.

– Datadog provides cloud infrastructure monitoring, with dashboards, alerting, and visualizations of metrics.

– As cloud adoption increased, Datadog grew rapidly and expanded its product offering to cover providers including Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, Red Hat OpenShift, and OpenStack.

Why use DataDog?

  • All clouds, servers, apps, services, metrics, and more in one place.
  • Integrates with HipChat, Slack, PagerDuty, OpsGenie, and VictorOps for notifications.
  • Can take a snapshot of a particular graph at any time frame.
  • Supports static reports with dynamic views of graphs.
  • Can set alerts proactively (multiple alerts can be defined at a time).
  • Lets you retrieve metric data programmatically via its REST API (see the example after this list).
  • Provides libraries for common languages such as Java, Node.js, Perl, Ruby, PHP, Go, and Python.
  • Provides integrations for SaltStack, Ansible, FreeSWITCH, Google Analytics, Logstash, Elasticsearch, Apache2, Jenkins, NGINX, AWS, etc.
  • Lets you write custom metrics to capture information specific to your application.
  • Provides log management as well.
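
For example, here is a minimal sketch of submitting a custom gauge metric over the REST API with curl. The metric name, host, tag, and the <API_KEY> placeholder are illustrative values, not ones used elsewhere in this post:

$ currenttime=$(date +%s)
$ curl -X POST "https://api.datadoghq.com/api/v1/series?api_key=<API_KEY>" \
    -H "Content-Type: application/json" \
    -d "{
          \"series\": [{
            \"metric\": \"custom.myapp.requests\",
            \"points\": [[${currenttime}, 42]],
            \"type\": \"gauge\",
            \"host\": \"myhost\",
            \"tags\": [\"env:test\"]
          }]
        }"

Once submitted, the metric can be graphed and alerted on like any built-in metric.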

Installation:

Reference:

1. https://docs.datadoghq.com/agent/basic_agent_usage/ubuntu/

Step-1: Create a free-trial account on Datadog. It allows you to use the Datadog application on a 14-day trial basis. To create an account, please visit the following link:

https://www.datadoghq.com/

and click on the "Free Trial" button as shown in the picture below:

free-trial-1.jpg

It opens the following page:

free-trial-2.jpg

Fill in all the information and click on the "Sign Up" button.

It then prompts a short setup wizard. Don't enter anything in any of the fields; just click the "Next" button until you reach step 2.

In step 3, it asks which OS you want to install the Datadog Agent on. Select your operating system; once you do, the right-hand side shows the installation steps for that particular OS (i.e. how to install the Datadog Agent on it).

In this blog, I am going to install the Datadog Agent on an Ubuntu machine, so, as shown below, I have selected "Ubuntu" from the left pane.

install-dd-agent-ubuntu-4

With Ubuntu selected, follow the steps shown (usually only the first step is needed to install the datadog-agent package), then click on "Finish".

Check whether the datadog-agent service is running using the following commands:

Command:

For Ubuntu 16.04 or higher (systemd):

sudo systemctl status datadog-agent.service

To start (or restart) the Agent on Ubuntu 16.04 or higher:

sudo systemctl restart datadog-agent.service

To start the Agent on Ubuntu 14.04 (upstart):

sudo initctl start datadog-agent

You are now done with the Datadog Agent installation on your Ubuntu machine.
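
To verify that the Agent itself is healthy and reporting, you can also inspect its status directly (the first command applies to Agent v6 and newer, the second to the older Agent v5):

$ sudo datadog-agent status
$ sudo /etc/init.d/datadog-agent info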

After that, when you click on following link:

https://www.datadoghq.com/

Go to Login –> enter your username and password –> click on Login.

You can see the following:

getting-started-5

Part-1 ends here.

Part-2 Monitor Jenkins Jobs with Data Dog:

Reference Links:

https://docs.datadoghq.com/integrations/jenkins/

https://www.datadoghq.com/blog/monitor-jenkins-datadog/

  • Jenkins plugin sends your build and deployment events to Datadog. From there, you can overlay them onto graphs of your other metrics so you can identify which deployments really affect your application’s performance and reliability—for better or worse.
  • The plugin also tracks build times (as a metric) and statuses (as a service check), so you’ll know when your builds aren’t healthy.

The Jenkins check for the Datadog Agent is deprecated. Use the Jenkins plugin instead.

Installation

This plugin requires Jenkins 1.580.1 or newer.

This plugin can be installed from the Update Center (found at Manage Jenkins -> Manage Plugins) in your Jenkins installation.

  1. Navigate to the web interface of your Jenkins installation.
  2. From the Update Center (found at Manage Jenkins -> Manage Plugins), under the Available tab, search for Datadog Plugin.
  3. Check the checkbox next to the plugin, and install using one of the two install buttons at the bottom of the screen.
  4. To configure the plugin, navigate to the Manage Jenkins -> Configure System page, and find the Datadog Plugin section.
  5. Copy/Paste your API Key from the API Keys page on your Datadog account, into the API Key textbox on the configuration screen.
  6. Before saving your configuration, test your API connection using the Test Key button, directly below the API Key textbox.
  7. Restart Jenkins to get the plugin enabled.
  8. Optional: Set a custom hostname. You can set a custom hostname for your Jenkins host via the Hostname textbox on the same configuration screen. Note: the hostname must follow the RFC 1123 format.

Configuration:

No Agent-side configuration is needed for the Jenkins integration.

Validation

You will start to see Jenkins events in the Event Stream when the plugin is up and running.

Metrics

The following metrics are available in Datadog:

METRIC NAME           TYPE     DESCRIPTION
jenkins.queue.size    (gauge)  Size of your Jenkins queue
jenkins.job.waiting   (gauge)  Time spent by a job waiting, in seconds
jenkins.job.duration  (gauge)  Duration of a job, in seconds
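
These metrics can also drive monitors. As an illustration only (the monitor name, threshold, and <API_KEY>/<APP_KEY> placeholders are hypothetical), a monitor that alerts when the average job duration over the last 5 minutes exceeds 10 minutes could be created through the Datadog monitors API like this:

$ curl -X POST "https://api.datadoghq.com/api/v1/monitor?api_key=<API_KEY>&application_key=<APP_KEY>" \
    -H "Content-Type: application/json" \
    -d '{
          "type": "metric alert",
          "name": "Jenkins job duration is high",
          "query": "avg(last_5m):avg:jenkins.job.duration{*} > 600",
          "message": "Average Jenkins job duration exceeded 10 minutes."
        }'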

Events

The following events are generated by the plugin:

  • Started build
  • Finished build

Service Checks

  • jenkins.job.status: Build status

When you are done with the above steps:

Log in to the Datadog UI –> on the left pane –> Dashboard Lists –> search for the "Jenkins Overview" dashboard –> click on it –> get all Jenkins details in one place.

Jenkins is an open source, Java-based continuous integration server that helps organizations build, test, and deploy projects automatically. Jenkins is widely used, having been adopted by organizations like GitHub, Etsy, LinkedIn, and Datadog.

You can set up Jenkins to test and deploy your software projects every time you commit changes, to trigger new builds upon successful completion of other builds, and to run jobs on a regular schedule. With hundreds of plugins, Jenkins supports a wide variety of use cases.

As shown in the out-of-the-box dashboard below, our Datadog plugin will provide more insights into job history and trends than Jenkins’s standard weather reports. You can use the plugin to:

  • Set alerts for important build failures
  • Identify trends in build durations
  • Correlate Jenkins events with performance metrics from other parts of your infrastructure in order to identify and resolve issues
Monitor Jenkins default dashboard in Datadog

Monitor Jenkins build status in real-time

Once you install the Jenkins-Datadog plugin, Jenkins activities (when a build starts, fails, or succeeds) will start appearing in your Datadog event stream. You will also see what percentage of builds failed within the same job, so that you can quickly spot which jobs are experiencing a higher rate of failure than others.

Monitor Jenkins events in Datadog event stream

Remember to blacklist any jobs you don’t want to track by indicating them in your plugin configuration.

Datadog’s Jenkins dashboard gives you a high-level overview of how your jobs are performing. The status widget displays the current status of all jobs that have run in the past day, grouped by success or failure. To explore further, you can also click on the widget to view the jobs that have failed or succeeded in the past day.

Monitor Jenkins jobs tagged by result success or failure

You can also see the proportion of successful vs. failed builds, along with the total number of job runs completed over the past four hours.

Datadog also enables you to correlate Jenkins events with application performance metrics to investigate the root cause of an issue. For example, the screenshot below shows that average CPU on the app servers increased sharply after a Jenkins build was completed and deployed (indicated by the pink bar). Your team can use this information as a starting point to investigate if code changes in the corresponding release may be causing the issue.

Monitor Jenkins - build affects CPU graph

Visualize job duration metrics

Every time a build is completed, Datadog’s plugin collects its duration as a metric that you can aggregate by job name or any other tag, and graph over time. In the screenshot below, we can view the average job durations in the past four hours, sorted in decreasing order:

Monitor Jenkins job durations ranked in Datadog

You can also graph and visualize trends in build durations for each job by using Datadog’s robust_trend() linear regression function, as shown in the screenshot below. This graph indicates which jobs’ durations are trending longer over time, so that you can investigate if there appears to be a problem. If you’re experimenting with changes to your CI pipeline, consulting this graph can help you track the effects of those changes over time.

Monitor Jenkins build duration trends graph

Use tags to monitor Jenkins jobs

Tags add custom dimensions to your monitoring, so you can focus on what’s important to you right now.

Every Jenkins event, metric, and service check is auto-tagged with job, result, and branch (if applicable). You can also enable the optional node tag in the plugin settings.

As of version 0.5.0, the plugin supports custom tags. This update was developed by one of our open source contributors, Mads Nielsen. Many thanks to Mads for helping us implement this feature.

You can create custom tags for the name of the application you’re building, your particular team name (e.g. team=licorice), or any other info that matters to you. For example, if you have multiple jobs that perform nightly builds, you might want to create a descriptive tag that distinguishes them from other types of jobs.

 

Testing Part-2 functionality:

Log in to Jenkins –> run any single job –> after the job completes, go to the Datadog UI, click on "Dashboard Lists", and open the "Jenkins Overview" dashboard.

There you can see the details of the Jenkins job you executed just a couple of minutes ago.

Part-2 ends here

 

Part-3 NGINX monitoring and log collection with DataDog

Reference: https://docs.datadoghq.com/integrations/nginx/

Overview

The Datadog Agent can collect many metrics from NGINX instances, including (but not limited to):

  • Total requests
  • Connections (e.g. accepted, handled, active)

For users of NGINX Plus, the commercial version of NGINX, the Agent can collect the significantly larger set of metrics that NGINX Plus provides, such as:

  • Errors (e.g. 4xx codes, 5xx codes)
  • Upstream servers (e.g. active connections, 5xx codes, health checks, etc.)
  • Caches (e.g. size, hits, misses, etc.)
  • SSL (e.g. handshakes, failed handshakes, etc.)

Setup

Installation

The NGINX check is included in the Datadog Agent package, so you don’t need to install anything else on your NGINX servers.

NGINX STATUS MODULE

The NGINX check pulls metrics from a local NGINX status endpoint, so your nginx binaries need to have been compiled with one of two NGINX status modules:

NGINX Plus packages always include the http status module, so if you’re a Plus user, skip to Configuration now. For NGINX Plus release 13 and above, the status module is deprecated and you should use the new Plus API instead. See the announcement for more information.

If you use open source NGINX, however, your instances may lack the stub status module. Verify that your nginx binary includes the module before proceeding to Configuration:

$ nginx -V 2>&1 | grep -o http_stub_status_module
http_stub_status_module

If the command output does not include http_stub_status_module, you must install an NGINX package that includes the module. You can compile your own NGINX—enabling the module as you compile it—but most modern Linux distributions provide alternative NGINX packages with various combinations of extra modules built in. Check your operating system’s NGINX packages to find one that includes the stub status module.

Configuration

Edit the nginx.d/conf.yaml file, in the conf.d/ folder at the root of your Agent’s configuration directory to start collecting your NGINX metrics and logs. See the sample nginx.d/conf.yaml for all available configuration options.

PREPARE NGINX

On each NGINX server, create a status.conf file in the directory that contains your other NGINX configuration files (e.g. /etc/nginx/conf.d/).

server {
  listen 81;
  server_name localhost;

  access_log off;
  allow 127.0.0.1;
  deny all;

  location /nginx_status {
    # Choose your status module

    # freely available with open source NGINX
    stub_status;

    # for open source NGINX < version 1.7.5
    # stub_status on;

    # available only with NGINX Plus
    # status;
  }
}

NGINX Plus can also use stub_status, but since that module provides fewer metrics, you should use status if you’re a Plus user.

You may optionally configure HTTP basic authentication in the server block, but since the service is only listening locally, it’s not necessary.

Reload NGINX to enable the status endpoint. (There’s no need for a full restart)
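
A quick way to apply and verify the change (assuming the status.conf shown above; on Ubuntu you can also use sudo service nginx reload):

$ sudo nginx -t && sudo nginx -s reload
$ curl http://localhost:81/nginx_status

With stub_status enabled, the curl command should print a few lines beginning with "Active connections:".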

METRIC COLLECTION

  • Add this configuration block to your nginx.d/conf.yaml file to start gathering your NGINX metrics:
  init_config:

  instances:
    - nginx_status_url: http://localhost:81/nginx_status/
    # If you configured the endpoint with HTTP basic authentication
    # user: <USER>
    # password: <PASSWORD>

See the sample nginx.d/conf.yaml for all available configuration options.

LOG COLLECTION

Available for Agent >6.0

  • Collecting logs is disabled by default in the Datadog Agent; enable it in datadog.yaml:
  logs_enabled: true
  • Add this configuration block to your nginx.d/conf.yaml file to start collecting your NGINX Logs:
  logs:
    - type: file
      path: /var/log/nginx/access.log
      service: nginx
      source: nginx
      sourcecategory: http_web_access

    - type: file
      path: /var/log/nginx/error.log
      service: nginx
      source: nginx
      sourcecategory: http_web_access

Change the service and path parameter values and configure them for your environment. See the sample nginx.d/conf.yaml for all available configuration options.

Learn more about log collection in the log documentation
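
After editing datadog.yaml and nginx.d/conf.yaml, restart the Agent so the new metric and log configuration takes effect (command shown for systemd-based Ubuntu; adjust for your init system):

$ sudo systemctl restart datadog-agent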

Validation

Run the Agent’s status subcommand and look for nginx under the Checks section.
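
For example (Agent v6 and newer; older Agents use the info subcommand instead), the following filters the status output down to the NGINX check:

$ sudo datadog-agent status | grep -A 10 nginx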

Data Collected

Metrics

nginx.net.writing
(gauge)
The number of connections waiting on upstream responses and/or writing responses back to the client.
shown as connection
nginx.net.waiting
(gauge)
The number of keep-alive connections waiting for work.
shown as connection
nginx.net.reading
(gauge)
The number of connections reading client requests.
shown as connection
nginx.net.connections
(gauge)
The total number of active connections.
shown as connection
nginx.net.request_per_s
(gauge)
Rate of requests processed.
shown as request
nginx.net.conn_opened_per_s
(gauge)
Rate of connections opened.
shown as connection
nginx.net.conn_dropped_per_s
(gauge)
Rate of connections dropped.
shown as connection
nginx.cache.bypass.bytes
(gauge)
The total number of bytes read from the proxied server
shown as byte
nginx.cache.bypass.bytes_count
(count)
The total number of bytes read from the proxied server (shown as count)
shown as byte
nginx.cache.bypass.bytes_written
(gauge)
The total number of bytes written to the cache
shown as byte
nginx.cache.bypass.bytes_written_count
(count)
The total number of bytes written to the cache (shown as count)
shown as byte
nginx.cache.bypass.responses
(gauge)
The total number of responses not taken from the cache
shown as response
nginx.cache.bypass.responses_count
(count)
The total number of responses not taken from the cache (shown as count)
shown as response
nginx.cache.bypass.responses_written
(gauge)
The total number of responses written to the cache
shown as response
nginx.cache.bypass.responses_written_count
(count)
The total number of responses written to the cache (shown as count)
shown as response
nginx.cache.cold
(gauge)
A boolean value indicating whether the “cache loader” process is still loading data from disk into the cache
shown as response
nginx.cache.expired.bytes
(gauge)
The total number of bytes read from the proxied server
shown as byte
nginx.cache.expired.bytes_count
(count)
The total number of bytes read from the proxied server (shown as count)
shown as byte
nginx.cache.expired.bytes_written
(gauge)
The total number of bytes written to the cache
shown as byte
nginx.cache.expired.bytes_written_count
(count)
The total number of bytes written to the cache (shown as count)
shown as byte
nginx.cache.expired.responses
(gauge)
The total number of responses not taken from the cache
shown as response
nginx.cache.expired.responses_count
(count)
The total number of responses not taken from the cache (shown as count)
shown as response
nginx.cache.expired.responses_written
(gauge)
The total number of responses written to the cache
shown as response
nginx.cache.expired.responses_written_count
(count)
The total number of responses written to the cache (shown as count)
shown as response
nginx.cache.hit.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.hit.bytes_count
(count)
The total number of bytes read from the cache (shown as count)
shown as byte
nginx.cache.hit.responses
(gauge)
The total number of responses read from the cache
shown as response
nginx.cache.hit.responses_count
(count)
The total number of responses read from the cache (shown as count)
shown as response
nginx.cache.max_size
(gauge)
The limit on the maximum size of the cache specified in the configuration
shown as byte
nginx.cache.miss.bytes
(gauge)
The total number of bytes read from the proxied server
shown as byte
nginx.cache.miss.bytes_count
(count)
The total number of bytes read from the proxied server (shown as count)
shown as byte
nginx.cache.miss.bytes_written
(gauge)
The total number of bytes written to the cache
shown as byte
nginx.cache.miss.bytes_written_count
(count)
The total number of bytes written to the cache (shown as count)
shown as byte
nginx.cache.miss.responses
(gauge)
The total number of responses not taken from the cache
shown as response
nginx.cache.miss.responses_count
(count)
The total number of responses not taken from the cache (shown as count)
shown as response
nginx.cache.miss.responses_written
(gauge)
The total number of responses written to the cache
shown as response
nginx.cache.miss.responses_written_count
(count)
The total number of responses written to the cache
shown as response
nginx.cache.revalidated.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.revalidated.bytes_count
(count)
The total number of bytes read from the cache (shown as count)
shown as byte
nginx.cache.revalidated.response
(gauge)
The total number of responses read from the cache
shown as responses
nginx.cache.revalidated.response_count
(count)
The total number of responses read from the cache (shown as count)
shown as responses
nginx.cache.size
(gauge)
The current size of the cache
shown as response
nginx.cache.stale.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.stale.bytes_count
(count)
The total number of bytes read from the cache (shown as count)
shown as byte
nginx.cache.stale.responses
(gauge)
The total number of responses read from the cache
shown as response
nginx.cache.stale.responses_count
(count)
The total number of responses read from the cache (shown as count)
shown as response
nginx.cache.updating.bytes
(gauge)
The total number of bytes read from the cache
shown as byte
nginx.cache.updating.bytes_count
(count)
The total number of bytes read from the cache (shown as count)
shown as byte
nginx.cache.updating.responses
(gauge)
The total number of responses read from the cache
shown as response
nginx.cache.updating.responses_count
(count)
The total number of responses read from the cache (shown as count)
shown as response
nginx.connections.accepted
(gauge)
The total number of accepted client connections.
shown as connection
nginx.connections.accepted_count
(count)
The total number of accepted client connections (shown as count).
shown as connection
nginx.connections.active
(gauge)
The current number of active client connections.
shown as connection
nginx.connections.dropped
(gauge)
The total number of dropped client connections.
shown as connection
nginx.connections.dropped_count
(count)
The total number of dropped client connections (shown as count).
shown as connection
nginx.connections.idle
(gauge)
The current number of idle client connections.
shown as connection
nginx.generation
(gauge)
The total number of configuration reloads
shown as reload
nginx.generation_count
(count)
The total number of configuration reloads (shown as count)
shown as reload
nginx.load_timestamp
(gauge)
Time of the last reload of configuration (time since Epoch).
shown as millisecond
nginx.pid
(gauge)
The ID of the worker process that handled the status request.
nginx.ppid
(gauge)
The ID of the master process that started the worker process
nginx.processes.respawned
(gauge)
The total number of abnormally terminated and respawned child processes.
shown as process
nginx.processes.respawned_count
(count)
The total number of abnormally terminated and respawned child processes (shown as count).
shown as process
nginx.requests.current
(gauge)
The current number of client requests.
shown as request
nginx.requests.total
(gauge)
The total number of client requests.
shown as request
nginx.requests.total_count
(count)
The total number of client requests (shown as count).
shown as request
nginx.server_zone.discarded
(gauge)
The total number of requests completed without sending a response.
shown as request
nginx.server_zone.discarded_count
(count)
The total number of requests completed without sending a response (shown as count).
shown as request
nginx.server_zone.processing
(gauge)
The number of client requests that are currently being processed.
shown as request
nginx.server_zone.received
(gauge)
The total amount of data received from clients.
shown as byte
nginx.server_zone.received_count
(count)
The total amount of data received from clients (shown as count).
shown as byte
nginx.server_zone.requests
(gauge)
The total number of client requests received from clients.
shown as request
nginx.server_zone.requests_count
(count)
The total number of client requests received from clients (shown as count).
shown as request
nginx.server_zone.responses.1xx
(gauge)
The number of responses with 1xx status code.
shown as response
nginx.server_zone.responses.1xx_count
(count)
The number of responses with 1xx status code (shown as count).
shown as response
nginx.server_zone.responses.2xx
(gauge)
The number of responses with 2xx status code.
shown as response
nginx.server_zone.responses.2xx_count
(count)
The number of responses with 2xx status code (shown as count).
shown as response
nginx.server_zone.responses.3xx
(gauge)
The number of responses with 3xx status code.
shown as response
nginx.server_zone.responses.3xx_count
(count)
The number of responses with 3xx status code (shown as count).
shown as response
nginx.server_zone.responses.4xx
(gauge)
The number of responses with 4xx status code.
shown as response
nginx.server_zone.responses.4xx_count
(count)
The number of responses with 4xx status code (shown as count).
shown as response
nginx.server_zone.responses.5xx
(gauge)
The number of responses with 5xx status code.
shown as response
nginx.server_zone.responses.5xx_count
(count)
The number of responses with 5xx status code (shown as count).
shown as response
nginx.server_zone.responses.total
(gauge)
The total number of responses sent to clients.
shown as response
nginx.server_zone.responses.total_count
(count)
The total number of responses sent to clients (shown as count).
shown as response
nginx.server_zone.sent
(gauge)
The total amount of data sent to clients.
shown as byte
nginx.server_zone.sent_count
(count)
The total amount of data sent to clients (shown as count).
shown as byte
nginx.slab.pages.free
(gauge)
The current number of free memory pages
shown as page
nginx.slab.pages.used
(gauge)
The current number of used memory pages
shown as page
nginx.slab.slots.fails
(gauge)
The number of unsuccessful attempts to allocate memory of specified size
shown as request
nginx.slab.slots.fails_count
(count)
The number of unsuccessful attempts to allocate memory of specified size (shown as count)
shown as request
nginx.slab.slots.free
(gauge)
The current number of free memory slots
shown as slot
nginx.slab.slots.reqs
(gauge)
The total number of attempts to allocate memory of specified size
shown as request
nginx.slab.slots.reqs_count
(count)
The total number of attempts to allocate memory of specified size (shown as count)
shown as request
nginx.slab.slots.used
(gauge)
The current number of used memory slots
shown as slot
nginx.ssl.handshakes
(gauge)
The total number of successful SSL handshakes.
nginx.ssl.handshakes_count
(count)
The total number of successful SSL handshakes (shown as count).
nginx.ssl.handshakes_failed
(gauge)
The total number of failed SSL handshakes.
nginx.ssl.handshakes_failed_count
(count)
The total number of failed SSL handshakes (shown as count).
nginx.ssl.session_reuses
(gauge)
The total number of session reuses during SSL handshake.
nginx.ssl.session_reuses_count
(count)
The total number of session reuses during SSL handshake (shown as count).
nginx.stream.server_zone.connections
(gauge)
The total number of connections accepted from clients
shown as connection
nginx.stream.server_zone.connections_count
(count)
The total number of connections accepted from clients (shown as count)
shown as connection
nginx.stream.server_zone.discarded
(gauge)
The total number of requests completed without sending a response.
shown as request
nginx.stream.server_zone.discarded_count
(count)
The total number of requests completed without sending a response (shown as count).
shown as request
nginx.stream.server_zone.processing
(gauge)
The number of client requests that are currently being processed.
shown as request
nginx.stream.server_zone.received
(gauge)
The total amount of data received from clients.
shown as byte
nginx.stream.server_zone.received_count
(count)
The total amount of data received from clients (shown as count).
shown as byte
nginx.stream.server_zone.sent
(gauge)
The total amount of data sent to clients.
shown as byte
nginx.stream.server_zone.sent_count
(count)
The total amount of data sent to clients (shown as count).
shown as byte
nginx.stream.server_zone.sessions.2xx
(gauge)
The number of responses with 2xx status code.
shown as session
nginx.stream.server_zone.sessions.2xx_count
(count)
The number of responses with 2xx status code (shown as count).
shown as session
nginx.stream.server_zone.sessions.4xx
(gauge)
The number of responses with 4xx status code.
shown as session
nginx.stream.server_zone.sessions.4xx_count
(count)
The number of responses with 4xx status code (shown as count).
shown as session
nginx.stream.server_zone.sessions.5xx
(gauge)
The number of responses with 5xx status code.
shown as session
nginx.stream.server_zone.sessions.5xx_count
(count)
The number of responses with 5xx status code (shown as count).
shown as session
nginx.stream.server_zone.sessions.total
(gauge)
The total number of responses sent to clients.
shown as session
nginx.stream.server_zone.sessions.total_count
(count)
The total number of responses sent to clients (shown as count).
shown as session
nginx.stream.upstream.peers.active
(gauge)
The current number of connections
shown as connection
nginx.stream.upstream.peers.backup
(gauge)
A boolean value indicating whether the server is a backup server.
nginx.stream.upstream.peers.connections
(gauge)
The total number of client connections forwarded to this server.
shown as connection
nginx.stream.upstream.peers.connections_count
(count)
The total number of client connections forwarded to this server (shown as count).
shown as connection
nginx.stream.upstream.peers.downstart
(gauge)
The time (time since Epoch) when the server became “unavail” or “checking” or “unhealthy”
shown as millisecond
nginx.stream.upstream.peers.downtime
(gauge)
Total time the server was in the “unavail” or “checking” or “unhealthy” states.
shown as millisecond
nginx.stream.upstream.peers.fails
(gauge)
The total number of unsuccessful attempts to communicate with the server.
shown as fail
nginx.stream.upstream.peers.fails_count
(count)
The total number of unsuccessful attempts to communicate with the server (shown as count).
shown as fail
nginx.stream.upstream.peers.health_checks.checks
(gauge)
The total number of health check requests made.
shown as request
nginx.stream.upstream.peers.health_checks.checks_count
(count)
The total number of health check requests made (shown as count).
shown as request
nginx.stream.upstream.peers.health_checks.fails
(gauge)
The number of failed health checks.
shown as fail
nginx.stream.upstream.peers.health_checks.fails_count
(count)
The number of failed health checks (shown as count).
shown as fail
nginx.stream.upstream.peers.health_checks.last_passed
(gauge)
Boolean indicating if the last health check request was successful and passed tests.
nginx.stream.upstream.peers.health_checks.unhealthy
(gauge)
How many times the server became unhealthy (state “unhealthy”).
nginx.stream.upstream.peers.health_checks.unhealthy_count
(count)
How many times the server became unhealthy (state “unhealthy”) (shown as count).
nginx.stream.upstream.peers.id
(gauge)
The ID of the server.
nginx.stream.upstream.peers.received
(gauge)
The total number of bytes received from this server.
shown as byte
nginx.stream.upstream.peers.received_count
(count)
The total number of bytes received from this server (shown as count).
shown as byte
nginx.stream.upstream.peers.selected
(gauge)
The time (time since Epoch) when the server was last selected to process a connection.
shown as millisecond
nginx.stream.upstream.peers.sent
(gauge)
The total number of bytes sent to this server.
shown as byte
nginx.stream.upstream.peers.sent_count
(count)
The total number of bytes sent to this server (shown as count).
shown as byte
nginx.stream.upstream.peers.unavail
(gauge)
How many times the server became unavailable for client connections (state “unavail”).
nginx.stream.upstream.peers.unavail_count
(count)
How many times the server became unavailable for client connections (state “unavail”) (shown as count).
nginx.stream.upstream.peers.weight
(gauge)
Weight of the server.
nginx.stream.upstream.zombies
(gauge)
The current number of servers removed from the group but still processing active client connections.
shown as server
nginx.timestamp
(gauge)
Current time since Epoch.
shown as millisecond
nginx.upstream.keepalive
(gauge)
The current number of idle keepalive connections.
shown as connection
nginx.upstream.peers.active
(gauge)
The current number of active connections.
shown as connection
nginx.upstream.peers.backup
(gauge)
A boolean value indicating whether the server is a backup server.
nginx.upstream.peers.downstart
(gauge)
The time (since Epoch) when the server became “unavail” or “unhealthy”.
shown as millisecond
nginx.upstream.peers.downtime
(gauge)
Total time the server was in the “unavail” and “unhealthy” states.
shown as millisecond
nginx.upstream.peers.fails
(gauge)
The total number of unsuccessful attempts to communicate with the server.
nginx.upstream.peers.fails_count
(count)
The total number of unsuccessful attempts to communicate with the server (shown as count).
nginx.upstream.peers.health_checks.checks
(gauge)
The total number of health check requests made.
nginx.upstream.peers.health_checks.checks_count
(count)
The total number of health check requests made (shown as count).
nginx.upstream.peers.health_checks.fails
(gauge)
The number of failed health checks.
nginx.upstream.peers.health_checks.fails_count
(count)
The number of failed health checks (shown as count).
nginx.upstream.peers.health_checks.last_passed
(gauge)
Boolean indicating if the last health check request was successful and passed tests.
nginx.upstream.peers.health_checks.unhealthy
(gauge)
How many times the server became unhealthy (state “unhealthy”).
nginx.upstream.peers.health_checks.unhealthy_count
(count)
How many times the server became unhealthy (state “unhealthy”) (shown as count).
nginx.upstream.peers.id
(gauge)
The ID of the server.
nginx.upstream.peers.received
(gauge)
The total amount of data received from this server.
shown as byte
nginx.upstream.peers.received_count
(count)
The total amount of data received from this server (shown as count).
shown as byte
nginx.upstream.peers.requests
(gauge)
The total number of client requests forwarded to this server.
shown as request
nginx.upstream.peers.requests_count
(count)
The total number of client requests forwarded to this server (shown as count).
shown as request
nginx.upstream.peers.responses.1xx
(gauge)
The number of responses with 1xx status code.
shown as response
nginx.upstream.peers.responses.1xx_count
(count)
The number of responses with 1xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.2xx
(gauge)
The number of responses with 2xx status code.
shown as response
nginx.upstream.peers.responses.2xx_count
(count)
The number of responses with 2xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.3xx
(gauge)
The number of responses with 3xx status code.
shown as response
nginx.upstream.peers.responses.3xx_count
(count)
The number of responses with 3xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.4xx
(gauge)
The number of responses with 4xx status code.
shown as response
nginx.upstream.peers.responses.4xx_count
(count)
The number of responses with 4xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.5xx
(gauge)
The number of responses with 5xx status code.
shown as response
nginx.upstream.peers.responses.5xx_count
(count)
The number of responses with 5xx status code (shown as count).
shown as response
nginx.upstream.peers.responses.total
(gauge)
The total number of responses obtained from this server.
shown as response
nginx.upstream.peers.responses.total_count
(count)
The total number of responses obtained from this server (shown as count).
shown as response
nginx.upstream.peers.selected
(gauge)
The time (since Epoch) when the server was last selected to process a request (1.7.5).
shown as millisecond
nginx.upstream.peers.sent
(gauge)
The total amount of data sent to this server.
shown as byte
nginx.upstream.peers.sent_count
(count)
The total amount of data sent to this server (shown as count).
shown as byte
nginx.upstream.peers.unavail
(gauge)
How many times the server became unavailable for client requests (state “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold.
nginx.upstream.peers.unavail_count
(count)
How many times the server became unavailable for client requests (state “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold (shown as count).
nginx.upstream.peers.weight
(gauge)
Weight of the server.
nginx.version
(gauge)
Version of nginx.

Not all metrics shown are available to users of open source NGINX. Compare the module reference for stub status (open source NGINX) and http status (NGINX Plus) to understand which metrics are provided by each module.

A few open-source NGINX metrics are named differently in NGINX Plus; they refer to the exact same metric, though:

NGINX NGINX PLUS
nginx.net.connections nginx.connections.active
nginx.net.conn_opened_per_s nginx.connections.accepted
nginx.net.conn_dropped_per_s nginx.connections.dropped
nginx.net.request_per_s nginx.requests.total

These metrics don’t refer exactly to the same metric, but they are somewhat related:

NGINX NGINX PLUS
nginx.net.waiting nginx.connections.idle

Finally, these metrics have no good equivalent:

nginx.net.reading The current number of connections where nginx is reading the request header.
nginx.net.writing The current number of connections where nginx is writing the response back to the client.

Events

The NGINX check does not include any events at this time.

Service Checks

nginx.can_connect:

Returns CRITICAL if the Agent cannot connect to NGINX to collect metrics, otherwise OK.

Troubleshooting

You may observe one of these common problems in the output of the Datadog Agent’s info subcommand.

Agent cannot connect

  Checks
  ======

    nginx
    -----
      - instance #0 [ERROR]: "('Connection aborted.', error(111, 'Connection refused'))"
      - Collected 0 metrics, 0 events & 1 service check

Either NGINX’s local status endpoint is not running, or the Agent is not configured with correct connection information for it.

Check that the main nginx.conf includes a line like the following:

http {

  ...

  include <directory_that_contains_status.conf>/*.conf;
  # e.g.: include /etc/nginx/conf.d/*.conf;
}

Otherwise, review the Configuration section.
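
You can also confirm that the status endpoint itself is reachable from the Agent host (assuming the status.conf from the Configuration section, listening on port 81):

$ curl http://localhost:81/nginx_status

If this request fails, fix the NGINX side first before revisiting the Agent configuration.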

Part-3 Ends Here

ELK (Elasticsearch, Logstash, Kibana)


Reference links:

Note: This tutorial was performed on an Ubuntu machine.

https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-ubuntu-14-04
https://logz.io/blog/10-elasticsearch-concepts/
https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html#_near_realtime_nrt
https://qbox.io/blog/welcome-to-the-elk-stack-elasticsearch-logstash-kibana

For CentOS 7, use the following link:

https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-centos-7

Why use ELK?

The Elastic Stack (aka ELK) is a robust solution for search, log management, and data analysis. ELK is a combination of three open source projects: Elasticsearch, Logstash, and Kibana. These projects have specific roles in ELK:

  • Elasticsearch handles storage and provides a RESTful search and analytics endpoint.
  • Logstash is a server-side data processing pipeline that ingests, transforms and loads data.
  • Kibana lets you visualize your Elasticsearch data and navigate the Elastic Stack.

 

1. Elasticsearch - The Amazing Log Search Tool:

  • Real-time data extraction, and real-time data analytics. Elasticsearch is the engine that gives you both the power and the speed.

2. Logstash — Routing Your Log Data:

  •  Logstash is a tool for log data intake, processing, and output. This includes virtually any type of log that you manage: system logs, webserver logs, error logs, and app logs.
  • As administrators, we know how much time can be spent normalizing data from disparate data sources.
    We know, for example, how widely Apache logs differ from NGINX logs.

3. Kibana — Visualizing Your Log Data:

  • Kibana is your log-data dashboard.
  • Get a better grip on your large data stores with point-and-click pie charts, bar graphs, trendlines, maps and scatter plots.
  • You can visualize trends and patterns for data that would otherwise be extremely tedious to read and interpret.

Benefits:

  1. Real-time data and real-time analytics:
  • The ELK stack gives you the power of real-time data insights, with the ability to perform super-fast data extractions from virtually all structured or unstructured data sources.
  • Real-time extraction, and real-time analytics. Elasticsearch is the engine that gives you both the power and the speed.

2. Scalable, high-availability, multi-tenant:

  • With Elasticsearch, you can start small and expand it along with your business growth, when you are ready.
  • It is built to scale horizontally out of the box. As you need more capacity, simply add another node and let the cluster reorganize itself to accommodate and exploit the extra hardware.
  • Elasticsearch clusters are resilient, since they automatically detect and remove node failures.

You can set up multiple indices and query each of them independently or in combination.

Some important concepts in ELK are as follows:

1. Documents:

  • Documents are JSON objects that are stored within an Elasticsearch index and are considered the base unit of storage.
  • In the world of relational databases, documents can be compared to a row in a table. Data in documents is defined with fields comprised of keys and values.
  • A key is the name of the field, and a value can be an item of many different types, such as a string, a number, a boolean expression, another object, or an array of values.
  • Documents also contain reserved fields that constitute the document metadata, such as:

1. _index – the index where the document resides
2. _type – the type that the document represents
3. _id – the unique identifier for the document
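
To make this concrete, here is a minimal sketch of indexing and retrieving a single document with curl (the index, type, and field names are made up for illustration; Elasticsearch is assumed to be listening on localhost:9200):

$ curl -XPUT 'localhost:9200/example/mytype/1' -d '{"name": "john", "age": 30}'
$ curl -XGET 'localhost:9200/example/mytype/1?pretty'

The response to the GET request includes the reserved _index, _type, and _id fields described above, along with the document itself under _source.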

2. Index:

  • Indices are the largest unit of data in Elasticsearch; they are logical partitions of documents and can be compared to a database in the world of relational databases.
  • You can have as many indices defined in Elasticsearch as you want.
  • These in turn hold documents that are unique to each index.
  • Indices are identified by lowercase names, which are used when performing actions (such as searching and deleting) against the documents inside each index.

3. Shards:

  • Elasticsearch provides the ability to subdivide your index into multiple pieces called shards.
  • When you create an index, you can simply define the number of shards that you want.
  • Each shard is in itself a fully-functional and independent “index” that can be hosted on any node in the cluster.

When you create an index, you can define how many shards you want. Each shard is an independent Lucene index that can be hosted anywhere in your cluster.

Sharding is important for two primary reasons:

  • It allows you to horizontally split/scale your content volume.
  • It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput.

example:

curl -XPUT localhost:9200/example -d '{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 1
    }
  }
}'

4. Replicas:

  • Replicas, as the name implies, are Elasticsearch fail-safe mechanisms and are basically copies of your index’s shards.
  • This is a useful backup system for a rainy day — or, in other words, when a node crashes.
  • Replicas also serve read requests, so adding replicas can help to increase search performance.

To ensure high availability, replicas are not placed on the same node as the original shards (called the “primary” shard) from which they were replicated.


Replication is important for two primary reasons:

  • It provides high availability in case a shard/node fails. For this reason,
    it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
  • It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.

5. Analyzers:

Analyzers are used during indexing to break down phrases or expressions into terms.
Defined within an index, an analyzer consists of a single tokenizer and any number of token filters.
For example, a tokenizer could split a string into specifically defined terms when encountering a specific expression.

A token filter is used to filter or modify some tokens. For example, an ASCII folding filter will convert characters like ê, é, è to e.
example:

curl -XPUT localhost:9200/example -d '{
  "mappings": {
    "mytype": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "whitespace"
        }
      }
    }
  }
}'

6. Nodes:

The heart of any ELK setup is the Elasticsearch instance, which has the crucial task of storing and indexing data.

In a cluster, different responsibilities are assigned to the various node types:

1. Data nodes — store data and execute data-related operations such as search and aggregation.
2. Master nodes — in charge of cluster-wide management and configuration actions such as adding and removing nodes.
3. Client nodes — forward cluster requests to the master node and data-related requests to data nodes.
4. Tribe nodes — act as client nodes, performing read and write operations against all of the nodes in the cluster.
5. Ingestion nodes (new in Elasticsearch 5.0) — pre-process documents before indexing.

By default, each node is automatically assigned a unique identifier, or name, that is used for management purposes and becomes even more important in a multi-node, or clustered, environment.

When installed, a single node forms a new single-node cluster named "elasticsearch," but it can also be configured to join an existing cluster (see below) using the cluster name.

In a development or testing environment, you can set up multiple nodes on a single server.
In production, however, due to the number of resources that an Elasticsearch node consumes,
it is recommended to have each Elasticsearch instance run on a separate server.

7. Cluster:

An Elasticsearch cluster is comprised of one or more Elasticsearch nodes.
As with nodes, each cluster has a unique identifier that must be used by any node attempting to join the cluster.
By default, the cluster name is “elasticsearch,” but this name can be changed, of course.

One node in the cluster is the “master” node, which is in charge of cluster-wide management and configurations actions (such as adding and removing nodes).

This node is chosen automatically by the cluster, but it can be changed if it fails. (See above on the other types of nodes in a cluster.)

For example, the cluster health API returns health status reports of either “green” (all shards are allocated), “yellow” (the primary shard is allocated but replicas are not), or “red” (the shard is not allocated in the cluster).
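
You can query this endpoint yourself with curl (adjust the host and port for your setup):

$ curl -XGET 'localhost:9200/_cluster/health?pretty'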

# Output Example
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 5,
  "active_shards" : 5,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

ELK Installation:

Our Goal:

The goal of the tutorial is to set up Logstash to gather syslogs of multiple servers, and set up Kibana to visualize the gathered logs.

Our ELK stack setup has four main components:

  • Logstash: The server component of Logstash that processes incoming logs
  • Elasticsearch: Stores all of the logs
  • Kibana: Web interface for searching and visualizing logs, which will be proxied through Nginx
  • Filebeat: Installed on the client servers that will send their logs to Logstash. Filebeat serves as a log-shipping agent that uses the lumberjack networking protocol to communicate with Logstash.

ELK_Architecture.png

NOTE:

We will install the first three components on a single server, which we will refer to as our ELK Server. Filebeat will be installed on all of the client servers that we want to gather logs for, which we will refer to collectively as our Client Servers.

Pre-requisites:

The amount of CPU, RAM, and storage that your ELK Server will require depends on the volume of logs that you intend to gather. For this tutorial, we will be using a VPS with the following specs for our ELK Server:

  • OS: Ubuntu 14.04
  • RAM: 4GB
  • CPU: 2

In addition to your ELK Server, you will want to have a few other servers that you will gather logs from.

Let’s get started on setting up our ELK Server!

Step-1: Install Java 8

Elasticsearch and Logstash require Java, so we will install that now. We will install a recent version of Oracle Java 8 because that is what Elasticsearch recommends. It should, however, work fine with OpenJDK, if you decide to go that route.

Add the Oracle Java PPA to apt:

$ sudo add-apt-repository -y ppa:webupd8team/java

Update your apt package database:

$ sudo apt-get update -y

Install the latest stable version of Oracle Java 8 with this command (and accept the license agreement that pops up):

$ sudo apt-get -y install oracle-java8-installer

Now that Java 8 is installed, let's install Elasticsearch.

Step-2: Install ElasticSearch

Elasticsearch can be installed with a package manager by adding Elastic’s package source list.

Run the following command to import the Elasticsearch public GPG key into apt:

$ wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

If your prompt is just hanging there, it is probably waiting for your user's password (to authorize the sudo command). If this is the case, enter your password.

Create the Elasticsearch source list:

$ echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list

Update your apt package database:

$ sudo apt-get update -y

Install Elasticsearch with this command:

$ sudo apt-get -y install elasticsearch

Elasticsearch is now installed. Let’s edit the configuration:

$ sudo vi /etc/elasticsearch/elasticsearch.yml

You will want to restrict outside access to your Elasticsearch instance (port 9200), so outsiders can't read your data or shut down your Elasticsearch cluster through the HTTP API. Find the line that specifies network.host, uncomment it, and replace its value with "localhost" so it looks like this:

elasticsearch.yml excerpt (updated)
network.host: localhost

Save and exit elasticsearch.yml.

Now start Elasticsearch:

$ sudo service elasticsearch restart

Then run the following command to start Elasticsearch on boot up:

$ sudo update-rc.d elasticsearch defaults 95 10
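
To confirm that Elasticsearch is up and responding locally, query it on port 9200 (it should return a small JSON document with the node name and version):

$ curl -XGET 'http://localhost:9200'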

Now that Elasticsearch is up and running, let’s install Kibana.

Step-3: Install Kibana

Kibana can be installed with a package manager by adding Elastic’s package source list.

Create the Kibana source list:

$ echo "deb http://packages.elastic.co/kibana/4.5/debian stable main" | sudo tee -a /etc/apt/sources.list.d/kibana-4.5.x.list

Update your apt package database:

  • sudo apt-get update -y

Install Kibana with this command:

  • sudo apt-get -y install kibana

Kibana is now installed.

Open the Kibana configuration file for editing:

$ sudo vi /opt/kibana/config/kibana.yml

In the Kibana configuration file, find the line that specifies server.host, and replace the IP address (“0.0.0.0” by default) with “localhost”:

server.host: "localhost"

Save and exit. This setting makes it so Kibana will only be accessible to the localhost. This is fine because we will use an Nginx reverse proxy to allow external access.

Now enable the Kibana service, and start it:

  • sudo update-rc.d kibana defaults 96 9
  • sudo service kibana start

Before we can use the Kibana web interface, we have to set up a reverse proxy. Let’s do that now, with Nginx.

Step-4: Install Nginx

Because we configured Kibana to listen on localhost, we must set up a reverse proxy to allow external access to it. We will use Nginx for this purpose.

Note: If you already have an Nginx instance that you want to use, feel free to use that instead. Just make sure to configure Kibana so it is reachable by your Nginx server (you probably want to change the host value, in /opt/kibana/config/kibana.yml, to your Kibana server's private IP address or hostname). Also, it is recommended that you enable SSL/TLS.

Use apt to install Nginx and Apache2-utils:

$ sudo apt-get install nginx apache2-utils -y

Use htpasswd to create an admin user, called “kibanaadmin” (you should use another name), that can access the Kibana web interface:

$ sudo htpasswd -c /etc/nginx/htpasswd.users kibanaadmin

Enter a password at the prompt. Remember this login, as you will need it to access the Kibana web interface.

Now open the Nginx default server block in your favorite editor. We will use vi:

$ sudo vim /etc/nginx/sites-available/default

Delete the file’s contents, and paste the following code block into the file. Be sure to update the server_name to match your server’s name:

server {
    listen 80;

    server_name example.com;

    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        proxy_pass http://localhost:5601;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}

The Nginx configuration looks like this:

nginx.png

 

Save and exit. This configures Nginx to direct your server’s HTTP traffic to the Kibana application, which is listening on localhost:5601. Also, Nginx will use the htpasswd.users file, that we created earlier, and require basic authentication.

Now restart Nginx to put our changes into effect:

$ sudo service nginx restart

Kibana is now accessible via your FQDN or the public IP address of your ELK Server i.e. http://elk-server-public-ip/. If you go there in a web browser, after entering the “kibanaadmin” credentials, you should see a Kibana welcome page which will ask you to configure an index pattern. Let’s get back to that later, after we install all of the other components.

Step-5: Install Logstash

The Logstash package is available from the same repository as Elasticsearch, and we already installed that public key, so let’s create the Logstash source list:

$ echo 'deb http://packages.elastic.co/logstash/2.2/debian stable main' | sudo tee /etc/apt/sources.list.d/logstash-2.2.x.list

Update your apt package database:

$ sudo apt-get update -y

Install Logstash with this command:

$ sudo apt-get install logstash -y

Logstash is installed but it is not configured yet.

Since we are going to use Filebeat to ship logs from our Client Servers to our ELK Server, we need to create an SSL certificate and key pair. The certificate is used by Filebeat to verify the identity of ELK Server. Create the directories that will store the certificate and private key with the following commands:

  • sudo mkdir -p /etc/pki/tls/certs
  • sudo mkdir /etc/pki/tls/private

Now you have two options for generating your SSL certificates. If you have a DNS setup that will allow your client servers to resolve the IP address of the ELK Server, use Option 2. Otherwise, Option 1 will allow you to use IP addresses.

Option 1: IP Address

If you don’t have a DNS setup—that would allow your servers, that you will gather logs from, to resolve the IP address of your ELK Server—you will have to add your ELK Server’s private IP address to the subjectAltName (SAN) field of the SSL certificate that we are about to generate. To do so, open the OpenSSL configuration file:

$ sudo vim /etc/ssl/openssl.cnf

Find the [ v3_ca ] section in the file, and add this line under it (substituting in the ELK Server’s private IP address):

subjectAltName = IP: ELK_server_private_IP

Save and exit.

Now generate the SSL certificate and private key in the appropriate locations (/etc/pki/tls/), with the following commands:

  • cd /etc/pki/tls
  • sudo openssl req -config /etc/ssl/openssl.cnf -x509 -days 3650 -batch -nodes -newkey rsa:2048 -keyout private/logstash-forwarder.key -out certs/logstash-forwarder.crt

The logstash-forwarder.crt file will be copied to all of the servers that will send logs to Logstash but we will do that a little later. Let’s complete our Logstash configuration. If you went with this option, skip option 2 and move on to Configure Logstash.

Option 2: FQDN (DNS)

If you have a DNS setup with your private networking, you should create an A record that contains the ELK Server’s private IP address—this domain name will be used in the next command, to generate the SSL certificate. Alternatively, you can use a record that points to the server’s public IP address. Just be sure that your servers (the ones that you will be gathering logs from) will be able to resolve the domain name to your ELK Server.

Now generate the SSL certificate and private key, in the appropriate locations (/etc/pki/tls/…), with the following command (substitute in the FQDN of the ELK Server):

$ cd /etc/pki/tls; sudo openssl req -subj '/CN=ELK_server_fqdn/' -x509 -days 3650 -batch -nodes -newkey rsa:2048 -keyout private/logstash-forwarder.key -out certs/logstash-forwarder.crt

The logstash-forwarder.crt file will be copied to all of the servers that will send logs to Logstash but we will do that a little later. Let’s complete our Logstash configuration.

Configure Logstash

Logstash configuration files use a JSON-like format and reside in /etc/logstash/conf.d. The configuration consists of three sections: inputs, filters, and outputs.

Let’s create a configuration file called 02-beats-input.conf and set up our “filebeat” input:

$ sudo vi /etc/logstash/conf.d/02-beats-input.conf

Insert the following input configuration:

input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
  }
}

Or, the 02-beats-input.conf file content looks like this:

logstash1.png

Save and quit. This specifies a beats input that will listen on tcp port 5044, and it will use the SSL certificate and private key that we created earlier.

Now let’s create a configuration file called 10-syslog-filter.conf, where we will add a filter for syslog messages:

$ sudo vi /etc/logstash/conf.d/10-syslog-filter.conf

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

Or, the 10-syslog-filter.conf file content looks like this:

logstash2

Save and quit. This filter looks for logs that are labeled as "syslog" type (by Filebeat), and it will try to use grok to parse incoming syslog logs to make them structured and queryable.
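
To make the grok step concrete, here is a rough sketch of the fields that pattern would pull out of a syslog line like the one shown in the sample output later in this guide:

Feb  3 14:34:00 rails sshd[963]: Server listening on :: port 22.

  syslog_timestamp => Feb  3 14:34:00
  syslog_hostname  => rails
  syslog_program   => sshd
  syslog_pid       => 963
  syslog_message   => Server listening on :: port 22.

The two add_field settings then record when the event was received (received_at) and which host it came from (received_from).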

Lastly, we will create a configuration file called 30-elasticsearch-output.conf:

$ sudo vim /etc/logstash/conf.d/30-elasticsearch-output.conf

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

Or, the 30-elasticsearch-output.conf file content looks like this:

logstash3.png

Save and exit. This output configures Logstash to store the Beats data in Elasticsearch, which is running at localhost:9200, in an index named after the beat used (filebeat, in our case).

If you want to add filters for other applications that use the Filebeat input, be sure to name the files so they sort between the input and the output configuration (i.e. between 02- and 30-).
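
For example, a hypothetical Nginx filter saved as 11-nginx-filter.conf would sort between the input and output files (the nginx file name here is purely illustrative):

$ ls /etc/logstash/conf.d/
02-beats-input.conf  10-syslog-filter.conf  11-nginx-filter.conf  30-elasticsearch-output.conf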

Test your Logstash configuration with this command:

  • $ sudo service logstash configtest

It should display Configuration OK if there are no syntax errors. Otherwise, try and read the error output to see what’s wrong with your Logstash configuration.

Restart Logstash, and enable it, to put our configuration changes into effect:

  • sudo service logstash restart
  • sudo update-rc.d logstash defaults 96 9

Next, we’ll load the sample Kibana dashboards.

Loading Sample Kibana Dashboards:

Elastic provides several sample Kibana dashboards and Beats index patterns that can help you get started with Kibana. Although we won’t use the dashboards in this tutorial, we’ll load them anyway so we can use the Filebeat index pattern that it includes.

First, download the sample dashboards archive to your home directory:
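
For example (this assumes the Beats 1.x sample dashboards published by Elastic, which match the Logstash 2.2 packages used here; adjust the version number if your Beats packages differ):

  • curl -L -O https://download.elastic.co/beats/dashboards/beats-dashboards-1.1.0.zip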

Install the unzip package with this command:

  • sudo apt-get -y install unzip

Next, extract the contents of the archive:

  • unzip beats-dashboards-*.zip

And load the sample dashboards, visualizations and Beats index patterns into Elasticsearch with these commands:

  • cd beats-dashboards-*
  • ./load.sh

These are the index patterns that we just loaded:

  • [packetbeat-]YYYY.MM.DD
  • [topbeat-]YYYY.MM.DD
  • [filebeat-]YYYY.MM.DD
  • [winlogbeat-]YYYY.MM.DD

When we start using Kibana, we will select the Filebeat index pattern as our default.

Load the Filebeat index template in Elasticsearch

Because we are planning on using Filebeat to ship logs to Elasticsearch, we should load a Filebeat index template. The index template will configure Elasticsearch to analyze incoming Filebeat fields in an intelligent way.

First, download the Filebeat index template to your home directory:
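
For example, using curl (the URL below is only a placeholder; fetch the filebeat-index-template.json that matches your Filebeat version from Elastic's documentation or repository):

$ curl -O https://example.com/filebeat-index-template.json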

Then load the template with this command:

$ curl -XPUT 'http://localhost:9200/_template/filebeat?pretty' -d@filebeat-index-template.json

If the template loaded properly, you should see a message like this:

Output:
{
  "acknowledged" : true
}

Now that our ELK Server is ready to receive Filebeat data, let’s move onto setting up Filebeat on each client server.

Step-6: Set up Filebeat (add client servers)

Do these steps for each Ubuntu or Debian server that you want to send logs to Logstash on your ELK Server. For instructions on installing Filebeat on Red Hat-based Linux distributions (e.g. RHEL, CentOS, etc.), refer to the Set Up Filebeat (Add Client Servers) section of the CentOS variation of this tutorial.

Copy SSL Certificate

On your ELK Server, copy the SSL certificate that we created earlier to your Client Server (substitute the client server's address, and your own login):

  • scp /etc/pki/tls/certs/logstash-forwarder.crt user@client_server_private_address:/tmp

After providing your login’s credentials, ensure that the certificate copy was successful. It is required for communication between the client servers and the ELK Server.

Now, on your Client Server, copy the ELK Server’s SSL certificate into the appropriate location (/etc/pki/tls/certs):

  • Client$ sudo mkdir -p /etc/pki/tls/certs
  • Client$ sudo cp /tmp/logstash-forwarder.crt /etc/pki/tls/certs/

Now we will install the Filebeat package.

Install Filebeat packages:

On the Client Server, create the Beats source list using the following command:

Client$ echo "deb https://packages.elastic.co/beats/apt stable main" | sudo tee -a /etc/apt/sources.list.d/beats.list

It also uses the same GPG key as Elasticsearch, which can be installed with this command:

Client$ wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Then install the Filebeat package:

  • sudo apt-get update
  • sudo apt-get install filebeat

Filebeat is installed but it is not configured yet.

Configure Filebeat

Now we will configure Filebeat to connect to Logstash on our ELK Server. This section will step you through modifying the example configuration file that comes with Filebeat. When you complete the steps, you should have a file that looks something like the condensed example shown near the end of this section.

On Client Server, create and edit Filebeat configuration file:

  • sudo vi /etc/filebeat/filebeat.yml

Note: Filebeat’s configuration file is in YAML format, which means that indentation is very important! Be sure to use the same number of spaces that are indicated in these instructions.

Near the top of the file, you will see the prospectors section, which is where you can define prospectors that specify which log files should be shipped and how they should be handled. Each prospector is indicated by the - character.

We’ll modify the existing prospector to send syslog and auth.log to Logstash. Under paths, comment out the - /var/log/*.log file. This will prevent Filebeat from sending every .log in that directory to Logstash. Then add new entries for syslog and auth.log. It should look something like this when you’re done:

...
      paths:
        - /var/log/auth.log
        - /var/log/syslog
#        - /var/log/*.log
...

Then find the line that specifies document_type:, uncomment it and change its value to “syslog”. It should look like this after the modification:

...
      document_type: syslog
...

This specifies that the logs in this prospector are of type syslog (which is the type that our Logstash filter is looking for).

If you want to send other files to your ELK server, or make any changes to how Filebeat handles your logs, feel free to modify or add prospector entries.

Next, under the output section, find the line that says elasticsearch:, which indicates the Elasticsearch output section (which we are not going to use). Delete or comment out the entire Elasticsearch output section (up to the line that says #logstash:).

Find the commented out Logstash output section, indicated by the line that says #logstash:, and uncomment it by deleting the preceding #. In this section, uncomment the hosts: ["localhost:5044"] line. Change localhost to the private IP address (or hostname, if you went with that option) of your ELK server:

 ### Logstash as output
  logstash:
    # The Logstash hosts
    hosts: ["ELK_server_private_IP:5044"]

This configures Filebeat to connect to Logstash on your ELK Server at port 5044 (the port that we specified a Logstash input for earlier).

Directly under the hosts entry, and with the same indentation, add this line to the filebeat.yml file:

bulk_max_size: 1024

Next, find the tls section, and uncomment it. Then uncomment the line that specifies certificate_authorities, and change its value to ["/etc/pki/tls/certs/logstash-forwarder.crt"]. It should look something like this:

...
    tls:
      # List of root certificates for HTTPS server verifications
      certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"]

This configures Filebeat to use the SSL certificate that we created on the ELK Server.

Save and quit.

Now restart Filebeat to put our changes into place:

  • sudo service filebeat restart
  • sudo update-rc.d filebeat defaults 95 10

Again, if you're not sure whether your Filebeat configuration is correct, compare it against the condensed example below.
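
As a quick reference, here is a condensed sketch of how the edited sections of /etc/filebeat/filebeat.yml should fit together (ELK_server_private_IP is a placeholder, and only the settings discussed above are shown):

filebeat:
  prospectors:
    -
      paths:
        - /var/log/auth.log
        - /var/log/syslog
#        - /var/log/*.log
      document_type: syslog

output:
  logstash:
    # The Logstash hosts
    hosts: ["ELK_server_private_IP:5044"]
    bulk_max_size: 1024
    tls:
      # List of root certificates for HTTPS server verifications
      certificate_authorities: ["/etc/pki/tls/certs/logstash-forwarder.crt"]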

Now Filebeat is sending syslog and auth.log to Logstash on your ELK server! Repeat this section for all of the other servers that you wish to gather logs for.

Test the Filebeat installation:

If your ELK stack is setup properly, Filebeat (on your client server) should be shipping your logs to Logstash on your ELK server. Logstash should be loading the Filebeat data into Elasticsearch in a date-stamped index, filebeat-YYYY.MM.DD.

On your ELK Server, verify that Elasticsearch is indeed receiving the data by querying for the Filebeat index:
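
A minimal example, assuming Elasticsearch is listening on its default port 9200 (the filebeat-* wildcard matches the date-stamped indices created by Logstash):

$ curl -XGET 'http://localhost:9200/filebeat-*/_search?pretty'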

You should see a bunch of output that looks like this:

Sample Output:
...
{
      "_index" : "filebeat-2016.01.29",
      "_type" : "log",
      "_id" : "AVKO98yuaHvsHQLa53HE",
      "_score" : 1.0,
      "_source":{"message":"Feb  3 14:34:00 rails sshd[963]: Server listening on :: port 22.","@version":"1","@timestamp":"2016-01-29T19:59:09.145Z","beat":{"hostname":"topbeat-u-03","name":"topbeat-u-03"},"count":1,"fields":null,"input_type":"log","offset":70,"source":"/var/log/auth.log","type":"log","host":"topbeat-u-03"}
    }
...

If your output shows 0 total hits, Elasticsearch is not loading any logs under the index you searched for, and you should review your setup for errors. If you received the expected output, continue to the next step.

Connect to Kibana

When you are finished setting up Filebeat on all of the servers that you want to gather logs for, let’s look at Kibana, the web interface that we installed earlier.

For example, from the ELK Server itself you can browse Kibana directly at http://localhost:5601/; from any other machine, use the Nginx-proxied address described below.

In a web browser, go to the FQDN or public IP address of your ELK Server. After entering the “kibanaadmin” credentials, you should see a page prompting you to configure a default index pattern:

kibana1.png

Go ahead and select [filebeat-]YYYY.MM.DD from the Index Patterns menu (left side), then click the Star (Set as default index) button to set the Filebeat index as the default.

Now click the Discover link in the top navigation bar. By default, this will show you all of the log data over the last 15 minutes. You should see a histogram with log events, with log messages below:

kibana2.png

Right now, there won’t be much in there because you are only gathering syslogs from your client servers. Here, you can search and browse through your logs. You can also customize your dashboard.

Try the following things:

  • Search for “root” to see if anyone is trying to log into your servers as root
  • Search for a particular hostname (search for host: "hostname")
  • Change the time frame by selecting an area on the histogram or from the menu above
  • Click on messages below the histogram to see how the data is being filtered

Kibana has many other features, such as graphing and filtering, so feel free to poke around!

Conclusion

Now that your syslogs are centralized via Elasticsearch and Logstash, and you are able to visualize them with Kibana, you should be off to a good start with centralizing all of your important logs. Remember that you can send pretty much any type of log or indexed data to Logstash, but the data becomes even more useful if it is parsed and structured with grok.

To improve your new ELK stack, you should look into gathering and filtering your other logs with Logstash, and creating Kibana dashboards. You may also want to gather system metrics by using Topbeat with your ELK stack. All of these topics are covered in the other tutorials in this series.

Good luck!