What is Syslog?

Posted by Rapid Seven Employee May 24, 2017

This post has been written by Dr. Miao Wang, a Post-Doctoral Researcher at the Performance Engineering Lab at University College Dublin.

This post is the first in a multi-part series of posts on the many options for collecting and forwarding log data from different platforms and the pros and cons of each. In this first post we will focus on Syslog, and will provide background on the Syslog protocol.

What is Syslog?

Syslog has been around for a number of decades and provides a protocol for transporting event messages between computer systems and software applications. The Syslog protocol utilizes a layered architecture, which allows the use of any number of transport protocols for transmission of Syslog messages. It also provides a message format that allows vendor-specific extensions to be provided in a structured way. Syslog is now standardized by the IETF in RFC 5424 (since 2009), but has been around since the 1980s and for many years served as the de facto standard for logging without any authoritative published specification.

Syslog has gained significant popularity and wide support on major operating system platforms and software frameworks and is officially supported on almost all versions of Linux, Unix, and MacOS platforms. On Microsoft Windows, Syslog can also be supported through a variety of open source and commercial third-party libraries.

Syslog best practices often promote storing log messages on a centralized server that can provide a correlated view of all the log data generated by different system components. Otherwise, analyzing each log file separately and then manually linking each related log message is extremely time-consuming. As a result, forwarding local log messages to a remote log analytics server/service via Syslog has been commonly adopted as a standard industrial logging solution.
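For illustration, here is a minimal Python sketch of an application handing its events to a central collector using the standard library's SysLogHandler; the hostname, port, and example message are placeholders rather than a recommended setup.

# Forward application log records to a remote Syslog collector.
# "logs.example.com" and port 514 (UDP) are placeholder values.
import logging
from logging.handlers import SysLogHandler

logger = logging.getLogger("my-app")
logger.setLevel(logging.INFO)

handler = SysLogHandler(address=("logs.example.com", 514))
handler.setFormatter(logging.Formatter("my-app: %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("user login succeeded for account 'alice'")

UDP is the traditional Syslog transport; TCP or TLS can be used instead where delivery guarantees or confidentiality matter.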

How does Syslog work?

The Syslog standard defines three different layers, namely the Syslog content, the Syslog application and the Syslog transport. The content refers to the information contained in a Syslog event message. The application layer is essentially what generates, interprets, routes, and stores the message while the Syslog transport layer transmits the message via the network.


Diagram 1 from the RFC 5424 Syslog Spec

According to the Syslog specification, there is no acknowledgement of message delivery, and although some transports may provide status information, the Syslog protocol is described as a pure simplex protocol. Sample deployment scenarios in the spec show arrangements where messages are created by an 'originator' and forwarded on to a 'collector' (generally a logging server or service used for centralized storage of log data). Note that 'relays' can also sit between the originator and the collector and can do some processing on the data before it is sent on (e.g. filtering out events, combining sources of event data).

Applications can be configured to send messages to multiple destinations, and individual Syslog components may be running in the same host machine.

The Syslog Format

Sharing log data between different applications requires a standard definition and format for the log message, such that both parties can interpret and understand each other's information. To provide this, RFC 5424 defines the Syslog message format and rules for each data element within each message.

A Syslog message has the following format: A header, followed by structured-data (SD), followed by a message.

The header of the Syslog message contains "priority", "version", "timestamp", "hostname", "application", "process id", and "message id". It is followed by structured-data, which contains data blocks in "key=value" format enclosed in square brackets "[]", e.g. [SDID@0 utilization="high" os="linux"][SDPriority@0 class="medium"]. In the example image below, the SD is simply represented as "-", which is a null value (NILVALUE as specified by RFC 5424). After the SD value, the BOM indicates UTF-8 encoding, and "su root failed on /dev/pts/7" is the detailed log message, which should be encoded in UTF-8. (For more details on the data elements of the Syslog protocol, please refer to: http://tools.ietf.org/html/rfc5424)


A Sample Syslog Message with Format Broken Out
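To make the pieces concrete, here is a small Python sketch that assembles an RFC 5424-style message from a header, a structured-data block, and a UTF-8 message; the specific field values are invented for illustration.

# PRI is facility * 8 + severity; facility 4 (auth) and severity 2
# (critical) give <34>. All values below are made up.
header = "<34>1 2017-05-24T09:12:03.000Z myhost su 2316 ID47"
structured_data = '[exampleSDID@32473 utilization="high" os="linux"]'
message = "\ufeff" + "su root failed on /dev/pts/7"  # leading BOM marks UTF-8 text

syslog_line = " ".join([header, structured_data, message])
print(syslog_line)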

Why Syslog?

The complexity of modern applications and systems is ever increasing, and to understand the behavior of complex systems, administrators, developers, and ops teams often need to collect and monitor all relevant information produced by their applications. Such information often needs to be analyzed and correlated to determine how their systems are behaving. Consequently, administrators can apply data analytics techniques to either diagnose root causes once problems occur or gain insight into current system behavior based on statistical analysis.

Logs are frequently used as a primary and reliable data source to fulfill this mission, for lots of reasons, some of which I've listed here:

  • Logs can preserve transient information that allows administrators to roll the system back to a proper state after a failure. For example, when a banking system fails, transactions lost from main memory can be recovered from the logs.
  • Logs can contain a rich diversity of substantial information produced by individual applications, allowing administrators, developers, and ops teams to understand system behavior from many angles, such as current system statistics, trend prediction, and troubleshooting.
  • Logs are written by the underlying application to hard disks or external services, so reading these log files has no direct performance impact on the monitored system. Therefore, in a production environment administrators can safely monitor running applications via their logs without worrying about impacting performance.

 

However, a key aspect of log analysis is understanding the format of the arriving log data, especially in a heterogeneous environment where different applications may be developed using different log formats and network protocols to send this log data. Unless this is well defined, it is quite difficult to interpret log messages sent by an unknown application. To solve this issue, Syslog defines a logging standard for different systems and applications to follow so that they can easily exchange log information. Based on this logging protocol, a receiving application can effectively interpret each log attribute and understand the meaning of the log message.

Ready to put Syslog into action?

Try our free log management tool today.

It’s generally a good practice to minify and combine your assets (Javascript & CSS) when deploying to production. This process reduces the size of your assets and dramatically improves your website’s load time.

 

What are Javascript source maps?

Source maps create a map from these compressed asset files back to the source files.

This source map allows you to debug and view the source code of your compressed assets, as if you were actually working with the original CSS and Javascript source code.

Take a look at jQuery's minified and combined code that was generated from the original source code. The code is practically unreadable and would be difficult to debug.

 

But, as we all know, no matter how thoroughly you test, sometimes bugs will fall through the cracks. This is why it’s useful to debug Javascript code in production, and that’s when source maps come in handy.

How do you use Javascript source maps?

With InsightOps we use UglifyJS for minification and Javascript source map generation. UglifyJS is a NodeJS library written in Javascript.

To install UglifyJS with npm:

npm install uglify-js -g

Minify the files and generate source maps:

uglifyjs file1.js file2.js -o output.js --source-map output.map.js

The code above tells UglifyJS to:

  • Take file1.js and file2.js as input
  • Compress input files and output them to output.js
  • Generate the source map for the compressed file and output it to output.map.js

 

Marrying source maps and Django Compressor

 

Django Compressor is a great Django plugin to mark assets for minification right inside your templates:

{% load compress %}
{% compress js %}
<script src="/static/js/one.js" type="text/javascript" charset="utf-8"></script>
<script type="text/javascript" charset="utf-8">obj.value = "value";</script>
{% endcompress %}

Behind the scenes you can develop logic to combine and minify the files with any algorithm or third party tools of your choosing.

A great blog post by Chris Roby goes into great detail about how to extend compressor to work with UglifyJS and produce source maps. It is definitely worth the read if you’re running Django.
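If you are curious what such an extension can look like, here is a rough Python sketch of a custom Django Compressor output filter that shells out to UglifyJS; the class, command flags, and settings path are illustrative assumptions, not the exact approach from that post.

# A hypothetical Compressor filter that runs uglifyjs on each JS bundle.
# CompilerFilter substitutes {infile} and {outfile} with temporary paths.
from compressor.filters import CompilerFilter

class UglifyJSFilter(CompilerFilter):
    command = "uglifyjs {infile} -o {outfile} --source-map {outfile}.map"

# settings.py (hypothetical):
# COMPRESS_JS_FILTERS = ["myproject.compress_filters.UglifyJSFilter"]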

Browser support

Source maps are a new addition to the developer toolbox. Although the source maps spec lives in Google docs (no kidding), they’re supported by all major browsers: Chrome, Safari, Firefox, and IE11. By default, source maps are disabled so your users will not incur any unnecessary bandwidth overheads.

To enable source maps in Google Chrome, go to Developer Tools, click the little cog icon, and then make sure that “Enable Javascript source maps” is checked.

Enable source maps

That’s it.

Now each compressed asset file contains a link pointing to its source map, and we’ve just told Chrome not to ignore them.

See Javascript source maps in action

If you’d like to see Javascript source maps in action, check out our free log management tool and take a look at our source code.

 

Logentries source code

The files highlighted in green are compressed Javascript files; the folders highlighted in blue are generated from source maps and contain the original source code that’s mapped onto the compressed files. We can set breakpoints on mapped code, inspect variables, step through, and do pretty much anything that we can with original code.

Pretty cool, huh?


Heroku Dynos Explained

Posted by Rapid Seven Employee May 24, 2017

What are Heroku Dynos?

 

If you’ve ever hosted an application on Heroku, the popular platform-as-a-service, you’re likely at least aware of the existence of “Dynos”. But what exactly are Heroku Dynos and why are they important?

 


As explained in Heroku’s docs, Dynos are simply lightweight Linux containers dedicated to running your application processes. At the most basic level, a newly deployed app on Heroku will be supported by one Dyno for running web processes. You then have the option of adding additional Dynos and specifying Dyno processes in your Procfile. Dynos actually come in three different flavors:

 

  • Web Dynos: for handling web processes
  • Worker Dynos: for handling any type of process you declare (like background jobs)
  • One-off Dynos: for handling one-off tasks, such as database migrations

 

One of the great things about Heroku Dynos is how easy they are to scale up and out. Through Heroku’s admin portal or via the command line, you can easily add more Dynos or larger Dynos. Adding additional Dynos can help speed up your application’s response time by handling more concurrent requests, whereas adding larger Dynos can provide additional RAM for your application.

Using Heroku Dynos to get the insights you need

Great. I get it. Dynos make it easy to run my application with less hassle. For this reason, I should have to think very little about Heroku Dynos, right? Wrong!

As Dynos are individual containers and identified uniquely in your Heroku logs, they can provide some great insight into where issues may be stemming from when things go wrong.

When sending Heroku logs to InsightOps, a Dyno’s unique ID will automatically be logged in key-value pair format along with information about the process it handled. In the example below, we see web Dynos being identified along with the HTTP requests being handled:

Heroku log view in InsightOps

An easier way to view this data in InsightOps is to use the Table View:

Heroku Table View in InsightOps

As you can see, the Heroku Dyno is easily identified alongside other pertinent data. Since I’ve enabled Heroku’s beta log-runtime-metrics from Heroku Labs, I can also see data related to CPU and memory per Dyno, which is particularly useful for identifying issues like too much swap memory being used (perhaps indicating a need to scale up my Dynos).
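Because the Dyno name arrives as an ordinary key-value pair, it is also easy to pull out programmatically; the short Python sketch below uses an invented router line to show the idea.

# Extract key=value pairs (including the dyno) from a Heroku router line.
# The sample line is invented for illustration.
import re

line = 'at=info method=GET path="/login" dyno=web.2 connect=1ms service=48ms status=200'

# Values are either quoted strings or bare tokens.
pairs = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
print(pairs["dyno"])    # web.2
print(pairs["status"])  # 200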

 

Since Dynos are uniquely identified in key value pairs, you can also use them to visualize data. In the example below, I’m visualizing how much swap memory each Dyno is using over a given period of time:

Swap memory per Dyno in InsightOps

You can also create visualizations to help you monitor how much each Dyno is being used compared to others:

Dyno usage comparison in InsightOps

Finally, checking the Heroku Dyno related to errors in your logs could hint at Dyno-related issues. In the example below, we see that Dyno web.2 is related to both errors, which happen to be backend connection timeouts. While Heroku Dynos are allowed to fail and automatically restart on a different server, this finding could warrant you manually restarting your Dynos to alleviate the issue.

Errors associated with Dyno web.2

Start logging with Heroku Dynos today

Ready to start logging from your Heroku app today? Check out our free log management tool.

Server monitoring is a requirement, not a choice. It is used for your entire software stack, web-based enterprise suites, custom applications, e-commerce sites, local area networks, etc. Unmonitored servers are lost opportunities for optimization, difficult to maintain, more unpredictable, and more prone to failure.

 

While it is very likely that your team has a log management and analysis initiative, it’s important to determine whether you are only responding to what the logs tell you about the past, or planning ahead based on the valuable log data you are monitoring and analyzing.

 

There are two basic approaches to server monitoring: passive monitoring and active monitoring. They are as much a state of mind as a process. And there are significant differences in the kinds of value each type of server monitoring provides; each type has its own advantages and disadvantages.

 

What is Passive Server Monitoring?

 

Passive server monitoring looks at real-world historical performance by monitoring actual log-ins, site hits, clicks, requests for data, and other server transactions. When it comes to addressing issues in the system, the team will review historical log data, and from there they analyze the logs to troubleshoot and pinpoint issues. This was previously done with a manual pull of logs. While this helps developers identify where issues are, using a powerful modern log analysis service to simply automate an existing process is a waste.

 

Passive monitoring only shows how your server handles existing conditions, but it may not give you much insight into how your server will deal with future ones. For example, one component of the system, say a database server, may be heading toward overload as its load keeps climbing. This will not be clear from server log data that has already been recorded, unless your team is willing to stare at a graph in real time, 24/7…which has nearly been the case in some NOC operations I have witnessed.


What is Active Server Monitoring?

 

The most effective way to get past these limits is by using active server monitoring. Active monitoring is the approach that leverages smart recognition algorithms to take current log data and use it to predict future states. This is done by some complex statistics (way over my head) that compare real-time data to previous conditions or past issues. For example, it leverages anomaly detection, steady-state analysis, and trending capabilities to predict that a workload is about to hit its maximum capacity, or that a sudden decrease in external network-received packets is a sign of public web degradation.
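To ground the idea without pretending this is how any particular product works, here is a deliberately simple Python sketch of a steady-state check on a metric derived from logs; the data and the three-sigma threshold are invented.

# Compare the latest per-minute count against the recent steady state
# and flag unusual deviations. Real systems use richer statistics.
from statistics import mean, stdev

history = [102, 98, 105, 101, 99, 103, 100, 97]  # recent per-minute request counts
latest = 164

baseline = mean(history)
spread = stdev(history)

if abs(latest - baseline) > 3 * spread:
    print(f"anomaly: latest={latest}, baseline={baseline:.1f}")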

 

Besides finding out what is possibly going to happen, active server monitoring also helps to avoid the time spent on log deep dives. Issues will sometimes still pass you by, and you will still need to take a deeper look, but because information is pushed to you, some of the work is already done, and you can avoid the log hunt.

 

Oh, and active monitoring can help the product and dev team from an architectural standpoint. If, for example, a key page is being accessed infrequently, or if a specific link to that page is rarely used, it may indicate a problem with the design of the referring page, or with one of the links leading to that page. A close look at the log can also tell you whether certain pages are being accessed more often than expected, which can be a sign that the information on those pages should be displayed or linked more prominently.

 

Any Form of Server Monitoring is Better Than None

 

Log analysis tools are the heart of both server monitoring approaches. Log analysis can indicate unusual activity which might slip past an already overloaded team. Another serious case is security. A series of attempted page hits that produce “page not found” or “access denied” errors, for example, could just be coming from a bad external link, or they could be signs of an attacker probing your site. HTTP requests that are pegging a server process could be a sign that a denial-of-service attack has begun.

 

It is hard to make the shift from passive to active monitoring. Why? Not because you and your team are not interested in thinking ahead, but because many operations are entrenched in existing processes that are also reactive. And sometimes teams are simply unaware that their tool can provide this type of functionality, until one day it does it automatically for you and you get a pleasant surprise.

 

Active server monitoring can mean the difference between preventing problems before they get a chance to happen and rushing to catch up with trouble after it happens. It is also the difference between a modern version of an old process and moving forward to a modern software delivery pipeline.

 

Ready to Make the Shift from Passive to Active Monitoring?

 

Sign up for our free log management tool today.

Merry HaXmas to you! Each year we mark the 12 Days of HaXmas with 12 blog posts on hacking-related topics and roundups from the year. This year, we’re highlighting some of the “gifts” we want to give back to the community. And while these gifts may not come wrapped with a bow, we hope you enjoy them.

 

Machine-generated log data is probably the simplest and one of the most used data sources for everyday use cases such as troubleshooting, monitoring, security investigations… the list goes on. Since log data records exactly what happens in your software over time, it is extremely useful for understanding what caused an outage or security vulnerability. With technologies like InsightOps, it can also be used to monitor systems in real time by looking at live log data, which can contain anything from resource usage information, to error rates, to user activity.

 

So, in short, when used for the right job, log data is extremely powerful… until it’s NOT!

 

When is it not useful to look at logs? When your logs don’t contain the data you need. How many times during an investigation have your logs contained enough information to point you in the right direction, but then fallen short of giving you the complete picture? Unfortunately, it is quite common to run out of road when looking at log data; if only you had recorded 'user logins’, or some other piece of data that was important with hindsight, you could figure out which user installed some malware and your investigation would be complete.

 

Log data, by its very nature, provides an incomplete view of your system, and while log and machine data is invaluable for troubleshooting, investigations and monitoring, it is generally at its most powerful when used in conjunction with other data sources. If you think about it, knowing exactly what to log up front to give you 100% code or system coverage is like trying to predict the future. Thus when problems arise or investigations are underway, you may not have the complete picture you need to identify the true root cause.

 

So our gift to you this HaXmas is the ability to generate log data on the fly through our new endpoint technology, InsightOps, which enables you to fill in any missing information during troubleshooting or investigations. InsightOps is pioneering the ability to generate log data on the fly by allowing end users to ask questions of their environment and returning answers in the form of logs. Essentially, it will allow you to create synthetic logs which can be combined with your traditional log data - giving you the complete picture! It also gives you all this information in one place (so there is no need to combine a bunch of different IT monitoring tools to get all the information you need).

 

InsightOps Live Questions

You will be able to ask anything from ‘what processes are running on every endpoint in my environment?’ to ‘what is the memory consumption of a given process or machine?’ In fact, our vision is to allow users to ask any question that might be relevant for their environment, such that you will never be left in the dark and never again have to say ‘if only I had logged that.’

 

InsightOps memory utilization question

Interested in trying InsightOps for yourself? Don’t forget to sign up to our beta here: https://www.rapid7.com/products/insightops/beta-request  

 

Happy HaXmas!

Our mission at Rapid7 is to solve complex security and IT challenges with simple, innovative solutions. Late last year Logentries joined the Rapid7 family to help drive this mission. The Logentries technology itself had been designed to reveal the power of log data to the world and had built a community of 50,000 users on the foundations of our real-time, easy-to-use yet powerful log management and analytics engine.

 

Today we are excited to announce InsightOps, the next generation of Logentries. InsightOps builds on the fundamental premise that in a world where systems are increasingly distributed, cloud-based and made up of connected/smart devices, log and machine data is inherently valuable to understand what is going on, be that from a performance perspective, troubleshooting customer issues or when investigating security threats.

 

However, InsightOps also builds on a second fundamental premise, which is that log data is very often an incomplete view of your system, and while log and machine data is invaluable for troubleshooting, investigations and monitoring, it is generally at its most powerful when used in conjunction with other data sources.

 

If you think about it, knowing exactly what to log up front to give you 100% code or system coverage is like trying to predict the future. Thus when problems arise or investigations are underway, you may not have the complete picture you need to identify the true root cause.

 

To solve this problem, InsightOps allows users to ask questions of specific endpoints in your environment. The endpoints return answers to these questions, in seconds, in the form of log events such that they can be correlated with your existing log data. I think of it as being able to generate 'synthetic logs' on the fly - logs designed to answer your questions as you investigate or need vital missing information. How often have you said during troubleshooting or an investigation, “I wish I had logged that…”? Now you can ask questions in real time to fill in the missing details, e.g. “who was the last person to have logged into this machine?”

 

Fig1. InsightOps Endpoint Question Examples

 

InsightOps combines both log data and endpoint information such that users can get a more complete understanding of their infrastructure and applications through a single solution. InsightOps will now deliver this IT data in one place and thus avoids the need for IT professionals to jump between several, disparate tools in order to get a more complete picture of their systems. By the way - this is the top pain point IT professionals have reported across lots and lots of conversations that we have had, and that we continue to have, with our large community of users.

 

Fig2. InsightOps Endpoint Data Example

 

To say I am excited about this is an understatement - I’ve been building and researching log analytics solutions for more than 10 years and I truly believe the power provided by combining logs and endpoints will be a serious game changer for anybody who utilizes log data as part of their day to day responsibilities -- be that for asset management, infrastructure monitoring, maintaining compliance or simply achieving greater visibility, awareness and control over your IT environment.

 

InsightOps will also be providing some awesome new capabilities beyond our new endpoint technology, including:

 

Visual Search: Visual search is an exciting new way of searching and analyzing trends in your log data by interacting with auto-generated graphs. InsightOps will automatically identify key trends in your logs and will visualize these when in visual search mode. You can interact with these to filter your logs allowing you to search and look for trends in your log data without having to write a single search query.

 

New Dashboards and Reporting: We have enhanced our dashboard technology making it easier to configure dashboards as well as providing a new, slicker look and feel. Dashboards can also be exported to our report manager where you can store and schedule reports, which can be used to provide a view of important trends e.g. reporting to management or for compliance reporting purposes.

 

Data Enrichment: Providing additional context and structure to log data can be invaluable for easier analysis and ultimately to drive more value from your log and machine data. InsightOps enhances your logs by enriching them in two ways: (1) by combining endpoint data with your traditional logs to provide additional context, and (2) by normalizing your logs into a common JSON structure that is easier for users to work with, run queries against, and build dashboards from.
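As a loose illustration of what normalization into a common JSON structure can look like (the line format and field names below are invented, not InsightOps' actual schema):

# Turn a plain-text log line into a structured JSON event.
import json
import re

line = "2017-05-24 10:02:11 ERROR payment-service timeout contacting gateway"

m = re.match(r"(\S+ \S+) (\w+) (\S+) (.*)", line)
event = {
    "timestamp": m.group(1),
    "level": m.group(2),
    "service": m.group(3),
    "message": m.group(4),
}
print(json.dumps(event))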

 

 

Fig3. InsightOps Visual Search

 

As always check it out and let us know what you think - we are super excited to lead the way into the next generation of log analytics technologies. You can apply for access to the InsightOps beta program here: https://www.rapid7.com/products/insightops/beta-request