In the year since we ported osquery to Windows, the operating system instrumentation and endpoint monitoring agent has attracted a great deal of attention in the open-source community and beyond. In fact, it recently received the 2017 O’Reilly Defender Award for best project.
Many large, leading tech firms have deployed osquery for highly customizable, cost-effective endpoint monitoring. Their choice, and their subsequent satisfaction, fuel others’ curiosity about making the switch.
But deploying new software to your company’s entire fleet is not a decision to be made lightly. That’s why we sought to take the pulse of the osquery community – to help current and potential users know what to expect. This marks the start of a four-part blog series that sheds light on the current state of osquery, its shortcomings and opportunities for improvement.
Hopefully, the series will help those of you who are sitting on the fence decide if and how to deploy the platform in your companies.
For our research, we interviewed teams of osquery users at five major tech firms. We asked them:
- How is osquery deployed and used currently?
- What benefits are your team seeing?
- What have been your biggest pain points about using osquery?
- What new features would you most like to see added?
This post will focus on current use of osquery and its benefits.
How are companies using osquery today?
osquery’s affordability, flexibility, and cross-platform compatibility have quickly earned it a place in the endpoint monitoring toolkits of top tech firms. Since its debut in October 2014, more than 1,000 users from over 70 companies have engaged with the development community through its Slack channel and GitHub repo. In August, osquery developers at Facebook began holding biweekly office hours to discuss issues, new features, and design direction.
The user base has grown thanks to several recent developments. Since contributors like Trail of Bits and Facebook extended osquery to more operating systems (Windows and FreeBSD), a broader range of organizations can now install it on a greater share of their endpoints. Several supplementary tools, such as Doorman, Kolide, and Uptycs, have emerged to help users deploy and manage osquery. Support for monitoring event-based logs (e.g., process auditing and file integrity monitoring) has further increased its value for incident response. Each of these developments has drawn more organizations with unique data and infrastructure needs to osquery, sometimes in place of competing commercial products.
All the companies surveyed used osquery for high-performance, flexible monitoring of their fleets. Interviewees expressed particular interest in just-in-time incident response, including initial malware detection and tracing how an infection propagates.
Many teams used osquery alongside other open-source and commercial technologies. Some used collection and aggregation services such as Splunk to mine the data osquery collected. One innovative team built incident alerting on top of osquery by piping its log data into Elasticsearch and using ElastAlert to automatically generate Jira tickets when an anomaly was detected. Most of the companies interviewed expected to phase out some paid services, especially costly suites (e.g., Carbon Black, Tripwire, Red Cloak), in favor of osquery, either in its current form or once new features are added.
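The core of an alerting pipeline like the one above can be sketched in a few lines. The following Python sketch is not that team's implementation and does not use Elasticsearch or ElastAlert. It assumes osquery's JSON differential results log, where each line is one JSON object carrying the scheduled query's `name`, the `hostIdentifier`, an `action` (`"added"` or `"removed"`), and the row's `columns`. The function name `alerts_from_results`, the watched query name, and the sample log lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def alerts_from_results(lines: Iterable[str], watched: set[str]) -> Iterator[dict]:
    """Yield a minimal alert record for every row that osquery reports as
    newly "added" by one of the watched scheduled queries.

    Hypothetical helper: a stand-in for the aggregation/alerting layer
    (e.g., an ElastAlert rule), not part of osquery itself.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log
        event = json.loads(line)
        # Differential results record whether a row appeared or disappeared
        # since the query's previous run; only new rows raise an alert here.
        if event.get("action") == "added" and event.get("name") in watched:
            yield {
                "host": event.get("hostIdentifier"),
                "query": event["name"],
                "row": event.get("columns", {}),
            }


# Hypothetical sample log lines in osquery's differential results format.
sample = [
    '{"name": "deleted_binaries", "hostIdentifier": "host-a", "action": "added", "columns": {"pid": "4242", "name": "example"}}',
    '{"name": "deleted_binaries", "hostIdentifier": "host-a", "action": "removed", "columns": {"pid": "4242", "name": "example"}}',
]

for alert in alerts_from_results(sample, {"deleted_binaries"}):
    print(alert)  # in a real pipeline, this is where a ticket would be opened
```

In practice, this filtering would more likely live in the aggregation layer than in a standalone script; the sketch only shows how little logic sits between osquery's result logs and an actionable alert.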
Deployment maturity varied widely. One company was still testing osquery and setting up infrastructure. Others had osquery deployed on most or all endpoints in their fleets, including one team that reported plans to roll out to 17,500 machines. Three of the five companies we interviewed had osquery deployed on production servers. However, one of these companies, despite having installed osquery on production machines, rarely queried those endpoints because of concerns about osquery’s reliability and scalability. Runaway queries on production fleets were a major concern for every company interviewed, though none reported a production performance incident.
Strategies for Deployment
Most companies used Chef or Puppet to deploy, configure, and manage osquery on their endpoints. One company used the fleet management tool Doorman to maintain its remote endpoints, which removed the need for separate aggregation tools. Many teams relied on osquery’s TLS documentation to write their own deployment tools, which gave them independence from third-party applications and the freedom to tailor features and configurations to their own environments.
Multiple teams rolled out osquery in stages as a precaution. One team guarded against performance issues by running osquery in cgroups with CPU and memory limits.
Security teams were responsible for initiating the installation of osquery across the fleet. While most did so with buy-in and collaboration from other teams, some installed it covertly. One team reported that a performance incident had mildly tarnished osquery’s reputation within their organization. Some of the security teams we interviewed worked with other internal teams, such as Data Analytics and Machine Learning, to mine log data and generate actionable insights.
Benefits of osquery
Teams reported that they liked osquery better than other fleet management tools for a variety of reasons, including that it:
- was simpler to use,
- was more customizable, and
- exposed endpoint data they had never before had access to.
For teams exploring alternatives to their current tools, the open-source technology helped them avoid the bureaucratic friction of buying new commercial security solutions. For one team, osquery also fit into a growing preference for home-built software within their company.
In its current state, osquery appeared to be most powerful when used as a flexible building block within a suite of tools. Where other endpoint monitoring tools expose only selected log data, osquery provided simple, portable access to a far richer variety of endpoint data. For teams who want to roll their own solutions, or who can’t afford expensive, comprehensive commercial suites, osquery was the best option.
How it compares to other endpoint monitoring solutions
Our interviewees mentioned having used or evaluated some alternative endpoint monitoring solutions in addition to osquery. We list the highlights of their comments below. While osquery did present a more flexible, affordable solution overall, some paid commercial solutions still offer distinct advantages, especially in integrating automated prevention and incident response. However, as the development community continues to build features in osquery, the capability gap appears to be closing.
OSSEC is an open-source system monitoring and management platform. It features essential incident response tools such as file integrity checking, log monitoring, rootkit detection, and automatic incident response. However, OSSEC lacks osquery’s ability to query hosts on different platforms (Windows, BSD, etc.) with one universal syntax. It is also less flexible: osquery users can quickly write new queries in familiar SQL, while OSSEC requires cumbersome log file decoders and deliberate ahead-of-time configuration. osquery’s overall simplicity and the ongoing development of community-contributed tables are often cited as advantages over OSSEC.
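To illustrate what "a new query in SQL" means in practice, here is a minimal Python sketch that emits an osquery-style JSON configuration with one scheduled query. The `schedule` section with per-query `query` and `interval` (seconds) keys follows osquery's configuration format, and `on_disk` is a column of osquery's `processes` table (0 when the process's executable no longer exists on disk, a common malware indicator). The helper `build_osquery_config`, the query name `deleted_binaries`, and the 300-second interval are hypothetical choices for illustration, not settings reported by any team we interviewed.

```python
import json

# Hypothetical scheduled query: running processes whose executable has been
# deleted from disk (processes.on_disk = 0).
DELETED_BINARY_QUERY = "SELECT pid, name, path FROM processes WHERE on_disk = 0;"


def build_osquery_config(queries: dict[str, tuple[str, int]]) -> str:
    """Return a JSON osquery config whose "schedule" section runs each
    named SQL query every `interval` seconds.

    Hypothetical helper: osquery reads the resulting JSON, but this
    function is not part of osquery.
    """
    schedule = {
        name: {"query": sql, "interval": interval}
        for name, (sql, interval) in queries.items()
    }
    return json.dumps({"schedule": schedule}, indent=2)


if __name__ == "__main__":
    # Adding another check is just another (name, SQL, interval) entry;
    # there is no log decoder to write.
    print(build_osquery_config({"deleted_binaries": (DELETED_BINARY_QUERY, 300)}))
```

The same SQL string can also be run ad hoc in the interactive `osqueryi` shell while developing a check, then moved into the schedule once it proves useful.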
Sysdig provides a commercial container performance monitoring tool and an open-source container troubleshooting tool. While osquery is used for security monitoring and malicious incident detection, Sysdig’s tools work with real-time data streams (network or file I/O, or errors in running processes) and are best suited to performance monitoring. Despite significant recent gains in osquery’s container support, including new Docker tables that allow instrumentation at the host level, Sysdig retains the advantage for performance-sensitive container introspection. Although osquery can run inside containers, our respondents indicated that the current version isn’t yet built to support every deployment cleanly. One user reported avoiding deploying osquery on their Docker-based production fleet for this reason.
Carbon Black is one of the industry’s leading malware detection, defense, and response packages. By contrast, osquery on its own provides only detection capabilities. However, when combined with alerting systems such as PagerDuty or ElastAlert, osquery can become a powerful incident response tool. Finally, interviewees considering Carbon Black remarked on its high price tag and wanted to minimize their use of it.
Bromium vSentry provides impact containment and introspection powered by micro-virtualization and supported by comprehensive dashboards. While companies can use tools like Kolide and Uptycs to get visualizations of osquery data similar to Bromium’s dashboards, Bromium’s ability to quarantine attacks through micro-virtualization remains an advantage. However, Bromium’s introspection is significantly less flexible and expansive: it can only access data about the isolated applications it targets. osquery, by contrast, can be configured to gather data from a growing number of operating system-level logs, events, and processes.
Red Cloak provides automated threat detection as part of a service offering from Dell SecureWorks. It has two advantages over osquery: first, it provides an expert team to help with analysis and response; second, it aggregates endpoint information from all customers to inform and improve its detection and response. For organizations focused solely on breach response, Red Cloak may be worth its cost. However, for IT teams who want direct access to a variety of endpoint data, osquery is a better and cheaper solution.
osquery fills a need on many corporate security teams; its transparency and flexibility make it a great option for building bespoke endpoint monitoring solutions. Without any modification, it exposes all the endpoint data an analysis engine needs. We expect (and hope) to hear from more security teams that multiply osquery’s power by combining it with their incident response toolkits.
That will happen faster if teams share their deployment techniques and lessons learned. Much of the discussion on Slack and GitHub focuses on codebase issues, and too few users openly discuss innovative implementation strategies. But that isn’t the only thing holding back osquery’s adoption.
The second post in this series will focus on users’ pain points. If you use osquery today and have pain points you’d like to add to our research, please let us know! We’d love to hear from you.
How does your experience with osquery compare to that of the teams mentioned in this post? Do you have other workarounds, deployment strategies, or features you’d like to see built in future releases? Tell us! Help us lead the way in improving osquery’s development and implementation.