Welcome to the third post in our series about osquery. So far, we’ve described how five enterprise security teams use osquery and reviewed the issues they’ve encountered. For our third post, we focus on the future of osquery. We asked users, “What do you wish osquery could do?” The answers we received ranged from small requests to huge advancements that could disrupt the incident-response tool market. Let’s dive into those ‘super features’ first.
osquery super features
Some users’ suggestions could fundamentally expand osquery’s role from an incident detection tool, potentially allowing it to steal significant market share from commercial tools in doing prevention and response (we listed a few of these in our first blog post). This would be a big deal. A free and open source tool that gives security teams access to incident response abilities normally reserved for customers of expensive paid services would be a windfall for the community. It could democratize fleet security and enhance the entire community’s defence against attackers. Here are the features that could take osquery to the next level:
Writable access to endpoints
What it is: Currently, osquery is limited to read-only access on endpoints. Such access allows the program to detect and report changes in the operating systems it monitors. Write-access via an osquery extension would allow it to edit registries in the operating system and change the way endpoints perform. It could use this access to enforce security policies throughout the fleet.
Why it would be amazing: Write-access would elevate osquery from a detection tool to the domain of prevention. Rather than simply observing system issues with osquery, write-access would afford you the ability to harden the system right from the SQL interface. Application whitelisting and enforcement, managing licenses, partitioning firewall settings, and more could all be available.
How we could build it: If not built correctly, write-access in osquery could cause more harm than good. Write-access goes beyond the scope of osquery core. Some current users are only permitted to deploy osquery throughout their fleet because of its limited read-only permissions. Granting write-access through osquery core would bring heightened security risks as well as potential for system disruption. The right way to implement this would be to make it available to extensions that request the functionality during initialization and minimize the impact this feature has on the core.
IRL Proof: In fact, we have a pull request waiting on approval that would support write-access through extensions! The code enables write-permissions for extensions but also blocks write-permissions for tables built into core.
We built this feature in support of a client who wanted to block malicious IP addresses, domains and ports for both preventative and reactive use-cases. Once this code is committed, our clients will be able to download our osquery firewall extension to use osquery to partition firewall settings throughout their fleets.
What it is: If osquery reads a log entry that indicates an attack, it could automatically respond with an action such as quarantining the affected endpoint(s). This super feature would add automated prevention and incident response to osquery’s capabilities.
Why it would be amazing: This would elevate osquery’s capabilities to those of commercial vulnerability detection/response tools, but it would be transparent and customizable. Defense teams could evaluate, customize, and match osquery’s incident-response capabilities to their companies’ needs, as a stand-alone solution or as a complement to another more generic response suite.
How we could build it: Automated event response for osquery could be built flexibly to allow security teams to define their own indicators of incidents and their preferred reactions. Users could select from known updated databases: URL reputation via VirusTotal, file reputation via ReversingLabs, IP reputation of the remote addresses of active connections via OpenDNS, etc. The user could pick the type of matching criteria (e.g., exact, partial, particular patterns, etc.), and prescribe a response such as ramping up logging frequency, adding an associated malicious ID to a firewall block list, or calling an external program to take an action. As an additional option, event triggering that sends logs to an external analysis tool could provide more sophisticated response without damaging endpoint performance.
IRL Proof: Not only did multiple interviewees long for this feature; some teams have started to build rudimentary versions of it. As discussed in “How are teams currently using osquery?”, we spoke with one team who built incident alerting with osquery by piping log data into ElasticSearch and auto-generated Jira tickets through ElastAlert upon anomaly detection. This example doesn’t demonstrate full response capability, but it illustrates how useful just-in-time business process reaction to incidents is possible with osquery. If osquery can monitor event-driven logs (FIM, process auditing, etc), trigger an action based on detection of a certain pattern, and administer a protective response, it can provide an effective endpoint protection platform.
Technical debt overhaul
What it is: Many open source projects carry ‘technical debt.’ That is, some of the code engineering is built to be effective for short-term goals but isn’t suitable for long-term program architecture. A distributed developer community each enhancing the technology for slightly different requirement exacerbates this problem. Solving this problem requires costly coordination and effort from multiple community members to rebuild and standardize the system.
Why it would be amazing: Decreasing osquery’s technical debt would upgrade the program to a standard that’s adoptable to a significantly wider range of security teams. Users in our osquery pain points research cited performance effects and reliability among organizational leadership’s top concerns for adopting osquery. Ultimately, the teams we interviewed won the argument, but there are likely many teams who didn’t get the green light on using osquery.
How we could build it: Tackling technical debt is hard enough within an organization. It’s liable to be even harder in a distributed community. Unless developers have a specific motivation for tackling very difficult high-value inefficiencies, the natural reward for closing an issue biases developers toward smaller efforts. To combat this, leaders in the community could dump and sort all technical debt issues along a matrix of value and time, leave all high-value/low-time issues for individual open source developers, and pool community resources to resolve harder problems as full-fledged development projects.
IRL Proof: We know that pooling community resources to tackle technical debt works. We’ve been doing it for over a year. Trail of Bits has been commissioned by multiple companies to build features and fixes too big for the open source community. We’ve leveraged this model to port osquery to Windows, enhance FIM and process auditing, and much more that we’re excited to share with the public over the coming months. Often, multiple clients are interested in building the same things. We’re able to pool resources to make the project less expensive for everyone involved while the entire community benefits.
Other features users want
osquery shows considerable potential to grow beyond endpoint monitoring. However, the enterprise security teams and developers whom we interviewed say that the open source tool has room for improvement. Here are some of the other requests we heard from users:
- Guardrails & rules for queries: Right now, a malformed query or practice can hamper the user’s workflow. Interviewees wanted guidance on targeting the correct data, querying at correct intervals, gathering from recommended tables, and customized recommendations for different environments.
- Enhance Deployment Options: Users sought better tools for deploying throughout fleets and keeping these implementations updated. Beyond recommended QueryPacks, administrators wanted to be able to define and select platform-specific configurations of osquery across multi-platform endpoints. Automatically detecting and deploying configurations for unique systems and software was another desired feature.
- Integrated Testing, Debugging, and Diagnostics: In addition to the current debugging tools, users wanted more resources for testing and diagnosing issues. New tools should help improve reliability and predictability, avoid performance issues, and make osquery easier to use.
- Enhanced Event-Driven Data Collection: osquery has support for event-based data collection through FIM, Process Auditing, and other tables. However, these data sources suffer from logging implementation issues and are not supported on all platforms. Better event-handling configurations, published best practices, and guardrails for gathering data would be a great help.
- Enhanced Performance Features: Users want osquery to do more with fewer resources. This would either lead to overall performance enhancements, or allow osquery to operate on endpoints with low resource profiles or mission-critical performance requirements.
- Better Configuration Management: Enhancements such as custom tables and osqueryd scheduled queries for differing endpoint environments would make osquery easier to deploy and maintain on a growing fleet.
- Support for Offline Endpoint Logging: Users reported a desire for forensic data availability to support remote endpoints. This would require offline endpoints to store data locally –- including storage of failed queries –- and push to the server upon reconnection
- Support for Common Platforms: Facebook built osquery for its fleet of macOS- and Linux-based endpoints. PC sysadmins were out of luck until our Windows port last year. Support for other operating systems has been growing steadily thanks to the development community’s efforts. Nevertheless, there are still limitations. Think of this as one umbrella feature request: support for all features on all operating systems.
The list keeps growing
Unfortunately for current and prospective osquery users, Facebook can’t satisfy all of these requests. They’ve shared a tremendous gift by open sourcing osquery. Now it’s up to the community to move the platform forward.
Good news: none of these feature requests are unfeasible. The custom engineering is just uneconomical for individual organizations to invest in.
In the final post in this series, we’ll propose a strategy for osquery users to share the cost of development. Companies that would benefit could pool resources and collectively target specific features.
This would accelerate the rate at which companies could deprecate other full-suite tools that are more expensive, less flexible and less transparent.
If any of these items resonate with your team’s needs, or if you use osquery currently and have another request to add to the list, please let us know.
Pingback: Manage Santa within osquery | Trail of Bits Blog
Pingback: QueryCon 2018: our talks and takeaways | Trail of Bits Blog
Pingback: Announcing the Trail of Bits osquery support group | Trail of Bits Blog
We have built an osquery saas platform with integrated threat feeds and vulnerable databases, virustotal integration, etc. Have a look, https://sttor.com/