Search is a fundamental requirement for anyone working with log files. When you have terabytes or petabytes of data, you need to find answers to questions fast. The search engine you choose is the cornerstone of any technology that helps you find the information needed to answer those questions. Although OpenSearch and Elasticsearch share similar beginnings, their current versions differ significantly. By understanding these similarities and differences, you can make an informed decision about which one works best for your log management search needs.
What is Elasticsearch?
Built on the Apache Lucene project, Elasticsearch has a highly scalable design that lets organizations do everything from simple search to complex data analysis. Users can search structured and unstructured data across various use cases, including:
- Log monitoring
- Infrastructure monitoring
- Application performance monitoring
- Endpoint security monitoring
This distributed search and analytics engine exposes RESTful APIs, making it easy for developers to integrate it into their applications. Because it can process data at cloud speed and scale, Elasticsearch also offers capabilities for implementing security analytics.
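As a rough illustration of that REST interface, the sketch below builds (but does not send) a search request using only the Python standard library. The host `localhost:9200`, the index name `logs`, and the `message` field are assumptions chosen for the example, not values from any particular deployment.

```python
import json
import urllib.request


def build_search_request(base_url, index, query_body):
    """Build an HTTP request for the _search endpoint without sending it."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, index, and field names.
request = build_search_request(
    "http://localhost:9200",
    "logs",
    {"query": {"match": {"message": "timeout"}}},
)
print(request.get_method(), request.full_url)

# To run the search against a live cluster, the request could be sent with:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```

Because the request is plain HTTP with a JSON body, the same shape can be reproduced from most languages or from a command-line HTTP client.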
As organizations adopted the Elasticsearch, Logstash, and Kibana (ELK) stack, they found that the open-source license enabled them to build strong analytics technologies. In 2021, Elastic, the company responsible for the ELK stack, shifted from an open source software (OSS) licensing model to a Server Side Public License (SSPL) model, changing the costs associated with the software. Elastic continues to offer a free product based on open source.
What is OpenSearch?
OpenSearch is an open source search and analytics suite consisting of the following:
- OpenSearch: data store and search engine
- OpenSearch Dashboards: visualization and user interface
- Data Prepper: server-side data collector
OpenSearch began as an open-source fork of Elasticsearch and Kibana and remains a community-driven technology. Developers can build products with it by using plug-ins for:
- Search
- Analytics
- Observability
- Security
- Machine learning
With OpenSearch, users can leverage the search and visualization capabilities to get more information from their log data, enabling the following use cases:
- Rapid search across high volumes of data
- Application monitoring
- Infrastructure monitoring
- Security monitoring and forensic analysis
- Operational health monitoring
As an open-source project, OpenSearch is transparent: developers have visibility into the code and control over how they use it.
OpenSearch and Elasticsearch Features
Elasticsearch and OpenSearch started from a similar foundation. However, as the two followed different development paths, their functionality diverged.
Common Functionalities
The common functionalities arise from both engines being built on Lucene, with OpenSearch forked from Elasticsearch 7.10.2, meaning that they share the following capabilities:
- Indexing
- Document merging
- Similarity scores used for search relevance
- Filter caches
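To make the idea of a similarity score concrete, here is a minimal sketch of BM25, the default similarity in recent Lucene versions. It is a simplified textbook form for illustration, not Lucene's exact implementation. The toy documents and whitespace tokenization are assumptions, and `k1 = 1.2` and `b = 0.75` are the commonly used default parameter values.

```python
import math


def bm25_score(term, doc, corpus, k1=1.2, b=0.75):
    """Score one term against one tokenized document using textbook BM25."""
    n_docs = len(corpus)
    docs_with_term = sum(1 for d in corpus if term in d)
    # Rare terms get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    tf = doc.count(term)
    avg_len = sum(len(d) for d in corpus) / n_docs
    # Term frequency saturates, and longer-than-average documents are penalized.
    norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
    return idf * tf * (k1 + 1) / norm


# Toy corpus of tokenized log lines (illustrative only).
corpus = [
    "disk error on node a".split(),
    "disk usage normal on node b after cleanup of old log files".split(),
    "login ok".split(),
]
short_doc, long_doc = corpus[0], corpus[1]

for name, doc in (("short", short_doc), ("long", long_doc)):
    print(name, round(bm25_score("disk", doc, corpus), 3))
```

Both documents contain "disk" once, but the shorter one scores higher because of length normalization. A term that appears in fewer documents, such as "error", scores higher than "disk" within the same document.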
Competing Functionalities
The fundamental differences lie in licensing and transparency. With Elasticsearch, some functionality is proprietary, and not all features are available in open source. With OpenSearch, the functionality is open-source code that is free to use and that people can build on to develop their own applications.
When comparing functionality, consider whether OpenSearch gives you for free the same capabilities that Elasticsearch places behind a paid license. For example, both offer the following features, but only OpenSearch lets you use them without paying a fee:
- Centralized user access controls, like LDAP or OpenID
- Cross-cluster replication
- IP filtering
- Configurable retention period
- Anomaly detection
- Java Database Connectivity (JDBC)
- Open Database Connectivity (ODBC)
Elasticsearch vs. OpenSearch – Query Types
Although both provide robust query options, you should understand the variations between the two.
Elasticsearch
Elasticsearch’s Query Domain Specific Language (DSL) is based on JavaScript Object Notation (JSON). By default, Elasticsearch sorts matching results according to its internally derived relevance score.
Elasticsearch offers the following query types:
- Compound queries: wrapping other compound or leaf queries to combine results/scores, change behaviors, or switch to filtering context
- Full text queries: search analyzed text fields, with the query string run through the same analyzer that was applied to the field at index time
- Geo queries: queries that support latitude/longitude pairs and fields for various geometric shapes
- Shape queries: search shapes indexed in an arbitrary two-dimensional Cartesian coordinate system
- Joining queries: nested, has_child, and has_parent queries that relate nested or parent/child documents in a distributed system
- Match all/Match none: match_all returns every document, and match_none returns no documents
- Span queries: low-level positional query to control the order and proximity of specified terms
- Specialized queries: various query types including distance_feature, more_like_this, percolate, rank_feature, script, script_score, wrapper, pinned, and rule
- Term level queries: search structured data based on precise values, like date ranges or IP addresses
- Text expansion: convert query text into a list of token-weight pairs
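As an illustration of how these query types combine, the sketch below assembles a compound `bool` query as a Python dictionary and serializes it to the JSON that the Query DSL expects. The field names `message`, `level`, and `@timestamp`, along with the helper `build_error_search`, are hypothetical and chosen only for this example.

```python
import json


def build_error_search(text, since="now-15m", size=20):
    """Return a Query DSL body combining a scored match with unscored filters."""
    return {
        "query": {
            "bool": {
                # "must" clauses run in query context and contribute to the relevance score.
                "must": [{"match": {"message": text}}],
                # "filter" clauses run in filter context: yes/no matching, no scoring.
                "filter": [
                    {"term": {"level": "error"}},
                    {"range": {"@timestamp": {"gte": since}}},
                ],
            }
        },
        "size": size,
    }


body = build_error_search("connection timeout")
print(json.dumps(body, indent=2))
```

Since no explicit sort is given, matching documents would come back ordered by relevance score. Putting the exact-value and date-range conditions under `filter` keeps them out of that score.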
OpenSearch
The OpenSearch Query DSL is a flexible language with a JSON interface. It offers many of the same query types as Elasticsearch.
For leaf queries, where you search for a specified value within a field or fields, the query types available are:
- Term-level: search an index for documents containing an exact search term
  - term: an exact term in a specific field
  - terms: one or more terms in a specific field
  - terms_set: a minimum number of matching terms in a specific field
  - ids: documents by document ID
  - range: field values in a specific range
  - prefix: terms that begin with a defined prefix
  - exists: any indexed value in a specific field
  - fuzzy: terms similar to the search term within a defined edit distance
  - wildcard: terms that match a wildcard pattern
  - regexp: terms that match a regular expression
- Full-text: used with text documents and analyzed text fields, including
  - intervals: control over proximity and order of matching terms
  - match: for fuzzy matching or proximity searches
  - match_bool_prefix: terms in any position, with the last term treated as a prefix
  - match_phrase: whole phrases
  - match_phrase_prefix: terms as a whole phrase, with the last term treated as a prefix
  - multi_match: queries against multiple fields
  - query_string: strict syntax for Boolean conditions and multi-field search within a single query string
- Geographic and xy: fields containing points and shapes on a map or coordinate plane, like geospatial data or two-dimensional coordinate data
  - Geo-bounding box: geopoint field values within a bounding box
  - Geodistance: geopoints within a specified distance from a provided geopoint
  - Geopolygon: geopoints within a polygon
  - Geoshape: shapes with one of four spatial relations to a provided shape: intersects, disjoint, within, or contains
- Joining: search nested fields or return parent and child documents with the nested, has_child, has_parent, and parent_id queries
- Span: low-level positional queries to control the order and proximity of specified terms
- Specialized queries: other query types not aligned to those previously listed
  - distance_feature: score based on the distance between an origin and a document’s date, date_nanos, or geo_point fields
  - more_like_this: documents similar to provided text, a document, or a collection of documents
  - neural: vector field searches used with machine learning (ML) models
  - neural_sparse: vector field search that runs queries with reduced memory and CPU resources
  - percolate: queries stored as documents that match a provided document
  - rank_feature: score calculated based on numeric features
  - script: script used as a filter
  - script_score: custom score calculated using a script
  - wrapper: accepts queries as JSON or YAML strings
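A few of the leaf query types above can be sketched as small Python helpers that return Query DSL fragments. The helper names, field names, and values here are hypothetical; only the query keywords (`term`, `range`, `match_phrase_prefix`, `geo_distance`) come from the Query DSL itself.

```python
import json


def term_query(field, value):
    """Term-level: exact value in a field (the value is not analyzed)."""
    return {"term": {field: {"value": value}}}


def range_query(field, gte=None, lte=None):
    """Term-level: field values inside a range; omitted bounds stay open."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"range": {field: bounds}}


def match_phrase_prefix_query(field, text):
    """Full-text: whole phrase, with the last term treated as a prefix."""
    return {"match_phrase_prefix": {field: {"query": text}}}


def geo_distance_query(field, lat, lon, distance):
    """Geographic: geopoints within `distance` of the given point."""
    return {"geo_distance": {"distance": distance, field: {"lat": lat, "lon": lon}}}


# Hypothetical field names and values.
leaf_queries = [
    term_query("http.status", 503),
    range_query("@timestamp", gte="now-1h"),
    match_phrase_prefix_query("message", "connection ref"),
    geo_distance_query("client.location", 52.52, 13.40, "50km"),
]
for leaf in leaf_queries:
    # Each leaf query can be sent on its own as the body of a search request.
    print(json.dumps({"query": leaf}))
```

Keeping each leaf query in its own function makes it straightforward to reuse the fragments inside compound queries later.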
Graylog: Support for OpenSearch
At Graylog, we believe that software should be open and accessible to all. You shouldn’t have to pay to analyze your own data, no matter how much you have. With Graylog 5.0 we added support for OpenSearch 2.x versions to create a sustainable path for Graylog Open so that we can continue to provide an open-source centralized log management tool for customers that need it. Further, for organizations that want to scale their Graylog deployments in the future and gain the value of Graylog Enterprise and Graylog Security, OpenSearch makes these upgrades easier and is a required component.
With Graylog, you can build sophisticated queries in minutes, searching terabytes of data in milliseconds so that you get the answers you need at lightning-fast speed. With the ability to customize dashboards, you can quickly spot trends and find anomalies, giving cross-functional teams insights into security, application, and IT infrastructure health.
Contact us today to learn how Graylog can help you optimize your log data for real-time visibility.