March 3, 2020 02:05 pm GMT

Getting Started with Elasticsearch and Ruby

Recently, DEV started migrating from Algolia to Elasticsearch. Since I am often asked what the best way to get started with Elasticsearch is, I figured I would share how we have been making the switch. Hopefully, you can use this post as a template if you decide to implement Elasticsearch in your Rails or Ruby app in the future.

Before I get started, I want to note that this article assumes you understand the basics of Elasticsearch. You should be familiar with the terms index, mappings, and documents, since we will be covering those. If you need a refresher or want to learn how Elasticsearch works, I highly recommend the Elastic docs!

1) Install Elasticsearch

Well, it isn't quite that easy. Before you start hacking away at your code, you need to get Elasticsearch up and running so you can talk to it. There are a million different ways to do this depending on your environment, so I am going to point you to the Installing Elasticsearch docs for getting started.

Many of us at DEV use Macs and ended up installing from archive since the Homebrew install seemed to be broken for the majority of us. Once you have Elasticsearch up and running the next step is to get your code talking to it.

2) Install the Elasticsearch Ruby gem

Related Pull Request

The Elasticsearch Ruby gem installs just like any other gem: all you have to do is add a line to your Gemfile.

gem "elasticsearch", "~> 7.4" 

One important thing to note is which version of Elasticsearch you are planning on using. The gem versions are numbered to match the Elasticsearch versions. If you are on Elasticsearch version 5, then you will want to use the latest version 5 release of the gem.
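
As a rough sketch of that rule, the helpers below derive a pessimistic Gemfile constraint from a server version and check a gem version against it. Both method names are hypothetical and only use RubyGems' own version classes; they are not part of the Elasticsearch gem.

```ruby
require "rubygems"

# Hypothetical helper: builds a pessimistic Gemfile constraint that pins the
# gem to the same major version as the Elasticsearch server.
def gem_constraint_for(server_version)
  major = Gem::Version.new(server_version).segments.first
  "~> #{major}.0"
end

# Hypothetical helper: true when the gem version satisfies the constraint
# derived from the server version.
def gem_matches_server?(gem_version, server_version)
  requirement = Gem::Requirement.new(gem_constraint_for(server_version))
  requirement.satisfied_by?(Gem::Version.new(gem_version))
end

puts gem_constraint_for("7.5.2")            # ~> 7.0
puts gem_matches_server?("7.4.0", "7.5.2")  # true
puts gem_matches_server?("5.0.5", "7.5.2")  # false
```

A constraint like "~> 7.4", as used above, is simply a tighter version of the same idea.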

Another thing you might notice in the pull request that I reference above is that we also installed the Typhoeus gem.

gem "typhoeus", "~> 1.3.1"

The Elasticsearch gem docs suggest using an HTTP library such as Typhoeus for optimal performance because it supports persistent ("keep-alive") connections.

Once the gem has been successfully installed, you need to create a client within your code to talk to Elasticsearch. We chose to do this through an initializer file, config/initializers/elasticsearch.rb, and it looks like this:

require "elasticsearch"

SearchClient = Elasticsearch::Client.new(
  url: ApplicationConfig["ELASTICSEARCH_URL"],
  retry_on_failure: 5,
  request_timeout: 30,
  adapter: :typhoeus,
  log: Rails.env.development?,
)

Let's go over the arguments we are passing in here.

  • url: (required) The address the client sends its requests to. You communicate with Elasticsearch over HTTP, so the client needs a URL. In development, by default, this will be http://localhost:9200

The rest of the arguments are optional.

  • retry_on_failure: The number of times the client will retry a failed request before it gives up.
  • request_timeout: The time limit, in seconds, for a request to get a response. With the value above, any request that takes over 30 seconds will time out.
  • adapter: The Ruby HTTP library we want to use to make these requests. As stated above, ideally you want to use Typhoeus because of its support for keep-alive connections.
  • log: Determines whether the client outputs logs for each request you make.
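
If you don't have something like our ApplicationConfig, one possible approach is to build the same options from the environment. This is only a sketch: the helper name is hypothetical, and the ELASTICSEARCH_URL variable and the localhost fallback are assumptions rather than anything the gem requires.

```ruby
# Hypothetical helper: builds the options hash for Elasticsearch::Client.new.
# The environment variable name and the localhost fallback are assumptions.
def search_client_options(env: ENV, development: false)
  {
    url: env.fetch("ELASTICSEARCH_URL", "http://localhost:9200"),
    retry_on_failure: 5,
    request_timeout: 30,
    adapter: :typhoeus,
    log: development,
  }
end

options = search_client_options(env: {}, development: true)
puts options[:url]

# In an initializer, the hash would be passed to the client:
#   SearchClient = Elasticsearch::Client.new(**search_client_options)
```

Keeping the options in a plain hash also makes them easy to inspect from a console before a client is ever created.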

There are many other options you can pass to your client but these are the basic ones that we use. At this point, some people might be inclined to start writing code to throw things in Elasticsearch. I'm not one of those people.

Whenever I add a new external dependency like a database I like to deploy the interface for using it, in this case, the gem, by itself. This way you can deploy and then jump into a console and make sure everything is hooked up correctly before you start using it in your code. If there are any configuration tweaks that need to be made then you can make those without having to worry about the code breaking.

To validate that you have the cluster hooked up correctly you can jump into a Rails console and issue this command with your new SearchClient:

[1] pry(main)> SearchClient.info
ETHON: Libcurl initialized
ETHON: performed EASY effective_url=http://localhost:9200/ response_code=200 return_code=ok total_time=0.392646
=> {"name"=>"mollys_computer",
 "cluster_name"=>"elasticsearch",
 "cluster_uuid"=>"123abc456",
 "version"=>
  {"number"=>"7.5.2",
   "build_flavor"=>"default",
   "build_type"=>"tar",
   "build_hash"=>"8bec50e1e0ad29dad5653712cf3bb580cd1afcdf",
   "build_date"=>"2020-01-15T12:11:52.313576Z",
   "build_snapshot"=>false,
   "lucene_version"=>"8.3.0",
   "minimum_wire_compatibility_version"=>"6.8.0",
   "minimum_index_compatibility_version"=>"6.0.0-beta1"},
 "tagline"=>"You Know, for Search"}

If you get a 200 response back like the one above, then you know everything is configured correctly. With the gem set up correctly, the next step is to start using Elasticsearch, and we are going to do that by making our first index!
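
If you want to run that check from a script or a deploy task instead of a console, a small wrapper might look like the sketch below. The connection_summary helper and the FakeClient stand-in are hypothetical; the only client call assumed here is info, the same one used in the console above.

```ruby
# Hypothetical helper: asks the cluster for its info and returns a one-line
# summary, or a failure message if the request raises.
def connection_summary(client)
  info = client.info
  version = info.dig("version", "number")
  "connected to #{info['cluster_name']} (Elasticsearch #{version})"
rescue StandardError => e
  "connection failed: #{e.class}: #{e.message}"
end

# Stand-in for the real client so the sketch runs without a cluster.
FakeClient = Struct.new(:response) do
  def info
    response.is_a?(Exception) ? raise(response) : response
  end
end

ok = FakeClient.new({ "cluster_name" => "elasticsearch", "version" => { "number" => "7.5.2" } })
down = FakeClient.new(RuntimeError.new("connection refused"))

puts connection_summary(ok)
puts connection_summary(down)
```

In a real app you would pass SearchClient instead of the stand-in.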

3) Setting Up the Tag Index

Related Pull Request

For this example, I am going to show you how we set up our very simple Tag index. The capabilities of Elasticsearch are tremendous but I want to keep it simple with this example so you have a good base to get you started.

To start, we need to do a couple of different things. First, we need to create our index.

index_settings = { number_of_shards: 1, number_of_replicas: 0 }
settings = { settings: { index: index_settings } }
SearchClient.indices.create(index: "tags_development", body: settings)

Here, we are creating a simple index with 1 shard and 0 replicas. In development, you will often only have a single node, so keeping indexes to a single shard is usually the way to go. However, in production, depending on your data size and number of requests you are making, you may want more shards for your index.

You can run the above command in a console to see it in action. A successful response will look like this (in this run the index was named "molly"):

[37] pry(main)> SearchClient.indices.create(index: "molly", body: settings)
ETHON: performed EASY effective_url=http://localhost:9200/molly response_code=200 return_code=ok total_time=0.65619
2020-02-24 16:00:54 -0500: PUT http://localhost:9200/molly [status:200, request:0.660s, query:n/a]
{"acknowledged":true,"shards_acknowledged":true,"index":"molly"}
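
Since the right number of shards and replicas differs between development and production, one option is to keep the settings per environment. The sketch below is an illustration only: the constant, the helper, and especially the production numbers are assumptions, so size them for your own data and traffic.

```ruby
# Hypothetical per-environment shard settings. The production numbers are
# placeholders, not a recommendation.
INDEX_SETTINGS = {
  "development" => { number_of_shards: 1, number_of_replicas: 0 },
  "test" => { number_of_shards: 1, number_of_replicas: 0 },
  "production" => { number_of_shards: 3, number_of_replicas: 1 },
}.freeze

# Builds the arguments for SearchClient.indices.create for one environment.
def create_index_arguments(base_name, environment)
  index_settings = INDEX_SETTINGS.fetch(environment)
  {
    index: "#{base_name}_#{environment}",
    body: { settings: { index: index_settings } },
  }
end

arguments = create_index_arguments("tags", "development")
puts arguments[:index]

# With a running cluster: SearchClient.indices.create(**arguments)
```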

Once your index is created, the next thing you will need to do is define your mappings. This is where you define the fields you want to store and search on.

I HIGHLY suggest when you are working with Elasticsearch for integrated search within an application that you set your mapping dynamic value to strict. Setting the value to strict means that if you try to index a field that is not in your mappings Elasticsearch will raise an error. When doing integrated search you want to keep your documents lean and mean and this ensures that you don't end up with any surprise fields from possible indexing bugs.
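
To get a feel for what strict protects you from, here is a plain-Ruby sketch that lists the fields of a document that are not in the mappings. The unmapped_fields helper is hypothetical and runs entirely on the client side; with strict set, Elasticsearch itself is what rejects such a document.

```ruby
# Hypothetical helper: lists document fields that are missing from the
# mappings. With "dynamic": "strict", Elasticsearch rejects such documents.
def unmapped_fields(mappings, document)
  mapped = mappings.fetch(:properties).keys.map(&:to_s)
  document.keys.map(&:to_s) - mapped
end

mappings = {
  dynamic: "strict",
  properties: { id: { type: "keyword" }, name: { type: "text" } },
}

puts unmapped_fields(mappings, { id: 1, name: "ruby" }).empty?
puts unmapped_fields(mappings, { id: 1, name: "ruby", color: "red" }).first
```

A check like this can be handy in a test suite to catch a serializer drifting away from the mappings before the error shows up at index time.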

Below are the mappings for our tags index.

{
  "dynamic": "strict",
  "properties": {
    "id": {
      "type": "keyword"
    },
    "name": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    },
    "hotness_score": {
      "type": "integer"
    },
    "supported": {
      "type": "boolean"
    },
    "short_summary": {
      "type": "text"
    },
    "rules_html": {
      "type": "text"
    }
  }
}

Before I move on, I want to point out a couple of things here. You probably noticed that we are mapping our id field as a keyword rather than an integer. This is because keywords are optimized for terms queries, which is how we will be querying our ID field. However, for a field like hotness_score, we want to use an integer because we will be searching it using range queries with conditions like greater than or less than.

Another thing you will notice is that name has two types. The text datatype means that Elasticsearch will analyze the field and break it up into tokens to make full-text search easier. The keyword datatype is accessed through the sub-field name.raw. Our raw field stores the name as is, in one complete string. Having two field types allows us to search the tokens of the tag name or the entire name itself.
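
To make the difference concrete, here is a sketch of the three kinds of query bodies these mappings are aimed at: a term query on name.raw, a match query on name, and a range query on hotness_score. The method names are hypothetical; the bodies follow the standard Elasticsearch query DSL and would be passed as the body: argument of SearchClient.search.

```ruby
# Exact match on the whole tag name, using the keyword sub-field.
def exact_name_query(name)
  { query: { term: { "name.raw" => name } } }
end

# Full-text match against the analyzed tokens of the name.
def full_text_name_query(text)
  { query: { match: { name: text } } }
end

# Numeric range filter on hotness_score.
def minimum_hotness_query(minimum)
  { query: { range: { hotness_score: { gte: minimum } } } }
end

puts exact_name_query("PythonBeginners").dig(:query, :term, "name.raw")
puts minimum_hotness_query(2).dig(:query, :range, :hotness_score, :gte)
```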

Ok, now that you understand a little bit about our mappings, let's talk about how we apply them to our newly created index. To keep our linters happy, we have the mappings stored in a JSON file and then we import them into our Ruby file like so:

MAPPINGS = JSON.parse(File.read("config/elasticsearch/mappings/tags.json"), symbolize_names: true).freeze

Once we have the mappings set, the next step is to apply them to the new index we just created. You can do this by executing the code below:

SearchClient.indices.put_mapping(index: "tags_development", body: MAPPINGS)

If the request is successful, you should get a response like this:

[38] pry(main)> SearchClient.indices.put_mapping(index: "tags_development", body: MAPPINGS)
ETHON: performed EASY effective_url=http://localhost:9200/tags_development/_mapping response_code=200 return_code=ok total_time=0.079915
2020-02-24 16:45:56 -0500: PUT http://localhost:9200/tags_development/_mapping [status:200, request:0.095s, query:n/a]
2020-02-24 16:45:56 -0500: > {"dynamic":"strict","properties":{"id":{"type":"keyword"},"name":{"type":"text","fields":{"raw":{"type":"keyword"}}},"hotness_score":{"type":"integer"},"supported":{"type":"boolean"},"short_summary":{"type":"text"},"rules_html":{"type":"text"}}}
2020-02-24 16:45:56 -0500: < {"acknowledged":true}

Even though you got a 200 response back, you might still want to double-check that your index was created correctly. Once again, you can do this in a console like so:

[2] pry(main)> SearchClient.indices.get(index: "tags_development")
ETHON: performed EASY effective_url=http://localhost:9200/tags_development response_code=200 return_code=ok total_time=0.048122
=> {"tags_development"=>
  {"mappings"=>
    {"dynamic"=>"strict",
     "properties"=>
      {"hotness_score"=>{"type"=>"integer"},
       "id"=>{"type"=>"keyword"},
       "name"=>{"type"=>"text", "fields"=>{"raw"=>{"type"=>"keyword"}}},
       "rules_html"=>{"type"=>"text"},
       "short_summary"=>{"type"=>"text"},
       "supported"=>{"type"=>"boolean"}}},
   "settings"=>
    {"index"=>
      {"creation_date"=>"1581527116462", "number_of_shards"=>"1", "number_of_replicas"=>"0", "uuid"=>"kO-MGUiFSJObSMY_22mrzg", "version"=>{"created"=>"7050299"}, "provided_name"=>"tags_development"}}}}
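
Rather than eyeballing that output, you could compare it against the mappings you meant to apply. The sketch below assumes the response has the shape shown above; mappings_applied? is a hypothetical helper, and for field types where Elasticsearch fills in defaults an exact comparison like this may be too strict.

```ruby
require "json"

# Hypothetical helper: compares the mappings Elasticsearch reports for an
# index with the ones we intended to apply. The response uses string keys,
# so the expected hash is normalized through JSON first.
def mappings_applied?(response, index, expected)
  actual = response.dig(index, "mappings")
  actual == JSON.parse(JSON.generate(expected))
end

expected = { dynamic: "strict", properties: { id: { type: "keyword" } } }
response = {
  "tags_development" => {
    "mappings" => { "dynamic" => "strict", "properties" => { "id" => { "type" => "keyword" } } },
  },
}

puts mappings_applied?(response, "tags_development", expected)  # true
```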

Now that we have verified that our index is created and has the proper mappings, it's time to start filling it with data!

4) Indexing a Tag Document

Related Pull Request

Before we can send data to Elasticsearch, we first have to get it in the proper format by serializing it. To handle serializing our ActiveRecord model we use the Fast JSON API serializer.

module Search
  class TagSerializer
    include FastJsonapi::ObjectSerializer

    attributes :id, :name, :hotness_score, :supported, :short_summary, :rules_html
  end
end
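
If you would rather not pull in a serializer gem for a model this small, a plain-Ruby version could look something like the sketch below. This is not what we use at DEV; serialize_tag and the FakeTag stand-in are hypothetical, and the attribute list simply mirrors the fields in the mappings.

```ruby
# Attributes that match the fields in the mappings.
TAG_ATTRIBUTES = %i[id name hotness_score supported short_summary rules_html].freeze

# Hypothetical plain-Ruby serializer: reads each attribute off the record.
def serialize_tag(tag)
  TAG_ATTRIBUTES.to_h { |attribute| [attribute, tag.public_send(attribute)] }
end

# Stand-in for the ActiveRecord model so the sketch runs on its own.
FakeTag = Struct.new(*TAG_ATTRIBUTES, keyword_init: true)

tag = FakeTag.new(id: 39, name: "coolbean", hotness_score: 4, supported: false, rules_html: "")
puts serialize_tag(tag)[:name]  # coolbean
```

Whichever serializer you use, the point is the same: the resulting hash should contain exactly the fields defined in the mappings.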

Once you have a way to serialize your model data, then all that is left to do is make the request to send it to Elasticsearch. Here is how we do that with our SearchClient:

tag = Tag.find(id)
serialized_data = Search::TagSerializer.new(tag).serializable_hash.dig(:data, :attributes)
SearchClient.index(id: tag.id, index: "tags_development", body: serialized_data)

Here is what a successful response to the index request above will look like:

{"_index"=>"tags_development", "_type"=>"_doc", "_id"=>"39", "_version"=>10, "result"=>"created", "_shards"=>{"total"=>1, "successful"=>1, "failed"=>0}, "_seq_no"=>351, "_primary_term"=>3}

Another way we can validate that our indexing worked correctly is by asking Elasticsearch for the tag document using a GET request.

SearchClient.get(id: tag.id, index: "tags_development") 

The above request will give you a response containing all of your tag data under the _source key of the response hash.

{"_index"=>"tags_development",
 "_type"=>"_doc",
 "_id"=>"39",
 "_version"=>10,
 "_seq_no"=>351,
 "_primary_term"=>3,
 "found"=>true,
 "_source"=>
  {"id"=>39,
   "name"=>"coolbean",
   "hotness_score"=>4,
   "supported"=>false,
   "short_summary"=>nil,
   "rules_html"=>""}}
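
A quick way to turn that check into code is to compare the _source in the response with the attributes you sent. The helper below is a hypothetical sketch that works on a response hash shaped like the one above.

```ruby
# Hypothetical helper: checks that a GET response was found and that its
# _source holds the attributes we sent. Keys are compared as strings.
def indexed_as_expected?(response, expected_attributes)
  return false unless response["found"]

  response["_source"] == expected_attributes.transform_keys(&:to_s)
end

response = {
  "_id" => "39",
  "found" => true,
  "_source" => { "id" => 39, "name" => "coolbean", "hotness_score" => 4 },
}

puts indexed_as_expected?(response, { id: 39, name: "coolbean", hotness_score: 4 })  # true
puts indexed_as_expected?(response, { id: 39, name: "other", hotness_score: 4 })     # false
```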

Now that our index is set up and we have data in it, it's time for the best part.

5) Searching Tags

Related Pull Request

For this search example, I am only going to show you how to set up a query string search. However, search is where Elasticsearch (obviously) really shines, so I highly encourage you to check out the search docs and explore all of the possibilities.

Let's say we want to search for all tags that have a name starting with "python" AND we want to sort them by hotness_score, highest first. Here is how we would do that:

SearchClient.search(
  index: "tags_development",
  body: {
    query: {
      query_string: {
        query: "name:python*",
        analyze_wildcard: true,
        allow_leading_wildcard: false
      }
    },
    sort: { hotness_score: "desc" }
  }
)

This request is running a basic query, python*, on the name field in our index. We have also added a wildcard character, *, to indicate that we want all tags that have a name that starts with python. When you run that query you are going to get a result that looks like this:

=> {"took"=>251,
 "timed_out"=>false,
 "_shards"=>{"total"=>1, "successful"=>1, "skipped"=>0, "failed"=>0},
 "hits"=>
  {"total"=>{"value"=>3, "relation"=>"eq"},
   "max_score"=>nil,
   "hits"=>
    [{"_index"=>"tags_development",
      "_type"=>"_doc",
      "_id"=>"10",
      "_score"=>nil,
      "_source"=>{"id"=>10, "name"=>"python", "hotness_score"=>2, "supported"=>true, "short_summary"=>nil, "rules_html"=>nil},
      "sort"=>[2]},
     {"_index"=>"tags_development",
      "_type"=>"_doc",
      "_id"=>"40",
      "_score"=>nil,
      "_source"=>{"id"=>40, "name"=>"PythonBeginners", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil},
      "sort"=>[0]},
     {"_index"=>"tags_development",
      "_type"=>"_doc",
      "_id"=>"41",
      "_score"=>nil,
      "_source"=>{"id"=>41, "name"=>"PythonExpert", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil},
      "sort"=>[0]}]}}

BOOM! We just ran our first Elasticsearch query! The last thing we need to do is dig out the document hits, aka tags, from our response.

results = SearchClient.search(...)
results.dig("hits", "hits").map { |tag_doc| tag_doc.dig("_source") }
=> [{"id"=>10, "name"=>"python", "hotness_score"=>2, "supported"=>true, "short_summary"=>nil, "rules_html"=>nil},
    {"id"=>40, "name"=>"PythonBeginners", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil},
    {"id"=>41, "name"=>"PythonExpert", "hotness_score"=>0, "supported"=>false, "short_summary"=>nil, "rules_html"=>nil}]
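
Putting the search and the extraction together, you might end up with two small helpers along these lines. Both names are hypothetical, the sketch assumes the prefix contains no query_string special characters, and the sample response is trimmed down so it can run without a cluster.

```ruby
# Hypothetical helper: builds the search body for tags whose name starts
# with the given prefix, sorted by hotness_score. The prefix is assumed to
# contain no query_string special characters.
def tag_prefix_search_body(prefix)
  {
    query: {
      query_string: {
        query: "name:#{prefix}*",
        analyze_wildcard: true,
        allow_leading_wildcard: false,
      },
    },
    sort: { hotness_score: "desc" },
  }
end

# Hypothetical helper: pulls the _source of every hit, or [] if none.
def extract_tags(response)
  hits = response.dig("hits", "hits") || []
  hits.map { |tag_doc| tag_doc["_source"] }
end

response = {
  "hits" => {
    "total" => { "value" => 1, "relation" => "eq" },
    "hits" => [{ "_id" => "10", "_source" => { "id" => 10, "name" => "python" } }],
  },
}

puts tag_prefix_search_body("python").dig(:query, :query_string, :query)  # name:python*
puts extract_tags(response).first["name"]                                 # python
puts extract_tags({}).empty?                                              # true
```

With a running cluster, the two would be combined as extract_tags(SearchClient.search(index: "tags_development", body: tag_prefix_search_body("python"))).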

Your turn!

Now that you have all of the pieces, it is time for you to go out and start integrating Elasticsearch into your own Ruby or Rails application. Let me know if you have any questions. Happy Searching!

PS I've been on a Schitt's Creek binge lately, you're welcome for all the GIFs


Original Link: https://dev.to/molly_struve/getting-started-with-elasticsearch-and-ruby-30hh
