How to Index and Query Data With Haystack and Elasticsearch in Python
Haystack
Haystack is a Python library that provides modular search for Django. It features an API that provides support for different search back ends such as Elasticsearch, Whoosh, Xapian, and Solr.
Elasticsearch
Elasticsearch is a popular search engine built on Apache Lucene. It is capable of full-text search and is developed in Java.
Google Search takes a similar approach of indexing its data, which is why a few keywords are enough to retrieve relevant information.
Install Django Haystack and Elasticsearch
The first step is to get Elasticsearch up and running locally on your machine. Elasticsearch requires Java, so you need to have Java installed on your machine.
We are going to follow the instructions from the Elasticsearch site.
Download the Elasticsearch 1.4.5 tar as follows:
curl -L -O https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-1.4.5.tar.gz
Extract it as follows:
tar -xvf elasticsearch-1.4.5.tar.gz
This creates an elasticsearch-1.4.5 folder in your current directory. Go into its bin directory as follows:
cd elasticsearch-1.4.5/bin
Start Elasticsearch as follows.
./elasticsearch
To confirm that it is running, go to http://127.0.0.1:9200/, and you should see something like this. The version fields reflect the release you installed, so they will differ from this sample.
{
  "name" : "W3nGEDa",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "ygpVDczbR4OI5sx5lzo0-w",
  "version" : {
    "number" : "5.6.3",
    "build_hash" : "1a2f265",
    "build_date" : "2017-10-06T20:33:39.012Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}
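The same check can be scripted. Here is a minimal sketch that uses only the Python standard library. The function names describe_node and check_elasticsearch are our own, and the URL assumes the default local address over plain HTTP.

```python
import json
from urllib.request import urlopen


def describe_node(payload):
    """Summarize the JSON document returned by the Elasticsearch root endpoint."""
    info = json.loads(payload)
    version = info["version"]
    return "{}: Elasticsearch {} (Lucene {})".format(
        info.get("name", "unknown node"),
        version["number"],
        version["lucene_version"],
    )


def check_elasticsearch(url="http://127.0.0.1:9200/"):
    """Fetch the root endpoint; raises URLError if the node is unreachable."""
    with urlopen(url, timeout=5) as response:
        return describe_node(response.read().decode("utf-8"))
```

With Elasticsearch running, print(check_elasticsearch()) should print a one-line summary. A URLError usually means the node has not started yet.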
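Ensure you also have Haystack installed, along with Django REST framework, which the API view later in this tutorial relies on. Haystack's Elasticsearch backend also needs the elasticsearch Python client in a release that matches your server version.
pip install django-haystack djangorestframework "elasticsearch>=1.0.0,<2.0.0"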
Let's create our Django project. Our project will be able to index all the customers in a bank, making it easy to search and retrieve data using just a few search terms.
django-admin startproject Bank
This command creates files that provide configurations for Django projects.
Let's create an app for customers.
cd Bank
python manage.py startapp customers
settings.py Configurations
In order to use Elasticsearch to index our searchable content, we'll need to define a back-end setting for Haystack in our project's settings.py file. We are going to use Elasticsearch as our back end. HAYSTACK_CONNECTIONS is a required setting and should look like this:
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'haystack',
    },
}
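If the Elasticsearch address differs between machines, one option is to read it from the environment. This is only a sketch: ELASTICSEARCH_URL is a variable name we made up for illustration, and the fallback is the local node started earlier.

```python
import os

# ELASTICSEARCH_URL is a hypothetical environment variable name; if it is
# unset, fall back to the local node started earlier.
ELASTICSEARCH_URL = os.environ.get("ELASTICSEARCH_URL", "http://127.0.0.1:9200/")

HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
        "URL": ELASTICSEARCH_URL,
        "INDEX_NAME": "haystack",
    },
}
```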
Within settings.py, we are also going to add haystack and customers to the list of installed apps.
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'haystack',
    'customers',
]
Create Models
Let's create a model for Customers. In customers/models.py, add the following code.
from __future__ import unicode_literals

from django.db import models

# Choices for customer_status, as (stored value, display label) pairs.
customer_type = (
    ("Active", "Active"),
    ("Inactive", "Inactive"),
)


class Customer(models.Model):
    id = models.IntegerField(primary_key=True)
    first_name = models.CharField(max_length=50, null=False, blank=True)
    last_name = models.CharField(max_length=50, null=False, blank=True)
    other_names = models.CharField(max_length=50, default=" ")
    email = models.EmailField(max_length=100, null=True, blank=True)
    phone = models.CharField(max_length=30, null=False, blank=True)
    # Use an integer default rather than the string "0".
    balance = models.IntegerField(default=0)
    customer_status = models.CharField(
        max_length=100, choices=customer_type, default="Active")
    address = models.CharField(max_length=50, null=False, blank=False)

    def save(self, *args, **kwargs):
        # No custom behavior yet; this simply defers to Model.save().
        return super(Customer, self).save(*args, **kwargs)

    def __str__(self):
        # On Python 2, name this method __unicode__ instead.
        return "{}:{}".format(self.first_name, self.last_name)
Register your Customer model in admin.py like this:
from django.contrib import admin
from .models import Customer
# Register your models here.
admin.site.register(Customer)
Create Database and Super User
Create and apply your migrations, then create an admin account.
python manage.py makemigrations customers
python manage.py migrate
python manage.py createsuperuser
Run your server and navigate to http://localhost:8000/admin/. You should now be able to see your Customer model there. Go ahead and add new customers in the admin.
Indexing Data
To index our models, we begin by creating a SearchIndex. SearchIndex objects are the way Haystack determines what data should be placed in the search index, and they handle the flow of data into it. Each type of model must have its own unique SearchIndex.
To build a SearchIndex, we are going to inherit from indexes.SearchIndex and indexes.Indexable, define the fields we want to store our data with, and define a get_model method.
Let's create the CustomerIndex to correspond to our Customer model. Create a file search_indexes.py in the customers app directory, and add the following code.
from haystack import indexes

from .models import Customer


class CustomerIndex(indexes.SearchIndex, indexes.Indexable):
    # Primary search document, built from a data template.
    text = indexes.EdgeNgramField(document=True, use_template=True)
    first_name = indexes.CharField(model_attr='first_name')
    last_name = indexes.CharField(model_attr='last_name')
    other_names = indexes.CharField(model_attr='other_names')
    email = indexes.CharField(model_attr='email', default=" ")
    phone = indexes.CharField(model_attr='phone', default=" ")
    # Use an integer default rather than the string "0".
    balance = indexes.IntegerField(model_attr='balance', default=0)
    customer_status = indexes.CharField(model_attr='customer_status')
    address = indexes.CharField(model_attr='address', default=" ")

    def get_model(self):
        return Customer

    def index_queryset(self, using=None):
        # Index every customer.
        return self.get_model().objects.all()
EdgeNgramField is a Haystack SearchIndex field that indexes the leading fragments (edge n-grams) of each word rather than fragments taken from anywhere in the text. This prevents incorrect matches when parts of two different words are mashed together. It also allows us to use the autocomplete feature to conduct queries. We will use autocomplete when we start querying our data.
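To make the idea concrete, here is a small plain-Python illustration of edge n-grams next to ordinary n-grams. It is not Haystack's or Elasticsearch's implementation, and the min_gram and max_gram values are illustrative assumptions.

```python
def edge_ngrams(word, min_gram=2, max_gram=15):
    """Return the prefixes of `word` with lengths from min_gram to max_gram."""
    word = word.lower()
    longest = min(len(word), max_gram)
    return [word[:size] for size in range(min_gram, longest + 1)]


def ngrams(word, min_gram=3, max_gram=15):
    """Return every substring of `word` with lengths from min_gram to max_gram."""
    word = word.lower()
    grams = []
    for size in range(min_gram, min(len(word), max_gram) + 1):
        for start in range(len(word) - size + 1):
            grams.append(word[start:start + size])
    return grams


print(edge_ngrams("Albert"))  # ['al', 'alb', 'albe', 'alber', 'albert']
print("ber" in edge_ngrams("Albert"))  # False: "ber" is not a prefix
print("ber" in ngrams("Albert"))  # True: plain n-grams also match mid-word
```

Because only prefixes are indexed, typing Al or Alb can match Albert, while a fragment from the middle of the word cannot.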
document=True indicates the primary field for searching within. Additionally, use_template=True in the text field allows us to use a data template to build the document that will be indexed.
Let's create the template inside the templates directory of our customers app. Haystack looks for a file named after the model and the field, so inside search/indexes/customers/customer_text.txt, add the following:
{{object.first_name}}
{{object.last_name}}
{{object.other_names}}
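As a rough picture of what this template produces, the sketch below builds the same three-line document in plain Python. The Customer namedtuple is a stand-in for the real model, and the sample names are made up.

```python
from collections import namedtuple

# Stand-in for the Customer model, limited to the fields the template uses.
Customer = namedtuple("Customer", ["first_name", "last_name", "other_names"])


def build_document(customer):
    """Join the three name fields one per line, mirroring the template above."""
    return "\n".join(
        [customer.first_name, customer.last_name, customer.other_names]
    )


document = build_document(Customer("Albert", "Mwangi", "Kim"))
print(document.split())  # ['Albert', 'Mwangi', 'Kim']
```

Only the words that end up in this document are searchable through the text field, so a field left out of the template cannot be matched there.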
Reindex Data
Now that our data is in the database, it's time to put it in our search index. To do this, simply run ./manage.py rebuild_index. You'll get totals of how many models were processed and placed in the index.
Indexing 20 customers
Alternatively, you can use RealtimeSignalProcessor, which updates the index automatically whenever a model instance is saved or deleted. To use it, add the following to the settings.py file.
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
Querying Data
We are going to use a search template and the Haystack API to query data.
Search Template
Add the haystack urls to your URLconf.
url(r'^search/', include('haystack.urls')),
Let's create our search template. Haystack's default search view looks for search/search.html, so in templates/search/search.html, add the following code.
{% block head %}
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.0/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script>
{% endblock %}
{% block navbar %}
<nav class="navbar navbar-default">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#myNavbar">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="#">HOME</a>
</div>
<div class="collapse navbar-collapse" id="myNavbar">
<ul class="nav navbar-nav navbar-right">
<li><input type="submit" class="btn btn-primary" value="Add Customer"> </li>
</ul>
</div>
</div>
</nav>
{% endblock %}
{% block content %}
<div class="container-fluid bg-3 text-center">
<form method="get" action="." class="form" role="form">
{{ form.non_field_errors }}
<div class="form-group">
{{ form.as_p }}
</div>
<div class="form-group">
<input type="submit" class="btn btn-primary" value="Search">
</div>
{% if query %}
<h3>Results</h3>
<div class="container-fluid bg-4 text-left">
<div class="row">
{% for result in page.object_list %}
<div class="col-sm-4">
<div class="thumbnail">
<div class="form-group">
<p>First name : {{result.first_name}} </p>
</div>
<div class="form-group">
<p>Last name : {{result.last_name}} </p>
</div>
<div class="form-group">
<p>Balance : {{result.balance}} </p>
</div>
<div class="form-group">
<p>Email : {{result.email}} </p>
</div>
<div class="form-group">
<p>Status : {{result.customer_status}} </p>
</div>
</div>
</div>
{% empty %}
<p class="text-center">No results found.</p>
{% endfor %}
</div>
</div>
{% endif %}
</form>
</div>
{% endblock %}
page.object_list is a list of SearchResult objects. Each one exposes the stored index fields, for example result.first_name, and gives access to the underlying model instance as result.object.
At this point, your project should contain the customers app with its models.py, admin.py, and search_indexes.py, plus the index template and the search template.
Now run the server, go to http://127.0.0.1:8000/search/, and do a search.
A search for Albert will return all customers with the name Albert. If no customer has the name Albert, the query will return no results. Feel free to play around with your own data.
Haystack API
Haystack has a SearchQuerySet class that is designed to make it easy and consistent to perform searches and iterate over results. Much of the SearchQuerySet API will be familiar from Django's ORM QuerySet.
In customers/views.py, add the following code:
from haystack.query import SearchQuerySet
from rest_framework.decorators import api_view
from rest_framework.response import Response

from .models import Customer


@api_view(['POST'])
def search_customer(request):
    # The search term arrives in the POST body, e.g. {"name": "Al"}.
    name = request.data['name']
    # Restrict the search to Customer documents whose first name starts with the term.
    customers = SearchQuerySet().models(Customer).autocomplete(
        first_name__startswith=name)
    # Build a plain list of dicts so the response can be rendered as JSON.
    searched_data = []
    for result in customers:
        searched_data.append({
            "first_name": result.first_name,
            "last_name": result.last_name,
            "balance": result.balance,
            "status": result.customer_status,
        })
    return Response(searched_data)
autocomplete is a shortcut method to perform an autocomplete search. It must be run against fields that are either EdgeNgramField or NgramField.
In the above query, we pass the startswith field lookup through autocomplete, so the search returns only customers whose first name begins with the given characters. For example, Al will retrieve only the customers whose first name starts with Al. To search across everything in the index template instead, query the text field, for example autocomplete(text=name). Note that those results will only come from fields that have been defined in the customer_text.txt file.
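To try the view from a script, we first need a request that carries the search term as JSON. The sketch below only builds that request with the standard library. The URL is a placeholder of our own, since the view still has to be routed in your URLconf, and build_search_request is a name we made up.

```python
import json
from urllib.request import Request


def build_search_request(name, url="http://127.0.0.1:8000/customers/search/"):
    """Build (but do not send) a JSON POST request for the search_customer view.

    The default URL is a placeholder: route the view in your URLconf first.
    """
    body = json.dumps({"name": name}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("Al")
print(request.get_method(), request.full_url)
print(request.data)  # b'{"name": "Al"}'
```

Once the route exists and the server is running, passing this request to urllib.request.urlopen should return a JSON list of dictionaries with the first_name, last_name, balance, and status keys built in the view.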
Haystack supports the following field lookups for performing queries:
- content
- contains
- exact
- gt
- gte
- lt
- lte
- in
- startswith
- endswith
- range
- fuzzy
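To get a feel for what several of these lookups mean, here is a plain-Python emulation that filters a list of dictionaries using the same field__lookup naming. It is only an illustration of the semantics, not how Haystack or Elasticsearch evaluates queries. It leaves out content and fuzzy, which depend on how the search engine analyzes text, and the sample customers are made up.

```python
import operator

# Plain-Python predicates that mimic the meaning of some field lookups.
LOOKUPS = {
    "exact": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "in": lambda value, options: value in options,
    "contains": lambda value, part: part in value,
    "startswith": lambda value, prefix: value.startswith(prefix),
    "endswith": lambda value, suffix: value.endswith(suffix),
    "range": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def filter_records(records, **conditions):
    """Keep the records that satisfy every `field__lookup=value` condition."""
    matches = []
    for record in records:
        for key, expected in conditions.items():
            field, _, lookup = key.partition("__")
            if not LOOKUPS[lookup](record[field], expected):
                break
        else:
            # No condition failed, so the record matches.
            matches.append(record)
    return matches


customers = [
    {"first_name": "Albert", "balance": 500},
    {"first_name": "Alice", "balance": 1500},
    {"first_name": "Brian", "balance": 2500},
]
print(filter_records(customers, first_name__startswith="Al", balance__gte=1000))
# [{'first_name': 'Alice', 'balance': 1500}]
```

In Haystack itself, the equivalent conditions would be passed as keyword arguments to SearchQuerySet().filter().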
Conclusion
A huge amount of data is produced at any given moment in social media, health, shopping, and other sectors. Much of this data is unstructured and scattered. Elasticsearch can be used to process and analyze this data into a form that can be understood and consumed.
Elasticsearch has also been used extensively for content search, data analysis, and queries. For more information, visit the Haystack and Elasticsearch sites.
Original Link:
TutsPlus - Code