Using Elasticsearch with Spring Boot - Technical background
May 27, 2015
This is the third part in a series of four. It explains the technical background.
With Spring Boot it is easy to glue together different components into a complex application. The following is the list of dependencies for this project, as declared for the build tool Gradle:
dependencies {
    compile 'com.fasterxml.jackson.core:jackson-core:2.6.4'
    compile 'com.fasterxml.jackson.core:jackson-databind:2.6.4'
    compile 'org.springframework.boot:spring-boot-starter-data-elasticsearch'
    compile 'org.springframework.boot:spring-boot-starter-mail'
    compile 'org.springframework.boot:spring-boot-starter-web'
    compile 'org.codehaus.groovy:groovy'
    providedRuntime 'org.springframework.boot:spring-boot-starter-tomcat'
    providedRuntime 'org.apache.tomcat.embed:tomcat-embed-jasper'
    providedRuntime 'javax.servlet:jstl'
    testCompile 'org.springframework.boot:spring-boot-starter-test'
    testCompile 'org.spockframework:spock-core:1.0-groovy-2.4'
}
The spring-boot-starter projects make it possible to wire the components together in Spring's IoC container.
Building and running this project is a one-liner on the command line: $ gradle bootRun
Importing the emails into Elasticsearch
A class that should be persisted in Elasticsearch has to be marked with the @Document annotation from the Spring Data Elasticsearch project. The index and the type that Elasticsearch should use are specified as parameters. In this case an index named "email" and a type named "email" are used.
In this example an Email has a list of recipients, a list of senders (the froms field), a subject, a sentDate, a receivedDate and a list of texts (an email is usually sent as a multipart message, so we have to use a list of texts). We ignore attached documents and images:
@Document(indexName = "email", type = "email")
class Email {

    @Id
    Long id // Spring Data needs an @Id, so we use a surrogate one

    @Field( type = FieldType.Object )
    List<EmailAddress> recipients

    @Field( type = FieldType.Object )
    List<EmailAddress> froms // the senders; the name matches the constructor and the stored JSON

    String subject

    @Field( type = FieldType.Date, format = DateFormat.custom, pattern = Constants.DATE_FORMAT)
    @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd HH:mm:ss")
    Date sentDate

    @Field( type = FieldType.Date, format = DateFormat.custom, pattern = Constants.DATE_FORMAT)
    @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd HH:mm:ss")
    Date receivedDate

    List<String> texts

    Email() {
        recipients = new LinkedList<EmailAddress>()
        froms = new LinkedList<EmailAddress>()
    }
    ...
For simple datatypes nothing has to be specified; see for example the subject, which is a plain string. For complex datatypes like the lists of EmailAddress objects, Elasticsearch has to know that the data should be stored as an inner document. This is done by using the @Field annotation and setting the type to FieldType.Object.
An email looks like this to Elasticsearch:
{
    _index: "email",
    _type: "email",
    _id: "-9223372036854775763",
    _source: {
        id: -9223372036854776000,
        recipients: [
            {
                orig: "joern@dinkla.com",
                name: "",
                email: "joern@dinkla.com"
            }
        ],
        froms: [
            {
                orig: "Some company <no_reply@somecompany.de>",
                name: "Some company",
                email: "no_reply@somecompany.de"
            }
        ],
        subject: "Very important message ...",
        sentDate: "2015-08-07 00:38:11",
        receivedDate: "2015-08-07 00:38:12",
        texts: [
            "Sehr geehrter Herr Dinkla, anbei erhalten Sie ...",
            "<html><head>..."
        ]
    }
}
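The recipients and froms entries above each carry three fields: orig, name and email. The EmailAddress class itself is not shown in this post, so the following is only a minimal sketch in plain Java of how such a split could be done. The class name EmailAddressSketch, the parse method and the regular expression are assumptions made for illustration, not the project's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the EmailAddress class, which the article does not show.
class EmailAddressSketch {

    // Matches the "Display name <address>" form; anything else is treated as a bare address.
    private static final Pattern NAMED = Pattern.compile("^\\s*(.*?)\\s*<([^<>]+)>\\s*$");

    final String orig;   // the header value as received
    final String name;   // display name, empty if none was given
    final String email;  // the bare address

    EmailAddressSketch(String orig, String name, String email) {
        this.orig = orig;
        this.name = name;
        this.email = email;
    }

    static EmailAddressSketch parse(String orig) {
        Matcher m = NAMED.matcher(orig);
        if (m.matches()) {
            return new EmailAddressSketch(orig, m.group(1), m.group(2).trim());
        }
        // No angle brackets: the whole value is the address and the name stays empty.
        return new EmailAddressSketch(orig, "", orig.trim());
    }

    public static void main(String[] args) {
        EmailAddressSketch sender = parse("Some company <no_reply@somecompany.de>");
        System.out.println(sender.name + " | " + sender.email);
        EmailAddressSketch recipient = parse("joern@dinkla.com");
        System.out.println("[" + recipient.name + "] | " + recipient.email);
    }
}
```

The sketch deliberately ignores quoted display names and address groups. A real importer would more likely rely on the address parsing that the mail library already provides.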
The conversion from and to JSON is done by the Jackson JSON library. The @JsonFormat annotations in the class definition specify the date format.
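The pattern "yyyy-MM-dd HH:mm:ss" follows the SimpleDateFormat syntax. To show what that pattern produces, here is a small sketch that uses only the JDK, not Jackson itself. The class name DatePatternSketch and the choice of UTC as the time zone are assumptions made for this example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustrates the date pattern from the @JsonFormat annotations with plain JDK classes.
class DatePatternSketch {

    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
    static SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: dates are handled in UTC
        format.setLenient(false);                        // reject impossible dates such as month 13
        return format;
    }

    static String format(Date date) {
        return newFormat().format(date);
    }

    static Date parse(String text) {
        try {
            return newFormat().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not in format " + PATTERN + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        Date sent = parse("2015-08-07 00:38:11");
        System.out.println(format(sent));          // round trip of the sentDate shown above
        System.out.println(format(new Date(0L)));  // the epoch, rendered with the same pattern
    }
}
```

Note that the pattern has no time zone component, so the writer and the reader of such a string have to agree on the zone.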
Querying the emails
In the application we want to count the number of emails that contain a specific text in the subject or in the body. This is an aggregation. In SQL you would write something like:
SELECT dt, topic, COUNT(*) as num
FROM table
WHERE topic IN topiclist
GROUP BY dt, topic
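To make the grouping concrete, here is a minimal in-memory sketch in plain Java of what such a count per week computes. It is not how the application queries Elasticsearch. The class WeeklyCountSketch, its nested Mail class and the sample data are made up for illustration. The sketch uses a simple case-insensitive substring test as a stand-in for the full-text match, and it assumes weeks start on Monday.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WeeklyCountSketch {

    // Hypothetical, minimal stand-in for an email: only the fields the count needs.
    static class Mail {
        final LocalDate sentDate;
        final String text;

        Mail(LocalDate sentDate, String text) {
            this.sentDate = sentDate;
            this.text = text;
        }
    }

    // Bucket key: the Monday of the week the date falls in (assumption: weeks start on Monday).
    static LocalDate weekStart(LocalDate date) {
        return date.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    // Counts the mails whose text mentions the topic, grouped by week, sorted by week.
    static Map<LocalDate, Integer> weeklyCounts(List<Mail> mails, String topic) {
        String needle = topic.toLowerCase();
        Map<LocalDate, Integer> counts = new TreeMap<>();
        for (Mail mail : mails) {
            if (mail.text.toLowerCase().contains(needle)) {
                counts.merge(weekStart(mail.sentDate), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Mail> mails = new ArrayList<>();
        mails.add(new Mail(LocalDate.of(2015, 8, 3), "Spring Boot release notes"));
        mails.add(new Mail(LocalDate.of(2015, 8, 7), "More about Spring Boot"));
        mails.add(new Mail(LocalDate.of(2015, 8, 10), "spring boot again"));
        mails.add(new Mail(LocalDate.of(2015, 8, 11), "Unrelated newsletter"));
        System.out.println(weeklyCounts(mails, "spring boot"));
    }
}
```

Doing this in application code would mean loading every email first. Pushing the grouping down to the data store avoids that.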
In Spring Boot and Spring Data the class that communicates with the database is called a repository. Spring Data has a powerful mechanism that automatically creates a repository with many query methods. If you just want the vanilla functionality, it is sufficient to create an interface that extends one of the repository interfaces. In this app we need:
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository
interface EmailRepository extends ElasticsearchRepository<Email, Long>, EmailRepositoryCustom {}
The user-defined methods are declared in the interface EmailRepositoryCustom.
interface EmailRepositoryCustom {
    Long findMaximalId()
    Histogram<String, Integer> getWeeklyHistogram(String topic)
}
These two methods are implemented in the class EmailRepositoryImpl.
import static org.elasticsearch.index.query.QueryBuilders.matchQuery

@Repository
class EmailRepositoryImpl implements EmailRepositoryCustom {

    @Autowired
    ElasticsearchTemplate elasticsearchTemplate;

    Long findMaximalId() { ...

    Histogram<String, Integer> getWeeklyHistogram(String topic) {
        // Match the topic against the email texts and bucket the hits by week of sentDate.
        // SearchType.COUNT: only the aggregation is needed, not the matching documents.
        SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(matchQuery("texts", topic))
            .withSearchType(SearchType.COUNT)
            .withIndices("email")
            .withTypes("email")
            .addAggregation(
                AggregationBuilders.dateHistogram(topic)
                    .field("sentDate")
                    .interval(DateHistogram.Interval.WEEK)
                    .format("yyyy-MM-dd"))
            .build();

        // Run the query and keep only the aggregations part of the response
        Aggregations aggregations = elasticsearchTemplate.query(searchQuery, new ResultsExtractor<Aggregations>() {
            @Override
            Aggregations extract(SearchResponse response) {
                return response.getAggregations()
            }
        });

        // The aggregation was registered under the topic name, so it is looked up by that name
        Map a = aggregations.asMap()
        InternalDateHistogram tmpHist = a[topic]
        return new Histogram<String, Integer>(topic, tmpHist)
    }
}
The @Repository annotation tells Spring Boot that this is the implementation of a repository. The @Autowired annotation causes Spring Boot to inject a "bean" of type ElasticsearchTemplate into the field. The ElasticsearchTemplate is used in the method getWeeklyHistogram to execute a query built with the NativeSearchQueryBuilder.