Using Elasticsearch with Spring Boot - Technical background
May 27, 2015
This is the third part in a series of four. It explains the technical background.
With Spring Boot it is easy to glue together different components into a complex application. The following is the list of dependencies for this project, as declared for the build tool Gradle:
dependencies {
    compile 'com.fasterxml.jackson.core:jackson-core:2.6.4'
    compile 'com.fasterxml.jackson.core:jackson-databind:2.6.4'
    compile 'org.springframework.boot:spring-boot-starter-data-elasticsearch'
    compile 'org.springframework.boot:spring-boot-starter-mail'
    compile 'org.springframework.boot:spring-boot-starter-web'
    compile 'org.codehaus.groovy:groovy'
    providedRuntime 'org.springframework.boot:spring-boot-starter-tomcat'
    providedRuntime 'org.apache.tomcat.embed:tomcat-embed-jasper'
    providedRuntime 'javax.servlet:jstl'
    testCompile 'org.springframework.boot:spring-boot-starter-test'
    testCompile 'org.spockframework:spock-core:1.0-groovy-2.4'
}
The spring-boot-starter projects make it possible to glue the components together in
Spring's IoC container.
Building and running this project is a one-liner on the command line: $ gradle bootRun
Importing the emails into Elasticsearch
A class that should be persisted in Elasticsearch has to be marked with the @Document annotation
from the Spring Data Elasticsearch project.
The index and the type that Elasticsearch should use are specified as parameters. In this case an index named
"email" and a type named "email" are used.
In this example an Email has a list of recipients, a list of senders,
a subject, a sentDate, a receivedDate and a list of texts
(email is usually sent as a multipart message,
so we have to use a list of texts). We ignore attached documents and images:
@Document(indexName = "email", type = "email")
class Email {
    @Id
    Long id // Spring Data needs an @Id, so we use a surrogate one
    @Field( type = FieldType.Object )
    List<EmailAddress> recipients
    @Field( type = FieldType.Object )
    List<EmailAddress> froms // the senders; same name as in the constructor and in the stored document
    String subject
    @Field( type = FieldType.Date, format = DateFormat.custom, pattern = Constants.DATE_FORMAT)
    @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd HH:mm:ss")
    Date sentDate
    @Field( type = FieldType.Date, format = DateFormat.custom, pattern = Constants.DATE_FORMAT)
    @JsonFormat(shape = JsonFormat.Shape.STRING, pattern = "yyyy-MM-dd HH:mm:ss")
    Date receivedDate
    List<String> texts
    Email() {
        // start with empty lists so that addresses can be added while importing
        recipients = new LinkedList<EmailAddress>()
        froms = new LinkedList<EmailAddress>()
    }
    ...
For simple datatypes nothing has to be specified. See for example the subject,
which is a plain string. For complex datatypes like the lists of EmailAddresses, Elasticsearch
has to know that the data should be stored as an internal document. This is done by using the @Field
annotation and setting the type to FieldType.Object.
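The EmailAddress class itself is not shown in this article. As a rough sketch of what such an embedded object could look like, the following plain Java class uses the three field names that appear in the stored document (orig, name, email). The class layout and the parsing rule in parse are assumptions made for illustration, not the project's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class: only the field names orig, name and email
// are taken from the stored document, everything else is an assumption.
final class EmailAddress {
    // Matches the display form "Some name <user@example.org>".
    private static final Pattern NAMED =
            Pattern.compile("^\\s*\"?([^\"<]*?)\"?\\s*<([^>]+)>\\s*$");

    final String orig;   // the address exactly as it was passed in
    final String name;   // display name, empty if none was given
    final String email;  // the bare address

    EmailAddress(String orig, String name, String email) {
        this.orig = orig;
        this.name = name;
        this.email = email;
    }

    // Splits a header value into display name and bare address.
    static EmailAddress parse(String orig) {
        Matcher m = NAMED.matcher(orig);
        if (m.matches()) {
            return new EmailAddress(orig, m.group(1).trim(), m.group(2).trim());
        }
        // No "<...>" part: treat the whole value as the address.
        return new EmailAddress(orig, "", orig.trim());
    }
}
```

With this sketch, parsing "Some company <no_reply@somecompany.de>" yields the name "Some company" and the email "no_reply@somecompany.de", while a bare address keeps an empty name.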
An email looks like this to Elasticsearch:
{
    _index: "email",
    _type: "email",
    _id: "-9223372036854775763",
    _source: {
        id: -9223372036854776000,
        recipients: [
            {
                orig: "joern@dinkla.com",
                name: "",
                email: "joern@dinkla.com"
            }
        ],
        froms: [
            {
                orig: "Some company <no_reply@somecompany.de>",
                name: "Some company",
                email: "no_reply@somecompany.de"
            }
        ],
        subject: "Very important message ...",
        sentDate: "2015-08-07 00:38:11",
        receivedDate: "2015-08-07 00:38:12",
        texts: [
            "Sehr geehrter Herr Dinkla, anbei erhalten Sie ...",
            "<html><head>..."
        ]
    }
}
The conversion from and to JSON is done by the Jackson JSON library. The @JsonFormat annotations
in the class definition specify the date format.
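To see what the pattern "yyyy-MM-dd HH:mm:ss" produces, here is a small sketch that uses only the JDK. It does not call Jackson. It assumes that the pattern is interpreted like a java.text.SimpleDateFormat pattern and that dates are written in UTC. The class name EmailDateFormat is made up for this example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustrative helper, not part of the project.
final class EmailDateFormat {
    static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

    // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
    private static SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: UTC
        format.setLenient(false); // reject impossible dates such as month 13
        return format;
    }

    // Renders a date the way it appears in the stored document.
    static String format(Date date) {
        return newFormat().format(date);
    }

    // Reads a date string back; fails loudly on malformed input.
    static Date parse(String text) {
        try {
            return newFormat().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a date in format " + PATTERN + ": " + text, e);
        }
    }
}
```

Under these assumptions the string "2015-08-07 00:38:11" from the document above round-trips unchanged through parse and format.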
Querying the emails
In the application we want to count the number of emails that contain a specific text in the subject or in the body. This is an aggregation. In SQL you would write something like:
SELECT dt, topic, COUNT(*) AS num
FROM table
WHERE topic IN topiclist
GROUP BY dt, topic
In Spring Boot and Spring Data the class that communicates with the database is called a repository. Spring Data has a powerful mechanism to automatically create a repository with many query methods. If you just want the vanilla functionality, it is sufficient to create an interface that extends a repository interface. In this app we need:
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository
interface EmailRepository extends ElasticsearchRepository<Email, Long>, EmailRepositoryCustom {}
The user-defined methods are declared in the interface EmailRepositoryCustom:
interface EmailRepositoryCustom {
    Long findMaximalId()
    Histogram<String, Integer> getWeeklyHistogram(String topic)
}
These two methods are implemented in the class EmailRepositoryImpl.
@Repository
class EmailRepositoryImpl implements EmailRepositoryCustom {
    @Autowired
    ElasticsearchTemplate elasticsearchTemplate;
    Long findMaximalId() { ...
    Histogram<String, Integer> getWeeklyHistogram(String topic) {
        SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(matchQuery("texts", topic))
            .withSearchType(SearchType.COUNT)
            .withIndices("email")
            .withTypes("email")
            .addAggregation(
                AggregationBuilders.dateHistogram(topic)
                    .field("sentDate")
                    .interval(DateHistogram.Interval.WEEK)
                    .format("yyyy-MM-dd"))
            .build();
        Aggregations aggregations = elasticsearchTemplate.query(searchQuery, new ResultsExtractor<Aggregations>() {
            @Override
            Aggregations extract(SearchResponse response) {
                return response.getAggregations()
            }
        });
        Map a = aggregations.asMap()
        InternalDateHistogram tmpHist = a[topic]
        return new Histogram<String, Integer>(topic, tmpHist)
    }
}
The @Repository annotation tells Spring that this class is the implementation of a repository.
The @Autowired annotation causes Spring to inject a "bean" of type
ElasticsearchTemplate into the field.
The ElasticsearchTemplate is used in the method getWeeklyHistogram to execute a
query built with the NativeSearchQueryBuilder. The query matches the topic against the field texts,
asks only for counts instead of the matching documents, and groups the hits by week of their sentDate.
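To make the result of the weekly date histogram more concrete, the following plain Java sketch computes the same kind of buckets in memory. It does not talk to Elasticsearch. It assumes that a week starts on Monday and that each bucket is labelled with that Monday in the format yyyy-MM-dd. The class and method names are made up for this example.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory version of "count per week".
final class WeeklyHistogramSketch {
    // Returns the number of dates per week, keyed by the Monday that starts the week.
    // The TreeMap keeps the buckets in chronological order because the keys are ISO dates.
    static Map<String, Integer> weeklyCounts(List<LocalDate> sentDates) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (LocalDate date : sentDates) {
            // Round the date down to the Monday of its week (assumed bucket boundary).
            LocalDate weekStart = date.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
            // Add one to the bucket, creating it on first use.
            buckets.merge(weekStart.toString(), 1, Integer::sum);
        }
        return buckets;
    }
}
```

For the dates 2015-08-03, 2015-08-07, 2015-08-09 and 2015-08-10 this gives a count of 3 for the bucket 2015-08-03 and a count of 1 for the bucket 2015-08-10. Weeks without any email are simply absent in this sketch.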