Developing an Analysis Plugin for Elasticsearch 2.3.4

0. The Elasticsearch version is 2.3.4

First, you need your own Analyzer and Tokenizer; see the previous article for reference.

1. pom.xml

The project is built with Maven. The elasticsearch dependency must be included, otherwise the code will not compile.

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>2.3.4</version>
        <scope>provided</scope>
    </dependency>

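An ES plugin also needs a descriptor file, plugin-descriptor.properties. Put it in src/main/resources; after the plugin is packaged and installed into ES, the file ends up in the plugin's directory. It needs the following content: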

description=${project.description}
version=${project.version}
name=${elasticsearch.plugin.name}
site=${elasticsearch.plugin.site}
jvm=${elasticsearch.plugin.jvm}
classname=${elasticsearch.plugin.classname}
java.version=${elasticsearch.plugin.java.version}
elasticsearch.version=${elasticsearch.version}

All of these values come from properties defined in pom.xml, directly under the root element:

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <elasticsearch.plugin.name>elasticsearch-analysis-demo</elasticsearch.plugin.name>
        <elasticsearch.plugin.site>false</elasticsearch.plugin.site>
        <elasticsearch.plugin.jvm>true</elasticsearch.plugin.jvm>
        <elasticsearch.plugin.java.version>1.8</elasticsearch.plugin.java.version>
        <elasticsearch.version>2.3.4</elasticsearch.version>
        <elasticsearch.plugin.classname>org.elasticsearch.plugin.AnalysisDemoPlugin</elasticsearch.plugin.classname>
    </properties>

The most important entry in plugin-descriptor.properties is classname, that is, the class named in the elasticsearch.plugin.classname element of pom.xml. This class is the entry point ES uses to load the plugin.
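To make the placeholder substitution concrete, here is a minimal sketch of what filtering does to the descriptor. It is only an illustration written against the JDK, not Maven's actual implementation; the class name DescriptorFilterSketch and its filter method are made up for this example, and leaving unknown keys untouched is an assumption about the behaviour you would want.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative stand-in for resource filtering; NOT Maven's real implementation. */
class DescriptorFilterSketch {

    // Matches ${some.key} and captures the key.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${key} with its value from props; unknown keys are left as-is. */
    static String filter(String template, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Values copied from the <properties> block in pom.xml.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("elasticsearch.plugin.name", "elasticsearch-analysis-demo");
        props.put("elasticsearch.plugin.jvm", "true");
        props.put("elasticsearch.plugin.classname", "org.elasticsearch.plugin.AnalysisDemoPlugin");
        props.put("elasticsearch.version", "2.3.4");

        String template =
              "name=${elasticsearch.plugin.name}\n"
            + "jvm=${elasticsearch.plugin.jvm}\n"
            + "classname=${elasticsearch.plugin.classname}\n"
            + "elasticsearch.version=${elasticsearch.version}\n";

        System.out.print(filter(template, props));
    }
}
```

If the installed descriptor still contains literal ${...} text, the file was most likely packaged without being filtered.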

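2. Extend Plugin

Now for the Java code. The entry class of an analysis plugin must extend org.elasticsearch.plugins.Plugin and provide three methods:

  1. public String name() returns the plugin name
  2. public String description() returns the plugin description
  3. public void onModule(AnalysisModule model) receives the AnalysisModule object, through which custom analyzers and tokenizers are registered

Here is the code: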

    package org.elasticsearch.plugin;

    import org.elasticsearch.demo.DemoAnalysisBinderProcessor;
    import org.elasticsearch.index.analysis.AnalysisModule;
    import org.elasticsearch.plugins.Plugin;

    public class AnalysisDemoPlugin extends Plugin {

        public static final String PLUGIN_NAME = "elasticsearch-analysis-demo";

        @Override
        public String name() {
            return PLUGIN_NAME;
        }

        @Override
        public String description() {
            return PLUGIN_NAME;
        }

        // Not declared in Plugin, so there is no @Override here.
        public void onModule(AnalysisModule model) {
            // Register the analyzer and tokenizer bindings.
            model.addProcessor(new DemoAnalysisBinderProcessor());
        }
    }
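The onModule method is not declared in Plugin. As far as I understand it, ES 2.x finds such methods by reflection: it looks for public methods named onModule with a single module parameter and calls them when a matching module is set up. The sketch below imitates that idea using only the JDK. Module, AnalysisModule, OtherModule and DemoPlugin are hypothetical stand-ins defined here, not the Elasticsearch classes, and dispatch is a simplified guess at the mechanism rather than the real code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class OnModuleDispatchSketch {

    /** Hypothetical stand-ins for the Elasticsearch module types. */
    interface Module {}

    static class AnalysisModule implements Module {
        final List<String> processors = new ArrayList<>();
    }

    static class OtherModule implements Module {}

    /** Hypothetical plugin: onModule is found by name, not through an interface. */
    public static class DemoPlugin {
        public void onModule(AnalysisModule module) {
            module.processors.add("DemoAnalysisBinderProcessor");
        }
    }

    /** Invokes every public onModule(X) whose parameter type accepts the given module. */
    static int dispatch(Object plugin, Module module) {
        int calls = 0;
        for (Method method : plugin.getClass().getMethods()) {
            if (!method.getName().equals("onModule") || method.getParameterTypes().length != 1) {
                continue;
            }
            Class<?> paramType = method.getParameterTypes()[0];
            if (Module.class.isAssignableFrom(paramType) && paramType.isInstance(module)) {
                try {
                    method.invoke(plugin, module);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("onModule call failed", e);
                }
                calls++;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        AnalysisModule analysis = new AnalysisModule();
        int calls = dispatch(new DemoPlugin(), analysis);
        System.out.println(calls + " call(s), processors=" + analysis.processors);
        System.out.println(dispatch(new DemoPlugin(), new OtherModule()) + " call(s) for OtherModule");
    }
}
```

One practical consequence: a misspelled method name or a wrong parameter type still compiles, but the method is never called, so the analyzer silently fails to register.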

The call to model.addProcessor requires an org.elasticsearch.index.analysis.AnalysisModule.AnalysisBinderProcessor object, so you also need a class that extends AnalysisBinderProcessor:

2. Implement AnalysisBinderProcessor

    package org.elasticsearch.demo;

    import org.elasticsearch.index.analysis.AnalysisModule;

    public class DemoAnalysisBinderProcessor extends AnalysisModule.AnalysisBinderProcessor {

        @Override
        public void processAnalyzers(AnalyzersBindings analyzersBindings) {
            analyzersBindings.processAnalyzer("hylanda_analyzer", DemoAnalyzerProvider.class);
        }

        @Override
        public void processTokenizers(TokenizersBindings tokenizersBindings) {
            tokenizersBindings.processTokenizer("hylanda_tokenizer", DemoTokenizerFactory.class);
        }
    }

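The class that extends AnalysisBinderProcessor overrides processAnalyzers and processTokenizers as needed. The former registers an analyzer and the latter a tokenizer; the first argument is the name of the analyzer or tokenizer. This is where you plug in your own analysis components, but a little more work is required. An Analyzer needs a class that extends org.elasticsearch.index.analysis.AbstractIndexAnalyzerProvider; a Tokenizer needs a class that extends org.elasticsearch.index.analysis.AbstractTokenizerFactory.

3. Implement AbstractTokenizerFactory

First, the Tokenizer factory class that extends AbstractTokenizerFactory: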

    package org.elasticsearch.demo;

    import org.apache.lucene.analysis.Tokenizer;
    import org.elasticsearch.common.inject.Inject;
    import org.elasticsearch.common.inject.assistedinject.Assisted;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.index.Index;
    import org.elasticsearch.index.analysis.AbstractTokenizerFactory;
    import org.elasticsearch.index.settings.IndexSettingsService;

    public class DemoTokenizerFactory extends AbstractTokenizerFactory {

        public DemoTokenizerFactory(Index index, Settings indexSettings, String name, Settings settings) {
            super(index, indexSettings, name, settings);
        }

        /**
         * This constructor is required and must be annotated with @Inject.
         *
         * @param index
         * @param indexSettingsService
         * @param name
         * @param settings
         */
        @Inject
        public DemoTokenizerFactory(
                Index index, IndexSettingsService indexSettingsService,
                @Assisted String name, @Assisted Settings settings) {
            super(index, indexSettingsService.getSettings(), name, settings);
        }

        @Override
        public Tokenizer create() {
            // Return a new instance of your own Tokenizer.
            return new DemoTokenizer();
        }
    }

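ES 2.3.4 uses the Guice dependency injection framework, so the constructor annotated with @Inject above is required. You also need to implement public Tokenizer create(), and in it simply instantiate your own Tokenizer.

4. Implement AbstractIndexAnalyzerProvider

Next, the Provider for the Analyzer: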

    package org.elasticsearch.demo;

    import org.apache.lucene.analysis.Analyzer;
    import org.elasticsearch.common.inject.Inject;
    import org.elasticsearch.common.inject.assistedinject.Assisted;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.env.Environment;
    import org.elasticsearch.index.Index;
    import org.elasticsearch.index.analysis.AbstractIndexAnalyzerProvider;
    import org.elasticsearch.index.settings.IndexSettingsService;

    // The type parameter avoids a raw-type warning; get() still returns Analyzer.
    public class DemoAnalyzerProvider extends AbstractIndexAnalyzerProvider<Analyzer> {

        public DemoAnalyzerProvider(Index index, Settings indexSettings, String name, Settings settings) {
            super(index, indexSettings, name, settings);
        }

        /**
         * This constructor is required and must be annotated with @Inject.
         *
         * @param index
         * @param indexSettingsService
         * @param env
         * @param name
         * @param settings
         */
        @Inject
        public DemoAnalyzerProvider(Index index, IndexSettingsService indexSettingsService, Environment env,
                @Assisted String name, @Assisted Settings settings) {
            super(index, indexSettingsService.getSettings(), name, settings);
        }

        @Override
        public Analyzer get() {
            // Return a new instance of your own Analyzer.
            return new DemoAnalyzer();
        }
    }

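As with the Tokenizer, the constructor annotated with @Inject is required. You then implement public Analyzer get(), where you can instantiate your own Analyzer.

5. Package

This step uses the Maven assembly plugin to package the code together with its resources, so that the result can be installed with the plugin command-line tool that ships with ES.

First, add the assembly plugin to pom.xml: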

    <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
            <outputDirectory>${project.build.directory}/releases/</outputDirectory>
            <descriptors>
                <descriptor>${basedir}/src/main/assembly/plugin.xml</descriptor>
            </descriptors>
        </configuration>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>single</goal>
                </goals>
            </execution>
        </executions>
    </plugin>

Then add src/main/assembly/plugin.xml. The file can live somewhere else, as long as the descriptor element above points to it.

    <?xml version="1.0"?>
    <assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
        <id>release</id>
        <formats>
            <format>jar</format>
        </formats>
        <includeBaseDirectory>false</includeBaseDirectory>
        <fileSets>
            <fileSet>
                <directory>${project.basedir}/src/main/plugin-metadata</directory>
                <outputDirectory>/</outputDirectory>
            </fileSet>
            <fileSet>
                <directory>${project.basedir}/lib</directory>
                <outputDirectory>/</outputDirectory>
            </fileSet>
        </fileSets>
        <files>
            <file>
                <source>${project.basedir}/src/main/resources/plugin-descriptor.properties</source>
                <filtered>true</filtered>
                <outputDirectory>/</outputDirectory>
            </file>
        </files>
        <dependencySets>
            <dependencySet>
                <outputDirectory>/</outputDirectory>
                <useProjectArtifact>true</useProjectArtifact>
                <excludes>
                    <exclude>org.elasticsearch:elasticsearch</exclude>
                    <exclude>org.apache.lucene:lucene*</exclude>
                    <exclude>com.spatial4j:spatial4j</exclude>
                </excludes>
            </dependencySet>
        </dependencySets>
    </assembly>

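Adjust this to your own needs. That is basically everything; with these pitfalls out of the way, you can build and install the plugin.

6. Build and install

Run mvn clean install in the source root directory. Then, from the ES root directory, install the plugin with the following command: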

./bin/plugin install file:///absolute/path/to/source/target/releases/xxxx.jar

On Windows:

.\bin\plugin.bat install file:///C:/absolute/path/to/source/target/releases/xxxx.jar
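After restarting ES, one way to check that the analyzer is registered is to call the _analyze API on an index. The sketch below only builds the request URL, so it runs without a cluster. The host http://localhost:9200 and the index name demo_index are assumptions for illustration, and the class AnalyzeUrlSketch is made up here. I am also assuming the analyzer has to be resolved through an existing index, since it is registered per index rather than globally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class AnalyzeUrlSketch {

    /** Builds an index-level _analyze URL using query parameters. */
    static String analyzeUrl(String baseUrl, String index, String analyzer, String text) {
        try {
            return baseUrl + "/" + index + "/_analyze"
                 + "?analyzer=" + URLEncoder.encode(analyzer, "UTF-8")
                 + "&text=" + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // demo_index is a hypothetical index that must already exist.
        System.out.println(analyzeUrl("http://localhost:9200", "demo_index", "hylanda_analyzer", "hello world"));
    }
}
```

Open the printed URL with curl or a browser; if the plugin loaded correctly, the response should list the tokens produced by your Analyzer.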

That's it.