solrcloud-5.2.1 集群部署及测试

2018/12/25 solr

安装

下载Solr-5.2.1安装包

wget http://archive.apache.org/dist/lucene/solr/5.2.1/solr-5.2.1.tgz

从压缩包中抽出安装脚本

tar -zxvf solr-5.2.1.tgz  solr-5.2.1/bin/install_solr_service.sh  --strip-components=2

创建安装目录

mkdir -p /data/solr5

运行安装脚本

./install_solr_service.sh solr-5.2.1.tgz -i /data/solr5 -d /data/solr5 -u solr -s solr -p 8983

具体参数说明可以看安装脚本 install_solr_service.sh

  echo ""
  echo "Usage: install_solr_service.sh path_to_solr_distribution_archive OPTIONS"
  echo ""
  echo "  The first argument to the script must be a path to a Solr distribution archive, such as solr-5.0.0.tgz"
  echo "    (only .tgz or .zip are supported formats for the archive)"
  echo ""
  echo "  Supported OPTIONS include:"
  echo ""
  echo "    -d     Directory for live / writable Solr files, such as logs, pid files, and index data; defaults to /var/solr"
  echo ""
  echo "    -i     Directory to extract the Solr installation archive; defaults to /opt/"
  echo "             The specified path must exist prior to using this script."
  echo ""
  echo "    -p     Port Solr should bind to; default is 8983"
  echo ""
  echo "    -s     Service name; defaults to solr"
  echo ""
  echo "    -u     User to own the Solr files and run the Solr process as; defaults to solr"
  echo "             This script will create the specified user account if it does not exist."
  echo ""
  echo " NOTE: Must be run as the root user"
  echo ""

也可以直接解压安装包,然后自定义配置,安装脚本已经做了一些配置逻辑,大致流程如下:

  • 判断系统类型 Debian、RedHat、Ubuntu、SUSE(选择下面具体的执行命令)
  • 解压安装,此处用到上述的参数指定,不指定,使用默认值配置
  • 判断是否存在 /etc/init.d/$SOLR_SERVICE,$SOLR_SERVICE为-s配置,存在直接退出
  • 判断配置solr用户是否存在,不存在则创建用户
  • 创建data、logs目录,复制/server/solr/solr.xml/bin/solr.in.sh/server/resources/log4j.properties配置文件到指定安装目录,并给solr用户授权 chown -R $SOLR_USER: $SOLR_VAR_DIR
  • 将配置文件信息写入$SOLR_VAR_DIR/solr.in.sh中,如下所示
SOLR_PID_DIR=/data/solr5
SOLR_HOME=/data/solr5/data
LOG4J_PROPS=/data/solr5/log4j.properties
SOLR_LOGS_DIR=/data/solr5/logs
SOLR_PORT=8983
  • 复制启动脚本solr,并授权
    cp $SOLR_INSTALL_DIR/bin/init.d/solr /etc/init.d/$SOLR_SERVICE
    chmod 744 /etc/init.d/$SOLR_SERVICE
    chown root:root /etc/init.d/$SOLR_SERVICE
    
  • 将环境基本配置信息写入/etc/init.d/solr
  • 启动solr service solr start
  • 判断启动状态 service solr status
  • 安装结束

其实也可以大致看一下/etc/init.d/solr中干了啥操作,无非就是下面几个操作

  • 检查运行环境JAVA_HOME
  • 执行组装启动脚本

集群配置

执行以下命令编辑solr.in.sh文件

vim /data/solr5/solr.in.sh

参照如下进行修改

ZK_HOST="172.16.185.68:2181,172.16.185.69:2181,172.16.185.70:2181/solr_cluster"

注意

solr_cluster 可以自行修改,此处理解为集群在zk上的命名空间,强烈建议集群配置时增加此命名,方便管理,否则集群配置信息默认都在zk根目录下,不好维护

此命名空间需要手动创建,

[zk: localhost:2181(CONNECTED) 2] create /solr_cluster ''
Created /solr_cluster

否则启动会报错

null:org.apache.solr.common.cloud.ZooKeeperException: A chroot was specified in ZkHost but the znode doesn't exist. 172.16.185.68:2181,172.16.185.69:2181,172.16.185.70:2181/solr_cluste

重复按照上述步骤在其他节点执行安装

ik分词器配置

  • 下载ik。地址
  • mvn clean package 打包,或者直接下载我已经打包好的ik-analyzer-solr5-5.x.jar 提取码:v868

  • 拷贝ik-analyzer-solr5-5.x.jar 到solr目录
    /data/solr5/solr/server/solr-webapp/webapp/WEB-INF/lib
    
  • 配置schema.xml
<!-- ik的智能分词 -->
<fieldType name="text_ik" class="solr.TextField">   
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="false" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.lucene.IKTokenizerFactory" useSmart="true" />
  </analyzer>
</fieldType>

检查是否生效

image

启动

在Solr集群中各节点执行以下命令启动Solr服务

service solr start

其实执行的具体命令如下,来自 /etc/init.d/solr

if [ -n "$RUNAS" ]; then
  su -c "SOLR_INCLUDE=$SOLR_ENV $SOLR_INSTALL_DIR/bin/solr $SOLR_CMD" - $RUNAS
else
  SOLR_INCLUDE=$SOLR_ENV $SOLR_INSTALL_DIR/bin/solr $SOLR_CMD
fi

即执行

//$RUNAS非空
su -c "SOLR_INCLUDE=/data/solr5/solr.in.sh /data/solr5/solr/bin/solr start" - solr

//$RUNAS为空
SOLR_INCLUDE=/data/solr5/solr.in.sh /data/solr5/solr/bin/solr start

查看Solr状态

# SOLR_INCLUDE=/data/solr5/solr.in.sh /data/solr5/solr/bin/solr status

Found 1 Solr nodes: 

Solr process 15734 running on port 8983
{
  "solr_home":"/data/solr5/data/",
  "version":"5.2.1 1684708 - shalin - 2015-06-10 23:20:13",
  "startTime":"2018-12-25T03:21:24.774Z",
  "uptime":"0 days, 1 hours, 39 minutes, 36 seconds",
  "memory":"44.6 MB (%9.1) of 490.7 MB",
  "cloud":{
    "ZooKeeper":"172.16.185.68:2181,172.16.185.69:2181,172.16.185.70:2181",
    "liveNodes":"3",
    "collections":"3"}}

登录查看

http://172.16.185.68:8983

image

测试

Solr提供了几种常用的方式来操作,Shell命令、REST API、SolrJ接口等等,根据实际情况选择。下面的操作为了简单方便的演示使用Shell命令的方式。

创建一个collection

并上传关联配置文件至Zookeeper

  • 创建core配置文件夹
    mkdir /data/solr5/solr/server/solr/captcha_core
    
  • 复制配置文件
cp -r /data/solr5/solr/server/solr/configsets/basic_configs/* /data/solr5/solr/server/solr/captcha_core/
  • 创建collection
    /data/solr5/solr/bin/solr create_collection -c captcha -d /data/solr5/solr/server/solr/captcha_core/conf -shards 4 -replicationFactor 3
    

检测collection是否成功创建,在Solr UI中刷新页面,点击Cloud如果成功创建了Collection会显示出Solr的集群拓扑

image

集群信息

image

  • 如果以后要更新配置文件到Zookeeper,可以使用以下命令更新全部配置
    /data/solr5/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 172.16.185.68:2181,172.16.185.69:2181,172.16.185.70:2181/solr_cluster -cmd upconfig -confname captcha -confdir /data/solr5/solr/server/solr/captcha_core/conf
    
  • 如果只更新单个文件使用putfile命令
    /data/solr5/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost 172.16.185.68:2181,172.16.185.69:2181,172.16.185.70:2181/solr_cluster -cmd putfile /configs/captcha/schema.xml /data/solr5/solr/server/solr/captcha_core/conf/schema.xml
    

    路径的前者为配置文件在Zookeeper中的存储路径,后者是配置文件的本地路径。需要注意的是如果Zookeeper中的这个文件存在需要先删除之,然后在上传更新

注意:zk配置修改后需要reload,如下操作

http://172.16.185.68:8983/solr/admin/collections?action=RELOAD&name=captcha

删除Collection

http://172.16.185.69:8983/solr/admin/collections?action=DELETE&name=captcha

问题

  • 执行启动 service solr start 失败

执行启动命令,发现返回成功,但是进程不存在

# service solr start

Started Solr server on port 8983 (pid=18611). Happy searching!

查看启动日志

# tail -n 20 solr-8983-console.log 
Exception in thread "main" java.lang.ClassFormatError: org.eclipse.jetty.start.Main (unrecognized class file version)
   at java.lang.VMClassLoader.defineClass(libgcj.so.10)
   at java.lang.ClassLoader.defineClass(libgcj.so.10)
   at java.security.SecureClassLoader.defineClass(libgcj.so.10)
   at java.net.URLClassLoader.findClass(libgcj.so.10)
   at java.lang.ClassLoader.loadClass(libgcj.so.10)
   at java.lang.ClassLoader.loadClass(libgcj.so.10)
   at gnu.java.lang.MainThread.run(libgcj.so.10)

此异常表示java运行环境不正确,这就奇怪了,启动脚本中不是执行了java环境检测了么???并且本地环境已经配置了$JAVA_HOME

if [ -n "$SOLR_JAVA_HOME" ]; then
  JAVA="$SOLR_JAVA_HOME/bin/java"
elif [ -n "$JAVA_HOME" ]; then
  for java in "$JAVA_HOME"/bin/amd64/java "$JAVA_HOME"/bin/java; do
    if [ -x "$java" ]; then
      JAVA="$java"
      break
    fi
  done
  if [ -z "$JAVA" ]; then
    echo >&2 "The currently defined JAVA_HOME ($JAVA_HOME) refers"
    echo >&2 "to a location where Java could not be found.  Aborting."
    echo >&2 "Either fix the JAVA_HOME variable or remove it from the"
    echo >&2 "environment so that the system PATH will be searched."
    exit 1
  fi
else
  JAVA=java
fi

# test that Java exists and is executable on this server
"$JAVA" -version >/dev/null 2>&1 || {
  echo >&2 "Java not found, or an error was encountered when running java."
  echo >&2 "A working Java 7 or later is required to run Solr!"
  echo >&2 "Please install Java or fix JAVA_HOME before running this script."
  echo >&2 "Command that we tried: '${JAVA} -version'"
  echo >&2 "Active Path:"
  echo >&2 "${PATH}"
  exit 1
}

经调试,发现在启动脚本中 $JAVA_HOME 获取居然为空,同时也打印了执行脚本,如下

# service solr status
su -c "SOLR_INCLUDE=/data/solr5/solr.in.sh /data/solr5/solr/bin/solr status" - solr

Found 1 Solr nodes: 

Solr process 16450 running on port 8983
Exception in thread "main" java.lang.NoClassDefFoundError: org.apache.solr.util.SolrCLI
   at gnu.java.lang.MainThread.run(libgcj.so.10)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.util.SolrCLI not found in gnu.gcj.runtime.SystemClassLoader{urls=[], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
   at java.net.URLClassLoader.findClass(libgcj.so.10)
   at java.lang.ClassLoader.loadClass(libgcj.so.10)
   at java.lang.ClassLoader.loadClass(libgcj.so.10)
   at gnu.java.lang.MainThread.run(libgcj.so.10)

然后又对比执行了一下

# SOLR_INCLUDE=/data/solr5/solr.in.sh /data/solr5/solr/bin/solr status
/data/java/bin/java

Found 1 Solr nodes: 

Solr process 16450 running on port 8983
{
  "solr_home":"/data/solr5/data/",
  "version":"5.2.1 1684708 - shalin - 2015-06-10 23:20:13",
  "startTime":"2018-12-25T06:46:29.929Z",
  "uptime":"0 days, 0 hours, 12 minutes, 6 seconds",
  "memory":"71.6 MB (%14.6) of 490.7 MB",
  "cloud":{
    "ZooKeeper":"172.16.185.68:2181,172.16.185.69:2181,172.16.185.70:2181",
    "liveNodes":"2",
    "collections":"1"}}

哎呦,此方式居然成功了

那就是说明su -c模式下获取环境变量有问题,那就继续找此sudo su后获取不到JAVA_HOME环境变量的问题,可自行网上找资料解决

此处我这里的处理方式是在 /data/solr5/solr.in.sh中增加环境变量配置

vim /data/solr5/solr.in.sh
#配置
SOLR_JAVA_HOME="/data/java"

参考

SolrCloud-5.2.1 集群部署及测试

solr5.5+中文分词

Search

    Post Directory