CentOS installation recommendations

Install as a CentOS 7 amd64 package


This method requires root privileges or sudo access for the user.

1) To ensure that the default system tools are at their latest versions, begin by running a base update on the system:

sudo yum -y update

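2) Add the HCE repository by creating /etc/yum.repos.d/hce.repo with the following content. The repository is untrusted (unsigned), so gpgcheck is disabled; replace baseurl if needed: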

[hce]
name=hce repo
baseurl=http://packages.hierarchical-cluster-engine.com/centos/7/$basearch/
gpgcheck=0
enabled=1
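
One way to create this file from the command line is sketched below. The function name hce_repo_contents is illustrative and is not part of the HCE package. The quoted here-document delimiter keeps $basearch literal, so that yum rather than the shell expands it.

```shell
# Hypothetical helper (not shipped with HCE): prints the repository
# definition shown above to standard output.
hce_repo_contents() {
    # 'EOF' is quoted so the shell leaves $basearch untouched for yum.
    cat <<'EOF'
[hce]
name=hce repo
baseurl=http://packages.hierarchical-cluster-engine.com/centos/7/$basearch/
gpgcheck=0
enabled=1
EOF
}

# Typical use on the target machine (writing the file requires root):
#   hce_repo_contents | sudo tee /etc/yum.repos.d/hce.repo > /dev/null
```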

3) The HCE package has some dependencies that can be resolved by adding the EPEL repository.

# If you are on a 64-bit CentOS / RHEL based system:

 sudo rpm -ivh http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
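
The release number in the URL above may change over time. On CentOS 7 the epel-release package is also provided by the CentOS Extras repository, so it can usually be installed by name instead. The sketch below shows that alternative as a comment and adds a small check that EPEL is enabled; the helper name epel_listed is illustrative only.

```shell
# Alternative to the direct RPM URL (assumes the CentOS Extras repo is enabled):
#   sudo yum install epel-release

# Hypothetical helper: reads `yum repolist enabled` output on stdin and
# succeeds if a repository whose id starts with "epel" is listed.
epel_listed() {
    # Repo ids may carry a leading "!" or "*" marker and a "/arch" suffix.
    grep -Eq '^[!*]?epel(/|[[:space:]]|$)'
}

# Typical use on the target machine:
#   yum repolist enabled | epel_listed && echo "EPEL is enabled"
```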

4) Run a base update on the system again:

sudo yum -y update

5) Install the "hce-node" package:

 sudo yum install hce-node

6) To install the Bundle into the home directory, run the script included in the package distribution with regular user privileges:

 hce-node-bundle-install.sh

The hce-node-bundle directory will be created in the home directory of the current user. Please read the ~/hce-node-bundle/api/php/doc/readme.txt file to continue, install the Bundle Environment, and run the demo and test modes of HCE.
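
A quick way to confirm that the bundle landed where expected is sketched below. The function name check_hce_bundle and its optional directory argument are illustrative and are not part of the HCE distribution; only the directory and readme paths come from the text above.

```shell
# Hypothetical helper: checks that the bundle directory and the readme
# mentioned above exist. The optional argument overrides the default
# location (~/hce-node-bundle).
check_hce_bundle() {
    bundle="${1:-$HOME/hce-node-bundle}"
    if [ ! -d "$bundle" ]; then
        echo "missing directory: $bundle"
        return 1
    fi
    if [ ! -f "$bundle/api/php/doc/readme.txt" ]; then
        echo "missing readme: $bundle/api/php/doc/readme.txt"
        return 1
    fi
    echo "bundle found: $bundle"
}

# Typical use after running hce-node-bundle-install.sh:
#   check_hce_bundle
```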

7) Install the development tools:

sudo yum groupinstall 'Development Tools'

Install the Bundle Environment for the PHP language


1) Install zmq library:

sudo yum install zeromq3-devel

If PHP is already installed, this step can be skipped.

2) Install php:

sudo yum install php php-devel
sudo yum install php-pear pkgconfig openpgm-devel zeromq3-devel
sudo pecl install --ignore-errors zmq-beta

After that, you may need to create the file /etc/php.d/zmq.ini and add the following line to it:

extension=zmq.so
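
To confirm that PHP actually picked up the extension, the module list printed by `php -m` can be checked. The sketch below is one possible check; the helper name zmq_loaded is illustrative only.

```shell
# Hypothetical helper: reads the module list printed by `php -m` on stdin
# and succeeds if the zmq extension is among the loaded modules.
zmq_loaded() {
    # -x matches the whole line, -i ignores case, -q suppresses output.
    grep -qix 'zmq'
}

# Typical use on the target machine:
#   php -m | zmq_loaded && echo "zmq extension is loaded"
```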

3) Install Sphinx search engine:

sudo yum install sphinx sphinx-php

4) To test the DC service and the main crawling process (~/hce-node-bundle/api/python/ftests/dc_test_rnd_site.sh), install httpd:

sudo yum -y install httpd
sudo systemctl start httpd

Then copy the test site files into the httpd root directory:

sudo cp ~/hce-node-bundle/api/python/data/ftests/test_site/* /var/www/html/

5) Install bc for DRCE tests:

sudo yum install bc

6) Install Java 7 for DRCE tests (optional):

sudo yum install java-1.7.0-openjdk

Please read the ~/hce-node-bundle/api/php/doc/readme.txt file to continue.

Install the Bundle Environment for the Python language, DC and DTM services


This method requires root privileges or sudo access for the user.

1) CentOS package dependencies:

sudo yum install openpgm-devel mariadb-server mariadb python-pip python-devel python-flask python-flask-wtf ruby libffi-devel \
    libxml2-devel libxslt-devel mariadb-devel mysql-connector-python libicu-devel gmp-devel libtidy-devel python-dateutil

Enable mariadb to start at boot:

sudo systemctl enable mariadb.service

Then start mariadb:

sudo systemctl start mariadb

Run mysql_secure_installation and set a password for the MySQL root user:

mysql_secure_installation

2) Python module dependencies:

sudo pip install cement sqlalchemy Flask-SQLAlchemy scrapy gmpy lepl requests
sudo pip install urlnorm pyicu mysql-python newspaper goose-extractor
sudo pip install pytidylib uritools python-magic
sudo pip install pyzmq --install-option="--zmq=bundled"

For dynamic pages crawling:

sudo pip install Ghost.py
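
After the pip steps, it can be useful to confirm that each module imports. The sketch below is one way to do that; the function name missing_py_modules and the PYTHON override variable are illustrative, and the mapping from pip package names to import names listed in the comment is an assumption to verify against your environment.

```shell
# Hypothetical helper: prints every module from its argument list that the
# interpreter named by $PYTHON (default: python) fails to import.
missing_py_modules() {
    for mod in "$@"; do
        "${PYTHON:-python}" -c "import $mod" >/dev/null 2>&1 || echo "$mod"
    done
}

# Import names differ from some pip package names (assumed mapping:
# Flask-SQLAlchemy -> flask_sqlalchemy, pyicu -> icu, mysql-python -> MySQLdb,
# goose-extractor -> goose, pytidylib -> tidylib, python-magic -> magic,
# pyzmq -> zmq, Ghost.py -> ghost). Typical use on the target machine:
#   missing_py_modules cement sqlalchemy flask_sqlalchemy scrapy gmpy lepl \
#       requests urlnorm icu MySQLdb newspaper goose tidylib uritools magic zmq
```

An empty result means every listed module was importable; any names printed are candidates for reinstalling with pip.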

3) Create MySQL user and DB schema for Distributed Crawler Application:

cd ~/hce-node-bundle/api/python/manage/
sudo ./mysql_create_user.sh
./mysql_create_struct.sh

Install the Distributed Crawler client Environment for the Python language


This method requires root privileges or sudo access for the user.

1) CentOS package dependencies:

sudo yum install python-pip python-devel libffi-devel libxml2-devel libxslt-devel

2) Python package dependencies:

sudo pip install cement scrapy w3lib
sudo pip install pyzmq --install-option="--zmq=bundled"

3) If the DTS archive was downloaded directly, run the following after unzipping it:

chmod 777 ~/hce-node-bundle/usr/bin/hce-node-permissions.sh
~/hce-node-bundle/usr/bin/hce-node-permissions.sh