Install as a CentOS 7 amd64 package
This method requires root privileges or sudo access.
1) To make sure the default system tools are at their latest versions, begin by running a base update on the system:
sudo yum -y update
2) Add the untrusted repository by creating /etc/yum.repos.d/hce.repo with the following content (replace baseurl if needed):
[hce]
name=hce repo
baseurl=http://packages.hierarchical-cluster-engine.com/centos/7/$basearch/
gpgcheck=0
enabled=1
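The repository file can also be written non-interactively. Below is a minimal sketch, assuming it runs in a root shell; the function name `write_hce_repo` and its optional path argument are hypothetical, added only so the output can be checked against a scratch file first. The heredoc delimiter is quoted so the shell leaves `$basearch` untouched for yum to expand.

```shell
# Write the HCE yum repository definition to the given path
# (defaults to the location used in step 2; /etc requires root).
write_hce_repo() {
    repo_file="${1:-/etc/yum.repos.d/hce.repo}"
    # Quoting 'EOF' stops the shell from expanding $basearch;
    # yum substitutes it (e.g. with x86_64) when it reads the file.
    cat > "$repo_file" <<'EOF'
[hce]
name=hce repo
baseurl=http://packages.hierarchical-cluster-engine.com/centos/7/$basearch/
gpgcheck=0
enabled=1
EOF
}
```

Usage: call `write_hce_repo` as root, or `write_hce_repo /tmp/hce.repo` to inspect the result before writing to /etc.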
3) The HCE package has some dependencies that can be resolved by adding the EPEL repository.
# If you are on a 64-bit CentOS / RHEL based system:
sudo rpm -ivh http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
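The EPEL RPM URL above is for x86_64 only. A minimal sketch of a guard that checks the machine architecture before installing it; the function name is hypothetical and the URL is copied verbatim from step 3 (that exact release file may no longer be at this location on the mirror).

```shell
# EPEL release RPM from step 3; it targets x86_64 systems only.
EPEL_RPM_URL="http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm"

# Install the EPEL release package only when the architecture matches.
# The optional argument overrides the detected architecture.
install_epel_if_x86_64() {
    arch="${1:-$(uname -m)}"
    if [ "$arch" != "x86_64" ]; then
        echo "Unsupported architecture: $arch (expected x86_64)" >&2
        return 1
    fi
    sudo rpm -ivh "$EPEL_RPM_URL"
}
```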
4) Run a base update on the system again:
sudo yum -y update
5) Install the “hce-node” package:
sudo yum install hce-node
6) To install the Bundle into your home directory, run the installer script included in the package distribution as a regular user (without sudo):
hce-node-bundle-install.sh
The hce-node-bundle directory will be created in the current user's home directory. Please read the ~/hce-node-bundle/api/php/doc/readme.txt file to continue: it explains how to install the Bundle Environment and run HCE in demo and test mode.
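Because the installer must run without root, a small wrapper can refuse to start as root and then confirm that the readme mentioned above exists. A minimal sketch; both function names are hypothetical, and only the script name and readme path come from this guide.

```shell
# Check that the bundle readme was created under the current user's $HOME.
check_hce_bundle() {
    readme="$HOME/hce-node-bundle/api/php/doc/readme.txt"
    if [ -f "$readme" ]; then
        echo "Bundle installed; next: $readme"
    else
        echo "Bundle not found under $HOME/hce-node-bundle" >&2
        return 1
    fi
}

# Run the bundle installer as a regular (non-root) user, then verify it.
install_hce_bundle() {
    if [ "$(id -u)" -eq 0 ]; then
        echo "Run hce-node-bundle-install.sh as a regular user, not root" >&2
        return 1
    fi
    hce-node-bundle-install.sh || return 1
    check_hce_bundle
}
```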
7) Install the development tools:
sudo yum groupinstall 'Development Tools'
Install the Bundle Environment for the PHP language
1) Install the ZeroMQ library:
sudo yum install zeromq3-devel
If PHP is already installed, the next step can be skipped.
2) Install PHP:
sudo yum install php php-devel
sudo yum install php-pear pkgconfig openpgm-devel zeromq3-devel
sudo pecl install --ignore-errors zmq-beta
After that, you may need to create the file /etc/php.d/zmq.ini containing the line:
extension=zmq.so
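If this step is scripted, the line should be added only once so that re-running it does not duplicate the directive. A minimal sketch, assuming it runs as root; the function name and its optional path argument are hypothetical.

```shell
# Add "extension=zmq.so" to the PHP ini file unless it is already there,
# so repeated runs leave exactly one copy of the line.
enable_php_zmq() {
    ini="${1:-/etc/php.d/zmq.ini}"
    line="extension=zmq.so"
    if [ -f "$ini" ] && grep -qxF "$line" "$ini"; then
        return 0
    fi
    echo "$line" >> "$ini"
}
```

Afterwards, `php -m | grep -i zmq` should list the extension if PHP loads it.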
3) Install the Sphinx search engine:
sudo yum install sphinx sphinx-php
4) To test the DC service and the main crawling process (~/hce-node-bundle/api/python/ftests/dc_test_rnd_site.sh), install and start httpd:
sudo yum -y install httpd
sudo systemctl start httpd
Then copy the test site files into the httpd document root:
sudo cp ~/hce-node-bundle/api/python/data/ftests/test_site/* /var/www/html/
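Before running dc_test_rnd_site.sh, it can help to confirm that every test site file reached the document root unchanged. A minimal sketch; the function name is hypothetical and the default directories are the ones used above.

```shell
# Compare each file of the bundled test site with its copy in the httpd
# document root; report files that are missing or differ.
check_test_site_copied() {
    src="${1:-$HOME/hce-node-bundle/api/python/data/ftests/test_site}"
    dst="${2:-/var/www/html}"
    missing=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue   # cp without -r skips directories
        name=$(basename "$f")
        if ! cmp -s "$f" "$dst/$name"; then
            echo "missing or different: $dst/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```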
5) Install bc for DRCE tests:
sudo yum install bc
6) Install Java 7 for DRCE tests (optional):
sudo yum install java-1.7.0-openjdk
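To see which DRCE test tools are actually available on the PATH, a short check can be run before the tests. A minimal sketch; the function name is hypothetical, and since Java is optional a missing `java` only matters for the Java-based tests.

```shell
# Report which of the given commands are not on PATH and return non-zero
# if any are missing. With no arguments, checks the DRCE test tools.
check_drce_tools() {
    if [ "$#" -eq 0 ]; then
        set -- bc java
    fi
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            status=1
        fi
    done
    return "$status"
}
```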
Please read the ~/hce-node-bundle/api/php/doc/readme.txt file to continue.
Install the Bundle Environment for the Python language, DC and DTM services
This method requires root privileges or sudo access.
1) CentOS package dependencies:
sudo yum install openpgm-devel mariadb-server mariadb python-pip python-devel python-flask python-flask-wtf ruby libffi-devel libxml2-devel libxslt-devel mariadb-devel mysql-connector-python libicu-devel gmp-devel libtidy-devel python-dateutil
Enable MariaDB to start at boot:
sudo systemctl enable mariadb.service
Start MariaDB:
sudo systemctl start mariadb
Run mysql_secure_installation and set a password for the MySQL root user:
mysql_secure_installation
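Right after `systemctl start mariadb` the server may need a moment before it accepts connections, so a script can wait for it before creating the DB schema in step 3. A minimal sketch; the function name and retry parameters are hypothetical. It relies on `mysqladmin ping`, which exits 0 whenever the server is running, even if the connection is refused for lack of credentials.

```shell
# Poll the MariaDB server until it answers or the attempts run out.
wait_for_mariadb() {
    tries="${1:-30}"   # number of attempts
    delay="${2:-1}"    # seconds between attempts
    i=1
    while [ "$i" -le "$tries" ]; do
        if mysqladmin ping >/dev/null 2>&1; then
            echo "MariaDB is up"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "MariaDB did not respond after $tries attempts" >&2
    return 1
}
```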
2) Python module dependencies:
sudo pip install cement sqlalchemy Flask-SQLAlchemy scrapy gmpy lepl requests
sudo pip install urlnorm pyicu mysql-python newspaper goose-extractor
sudo pip install pytidylib uritools python-magic
sudo pip install pyzmq --install-option="--zmq=bundled"
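A quick way to confirm the pip installs succeeded is to try importing each module. A minimal sketch; the function name and the `PYTHON` override are hypothetical. Import names often differ from pip package names, and the ones in the usage line below are assumed from each package's usual top-level module.

```shell
# Try importing each named Python module; report the ones that fail.
# PYTHON selects the interpreter (defaults to "python").
check_py_modules() {
    py="${PYTHON:-python}"
    status=0
    for mod in "$@"; do
        if ! "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "cannot import: $mod" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: `check_py_modules cement sqlalchemy flask_sqlalchemy scrapy gmpy lepl requests urlnorm icu MySQLdb newspaper goose tidylib uritools magic zmq` (add `ghost` if Ghost.py is installed).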
For crawling dynamic pages:
sudo pip install Ghost.py
3) Create the MySQL user and DB schema for the Distributed Crawler Application:
cd ~/hce-node-bundle/api/python/manage/
sudo ./mysql_create_user.sh
./mysql_create_struct.sh
Install the Distributed Crawler Client Environment for the Python language
This method requires root privileges or sudo access.
1) CentOS package dependencies:
sudo yum install python-pip python-devel libffi-devel libxml2-devel libxslt-devel
2) Python package dependencies:
sudo pip install cement scrapy w3lib
sudo pip install pyzmq --install-option="--zmq=bundled"
3) If the DTS archive was downloaded directly, run the following after unzipping it:
chmod 777 ~/hce-node-bundle/usr/bin/hce-node-permissions.sh
~/hce-node-bundle/usr/bin/hce-node-permissions.sh
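The step above uses `chmod 777`, which also makes the script world-writable. The sketch below grants only the owner's execute bit, which is enough to run it, assuming nothing else depends on the wider permissions. The function name and its optional path argument are hypothetical.

```shell
# Make the bundle's permission script executable for its owner and run it,
# failing clearly if the unpacked archive is not where this step expects.
fix_bundle_permissions() {
    script="${1:-$HOME/hce-node-bundle/usr/bin/hce-node-permissions.sh}"
    if [ ! -f "$script" ]; then
        echo "not found: $script (was the archive unzipped into \$HOME?)" >&2
        return 1
    fi
    chmod u+x "$script" && "$script"
}
```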