HCE and Distributed Remote Command Execution

General definitions


The Distributed Remote Command Execution (DRCE) functionality allows hce-node application clients to execute Linux shell commands, to spawn processes via pipes, and to interact with a spawned process through its standard input, output and error streams as well as through files. A flexible protocol defines a custom way to interact with the process, to feed it incoming data and to collect the data it produces. Through this service handler, integrated inside the hce-node application, any kind of computational task can be executed remotely on the low-level replica or shard nodes of the cluster hierarchy. The most typical uses of DRCE are:

  • Execute data processing tasks, then collect and fetch results from downstream replica or shard nodes according to the cluster hierarchy.
  • Manage Linux host systems remotely via group actions spread across the corresponding cluster infrastructure.
  • Spread data to target cluster nodes for immediate or later processing.
  • Fetch data, such as logs, from many target host systems and collect it on the client side.
  • Any combination of the above.

The main advantage of DRCE is that the cluster structure used for request distribution, task execution planning, result collection and so on can be custom-defined and changed at run time.

Execution units can be connected at any time to extend the capacity of the processing pool. Execution units can also be started on different physical servers and united in balanced mode. It is important to note that request message handling inside one unit is single-threaded, so no mutexes or blocking are needed; asynchronous request/response TCP socket operation is provided by the ZMQ sockets library. The schema below illustrates a simple DRCE computational cluster, showing the connection directions and the behavior of nodes in different roles:
[Figure: simple DRCE computational cluster schema]
The implementation of DRCE functionality in the hce-node package includes a client-side API for the PHP language and a client application for testing and benchmarking algorithms and the execution environment.

The Demo Test Suite includes a set of prepared requests in JSON format that can be extended and changed manually. This set covers the execution environments most important for hce-node, such as PHP and Java, as well as those provided with the base Debian OS distribution, such as Python, Ruby and Perl. Because of the very good affinity between hce-node application process instances and CPU cores, a pool of hce-node replicas can utilize CPU and I/O resources effectively, providing up to 800 requests per second on a single four-core i5 physical server. The screenshot below shows the load level and its distribution across CPU cores.
[Screenshot: load level and per-core CPU distribution]
As can be seen above, for a very short task with an execution time of about 1 ms and 16 client threads, four computational units load four CPU cores in the most effective way. Note that in this test the clients that requested tasks (the drce.php application) and the HCE cluster node instances were located on the same physical host server. This is test #00 of the Demo Test Suite package of DRCE tests.

The client application drce.php supports a CLI interface and makes use of the prepared JSON requests in many different tests and demos. The request and response protocols are very simple and formalized in JSON format, so they can easily be implemented in any other language supported by the ZMQ library.
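For illustration, here is a minimal PHP sketch of such a JSON request sent through a ZMQ REQ socket. The field names, the endpoint address and the response shape are assumptions made for this example, not the actual DRCE protocol specification:

    <?php
    // Hypothetical DRCE-style request; field names are illustrative only,
    // not the real protocol specification.
    $request = array(
        "command" => "echo 'Hello, DRCE!'", // shell command to execute remotely
        "stdin"   => "",                    // data to feed to the process stdin
        "timeout" => 1000                   // assumed execution limit, ms
    );

    $context = new ZMQContext();
    $socket  = $context->getSocket(ZMQ::SOCKET_REQ);
    $socket->connect("tcp://localhost:5555");   // assumed router node address
    $socket->send(json_encode($request));

    // The response is expected to carry the exit status and collected output.
    $response = json_decode($socket->recv(), true);
    echo $response["stdout"];

Because both sides exchange plain JSON strings, the same interaction can be reproduced with the ZMQ binding of any other language.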

DRCE is the second functionality, after Sphinx search, that is natively supported by HCE.

A brief view of HCE: answers to PHP developers' questions

Question


So far I have studied your project's idea (Hierarchical Cluster Engine) and the Doxygen descriptions; the solution's goal is to allow the creation of distributed data storage. I may be wrong, but Sphinx is used to search across this storage.

What is the final goal of HCE and of the PHP API implementation?

Answer


Yes, HCE supports distributed Sphinx search indexes internally.

But this is only the first possible usage of HCE.

The DRCE (Distributed Remote Command Execution) functionality will be ready soon.

With this solution, HCE can construct and manage freely structured network clusters for distributed data and computations.

The main idea is to give the possibility to execute some artifact (a binary executable, a script in any programming language, whatever…), feed it incoming source data, and return the processed result data, with some reduction from the down nodes to the upper ones using any kind of internally supported standard or custom algorithm for sorting, elimination, grouping and so on…

In general, HCE is a network infrastructure that provides transport with data management (see the main article) on target data nodes and reducing algorithms, and of course it supports Sphinx search natively.

The main networking in HCE is based on the ZMQ sockets solution (externally, and internally for inter-thread data transfer).

So, all kinds of request/response network interactions are based on the ZMQ client library, which is available for any language.

The main goal of the PHP design is to create an OOP binding implementation corresponding to the existing structural code for the PHP language.

We also plan to create client-side API bindings for many popular languages, like Python, Java, Ruby, Perl and so on…

Please check the implementation of some utilities, like the import to key-value DB storage: http://hierarchical-cluster-engine.com/docs/dox…

Also, see the searcher and manager utility implementations, which support most of the operations that can be done by the HCE core engine.

They are ready to use, but have a structural API.

Maybe this helps to understand the subject area more…

Question


Got your idea. Do you have some channels to advertise HCE? Conferences, forums? Habrahabr? 🙂

Maybe some competitive exhibitions?

So, if we know this, we'll be able to build a pool of tasks with priorities. Then after a fixed time there will be some ready boxed product, for example. Or, probably, I'm trying to think about things you have already thought about.

Answer


The HCE project is now a pilot implementation for the open source community.

But we have an internal customer inside the IOIX corp., so while we are not a team of investors, we do have the possibility to do some research investigations within the limits of the available budget and the applied needs of the current main project, called SNATZ (http://snatz.com), which uses ASM v2.0 for internal full-text engine tasks.

Now the HCE project site and components, like the Debian Linux packages and tarballs, are nearing the finish of release v1.0, and the official opening date is 2014-01-01.

Of course, I'll publish some informational articles and news on most of the well-known IT news sites in the application area before the HCE site is opened.

Because the team is small and applied tasks eat a lot of time, I can't estimate real plans for a boxed product now. But after some community discussion I'll start to plan a more precise view of the set of features that could be formed into a boxed product and distributed as a complete solution. For now it is a Lego-like puzzle that can be used by professionals as a tool with a specific scope, and possibly a very good fit for their project infrastructure.

Of course, there are several competitors on the market now.

All of them are built by huge companies with great resources, and most of them use Java or are completely implemented on the Java platform.

For now, we can't claim a comparable position, so it would be a very wrong idea to compare HCE with Storm or Hadoop…

But the power of huge projects is also a kind of imperfection, because they need resources to manage and support, and they have complex, closed internal behavior and logic.

A lighter and more flexible tool can possibly give the advantages of functionality that can be freely extended, modified, tuned and even redesigned for special needs with relatively small resources. (I hope…)

Hierarchical Cluster Engine (HCE) project

v.1.0 general description and architecture basics

HCE project aim and main idea


This project is the successor of the Associative Search Machine (ASM) full-text web search engine project that was developed from 2006 to 2012 by IOIX Ukraine.

The main idea of this new project is to implement a solution that can be used to: construct a custom network mesh or distributed network cluster structure with several relation types between nodes; formalize the data flow processing that goes from the upper-level central source point down to the lower nodes and backward; formalize the handling of management requests coming from multiple source points; natively support reduction of results from multiple nodes (aggregation, duplicate elimination, sorting and so on); internally support a powerful full-text search engine and data storage; provide both transaction-less and transactional request processing; support flexible run-time changes of the cluster infrastructure; and provide bindings for many languages as client-side integration APIs – all in one product built in the C++ language…

HCE application area


  • As a network infrastructure and message transport layer provider, HCE can be used in any big-data solution that needs a custom network structure to build a distributed, high-performance data processing or data-mining architecture that scales easily both vertically and horizontally.
  • As a provider of a natively supported full-text search engine interface, HCE can be used in web or corporate network solutions that need fast and powerful full-text search and NoSQL distributed data storage, smoothly integrated using the natural languages specific to the target project. Currently the Sphinx (c) search engine with an extended data model is supported internally.
  • As a Distributed Remote Command Execution service provider, HCE can be used to automate the administration of many host servers in ensemble mode for OS and service deployment, maintenance and support tasks.

Hierarchical Cluster as engine


  • Provides the hierarchical cluster infrastructure: node connection schema, relations between nodes, node roles, request typification, data processing sequence algorithms, data sharding modes, and so on.
  • Provides the network transport layer for client application data and administration management messages.
  • Manages the natively supported integrated NoSQL data storage (Sphinx (c) search index and Distributed Remote Command Execution).
  • Collects, reduces and sorts the results of native and custom data processing.
  • Is ready to support transactional message processing.

HCE key functional principles


  • Free network cluster structure architecture. A target applied project can construct a specific schema of network relations that builds on hardware, load-balancing, fail-over, fault-tolerance and other principles on the basis of one simple Lego-like engine.
  • A stable reversed client-server networking protocol based on ZMQ sockets, with connection heart-beating, automated restoration and message buffering.
  • Easy asynchronous connection handling with a NUMA-oriented architecture of message handlers.
  • Unified I/O messages based on the JSON format.
  • Ready for client API bindings in the many programming languages covered by the ZMQ library. Can be easily integrated and deployed.

HCE-node application


The heart and main component of the HCE project is the hce-node application. This application integrates the complete set of base functionality to support the network infrastructure, hierarchical cluster construction, full-text search system integration and so on; see the main points under “Hierarchical Cluster as engine”.

  • Implemented for the Linux OS environment and distributed in the form of a source code tarball archive and a Debian Linux binary package with dependency packages.
  • Supports configuration-less start of a single instance, or takes a set of options used to build the corresponding network cluster architecture.
  • Intended for usage with client-side applications or an integrated API.
  • The first implementation of the client-side API and CLI utilities is bound to PHP.

HCE-node roles in the cluster structure


Internally, the HCE-node application contains seven basic handler threads. Each handler acts as a special black-box message processor/dispatcher and is used in
combination with the others so that the node works in one of four different roles:

  • Router – the upper end-point of the cluster hierarchy. Has three server-type connections. Handles client API connections, connections from instances of any other node role (typically shard or replica managers), and admin connections.
  • Shard manager – an intermediate point of the cluster hierarchy. Routes messages between the upper and lower layers. Uses data sharding and message multicast dispatching algorithms. Has two server-type connections and one client connection.
  • Replica manager – the same as the shard manager, but routes messages between the upper and lower layers using data balancing and message round-robin algorithms.
  • Replica – the down end-point of the cluster hierarchy. A data node: it interacts with data storage and/or processes data with the target algorithm(s), provides the interface to the full-text search engine, and serves as the target host for Distributed Remote Command Execution. Has one server-side and one client-side connection used for the cluster infrastructure, and can also have several data-storage-dependent connections.

HCE-node typical connection points


[Figure 1: HCE-node typical connection points]

HCE-node internal architecture: seven handler objects and their relations


[Figure 2: seven handler objects and their relations]

HCE-node in cluster structure relations


[Figure 3: HCE-node in cluster structure relations]

Simple cluster structures comparison


[Figure 4: simple cluster structures comparison]

Hierarchical cluster structure example


This is just one example of a four-level cluster structure. This structure can be used to create a four-host distributed data model solution that splits one huge document storage across three physical host servers and utilizes all CPU and I/O resources in an effective way.

Each level uses a different data distribution mode (shard or replica). The two replica nodes of each shard balance requests and hold completely mirrored data. Shards are multicast, and request results are accumulated and reduced by the shard manager node.
[Figure 5: hierarchical cluster structure example]
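For readers without the figure, the described hierarchy can be pictured as a simple nested structure; the node names and the number of shards below are purely illustrative:

    <?php
    // Illustrative four-level hierarchy from the example above:
    // router -> shard manager -> replica managers (one per shard) -> replicas.
    $cluster = array(
        "router" => array(                // level 1: client entry point
            "shard-manager" => array(     // level 2: multicasts to shards, reduces results
                "replica-manager-1" => array("replica-1a", "replica-1b"), // mirrored pair
                "replica-manager-2" => array("replica-2a", "replica-2b"),
                "replica-manager-3" => array("replica-3a", "replica-3b"),
            ),
        ),
    );
    print_r($cluster);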

Extended Sphinx index support


  • Smooth, transparent aggregation of several different indexes and search algorithms in federated search.
  • Native support, via file operations, for run-time index attach, copy, replace, swap, mirror and merge.
  • Native support for index sharding and mirroring according to different data schemas.
  • Extended full-text search queries, including full match, pattern match, match on a set of fields, logical operators, proximity limits, stop-word lists and many other features provided by the regular Sphinx search (c) engine (see the example below).
  • Filters for numeric fields, including exact match and range comparisons.
  • Multi-value numeric fields, with bit-set and logical operations simulated in the search query.
  • Custom ordering, including Sphinx full-text search weight and custom weighting calculation algorithm definitions, like a SQL ORDER BY prioritized field list or more complex logic.
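As a rough sketch of how such queries are exercised through the standard Sphinx PHP client (the index name, attribute names and values here are invented for the example; HCE's own extension layer sits on top of this):

    <?php
    require_once "sphinxapi.php";               // official PHP client shipped with Sphinx

    $cl = new SphinxClient();
    $cl->SetServer("localhost", 9312);          // default searchd host/port
    $cl->SetMatchMode(SPH_MATCH_EXTENDED2);     // enable the extended query syntax
    $cl->SetFilterRange("price", 10, 100);      // numeric range filter
    $cl->SetSortMode(SPH_SORT_EXTENDED, "@weight DESC, price ASC"); // custom order

    // Field match plus a proximity limit, as listed above.
    $result = $cl->Query('@title "quick brown"~5', "myindex");
    if ($result === false) {
        echo "Query failed: " . $cl->GetLastError() . "\n";
    }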

HCE Sphinx Index Extension – SIE


The hce-node application provides a solution named HCE Sphinx Index Extension (HCE SIE) that adds more functionality to the Sphinx index nature, along with two high-level structures:

  • Index – a local FS folder that contains a complete set of sub-folders and files, including the native Sphinx indexes (trunk and branches), configuration files for searchd, source XML data files of branches, prepared XML data files of branches, the schema XML file and more.
    An index supports the operations: create, copy, rename, delete, start, stop, rebuild, merge branches, delete document, connect, and others.
  • Branch – local FS files such as the source XML data file, the prepared XML data file and the native Sphinx index files created from the prepared XML data file. All branches of an index use the same schema, and each represents some part of the index. A set of branches can be merged into the trunk, which is used in the search process. A branch supports the operations: create, delete, rebuild, insert document, delete document, and others.

HCE SIE folder structure


[Figure 6: HCE SIE folder structure]

HCE technologies basics


  • ZMQ (c) sockets for inter-process and in-process interactions.
  • POCO C++ framework (c) for the application architecture and algorithms, including JSON and XML parsers, object serialization, configuration settings and many others.
  • Native integration with the Sphinx search (c) engine client, with extensions.
  • Mutex-less multithreaded application architecture and message handling.
  • Strict C++ OOP design, including elements of the C++11 standard.

HCE distribution and deployment


  • Open source distribution as a source code tarball archive.
  • Binary Debian Linux native packages, separated into hce-node and utility packages, as well as a complete full meta-package. Dependency libraries are packaged and maintained so that dependencies are completely covered and installation is smooth.
  • Public repositories accessible to the contributor community.
  • Upgrades of dependency packages and libraries as part of the product.
  • A box-ready distribution with the Demo Test Suite (DTS), containing examples of a simple cluster structure to start full-text Sphinx-based search right after installation.
  • Open documentation.

Transaction


The HCE cluster networking uses a MOM-based pattern, so it is ready to be transactional.

The transaction model is currently under research.

Durability


The cluster structure is hierarchical, and the infrastructure can be completely redundant. All user data stored on data nodes, such as Sphinx indexes, document files, configurations, source code and so on, can be made redundant or mirrored using the internally supported replication, or using user-defined algorithms of the applied applications that use the HCE engine as a service.

For cluster infrastructure node networking, HCE uses the ZMQ PPP (Paranoid Pirate Pattern) with extensions. This network pattern supports heart-beating, and it is extended with automated reconnects and connection management. This helps to maintain stable, auto-recovering channels, and to build several kinds of management on top of the cluster infrastructure, allowing dynamic structural changes and an extensible, near real-time scalable multi-host architecture.
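As a rough illustration of the heart-beating idea (not the actual hce-node implementation), a worker in the spirit of the PPP pattern could look like this in PHP with the ZMQ binding; the endpoint, interval and message marker are assumptions:

    <?php
    $context = new ZMQContext();
    $worker  = $context->getSocket(ZMQ::SOCKET_DEALER);
    $worker->connect("tcp://localhost:5556");   // assumed manager endpoint

    $poll = new ZMQPoll();
    $poll->add($worker, ZMQ::POLL_IN);

    $heartbeatAt = microtime(true) + 1.0;       // send a heartbeat every second
    while (true) {
        $readable = $writable = array();
        // Wait up to 1000 ms for incoming traffic.
        if ($poll->poll($readable, $writable, 1000) > 0) {
            $msg = $worker->recv();
            // ... dispatch real requests here; a peer heartbeat just proves liveness
        }
        if (microtime(true) >= $heartbeatAt) {
            $worker->send("HEARTBEAT");         // assumed liveness marker
            $heartbeatAt = microtime(true) + 1.0;
        }
    }

On top of this, the real pattern adds liveness counting on both sides and reconnecting with back-off when a peer goes silent.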

A node of any role and mode does not store unrecoverable data in memory, only the messages that are currently being sent or received. This makes it possible to stay safe and to lose only a minimal amount of important data in case of a node application crash.

An instance of the node application takes a minimal amount of RAM and minimal CPU resources, and it keeps no big files on disk belonging to the application-specific functionality. All other user data can be made redundant on the basis of applied target user-level algorithms and functional logic.

Technical characteristics


HCE project todo…


  1. Additions and extensions to support a transparent, configurable transactional model for all kinds of actions.
  2. Extension of the data node with native support for other index types and data storage, like SQLite.
  3. Extensions with a set of APIs and utility tools for creating typical cluster structures and sharding models.
  4. Extensions with a set of APIs for general management and statistics tracking, including a web UI for state visualization and visual management.
  5. Extensions for the analysis and visualization of statistics and logging data.

to be continued…

Quality Assurance requirements

What a QA engineer does


  • Writes test plans from the requirements, specifications and test strategies
  • Uses versioning systems to manage test script code
  • Creates and performs test campaigns whenever necessary to fit in the overall planning
  • Uses the bug tracking database to report bugs
  • Analyses test results
  • Reports results to the QA manager
  • Raises an alert when an important issue is likely to put the whole project in jeopardy

Practical job tasks and required abilities


  • Understanding the documentation of all the project’s products, including the JSON format protocols and data interaction specifications.
  • Understanding the main business logic of the products’ services and the roles and behaviors of their components.
  • Writing and executing functional, installation, configuration, deployment, crash, load-balancing, durability and other kinds of tests for the services, using the public documented API with bash scripting (PHP or Python can be used too) and regular CLI utilities for Linux OS.
  • Analyzing test results and application logs, reconstructing the behavior of the main business logic objects of the application services, and looking for misconfigurations and mismatches between real product behavior and the documented specification.

What makes a good QA Engineer


Broad understanding of the product
To test a product efficiently, the QA engineer must know it well enough. This sounds obvious but, unfortunately, it is often under-estimated. Knowing the product well also includes knowing how end-users expect it to work. Again, this may sound obvious, but remember that the biggest part of testing is black-box testing. The QA engineer must have a “customer-focus” vision.

But a good QA engineer must also know how the product is designed, because the more you know the product, the better you are able to test it. However, the QA engineer should analyse the design only after his black-box test plan is completed. Indeed, knowing the design can widely influence the test strategy. It is better to first write the test plan with a high-level vision, then get more and more information to refine the testing.

Effective communication


Communication is an extremely important skill for a QA engineer. Of course, meetings (stand-ups etc.) and status reports are part of the communication but, more importantly, a QA engineer must be particularly efficient in the following tasks:

  • Direct communication with both Development and Product definition teams
  • Capability to communicate with technical and non-technical people
  • Having the diplomacy to say “no” when a bug is considered not fixed
  • Having the diplomacy to communicate about a bug without offending the developer. Developers may often feel offended when a bug is submitted. This is 100% natural. This is why the QA engineer must have the ability to “criticize without offending”
  • Not relying on the bug tracking database for communication! There is nothing better than a bug tracking system for creating “misunderstanding” between the Development and QA teams

Creativity


Testing requires a lot of creativity. Bugs are often hidden, and just performing the obvious positive tests will yield only a few chances to actually find bugs. Hence, the QA engineer must use his creativity to figure out all the scenarios that are likely to detect a bug. In other words, the QA engineer must be able to “see beyond the obvious”.

Development knowledge


Quality Assurance requires knowledge about software development for two basic reasons:

  • Development capabilities are required to eventually code automated tests
  • If you know how to develop, you have better ideas of what is “dangerous” to code, and therefore what to test more thoroughly

Driving for results


A good QA engineer never forgets that the ultimate goal is not only to find bugs but also to have them fixed. Once a bug has been found and “acknowledged” by the development team, the QA engineer may be required to “convince” people to fix it.

Additionally, a nice automation framework with smart tools does not bring anything if it does not find any bugs in the end.

  • Ask yourself if the automation is going to help find more bugs, and when
  • Prioritize your testing tasks on the only important criteria:
    • How many bugs is this likely to find?
    • How major will the found bugs be (detecting thousands of cosmetic bugs is irrelevant/useless – and often easy – until all major/show-stopper bugs have been found)?

Job Description: Senior Quality Assurance Engineer

Purpose


Develops, publishes and implements test plans for complex, multi-tier, distributed applications throughout the full life-cycle of the software. Works on all ASM applications, both new and in production. Writes and maintains test automation. Publishes test results. Develops quality assurance standards. Defines and tracks quality assurance metrics such as defect densities and open defect counts.

Essential Duties

Quality Assurance


  • Defines, develops and implements quality assurance practices and procedures, test plans and other QA assessments.
  • Establishes standards and best practices for the use of the Rational Unified Process (“RUP”).
  • Develops automated testing systems using commercial tools, scripts and data sets.
  • Ensures that all items follow the change management process and are entered and tracked through the change management software.
  • Works directly with appropriate ASM personnel to understand project concept, objectives and approach of software development projects.
  • Acts as a consultant to ASM on quality methods, processes, and tools.
  • Is able to work independently.

Testing


  • Defines scope and objectives of all levels of QA testing.
  • Participates in all aspects of testing, including functional, regression, load and system testing.
  • Responsible for the overall success of testing. Manages assigned projects from IT Quality Testing through final User Acceptance Testing.
  • Establishes the purpose and deliverables of the test effort.
  • Provides resource planning, management and resolution of issues that impede the test effort.
  • Assures the appropriate level of quality through the resolution of important defects, working with developers to ensure that the software development process has an appropriate level of testing.
  • Creates effective manual and automated test plans, using a variety of toolsets, including Rational Test Manager, Rational Robot, and other automated tools.
  • Performs black box testing as required.

Job Description


  • Works collaboratively with development during all stages of projects to provide in-process testing results.
  • Coordinates groups of business personnel who test, evaluate and validate new functions and applications, and identify issues in software or services.
  • Records and reports on testing metrics.
  • Obtains final signoff for code releases to production from the appropriate Project Sponsor.
  • Tracks and reports defects using appropriate tools such as Rational ClearQuest.

Supervision


  • Works under the general supervision of the Application Development Specialist III. This position does not supervise other personnel.

Proven success in the following job competencies:


  • Analysis and Reporting
  • Business Planning and Management
  • Communication and Presentation
  • Customer Focus and Relationship Building
  • Champion for Change
  • Influencing
  • Information and Technology Proficiency
  • Leadership
  • Problem Solving and Decision Making
  • Technical Industry and/or Profession Expertise

Work Experience


  • Demonstrated ability to work well with business analysts, programmers and end users in a cross functional team.
  • At least one year of software test experience.
  • At least one year of experience testing multiple software projects simultaneously with J2EE or MS .NET applications.
  • At least one year of experience working with a structured software methodology, plus software test experience.
  • Experienced user of Rational Test Manager/Robot/Requisite Pro or other automated testing and defect tracking applications.
  • Experienced user of SQL to create data sets.
  • Experience creating scripts for automating processes.
  • Thorough knowledge and understanding of Rational Unified Process (RUP) used for the software development life cycle including requirements definition, initial application design, testing, final implementation and operations.

QA Manager Required Skills

What a QA manager does


  • Planning and prioritization of all test-related tasks (using proven project management tools such as the V-Cycle or Scrum methodology)
  • Writing the test strategies
  • Reviewing the test plans
  • Taking responsibility for certain designs if people do not have the required competencies
  • Code reviewing
  • Spreading expertise and good usage of tools such as the bug-tracking database or versioning systems
  • Delegating…
  • Having the people-judgment skills to hire the right people
  • Writing performance reviews

What makes a good QA Manager

Being a good QA engineer


Of course, a good QA Manager is first of all a good QA engineer. It requires additional skills, though.

Effective communication


A QA Manager must be an extremely good communicator. This includes:

    • Report global status and risk analysis to top management
    • Capability to communicate with technical and non-technical people
    • Having the diplomacy to say “no” when global quality is not acceptable for release

Part of the communication is also the not-so-well-appreciated “meetings”. Meetings are useful to a good organization, though.

  • For large teams, favor:
    • Formal meetings
    • Scheduled and iterated on a regular basis
    • Production of agendas (pre-meeting) and minutes (post-meeting)
  • For small teams, favor:
    • Stand-up meetings
    • Not necessarily planned
    • Agendas and minutes not necessarily needed

Having and spreading the “customer-focus” vision.

For QA engineers to be efficient in their work, they must have the desire to see customers happy.

Developing people


Developing people in a QA team, as in any team, is essential. The main goal is to improve the learning curve, and this can be achieved by:

  • Spreading the best practices you’ve learned along your whole career as a QA engineer/manager
  • Organizing trainings (external as well as internal)
  • Working in groups to share competencies
  • Leaving people some time to learn by themselves

Bringing out creativity in others


This can be achieved by:

  • Organizing brainstorm sessions on a regular basis
  • Discussing a lot with QA engineers to lead them to have the “idea”, instead of exposing the idea directly (if you’ve got it before them). A good QA manager teaches the “way of thinking” before anything else
  • Explaining any decision you take, so that the team gets the intellectual process that led to that decision
  • Working in groups

Motivating people


Motivating people is also necessary. To do that:

  • Be motivated yourself
  • Share your motivation with the others
  • Explain why QA is an interesting job:
    • Too often, people are reluctant to do QA because:
      “QA = finding problems = people (dev.) don’t like me”
      The good way of seeing the job is:
      “QA = avoid future problems = people (dev./support/customers) like me”
    • The result of QA activities is immediately seen by the end-user, which is quite gratifying
    • Seeing a “manual operation” become a completely automated job demonstrates how talented people can use machines to improve life
    • Thoroughly testing a feature is often more complex than developing that feature. It is therefore a challenging (and thus rewarding) task.

Team building


A “dream” team is a team where all people are technically very good at their job but also like to work together and appreciate each other. To improve the chances of this happening, a QA manager shouldn’t hesitate to:

  • Organize events (these do not necessarily require expensive activities!)
  • Have some chats together during working hours about non-technical subjects when the whole team is present
  • Have QA people cooperate more with other teams (especially with the Development team)
  • Have a beer together sometimes 😉

Enabling changes


Changing for the sake of changing is a bad concept. Conversely, when something does not work, change is mandatory. The process of changing must go through 3 steps:

  • Making an audit to see what’s wrong in the process (e.g. very similar test scripts that are difficult to maintain)
  • Determining with the team what has to be changed (e.g. setting up data-driven testing)
  • Implementing the change

Decision making


In “QA manager” there is “manager”, which means that the role includes making decisions:

  • Not being overwhelmed by stress
  • Not hesitating to recognize a mistake early – it is much better than trying to work around the issue for years
  • Taking innovative (or risky) initiatives
  • Not hesitating to change the processes, at the risk of destabilizing some people, if you think it is necessary

Sources about QA:


  • http://xqual.com/documentation/tutorial_qa_engineer_skills.html#skills
  • http://xqual.com/documentation/tutorial_qa_manager_skills.html
  • http://xqual.com/qa/books.html
  • http://programmers.stackexchange.com/questions/46425/what-are-good-requirements-for-a-qa-engineer
  • https://cours.etsmtl.ca/mgl801/private/Take%20home/Aurum%20A.,%20Wohlin%20C.%20-%20Chapter%208.pdf

Technical Writer Requirements

Duties


Technical writers produce information for audiences ranging from novices to technical experts. In general, they:

  • research subjects by analyzing reference materials (for example, specifications, blueprints, diagrams, maintenance manuals, reports, studies), consulting experts and using products themselves
  • gather information about target audience needs and analyze how to structure and format information to meet those needs
  • prepare a documentation plan for each document for monitoring progress and reporting
  • select appropriate technology and media to deliver technical information
  • create, follow and enforce the use of consistent style and formatting by developing a house style guide
  • write content for on-line help files, reference materials, educational materials, procedural and policy manuals, user guides, proposals, technical reports and instructional materials that explain the installation, operation and maintenance of mechanical, electronic and other equipment (for example, oil industry equipment or computer applications)
  • engage subject matter experts for the duration of the project, especially for highly complex subject matter
  • create auxiliary resources such as diagrams or interactive learning processes if required
  • rewrite and edit drafts after they have been reviewed by technical experts for accuracy
  • test products (especially software and hardware)
  • manage documentation projects including translation and localization if required.

Technical writers also may define terms in glossary format, index or cross-reference information, or obtain copyright permissions to reprint material. They may work independently or as part of a team that includes scientists, engineers, computer specialists, management personnel, editors, other writers, illustrators or photographers.

Working Conditions


Technical writers employed by medium-sized or large organizations generally work standard office hours. Overtime sometimes is required to meet deadlines. Contract writers working from home can set their own hours but must be prepared to work long hours when required to complete projects on time. However, many contract writers are required to work in offices during normal business hours. The pressure associated with having to meet deadlines can be stressful.

Personal Characteristics


Technical writers need the following characteristics:

  • the ability to communicate and work effectively with a variety of people (for example, engineers, educators, publishers, editors, art directors, film producers, readers of varying ability)
  • adaptability and flexibility
  • the ability to pay close attention to detail
  • an interest in new communications technologies, particularly those involving multi-media and the Internet
  • the ability to analyze and think critically
  • the ability to deal with and learn from criticism
  • the ability to handle multiple requests during high pressure periods
  • strong organizational, time management and project management skills.

They should enjoy gathering and synthesizing information, taking a methodical approach to explaining procedures, and finding out how things are built and operate.

Educational Requirements


There are no standard education requirements for technical writers but they generally need:

  • an ability to write well with an understanding of plain language, sentence structure, presentation formats and readability
  • a good grasp of grammar and the ability to express ideas clearly in writing
  • knowledge of technology (ranging from basic to expert depending on the project)
  • research, interviewing and analytical skills
  • editing and proofreading skills.

Technical writers are typically required to hold bachelor’s degrees. While some technical writers enter the occupation with degrees applicable to a technical specialty, most have a degree in English, journalism, communications or a related field. Some colleges and universities offer bachelor’s degree programs in English with concentrations in writing.
These programs consist of 4-year curricula that provide students with a solid grasp of English and communication skills so that they may write across a variety of genres and fields.
Courses may include:

  • Technical writing
  • Writing theory
  • Journalism
  • Fiction
  • Non-fiction
  • Editing
  • Web writing

Web content writer


  • Develop content strategies: apply information architecture & design principles for social media content.
  • Ability to write in a variety of styles and formats for multiple audiences.
  • Moderate online communities: review submissions, create tags, track useful content.
  • Online content writing & development for websites, new features & social media (blog creation & submission).
  • Content writing / editing / submitting for case studies, product info, testimonials, articles, newsletters.
  • Candidates should be able to conduct online research and generate/develop original content for various websites, writing creative, unique content.

Technical Writer Qualifications


Along with strong writing and communication skills, technical writers must be proficient in online publishing software and programs. The ability to produce video and audio for the Web may be required for some positions. Technical writers often handle large amounts of complex data, so information management skills are necessary. A sharp eye for detail, research skills and the ability to work under strict deadlines are also important for a career in technical writing.

Sources about TW:

  • http://education-portal.com/articles/Technical_Writer_Job_Outlook_and_Educational_Requirements.html
  • http://en.wikipedia.org/wiki/Technical_writer
  • http://alis.alberta.ca/occinfo/content/requestaction.asp?aspaction=gethtmlprofile&format=html&occpro_id=71002832
  • http://www.writingassist.com/resources/articles/which-skill-sets-are-important-for-a-technical-writer/
  • http://www.school-for-champions.com/techwriting/skills_required.htm

Test tasks


  • A short outline essay about the features of the HCE-DC service. Up to three web pages, with illustrative examples of some key technical or technological solutions. Audience: IT market specialists and product managers.
  • A short scientific article about some solution or technology implemented in one of the three key products (hce-node, DTM or DC services) that is new and modern for this application area. Up to five web pages, with scientific or technical illustrations or graphs visualizing some principal schema or measured statistical data. Audience: specialists in the target technology from the scientific area.
  • A short overview article about one of the three HCE project products: hce-node, DTM or the DC service. A common description with a review of the key features and general specifications, with advantages and characteristics. Up to five web pages, with schemas or graphs of typical usage or configurations for different target project needs. Audience: IT market specialists, system integrators, architects and product manager promoters.
  • A short overview article about a user’s experience of installing and test-driving HCE products from package or source code. From three to ten web pages showing, step by step, stages of product deployment, configuration, installation and test usage, with screenshots. Audience: advertisement and information space filling.

* Just one or several of the test tasks can be done and presented in free form, as a single document (doc, pdf, etc…) or as a web page located on any public resource.

C/C++ senior developer requirements

Practical skills:

  • C/C++ ANSI/ISO (STL, I/O, TCP, memory allocation, containers, algorithms, date-time), GNU – strong; POCO, Boost – desirable.
  • gcc, make – strong.
  • SQL ANSI/ISO 1992-2008, MySQL – well.
  • Linux applied development, I/O, system API, well-known libraries: curl, gd, ImageMagick – well.
  • IDEs – Eclipse, KDevelop, Aptana – well.

Linux system development knowledge:

  • Networking, TCP/IP, TCP sockets (epoll, polling, select) – strong.
  • Asynchronous networking based on boost, libev, libevent, etc. – desirable.
  • The multithreading model based on POSIX definitions (threads, mutexes, rwlocks, barriers, condition variables) – strong.

Theoretical knowledge:

  • Complex data structures (balanced trees (AVL, binary), stacks, lists, dictionaries, collections, hashes, maps, sets) – strong.
  • Internal data representation and bit operations, including platform-dependent aspects – strong.
  • Synchronization and resource access algorithms (readers-writers models) – strong.
  • Main data mining and processing algorithms – strong.
  • Network programming – strong.
  • HTTP, HTML – strong.
  • Internet data formats – well.
  • Multi-byte encodings – well.
  • Theory of compilers (finite automata, LL grammars) – basic.
  • Matrix math – basic.
  • System modeling – basic.

Language:

  • Russian – strong, main.
  • English – technical reading, writing – well.

Project’s position:

Senior engineer for multi-threaded application system architecture: data structures and access algorithms, data access architecture, network data structures, protocols and processing systems, and multithreaded application architecture.

Project role:

Data structures and system-level API developer and support, senior system architecture engineer, system administrator. Implementation and support of the main data structures and subsystems for fast data access indexing and specialized property representation; implementation and support of network data processing subsystem algorithms and internal application architecture construction; construction, implementation and support of mapping algorithms and highlighting techniques; partial design, implementation and support of multithreaded architecture solutions; support of the related words subsystem; support of the unique identifiers subsystem; building and support of most of the system's internal application architectures (Context Data Repository (CDR), Context Data Handler (CDH), Highlight Handler (HH), Related Words (RW), Dictionary Handler (DH), Resources Proxy Cache (RPC), Connections Manager (CM)).

Common questions for interviewing and knowledge tests

Questions for C++/C developers

Programming and Unix/Linux


  1. How do you understand the distributed client/server application structure?
  2. What are a process and a process environment?
  3. What are coding, encoding and character representation? What do you know about Unicode?
  4. How does Unix/Linux support multi-byte characters?
  5. What are libraries, components, interfaces and configuration management, and how are they used in developing reusable Unix software?
  6. What are lexical analysis and parsing? How are they used in streaming data processing?
  7. Interpretation vs. compilation, hardware-independent and machine code, from your point of view.
  8. How do you understand possible localization principles in applications with and without a UI?
  9. What do you know about well-known libraries for Unix/Linux developers?

C/C++ language and programming in general


  1. What do you know about the C preprocessor?
  2. How do you understand the standard data types in C and C++?
  3. What are constants, variables, operators, expressions, casting and type conversion in the C and C++ languages?
  4. How do you understand pointers and arrays in the language?
  5. How do you understand strings?
  6. How do you understand dynamic and static memory allocation?
  7. How do you understand process input/output management in C and C++ applications?
  8. How do you understand streamed and non-streamed I/O?
  9. How is string manipulation done with multi-byte strings? Give examples of create/initialize, copy, substring selection and other operations.
  10. What is a library? What are static and shared libraries?
  11. What types of library loading do you know?
  12. How can a shared library be used to create interfaces and APIs?
  13. How do you understand threads and process threading in the Linux OS? Threads vs. processes – your understanding.
  14. How do you understand structuring a large project in C and C++? What do you know about headers, external variables, subdividing into several files, modules, classes, templates and libraries?
  15. How do you understand inheritance and virtualization in C++?
  16. How do you understand memory allocation in C++?
  17. How do you understand namespaces and scope in C++?
  18. How do you understand the type casting differences between C and C++?

Unix/Linux C programming


  1. What does fork() do?
  2. What’s the difference between fork() and vfork()?
  3. Why use _exit rather than exit in the child branch of a fork?
  4. How can I get/set an environment variable from a program?
  5. How can I sleep for less than a second?
  6. How can I get a finer-grained version of alarm()?
  7. How can a parent and child process communicate?
  8. How do I get my program to act like a daemon?
  9. How can I look at processes in the system like ps does?
  10. Given a pid, how can I tell if it’s a running program?
  11. What’s the return value of system/pclose/waitpid?
  12. How do I find out about a process’ memory usage?
  13. How do I change the name of my program (as seen by `ps`)?
  14. How can I find a process’ executable file?
  15. Why doesn’t my process get SIGHUP when its parent dies?
  16. How can I kill all descendents of a process?
  17. How to manage multiple connections?
  18. How do I use select()?
  19. How do I use poll()?
  20. How do I use epoll?
  21. How can I tell when the other end of a connection shuts down?
  22. What the best way to read directories?
  23. How can I find out if someone else has a file open?
  24. How do I `lock’ a file?
  25. How do I find out if a file has been updated by another process?
  26. How do I find the size of a file?
  27. How do I use named and unnamed pipes?
  28. How do I compare strings using wildcards?
  29. How do I compare strings using regular expressions?
  30. How can I debug the children after a fork?
  31. How can inter-process communication be done with the standard library function set (pipes, message queues, memory-mapped I/O, shared memory, signals, semaphores and sockets)?
  32. What is make, imake, CVS and RCS? How to use scripts to generate makefiles automatically?

Network programming and TCP/IP


  1. How do you understand the main principles of TCP/IP?
  2. What are the TCP/IP services in the Internet? How do you understand a service application as a technology, and how does it work?
  3. What do you know about sockets?
  4. What do you know about TCP connections, data transfer, and requests and responses using TCP sockets?
  5. What are HTML, HTTP and WWW?
  6. What is MIME, and how is it used in the WWW?
  7. What are cookies and sessions? How are they used in the WWW?
  8. What are content and the content-type header?
  9. What are HTML forms and form submission? How is form submission handled?

Database programming


  1. What do you know about SQL as a declarative language?
  2. What do you know about the Data Definition Language (DDL) as a part of SQL?
  3. What do you know about the Data Manipulation Language (DML) part of SQL?
  4. What do you know about creating tables and indexes, selecting rows and columns, creating views, and updating and deleting in SQL?
  5. What is the relationship between C data types and relational database data types?
  6. What are embedded static and dynamic SQL?
  7. What is ODBC?
  8. What are the embedded MySQL libraries?
  9. What are the main principles of a database programming interface for C/C++ developers?
  10. What do you know about stored procedures and functions?
  11. What do you know about tablespaces, control files, memory structure and process structure in DB servers?
  12. What do you know about the principles of database analysis and tuning, CPU performance management, and effective usage of indexes and SQL statements?

About the ASM search engine

Technology


The Associative Search Engine is a distributed computational cluster system that integrates several modern technological approaches and solutions: multi-threaded, highly specialized binary applications; a web server; script-based web applications; a relational SQL database; OS Linux; and many specialized and well-known protocols, data formats, algorithms and technologies.

The main aim and functional purpose of ASM is to build the kernel of thematic web search systems, such as thematic web portals with powerful full-text search and flexible, fast data crawling. It implements all the components and subsystems needed to deploy a natural web search engine:

  • A multi-threaded, distributed, incremental indexation subsystem for web sites of huge depth and large numbers of pages. It consists of a set of multi-threaded crawler applications (including a dedicated image crawler) and indexers that implement high-productivity, flexibly configurable crawling and indexation processes.
  • A distributed index storage engine – a search machines/nodes subsystem for fast access to indexed data and search with many optimized, highly specialized algorithms. It implements a conveyor architecture for search request processing and uses hybrid incremental multi-parametric full-text search algorithms, mixing typified search, fuzzy logic and elements of artificial intelligence to balance the quality of search result ranking against the time to solve the search query.
  • A distributed textual data repository storage engine that implements a set of algorithms for storage of, and fast access to, indexed and searched textual fragments used to visualize search results on the client side.
  • A multi-threaded, high-productivity text-mining subsystem that implements a set of multi-level text parsers, including algorithms for template-based semantic web processing and so on…
  • A linguistic kernel subsystem that implements morphological analysis, structural analysis, and normalization of words and phrases. This subsystem is based on reconstruction of word paradigms, language detection and short-phrase analysis algorithms.
  • A hierarchical multi-threaded search query handler subsystem that processes the client-side search queries, interacts with the distributed index data storage search machines/nodes, collects the response result sets, sorts and classifies them according to the relevancy rules, merges and filters the result sets into one array, and returns it to the client side, applying custom-defined template substitutions with a pagination and caching architecture. This high-productivity subsystem allows building vertical, hierarchically structured systems that unite sets of single ASM clusters into a huge network with a tree structure.
  • Highly specialized service applications, like the “Related Words” service, which gives the possibility to get thematic pairs of words from indexed contents ordered by popularity, and the “Resource Data” service, which provides direct (without the search process) fast access to the data of indexed web pages.
  • A backend management application, implemented as a regular DB-driven web application, that executes all administration tasks and provides a set of client-side interfaces for web portals, for tools like widgets.

The key feature of ASM is its algorithms for search and indexation in a distributed computing cluster. Because this is an associative search engine, the methods of candidate search, selection, relevancy calculation and result ordering are completely different from those used by global web search engines like Google. They are based on analysis of word sequences and morphological attributes, and do not use pre-calculated rates like citation, cross-linking, etc. to select candidates and order results. Instead, semantic-web-based search is mixed with parametric typified search ordering, and elements of artificial intelligence are used in the filtration of results, as well as to escape a full candidate scan in the manner of an SQL DB. These complex hybrid algorithms can be tuned in many ways and by many criteria, but the main principle is always to show at the top the resources that are closest to the search query as textual information: those that contain the largest number of the longest chains of the searched words, in forms closest to the user-defined phrase in the search query; that contain them in the title of the resource and in the bolded text of the source HTML code of the web page; and that maximize detection of associated data like quantitative and qualitative attributes, and so on…

The set of search optimization algorithms increases the productivity of the search, guaranteeing hundreds of searches per second and maximizing the interest rate of resources. ASM provides the possibility to create groups of sites for indexation and to search across all sites in a group, one of them, a domain name, or all sites in one installation's data center. A group of sites is related to a user's account, which allows creating a multi-user indexation environment.

Template-based indexation gives the possibility to index only a defined area of a web page, via a set of marked fragments in the template. Each site can have many different templates, each with many fragments defined.

Extended indexation algorithms can separate the significant and negligible parts of a web page. This solution allows decreasing the noise value of words that come from high-popularity web page areas like menus, headers and footers. Significant or negligible parts can be searched separately, or can be ignored and not indexed.

The index storage supports eight main and eight extended content sources, plus a set of special attributes. The main content sources are typified, and each of them stores data of a special kind, like the URL, HREFs, TITLE, H and other parts of a regular web page. The extended content sources are numerical integers or logical bits/masks and can store any numbers obtained by direct detection and conversion, or by additional complex algorithms like mapped indexation.

ASM's indexation and search support detection of many textual formats, like HTML, XML, RSS, PDF and plain text; graphical formats, like JPG, PNG and GIF images; video content in regular HTML pages; and raw SQL database data. Also, an external data filtration gateway can extend the data sources to almost any possible kind, including local file or network sources and third-party application data… Crawling and indexation have more than fifty configuration parameters, allowing flexible settings for each indexed site.

The template-based search results response generation subsystem allows creating a response data format of any kind, from textual formats like HTML, XML, JSON and plain text to most binary ones. The network subsystem uses modern, fast, high-productivity socket polling that allows handling a huge number of simultaneous client connections.

HTTP-based networking supports HTTP 1.0 and 1.1, chunking, and gzip and deflate modes, and mostly acts as a regular web server in client-side interactions.

The linguistic subsystem supports multiple languages (up to 32 languages simultaneously; currently implemented are English, German, French, Russian, Ukrainian and Japanese as basic dictionaries, and Russian, English and Japanese with extended morphological analysis support). The integrated administering web application provides many automated administration actions and statistical reports on internal and external system activity.

The ASM core uses the following technologies: C/C++, PHP, JavaScript, Apache, nginx, MySQL, Sun office, HTTP, NFS, Linux, TCP sockets, epolling, thread pooling, xpdflib, iconv, curl, zlib, ImageMagick, tidy, ICU, and others.

User functionality


The main idea of resource data indexing in ASM is word associations in the resource context. Search queries are processed by search machines that work with the distributed data repository. Then results are sorted according to relevancy, combined from different parts of the distributed repository, filtered by the filter criteria and formed into the client response. Finally, results are returned to the user as a response with a list of resource-related data: the link, title, part or all of the source web page context, image and video links, and so on…

Crawling and processing


Naturally, web crawling is the process of seeking through the web and fetching resources from it. But the implementation of crawling can include some additional kinds of processing or pre-processing of the data fetched from the web.

ASM supports two different kinds of crawling and indexation: natural web and typified template-based.

In the natural web-crawling mode, ASM’s crawler supports several main sources of textual data from the source web page. These sources are:

  • Title text – content of the HTML TITLE tag
  • H text – content of the HTML Hn tags
  • Alt text – content of the HTML ALT and TITLE attributes
  • Body text – content of the HTML BODY tag
  • URL’s text – text of the URLs captured from the page
  • Keywords – content of the HTML META tag with name KEYWORDS
  • Description – content of the HTML META tag with name DESCRIPTION

In the template-based crawling mode, ASM can detect and evaluate eight extended numerical fields. These extended fields can be used as additional search criteria in combinations of conditions like equal, less, greater and bitwise. A field can hold any kind of single numerical value such as a quantity, number, time, date, string Id/crc or counter, or a single or multiple bit set/mask, etc.

During the crawling process, the web-page content is prepared for indexing and split into several text-source parts. Then, during the search process, occurrences of the searched words are included in the relevancy calculations according to the order of the sources listed above. So if a searched term is found in the Title text of some resources and in the body text of others, the resources with the occurrence in the title are moved up in order and displayed first.

The next part of the preparation is dictionary normalization and word index calculation.

Normalization is a word transformation from the source form to a normalized one, as a result of which words are reduced to the more common form that exists in the main dictionary. Part of this process is language-dependent processing like splitting of Japanese text. (Currently the ASM dictionary uses only simple dictionary-based, template-oriented algorithms, but in the future deep morphological linguistic analysis will be used as well.)
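
As an illustration, here is a minimal PHP sketch of dictionary-based normalization (PHP is the language of the provided client API); the normalizeWord() helper and the dictionary contents are hypothetical, and the real ASM rules are far more elaborate:

<?php
// Hypothetical sketch: reduce a word to its normal form via a plain
// dictionary array; real ASM normalization also applies template rules
// and language-dependent processing such as Japanese splitting.
function normalizeWord(string $word, array $dictionary): string
{
    $lower = mb_strtolower($word, 'UTF-8');
    // Fall back to the source form when no normal form is known.
    return $dictionary[$lower] ?? $lower;
}

$dictionary = ['foxes' => 'fox', 'jumped' => 'jump'];
echo normalizeWord('Foxes', $dictionary); // prints: fox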

Indexation is the most important part of the ASM engine. It includes pre-relevance calculations per resource and text source, relation and frequency analysis and calculations, and so on. As a result, index data and calculated parameters are stored in the repository and become accessible to the search machines for the search process.

Search requests are accepted by the search handler and, after a preparation similar to that done for the web-page content, are transferred to the search machines.

ASM currently supports search requests in two forms, simple and complex, and several different search algorithms.

Simple requests


The simple form of the query does not contain any equations or operators except the searched terms. In this case all terms are searched according to AND logic. For example, by default and if nothing else is specified, the query:

quick brown fox

will find the resources that contain all three words. The relevance is calculated according to the following ranking factors:

  • all words present in the “Title text” source
  • maximum length of word chains (maximization of the count of words with minimal distance between them)
  • words present in a higher-rated textual data source
  • maximum count of searched words in the resource
  • maximum frequency or value of extended fields

and the found resources are ranked according to the relevance index and represented as an ordered list. This type of search uses a blacklist/stop-word list for many English words like a, and, or, etc., and skips one- and two-character dictionary words. (A “dictionary word” means the word exists in the main ASM dictionary.)
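
To make the AND logic and source priority concrete, here is a minimal PHP sketch; the relevance() helper, the per-source occurrence counts and the weight values are assumptions, not the actual ASM formula, which also evaluates word chains, distances and extended fields:

<?php
// Hypothetical sketch: score one resource for a simple AND query by
// weighting each term occurrence by the rank of its content source.
function relevance(array $terms, array $occurrences, array $weights): float
{
    $score = 0.0;
    foreach ($terms as $term) {
        if (empty($occurrences[$term])) {
            return 0.0;                  // AND logic: every term must occur
        }
        foreach ($occurrences[$term] as $source => $count) {
            $score += $count * ($weights[$source] ?? 1.0);
        }
    }
    return $score;
}

$weights = ['title' => 8.0, 'h' => 4.0, 'alt' => 2.0, 'body' => 1.0];
$occurrences = [
    'quick' => ['title' => 1, 'body' => 3],
    'brown' => ['body' => 2],
    'fox'   => ['title' => 1, 'body' => 1],
];
echo relevance(['quick', 'brown', 'fox'], $occurrences, $weights); // 22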

Request modifiers


Fast search – changes the main search algorithm described above so that only limited simple criteria are used, like presence of the searched terms in some main content source fields, searched term frequencies and so on, without using detailed information about the chains and sequences of words in the context. This method still preserves the priority of main content sources, such as an occurrence in the title ranking above an occurrence in the page body. The method is called “fast” because it does not use data typically located on disk, only memory-resident data, and as a result queries are processed very rapidly, without unpredictable delays caused by OS I/O.

Include single words – changes the main search algorithm so that only the single-word occurrence condition is used instead of relations between words and word chains. In this case resources with at least one searched word will be included in the results.

Quoted text is searched “as is”, without blacklisted words being escaped. In this case only resources containing the full text occurrence are added to the results, but the relevance index calculation algorithm stays the same. So resources with the searched phrase in the HTML document title are rated above those with it in the body, and those with the searched phrase in an HTML Hn tag above those with it in a regular text sequence, and so on. More than one quoted phrase is combined by AND logic.

Logical operators – combine the searched terms and phrases into equations. The following logical operators are supported:

“&” – logical AND (spaces are treated as AND by default if the “include single words” modifier is Off). This operator means that all terms must be found in the same resource.

“|” – logical OR (spaces are treated as OR if the “include single words” modifier is On). This operator means that at least one term occurrence must be present in the resource.

“!” or “-” – logical NOT; all logical NOTs are combined into one list of terms that is used as a condition to skip resources containing these terms. NOT is a unary operator preceding a term or phrase.

“+” – the ADD operator, meaning that the term following the “+” sign must be deliberately included in the search process, ignoring the default blacklist, stop-word and filter rules. So blacklisted words can be included this way.

Examples of complex requests:

+The quick “brown fox”
“brown fox” | “lazy dog”
“brown fox jumps over” ! “lazy dog”
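
A minimal PHP sketch of how these operators could be applied to a single resource, under the assumption that the query is already parsed into AND, OR and NOT term lists; quoted phrases, the “+” escaping and operator precedence are left out:

<?php
// Hypothetical sketch: evaluate parsed operator lists against the set of
// terms found in one resource.
function matches(array $docTerms, array $and, array $or, array $not): bool
{
    $set = array_flip($docTerms);
    foreach ($not as $t) {               // "!" / "-": skip the resource
        if (isset($set[$t])) {
            return false;
        }
    }
    foreach ($and as $t) {               // "&": all terms must be present
        if (!isset($set[$t])) {
            return false;
        }
    }
    if ($or === []) {
        return true;
    }
    foreach ($or as $t) {                // "|": one occurrence is enough
        if (isset($set[$t])) {
            return true;
        }
    }
    return false;
}

$doc = ['brown', 'fox', 'jumps', 'over', 'dog'];
var_dump(matches($doc, ['brown', 'fox'], [], ['lazy'])); // bool(true)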

Distances


Distances are an additional condition that brings stricter rules to the search process and helps to select words from a more suitable context area. A distance can be set between any two words or for all the words in the search request.

If the distance is set for a pair (or pairs) of words, it limits the maximum number of words allowed between the two searched words in the context. This limitation works not as a filter but during resource selection, and leads to choosing only those resources that satisfy the distance limitation.

If the distance is set for all words in the request, this condition limits the total count of words between all words of the request. This helps to find a more compact localization of the searched words, maximizing word grouping.

Distance syntax example:

Brown <5> fox jumps over <15> “lazy dog”

This request means that the words “Brown” and “fox” must have no more than five words between them, and the words “over” and “lazy” must have no more than 15 words between them. All other words can be located anywhere in the context.

Brown fox jumps over +the lazy dog <25>

In the example above, all searched words must be located within a span of no more than about 25 words.
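
A minimal PHP sketch of the pairwise condition, assuming the index exposes word positions (word offsets within the content); the whole-request span check works similarly over all searched words together:

<?php
// Hypothetical sketch: check whether some pair of positions of two words
// has no more than $maxBetween words strictly between them.
function pairWithinDistance(array $posA, array $posB, int $maxBetween): bool
{
    foreach ($posA as $a) {
        foreach ($posB as $b) {
            // count of words strictly between the two positions
            if (abs($a - $b) - 1 <= $maxBetween) {
                return true;
            }
        }
    }
    return false;
}

$positions = ['brown' => [10, 42], 'fox' => [14]];
var_dump(pairWithinDistance($positions['brown'], $positions['fox'], 5)); // bool(true)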

Content source


This is a filter criterion that allows choosing only those resources that have occurrences of the searched words in the selected main text source. While the search system crawls a resource, its content is split into several types according to the HTML tag the text originates from:

  • title – text from the HTML TITLE tag
  • keyword – text from the HTML META “keywords” tag
  • description – text from the HTML META “description” tag
  • H – text from the HTML H tags
  • alt – text from the “alt” and “title” attributes of the HTML IMG and A tags
  • reference URL – text from the href and src attributes of the HTML A, IMG, FRAME and other tags, including the resource’s own URL
  • body – text from all other sources of the HTML document

and each type of source text is accumulated in a separate content source. So, for example, all texts from the H tags (H1-H9) are concatenated into one long text sequence and can be searched separately. The user can choose any combination of content source types to search. For each site the user defines the list of supported content sources; data from unlisted content sources is indexed as body.
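
As an illustration, a minimal PHP sketch of accumulating such a content source with the standard DOMDocument class; the collectHSource() helper is hypothetical, and the real ASM crawler is implemented in C/C++:

<?php
// Hypothetical sketch: build the "H" content source of one page by
// concatenating the text of all Hn tags into a single sequence.
function collectHSource(string $html): string
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html);              // suppress warnings on sloppy HTML
    $parts = [];
    for ($i = 1; $i <= 9; $i++) {        // H1-H9, as described above
        foreach ($dom->getElementsByTagName("h$i") as $node) {
            $parts[] = trim($node->textContent);
        }
    }
    return implode(' ', $parts);
}

echo collectHSource('<h1>Quick</h1><p>text</p><h2>Brown fox</h2>');
// prints: Quick Brown fox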

Search in dedicated content fields


ASM supports eight main textual and eight extended numeric or logical content fields. Each field can be searched by its own keyword set or search string in combination with the logical NOT operation. This gives the possibility to build complex logical conditions that include and exclude words, for example include in the title and exclude from the document body.

Another kind of usage of search in content fields is the use of extended numeric fields in combination with mapped indexation (MI). Several qualitative attributes detected during the indexation process can be used in search. Several operations are available: equal, less, greater, and masked bit AND and bit OR. Masked bit operations act on the set of bits defined by a mask. Typically each bit represents one qualitative attribute and can be a criterion to include or exclude a resource from the search results.
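
A minimal PHP sketch of the masked bit operations described above; the field value and masks are hypothetical, with each set bit standing for one qualitative attribute detected at indexation time:

<?php
// Hypothetical sketch: masked bit AND requires all masked bits to be set,
// masked bit OR requires at least one of them.
function maskedAnd(int $field, int $mask): bool
{
    return ($field & $mask) === $mask;   // all masked bits must be set
}

function maskedOr(int $field, int $mask): bool
{
    return ($field & $mask) !== 0;       // at least one masked bit set
}

$attributes = 0b00001011;                // extended field value from the index
var_dump(maskedAnd($attributes, 0b00000011)); // bool(true)
var_dump(maskedOr($attributes, 0b00010100));  // bool(false)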

Document types – this filter allows choosing only resources with the specified HTTP MIME types. For each site the user defines the list of supported HTTP MIME types; data of unsupported MIME types is not indexed.

Currently only the PDF document type has its own dedicated parsing algorithm. Resources of other document types are parsed as HTML or plain text.

Languages – a filter that allows choosing only resources that satisfy a set of languages called the language mask. While a resource is crawled, the language mask (set of languages) is detected and set for each word of the content. During search the user can set a set of languages. If the user has chosen English, this means the user wants only resources in English. For the other supported languages, if a resource contains even one language from the chosen list, the resource is included. So the English language criterion works with AND logic, while all other languages work with OR logic; this difference exists because almost all sites contain some English text.
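
A minimal PHP sketch of this AND/OR language logic; the one-bit-per-language constants and the languageAccepted() helper are assumptions, not the actual ASM representation:

<?php
// Hypothetical sketch: English is required when requested (AND logic);
// for the other requested languages one match is sufficient (OR logic).
const LANG_EN = 1 << 0;
const LANG_DE = 1 << 1;
const LANG_JA = 1 << 2;

function languageAccepted(int $resourceMask, int $queryMask): bool
{
    if (($queryMask & LANG_EN) && !($resourceMask & LANG_EN)) {
        return false;                    // English requested but absent
    }
    $others = $queryMask & ~LANG_EN;
    // No other languages requested, or at least one of them is present.
    return $others === 0 || ($resourceMask & $others) !== 0;
}

var_dump(languageAccepted(LANG_EN | LANG_JA, LANG_DE | LANG_JA)); // bool(true)
var_dump(languageAccepted(LANG_JA, LANG_EN));                     // bool(false)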

Date – a filter that allows choosing only resources added within a given time period. The time period can be defined by one or a pair of time borders to filter the resources by the date they were added.

Site and user – filters that allow choosing only resources belonging to the corresponding site or user, by specifying the site or user Id in the request; this leads to searching only in the resources of the sites that belong to the specified user. By default, if no Id is specified, the search is done over all resources. The user Id can be replaced with the prefix of the domain name of the server where the search engine resides.

Similarity – a filter that allows eliminating similar resources from the list of found resources. A similar resource is a resource whose title or body content is fully or partially identical to a resource that has already been added to the results list. All resources with the same title are collected in a separate list (which can be displayed with an indent), and the number of resources with different bodies in this list is limited.
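
For illustration, a minimal PHP sketch of title-based grouping; the groupByTitle() helper and the group size limit are assumptions, and the real filter also compares body content:

<?php
// Hypothetical sketch: group results with identical titles and cap how
// many differing variants a group may keep.
function groupByTitle(array $results, int $maxPerGroup = 3): array
{
    $groups = [];
    foreach ($results as $r) {
        $key = md5(mb_strtolower(trim($r['title']), 'UTF-8'));
        if (count($groups[$key] ?? []) < $maxPerGroup) {
            $groups[$key][] = $r;        // keep up to $maxPerGroup variants
        }
    }
    return $groups;
}

$grouped = groupByTitle([
    ['title' => 'Lazy dog', 'url' => 'http://a.example/'],
    ['title' => 'Lazy dog', 'url' => 'http://b.example/'],
]);
// one group containing two entries that share the same title key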

Results number, pagination and results cache


The number of resources returned from a search can be limited and ranged by page number and pagination. The search process differs significantly from selecting records in a regular SQL database. Because of this difference, the number of result candidates is in most cases bigger than the actual number of returned resources: filtration and selection by the various criteria are applied after selection by word occurrences, which is the main part of the full-text search process. A response cache is used to store the results of a search query and prevent repeating the regular full-text search for the same or a very similar query. The search cache can be filled step by step while the client requests the next pages of results, or it can optionally be pre-filled with the maximum possible results after the first search query.

Because search processing acts as an incremental search, the number of potential results can grow step by step as the client requests the next pages in pagination. This approach gives the search system an additional possibility to decrease the load and free some resources of the computational unit; but when the client side requires the exact number of search results, the second method of cache fill can be used. Also, in the case of searching for highly popular words, search systems usually return not all possible results but a predefined maximum that can be less than 5% of the candidates. The cache pre-fill method also acts this way, but pre-fills the cache with the maximum possible items per query.
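
A minimal PHP sketch of the step-by-step cache fill described above; getPage() and the searchMore(offset, limit) callback are hypothetical stand-ins for the underlying full-text search:

<?php
// Hypothetical sketch: extend the cached candidate list only as far as
// the requested page needs, so unrequested pages cost nothing.
function getPage(array &$cache, int $page, int $perPage, callable $searchMore): array
{
    $needed = ($page + 1) * $perPage;
    while (count($cache) < $needed) {
        $batch = $searchMore(count($cache), $needed - count($cache));
        if ($batch === []) {
            break;                       // no more candidates available
        }
        $cache = array_merge($cache, $batch);
    }
    return array_slice($cache, $page * $perPage, $perPage);
}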