HCE Project Python language Distributed Tasks Manager Application, Distributed Crawler Application and client API bindings.  2.0.0-chaika
Hierarchical Cluster Engine Python language binding
dc_crawler.OwnRobots.RobotFileParserLookalike Class Reference
Inherits dc_crawler.OwnRobots.RobotExclusionRulesParser.

Public Member Functions

def __init__ (self, url="")
 
def set_url (self, url)
 
def read (self)
 
def parse (self, lines)
 
def can_fetch (self, user_agent, url, syntax=GYM2008)
 
def mtime (self)
 
def modified (self)
 
Public Member Functions inherited from dc_crawler.OwnRobots.RobotExclusionRulesParser
def __init__ (self)
 
def source_url (self)
 
def response_code (self)
 
def sitemap (self)
 
def sitemaps (self)
 
def is_expired (self)
 
def is_allowed (self, user_agent, url, syntax=GYM2008)
 
def get_crawl_delay (self, user_agent)
 
def fetch (self, url, timeout=None)
 
def parse (self, s)
 
def __str__ (self)
 
def __unicode__ (self)
 

Public Attributes

 last_checked
 
Public Attributes inherited from dc_crawler.OwnRobots.RobotExclusionRulesParser
 user_agent
 
 use_local_time
 
 expiration_date
 

Private Attributes

 _user_provided_url
 

Detailed Description

A drop-in replacement for the Python standard library's RobotFileParser
that retains all of the features of RobotExclusionRulesParser.

Definition at line 671 of file OwnRobots.py.
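
Because the class mirrors the standard library interface, calling code written against RobotFileParser should be able to use either parser. The sketch below runs against Python 3's urllib.robotparser.RobotFileParser so that it stays self-contained. The helper name check_paths, the MyBot agent string and the example.com URLs are illustrative assumptions, and passing dc_crawler.OwnRobots.RobotFileParserLookalike as parser_cls is assumed to work rather than tested here.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt body; each line keeps its trailing newline.
ROBOTS_LINES = [
    "User-agent: *\n",
    "Disallow: /private/\n",
]


def check_paths(parser_cls=RobotFileParser):
    """Parse ROBOTS_LINES with parser_cls and test two example URLs."""
    parser = parser_cls()
    parser.set_url("http://example.com/robots.txt")  # stored, not fetched
    parser.parse(ROBOTS_LINES)                       # no network access
    return (
        parser.can_fetch("MyBot", "http://example.com/index.html"),
        parser.can_fetch("MyBot", "http://example.com/private/page.html"),
    )
```

Only the method names shared by both interfaces (set_url, parse, can_fetch) are used, which is what keeps the helper independent of the concrete parser class.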

Constructor & Destructor Documentation

◆ __init__()

def dc_crawler.OwnRobots.RobotFileParserLookalike.__init__ (   self,
  url = "" 
)

Definition at line 675 of file OwnRobots.py.

    def __init__(self, url=""):
        RobotExclusionRulesParser.__init__(self)

        # Nothing is fetched here; the URL is only stored for a later read().
        self._user_provided_url = ""
        self.last_checked = None

        self.set_url(url)

Member Function Documentation

◆ can_fetch()

def dc_crawler.OwnRobots.RobotFileParserLookalike.can_fetch (   self,
  user_agent,
  url,
  syntax = GYM2008 
)

Definition at line 698 of file OwnRobots.py.

    def can_fetch(self, user_agent, url, syntax=GYM2008):
        # RobotFileParser-style name; the check itself is is_allowed().
        return RobotExclusionRulesParser.is_allowed(self, user_agent, url, syntax)

◆ modified()

def dc_crawler.OwnRobots.RobotFileParserLookalike.modified (   self)

Definition at line 706 of file OwnRobots.py.

    def modified(self):
        # Record the current time as the moment of the last check.
        self.last_checked = time.time()

◆ mtime()

def dc_crawler.OwnRobots.RobotFileParserLookalike.mtime (   self)

Definition at line 702 of file OwnRobots.py.

    def mtime(self):
        # Stays None (set in __init__) until modified() updates it.
        return self.last_checked
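
The mtime() and modified() listings together give callers a simple freshness check. The following is a minimal sketch, not HCE code: CheckedStamp copies only the timestamp logic from the listings, and needs_refresh and its max_age_seconds parameter are hypothetical names introduced for illustration. The read() listing shows no call to modified(), so the sketch assumes the caller invokes it after a successful read.

```python
import time


class CheckedStamp:
    """Stand-in holding only the timestamp logic from the listings."""

    def __init__(self):
        self.last_checked = None

    def mtime(self):
        return self.last_checked

    def modified(self):
        self.last_checked = time.time()


def needs_refresh(parser, max_age_seconds):
    """Hypothetical helper: True if never checked or checked too long ago."""
    checked = parser.mtime()
    if checked is None:
        return True
    return (time.time() - checked) > max_age_seconds
```

The explicit None test matters because last_checked starts as None here, which cannot be subtracted from a timestamp.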

◆ parse()

def dc_crawler.OwnRobots.RobotFileParserLookalike.parse (   self,
  lines 
)

Definition at line 694 of file OwnRobots.py.

    def parse(self, lines):
        # The base parser takes one string, so join the lines (no separator added).
        RobotExclusionRulesParser.parse(self, ''.join(lines))
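
Since parse() concatenates its argument with an empty separator, any line endings have to come from the lines themselves. The plain-Python illustration below calls no HCE code, and the ROBOTS_TXT sample is an assumption. It only shows what string the join would hand to the base-class parser in each case.

```python
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

# splitlines() drops line endings; splitlines(True) keeps them.
bare_lines = ROBOTS_TXT.splitlines()
kept_lines = ROBOTS_TXT.splitlines(True)

# The same concatenation the listing performs before delegating.
joined_bare = "".join(bare_lines)   # rules run together on one line
joined_kept = "".join(kept_lines)   # original text is reconstructed
```

Callers that build the list with splitlines() or by stripping each line would therefore probably want to keep or re-append the newline before calling parse().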

◆ read()

def dc_crawler.OwnRobots.RobotFileParserLookalike.read (   self)

Definition at line 690 of file OwnRobots.py.

    def read(self):
        # Fetch from the URL stored by __init__() or set_url().
        RobotExclusionRulesParser.fetch(self, self._user_provided_url)

◆ set_url()

def dc_crawler.OwnRobots.RobotFileParserLookalike.set_url (   self,
  url 
)

Definition at line 684 of file OwnRobots.py.

    def set_url(self, url):
        # I don't want to stuff this into self._source_url because
        # _source_url is set only as a side effect of calling fetch().
        self._user_provided_url = url
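
The comment in set_url() explains why the URL is kept in its own attribute. The sketch below illustrates that separation without any networking. BaseParserStandIn is a hypothetical stand-in for RobotExclusionRulesParser, and its fetch() body is an assumption based only on that comment; LookalikeSketch mirrors the __init__(), set_url() and read() listings.

```python
class BaseParserStandIn:
    """Hypothetical stand-in for RobotExclusionRulesParser (no networking)."""

    def __init__(self):
        self._source_url = ""

    def fetch(self, url, timeout=None):
        # Assumption drawn from the set_url() comment: fetch() records
        # the URL it was asked for as a side effect.
        self._source_url = url


class LookalikeSketch(BaseParserStandIn):
    """Mirrors __init__(), set_url() and read() from the listings."""

    def __init__(self, url=""):
        BaseParserStandIn.__init__(self)
        self._user_provided_url = ""
        self.set_url(url)

    def set_url(self, url):
        # Only remembers the URL; nothing is fetched yet.
        self._user_provided_url = url

    def read(self):
        BaseParserStandIn.fetch(self, self._user_provided_url)
```

In this sketch _source_url stays empty after construction and changes only once read() is called, which is the distinction the original comment describes.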

Member Data Documentation

◆ _user_provided_url

dc_crawler.OwnRobots.RobotFileParserLookalike._user_provided_url
private

Definition at line 678 of file OwnRobots.py.

◆ last_checked

dc_crawler.OwnRobots.RobotFileParserLookalike.last_checked

Definition at line 679 of file OwnRobots.py.


The documentation for this class was generated from the following file: