October 19, 2017

Crawler

General information about the hyScore.io crawler for website owners and publishers.

WHAT IS IT?

The hyScore.io crawler is an automated robot that visits pages to examine and analyze their content. In this sense, it is somewhat similar to the robots used by the major search engine companies (Google, Bing, etc.).

The hyScore.io crawler identifies itself with the following user-agent:

  • Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; HyScore/1.0; +https://hyscore.io/crawler/)

Deprecated User-Agents:

  • “User-Agent”: “Keyword Extractor; info@hyscore.io” (deprecated)
  • “User-Agent”: “Keyword Extractor 5000; lucas@hyscore.io” (deprecated)
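If you want to spot these requests in your own logs, a minimal sketch could look like the following. It assumes that the "HyScore/<version>" token and the two deprecated strings listed above are the only identifiers in use. The function name classify_user_agent is hypothetical. Keep in mind that a User-Agent header can be set by anyone, so a match is only a hint, not proof.

```python
import re

# Assumption: the current crawler always carries a "HyScore/<version>" token,
# as in the user-agent string listed above.
CURRENT_UA = re.compile(r"\bHyScore/\d+(?:\.\d+)*", re.IGNORECASE)

# The two deprecated identifiers, copied from the list above.
DEPRECATED_UAS = (
    "Keyword Extractor; info@hyscore.io",
    "Keyword Extractor 5000; lucas@hyscore.io",
)


def classify_user_agent(user_agent: str) -> str:
    """Return 'current', 'deprecated', or 'other' for a User-Agent value."""
    if CURRENT_UA.search(user_agent):
        return "current"
    if any(dep in user_agent for dep in DEPRECATED_UAS):
        return "deprecated"
    return "other"
```

A request classified as "current" or "deprecated" still deserves an IP address check if you suspect spoofing.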

The hyScore.io crawler can’t be identified by requests coming from hyScore.io-owned IP address ranges, because we work with dynamic cloud IP addresses (e.g. AWS EC2 instances). If you suspect that requests are being spoofed, you should first check the IP address of the request against the appropriate RIPE database, using a suitable whois tool or lookup service. In general, the only valid addresses you should be seeing fall within cloud provider ranges such as the AWS IP address ranges (https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html). Depending on your region (eu, us, etc.), you should whitelist these IP address ranges. In addition, please allow the address range 148.64.56.0 to 148.64.56.255 (148.64.56.0/24), which includes 148.64.56.64 to 148.64.56.80. Because robots.txt rules match on user-agent rather than IP address, IP ranges have to be allowed in your firewall or other access controls.
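The IP address check described above can be sketched with Python's standard ipaddress module. This is an illustration under stated assumptions, not an official verification tool. The function names are hypothetical, and the cloud ranges are passed in by the caller (for example, prefixes taken from the AWS IP address ranges page for your region). The only range hard-coded here is 148.64.56.0/24, taken from the text.

```python
import ipaddress

# The one fixed range named in the text above.
HYSCORE_STATIC_RANGE = ipaddress.ip_network("148.64.56.0/24")


def ip_in_ranges(ip: str, cidr_ranges) -> bool:
    """Return True if ip falls inside any of the given CIDR prefixes."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def is_plausible_crawler_ip(ip: str, cloud_ranges=()) -> bool:
    """Check ip against the static range and caller-supplied cloud ranges.

    cloud_ranges is an assumption of this sketch: a list of CIDR strings
    that you obtained yourself, e.g. from the published AWS ranges.
    """
    if ipaddress.ip_address(ip) in HYSCORE_STATIC_RANGE:
        return True
    return ip_in_ranges(ip, cloud_ranges)
```

For example, is_plausible_crawler_ip("148.64.56.70") is true without any extra ranges, while an address outside the static range only passes if it matches one of the prefixes you supply.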

 

WHY IS HYSCORE.IO CRAWLING MY SITE?

hyScore.io helps publishers, advertisers and technology companies contextually analyze pages or raw text, for example for categorization, environmental analysis (e.g. brand safety and fraud detection use cases), tagging, ad placement, personalization, content recommendation and contextual video placements. To do so, it is necessary to examine, or crawl, the page to determine what its content is about and to express it as weighted keywords, one or more categories, sentiment and much more for automated processing.

Pages are only ever visited on demand, so if the hyScore.io crawler has visited your site, someone requested the context for a page whose hyScore.io information was either not yet available or needed to be refreshed. For this reason, you will often see a request from the hyScore.io crawler shortly after a user has visited a page. The crawler systems are engineered to be as friendly as possible: they limit request rates to any specific site and automatically back off if a site is down, slow, or repeatedly returning non-200 (OK) responses.

It is important to be aware that a significant chain of systems may be involved in causing hyScore.io to analyze your site. hyScore.io has partnered with, and provides real-time contextual information to, a number of real-time systems, such as AppNexus and many others. These systems are often used by other third-party systems (ad servers, DMPs, brand safety, …) as part of their strategy.

 

BLOCKING WITH ROBOTS.TXT

First, note that hyScore.io does not provide a search engine to anyone; we never make the crawled contents of your site available to any search or public system. As discussed in the previous section, we only analyze your site because you or a third party you work with (e.g. for advertising or media) has queried us about the context of a page.

With a robots.txt file, you may block the hyScore.io Crawler from parts or all of your site, as shown in the following examples:

Block specific parts of your site:

User-agent: hyscore
Disallow: /private/
Disallow: /messages/

Block entire site:

User-agent: hyscore
Disallow: /

Allow hyscore to crawl site:

User-agent: hyscore
Disallow:

See also the Wikipedia article for more details and examples of robots.txt rules.
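To check how rules like the ones above are interpreted before you deploy them, you could run them through Python's standard urllib.robotparser. This is a sketch under assumptions: it shows how Python's parser matches the rules, which may differ in detail from how the hyScore.io crawler evaluates them. The helper name allowed, the agent token "HyScore" and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The "block specific parts" example from above.
ROBOTS_TXT = """\
User-agent: hyscore
Disallow: /private/
Disallow: /messages/
"""


def allowed(robots_txt: str, url: str, agent: str = "HyScore") -> bool:
    """Return True if the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With the rules above, a URL under /private/ or /messages/ is reported as blocked and any other path as allowed. Python's parser compares the user-agent case-insensitively, so "hyscore" in the file matches the "HyScore" token.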

That said, we take any request to stop crawling a site or parts of a site, and any other feedback on the crawler’s operation, seriously and will act on it promptly and appropriately. If this applies to you, please don’t hesitate to contact us at crawler@hyscore.io, and we will be happy to exclude your site or otherwise investigate immediately.

 

MORE INFORMATION

If you think your site is being visited in error, or the crawler is causing problems for your site, please email hyScore.io at crawler@hyscore.io and we will investigate.

Thanks.