General information about the hyScore.io crawler for website owners and publishers.
WHAT IS IT?
The hyScore.io crawler is an automated robot that visits pages to examine and analyze their content. In this sense, it is somewhat similar to the robots used by the major search engine companies (Google, Bing, etc.).
The hyScore.io crawler identifies itself with one of the following user-agents:
- Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; HyScore/1.0; +https://hyscore.io/crawler/)
- “User-Agent”: “Keyword Extractor; email@example.com” (deprecated)
- “User-Agent”: “Keyword Extractor 5000; firstname.lastname@example.org” (deprecated)
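As a rough illustration of how a site owner might recognize these user-agents in server logs or request handlers, the following Python sketch checks a User-Agent header value. The function name `is_hyscore_user_agent` is hypothetical, and matching on the `HyScore/` token and the `Keyword Extractor` prefix is an assumption derived only from the strings listed above.

```python
import re

# Token taken from the current user-agent string listed above (assumption:
# the "HyScore/<version>" part is what stays stable across releases).
CURRENT_TOKEN = re.compile(r"\bHyScore/\d+(\.\d+)*", re.IGNORECASE)

# Prefix shared by both deprecated user-agents listed above.
DEPRECATED_PREFIX = "Keyword Extractor"


def is_hyscore_user_agent(user_agent: str) -> bool:
    """Return True if the header value looks like a hyScore.io crawler."""
    if not user_agent:
        return False
    if CURRENT_TOKEN.search(user_agent):
        return True
    return user_agent.strip().startswith(DEPRECATED_PREFIX)
```

Because a User-Agent header is trivial to forge, a match here is only a hint and is best combined with an IP address check.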
The hyScore.io crawler can additionally be identified by requests coming from the following IP addresses and ranges. Please make sure they are whitelisted in your firewall or server access rules (robots.txt itself matches user-agents, not IP addresses):
- 220.127.116.11, 18.104.22.168, 22.214.171.124, 126.96.36.199 (random AWS, globally)
- 188.8.131.52 to 184.108.40.206 (220.127.116.11/24)
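A minimal sketch of such an IP check, using Python's standard `ipaddress` module, could look like the following. The networks in `ALLOWED_NETWORKS` are placeholders from the reserved documentation ranges, not hyScore.io addresses, and the function name `is_listed_address` is hypothetical; substitute the addresses and ranges from the list above.

```python
import ipaddress

# Placeholder values from the reserved documentation blocks (RFC 5737).
# They are NOT hyScore.io addresses; replace them with the listed ones.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),    # a single address
    ipaddress.ip_network("198.51.100.0/24"),  # a /24 range
]


def is_listed_address(remote_addr: str) -> bool:
    """Return True if remote_addr falls inside one of the allowed networks."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a syntactically valid IP address.
        return False
    return any(address in network for network in ALLOWED_NETWORKS)
```

For example, `is_listed_address("198.51.100.7")` is true with the placeholder list, while `is_listed_address("192.0.2.11")` is not.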
If you suspect that requests are being spoofed, first check the IP address of the request against the appropriate RIPE database, using a suitable whois tool or lookup service.
We recommend whitelisting our user-agent.
WHY IS HYSCORE.IO CRAWLING MY SITE?
hyScore.io helps publishers, advertisers and technology companies contextually analyze pages or raw text, for example to categorize content, perform environmental analysis (e.g. brand safety and fraud detection use cases), tag content automatically, place and target ads, personalize, recommend content, or place videos contextually. To do so, it is necessary to examine, or crawl, the page to determine what its content is about and to express that as weighted keywords, categories (including IAB categories), sentiment and much more for automated processing.
Pages are only ever visited on demand. If the hyScore.io crawler has visited your site, this means someone (in your company or external) requested the content analysis and insights for a page for which the hyScore.io information was either not yet available or needed to be refreshed. For this reason, you will often see a request from the hyScore.io crawler shortly after a user has visited a page. The crawler systems are engineered to be as friendly as possible: they limit request rates to any specific site and automatically back away if a site is down, slow, or repeatedly returning non-200 (OK) responses.
It is important to be aware that a significant chain of systems may be involved in causing hyScore.io to analyze your site. hyScore.io has partnered with, and provides real-time contextual information to, a number of real-time systems, such as Data Management Platforms (DMP), Demand Side Platforms (DSP) and many others. These systems are often used by other third-party systems (ad servers, DMPs, brand safety, ad fraud…) as part of a customer's strategy (agencies, brands, publishers, etc.).
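To make the idea of per-site rate limiting and backing away more concrete, here is a small Python sketch. It is not hyScore.io's implementation; the class name `HostThrottle`, the doubling back-off, and the default delays are all assumptions chosen for illustration.

```python
import time


class HostThrottle:
    """Tracks, per host, the earliest time the next request may be sent."""

    def __init__(self, base_delay=1.0, max_delay=300.0, clock=time.monotonic):
        self.base_delay = base_delay  # pause after a successful request
        self.max_delay = max_delay    # upper bound for the back-off
        self.clock = clock            # injectable for testing
        self._delay = {}              # host -> current delay in seconds
        self._next_ok = {}            # host -> earliest allowed request time

    def may_request(self, host):
        """Return True if enough time has passed since the last request."""
        return self.clock() >= self._next_ok.get(host, float("-inf"))

    def record(self, host, status_code):
        """Update the delay for host after a response was received."""
        delay = self._delay.get(host, self.base_delay)
        if status_code == 200:
            # Healthy response: fall back to the base delay.
            delay = self.base_delay
        else:
            # Non-200 response: double the delay, capped at max_delay.
            delay = min(delay * 2, self.max_delay)
        self._delay[host] = delay
        self._next_ok[host] = self.clock() + delay
```

A crawler loop would call `may_request(host)` before fetching and `record(host, status)` afterwards, so a site that keeps returning errors is contacted less and less often.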
BLOCKING WITH ROBOTS.TXT
Firstly, note that hyScore.io does not provide a public search engine to anyone, and we never make the crawled contents of your site available to any public system. As discussed in the previous section, we only analyze your site because you or a third party you work with (e.g. for advertising, media, content recommendation or brand safety) has caused us to be queried about the context of a single page URL.
With a robots.txt file, you may block the hyScore.io Crawler from parts or all of your site, as shown in the following examples:
Block specific parts of your site:
Block entire site:
Allow hyScore.io to crawl the entire site:
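The three cases above can be sketched as robots.txt rules and checked with Python's standard `urllib.robotparser`. The robots.txt token `HyScore` is an assumption taken from the user-agent string (this page does not state the exact token), and the paths `/private/` and `/drafts/` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

AGENT = "HyScore"  # assumed robots.txt token, taken from the user-agent

# Block specific parts of the site (paths are hypothetical).
BLOCK_PARTS = """\
User-agent: HyScore
Disallow: /private/
Disallow: /drafts/
"""

# Block the entire site.
BLOCK_ALL = """\
User-agent: HyScore
Disallow: /
"""

# Allow the whole site (an empty Disallow blocks nothing).
ALLOW_ALL = """\
User-agent: HyScore
Disallow:
"""


def can_fetch(robots_txt: str, url: str, agent: str = AGENT) -> bool:
    """Return True if the given rules allow agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Checking rules this way before publishing them helps avoid blocking more of the site than intended.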
See also the Wikipedia article on robots.txt for more details and examples of robots.txt rules.
All that said, we of course take seriously any request to stop crawling a site or parts of a site, as well as any other feedback on the crawler's operation, and will act on it promptly and appropriately. If this applies to you, please don't hesitate to contact us at email@example.com and we will be happy to exclude your site or otherwise investigate immediately.
Note: If you block our crawler, the result will be shown as “Error – blocked by robots.txt”. This means our clients become aware that you don't want to be crawled for further analysis. In some cases this may lead to your site being excluded from advertising campaigns, which can result in a monetary loss, or it can cause a first- or third-party application to malfunction.