December 4, 2016

RESTful-API documentation

Documentation of the hyScore.io RESTful-API

If you have a question, please take a look at our Frequently Asked Questions (FAQ) section before contacting our support. Thanks.

Endpoint

Endpoint v1 : https://api.hyscore.io (Deprecated!)

Endpoint v2 : https://api.hyscore.io/v2/

Method

POST

Authorization

Header field | Description
x-api-key | API-Key (mandatory)
Authorization | Basic Og==
Content-Type | application/x-www-form-urlencoded OR application/JSON (raw)

Usage

The POST body should be either application/x-www-form-urlencoded or raw JSON. Available fields in the POST body:

Parameter | Description
url | Full URL of the article to be analyzed (e.g. http://domain.com/folder/article001.html).
text | Raw text to be analyzed. Note: You can use the parameter "url" OR "text". A combination of both leads to an error.
numberOfKeywords | Number of keywords returned in the JSON response (default = 5). We recommend 2 to 3 keywords if you want to use the API for use cases like “contextual video” or “contextual advertising”.
uuid | Type = String. Can be used for tracking purposes. If you want to personalize your offer for a specific user or enrich a user's data set, you can inject an identifier here.
customData | Type = String (array). Can contain any custom information, identifier, etc.
getText | Type = Boolean (Default=False). Set getText=True if the system should return the extracted text.
getMeta | Type = Boolean (Default=False). Set getMeta=True if the system should return the extracted meta keywords.
imagedivid
Advice: API v2 only!
Type = String. Instead of letting the system automatically choose the appropriate image, you can enter the id of the image div that should be used.

With the parameter "imagedivid" you can force our API to extract the image URL of a specific image container.

articledivid
Advice: API v2 only!

Type = String. The parameter articledivid allows you to specify which part of the website should be used for the text analysis/keyword extraction. The value has the following form:

{\"parameter\":\"value\"}

Advice: The string parameter and/or value has to be escaped exactly once!

String example:
"articledivid": "{\"parameter\":\"value\"}",

Parameter value example:

  • {\"class\":\"part part-story\"}

  • {\"id\":\"main-news\"}

  • {\"id\":\"article-news\"}

  • ...
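A minimal Python sketch of the "escaped exactly once" rule, assuming the request body is sent as raw JSON. Serialising the selector to a string first and then serialising the whole body lets the JSON encoder add the single level of escaping. The helper name build_articledivid_body is hypothetical and not part of the hyScore API.

```python
import json


def build_articledivid_body(url, selector):
    """Return a raw JSON body whose articledivid value is escaped exactly once.

    `selector` is a dict such as {"id": "main-news"}. The function name is
    illustrative only; it is not part of the hyScore API.
    """
    # First pass: turn the selector into a compact JSON string.
    selector_string = json.dumps(selector, separators=(",", ":"))
    # Second pass: serialising the body escapes the inner quotes once.
    return json.dumps({"url": url, "articledivid": selector_string})


body = build_articledivid_body(
    "http://domain.com/folder/article001.html", {"id": "main-news"}
)
print(body)
```

Decoding the body and then decoding the articledivid string again should give back the original selector, which is a quick way to check that no second level of escaping slipped in.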

Note: Takes either url or text as input, not both at the same time.
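The following Python sketch shows one way to assemble a v2 request from the header fields and parameters listed above, using only the standard library. It is built on assumptions: the function names are hypothetical, the body is sent as raw JSON, and the Boolean switches are serialised as JSON true/false (the documentation does not state the exact encoding the API expects). The request is only constructed here, not sent.

```python
import json
import urllib.request

ENDPOINT_V2 = "https://api.hyscore.io/v2/"


def build_payload(url=None, text=None, number_of_keywords=5,
                  get_text=False, get_meta=False):
    """Build the POST fields; exactly one of url/text must be given."""
    if (url is None) == (text is None):
        raise ValueError('Pass either "url" or "text", not both and not neither.')
    payload = {
        "numberOfKeywords": number_of_keywords,
        "getText": get_text,   # assumption: sent as a JSON boolean
        "getMeta": get_meta,   # assumption: sent as a JSON boolean
    }
    if url is not None:
        payload["url"] = url
    else:
        payload["text"] = text
    return payload


def build_request(api_key, payload):
    """Wrap the payload in a POST request carrying the documented headers."""
    headers = {
        "x-api-key": api_key,
        "Authorization": "Basic Og==",
        "Content-Type": "application/json",
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT_V2, data=data, headers=headers, method="POST"
    )


request = build_request(
    "YOUR-API-KEY",
    build_payload(url="http://domain.com/folder/article001.html"),
)
# Sending is left out on purpose; urllib.request.urlopen(request) would do it.
```

Rejecting the url/text combination on the client side avoids spending a request on a call the API would answer with an error anyway.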

Response

The response is in JSON format and contains the following fields:

Key | Description
category
Advice: API v1 only!
The category of the URL/Website.

Response:
"gx_retry" until category is determined.

Note: you can find the full hyScore category list containing the IAB category
mapping and the response status list as *.csv-file below. See "IAB category mapping".


Deprecated:
In some cases we may respond with "no channels returned" OR "null". These are "corpses" that will be removed from our index over time (mainly old URLs first requested in the early phase of hyScore). It's a self-cleaning mechanism.
categories
Advice: API v2 only!
The weighted categories of the URL/website (max. weight: 100). The category with the highest weight is more likely than the others.

Response:
"gx_retry" until category is determined.

Note: you can find the full hyScore category list containing the IAB category
mapping and the response status list as *.csv-file below. See "IAB category mapping".
customData | The value of the customData input field, if used. Any format allowed (numeric, strings), e.g. “23D-XZ-2300” or “user@domain.com”.
iab | The category of the URL/website as IAB category.

In API v2 you get a list of matching IAB categories with a weighted score. The category with the highest weight is more likely than the others.

Official IAB Content taxonomy: category (Tier 1) and IAB code (Tier 1 & 2), example:

API v1:
"iab":
{
"category": "Automotive",
"code": "IAB2-4"
}

API v2:
"iab":
{
"category": "Automotive",
"code": "IAB2-4",
"weight": 28.053
}

Response:
- "no channels returned" until category is determined
- "category not supported by IAB" - we were not able to determine an IAB category for the website and its content.
image
Advice: API v2 only!
URL of the article image if available. Chosen either automatically or via the imagedivid parameter.
language | The language of the given content/text is determined automatically by the system:

  • “cn” : Chinese (CN)

  • “cz” : Czech (CZ)

  • “de” : German (DE)

  • “dk” : Danish (DK)

  • “en” : English (EN)

  • “es” : Spanish (ES)

  • “fr” : French (FR)

  • “hu” : Hungarian (HU)

  • “in” : Hindi (IN)

  • “it” : Italian (IT)

  • “nl” : Dutch (NL)

  • "pl" : Polish (PL)

  • “pt” : Portuguese (PT)

  • “ru” : Russian (RU)

  • “se” : Swedish (SE)

  • “tr” : Turkish (TR)

  • ... more to follow soon.

metaKeywords | The meta keywords of the URL, if any, as set by its publisher (not weighted). Has to be activated with getMeta=True.
sentiment
Advice: API v2 only!
By default, we analyze the sentiment of the provided URL/text content.

Example: "sentiment": 0.30833333333333335,

The sentiment is currently determined for German (DE) and English (EN) only. We will add other languages step by step.

If any other language is involved you'll get the response:

"sentiment": "Only EN and DE supported at the moment."

Sentiment ranges from -1 (poor, negative) through 0 (neutral) to +1 (perfect). You'll find a tiny demo of the functionality in our demo section.

text | The article text that was extracted and analyzed (if “getText=True” is set). Matches the text input field if it was used.

If the default parameter/value "getText=False" is set you'll get the response "text": "Deactivated. See docs."
text length | Example: An image/picture gallery often has less text and lower keyword scores, while an article with more text tends to yield higher keyword scores. You can use the text length as one factor/indicator in, e.g., your decision engine (use cases: recommendation, brand safety, etc.).
tld | The top-level domain (TLD) of the analyzed URL.
url | Origin / full URL of the site analyzed (e.g. http://domain.com/folder/article001.html).
uuid | The value of the uuid input field, if used. Any format allowed (numeric, strings), e.g. “23D-XZ-2300” or “user@domain.com”.

Note: this information is just looped through the system. We don't store or cache this information.
weightedKeywords | Contains the keywords/entities extracted from the input. Keywords are displayed in their normalized form. Each entry consists of:

  • type: The type of the keyword if possible, e.g. “Keyword, Country, VideoGame, Person, …”

  • name: The actual keyword (normalized).

  • surface_form: The keyword as it is written on the website, i.e. as a user would see/read it. The surface_form is a list element in the JSON response.

  • weight: The weight of the keyword. The higher this value is, the more relevant the entry is to the given content (max. weight: 10).

  • frequency: The number of times the entity appears in the text. The list is sorted by weight.

status
Advice: API v2 only!
Contains the status of the API request and the result.

  • status = {"type":"Error", "message": "Target URL not reachable."}
    There is an issue with the requested target URL. It is not reachable (timeout), there is a DNS error or an SSL error, it is not a valid URL, or there is some other issue related to the URL. The result counts as a valid result. To reduce this kind of error, check whether the website/URL is still available.

  • status = {"type":"Error", "message": "Issues crawling URL"}
    The website is protected (login, not authorized), we're not allowed to crawl the website (norobots), etc.

  • status = {"type":"Error", "message": "Problematic URL"}
    Something is wrong with the URL (invalid format, etc.) and we were not able to process it. We queued this URL in the dead letter queue.

  • status = {"type":"Incomplete", "message": "Categorization issues"}
    Something went wrong in determining a correct category, e.g. gx_tagged, gx_retry, gx_notfound, gx_nomatches, see "categorization" status.

  • status = {"type":"Ok", "message": "All seems well."}
    The request was successful. A full JSON response is provided.

  • status = {"type":"No data yet", "message": "New URL, analysis in progress"}
    The requested URL is new to hyScore.io and needs to be analyzed.
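As a rough illustration of how a client might branch on the four status types listed above, here is a small Python sketch. The function name and the action labels ("use", "use_partial", "retry", "skip") are assumptions made for this example; they are not values defined by the API.

```python
def next_action(response):
    """Map the v2 "status" object to a coarse action for the caller.

    The returned labels are illustrative only, not API values.
    """
    status = response.get("status") or {}
    status_type = status.get("type")
    if status_type == "Ok":
        return "use"
    if status_type == "Incomplete":
        # Keywords are usually present; the category may follow later.
        return "use_partial"
    if status_type == "No data yet":
        # New URL, analysis still in progress.
        return "retry"
    # "Error" and anything unrecognised: do not retry blindly.
    return "skip"


print(next_action({"status": {"type": "Ok", "message": "All seems well."}}))
```

Treating unknown status types like errors is a conservative choice; a client that prefers to keep going could map them to "retry" instead.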
API and Categorization status response(s)

    Status | Description | Last modification

    2xx

    status | See above. Advice: API v2 only! Contains the status of the API request and the result. | never
    "category": - Response
    gx_adserver | The page is for an adserver iframe or similar; categorisation would be of no value. | 28.06.17
    gx_contentaggregator | URL is for a site with no content of its own, typically just a page of links. | 28.06.17
    gx_blocked | hyScore does not crawl this site; no editorial classification has been assigned (rare). May also mean the specific URL contains components that hyScore has blacklisted from crawling. | 28.06.17
    gx_tagged | hyScore does not crawl this site, but an editorial classification has been assigned (review). | 28.06.17
    gx_uncrawlable | Cannot be analysed, probably cannot even be accessed; it is not possible to determine the context of the site at all (unusual). | 28.06.17
    gx_badsite_norobots | Cannot be analysed due to robots restrictions (rare). | 28.06.17
    gx_baddata | (bad data) hyScore was unable to analyse this page; the URL may be invalid or blacklisted by hyScore, or the site could be unresponsive. | 28.06.17
    gx_invalid | (invalid) hyScore was unable to analyse this page; the URL may be invalid or blacklisted by hyScore, or the site could be unresponsive. | 28.06.17
    gx_retry | (retry) hyScore has queued this page for processing as it has not yet been analysed by our categorization. You may be able to retry shortly. Any results returned for this page will be from domain level only. | 28.06.17
    gx_redirected | (redirected) hyScore was unable to analyse this page because the site redirected our crawler elsewhere when we tried to visit it; possibly this site requires a login or otherwise has restricted access. | 28.06.17
    gx_noactions | (noactions) hyScore was unable to analyse this page because the site has returned no usable content at all. Possibly the site is having issues or is refusing to serve real requests to our crawler. | 28.06.17
    gx_norobots | (norobots) hyScore was unable to analyse the page directly because the site does not allow the crawler access via a robots.txt directive. | 28.06.17
    gx_offline | (offline) hyScore was unable to analyse this page as the crawler is currently unable to contact the site; the site may be down, blocking the crawler, or have generated too many errors and been temporarily disabled from further crawling. You may be able to retry later. | 28.06.17
    gx_badlanguage | (badlanguage) hyScore has analysed the page; however, the language is not, or only partly, supported by your platform. | 28.06.17
    gx_notwhitelisted | (notwhitelisted) function/status deprecated, not in use. | 28.06.17
    gx_partial | (partial) hyScore was unable to analyse this page because the site has returned little or no usable content at all. Possibly the site is having issues or is refusing to fully serve requests from our crawler. | 28.06.17
    gx_gaveup | (gaveup) hyScore was unable to analyse this page because the site is not responding to requests. Possibly the site is having issues or is refusing to serve requests from our crawler. | 28.06.17
    gx_notauthorised | (notauthorised) hyScore was unable to analyse this page because the site required a login. | 28.06.17
    gx_notfound | (notfound) hyScore was unable to analyse this page because the site returned a Not Found (404) error when our crawler tried to visit it. | 28.06.17
    gx_nohost | (nohost) hyScore is unable to analyse this page as the site does not seem to currently exist in the global DNS records; possibly this site is private to your network, or there has been a temporary issue with DNS lookups. | 28.06.17
    gx_nourl | Either no URL was specified at all, or it could not be interpreted as a URL. | 28.06.17
    gx_nomatches | hyScore has analysed this page, but it does not match any channels at all. Possibly the page contains very little content and it is not possible to determine a valid channel. | 28.06.17
    gx_unmapped | hyScore has analysed this page and determined information for it; however, the mapping does not include any category for this response (unknown category result). | 28.06.17
    gx_notdownloaded | hyScore was unable to download and process this page for analysis. This may be an intermittent issue or may indicate there is something in this page that hyScore currently cannot handle. | 28.06.17
    "iab": - Response
    category not supported by IAB | Exactly what it means... | 28.06.17

    4xx or other general error

    "message": "Forbidden" | No or wrong API-Key | never
    "message": "Limit exceeded" | The daily or monthly quota/limit of API requests configured for your API-Key has been exceeded. | 28.06.17
    "error": "Target not reachable. Check URL" | The URL you've requested is not available (DNS error, target page temporarily not reachable, or any other problem that prevents hyScore from analyzing the URL/website; mostly a DNS issue or a website/URL that has meanwhile been removed or taken offline). | 28.09.17 (API v2 only)
    Deprecated: 09.10.2017 - see "status" in JSON response.
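    The general error bodies in the table above could be checked before any further processing. The Python sketch below assumes the body has already been decoded from JSON into a dict; the exception class and function name are hypothetical and exist only for this example.

```python
class HyScoreApiError(Exception):
    """Hypothetical exception type for API-level failures."""


def raise_for_api_error(body):
    """Raise if the decoded JSON body is one of the documented error bodies.

    Returns the body unchanged when no error marker is found.
    """
    message = body.get("message")
    if message == "Forbidden":
        raise HyScoreApiError("No or wrong API-Key")
    if message == "Limit exceeded":
        raise HyScoreApiError("Daily or monthly request quota exceeded")
    if "error" in body:
        # Deprecated form; newer responses report this via "status".
        raise HyScoreApiError(body["error"])
    return body


try:
    raise_for_api_error({"message": "Forbidden"})
except HyScoreApiError as exc:
    print("API error:", exc)
```

    Regular v2 responses carry their message inside the "status" object rather than at the top level, so they pass through this check untouched.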

    IAB category mapping

    • The Latest IAB hyScore category mapping file (*.zip-file) contains the latest CSV-File with hyScore’s category mapping (IAB category mapping).
    • Last Update: 29th August 2017

     


    Other JSON response example

    Example(s) for Endpoint "https://api.hyscore.io/v2/":


    Response example [1]: Status "message": "New URL, analysis in progress", "type": "No data yet". "No data yet" is usually returned for the initial request of an unknown URL. If hyScore has never seen this URL before and we're not able to deliver an answer within 100 ms, we always send this response. Usually the 2nd or 3rd request for the same URL returns a proper result; see Response example [3].

    Response Header:
    HTTP/1.1 200 OK

    Response Body:

    {
    "status": {
    "message": "New URL, analysis in progress",
    "type": "No data yet"
    },
    "text": "N/A",
    "image": "N/A",
    "weightedKeywords": "N/A",
    "categories": "N/A",
    "language": "N/A",
    "sentiment": "N/A",
    "url": "https://www.theguardian.com/football/2017/oct/14/manchester-city-stoke-city-premier-league-match-report",
    "iab": "N/A",
    "tld": "N/A",
    "metaKeywords": "N/A"
    }
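    Since a proper result usually arrives with the 2nd or 3rd request, a client can simply ask again. The Python sketch below shows one possible retry loop; the function name, the default of three attempts and the delay are assumptions, and `fetch` stands for whatever callable performs the actual API request and returns the decoded JSON as a dict.

```python
import time


def poll_until_analyzed(fetch, url, max_attempts=3, delay_seconds=1.0,
                        sleep=time.sleep):
    """Call fetch(url) until the status type is no longer "No data yet".

    Returns the last response, which may still be "No data yet" if all
    attempts were used up.
    """
    response = fetch(url)
    for _ in range(max_attempts - 1):
        if response.get("status", {}).get("type") != "No data yet":
            break
        sleep(delay_seconds)
        response = fetch(url)
    return response


# Usage with a stand-in fetch function (no network access involved):
replies = iter([
    {"status": {"type": "No data yet"}},
    {"status": {"type": "Ok"}},
])
result = poll_until_analyzed(lambda url: next(replies),
                             "http://domain.com/folder/article001.html",
                             sleep=lambda seconds: None)
print(result["status"]["type"])
```

    Passing `sleep` as a parameter keeps the loop testable without real waiting.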




    Response example [2]: message: "Categorization issues"; type: "Incomplete". This message can appear if it takes a bit longer to determine the category/channel of the website. It will be updated as soon as we've determined the category. Depending on the load and on the number of websites and text lines being analyzed in parallel, this could take a little while. In these cases, if you are not using the API in a real-time case, we recommend requesting this URL again after waiting 5-10 minutes.

    Response Header:
    HTTP/1.1 200 OK

    Response Body:

    {
    "status": {
    "message": "Categorization issues",
    "type": "Incomplete"
    },
    "weightedKeywords": [
    {
    "surface_form": "Companion",
    "frequency": 3,
    "type": "Keyword",
    "name": "Offizierskreuz",
    "weight": 3.0232558139534884
    },
    {
    "surface_form": "Clan",
    "frequency": 3,
    "type": "Keyword",
    "name": "Computerspieler-Jargon",
    "weight": 3.0232558139534884
    },
    {
    "surface_form": "AW",
    "frequency": 2,
    "type": "Keyword",
    "name": "Arctic Warfare",
    "weight": 2.0155038759689923
    },
    {
    "surface_form": "app",
    "frequency": 1,
    "type": "Work",
    "name": "Mobile App",
    "weight": 1.5077519379844961
    },
    {
    "surface_form": "Permalink",
    "frequency": 1,
    "type": "Keyword",
    "name": "Permalink",
    "weight": 1.0077519379844961
    },
    {
    "surface_form": "Bookmark",
    "frequency": 1,
    "type": "Keyword",
    "name": "Lesezeichen",
    "weight": 1.0077519379844961
    },
    {
    "surface_form": "iOS",
    "frequency": 1,
    "type": "Keyword",
    "name": "Apple iOS",
    "weight": 1.0077519379844961
    },
    {
    "surface_form": "Katastrophe",
    "frequency": 1,
    "type": "Keyword",
    "name": "Katastrophe",
    "weight": 1.0077519379844961
    },
    {
    "surface_form": "call of duty",
    "frequency": 1,
    "type": "Keyword",
    "name": "Call of Duty",
    "weight": 1.0077519379844961
    },
    {
    "surface_form": "Android",
    "frequency": 1,
    "type": "Keyword",
    "name": "Androide",
    "weight": 1.0077519379844961
    }
    ],
    "text": "Deactivated. See docs.",
    "image": "http://www.gamezfightclub.de/wp-content/uploads/2014/10/cod_aw_cov.jpg",
    "customData": "",
    "sentiment": 0.0875,
    "categories": [
    {
    "name": "gx_retry",
    "weight": 1
    }
    ],
    "uuid": "",
    "language": "de",
    "url": "http://www.gamezfightclub.de/2014/10/cod-aw-companion-app-ab-3-11-verfuegbar/",
    "iab": [
    {
    "category": "category not supported by IAB"
    }
    ],
    "tld": "gamezfightclub.de",
    "text length": 725
    }
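    In an "Incomplete" response like the one above, the "categories" list carries a gx_* status code instead of a real category. The Python sketch below separates the two and flags codes worth retrying. The function names are hypothetical, and the set of retryable codes is an assumption based only on the two status descriptions that mention retrying (gx_retry and gx_offline).

```python
# Codes whose descriptions say a later retry may succeed (assumption).
RETRY_LATER_CODES = {"gx_retry", "gx_offline"}


def split_categories(response):
    """Separate real categories from gx_* status codes in "categories"."""
    categories = response.get("categories")
    if not isinstance(categories, list):
        # e.g. the placeholder string "N/A"
        return [], []
    real = [c for c in categories if not c["name"].startswith("gx_")]
    codes = [c["name"] for c in categories if c["name"].startswith("gx_")]
    return real, codes


def should_retry_later(response):
    """True if any gx_* code suggests requesting the URL again later."""
    _, codes = split_categories(response)
    return any(code in RETRY_LATER_CODES for code in codes)


sample = {"categories": [{"name": "gx_retry", "weight": 1}]}
print(split_categories(sample), should_retry_later(sample))
```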










    Response example [3]: Full response – A full response looks like this:

    Response Header:
    HTTP/1.1 200 OK

    Response Body:
    {
    "status": {
    "message": "All seems well.",
    "type": "Ok"
    },
    "weightedKeywords": [
    {
    "surface_form": "Stoke",
    "frequency": 12,
    "type": "SoccerClub",
    "name": "Stoke City F.C.",
    "weight": 5.312257405515832
    },
    {
    "surface_form": "Manchester City",
    "frequency": 6,
    "type": "SoccerClub",
    "name": "Manchester City F.C.",
    "weight": 5.206128702757916
    },
    {
    "surface_form": "Premier League",
    "frequency": 3,
    "type": "SoccerLeague",
    "name": "Premier League",
    "weight": 4.203064351378958
    },
    {
    "surface_form": "Sané",
    "frequency": 5,
    "type": "Keyword",
    "name": "Jacques-Noël Sané",
    "weight": 4.005107252298264
    },
    {
    "surface_form": "Guardiola",
    "frequency": 3,
    "type": "Plant",
    "name": "Guardiola",
    "weight": 3.004085801838611
    },
    {
    "surface_form": "Jesus",
    "frequency": 3,
    "type": "Person",
    "name": "Jesus",
    "weight": 3.004085801838611
    },
    {
    "surface_form": "Kevin De Bruyne",
    "frequency": 3,
    "type": "SoccerPlayer",
    "name": "Kevin De Bruyne",
    "weight": 2.7030643513789583
    },
    {
    "surface_form": "Watford",
    "frequency": 2,
    "type": "SoccerClub",
    "name": "Watford F.C.",
    "weight": 2.0020429009193053
    },
    {
    "surface_form": "Tammy",
    "frequency": 2,
    "type": "Person",
    "name": "Tammy Faye Messner",
    "weight": 2.0020429009193053
    },
    {
    "surface_form": "Belgian",
    "frequency": 2,
    "type": "SoccerClub",
    "name": "Belgium national football team",
    "weight": 2.0020429009193053
    }
    ],
    "text": "Deactivated. See docs.",
    "image": "https://i.guim.co.uk/img/media/6e1132cf166ea64e9c11d4dae50f9ec4c1a4a1ab/0_161_4570_2742/master/4570.jpg?w=300&q=55&auto=format&usm=12&fit=max&s=d1791724ac408508290413c48da06ab8",
    "customData": "",
    "sentiment": 0.12891282959464775,
    "categories": [
    {
    "name": "sport_soccer",
    "weight": 68.231
    },
    {
    "name": "sport",
    "weight": 58.15
    },
    {
    "name": "interest_frequent_travelers",
    "weight": 52.175
    },
    {
    "name": "interest_male",
    "weight": 49.56
    }
    ],
    "uuid": "",
    "language": "en",
    "url": "https://www.theguardian.com/football/2017/oct/14/manchester-city-stoke-city-premier-league-match-report",
    "iab": [
    {
    "category": "Sports",
    "code": "IAB17-44",
    "weight": 68.231
    },
    {
    "category": "Sports",
    "code": "IAB17",
    "weight": 58.15
    },
    {
    "category": "category not supported by IAB"
    },
    {
    "category": "category not supported by IAB"
    }
    ],
    "tld": "theguardian.com",
    "text length": 5164
    }
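    A full response like the one above can be reduced to the few values a decision engine typically needs. The Python sketch below picks the category with the highest weight, the IAB codes that were actually mapped, and the keywords above a weight threshold. The function name and the threshold of 3.0 are assumptions chosen for illustration.

```python
NOT_SUPPORTED = "category not supported by IAB"


def summarize(response, min_keyword_weight=3.0):
    """Pick the top category, the mapped IAB codes and the strongest keywords."""
    categories = response.get("categories")
    top_category = None
    if isinstance(categories, list) and categories:
        top_category = max(categories, key=lambda c: c["weight"])["name"]

    iab = response.get("iab")
    iab_codes = []
    if isinstance(iab, list):
        iab_codes = [
            entry["code"] for entry in iab
            if entry.get("category") != NOT_SUPPORTED and "code" in entry
        ]

    keywords = response.get("weightedKeywords")
    strong = []
    if isinstance(keywords, list):
        strong = [k["name"] for k in keywords if k["weight"] >= min_keyword_weight]

    return {"top_category": top_category, "iab_codes": iab_codes, "keywords": strong}
```

    The isinstance checks matter because fields such as "categories" and "iab" hold the string "N/A" rather than a list when no result is available.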




    Response example [4]: Error response – an error response, e.g. "Target URL not reachable." (DNS error, etc.), looks like this:

    Response Header:
    HTTP/1.1 200 OK

    {
    "status": {
    "message": "Target URL not reachable.",
    "type": "Error"
    },
    "weightedKeywords": [],
    "text": "Deactivated. See docs.",
    "image": "N/A",
    "customData": "Enrich your customData here...",
    "sentiment": "N/A",
    "categories": "N/A",
    "uuid": "236723627362736%2323djksd",
    "language": "N/A",
    "url": "https://www.theguardian.error/",
    "iab": "category not supported by IAB",
    "tld": "N/A",
    "metaKeywords": "N/A",
    "text length": 3
    }
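    As the examples show, fields without a result hold the placeholder string "N/A", and "sentiment" can be a number or an explanatory string. The Python sketch below normalises both cases so that later code only has to check for None. Both function names are hypothetical.

```python
def sentiment_or_none(response):
    """Return sentiment as a float, or None when it is unavailable."""
    value = response.get("sentiment")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    # Strings such as "N/A" or "Only EN and DE supported at the moment."
    return None


def drop_placeholders(response):
    """Return a copy of the response with "N/A" values replaced by None."""
    return {key: (None if value == "N/A" else value)
            for key, value in response.items()}


error_response = {"image": "N/A", "sentiment": "N/A", "weightedKeywords": []}
print(drop_placeholders(error_response), sentiment_or_none(error_response))
```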

    Last updated: November 6th, 2017