by Michael Boecher | Insights
As digitalization progresses, the volume of data in companies is growing rapidly. However, much of the valuable information remains unused because it sits in texts, websites, media files, documents, and e-mails. Numerous innovations in the field of Natural Language Processing (NLP) now make it possible to evaluate this information to a new extent. In many industries, this yields faster access to information and a competitive advantage. A central building block in NLP is the recognition of semantic concepts in texts, known as Named Entity Recognition (NER).
Companies continuously produce text data such as e-mails, work protocols, manuals, patents and much more. Clients produce text data via e-mail, social media channels, questionnaires, reviews, comments, and other sources. Text data comes from different sources, is written by different authors in different languages and often contains spelling mistakes. Companies are making significant efforts to store this data in so-called “data lakes”. Organizing this data manually is often difficult and time-consuming, but automatic text analysis makes it feasible.
More efficient than a manual analysis done by humans
Finding relevant content in complex text collections requires new document analysis and search concepts. Common methods such as searching for certain terms, i.e. the exact matching of letter sequences, prove to be inefficient in times of big data. The manual checking and classification of millions of texts by humans is, in turn, hardly economical and, of course, time-consuming. This means that far too much time is spent on processes that a machine can do faster and more precisely.
Get valuable insights out of all your data
Nevertheless, it is extremely important for companies to be able to include all available data in their decisions. In the course of due diligence, for example, a data room comprising several gigabytes would ideally be checked in its entirety instead of only a sample of documents. The same applies to a very large archive of digitized texts and documents, or to research across an entire online content network in which all articles and URIs (even with millions of entries) can be analyzed in a database.
Thanks to modern techniques such as Named Entity Recognition, large amounts of data can be analyzed quickly, either in real time or in batches at defined time slots. These processes run automatically 24 hours a day, 365 days a year, using NLP solutions like hyScore|analyze.
In science, the automatic recognition of real-world objects in text is known as Named Entity Recognition (NER). General object types such as persons, places, and organizations can be recognized, as can more specific types such as aircraft, companies, phones, or cryptocurrencies.
Image: Difference between rule-based character search (left) and intelligent detection of entities (right). In the example on the left, the system does not find the character string “UC Berkeley” because it does not occur in the text. In the example on the right, the system recognizes the text section “University of California, Berkeley” as an organization. Similarity measures can then be used to link this organization to UC Berkeley. Furthermore, a rule-based system cannot distinguish between the company “Apple” and the fruit. An intelligent system like hyScore|analyze can!
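The contrast described in the image can be sketched in a few lines of Python. This is only a toy illustration, not how hyScore|analyze works: the alias table and the function names exact_search and link_entities are hypothetical, and the table lookup merely stands in for a learned NER model plus a similarity measure.

```python
# Hypothetical alias table: maps lower-cased surface forms to one canonical entity.
# A real system would learn such links instead of listing them by hand.
ALIASES = {
    "uc berkeley": "UC Berkeley",
    "university of california, berkeley": "UC Berkeley",
}


def exact_search(text, term):
    """Rule-based character search: true only if the exact string occurs."""
    return term in text


def link_entities(text):
    """Return the canonical entities whose known surface forms occur in the text."""
    lowered = text.lower()
    found = {canonical for alias, canonical in ALIASES.items() if alias in lowered}
    return sorted(found)


text = "She studied at the University of California, Berkeley."
print(exact_search(text, "UC Berkeley"))  # the exact string is not in the text
print(link_entities(text))                # the long form is linked to the entity
```

The exact search misses the mention because the letters “UC Berkeley” never appear, while the alias-based lookup still resolves the long form to the same organization.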
The development of NER systems goes back to the early 1990s, but has recently been boosted by the application of deep neural networks. The gain in accuracy stems from two fundamental improvements. First, neural networks can include entire sentences or even entire documents in the analysis, whereas older systems were limited to a few words. Second, the mathematical representation of individual words is much more advanced than before.
Contextual Data – what does it mean?
Contextual data is data that gives context to a person, entity or event. It is commonly used by business organizations for market research and prediction. Contextual data is taken from various sources and may include business information, family and socioeconomic background, educational history, health background, general environment and many other factors. You will find more definitions of contextual data at the end of this article.
At hyScore.io we define contextual data as follows: in our “context”, contextual data is simply what lets us know more about the meaning of a website or any provided (plain) text. We structure unstructured data and express the meaning and most important content of the website or text, scored and weighted, as keywords with their entities, plus a sentiment score. The sentiment score is derived by evaluating online opinions based on specific words; the sentiment is then judged to be positive, negative or neutral.
Furthermore, we classify and weight the website in our own category taxonomy and map these categories directly to the IAB standard taxonomy (Tier 1 / Tier 2). This kind of contextual data is useful for several use cases in many industries.
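To make the idea of a word-based sentiment score concrete, here is a minimal sketch in Python. It assumes a tiny hand-made word list and a simple count-based score; the word lists, the function name sentiment and the scoring rule are illustrative assumptions, not hyScore's actual method.

```python
import re

# Illustrative word lists (assumptions for this sketch, far too small for real use).
POSITIVE = {"great", "good", "excellent", "beautiful", "love"}
NEGATIVE = {"bad", "horrible", "terrible", "poor", "trap"}


def sentiment(text):
    """Count positive and negative words and return (score, label)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return score, label


print(sentiment("A great Finca with a beautiful view"))
print(sentiment("News about a horrible plane crash"))
print(sentiment("The Finca is in Cala Millor"))
```

A production system would weight words, handle negation and consider context, but the output has the same shape: a score and a positive, negative or neutral label.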
Contextual Data is about the content and environment of a website/text
Contextual data is data that is delivered to the right person, at the right time, within an actionable context. For example, a user reads an article about renting a Finca in Cala Millor on the island of Mallorca in Spain. Wouldn’t it be great to show them a contextually matching video about the island of Mallorca, the region of Cala Millor or a best-practice video on “how to rent a Finca”? Wouldn’t it make sense to show them a contextual advertisement for a “Finca rental service” or links to previous articles and user reviews about the topic? If the sentiment of the article is negative, you might show them a video about “hidden traps when renting a Finca in Spain”.
The other way around is to deliberately not show something in a given context, e.g. for brand safety. As an airline, you might not want to advertise your great deals on trips to New York right next to news about a horrible plane crash.
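A brand-safety check of this kind can be sketched as a simple keyword filter. The function name is_brand_safe, the keyword lists and the blocklist are hypothetical; the sketch only assumes that a set of keywords describing the page is already available.

```python
def is_brand_safe(page_keywords, blocked_keywords):
    """Return False if any keyword of the page is on the advertiser's blocklist."""
    page = {k.lower() for k in page_keywords}
    blocked = {b.lower() for b in blocked_keywords}
    return page.isdisjoint(blocked)


# Hypothetical blocklist for an airline advertiser.
airline_blocklist = {"plane crash", "emergency landing"}

print(is_brand_safe(["New York", "Plane Crash"], airline_blocklist))
print(is_brand_safe(["New York", "Broadway"], airline_blocklist))
```

The first page would be skipped for the airline's campaign, the second would be eligible. Real setups typically combine such rules with categories and sentiment rather than keywords alone.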
With hyScore’s contextual data API, you know right at this moment what a user is reading and in which environment, and you can use this information directly to deliver additional information based on this actionable context, or to withhold it. If you just want to enrich a user’s profile (interests, favorite topics, etc.), you can do this by simply sending the user identifier with the initial request to our API. We simply pass it through and return it together with the information about what the user has read.
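The pass-through of a user identifier can be pictured as follows. This sketch makes no network call and does not reflect the real request or response format of the hyScore API: the field names url, user_id and keywords, as well as both function names, are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def build_request(url, user_id=None):
    """Build a hypothetical request payload; the user identifier is optional."""
    payload = {"url": url}
    if user_id is not None:
        payload["user_id"] = user_id
    return payload


def enrich_profile(profiles, response):
    """Add the returned keywords to the profile of the user named in the response."""
    user_id = response.get("user_id")
    if user_id is None:
        return  # nothing to enrich without an identifier
    profiles[user_id].update(response["keywords"])


profiles = defaultdict(set)
request = build_request("https://example.com/renting-a-finca", user_id="u-42")

# Simulated response: the service echoes the identifier next to the page keywords.
response = {"user_id": request["user_id"], "keywords": ["Mallorca", "Finca"]}
enrich_profile(profiles, response)
print(profiles["u-42"])
```

Because the identifier comes back with the result, the caller can update a profile without keeping a separate lookup of which request belonged to which user.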
hyScore’s definition of contextual data is simple and valuable for many use cases. We don’t build products based on our data ourselves. We leave it up to you how you use this kind of data in your business context. You can refine the data we provide by using it in your own product, application, or any other intended use case. We don’t mind whether you use it for content recommendation, site search improvements, tagging, a contextual video player, contextual advertising, environmental analysis for brand safety, fraud detection, website classification, user profile enrichment, audience and user segmentation, digitalization, research, or anything else.
Our mission is to remove the major pain point of getting access to this kind of contextual data. You need no additional infrastructure, and you don’t need computational linguists or natural language processing experts. All you need is an API key. Sign up for a free account.