Since its inception 10 years ago, Glenbrook has focused on providing its customers with advanced data services based on its proprietary platform. The platform is the foundation of Glenbrook’s ability to navigate the Deep Web, analyze web pages, extract contexts from them, and convert the unstructured information in these contexts into structured facts that can feed customers’ Big Data strategic initiatives.

Using this approach, Glenbrook has successfully served clients in financial services, information technology, human resources, mobile, risk management, publishing, and social media.

Glenbrook's proprietary technology has been awarded six patents, with several more pending. The platform handles multiple languages and multi-byte characters.

Glenbrook's platform for navigating the Internet and collecting information consists of the following layers:

  • Deep Web Trawling
    • Efficient means of penetrating the Deep Web
    • Automatic collection of information residing behind HTML forms in the Deep Web
    • Adapts to the ever-changing structure of web sites and web pages
  • Page Analysis – Context Recognition and Extraction
    • Identification of target content vs “non-content” (undesired ads, links, spurious text, etc.)
    • Automatic extraction of target content (news articles, titles, authors, etc.)
    • Adapts to the ever-changing structure of web pages
  • Conversion of Unstructured Web Information into Structured Data
    • Extraction of target data
    • Extraction of entities, their attributes and relationships
    • Conversion of target data into structured database format
  • Deep Web vs. Surface Web
    • The Surface Web consists of static pages that are searchable by a search engine (e.g., Google). It contains more than 50 billion pages.
    • The Deep Web is much larger (in excess of 1 trillion pages) and is the dynamic portion of the web. Its pages are created in response to a user's query (e.g., “What are the prices for airfare from Chicago to Newark on July 20?”). The trend is for increasing volumes of information to be placed in the Deep Web.
    • Glenbrook's platform knows when to go to the Deep Web (and when not to), how to construct the query for a given page, and, once there, how to find and extract desired information efficiently. This is essential due to the enormous size and complexity of the Deep Web versus the Surface Web.
  • Collecting information about millions of companies in North America directly from their web sites
  • Conversion of customer records from English to Chinese, Japanese, and Korean (and vice versa)
  • Collecting “Points of Interest” for mobile applications globally
  • Collecting business information from regulatory agencies worldwide
  • Determining millions of company URLs when only the company name and state are known
  • Given a company URL, determining company name and country of origin
  • Collecting business information about franchisees
  • Collecting job postings directly from Deep Web sites of employers
  • Template-free automatic extraction of news articles worldwide
  • Automatic extraction of contexts from web pages
  • Collecting business information from press releases
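
The airfare example above, a Deep Web page created in response to a form query, can be illustrated with a minimal sketch. It is not Glenbrook's implementation: the function name `build_form_query`, the `example.com` URL, and the form field names are all hypothetical, and the sketch assumes a form submitted with GET whose action URL has no existing query string.

```python
from urllib.parse import urlencode


def build_form_query(action_url: str, fields: dict[str, str]) -> str:
    """Encode form fields into the URL a GET form submission would request.

    Assumes `action_url` carries no query string of its own (hypothetical helper).
    """
    return f"{action_url}?{urlencode(fields)}"


# Hypothetical search form and field names, for illustration only.
url = build_form_query(
    "https://example.com/flights/search",
    {"from": "Chicago", "to": "Newark", "date": "07-20"},
)
print(url)
```

Requesting such a URL returns a page that exists only as the answer to that query, which is why pages of this kind are not reachable by following static links.
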
Glenbrook prides itself on providing its customers, Fortune 100 companies and startups alike, with client-tailored data services. The major types of these services are described below.
Multi-Language Conversion of Business Records

Global companies deal daily with the problem of maintaining records of current and potential customers. These records are either acquired from third parties (such as D&B) or generated by the companies' own sales and marketing organizations. The challenge in leveraging these records is that they are often in different languages, and reconciling them across the company is not an easy task. Not only do the languages differ, but so do the conventions for describing customer data in each language. For example, the order of address elements in Chinese is the reverse of that used in English-speaking countries.

Glenbrook has developed a special approach that helps to alleviate the problem. This approach is a combination of record analysis, multiple translation techniques, and fact verification based on postal conventions and company information on the Internet in the target country.
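
The address-order convention mentioned above can be illustrated with a minimal sketch. It only reorders fields that are already separated and does no translation or verification, so it is not Glenbrook's approach; the field names and the function `reorder_address` are hypothetical.

```python
# Hypothetical field names, listed smallest-to-largest as in
# English-language addresses.
ENGLISH_ORDER = ["recipient", "street", "district", "city", "province", "country"]


def reorder_address(fields: dict[str, str], target: str) -> list[str]:
    """Return the address values present in `fields`, ordered for `target`.

    `target` is "en" (smallest unit first) or "zh" (largest unit first).
    """
    if target not in ("en", "zh"):
        raise ValueError(f"unsupported target convention: {target!r}")
    order = ENGLISH_ORDER if target == "en" else list(reversed(ENGLISH_ORDER))
    return [fields[name] for name in order if name in fields]


record = {"street": "1 Example Road", "city": "Shanghai", "country": "China"}
print(reorder_address(record, "en"))
print(reorder_address(record, "zh"))
```

In practice the harder part is splitting a free-form address into such fields in the first place, which is where the record analysis and verification steps described above come in.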

Collecting Business Data Worldwide

Businesses are always interested in the most accurate and up-to-date information about their current and potential customers, and the Internet is one of the most important sources of it. Companies’ own web sites are an excellent starting point, but they hold only 2% of the information about businesses; the other 98% is found elsewhere on the Internet (in press releases, articles, blogs, etc.). Furthermore, only a small fraction of the web pages of interest is on the Surface Web (the portion of the Web that is accessible through search engines); the rest is located in the Deep Web. To make matters more difficult, the sought information is rarely presented in a structured format (in most cases, it appears as free-form text). Glenbrook's platform is well suited to collecting factual information about businesses: it can navigate the Deep Web, analyze web pages, extract independent contexts from them, and convert semi-structured and unstructured information into structured facts that can be stored in a database.


With the popularity of mobile geo-based applications, more and more companies are looking to provide their users with detailed and comprehensive information about local businesses, attractions, etc. The Internet is a rich source of information on these Points of Interest (POI). Glenbrook collects and refreshes POI information by aggregating information gathered from across the Internet.

If you have any questions about Glenbrook Data Services, please email us at