Distributed, scalable monitoring of foreign-language Web sites
- Automatic multi-lingual data collection and mirroring of user-identified Web sites
- Automatic extraction and translation of text
- Search across multi-lingual sites
- Visualization tools and automatic topic detection for enhanced analysis
- Collected sites archived for later use
- Browser-based user interface with personalized user dashboards
- Watchlists for continuous monitoring of specific content
- Designed for inter-agency data sharing
The BBN Web Monitoring System Version 2 (WMS2) is an end-to-end capability for collecting, organizing, and translating open source content from the World Wide Web. The system integrates and manages the media analysis process from beginning to end: from data collection and processing, to automated triage and retrieval, to machine-assisted translation and support for human translation, to export and dissemination.
The system’s automatic analysis of Web site content supports effective content-based retrieval and triage for human analysts who must deal with overwhelming volumes of continuously accumulating media.
Multi-lingual data collection and extraction
WMS2 continuously captures content from user-selected Web sites into an archive that can be shared by multiple, distributed user groups. The captured site is archived and versioned for later use. Internal links are preserved in the harvested Web pages so users can navigate within the archived sites.
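As an illustration of how internal links might be preserved in an archived copy, the following is a minimal sketch in Python. It is not the WMS2 implementation: the `/archive/<snapshot>/<host>/...` layout and the function names `archive_path` and `rewrite_link` are assumptions made for this example only.

```python
from urllib.parse import urljoin, urlparse


def archive_path(url: str, snapshot: str) -> str:
    """Map a URL to a location inside a versioned archive.

    The /archive/<snapshot>/<host>/<path> layout is hypothetical.
    """
    parts = urlparse(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Directory-style URLs are stored as an index page.
        path += "index.html"
    return f"/archive/{snapshot}/{parts.netloc}{path}"


def rewrite_link(page_url: str, href: str, snapshot: str) -> str:
    """Point same-host links at the archived copy; leave other links unchanged."""
    absolute = urljoin(page_url, href)
    if urlparse(absolute).netloc == urlparse(page_url).netloc:
        return archive_path(absolute, snapshot)
    return absolute
```

In this sketch a relative link such as `story.html` on `http://example.org/news/` resolves to a path under the chosen snapshot, so a reader browsing the archived version stays inside that version, while links to other hosts are returned as absolute URLs.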
Text analysis and automatic translation
WMS2 identifies and extracts the text from the pages, leaving behind potentially harmful active content, and automatically tags named entities (people, places, and organizations). Captured pages are then automatically translated into English using machine translation software from SDL Language Weaver. English speakers can use the machine translation to get the gist of an article; linguists and analysts can use the on-board Clip Editor to correct the machine translation and add analytical commentary.
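The idea of extracting text while leaving active content behind can be sketched with Python's standard `html.parser` module. This is only an illustration under stated assumptions, not the method WMS2 uses: the set of skipped tags and the names `TextExtractor` and `extract_text` are chosen for the example, and named-entity tagging and translation are out of scope here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside active-content tags."""

    # Assumed list of tags treated as active content in this sketch.
    SKIP = {"script", "style", "iframe", "object"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than zero while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the text of an HTML page, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Because only text nodes are kept, scripts and embedded frames never reach the stored output; the extracted text could then be passed to entity tagging and machine translation steps.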
WMS2 can support any language currently available from SDL Language Weaver, including Arabic, Farsi/Persian, Mandarin Chinese, Russian, Korean, Urdu, Hindi, and many others.
Support for analysis
The BBN Web Monitoring System Version 2 includes tools and technologies that enable analysts to quickly discover relevant information and drill down into the data that is most important to them.
- BBN’s unsupervised topic analysis technology labels the incoming flow of Web pages with automatically discovered topics.
- A variety of visualizations, including charts, maps, and a pivot view, allow convenient browsing, filtering, sorting, and grouping of search results.
- Users can bookmark items of interest to allow rapid retrieval of any item in the archive.
- Watchlists enable you to keep your searches up to date with automatic retrieval of the most recent relevant results.
- The analyst’s personal dashboard is customizable to allow tracking of current projects using different visualizations.
- The Clip Editor allows online correction of the machine translation and addition of analytical notes or commentary.
- Data in the WMS archive is stored as XML and can be exported to other applications for additional analysis.
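Because archive data is stored as XML, an external application could read an export with any standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`. The actual WMS export schema is not described here, so the element and attribute names (`archive`, `page`, `url`, `lang`, `title`, `topic`) are hypothetical placeholders, as is the function name `load_pages`.

```python
import xml.etree.ElementTree as ET

# Hypothetical export; the real schema is not given in this document.
SAMPLE_EXPORT = """<archive>
  <page url="http://example.org/a" lang="ar">
    <title>First article</title>
    <topic>energy</topic>
    <topic>trade</topic>
  </page>
  <page url="http://example.org/b" lang="ru">
    <title>Second article</title>
  </page>
</archive>"""


def load_pages(xml_text: str) -> list:
    """Convert each <page> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        pages.append({
            "url": page.get("url"),
            "lang": page.get("lang"),
            "title": page.findtext("title", default=""),
            "topics": [t.text for t in page.findall("topic")],
        })
    return pages
```

Calling `load_pages(SAMPLE_EXPORT)` yields one dictionary per page, which a downstream tool could filter by language or topic for further analysis.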
Support for geographic search and personalized labeling
With WMS2, you can apply custom labels to sources or regions to group content, so that personalized searches focus on the most relevant material.
Development of the BBN Web Monitoring System Version 2 has been supported in part by the CTTSO Technical Support Working Group (TSWG), the Defense Advanced Research Projects Agency (DARPA), and other U.S. government agencies.