Not All Data Is Created Equal
Transforming Data to Work for You
The explosion of open-source data in today's dynamic environment poses critical challenges to U.S. intelligence, defense and homeland security agencies. These challenges are compounded by the growing diversity of data collected and stored in a broad range of complex formats, languages and platforms. We solve these challenges for our clients. Applying over two decades of experience in multilingual data processing, MGPS implements innovative, customized, and mission-focused data collection and data exploitation techniques that enable artificial intelligence (AI) solutions. We optimize data collection and structure so AI technologies can be effective.
Collaboration for Mission Success
Through many client engagements, we have evolved our tradecraft for exploitation, source content format and structure conversion, locale transcoding and recoding, and entity extraction and mining of monolingual and multilingual content. We take the time to understand each customer's data and information needs so that our methods and output are tuned to the mission. The result is mission-relevant raw-data discovery and collection, along with structured data ready for exploitation.
MGPS’ partnerships involve continual discussion and collaboration with the client, along with ongoing assessment of off-the-shelf and customizable tools, to deliver both incremental and revolutionary improvements. We seek the best potential solution, develop a custom version of it, and deploy it rapidly and boldly so that any apparent failures immediately suggest hypotheses for improvement. Our approach to client collaboration is agile, swift and iterative to maximize client success.
Structuring Data for Machine Learning
Using AI alone, without advanced data collection and processing, can be problematic because the models are hampered by the variety of data sources, locations, formats, and languages. Our approach maximizes access to data so that it can be leveraged for machine modeling and other purposes. We create optimized gold-standard AI-model training corpora by integrating human talent and customized tools to extract data from sources and then consolidate it into unified formats.
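As a rough illustration of consolidating extracted content into a unified format, the sketch below decodes text from a few legacy encodings, normalizes it, and emits one JSON record per line. It is a minimal example under stated assumptions, not a description of MGPS' tooling: the function name `to_unified_record`, the record fields, the sample documents, and the choice of UTF-8 JSON Lines as the target format are all hypothetical.

```python
import json
import unicodedata


def to_unified_record(raw: bytes, encoding: str, language: str, source_id: str) -> dict:
    """Decode raw bytes and return a normalized record (field names are illustrative)."""
    text = raw.decode(encoding)                # transcode from the source encoding
    text = unicodedata.normalize("NFC", text)  # use one canonical Unicode form
    text = " ".join(text.split())              # collapse runs of whitespace
    return {"source_id": source_id, "language": language, "text": text}


# Hypothetical inputs arriving in three different encodings.
samples = [
    ("Привет, мир".encode("koi8_r"), "koi8_r", "ru", "doc-001"),
    ("こんにちは  世界".encode("shift_jis"), "shift_jis", "ja", "doc-002"),
    ("cafe\u0301".encode("utf-8"), "utf-8", "fr", "doc-003"),
]

records = [to_unified_record(*sample) for sample in samples]

# One UTF-8 JSON object per line, a common layout for training corpora.
jsonl = "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
print(jsonl)
```

In practice the source encoding and language are often unknown and have to be detected or supplied by an analyst; here they are passed in explicitly to keep the sketch small.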
Our talent teams include language groups with deep linguistic and cultural expertise across more than 200 languages and dialects. They work closely with our technical groups, which maintain top-level skills in the relevant technologies and design and implement solutions for processing specific languages and locales. We employ such Data-for-AI technologies as Neural Machine Translation, Speech-to-Text, Natural Language Understanding and Computer Vision to produce high-quality and accessible information at mission speed.
Neural Machine Translation (NMT)
MGPS has employed, customized and applied successive generations of Machine Translation (MT) over the last two decades.
We employ Neural Machine Translation (NMT) in a full production life cycle that uses multiple industry-recognized best-of-breed Machine Translation models as a baseline. We customize the output by developing, deploying and maintaining our models based on domain-specific, client-approved corpora to dramatically improve language translation quality, consistency, reliability and speed.
We execute corpus optimization to enhance accuracy, fidelity and consistency. Specifically, we employ robust workflows in which we populate the final products with key metadata. Our team employs technologies and procedures that enrich the data not only in the deliverables, but also in the reusable translated-content databases (Translation Memories). This feeds our continuous improvement cycle, which drives consistently more accurate results and provides cleaner, higher-quality data.
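To make the idea of a metadata-enriched Translation Memory concrete, here is a minimal sketch of one that stores translation units with metadata and returns the closest stored match for a new sentence. Everything in it is an assumption for illustration: the class and field names, the sample sentence, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure. Production systems typically use dedicated fuzzy-matching and exchange formats instead.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class TMEntry:
    """One translation unit plus metadata (all field names are hypothetical)."""
    source: str
    target: str
    metadata: dict = field(default_factory=dict)


class TranslationMemory:
    """Tiny in-memory store with a fuzzy lookup over source segments."""

    def __init__(self) -> None:
        self.entries: list[TMEntry] = []

    def add(self, source: str, target: str, **metadata: str) -> None:
        self.entries.append(TMEntry(source, target, dict(metadata)))

    def lookup(self, text: str, threshold: float = 0.8) -> Optional[tuple[float, TMEntry]]:
        """Return (similarity, entry) for the best match at or above threshold, else None."""
        best: Optional[tuple[float, TMEntry]] = None
        for entry in self.entries:
            score = SequenceMatcher(None, text.lower(), entry.source.lower()).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, entry)
        return best


tm = TranslationMemory()
tm.add(
    "The convoy departed at dawn.",
    "Le convoi est parti à l'aube.",
    domain="logistics",
    reviewed="yes",
)

exact = tm.lookup("The convoy departed at dawn.")
fuzzy = tm.lookup("The convoy departed at dusk.")
miss = tm.lookup("Weather report for Tuesday")
```

An exact match can be reused as is, a fuzzy match gives a translator or model a reviewed starting point, and the attached metadata records where a stored translation came from and whether it was reviewed.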
Speech-to-Text (StT)
MGPS provides end-to-end Speech-to-Text (StT) solutions, from proof-of-concept workflow validation to full implementation. Employing custom workflows with integrated quality control, our comprehensive process yields validated source files for audio-text corpus model training or other uses. We also employ client datasets for model generation and retraining for continuous improvement in speech recognition, including such features as timing, accents, sentiment, and other metadata.
MGPS is advancing the state of the art in StT across our multilingual environment. We continuously analyze and test multilingual StT technologies for accuracy. These tools are integrated into other Data-for-AI or exploitation workflows to serve key customer data needs.
Natural Language Understanding (NLU)
The objective of Natural Language Understanding (NLU) is to go beyond simple understanding of the source text and capture additional critical information such as geolocation data, original sourcing, sentiment analysis, political affiliation, and validation data. We work in close collaboration with our clients to devise the requirements for data and metadata functionality and processing.
Computer Vision
MGPS’ analysts create purpose-built datasets by identifying, capturing, collecting, and digitizing/scanning images, multimedia, and other content. Datasets are enriched with image tagging and other metadata to address specific client needs.