  • Rajesh Kommineni

AI/ML Adoption Series: Foundation Models and Prompt-based Model Development


Thinkdeeply provides AI as a Service for business users. Our mission is to accelerate the adoption of AI/ML. We make AI easy by simplifying AI development with our No Code AI Platform, and we accelerate it through our Industry Solution packs and AI Hub.


In this series, we discuss best practices for AI/ML adoption and the latest advancements in the field. The intended audience of these posts includes:

  • Executives who want to learn how AI/ML can help them and how to accelerate its adoption

  • AI/ML practitioners such as data scientists, ML engineers, and MLOps teams


Summary


Foundation Models and Prompt-based Model Development have revolutionized the field of Natural Language Processing (NLP). The majority of ML tasks can achieve reasonable performance with little or no additional tuning of the models. This makes it easier to infuse AI/ML into a business to deliver better customer experiences, increase automation, or generate insights. Some of these ML tasks and their common use cases include:

  • Classification

  • Classifying a given piece of content (e.g., an email, document, or product) into a predefined list of classes. Alternatively, generating the classes themselves with translation methods (i.e., translating text into a target "language" such as a taxonomy)

  • Entity Extraction

  • Extracting relevant entities (e.g., people and places, product attributes, key facts, tables) from data sources such as text blobs or OCR outputs

  • Entity Matching

  • Finding and deleting duplicate records

  • Linking two partial records of the same subject to create a more complete record


  • Conversational AI

  • Chatbots, etc.

  • Generative tasks such as translation and generation

  • Translating one language into another

  • Changing the writing style of a text from one author's style to another's

  • Generating an image from a text description

  • Summarization, error detection, data imputation, and more


Traditionally, for most of the above use cases, the field has progressed from standard tokenization and TF-IDF representations to embeddings (character embeddings, token embeddings, or both) and, more recently, to transformers. This progression has significantly improved accuracy while using less data. However, these are still task-specific architectures that require data engineering on task-specific labeled data. While the algorithms are better, the infrastructure still demands significant engineering effort and leads to siloed, hard-to-maintain systems.
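For readers less familiar with the starting point of that progression, the sketch below computes a basic TF-IDF representation in plain Python. It is an illustrative, unsmoothed variant assumed for simplicity (libraries such as scikit-learn's `TfidfVectorizer` add IDF smoothing and vector normalization by default), and it only produces features: a task-specific classifier trained on labeled data would still be needed on top of these vectors.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenization: lowercase and split on whitespace.
    return text.lower().split()


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {}
        for term, count in counts.items():
            tf = count / len(toks)               # term frequency in this doc
            idf = math.log(n_docs / df[term])    # rarer terms weigh more
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


docs = ["refund my order", "track my order", "cancel my subscription"]
vectors = tf_idf(docs)
```

Because "my" appears in every document, its weight is zero, while a distinctive word such as "refund" gets the highest weight in its document.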


Foundation Models and Prompt-based Model Development help accelerate adoption through a task-agnostic architecture that requires little to no labeled data. Before we delve further, we need to understand the following concepts:


Foundation Models


Extremely large language models such as GPT-3, BERT, and GPT-2 are not trained to follow specific instructions. Instead, they are trained on generic language-modeling objectives: GPT-2 and GPT-3 learn to predict the next word (or token) that follows a given piece of text, while BERT learns to predict masked words from their surrounding context. Yet large generative models such as GPT-3 have been shown to respond well to all sorts of instructions. The hypothesis is that these models have seen so much data that they not only understand language but have also seen many varied examples of responses to instructions. You can imagine that GPT-3 has probably seen conversations that include "Tell me a story: Well, once upon a time…" and "Can you tell me where the Burlington Mall is? It's on Fifth Street."
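The next-word objective can be illustrated with a toy bigram model that counts which word follows which in a tiny corpus and predicts the most frequent follower. This is a drastic simplification assumed only for illustration: models like GPT-3 learn far richer statistics with transformer networks trained on very large corpora, not with a lookup table.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for each token, which tokens follow it in the corpus."""
    following: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        # Pair every token with the token right after it.
        for current, nxt in zip(tokens, tokens[1:]):
            following[current][nxt] += 1
    return following


def predict_next(model: dict[str, Counter], token: str) -> Optional[str]:
    """Return the most frequent token seen after `token`, or None if unseen."""
    candidates = model.get(token.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


corpus = ["tell me a story", "once upon a time", "tell me where the mall is"]
model = train_bigram(corpus)
print(predict_next(model, "tell"))  # me
```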


Zero-Shot


Zero-shot learning allows a model, at test time, to infer responses to inputs without any task-specific training examples. A zero-shot model typically takes some context and a prompt and returns an inference. Here are a couple of examples:

  • Example 1 (Error Detection)

  • Context

  • Country: US, City: Bangkok

  • Prompt

  • Is there an error in Country?

  • Inference

  • Yes

  • Example 2 (Typical OCR output)

  • Context

  • Type_of_insurance

  • COMMERCIAL GENERAL LIABILITY: true

  • CLAIMS-MADE: false

  • OCCUR: true

  • PRO- JECT: false

  • CLAIMS-MADE: false

  • 2500$ deductible

  • Prompt: What is the type of insurance?

  • Inference: COMMERCIAL GENERAL LIABILITY

  • Prompt: Is it per claim?

  • Inference: False

  • Prompt: What is the deductible?

  • Inference: $2500
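As a rough illustration of how context and prompt can be combined for a zero-shot model, the sketch below serializes structured fields into text and appends the question. The `Question:`/`Answer:` layout and the function names are assumptions made for this example, not a format required by any particular model; the resulting string would be sent unchanged to whichever foundation model is used.

```python
def serialize_context(fields: list[tuple[str, str]]) -> str:
    # One "key: value" pair per line. A list (not a dict) keeps repeated
    # keys, such as the duplicated CLAIMS-MADE field in noisy OCR output.
    return "\n".join(f"{key}: {value}" for key, value in fields)


def zero_shot_prompt(fields: list[tuple[str, str]], question: str) -> str:
    # Zero-shot: no solved examples, only the context and the question,
    # followed by an "Answer:" cue for the model to complete.
    return f"{serialize_context(fields)}\n\nQuestion: {question}\nAnswer:"


prompt = zero_shot_prompt(
    [("Country", "US"), ("City", "Bangkok")],
    "Is there an error in Country?",
)
print(prompt)
```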

Few-Shot

If zero-shot inference does not provide the required accuracy, we can fine-tune Foundation Models with only a few examples. Few-shot learning differs from transfer learning. Transfer learning aims to "transfer" learned features to various downstream discriminative tasks, and it may still need large amounts of labeled data to achieve the desired outcomes. Few-shot learning aims to produce models that generalize from only a few samples.
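The paragraph above describes fine-tuning on a few examples. A related and widely used alternative, popularized by GPT-3, is in-context few-shot learning: the labeled examples are written into the prompt as demonstrations, and the model weights are not updated. The sketch below shows only that prompt construction; the `Input:`/`Output:` format and the demonstration records are hypothetical.

```python
def few_shot_prompt(demos: list[tuple[str, str]], query: str) -> str:
    # Each demonstration is an (input, expected output) pair written out in full.
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    # The query uses the same format but leaves the output for the model.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


demos = [
    ("Country: US, City: Bangkok. Is there an error in Country?", "Yes"),
    ("Country: France, City: Paris. Is there an error in Country?", "No"),
]
prompt = few_shot_prompt(
    demos, "Country: Japan, City: Toronto. Is there an error in Country?"
)
print(prompt)
```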


Prompt-based Models


Prompt-based model development is a strategy in which models accept prompts as inputs and generate the desired responses with few examples (few-shot) or none (zero-shot). Prompt-based development has its challenges, mostly around prompt engineering. However, Foundation Models combined with prompt-based development offer an alternative that is relatively cheap in both time (faster time-to-market) and cost (few labeled examples and less engineering effort).
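To show what a prompt-based model can look like around a foundation model, here is a sketch of a zero-shot text classifier: a template turns the task into a prompt, and a small parser maps the free-form completion back onto the allowed labels. The `LanguageModel` type and all function names are hypothetical; no real model client is called, and the lambda at the end is a stub standing in for one.

```python
from typing import Callable, Optional

# Hypothetical interface: any function mapping a prompt to generated text.
# A real deployment would wrap a foundation-model client here.
LanguageModel = Callable[[str], str]


def build_prompt(text: str, labels: list[str]) -> str:
    # Zero-shot template: the task is stated in words, with no training step.
    return (
        f"Classify the text into one of: {', '.join(labels)}.\n"
        f"Text: {text}\n"
        "Label:"
    )


def parse_label(completion: str, labels: list[str]) -> Optional[str]:
    # Map free-form output back onto the allowed labels; try longer labels
    # first so that a label is not shadowed by a shorter prefix of it.
    answer = completion.strip().lower()
    for label in sorted(labels, key=len, reverse=True):
        if answer.startswith(label.lower()):
            return label
    return None  # the model answered outside the label set


def classify(text: str, labels: list[str], model: LanguageModel) -> Optional[str]:
    return parse_label(model(build_prompt(text, labels)), labels)


# Stub model for demonstration only; it always answers "Billing".
labels = ["billing", "shipping", "returns"]
print(classify("I was charged twice this month", labels, lambda prompt: " Billing"))
```

Keeping the model behind a plain callable means the same template and parser can be reused with any backend; moving to few-shot only changes `build_prompt`.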


Conclusion and Next Steps


In practice, Foundation and Prompt-based Models achieve reasonable accuracy. For entity extraction, we have seen accuracy above 50% with zero-shot models (depending on the domain) and above 90% with few-shot models. In the next article, we will discuss in detail how to build and fine-tune these models.
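The article does not specify how entity-extraction accuracy was measured. As an assumption for illustration only, one simple metric is field-level exact match: the share of expected fields whose extracted value matches the reference after light normalization. The field names below are hypothetical, loosely based on the insurance example earlier.

```python
def normalize(value: str) -> str:
    # Light normalization: case-insensitive, whitespace-collapsed comparison.
    return " ".join(value.lower().split())


def field_accuracy(predicted: list[dict[str, str]], gold: list[dict[str, str]]) -> float:
    """Share of gold (expected) fields whose predicted value matches exactly."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must list the same documents")
    correct = total = 0
    for pred, ref in zip(predicted, gold):
        for field, expected in ref.items():
            total += 1
            # A missing field counts as wrong.
            if normalize(pred.get(field, "")) == normalize(expected):
                correct += 1
    return correct / total if total else 0.0


gold = [{"type_of_insurance": "COMMERCIAL GENERAL LIABILITY", "deductible": "$2500"}]
predicted = [{"type_of_insurance": "Commercial General Liability", "deductible": "2500"}]
print(field_accuracy(predicted, gold))  # 0.5
```

Stricter or looser choices (for example, normalizing currency formats) change the number considerably, so the metric should be fixed before comparing zero-shot and few-shot results.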



If you are interested in learning how the above technologies might help your business, please feel free to reach out to info@thinkdeeply.ai for a free consultation call. Besides ideation, we have set up end-to-end pipelines for rapid prototyping to quickly validate whether these approaches can help you.
