
ARTS Check-in Day 6

A: 278. First Bad Version#

You are a product manager and currently leading a team to develop a new product. Unfortunately, the latest version of your product did not pass the quality check. Since each version is developed based on the previous version, all the versions after the bad version are also bad.
Suppose you have n versions [1, 2, ..., n] and you want to find out the first bad one, which causes all the following versions to be bad.
You can call a function bool isBadVersion(version) to determine whether a version is bad or not. Implement a function to find the first bad version. You should minimize the number of calls to the API.
Example 1:
Input: n = 5, bad = 4
Output: 4
Explanation:
call isBadVersion(3) -> false
call isBadVersion(5) -> true
call isBadVersion(4) -> true
So, 4 is the first bad version.
Example 2:
Input: n = 1, bad = 1
Output: 1

var solution = function (isBadVersion: any) {
  return function (n: number): number {
    // Linear scan: checks versions one by one, so it can make up to n API calls
    for (let i = 1; i <= n; i += 1) {
      if (isBadVersion(i)) {
        return i;
      }
    }
    return n;
  };
};

The submission result is:

Time Limit Exceeded
22/24 cases passed (N/A)

The logic is correct, but it makes O(n) API calls and times out on large inputs; I will change it to binary search after finishing the check-in.

The improved solution using binary search from the reference is:

var solution = function (isBadVersion: any) {
  return function (n: number): number {
    let left = 1; // versions are numbered from 1
    let right = n;
    // invariant: the first bad version is always within [left, right]
    while (left < right) {
      const middle = left + Math.floor((right - left) / 2);
      // a single isBadVersion call per iteration keeps API usage at O(log n)
      if (isBadVersion(middle)) {
        right = middle; // middle may itself be the first bad version
      } else {
        left = middle + 1; // the first bad version is strictly after middle
      }
    }
    return left; // left === right: the first bad version
  };
};
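
For a quick local sanity check (a hypothetical test, not part of the LeetCode harness), the wrapped function can be exercised with a mock isBadVersion:

const firstBadVersion = solution((version: number) => version >= 4);
console.log(firstBadVersion(5)); // 4
console.log(firstBadVersion(4)); // 4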

Binary search is often used to quickly narrow down the search range over ordered data; here it reduces the number of isBadVersion calls from O(n) to O(log n), which is why the time limit is no longer exceeded.

R: Open challenges in LLM research#

The author identifies 10 open challenges that LLMs currently need to address, with hallucination and in-context learning being the most discussed topics. The author's own focus is on multimodality, new model architectures, and reducing dependence on GPUs.

Reducing and measuring model hallucinations#

While hallucination may be a feature for some creative applications, it is considered a bug in most scenarios. Reducing and measuring hallucinations is therefore a popular research direction. There are interim mitigations, such as adding more context to the prompt, chain-of-thought (CoT) prompting, and self-consistency, which are referenced and explained further in the article.
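
For illustration, here is a minimal sketch of self-consistency in TypeScript (llm() is a hypothetical API, not something from the article): sample several chain-of-thought completions at a non-zero temperature, then majority-vote on the final answers.

declare function llm(prompt: string, temperature: number): Promise<string>; // hypothetical LLM call

async function selfConsistentAnswer(question: string, samples = 5): Promise<string> {
  const prompt = `${question}\nLet's think step by step, then finish with "Answer: <answer>".`;
  const votes = new Map<string, number>();
  for (let i = 0; i < samples; i += 1) {
    const completion = await llm(prompt, 0.7); // temperature > 0 yields diverse reasoning paths
    const answer = completion.split('Answer:').pop()?.trim() ?? '';
    votes.set(answer, (votes.get(answer) ?? 0) + 1);
  }
  // the most frequent final answer across independent reasoning paths wins
  return [...votes.entries()].sort((a, b) => b[1] - a[1])[0][0];
}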

Improving context length and context building#

Most problems require context to be answered well: the model picks up the relevant information from the context supplied in the prompt, a process called "in-context learning."

Context length is particularly important for Retrieval Augmented Generation (RAG), which works in two stages: 1. Chunking (indexing): split the necessary documents into chunks, embed them, and store the embeddings in a vector database; 2. Querying: embed the incoming query and retrieve the most similar chunks from the vector database.
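
As a rough illustration of the two stages, here is a minimal sketch assuming a hypothetical embed() API and a plain in-memory array standing in for the vector database (real systems use a dedicated store and smarter chunking):

declare function embed(text: string): Promise<number[]>; // hypothetical embedding API

type Entry = { chunk: string; vector: number[] };
const vectorDb: Entry[] = [];

// Stage 1 (chunking/indexing): split documents, embed each chunk, store it
async function indexDocuments(documents: string[], chunkSize = 500): Promise<void> {
  for (const doc of documents) {
    for (let i = 0; i < doc.length; i += chunkSize) {
      const chunk = doc.slice(i, i + chunkSize);
      vectorDb.push({ chunk, vector: await embed(chunk) });
    }
  }
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Stage 2 (querying): embed the query, retrieve the k most similar chunks
async function retrieve(query: string, k = 3): Promise<string[]> {
  const q = await embed(query);
  return vectorDb
    .map((e) => ({ chunk: e.chunk, score: cosineSimilarity(q, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.chunk);
}

The retrieved chunks are then concatenated into the prompt as context, which is exactly where context length becomes the bottleneck.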

The longer an LLM's supported context length, the more relevant chunks can be included in the prompt, which generally leads to better generations.

However, including more content in the context is not always better; the model's capacity to actually use it and its processing efficiency also matter. A parallel path is therefore to optimize the prompt itself so that it is easier for the LLM to process, known as "prompt engineering" or prompt construction.

Collaboration with other modalities#

Multimodality matters because many real-world scenarios involve multimodal data. Moreover, the leading LLMs have already made extensive use of the available text data, and further improvement requires tapping the value of other modalities.

The author is particularly excited about the potential of multimodal models to improve access to the internet and the real world for visually impaired individuals.

Making LLMs faster and cheaper#

When GPT-3.5 was first released, there were concerns about its latency and cost. In just six months, however, the community managed to achieve comparable performance with only about 2% of GPT-3.5's memory footprint. The author mentions several important techniques for model optimization and compression, such as quantization, knowledge distillation, low-rank factorization, and pruning, which were discussed in the author's book years ago and are still relevant and popular today.
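
To make one of these techniques concrete, here is a minimal sketch of symmetric int8 weight quantization (illustrative only; production frameworks quantize per channel and handle activations as well):

function quantize(weights: number[]): { q: Int8Array; scale: number } {
  const maxAbs = Math.max(...weights.map(Math.abs));
  const scale = maxAbs / 127 || 1; // maps [-maxAbs, maxAbs] onto [-127, 127]; guards all-zero input
  const q = new Int8Array(weights.map((w) => Math.round(w / scale)));
  return { q, scale };
}

function dequantize(q: Int8Array, scale: number): number[] {
  return Array.from(q, (v) => v * scale); // approximately recovers the original weights
}

// int8 storage is 4x smaller than float32, at the cost of a small rounding error
const { q, scale } = quantize([0.12, -0.5, 0.33, 0.98]);
console.log(dequantize(q, scale)); // ≈ [0.12, -0.5, 0.33, 0.98]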

Designing new model architectures#

The Transformer architecture has been around since 2017, and it is uncertain how long it will continue to dominate. Surpassing an architecture that has been continuously optimized for six years is not easy; a successor would have to address today's concerns, such as scalability and hardware resources. Transformers were originally designed to run fast on Google's TPUs and were only later optimized for GPUs.

Developing GPU alternatives#

Since AlexNet kicked off the deep learning era in 2012, GPUs have been the dominant hardware in the field.

The scarcity of GPU resources is widely felt, and therefore, in the past decade, several companies have attempted to create new hardware for AI, such as Google's TPU, Graphcore's IPU, as well as the anticipation of quantum computing and exploration of photonic chips.

Making agents truly usable#

An agent is an LLM that can take actions, such as browsing the web or sending emails. Compared with the other challenges, this is a relatively new direction.
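
Conceptually, an agent is just an LLM in a loop: the model picks an action, a runtime executes it, and the observation is fed back into the context. Here is a minimal sketch with a hypothetical llm() call and stub tools (not the article's code):

declare function llm(prompt: string): Promise<string>; // hypothetical LLM call

type Tool = (input: string) => Promise<string>;
const tools: Record<string, Tool> = {
  browse: async (url) => `page contents of ${url}`, // stub web browser
  email: async (message) => `sent: ${message}`, // stub email sender
};

async function runAgent(task: string, maxSteps = 5): Promise<string> {
  let context = `Task: ${task}`;
  for (let step = 0; step < maxSteps; step += 1) {
    // the model replies with either "<tool>: <input>" or "done: <result>"
    const decision = await llm(`${context}\nNext action ("<tool>: <input>" or "done: <result>"):`);
    if (decision.startsWith('done:')) return decision.slice(5).trim();
    const sep = decision.indexOf(':');
    const name = decision.slice(0, sep).trim();
    const input = decision.slice(sep + 1).trim();
    const observation = tools[name] ? await tools[name](input) : `unknown tool: ${name}`;
    context += `\n${decision}\nObservation: ${observation}`;
  }
  return 'step limit reached without finishing the task';
}

Everything hinges on the model emitting well-formed, safe actions at every step.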

Due to its novelty, there is great enthusiasm for this direction, as seen in popular GitHub repositories such as Auto-GPT and GPT-Engineer.

However, despite the enthusiasm, there is still skepticism about whether LLMs are reliable and trustworthy enough to be entrusted with real actions.

A recent case study applied LLMs to sociological research: in a Stanford experiment, an agent was given the goal of organizing a Valentine's Day party, and over the following two days it autonomously took actions such as sending out party invitations and making new friends.

A notable company in this direction is Adept, which has demonstrated an agent browsing the web and adding a new account in Salesforce.

Learning from human preferences#

RLHF (Reinforcement Learning from Human Feedback) is a useful technique for aligning models with human preferences, but it has limitations, and the author believes better alignment methods can still be found.
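
For reference, the standard RLHF objective (the common textbook formulation, not quoted from the article) fine-tunes the policy to maximize a learned reward model's score while a KL penalty keeps it close to the pretrained reference model:

$$\max_{\pi} \; \mathbb{E}_{x \sim D,\ y \sim \pi(\cdot \mid x)}\big[r_\theta(x, y)\big] \;-\; \beta\, D_{\mathrm{KL}}\big(\pi(\cdot \mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big)$$

Here r_θ is the reward model trained on human preference comparisons and β controls how far the policy may drift from the reference model; the challenges below are largely about how hard it is to make r_θ actually capture "human preferences."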

Some of the challenges with RLHF include how to quantify human preferences, what human preferences really are, and whether cultural, regional, and political differences are accounted for. It is difficult to obtain training data that represents the preferences of all potential users, and even community-driven data can be biased.

Improving the efficiency of LLM dialogue interfaces#

Since the introduction of ChatGPT, there have been discussions about what a suitable dialogue interface for a wide range of tasks should look like.

However, this is not a new discussion, as chat interfaces have been used as the entry point for super apps for over a decade, especially in many Asian countries.

In 2016, there were discussions suggesting that applications were dead, and chatbots were the future.

The author likes the chat interface for three reasons: it is easy to learn, even for people who have never used a computer; it is accessible, since it can be operated by voice when one's hands are occupied; and it is robust, accepting essentially any request.

However, the author also sees room for improvement in chat interfaces: the limitation of one message per turn, the need for multimodal input, ways to integrate generative AI into existing workflows, and the ability to edit or delete earlier messages in the conversation.

Building LLMs for non-English languages#

LLMs that prioritize English do not work as well for other languages in terms of performance, latency, and speed.

Efforts have been made for other languages, such as Symato for Vietnamese. However, some argue that this direction is not meaningful: it is more a matter of resource investment than a research problem; we already know how to do it, and what is missing is investment in other languages, even where data is available. Some pessimists even suggest that in the future only English and Mandarin will survive on the internet, and other languages will disappear.

The impact of LLMs on language learning is also still unclear: it is uncertain whether they will help people learn new languages faster or remove the need to learn them at all.
