ARTS Check-in Day 7

A:290. Word Pattern#

Given a pattern and a string s, determine if s follows the same pattern.
Here, "follow" means a full match, such that there is a bijection between a letter in pattern and a non-empty word in s.
Example 1:
Input: pattern = "abba", s = "dog cat cat dog"
Output: true
Example 2:
Input: pattern = "abba", s = "dog cat cat fish"
Output: false
Example 3:
Input: pattern = "aaaa", s = "dog cat cat dog"
Output: false

function wordPattern(pattern: string, s: string): boolean {
  const arr = s.split(' ')
  // The pattern and the word list must have the same length to match at all
  if (pattern.length !== arr.length) {
    return false
  }
  // Maps each word to the pattern letter assigned to it
  const map = new Map()
  // Distinct pattern letters in order of first appearance, e.g. "abba" -> ['a', 'b']
  const patternAppearArr = Array.from(new Set(pattern.split('')))
  let resultPattern = ''
  let index = -1
  for (let i = 0; i < arr.length; i += 1) {
    const word = arr[i]
    if (map.has(word)) {
      // Reuse the letter already assigned to this word
      resultPattern += map.get(word)
    } else {
      // Assign the next unused pattern letter to this new word; if the distinct
      // words outnumber the distinct letters, this appends 'undefined' and the
      // final comparison fails, which is the desired result
      index += 1
      map.set(word, patternAppearArr[index])
      resultPattern += map.get(word)
    }
  }
  // The string follows the pattern only if the rebuilt pattern equals the original
  return resultPattern === pattern
}

Submission Result:

41/41 cases passed (68 ms)
Your runtime beats 44.12 % of typescript submissions
Your memory usage beats 73.53 % of typescript submissions (42.1 MB)

Record the letters that appear in the pattern and deduplicate them while keeping their order of first appearance. Then walk through the words of the string: if a word has been seen before, reuse the letter already assigned to it; if not, advance to the next unused pattern letter and record the assignment in a map. Finally, the letters collected along the way form the pattern that represents the string, and the answer is whether it equals the original pattern.
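
As a quick sanity check, the function above can be run against the three examples from the problem statement:

// Expected outputs follow the examples given above
console.log(wordPattern('abba', 'dog cat cat dog'))   // true
console.log(wordPattern('abba', 'dog cat cat fish'))  // false
console.log(wordPattern('aaaa', 'dog cat cat dog'))   // false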

R:How to Match LLM Patterns to Problems#

The author previously wrote an article on patterns for building LLM systems and applications, and afterwards received questions about how to match specific problems to those patterns. This follow-up article explores the problems people are likely to encounter when applying them.

External vs Internal Models, Strong vs Weak Data Dependencies#

External models are models that we cannot fully control. We cannot fine-tune them, and they are constrained by call rate limits and context length. We may also be reluctant to send confidential or proprietary data to them. Nevertheless, external models currently offer leading performance.

Internal models are models that we develop and deploy ourselves. They do not have the limitations of external models and are generally trained using open-source models. However, the performance of these models often lags behind the commercial models of third-party companies by several months or even years.

To decide which patterns to apply to an LLM application, we need to understand the role data plays in the scenario: is data a primary component, a byproduct, or an irrelevant factor?

For example, model evaluation and fine-tuning are strongly dependent on data. Caching, "defensive measures to ensure user experience," and "guardrail patterns" to ensure output quality are more related to infrastructure.

RAG (Retrieval-Augmented Generation) and collecting user feedback sit in the middle. RAG fills prompts with retrieved context for in-context learning, but it also requires a retrieval index service. Fine-tuning on user feedback data requires designing user interfaces as well as data analysis and the data pipelines it depends on.
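
As a rough illustration of the prompt-filling half of RAG, here is a minimal TypeScript sketch. The retrieve function and the document shape are hypothetical stand-ins for whatever retrieval index service is actually used:

interface RetrievedDoc {
  id: string
  text: string
}

// Hypothetical retrieval call: a real system would query a vector or keyword
// index service here; this stub only shows the data flow.
async function retrieve(query: string, topK: number): Promise<RetrievedDoc[]> {
  console.log(`retrieving top ${topK} documents for: ${query}`)
  return []
}

// Assemble a RAG prompt: retrieved passages are injected as in-context material
// ahead of the user's question.
async function buildRagPrompt(query: string): Promise<string> {
  const docs = await retrieve(query, 3)
  const context = docs.map((d, i) => `[${i + 1}] ${d.text}`).join('\n')
  return `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}\nAnswer:`
}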

Matching Patterns to Problems#

Let's take a look at which patterns to apply to specific problems:

  • Lack of performance measurement for specific tasks: Whether the model is external or internal, when we change prompts, fine-tune models, or improve the RAG process, we need a way to measure how much improvement has been achieved and to run regression tests. We also need to measure whether users like or dislike new model features and how our adjustments affect them. For these problems, we can use the "evaluation" and "collect user feedback" patterns.
  • Poor performance of external models: This may be due to outdated model training data, lack of proprietary data for the model, or insufficient context during generation. For these problems, RAG and evaluation can be used. Evaluation is used to measure the performance improvement achieved after retrieval.
  • Poor performance of internal models: The model may generate non-factual responses, off-topic responses, or responses that are not fluent enough in tasks such as extraction and summarization. In this case, fine-tuning and fine-tuning with user feedback can be considered.
  • Limited by external models: This may be due to technical constraints such as API rate limits or token length, the inability to send confidential data out, or the cost of API calls. In this case, you can contact the LLM provider about local deployment, or run fine-tuning, fine-tuning with user feedback, and evaluation on your own.
  • Latency exceeds user experience requirements: Some use cases may require the model to return within a few hundred milliseconds, including the time spent on output quality checks. Although streaming output can improve the perceived experience, it is not suitable for every scenario, such as non-chatbot use cases. In this case, caching and guardrail patterns can be used (see the caching sketch after this list).
  • Ensuring customer experience: An LLM will not always produce the accurate output people want. It is therefore necessary to build user experience safeguards around errors, such as setting correct expectations and letting users easily dismiss or correct bad outputs. We also need to recognize when errors occur and mitigate their impact through a fault-tolerant flow. This calls for defensive UX and fine-tuning with user feedback to understand the problems and fine-tune them away.
  • Lack of visibility into the impact on users: After we deploy an LLM application, its real-world effect may improve or deteriorate. To know which, we need to monitor the application and collect user feedback.
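
To make the latency item above more concrete, here is a minimal sketch of the caching pattern in TypeScript, assuming responses are keyed by the exact prompt text. The names LlmCache and callModel are hypothetical, and the cache is just an in-memory map with a TTL; a production system would more likely hash the prompt and use a shared store such as Redis:

// Minimal in-memory cache for LLM responses, keyed by the exact prompt text.
class LlmCache {
  private store = new Map<string, { value: string; expiresAt: number }>()

  constructor(private ttlMs: number) {}

  get(prompt: string): string | undefined {
    const entry = this.store.get(prompt)
    if (!entry || entry.expiresAt < Date.now()) {
      return undefined
    }
    return entry.value
  }

  set(prompt: string, value: string): void {
    this.store.set(prompt, { value, expiresAt: Date.now() + this.ttlMs })
  }
}

// Hypothetical model call, standing in for whichever external or internal model is used.
async function callModel(prompt: string): Promise<string> {
  return `response for: ${prompt}`
}

// Serve repeated prompts from the cache so latency stays within budget.
async function cachedCompletion(cache: LlmCache, prompt: string): Promise<string> {
  const hit = cache.get(prompt)
  if (hit !== undefined) {
    return hit
  }
  const fresh = await callModel(prompt)
  cache.set(prompt, fresh)
  return fresh
}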

T:pyannote-audio Speech Annotation#

A speaker diarization model that can distinguish the different speakers in an audio recording and annotate the time intervals in which each speaker talks. Based on this, the audio can be segmented and each segment fed to the whisper model for speech-to-text, turning a recording of a multi-speaker conversation into a transcript.

S:A Technique for Speed Reading#

When reading, try to avoid backtracking; this forces our thinking to keep pace with the article and move along with it. It may feel uncomfortable at first, but for material that does not demand a high level of comprehension, the parts that were not fully grasped can usually be understood from the surrounding context.


Reference:

  1. ARTS Challenge