A:268. Missing Number#
Given an array nums containing n distinct numbers in the range [0, n], return the only number in that range that is missing from the array.
Example 1:
Input: nums = [3,0,1]
Output: 2
Explanation: n = 3, since there are 3 numbers, all numbers are in the range [0,3]. 2 is the missing number because it does not appear in nums.
Example 2:
Input: nums = [0,1]
Output: 2
Explanation: n = 2, since there are 2 numbers, all numbers are in the range [0,2]. 2 is the missing number because it does not appear in nums.
Example 3:
Input: nums = [9,6,4,2,3,5,7,0,1]
Output: 8
Explanation: n = 9, since there are 9 numbers, all numbers are in the range [0,9]. 8 is the missing number because it does not appear in nums.
Example 4:
Input: nums = [0]
Output: 1
Explanation: n = 1, since there is 1 number, all numbers are in the range [0,1]. 1 is the missing number because it does not appear in nums.
There is not much time left for the check-in, and the writing is a bit rushed.
function missingNumber(nums: number[]): number {
  const maxLength = nums.length
  // Build the full expected range [0, n]
  const allNums: number[] = new Array(maxLength).fill(0).map((_, index) => index)
  allNums.push(maxLength)
  // Keep only the values that do not appear in nums (O(n^2) overall, since includes is O(n))
  const missNums = allNums.filter(num => !nums.includes(num))
  if (missNums.length === 1) {
    return missNums[0]
  }
  throw new Error('no result')
}
The submission result is:
122/122 cases passed (700 ms)
Your runtime beats 5.49 % of typescript submissions
Your memory usage beats 21.98 % of typescript submissions (45.8 MB)
The official solution uses a hash set instead; since both adding and looking up elements in a hash set take O(1) time, it is more efficient:
function missingNumber(nums: number[]): number {
  // Record every element of nums in a hash set
  const set = new Set<number>()
  const n: number = nums.length
  for (let i = 0; i < n; i++) {
    set.add(nums[i])
  }
  // Scan 0..n and return the first value that is not in the set
  let missing: number = -1
  for (let i = 0; i <= n; i++) {
    if (!set.has(i)) {
      missing = i
      break
    }
  }
  return missing
}
First, each element is recorded in the Set; then the values 0 through n are checked against the Set in order, which directly yields the missing number. The benefit of using a hash set is evident.
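As a quick sanity check, either implementation can be run against the examples from the problem statement (a minimal sketch, assuming it is executed with ts-node or a similar runner):

```typescript
// Expected outputs match the examples above
console.log(missingNumber([3, 0, 1]))                   // 2
console.log(missingNumber([0, 1]))                      // 2
console.log(missingNumber([9, 6, 4, 2, 3, 5, 7, 0, 1])) // 8
console.log(missingNumber([0]))                         // 1
```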
R:Exploring Generative AI#
Due to the recent popularity of large language models (LLMs), the author became curious about them and wanted to know what impact they would have on work and how to use LLMs in software delivery practices. This is a record of her explorations and findings.
Toolchain#
For a technology that is still evolving, it is necessary to establish a mental model to understand how it works, which helps in processing the overwhelming amount of information. What types of problems does it solve? What parts need to be pieced together to solve the problem? How do they combine?
In the mental model, tools that support programming work are categorized along the following dimensions (a rough sketch in code follows the list):
- Task type: quickly finding information in context; generating code; reasoning about code (explaining code or identifying issues); transforming code (e.g., into documentation or diagrams)
- Interaction mode: chat window; inline assistant (like GitHub Copilot); command line
- Prompt composition: composed entirely from user input; composed from user input plus context
- Model attributes: whether the model is tuned for code generation; which languages it can generate code for; when it was trained and how recent its information is; its parameter count; its context length limit; how its content filtering works and who operates it
- Hosting method: a commercial product hosted by a company; an open-source tool connected to an LLM API service; a self-built tool connected to an LLM API service; a self-built tool using a fine-tuned, self-hosted model
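To make the mental model concrete, the dimensions can be sketched as TypeScript types. This is only an illustration of the author's categorization restated in code; the names are not any tool's actual API:

```typescript
// Rough types capturing the dimensions of the mental model described above
type TaskType = 'find-information-in-context' | 'generate-code' | 'reason-about-code' | 'transform-code'
type InteractionMode = 'chat-window' | 'inline-assistant' | 'command-line'
type PromptComposition = 'user-input-only' | 'user-input-plus-context'

interface ModelAttributes {
  tunedForCode: boolean       // is the model trained/tuned for code generation tasks?
  languages: string[]         // which languages it can generate code for
  trainingCutoff: string      // when it was trained / how recent its information is
  parameterCount?: string     // parameter count, if known
  contextWindowTokens: number // context length limit
  contentFiltering: string    // how filtering works and who operates it
}

type Hosting =
  | 'commercial-product'
  | 'open-source-tool-plus-hosted-llm-api'
  | 'self-built-tool-plus-hosted-llm-api'
  | 'self-built-tool-plus-fine-tuned-self-hosted-llm'

// A tool is then a combination of values along each dimension
interface AssistanceTool {
  name: string
  tasks: TaskType[]
  interaction: InteractionMode
  prompt: PromptComposition
  model: ModelAttributes
  hosting: Hosting
}
```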
The author analyzed GitHub Copilot, GitHub Copilot Chat, ChatGPT, and other tools along these dimensions.
Median Function - The Story of Three Functions#
This is a story about generating a median function, which illustrates the usefulness and limitations of LLM assistance.
Typically, the author searches for "median JS function" to implement it, but this time she tried using GitHub Copilot for assistance. Copilot first generated the correct function signature and then provided three different implementations.
The first implementation: sort the array, then take the middle element; for an even-length array, take the average of the two middle elements. This achieves the goal, but sort mutates the original array and changes its order, which may introduce hard-to-trace bugs.
The second implementation: copy the array with slice first, then proceed as in the first implementation; the original array is left unchanged, so there is no issue.
The third implementation: copy the array, sort it, and directly return the element at Math.floor(sorted.length / 2), which is wrong when the array length is even because it does not average the two middle elements.
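The exact code Copilot produced is not shown in the source, but the three variants described above can be reconstructed roughly as follows (an illustrative sketch, not the original output):

```typescript
// Variant 1: correct result, but Array.prototype.sort mutates the caller's array
function medianMutating(values: number[]): number {
  const sorted = values.sort((a, b) => a - b) // sorts `values` in place!
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Variant 2: copies the array first, so the input is left untouched
function medianCopy(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

// Variant 3: copies the array but always returns the middle element,
// which is wrong for even-length inputs (no averaging of the two middle values)
function medianBuggy(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b)
  return sorted[Math.floor(sorted.length / 2)]
}
```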
Given how differently these three implementations behave, it is important to understand what our function is supposed to do and to write sufficient test cases for the generated code.
Is using Copilot to generate code different from searching and then copying and pasting? When we search and copy-paste, we know the source of the code, which helps us judge the reliability of the pasted code through votes and comments on platforms like Stack Overflow, but with Copilot-generated code, we lack a basis for judgment.
Should we generate test cases, code, or both? The author used Copilot to generate test cases for this median function, and the results were indeed good. For a task of this complexity, she is willing to let Copilot generate both the tests and the code. For more complex functions, however, she prefers to write the test cases herself, so that she controls their quality and structure and avoids omissions, even if she hands part of the content to Copilot to generate.
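For a function this small, the generated tests might look something like the following (a sketch assuming a Vitest-style test API and the medianCopy variant from the sketch above being in scope):

```typescript
import { describe, it, expect } from 'vitest'

describe('median', () => {
  it('returns the middle element for an odd-length array', () => {
    expect(medianCopy([3, 1, 2])).toBe(2)
  })

  it('averages the two middle elements for an even-length array', () => {
    expect(medianCopy([4, 1, 3, 2])).toBe(2.5)
  })

  it('does not mutate the input array', () => {
    const input = [3, 1, 2]
    medianCopy(input)
    expect(input).toEqual([3, 1, 2])
  })
})
```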
Can Copilot help fix errors in the generated code? When asked to refactor, Copilot did provide some reasonable suggestions, and ChatGPT could even point out the errors directly, but all of this is predicated on already being aware that the code needs improvement and correction.
Conclusion:
- You must clearly know what you are doing to judge how to handle the generated code. Just like in the above example, you must understand what the median is and what boundary cases to consider to obtain reasonable test cases and code.
- Copilot itself can iterate on and improve problematic code, which raises the question of whether we should be in an ongoing dialogue with AI tools as we use them.
- Even if we doubt the quality of the generated test cases and code, we can choose not to adopt the code itself and instead use the generated tests only to cover scenarios we missed or to cross-check the code we wrote ourselves.
When is an inline code assistant more useful?#
For inline code assistants, opinions vary on their usefulness, depending on the specific context and expectations.
What does "useful" mean here? It means that with the assistant, problems are solved faster at comparable quality, counting not only the time spent writing code but also subsequent manual review, rework, and any quality issues.
Factors Affecting the Usefulness of Generated Content#
The following factors are relatively objective, but the author elaborated on each point with subjective insights in the original text; please read the original for more details.
More popular technology stacks
The more popular the technology stack used, the richer the dataset about that technology stack in the model, which means that data for Java and JS is more abundant than for Lua. However, some colleagues have also achieved good testing results with languages like Rust, where data is less abundant.
Simple and common problems
Simple and common problems include, for example: the problem itself is very simple; there is a common solution pattern for it in the context; the code is boilerplate; or it follows a repetitive pattern.
This is helpful for scenarios where repetitive code is often handwritten, but for those familiar with advanced IDE features, shortcuts, and multi-cursor operations, the reduction of repetitive work by Copilot may not be as significant and may even reduce the motivation for refactoring.
Smaller problems
With smaller problems, the generated code is easier to review. As the scale grows, both the problem and the code become harder to understand; solutions often require multiple steps, which raises the risk of insufficient test coverage and of introducing unnecessary content.
More experienced developers
Experienced developers can better judge the quality of generated code and use it efficiently.
Greater error tolerance
Judging the quality and correctness of generated code is important, but so is how much tolerance for error our scenario has. In high-tolerance scenarios we can accept suggestions at a higher rate, but in low-tolerance areas such as security policy, for example Content-Security-Policy HTTP headers, it is still hard to casually adopt Copilot's suggestions.
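As an illustration of such a low-tolerance area (a sketch assuming an Express application; not from the original article), even a small generated change to a header like this deserves careful review, since one overly permissive directive silently weakens the policy:

```typescript
import express from 'express'

const app = express()

// Security-sensitive configuration: a suggestion that adds, say, 'unsafe-inline'
// here would weaken the policy without any test failing, so generated changes
// to this kind of code should not be accepted casually.
app.use((_req, res, next) => {
  res.setHeader(
    'Content-Security-Policy',
    "default-src 'self'; script-src 'self'; object-src 'none'"
  )
  next()
})

app.listen(3000)
```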
Conclusion:
Inline code assistants have their place, but many factors affect how useful they are, and the skill of using them cannot be fully conveyed by a training course or a blog post. Only through extensive use, even pushing past the boundaries of where they are useful, can we learn to use this tool well.
When can an inline code assistant become a hindrance?#
Having discussed when Copilot is useful, it is natural to mention situations where it is not. For example:
Amplifying bad or outdated practices
Since Copilot references any content in the associated context, such as open files in the same language, it may also bring over bad code examples.
For instance, we may want to move a codebase to a new pattern, but Copilot keeps suggesting the old pattern because code using it is still widespread in the codebase. The author calls this situation a "poisoned context," and there is currently no good solution for it.
Conclusion:
Enriching the prompt context with code from the codebase cuts both ways, and this is one of the reasons developers may stop trusting Copilot.
Review fatigue from generated code
Using Copilot means repeatedly reviewing small chunks of generated code. The usual flow of programming is to continuously write out the solution we have in mind; with Copilot, we instead keep reading and reviewing generated code. This is a different cognitive mode, lacks the enjoyment of continuously producing code, and leads to review fatigue and a sense of disrupted flow. If we do not address this fatigue, we may start to overlook the quality of the generated code.
Additionally, there may be some other impacts:
- Automation bias: Once we have a good experience with generative AI, we may overtrust it.
- Sunk cost: once we have spent time getting Copilot to generate some almost-working code, we may be more inclined to spend 20 minutes coaxing that code into working with Copilot rather than rewriting it ourselves in 5 minutes.
- Anchoring effect: The suggestions given by Copilot are likely to anchor our thinking, influencing our subsequent thoughts. Therefore, it is also important to break free from this cognitive influence and not be anchored by it.
Conclusion:
It is crucial not to let Copilot limit our thinking; we need to step outside its constraints, or we may end up like a driver who follows their GPS into a lake.
Code assistants cannot replace pair programming#
Although inline code assistants and chatbots can interact with developers to a large extent like humans, the author does not believe that this practice can replace pair programming.
The belief that programming assistants and chatbots can replace pair programming may stem from a misunderstanding of what pair programming is for. The original article includes a diagram of the benefits of pair programming; programming assistants can have a significant impact on its first area, "1 + 1 > 2": they help us get unstuck and start faster, reach a workable result sooner, and let us focus more on designing the overall solution, while also sharing knowledge.
However, pair programming is not just about sharing explicit knowledge in code; it also involves implicit knowledge such as the evolution history of the codebase, which cannot be obtained from large language models. Additionally, pair programming can improve team workflows, avoid wasted time, and make continuous integration easier. It also helps us develop communication, empathy, and feedback skills. It provides valuable opportunities for teams to connect in remote work.
Conclusion:
Programming assistants can only cover a small portion of the goals and benefits of pair programming, because pair programming is not just about helping individuals; it helps the team improve holistically. Pair programming raises the whole team's level of communication and collaboration, improves workflows, and strengthens collective code ownership. It also does not suffer from the drawbacks of programming assistants described earlier.
Using GitHub Copilot with TDD#
After adopting AI programming assistants, do we still need tests? Is TDD outdated? To answer this question, let's revisit the two benefits that TDD brings to software development: 1. providing fast, good feedback; 2. solving problems with a divide-and-conquer approach.
Providing good feedback
Good feedback needs to be fast and accurate. Neither manual testing, documentation, nor code reviews can provide feedback as quickly as unit tests. Therefore, whether it is manually written code or AI-generated code, quick feedback is needed to verify correctness and quality.
Divide and conquer
Divide and conquer is a quick approach to solving large problems. This also enables the implementation of continuous integration, trunk-based development, and continuous delivery.
Even with AI-generated code, an iterative development model still applies. More importantly, the idea that chain-of-thought (CoT) style prompts, which work through a problem step by step, improve the quality of LLM outputs aligns well with the principles TDD advocates.
Tips for Using GitHub Copilot with TDD#
- Start
Starting from an empty test file does not mean starting from an empty context; there are usually related user story notes and discussions with pair programming partners.
These are all things Copilot "cannot see"; without that context it can only help with surface-level issues such as spelling and syntax. Therefore, we need to provide it with this context (a concrete sketch follows this item):
- Provide mocks
- Write down acceptance criteria
- Design assumptions and guidance: for example, "no GUI is needed," or whether to use object-oriented or functional programming
Additionally, Copilot uses open files as context, so we need to keep both the test file and the implementation file open.
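For example, the top of a test file might spell out this context before any test is written (an illustrative sketch; the domain and wording are assumptions, not taken from the original article):

```typescript
// Acceptance criteria:
// - a customer can add items to a shopping cart
// - the cart total is the sum of the item prices
//
// Design assumptions:
// - no GUI: this is a plain TypeScript module exercised directly by tests
// - functional style: cart operations return new values rather than mutating
//
// Keep this test file and the implementation file (e.g. cart.ts) open so that
// Copilot can draw on both as context.
```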
- Red
Start by writing a descriptive test case name; the more descriptive the name, the better Copilot's generated test code will perform.
The Given-When-Then structure helps us in three ways: first, it reminds us to provide business context; second, it gives Copilot the opportunity to generate expressive case names; finally, it allows us to see Copilot's understanding of the problem.
For example, when naming the test case, if Copilot suggests "Assuming the user... clicks the purchase button," it indicates that Copilot does not fully understand our intent, which can be clarified in the context description at the top of the file, such as "Assuming no GUI is needed," "This is an API test suite for a Python Flask application."
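A descriptive, Given-When-Then style test might look like the following (a sketch using a Vitest-style API and a shopping-cart domain chosen for illustration; minimal in-file stand-ins keep it self-contained):

```typescript
import { describe, it, expect } from 'vitest'

// Minimal stand-ins so the sketch is self-contained
type Item = { name: string; price: number }
const addItem = (cart: Item[], item: Item): Item[] => [...cart, item]
const totalPrice = (cart: Item[]): number => cart.reduce((sum, i) => sum + i.price, 0)

describe('given a shopping cart with one item', () => {
  it('when a second item is added, then the total is the sum of both prices', () => {
    let cart = addItem([], { name: 'book', price: 10 })
    cart = addItem(cart, { name: 'pen', price: 2 })
    expect(totalPrice(cart)).toBe(12)
  })
})
```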
- Green
Now we can start implementing the code. An existing, expressive, and readable test case can maximize Copilot's potential. At this point, Copilot has more input to work with and does not need to "learn to walk" like a baby.
Filling in test cases: at this point, Copilot tends to generate larger chunks of code rather than "taking small steps," and that code may not be fully covered by tests. We can go back and backfill the test cases; this is not the orthodox TDD process, but so far it does not seem to cause significant problems.
Delete and regenerate: for code that needs to be re-implemented, the most effective way to get Copilot to help is to delete the implementation and let it rewrite it; since the test cases are already in place, rewriting is relatively safe. If that fails, deleting the implementation and writing comments step by step (see the sketch below) may help; if it still does not work, we may need to turn Copilot off and write the code ourselves.
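The "write comments step by step" fallback might look like this (an illustrative sketch; the function and steps are assumptions, not an example from the original article):

```typescript
// Steps written out first as comments to guide Copilot's completion:
// 1. multiply each item's price by its quantity
// 2. sum the per-item totals
// 3. an empty cart should total 0
function totalPrice(cart: { price: number; quantity: number }[]): number {
  return cart.reduce((sum, item) => sum + item.price * item.quantity, 0)
}
```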
- Refactor
In TDD, refactoring means making incremental changes to improve the maintainability and scalability of the code while keeping all behaviors consistent.
For this, Copilot may have some limitations. Consider the following two scenarios:
- "I know what kind of refactoring I want to do": Using IDE shortcuts and features, such as multi-cursor, function extraction, renaming, etc., may be faster than refactoring with Copilot.
- "I don't know where to start refactoring": For small, localized refactoring, Copilot can provide suggestions, but it still struggles with large-scale refactoring suggestions.
In some scenarios where we know what we want to do but just can't recall the syntax and API, Copilot can help us well, completing tasks that would otherwise require searching.
Conclusion:
As the saying "garbage in, garbage out" goes, this applies to generative AI just as it does to data engineering. In the author's practice, TDD keeps the codebase at high quality, and that high-quality input lets Copilot perform better, so using TDD alongside Copilot is recommended.
T:Umami Website Visit Statistics Analysis#
Umami is a self-hostable framework for website visit analytics. The xLog blogging platform currently supports configuring Umami, and this site uses a self-deployed instance.
S:Reading “Li Xiaolai: My Reading Experience” Notes#
Reading has two fundamental values:
- the need to recognize reality and think about the future;
- a preference for knowledge that is "reproductive," that is, knowledge that can generate more knowledge.