James Tsang

ARTS Check-in Day 4

I checked in over twenty minutes late today. I was working overtime and then chatted with my younger brother for a while. I had picked an article about using LLMs for TDD, but it was too long, so in order to check in on the same day I switched to another article. I still ended up late, and the article I switched to wasn't of high quality, which felt like a loss. In any case, I'll use it to complete today's check-in, and the LLM-for-TDD article will be the content of the next one.

A:263. Ugly Number#

An ugly number is a positive integer whose prime factors only include 2, 3, and 5.
Given an integer n, please determine if n is an ugly number. If so, return true; otherwise, return false.
Example 1:
Input: n = 6
Output: true
Explanation: 6 = 2 × 3
Example 2:
Input: n = 1
Output: true
Explanation: 1 has no prime factors, so its set of prime factors is the empty set, which is a subset of {2, 3, 5}. It is conventionally considered the first ugly number.
Example 3:
Input: n = 14
Output: false
Explanation: 14 is not an ugly number because it includes another prime factor 7.

This is also a relatively simple problem. I couldn't think of a more mathematical approach at first, and I initially overlooked that 0 is not an ugly number. After fixing that, I submitted:

function isUgly(n: number): boolean {
  // 0 is not an ugly number
  if (n === 0) {
    return false
  }
  // 1 is conventionally the first ugly number
  if (n === 1) {
    return true
  }
  // Divide out the allowed prime factors 2, 3, and 5 recursively
  if (n % 2 === 0) {
    return isUgly(n / 2)
  }
  if (n % 3 === 0) {
    return isUgly(n / 3)
  }
  if (n % 5 === 0) {
    return isUgly(n / 5)
  }
  // Some other prime factor remains, so n is not ugly
  return false
}

The submission result was:

1013/1013 cases passed (56 ms)
Your runtime beats 100 % of typescript submissions
Your memory usage beats 38.09 % of typescript submissions (44 MB)

It's worth noting that I implemented this using recursion, but it could also be done with a while (true) loop.
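For illustration, a loop-based version of the same idea could look like the following sketch (written here in Python rather than TypeScript):

def is_ugly(n: int) -> bool:
    if n <= 0:
        return False
    # Keep dividing out the allowed prime factors while they divide n
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    # If only 2s, 3s, and 5s were present, nothing but 1 remains
    return n == 1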

R:LangChain + Streamlit + Llama: Bringing Conversational AI to Your Local Machine#

Since everyone should be quite familiar with large language models and LangChain by now, I'll only describe them briefly here.

Large language models have garnered significant attention, and many developers are using them to build chatbots, personal assistants, and content-generation tools. The possibilities of large language models have sparked great enthusiasm in the developer, AI, and NLP communities.

Domain-specific data can be injected into a large language model so that it can answer queries over that data efficiently, which is particularly useful for internal company document knowledge bases. The architecture used to achieve this is called "retrieval-augmented generation" or "generative question answering."

What is LangChain? LangChain is a development framework that conveniently links the components of large language AI applications together, allowing developers to quickly implement applications like chatbots.

This article mainly discusses how to create a document assistant using LangChain and the LLaMA 7B model (I personally feel it's a bit outdated, as LLaMA2 is already available).

Article structure:

  1. Create a virtual environment and file structure
  2. Pull the large language model locally
  3. Integrate the large language model into LangChain and customize the Prompt template
  4. Document retrieval and answer generation
  5. Create an application using Streamlit

1. Create a virtual environment and file structure#

Create a basic file structure and Python virtual environment, mainly for model files, Notebook files, and the app.py entry file. You can clone the author's repository: DocQA.

2. Pull the large language model locally#

LLaMA is a large language model released by Meta, and LLaMA2 can be used commercially for free. This article uses LLaMA1. Go to HuggingFace to find the LLaMA model and download the bin file into the models directory.

GGML is an open-source machine learning tensor library written in C/C++, which can run LLMs on consumer-grade hardware through quantization.

So what is quantization? The weights of LLMs are floating-point numbers, which take up more space and computational power compared to integer values. Quantization reduces the precision of the weights to decrease resource usage. GGML supports 4-bit, 5-bit, and 8-bit quantization.

You need to weigh memory, disk space, and model performance when choosing model parameter sizes and quantization methods. The larger the size and the higher the quantization precision, the better the performance, but it also consumes more resources.
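As a rough back-of-envelope estimate (my own numbers, not from the article): a 7B-parameter model stored as 16-bit floats needs about 7B × 2 bytes ≈ 14 GB, while a 4-bit quantized version needs roughly 7B × 0.5 bytes ≈ 3.5 GB plus some overhead, which is why it fits on ordinary consumer hardware.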

If GGML is a C/C++ library, how can it be used in Python? This is where the llama-cpp-python project comes in. It's a Python binding for llama.cpp, allowing us to run the LLaMA model from Python.

After all this introduction, running it is actually very simple, just a few lines of Python code:

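The original article shows those lines as a screenshot. A minimal sketch of what they look like with llama-cpp-python (the prompt is made up; the model filename matches the one used later in the app) would be:

from llama_cpp import Llama

# Load the quantized GGML model that was downloaded into ./models
llm = Llama(model_path="./models/llama-7b.ggmlv3.q4_0.bin")

# Run a single completion against the local model
output = llm("Q: What is quantization? A:", max_tokens=128, stop=["Q:"])
print(output["choices"][0]["text"])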

3. Integrate the large language model into LangChain and customize the Prompt template#

For an LLM, its operation can be simplified to: text in, text out. Therefore, most of the work in LangChain is also centered on text.

Subtle differences in prompts can lead to significant variations in LLM performance, which is why the concept of Prompt Engineering has emerged to consider how to generate higher-quality prompts. To facilitate seamless interaction with LLMs, LangChain provides functionality for developing prompt templates, which typically consist of two parts: text templates and dynamic parameters.

For simple applications, just passing the prompt and input parameters to the LLM and generating a result is sufficient, but complex applications generally require connecting the LLM with other components. LangChain provides chains as the way to connect components in series.
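Putting the two ideas together, a minimal sketch of a prompt template plus a chain might look like this (the template text and topic are made up; the classes are the same ones used in the full app below):

from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# The static text of the template; {topic} is the dynamic parameter
template = "You are a helpful assistant. Explain {topic} in one short paragraph."
prompt = PromptTemplate(template=template, input_variables=["topic"])

# Chain the prompt and the local model together
llm = LlamaCpp(model_path="./models/llama-7b.ggmlv3.q4_0.bin")
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="quantization"))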

4. Document retrieval and answer generation#

In many LLM applications, the data users need is not in the model's training dataset and needs to be included in the prompt. LangChain provides the necessary components to load, transform, store, and query this data:

(Diagram from the original article: the five data-connection stages.)

These five processes are: document loading - document transformation - embedding - vector storage - vector retrieval. Below is the complete process for document retrieval:

(Flowchart from the original article: the complete document-retrieval and answer-generation pipeline.)

This process is quite lengthy, so I won't elaborate here. It's worth noting that since this is a locally deployed solution, the embedding model is not using a remote service but instead uses the LlamaCppEmbeddings component from LangChain to perform embedding with the LLaMA model.
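For instance, the embedding step on its own might look like the following sketch (the query string is made up; the class and model path come from the article's code):

from langchain.embeddings import LlamaCppEmbeddings

# Embeddings are computed locally by the same GGML model, with no remote embedding API involved
embeddings = LlamaCppEmbeddings(model_path="./models/llama-7b.ggmlv3.q4_0.bin")
vector = embeddings.embed_query("What does the document say about quantization?")
print(len(vector))  # dimensionality of the embedding vector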

5. Create an application using Streamlit#

The author did not elaborate on Streamlit because it is a relatively optional step for the main workflow. However, when implementing file upload with Streamlit, the author emphasized that, to avoid holding everything in memory, the uploaded file is written to a temporary directory as a plain-text file. Currently only txt files are supported, but the code can be modified to support PDF and CSV files. Finally, with the Streamlit library, this LangChain-based LLM application was turned into a web application:

# Bring in deps
import os
import streamlit as st
from langchain.llms import LlamaCpp
from langchain.embeddings import LlamaCppEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma


# Customize the layout
st.set_page_config(page_title="DOCAI", page_icon="🤖", layout="wide", )     
st.markdown(f"""
            <style>
            .stApp {{background-image: url("https://images.unsplash.com/photo-1509537257950-20f875b03669?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1469&q=80"); 
                     background-attachment: fixed;
                     background-size: cover}}
         </style>
         """, unsafe_allow_html=True)

# function for writing uploaded file in temp
def write_text_file(content, file_path):
    try:
        with open(file_path, 'w') as file:
            file.write(content)
        return True
    except Exception as e:
        print(f"Error occurred while writing the file: {e}")
        return False

# set prompt template
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
Answer:"""
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

# initialize the LLM & embeddings
llm = LlamaCpp(model_path="./models/llama-7b.ggmlv3.q4_0.bin")
embeddings = LlamaCppEmbeddings(model_path="models/llama-7b.ggmlv3.q4_0.bin")
llm_chain = LLMChain(llm=llm, prompt=prompt)

st.title("📄 Document Conversation 🤖")
uploaded_file = st.file_uploader("Upload an article", type="txt")

if uploaded_file is not None:
    content = uploaded_file.read().decode('utf-8')
    # st.write(content)
    file_path = "temp/file.txt"
    os.makedirs("temp", exist_ok=True)  # make sure the temp directory exists
    write_text_file(content, file_path)
    
    loader = TextLoader(file_path)
    docs = loader.load()    
    text_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=0)
    texts = text_splitter.split_documents(docs)
    db = Chroma.from_documents(texts, embeddings)    
    st.success("File Loaded Successfully!!")
    
    # Query through LLM    
    question = st.text_input("Ask something from the file", placeholder="Find something similar to: ....this.... in the text?", disabled=not uploaded_file,)    
    if question:
        similar_doc = db.similarity_search(question, k=1)
        context = similar_doc[0].page_content
        query_llm = LLMChain(llm=llm, prompt=prompt)
        response = query_llm.run({"context": context, "question": question})        
        st.write(response)

(Screenshot from the original article: the finished document-QA web app.)

I personally pay more attention to the Streamlit side. I feel that, beyond Gradio for quick prototyping, Streamlit has a good chance in more complex application development. Unfortunately, this article did not delve into it, and the content I most wanted to see was glossed over.

T:CoDeF#

A video-to-video generation technique whose output is temporally stable and of good quality.


S:SQ3R Reading Method#

SQ3R stands for five words: Survey, Question, Read, Recite, Review. Before studying, first skim through the content; then, based on that overview, pose your own questions about what the material covers and what problems it solves. Next, read in depth with those questions in mind and find the answers through the reading. After that, close the book and recite what it is about, what your questions were, and how the book addressed them. Finally, review to consolidate what you have learned. After these five steps, the content of the book can be truly absorbed.


Reference:

  1. ARTS Check-in Activity