Knowledge Base


Through code repositories, you can quickly create enterprise or personal knowledge bases. By uploading documents to code repositories and configuring knowledge base-related pipelines, documents can be automatically processed by large language models and uploaded to the knowledge base for use in page Q&A and Open API scenarios. This can be used to quickly build RAG (Retrieval-Augmented Generation) applications.

Knowledge Preparation

Understanding the RAG Application Building Process

The diagram below shows how to build a RAG application in 2 steps using CNB's knowledge base plugin.

[Figure: Build a RAG application in 2 steps with the CNB knowledge base plugin]

1. Use the Knowledge Base Plugin to Import Repository Documents to the Knowledge Base

Use the CNB knowledge base plugin to import repository documents into CNB's knowledge base. The plugin runs in cloud-native builds and automatically handles document chunking, tokenization, vectorization, and other operations. Once the knowledge base is built, it can be used by downstream LLM applications.

2. Call CNB Open API for Retrieval and Develop LLM Applications

After the knowledge base is built, use CNB's Open API for retrieval and combine the retrieved content with an LLM to generate answers.

The typical RAG application workflow is as follows:

1. The user asks a question.

2. After interpreting the user's question, query the knowledge base through CNB's Open API to retrieve the relevant document segments.

3. Combine the question and the retrieved knowledge context into a single prompt. A typical prompt looks like this:

User Question: {user question}

Knowledge Base:
{content retrieved from knowledge base}

Please answer the user's question based on the above knowledge base.

4. Send the prompt to the LLM to generate an answer, and return the answer to the user.
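
Step 3 of the workflow above can be sketched as a small helper. This is only an illustration: the function name `buildPrompt` is hypothetical, and it assumes the retrieved segments have already been reduced to an array of strings.

```javascript
// Hypothetical helper (not part of CNB): fills the prompt template shown above.
// `chunks` is assumed to be an array of text segments returned by retrieval.
function buildPrompt(question, chunks) {
  // Separate segments with a blank line so the model can tell them apart.
  const knowledge = chunks.join('\n\n');
  return [
    `User Question: ${question}`,
    '',
    'Knowledge Base:',
    knowledge,
    '',
    "Please answer the user's question based on the above knowledge base.",
  ].join('\n');
}
```

The returned string is what step 4 would send to the LLM as the user message.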

Specific Usage Methods

Step 1: Configure Pipeline to Use Knowledge Base Plugin

Plugin image name: cnbcool/knowledge-base

Configure the pipeline in the repository's .cnb.yml to use the knowledge base plugin. As configured below, when code is committed to the repository's main branch, it will trigger the pipeline and automatically use the knowledge base plugin to perform chunking, tokenization, vectorization, and other processing on Markdown files, then upload the processed content to CNB's knowledge base.

main:
  push:
    - stages:
        - name: build knowledge base
          image: cnbcool/knowledge-base
          settings:
            include: "**/**.md"

Some plugin parameter descriptions are as follows. For more parameters, please refer to the cnbcool/knowledge-base plugin documentation.

Parameter	Description	Default Value	Required	Notes
`include`	Specify files to include	Empty	Yes	Uses glob pattern matching, includes all files by default. Supports multiple patterns separated by commas, like `*.md,*.mdx,*.docx,*.txt,*.pdf`
`exclude`	Specify files to exclude	Empty	No	Uses glob pattern matching, excludes no files by default. Supports multiple patterns separated by commas
`embedding_model`	Embedding model	hunyuan	No	Currently only supports `hunyuan`
`chunk_size`	Specify text chunk size	1500	No
`chunk_overlap`	Specify the number of overlapping tokens between adjacent chunks	0	No
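
To make the relationship between `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch. It is not the plugin's implementation: it counts characters rather than tokens, the function name `chunkText` is hypothetical, and the plugin's actual splitting rules are not described in this document.

```javascript
// Illustrative only: a character-based sliding window.
// The real plugin may split on tokens, sentences, or headings instead.
function chunkText(text, chunkSize = 1500, chunkOverlap = 0) {
  if (chunkSize <= 0) {
    throw new RangeError('chunkSize must be positive');
  }
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError('chunkOverlap must be >= 0 and smaller than chunkSize');
  }
  // Each new chunk starts (chunkSize - chunkOverlap) units after the previous one.
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With an overlap of 0 (the default), adjacent chunks share no content; a larger overlap repeats the tail of each chunk at the head of the next, which can help when an answer straddles a chunk boundary.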

Step 2: Using the Knowledge Base

After the knowledge base is built, you can query the repository's knowledge base through the Open API. The retrieved content can then be combined with an LLM to generate answers.

Tips

Before starting, please read: CNB Open API Usage Tutorial.

The access token requires the permission repo-code:r (read repository code).

Interface Information

URL: https://api.cnb.cool/{slug}/-/knowledge/base/query
Method: POST
Content Type: application/json

Request Parameters

The request body should be in JSON format, containing the following fields:

Parameter	Type	Required	Description
query	string	Yes	Keywords or questions to query
top_k	number	No	Maximum number of results to return, default is 5

Request Example

{
  "query": "configure custom buttons in cloud-native development"
}
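
A request that also sets `top_k` could be assembled as in the sketch below. The helper name `buildQueryRequest` is hypothetical, and treating `top_k` as a positive integer is an assumption (the table above only says `number`). The URL, method, and headers follow the interface information in this section.

```javascript
// Hypothetical helper: builds the URL and fetch options for a knowledge base query.
// Assumption: top_k must be a positive integer; the docs only state it is a number.
function buildQueryRequest(slug, token, query, topK) {
  if (typeof query !== 'string' || query.trim() === '') {
    throw new TypeError('query must be a non-empty string');
  }
  const body = { query };
  if (topK !== undefined) {
    if (!Number.isInteger(topK) || topK < 1) {
      throw new RangeError('topK must be a positive integer');
    }
    body.top_k = topK; // omitted when undefined, so the server default (5) applies
  }
  return {
    url: `https://api.cnb.cool/${slug}/-/knowledge/base/query`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(body),
    },
  };
}
```

The result can be passed straight to `fetch(url, options)`.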

Response Content

The response is in JSON format, containing an array of results. Each result includes the following fields:

Field Name	Type	Description
score	number	Relevance score ranging from 0-1, higher values indicate better matches
chunk	string	Matched knowledge base content text
metadata	object	Content metadata

metadata Field Details

Field Name	Type	Description
hash	string	Unique hash value of the content
name	string	Document name
path	string	Document path
position	number	Position of content in the original document
score	number	Relevance score, higher values indicate better matches

Response Example

[
  {
    "score": 0.8671732,
    "chunk": "This cloud-native remote development solution is based on Docker...",
    "metadata": {
      "hash": "15f7a1fc4420cbe9d81a946c9fc88814",
      "name": "quick-start",
      "path": "vscode/quick-start.md",
      "position": 0,
      "score": 0.8671732
    }
  }
]
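
Before passing results to an LLM, it can be useful to drop weak matches and keep track of where each segment came from. The sketch below assumes the response shape shown above; the helper name `selectChunks` and the 0.5 threshold are hypothetical choices, not values recommended by CNB.

```javascript
// Hypothetical helper: keeps results at or above a score threshold,
// sorts them best-first, and pairs each chunk with its source path.
function selectChunks(results, minScore = 0.5) {
  return results
    .filter((item) => typeof item.score === 'number' && item.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .map((item) => ({
      text: item.chunk,
      // metadata.path is documented above; fall back if it is missing.
      source: item.metadata?.path ?? 'unknown',
      score: item.score,
    }));
}
```

Keeping `source` alongside the text makes it possible to cite the originating document in the generated answer.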

Usage Examples

cURL Request Example

Note: {slug} should be replaced with the repository slug. For example, if the CNB official documentation knowledge base repository address is https://cnb.cool/cnb/docs, then {slug} would be cnb/docs. After replacement, the complete request URL would be: https://api.cnb.cool/cnb/docs/-/knowledge/base/query

curl -X "POST" "https://api.cnb.cool/{slug}/-/knowledge/base/query" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${token}" \
  -d '{
    "query": "configure custom buttons in cloud-native development"
}'
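
The slug substitution described in the note above can also be done programmatically. This sketch assumes the slug is simply the path portion of the repository address, which matches the single `cnb/docs` example given here but is not otherwise confirmed; the function names are hypothetical.

```javascript
// Hypothetical helpers. Assumption: the slug equals the repository URL's path
// with leading and trailing slashes removed (true for https://cnb.cool/cnb/docs).
function slugFromRepoUrl(repoUrl) {
  const { pathname } = new URL(repoUrl);
  const slug = pathname.replace(/^\/+|\/+$/g, '');
  if (!slug) {
    throw new Error('repository URL has no path to use as a slug');
  }
  return slug;
}

// Builds the full query endpoint for a repository address.
function queryUrlForRepo(repoUrl) {
  return `https://api.cnb.cool/${slugFromRepoUrl(repoUrl)}/-/knowledge/base/query`;
}
```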

The content in the response can then be combined with an LLM to generate answers.

RAG Mini Application Example

The following is a simple RAG application implemented in JavaScript:

import OpenAI from 'openai';

// Configuration
const CNB_TOKEN = 'your-cnb-token'; // Replace with your CNB access token, requires permission: `repo-code:r` (read repository code)
const OPENAI_API_KEY = 'your-openai-api-key'; // Replace with your OpenAI API key
const OPENAI_BASE_URL = 'https://api.openai.com/v1'; // Or your proxy address
const REPO_SLUG = 'cnb/docs'; // Replace with your repository slug

// Initialize OpenAI client
const openai = new OpenAI({ 
  apiKey: OPENAI_API_KEY,
  baseURL: OPENAI_BASE_URL
});

async function simpleRAG(question) {
  // 1. Call CNB knowledge base retrieval
  const response = await fetch(`https://api.cnb.cool/${REPO_SLUG}/-/knowledge/base/query`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${CNB_TOKEN}`
    },
    body: JSON.stringify({ query: question })
  });
  
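  // Fail early on HTTP errors (for example, an invalid token) instead of parsing an error body
  if (!response.ok) {
    throw new Error(`Knowledge base query failed: HTTP ${response.status}`);
  }
  const knowledgeResults = await response.json();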
  
  // 2. Extract knowledge content (assuming we take all results here)
  const knowledge = knowledgeResults
    .map(item => item.chunk)
    .join('\n\n');
  
  // 3. Call OpenAI to generate answer
  const completion = await openai.chat.completions.create({
    model: "gpt-4.1-2025-04-14",
    messages: [
      {
        role: "user",
        content: `Question: ${question}\n\nKnowledge Base: ${knowledge}\n\nPlease answer the question based on the knowledge base.`,
      },
    ],
  });

  return completion.choices[0].message.content;
}

// Usage example
const answer = await simpleRAG("How to develop a plugin?");
// Output answer combined with knowledge base
console.log(answer);