1. Introduction

Purpose of the document

Overview of the ChattyAI system

2. Token consumption cost in a GPT system

In the context of the GPT (Generative Pretrained Transformer) model, a "token" refers to the smallest unit of text that the system can process.

When processing language, GPT breaks down text into chunks called tokens. These tokens can be as small as one character or as large as one word. For example, in English, the sentence "ChatGPT is great!" might be broken down into ["ChatGPT", "is", "great", "!"]. Each of these is a token.

The system then uses these tokens to understand the context and generate responses. Tokens are transformed and processed through the model's layers, with each layer learning different levels of language abstraction (e.g., syntax, semantics).

It's essential to note that for languages like Chinese, Korean, and Japanese, tokenization can be quite complex, since words are not separated by spaces as they are in English. For this reason, more sophisticated subword tokenization techniques, such as SentencePiece or Byte-Pair Encoding (BPE), are used.
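To make the BPE idea concrete, here is a minimal sketch of one core step of the algorithm: repeatedly find the most frequent adjacent pair of tokens and merge it into a single token. This is an illustrative toy (the helper names and the sample string are our own, not part of any production tokenizer), but it shows how frequent substrings like "low" become single tokens.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Start from individual characters and apply a few BPE merge steps.
tokens = list("low lower lowest")
for _ in range(3):
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)
```

After a few merges, the frequent substring "low" has been fused into a single token, which is exactly how real BPE vocabularies end up with whole words (or word pieces) as tokens.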
model                     input price          output price
gpt-3.5-turbo-16k-0613    $0.003 / 1K tokens   $0.004 / 1K tokens
gpt-4                     $0.03 / 1K tokens    $0.06 / 1K tokens
Whisper-1                 $0.006 / minute      n/a (billed per minute)
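A small sketch of how a chat request's cost follows from the table above: price the prompt (input) and completion (output) tokens separately, each per 1K tokens. The helper name `chat_cost` is our own; the prices are the ones listed in the table.

```python
# USD per 1K tokens, taken from the pricing table above.
# (Whisper-1 is billed per audio minute, so it is not modeled here.)
PRICES = {
    "gpt-3.5-turbo-16k-0613": {"input": 0.003, "output": 0.004},
    "gpt-4": {"input": 0.03, "output": 0.06},
}

def chat_cost(model, input_tokens, output_tokens):
    """Cost in USD of one chat request for the given token counts."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# e.g. a gpt-4 call with a 500-token prompt and a 200-token completion
print(f"${chat_cost('gpt-4', 500, 200):.4f}")  # $0.0270
```

Note that gpt-4 is ten times more expensive per token than gpt-3.5-turbo-16k, which is why the model split in the message counts below matters for the bill.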
// messages per visitor
10875 / 2.3k = 4.7283

// cost per visitor
$150 / 2.3k = $0.06522

// real chat requests received: 5408
gpt-3.5 assistant messages = 1926
gpt-3.5 user messages = 1920

gpt-4 assistant messages = 3507
gpt-4 user messages = 3485

// cost per message
$150 / 5408 = $0.02774
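The per-visitor and per-message figures above can be reproduced directly; this snippet just restates the arithmetic with the numbers from this section (variable names are ours).

```python
# Figures taken from this section.
total_messages = 10875
visitors = 2300            # "2.3k"
monthly_cost_usd = 150
chat_requests = 5408       # real chat requests received

messages_per_visitor = total_messages / visitors
cost_per_visitor = monthly_cost_usd / visitors
cost_per_message = monthly_cost_usd / chat_requests

print(round(messages_per_visitor, 4))   # 4.7283
print(round(cost_per_visitor, 5))       # 0.06522
print(round(cost_per_message, 5))       # 0.02774
```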
