String

string is the set of all strings of 8-bit bytes, conventionally but not necessarily representing UTF-8-encoded text. A string may be empty, but not nil. Values of string type are immutable.

Strings and UTF-8 encoding

In Go, a string is in effect a read-only slice of bytes. A string holds arbitrary bytes. It's not required to hold Unicode or UTF-8 text. Indexing a string yields its bytes, not its characters: a string is just a bunch of bytes.
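A minimal sketch of the point above, assuming nothing beyond the standard library: a string can hold bytes that are not valid UTF-8, and indexing returns a byte. The helper name describe is made up for this note.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe (hypothetical helper) reports the byte length of s and
// whether its bytes form valid UTF-8.
func describe(s string) (int, bool) {
	return len(s), utf8.ValidString(s)
}

func main() {
	// "\xff\xfe" is a legal Go string, but its bytes are not valid UTF-8.
	n, ok := describe("\xff\xfe")
	fmt.Println(n, ok) // 2 false

	// Indexing yields a byte (uint8), not a character:
	// s[1] is the first of the two bytes that encode é.
	s := "héllo"
	fmt.Printf("%T %x\n", s[1], s[1]) // uint8 c3
}
```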

A string is a sequence of bytes, not runes. However, strings often contain Unicode text encoded in UTF-8, which encodes all Unicode code points using one to four bytes. (ASCII characters are encoded with one byte, while other code points use more.)
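To make the "one to four bytes" claim concrete, here is a small sketch using utf8.RuneLen from the standard library. The wrapper encodedLen and the sample code points are only illustrative choices.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedLen (hypothetical wrapper) returns how many bytes UTF-8
// needs to encode the code point r.
func encodedLen(r rune) int {
	return utf8.RuneLen(r)
}

func main() {
	// One sample code point for each encoded length, 1 to 4 bytes.
	// 0x1F600 is a code point outside the Basic Multilingual Plane.
	for _, r := range []rune{'a', 'é', '我', 0x1F600} {
		fmt.Printf("U+%04X needs %d byte(s)\n", r, encodedLen(r))
	}
}
```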

Since Go source code itself is encoded as UTF-8, string literals will automatically get this encoding. For example, in the string "café" the character é (code point 233) is encoded using two bytes, while the ASCII characters c, a and f (code points 99, 97 and 102) only use one:

fmt.Println([]byte("café")) // [99 97 102 195 169]
fmt.Println([]rune("café")) // [99 97 102 233]

The main difference between a string and a slice of bytes is that a string is immutable: you cannot modify its individual bytes (an assignment such as s[0] = 'x' does not compile). A slice of bytes is mutable, so you can modify its individual elements.
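One way to "change" a string is to copy it into a byte slice, modify the copy, and convert back. The sketch below assumes that approach. replaceFirstByte is a made-up helper name, not a standard function.

```go
package main

import "fmt"

// replaceFirstByte (hypothetical helper) returns a copy of s whose
// first byte is b. The original string is untouched because the
// conversion []byte(s) copies the data.
func replaceFirstByte(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	buf := []byte(s) // mutable copy of the string's bytes
	buf[0] = b
	return string(buf)
}

func main() {
	s := "cat"
	// s[0] = 'b' // would not compile: strings are immutable
	t := replaceFirstByte(s, 'b')
	fmt.Println(s, t) // cat bat
}
```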

len(str) returns the length in bytes, regardless of encoding; the number of characters depends on how those bytes are decoded.

len("你好") // 6
utf8.RuneCountInString("你好") // 2

for range

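Go treats a string as UTF-8 in essentially one language construct: a for range loop over the string, which decodes one UTF-8-encoded rune per iteration. On each iteration, the loop index is the byte offset at which the current rune starts, and the loop value is its code point.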

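s := "我是中国人"
for index, runeValue := range s {
	fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
}

// Output:
U+6211 '我' starts at byte position 0
U+662F '是' starts at byte position 3
U+4E2D '中' starts at byte position 6
U+56FD '国' starts at byte position 9
U+4EBA '人' starts at byte position 12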
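The decoding that for range performs can be approximated by hand with utf8.DecodeRuneInString. This is only a sketch of the idea, and runeStarts is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts (hypothetical helper) returns the byte offset at which
// each rune of s begins, decoding one rune at a time.
func runeStarts(s string) []int {
	var starts []int
	for i := 0; i < len(s); {
		// size is the number of bytes the rune at s[i:] occupies
		// (1 for an invalid byte, so the loop always advances).
		_, size := utf8.DecodeRuneInString(s[i:])
		starts = append(starts, i)
		i += size
	}
	return starts
}

func main() {
	fmt.Println(runeStarts("我是中国人")) // [0 3 6 9 12]
}
```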

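For comparison, iterate over the string with an index loop. Indexing the string s yields the byte value s[i]. The bytes of s, in hexadecimal, are e6 88 91 e6 98 af e4 b8 ad e5 9b bd e4 ba ba, so s[0] is the byte e6. Printed with %c, that byte is treated as the code point U+00E6, which is the Latin character æ, not 我. To read the first character 我, convert the string to a rune slice (runeArray := []rune(s)) and index it: runeArray[0].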

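fmt.Printf("% x\n", s)
for i := 0; i < len(s); i++ {
	fmt.Printf("%c starts at byte position %d\n", s[i], i)
}

// Output (bytes 0x80-0x9f print as control characters, often rendered as �; 0xad is a soft hyphen and is usually invisible):
e6 88 91 e6 98 af e4 b8 ad e5 9b bd e4 ba ba
æ starts at byte position 0
� starts at byte position 1
� starts at byte position 2
æ starts at byte position 3
� starts at byte position 4
¯ starts at byte position 5
ä starts at byte position 6
¸ starts at byte position 7
­ starts at byte position 8
å starts at byte position 9
� starts at byte position 10
½ starts at byte position 11
ä starts at byte position 12
º starts at byte position 13
º starts at byte position 14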
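The rune-slice approach described before the byte loop might look like the sketch below. runeAt is a hypothetical helper; note that []rune(s) allocates and decodes the whole string, so it is best kept for short strings or occasional lookups.

```go
package main

import "fmt"

// runeAt (hypothetical helper) returns the n-th character (rune) of s,
// counting from 0. It reports false if s has fewer than n+1 runes.
func runeAt(s string, n int) (rune, bool) {
	runes := []rune(s) // decodes the UTF-8 bytes into code points
	if n < 0 || n >= len(runes) {
		return 0, false
	}
	return runes[n], true
}

func main() {
	s := "我是中国人"
	r, _ := runeAt(s, 0)
	fmt.Printf("%c %d\n", r, len([]rune(s))) // 我 5
}
```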

efficient concatenation

Clean and simple string building: fmt.Sprintf returns a new string.

s := fmt.Sprintf("Size: %d MB.", 85) // s == "Size: 85 MB."

Byte slice ([]byte) with append

buf := []byte("Size: ")
buf = strconv.AppendInt(buf, 85, 10)
buf = append(buf, " MB."...)
s := string(buf)
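For comparison, the same string can be assembled with strings.Builder (available since Go 1.10). This is a sketch rather than a benchmark-backed recommendation, and sizeLabel is a made-up function name. bytes.Buffer offers the same WriteString and String methods and can be used the same way.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sizeLabel (hypothetical helper) builds "Size: <n> MB." by appending
// pieces to a strings.Builder instead of concatenating strings.
func sizeLabel(n int) string {
	var sb strings.Builder
	sb.WriteString("Size: ")
	sb.WriteString(strconv.Itoa(n))
	sb.WriteString(" MB.")
	return sb.String()
}

func main() {
	fmt.Println(sizeLabel(85)) // Size: 85 MB.
}
```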