【实测】Python|【实测】Python 和 C++ 下字符串查找的速度对比【实测】Python和C++下字符串查

完整格式链接：https://blog.imakiseki.cf/2022/03/07/techdev/python-cpp-string-find-perf-test/
背景最近在备战一场算法竞赛，语言误选了 Python ，无奈只能着手对常见场景进行语言迁移。而字符串查找的场景在算法竞赛中时有出现。本文即对此场景在 Python 和竞赛常用语言 C++ 下的速度进行对比，并提供相关参数和运行结果供他人参考。
参数硬件和操作系统

-`root@ .o+`------------ `ooo/OS: Arch Linux ARM aarch64 `+oooo:Host: Raspberry Pi 4 Model B `+oooooo:Kernel: 5.16.12-1-aarch64-ARCH -+oooooo+:Uptime: 3 hours, 32 mins `/:-:++oooo+:Packages: 378 (pacman) `/++++/+++++++:Shell: zsh 5.8.1 `/++++++++++++++:Terminal: /dev/pts/0 `/+++ooooooooooooo/`CPU: (4) @ 1.500GHz ./ooosssso++osssssso+`Memory: 102MiB / 7797MiB .oossssso-````/ossssss+` -osssssso.:ssssssso. :osssssss/osssso+++. /ossssssss/+ssssooo/- `/ossssso+/:--:/+osssso+- `+sso+:-``.-/+oso: `++:.`-/+/ .``/

编译环境和解释环境

Python
- 解释器：Python 3.10.2 (main, Jan 23 2022, 21:20:14) [GCC 10.2.0] on linux
- 交互环境：IPython 8.0.1
C++
- 编译器：g++ (GCC) 11.2.0
- 编译命令：g++ test.cpp -Wall -O2 -g -std=c++11 -o test

场景本次实测设置两个场景：场景 1 的源串字符分布使用伪随机数生成器生成，表示字符串查找的平均情况；场景 2 的源串可连续分割成 20,000 个长度为 50 的字符片段，其中第 15,001 个即为模式串，形如“ab…b”（1 个“a”，49 个 “b”），其余的字符片段形如“ab…c”（1 个“a”，48 个“b”，1 个“c”）。

项目	场景 1：平均情况	场景 2：较坏情况
字符集	小写字母	`abc`
字符分布	`random.choice`	有较强规律性
源串长度	1,000,000	1,000,000
模式串长度	1,000	50
模式串出现位置	250,000、500,000、750,000	750,000
模式串出现次数	1	1

测试方法 【【实测】Python|【实测】Python 和 C++ 下字符串查找的速度对比】本次实测中，Python 语言使用内置类型 str 的 .find() 成员函数，C++ 语言分别使用 string 类的 .find() 成员函数、strstr 标准库函数和用户实现的 KMP 算法。

测试对象	核心代码
Python	`src.find(pat)`
C++ - `test.cpp`	`src.find(pat)`
C++ - `test_strstr.cpp`	`strstr(src, pat)`
C++ - `test_kmp.cpp`	`KMP(src, pat)`

源代码生成源串和模式串

import random# 场景 1： # 源串 s = "".join(chr(random.choice(range(ord("a"), ord("z") + 1))) for _ in range(1000000)) # 模式串列表，三个元素各对应一个模式串 p = [s[250000:251000], s[500000:501000], s[750000:751000]]# 场景 2： # 模式串 p = 'a' + 'b' * 49 # 其他字符片段 _s = "a" + "b" * 48 + "c" # 源串 s = _s * 15000 + p + _s * 4999# 存储到文件，便于 C++ 程序获取 with open('source.in', 'w') as f: f.write(s) with open('pattern.in', 'w') as f: f.write(p[0])

测试代码 Python

In []: %timeit s.find(p[0])

C++ - test.cpp

#include #include #include #include #define LOOP_COUNT (1000) using namespace std; using std::chrono::high_resolution_clock; using std::chrono::duration_cast; using std::chrono::duration; using std::chrono::milliseconds; double test(string s, string p, size_t* pos_ptr) { auto t1 = high_resolution_clock::now(); *pos_ptr = s.find(p); auto t2 = high_resolution_clock::now(); duration ms_double = t2 - t1; return ms_double.count(); }int main() { string s, p; size_t pos; ifstream srcfile("source.in"); ifstream patfile("pattern.in"); srcfile >> s; patfile >> p; double tot_time = 0; for (int i = 0; i < LOOP_COUNT; ++i) { tot_time += test(s, p, &pos); }cout << "Loop count:" << LOOP_COUNT << endl; cout << "Source string length:" << s.length() << endl; cout << "Pattern string length: " << p.length() << endl; cout << "Search result:" << pos << endl; cout << "Time:" << tot_time / LOOP_COUNT << " ms" << endl; return 0; }

C++ - test_strstr.cpp

#include #include #include #include #define LOOP_COUNT (1000) using namespace std; using std::chrono::high_resolution_clock; using std::chrono::duration_cast; using std::chrono::duration; using std::chrono::milliseconds; char s[1000005], p[1005], *pos=NULL; double test(char* s, char* p, char** pos_ptr) { auto t1 = high_resolution_clock::now(); *pos_ptr = strstr(s, p); auto t2 = high_resolution_clock::now(); duration ms_double = t2 - t1; return ms_double.count(); }int main() { ifstream srcfile("source.in"); ifstream patfile("pattern.in"); srcfile >> s; patfile >> p; double tot_time = 0; for (int i = 0; i < LOOP_COUNT; ++i) { tot_time += test(s, p, &pos); }cout << "Loop count:" << LOOP_COUNT << endl; cout << "Source string length:" << strlen(s) << endl; cout << "Pattern string length: " << strlen(p) << endl; cout << "Search result:" << pos - s << endl; cout << "Time:" << tot_time / LOOP_COUNT << " ms" << endl; return 0; }

C++ - test_kmp.cpp

#include #include #include #include #include #define LOOP_COUNT (1000) using namespace std; using std::chrono::high_resolution_clock; using std::chrono::duration_cast; using std::chrono::duration; using std::chrono::milliseconds; int dp[1005]; int KMP(string s, string p) { int m = s.length(), n = p.length(); if (n == 0) return 0; if (m < n) return -1; memset(dp, 0, sizeof(int) * (n+1)); for (int i = 1; i < n; ++i) { int j = dp[i+1]; while (j > 0 && p[j] != p[i]) j = dp[j]; if (j > 0 || p[j] == p[i]) dp[i+1] = j + 1; } for (int i = 0, j = 0; i < m; ++i) if (s[i] == p[j]) { if (++j == n) return i - j + 1; } else if (j > 0) { j = dp[j]; --i; } return -1; }double test(string s, string p, int* pos_ptr) { auto t1 = high_resolution_clock::now(); *pos_ptr = KMP(s, p); auto t2 = high_resolution_clock::now(); duration ms_double = t2 - t1; return ms_double.count(); }int main() { string s, p; int pos; ifstream srcfile("source.in"); ifstream patfile("pattern.in"); srcfile >> s; patfile >> p; double tot_time = 0; for (int i = 0; i < LOOP_COUNT; ++i) { tot_time += test(s, p, &pos); }cout << "Loop count:" << LOOP_COUNT << endl; cout << "Source string length:" << s.length() << endl; cout << "Pattern string length: " << p.length() << endl; cout << "Search result:" << pos << endl; cout << "Time:" << tot_time / LOOP_COUNT << " ms" << endl; return 0; }

结果 IPython 的 %timeit 魔法命令可以输出代码多次执行的平均时间和标准差，在此取平均时间。C++ 的代码对每个模式串固定运行 1,000 次后取平均时间。
以下时间若无特别说明，均以微秒为单位，保留到整数位。

场景	模式串出现位置	Python	C++ - `test.cpp`	C++ - `test_strstr.cpp`	C++ - `test_kmp.cpp`
场景 1	250,000	105	523	155	2564
场景 1	500,000	183	1053	274	3711
场景 1	750,000	291	1589	447	4900
场景 2	750,000	2630*	618	353	3565

* 原输出为“2.63 ms”。IPython 的 %timeit 输出的均值保留 3 位有效数字，由于此时间已超过 1 毫秒，微秒位被舍弃。此处仍以微秒作单位，数值记为“2630”。
局限性本次实测时使用的设备硬件上劣于算法竞赛中的标准配置机器，实测结果中的“绝对数值”参考性较低。
总结根据上表中的结果，在给定环境和相关参数条件下，场景 1 中 Python 的运行时间大约为 C++ 中 string::find 的五分之一，与 std:strstr 接近；而在场景 2 中 Python 的运行时间明显增长，但 C++ 的前两种测试方法的运行时间与先前接近甚至更短。四次测试中，C++ 的用户实现的 KMP 算法运行时间均较长，长于同条件下 Python 的情况。
Python 中的内置类型 str 的快速查找（.find()）和计数（.count()）算法基于 Boyer-Moore 算法和 Horspool 算法的混合，其中后者是前者的简化，而前者与 Knuth-Morris-Pratt 算法有关。
有关 C++ 的 string::find 比 std::strstr 运行时间长的相关情况，参见 Bug 66414 - string::find ten times slower than strstr。
值得关注的是：C++ 中自行实现的 KMP 算法的运行时间竟然远长于 C++ 标准库甚至 Python 中的算法。这也类似于常说的“自己设计汇编代码运行效率低于编译器”的情况。Stack Overflow 的一个问题 strstr faster than algorithms? 下有人回答如下：

Why do you think strstr should be slower than all the others? Do you know what algorithm strstr uses? I think it's quite likely that strstr uses a fine-tuned, processor-specific, assembly-coded algorithm of the KMP type or better. In which case you don't stand a chance of out-performing it in C for such small benchmarks.

KMP 算法并非是所有线性复杂度算法中最快的。在不同的环境（软硬件、测试数据等）下，KMP 与其变种乃至其他线性复杂度算法，孰优孰劣都无法判断。编译器在设计时考虑到诸多可能的因素，尽可能使不同环境下都能有相对较优的策略来得到结果。因而，在保证结果正确的情况下，与其根据算法原理自行编写，不如直接使用标准库中提供的函数。
同时本次实测也在运行时间角度再次印证 Python 并不适合在算法竞赛中取得高成绩的说法。
参考

https://stackoverflow.com/questions/22387586/measuring-execution-time-of-a-function-in-c
https://www.cplusplus.com/reference/string/string/find/
https://stackoverflow.com/questions/681649/how-is-string-find-implemented-in-cpython
https://github.com/python/cpython/blob/main/Objects/stringlib/fastsearch.h#L5
https://stackoverflow.com/questions/8869605/c-stringfind-complexity
https://stackoverflow.com/questions/19506571/can-it-be-faster-to-find-the-minimum-periodic-string-inside-another-string-in-te
https://gcc.gnu.org/onlinedocs/gcc-9.4.0/libstdc++/api/a17342_source.html
https://opensource.apple.com/source/tcl/tcl-10/tcl/compat/strstr.c.auto.html
https://gist.github.com/hsinewu/44a1ce38a1baf47893922e3f54807713
https://stackoverflow.com/questions/11799956/performance-comparison-strstr-vs-stdstringfind
https://stackoverflow.com/questions/7586990/strstr-faster-than-algorithms
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414
http://0x80.pl/notesen/2016-10-08-slow-std-string-find.html