主页 > 开源代码 >

[Lc_Notes]hash去重|find|string逐字符处理|栈溢出

开源代码
2025-09-13 17:24:02

1.hash 去重

一、使用STL的unordered_set快速去重（推荐）

二、自定义哈希表实现去重（链地址法）

三、典型应用场景

四、性能对比

2. 对比 find | count

find()

count（）

3. 字符串逐个字符插入vector

4. 栈溢出 (heap-buffer-overflow)

错误根源

🛠️ 分场景修复方案

场景1：数组/vector索引越界（高频原因）

场景2：动态内存分配不足

场景3：指针野指针问题

调试技巧

总结

1.hash 去重

在C++中，哈希表常用于高效去重场景。下面将介绍两种典型的实现方式及代码示例：

一、使用STL的unordered_set快速去重（推荐） #include <iostream> #include <unordered_set> #include <vector> void deduplicate_with_unordered_set() { std::vector<int> input = {2, 5, 2, 8, 5, 6, 8, 8}; std::unordered_set<int> hash_set; // 去重逻辑 for (int num : input) { if (hash_set.find(num) == hash_set.end()) { // O(1)时间复杂度查询 hash_set.insert(num);//没发现就插入 } } // 输出结果 std::cout << "去重后的元素："; for (int unique_num : hash_set) { std::cout << unique_num << " "; } }

特点 [2] [6]：

基于STL实现，无需手动处理哈希冲突平均时间复杂度为O(n)适用于基础类型和可哈希对象

二、自定义哈希表实现去重（链地址法） #include <iostream> #include <list> #define TABLE_SIZE 10 class HashTable { private: std::list<int> table[TABLE_SIZE]; int hashFunction(int key) { return key % TABLE_SIZE; // 简单取模哈希函数 } public: void insert(int key) { int index = hashFunction(key); for (auto it = table[index].begin(); it != table[index].end(); ++it) { if (*it == key) return; // 发现重复则直接返回 } table[index].push_back(key); // 无重复时插入 } void printUnique() { std::cout << "去重后的元素："; for (int i=0; i<TABLE_SIZE; ++i) { for (int num : table[i]) { std::cout << num << " "; } } } }; int main() { HashTable ht; int arr[] = {2, 5, 2, 8, 5, 6, 8, 8}; for (int num : arr) { ht.insert(num); } ht.printUnique(); return 0; }

实现原理 [1] [3]：

通过hashFunction计算桶位置在链表桶中遍历查找重复元素无重复时插入新元素冲突处理采用链地址法

三、典型应用场景日志去重处理海量日志时，用哈希表记录已处理日志的指纹（如MD5值），避免重复分析 [4]。数据库索引优化在ETL过程中对主键进行预去重，例如： std::unordered_set<std::string> unique_keys; while (auto record = db.fetch()) { if (!unique_keys.count(record.id)) { process(record); unique_keys.insert(record.id); } }

网络爬虫URL管理存储已爬取URL的哈希值，防止重复抓取： std::unordered_set<size_t> visited_urls; size_t url_hash = std::hash<std::string>{}(url); if (!visited_urls.count(url_hash)) { crawl(url); visited_urls.insert(url_hash); }

四、性能对比

实现方式

平均时间复杂度

空间复杂度

适用场景

STL unordered_set

O(n)

通用场景

自定义链式哈希表

O(n)

O(n+m)

需要控制底层实现时

线性探查法

O(n)~O(n²)

O(n)

内存受限环境 [6]

2. 对比 find | count

find() 功能返回指向目标元素的迭代器，若未找到则返回end()迭代器。 auto it = hash.find(42); if (it != hash.end()) { cout << "找到元素：" << *it; } count（）

返回 hash[num] 的 value

3. 字符串逐个字符插入vector

（适用于滑动窗口算法场景）：

class Solution { public: int lengthOfLongestSubstring(string s) { vector<char> str; // 直接遍历字符串的每个字符（无需拆分） for (char c : s) { // 范围for循环 [9] str.push_back(c); } // 或使用迭代器 [6] // for (auto it = s.begin(); it != s.end(); ++it) { // str.push_back(*it); // } // 后续滑动窗口算法逻辑... } };

关键要点：

字符串本身可视为字符数组，直接遍历即可使用范围for循环（C++11特性）最简洁若需要处理子串分割（如按空格分割），可参考istringstream方法

4. 栈溢出 (heap-buffer-overflow)

错误根源

报错表明在 solution.cpp:11:18 处尝试读取非法堆内存地址 0x503000000150的越界情况，具体原因可能为：

数组/vector越界访问：例如 vector<int> nums 的索引超出 [0, nums.size()-1] 范围动态内存分配不足：malloc 或 new 分配的空间不足以容纳后续操作（如 int* arr = malloc(10) 但访问 arr[10]）指针偏移错误：指针算术运算后指向非法地址（如 ptr += offset 越界）（[2] 。对象生命周期问题：已释放内存被重复访问（如 free(x) 后仍使用 x[0]）。

🛠️ 分场景修复方案场景1：数组/vector索引越界（高频原因）

假设代码涉及滑动窗口算法（minSubArrayLen 函数常见于此类问题）：

// 错误示例：right 可能达到 nums.size() int left = 0, right = 0; while (right <= nums.size()) { // ❌ 当 right=nums.size() 时 nums[right] 越界 sum += nums[right]; right++; } // 修复：调整循环条件 while (right < nums.size()) { // ✅ 确保 right ∈ [0, nums.size()-1] sum += nums[right]; right++; }

验证方法：在循环内添加 cout << "right=" << right << endl; 观察索引值

场景2：动态内存分配不足

若涉及手动内存分配：

// 错误示例：分配空间不足（网页4案例） int* arr = (int*)malloc(sizeof(int) * 5); for (int i=0; i<=5; i++) { // ❌ i=5 越界访问 arr[i] = i; } // 修复：调整分配大小或循环范围 int* arr = (int*)malloc(sizeof(int) * 6); // ✅ 分配6个元素空间 for (int i=0; i<6; i++) { arr[i] = i; }

验证方法：检查 malloc/new 的参数是否与后续操作匹配

场景3：指针野指针问题

若存在指针传递或赋值逻辑（类似网页9案例）：

char* parray = (char*)malloc(100); char* array = parray; free(parray); // 错误：后续使用 array 指针会访问已释放内存 // 修复：释放后置空指针 free(parray); parray = nullptr; // ✅ 避免野指针 array = nullptr;

调试技巧 AddressSanitizer日志分析关注错误日志中的 Shadow bytes 部分，通过内存映射关系定位越界位置： Shadow bytes around the buggy address: 0x0c187fff80c0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd fa：可寻址内存区域fd：已分配内存的 RedZone 非法访问会显示在 RedZone 边界外。最小化复现代码将 minSubArrayLen 函数剥离为独立测试用例，逐步注释代码块定位具体越界操作。

总结

针对 solution.cpp:11:18 的 READ 操作，优先检查以下位置：

vector访问：确认 vector<int>& nums 的所有 [] 或 .at() 操作索引合法性。迭代器失效：若使用 auto it = nums.begin()，检查循环中是否触发迭代器失效（如插入/删除操作）。内存复用：检查是否复用已释放的临时变量内存空间（如函数返回局部变量指针）。

标签：

上一篇
vue2以及vue3中v-if和v-for是否可以同时使用

下一篇
python学习第三天