Case requirements:
1. Scrape Tencent's job-posting data (搜索 | 腾讯招聘), including the post name, link, update time, and department name.
2. Scrape every page (pagination).
3. Parse the data with jsonpath.
4. Save the data in two forms: a txt file and an Excel file.

Walkthrough:
1. Determine whether the site renders its data synchronously or asynchronously. It is asynchronous, so inspect the XHR requests in the browser's developer tools.
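As a quick offline illustration of that check (the two sample bodies below are made up), an XHR data packet parses as JSON while the rendered page itself is HTML:

```python
import json

def looks_like_json(body: str) -> bool:
    # True when the body parses as JSON, i.e. what an XHR data packet returns;
    # a synchronously rendered page would come back as HTML instead.
    try:
        json.loads(body)
        return True
    except ValueError:
        return False

print(looks_like_json('{"Data": {"Count": 100}}'))    # True  -> asynchronous data packet
print(looks_like_json('<html><body></body></html>'))  # False -> rendered HTML page
```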
2. Find the packet that actually carries the data by checking each response body.
3. Copy the request URL:
https://careers.tencent.com/tencentcareer/api/post/Query?timestamp=1727929418908&countryId=&cityId=&bgIds=&productId=&categoryId=&parentCategoryId=&attrId=&keyword=&pageIndex=3&pageSize=10&language=zh-cn&area=cn
4. Strip the query parameters that are not needed (keeping them is also fine), leaving the base URL:
https://careers.tencent.com/tencentcareer/api/post/Query?
5. The site's anti-scraping measures are fairly strict, so disguise the request with headers and send the query parameters separately:

```python
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36'
}
data = {
    'timestamp': '1648355434381',
    'countryId': '',
    'cityId': '',
    'bgIds': '',
    'productId': '',
    'categoryId': '',
    'parentCategoryId': '40001',
    'attrId': '',
    'keyword': '',
    'pageIndex': i,
    'pageSize': 10,
    'language': 'zh-cn',
    'area': 'cn'
}
```
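Steps 3–4 (splitting the copied request URL into a base URL plus a parameter dict) can be sketched with the standard library; the URL below is a shortened version of the one captured above:

```python
from urllib.parse import urlsplit, parse_qsl

captured = ('https://careers.tencent.com/tencentcareer/api/post/Query'
            '?timestamp=1727929418908&parentCategoryId=&pageIndex=3&pageSize=10'
            '&language=zh-cn&area=cn')

parts = urlsplit(captured)
# Rebuild the trimmed base URL and keep the query as a dict we can edit per request.
base_url = '{}://{}{}?'.format(parts.scheme, parts.netloc, parts.path)
params = dict(parse_qsl(parts.query, keep_blank_values=True))

print(base_url)             # https://careers.tencent.com/tencentcareer/api/post/Query?
print(params['pageIndex'])  # 3
```

`keep_blank_values=True` preserves the empty parameters (such as `parentCategoryId=`), which is handy when deciding which ones can be dropped.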
6. Save to an Excel file. Create the workbook:

```python
wb = workbook.Workbook()  # create the Excel workbook object
ws = wb.active            # get the active sheet
ws.append(['职称', '链接', '时间', '公司名称'])
```

Excel saving:

```python
def save_excel(z, l, s, g):
    my_list = [z, l, s, g]  # write one row as a list
    ws.append(my_list)
    wb.save('腾讯社招.xlsx')
```

Local text saving:

```python
def save_text(n, u, t, p):
    with open('腾讯社招.txt', 'a', encoding='utf-8') as f:
        f.write(n + '\n')
        f.write(u + '\n')
        f.write(t + '\n')
        f.write(p + '\n')
```

7. Parse the data with jsonpath:

```python
names = jsonpath(r, '$..RecruitPostName')
urls = jsonpath(r, '$..PostURL')
times = jsonpath(r, '$..LastUpdateTime')
pronames = jsonpath(r, '$..ProductName')
```

8. Process the parsed data:

```python
for name, url, post_time, proname in zip(names, urls, times, pronames):
    save_text(name, url, post_time, proname)
    save_excel(name, url, post_time, proname)
```

9. Paginate — only `pageIndex` changes from page to page:

```python
for i in range(1, 6):
    # url = 'https://careers.tencent.com/search.html'
    data = {
        'timestamp': '1648355434381',
        'countryId': '',
        'cityId': '',
        'bgIds': '',
        'productId': '',
        'categoryId': '',
        'parentCategoryId': '40001',
        'attrId': '',
        'keyword': '',
        'pageIndex': i,  # page number changes on every request
        'pageSize': 10,
        'language': 'zh-cn',
        'area': 'cn'
    }
    print('第{}页已经保存完毕'.format(i))
```

Sample code:
```python
import requests
from jsonpath import jsonpath
from openpyxl import workbook
import time

# Example detail page: http://careers.tencent.com/jobdesc.html?postId=1685827130673340416

def get_data():
    response = requests.get(url, headers=headers, params=data)
    r = response.json()
    return r

def parse_data(r):
    names = jsonpath(r, '$..RecruitPostName')
    urls = jsonpath(r, '$..PostURL')
    times = jsonpath(r, '$..LastUpdateTime')
    pronames = jsonpath(r, '$..ProductName')
    for name, post_url, post_time, proname in zip(names, urls, times, pronames):
        save_text(name, post_url, post_time, proname)
        save_excel(name, post_url, post_time, proname)

# Save the data
def save_text(n, u, t, p):
    with open('腾讯社招.txt', 'a', encoding='utf-8') as f:
        f.write(n + '\n')
        f.write(u + '\n')
        f.write(t + '\n')
        f.write(p + '\n')

def save_excel(z, l, s, g):
    my_list = [z, l, s, g]  # write one row as a list
    ws.append(my_list)
    wb.save('腾讯社招.xlsx')

if __name__ == '__main__':
    wb = workbook.Workbook()  # create the Excel workbook object
    ws = wb.active            # get the active sheet
    ws.append(['职称', '链接', '时间', '公司名称'])
    url = 'https://careers.tencent.com/tencentcareer/api/post/Query?'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36'
    }
    for i in range(1, 6):
        # url = 'https://careers.tencent.com/search.html'
        data = {
            'timestamp': '1648355434381',
            'countryId': '',
            'cityId': '',
            'bgIds': '',
            'productId': '',
            'categoryId': '',
            'parentCategoryId': '40001',
            'attrId': '',
            'keyword': '',
            'pageIndex': i,
            'pageSize': 10,
            'language': 'zh-cn',
            'area': 'cn'
        }
        time.sleep(2)  # throttle requests between pages
        h = get_data()
        parse_data(h)
        print('第{}页已经保存完毕'.format(i))
```

Run the script and the data is saved. The requests can also be routed through a proxy.
Adding a proxy:

```python
zhima_api = ('http://http.tiqu.letecs.com/getip3?num=1&type=1&pro=&city=0&yys=0'
             '&port=1&pack=225683&ts=0&ys=0&cs=0&lb=1&sb=0&pb=4&mr=1&regions=&gm=4')
proxie_ip = requests.get(zhima_api).json()['data'][0]
print(proxie_ip)
# Turn the extracted IP into a dict to build a complete HTTP proxy
proxies = {
    'http': 'http://' + str(proxie_ip['ip']) + ':' + str(proxie_ip['port']),
    # 'https': 'https://' + str(proxie_ip['ip']) + ':' + str(proxie_ip['port'])
}
```
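To make that last step concrete without hitting the proxy-extraction API, here is how the `proxies` dict comes out for a made-up payload (the field layout is assumed from the code above; the `ip` and `port` values are invented):

```python
# Hypothetical example of the JSON the proxy-extraction API returns (fields assumed).
sample_payload = {'code': 0, 'data': [{'ip': '203.0.113.7', 'port': 4216}]}

item = sample_payload['data'][0]
# Build the scheme://host:port string that requests expects in its proxies mapping.
proxies = {'http': 'http://{}:{}'.format(item['ip'], item['port'])}
print(proxies)  # {'http': 'http://203.0.113.7:4216'}

# It would then be passed along with the other request arguments, e.g.:
# requests.get(url, headers=headers, params=data, proxies=proxies)
```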