Scraping Sina Weibo Data with Python: Quick Version

简单都有人啊 · posted on 2023-4-28 09:33:28

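Sina Weibo data is very valuable: you can use it for data analysis, to build a website, or even *. Many people lack the technical skills, though, and fall back on clumsy copy and paste when they need the data. No problem: here is how to scrape Weibo data in bulk, which greatly speeds up data migration.
1. Obtain a cookie first.
2. Run the crawler.
Before running the crawler, a quick analysis: sites like Weibo have fairly strict anti-scraping mechanisms, and risk control has tightened recently, with especially high demands on IPs, so add a proxy pool before scraping. I have shared a lot about using crawler proxies before, so only briefly here: depending on your program's design, you can either use the API extraction mode and manage the IPs yourself, or use tunnel forwarding and scrape directly. Here we choose the latter; tunnel forwarding suits getting a project started and is quicker to pick up. The implementation is as follows: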
# -*- coding: utf-8 -*-

import random

import requests

# Target page to request
targetUrl = "http://weibo.com/?sudaref=www.baidu.com"

# Target HTTPS page to request
# targetUrl = "https://weibo.com/?sudaref=www.baidu.com"

# Proxy server (product site: www.16yun.cn)
proxyHost = "t.16yun.cn"
proxyPort = "31111"

# Proxy credentials
proxyUser = "16LDJLCD"
proxyPass = "254565"

proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
    "host": proxyHost,
    "port": proxyPort,
    "user": proxyUser,
    "pass": proxyPass,
}

# Route both http and https requests through the HTTP proxy
proxies = {
    "http": proxyMeta,
    "https": proxyMeta,
}

# Set the IP-switching header
tunnel = random.randint(1, 10000)
headers = {"Proxy-Tunnel": str(tunnel)}

resp = requests.get(targetUrl, proxies=proxies, headers=headers)

print(resp.status_code)
print(resp.text)
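
Step 1 mentions getting a cookie, but the script above never sends one. Below is a rough sketch of how the cookie could be attached alongside the tunnel header. The function names (build_proxies, build_headers, fetch) are made up for illustration, and the cookie is assumed to be a plain "name=value; ..." string copied from a logged-in browser session; the Proxy-Tunnel header is taken from the code above.

```python
import random


def build_proxies(host, port, user, password):
    """Return a requests-style proxies dict sending HTTP and HTTPS through one HTTP proxy."""
    proxy_url = "http://{}:{}@{}:{}".format(user, password, host, port)
    return {"http": proxy_url, "https": proxy_url}


def build_headers(cookie, tunnel_id=None):
    """Return headers carrying the login cookie and a Proxy-Tunnel value.

    cookie is assumed to be a raw "name=value; ..." string (hypothetical input).
    If tunnel_id is not given, a random one in 1..10000 is picked, as in the post.
    """
    if tunnel_id is None:
        tunnel_id = random.randint(1, 10000)
    return {
        "Cookie": cookie,
        "Proxy-Tunnel": str(tunnel_id),
    }


def fetch(url, cookie, proxies, timeout=10):
    """Request url through the proxy with the cookie attached and return the response."""
    # requests is third-party; imported here so the helpers above work without it
    import requests

    resp = requests.get(
        url,
        proxies=proxies,
        headers=build_headers(cookie),
        timeout=timeout,  # avoid hanging forever on a dead proxy
    )
    resp.raise_for_status()  # raise on 4xx/5xx instead of printing an error page
    return resp
```

To use it, build the proxies dict once with build_proxies(proxyHost, proxyPort, proxyUser, proxyPass) and pass it to fetch together with targetUrl and your own cookie string. Whether the cookie alone is enough to get past Weibo's risk control is not something this sketch guarantees.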

吃什么话梅 · posted on 2025-2-26 03:14:45

Ahhhhhhhhhhh

贼小鱼 · posted on 2025-12-17 03:17:06

Heh, keep it low-key, low-key!

知情 · posted on 2025-12-18 00:29:33

I searched high and low, and it turns out it was right here!

毅然 · posted on 2025-12-18 05:11:51

Let's exchange ideas when you have time.
