Scraping Lao Qi's IT Gas Station (老齐的IT加油站)

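I recently analyzed how Alibaba's Aliplayer requests videos from Alibaba Cloud VOD. Afterwards I noticed that Lao Qi's IT Gas Station (itlaoqi.com) also uses Aliplayer, and its front end even exposes the AccessKeyId and AccessKeySecret. With both the ID and the secret in plain view, I figured I could simply call Alibaba Cloud's PHP SDK with them and get the real video URL.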

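The request failed with a message saying the temporary key is not allowed to perform this operation. Didn't see that one coming.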

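But the browser obviously manages to get the video, so why not use Python to capture the browser's logs instead? The idea: drive Chrome with Selenium, open a lesson page, start playback, then read Chrome's performance log, which records the page's network requests, and pick out the mp4 URL.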
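To make the log-reading step concrete, here is a minimal sketch of parsing performance-log entries offline. Each entry returned by ChromeDriver has a `message` field holding a JSON string that wraps one DevTools Protocol event; for `Network.requestWillBeSent` the requested URL sits under `params.request.url`. The function name `find_mp4_urls` and the sample entries (including their URLs) are hypothetical, made up for illustration only.

```python
import json
import re

# Real video files on the site live under video.itlaoqi.com/sv/ (host taken from the script)
MP4_PATTERN = re.compile(r'https://video\.itlaoqi\.com/sv/.*?\.mp4')


def find_mp4_urls(log_entries):
    """Return every mp4 URL requested in a list of ChromeDriver performance-log entries."""
    urls = []
    for entry in log_entries:
        # entry['message'] is a JSON string: {"message": {"method": ..., "params": {...}}, "webview": ...}
        event = json.loads(entry['message'])['message']
        if event.get('method') != 'Network.requestWillBeSent':
            continue  # only outgoing requests carry params.request.url
        url = event['params']['request']['url']
        if MP4_PATTERN.match(url):
            urls.append(url)
    return urls


# Hypothetical entries shaped like ChromeDriver output; the URLs are made up
sample_log = [
    {'level': 'INFO', 'timestamp': 0, 'message': json.dumps({
        'message': {'method': 'Network.requestWillBeSent',
                    'params': {'request': {'url': 'https://www.itlaoqi.com/chapter/1453.html'}}},
        'webview': 'ABC'})},
    {'level': 'INFO', 'timestamp': 1, 'message': json.dumps({
        'message': {'method': 'Network.requestWillBeSent',
                    'params': {'request': {'url': 'https://video.itlaoqi.com/sv/demo/demo.mp4'}}},
        'webview': 'ABC'})},
]

print(find_mp4_urls(sample_log))  # → ['https://video.itlaoqi.com/sv/demo/demo.mp4']
```

Filtering on the event method and reading the URL field directly is a little stricter than regex-searching the whole stringified `params`, and it keeps any query string that follows `.mp4`.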

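Dependencies: selenium plus chromedriver (download mirror: https://npm.taobao.org/mirrors/chromedriver/; pick the chromedriver build that matches your installed Chrome version), requests for fetching pages, and beautifulsoup4 for parsing page elements.

pip install selenium requests beautifulsoup4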
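The lesson list is located with a chain of BeautifulSoup `find` calls. The sketch below runs that same chain on a small HTML snippet; the snippet is hypothetical, written only to mimic the markup `geturls` expects (`div.col-8 > div.list-group > a`, with the lesson title as the text right after `span.mr-2`), and `list_lessons` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the site's lesson list (not copied from the real page)
SAMPLE_HTML = """
<div class="row">
  <div class="col-8">
    <div class="list-group">
      <a href="/chapter/1453.html"><span class="mr-2">1</span> Lesson one</a>
      <a href="/chapter/1454.html"><span class="mr-2">2</span> Lesson two</a>
    </div>
  </div>
</div>
"""


def list_lessons(html, base_url='https://www.itlaoqi.com'):
    """Return (title, absolute_url) pairs using the same selector chain as geturls."""
    soup = BeautifulSoup(html, 'html.parser')
    group = soup.find('div', attrs={'class': 'col-8'}).find('div', attrs={'class': 'list-group'})
    lessons = []
    for link in group.find_all('a'):
        # The title is the text node immediately after the numbering <span class="mr-2">
        title = link.find('span', attrs={'class': 'mr-2'}).next_sibling.strip()
        lessons.append((title, base_url + link.get('href')))
    return lessons


for title, url in list_lessons(SAMPLE_HTML):
    print(title, url)
```

If the real page changes its class names, `find` returns `None` and the chained call raises `AttributeError`, so that is the first place to look when the scraper stops working.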

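from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import re
import time
import json
from bs4 import BeautifulSoup
import requests

BASE_URL = 'https://www.itlaoqi.com'
# Real video files are served from video.itlaoqi.com/sv/... and end in .mp4
MP4_PATTERN = re.compile(r'https://video\.itlaoqi\.com/sv/.*?\.mp4')


def geturls(url):
    """Print the title, real video URL and page URL of every lesson on a chapter page."""
    res = requests.get(url)
    # Without an explicit parser bs4 emits a warning and picks whichever parser is installed,
    # which can parse differently on different systems, so pin html.parser
    soup = BeautifulSoup(res.text, 'html.parser')

    # Lessons are the <a> links inside div.col-8 > div.list-group
    urls_object = soup.find('div', attrs={'class': 'col-8'}).find('div', attrs={'class': 'list-group'})

    # Second-level section names (not used yet)
    # second_dir_names = urls_object.find_all('span', attrs={'class': 'font-weight-bold'})

    for i, link in enumerate(urls_object.find_all('a'), start=1):
        page_url = BASE_URL + link.get('href')

        # The mp4 request is not always captured on the first try; retry up to 3 more times
        re_url = get_mp4_url(page_url)
        j = 1
        while re_url is None and j < 4:
            time.sleep(1)
            re_url = get_mp4_url(page_url)
            j += 1

        title = link.find('span', attrs={'class': 'mr-2'}).next_sibling.strip()
        print(f'{i}. {title}')
        print(re_url)
        print(page_url)


def get_mp4_url(url):
    """Open a lesson page in Chrome, start playback and return the mp4 URL from the performance log."""
    options = Options()
    # Selenium 4 API: record network events in Chrome's 'performance' log
    # (find_element_by_* and desired_capabilities no longer exist in current Selenium 4)
    options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
    options.add_experimental_option('perfLoggingPrefs', {'enableNetwork': True})

    driver = webdriver.Chrome(options=options)
    driver.implicitly_wait(5)
    try:
        driver.get(url)
        # Send SPACE to the page body to trigger playback so the player requests the video
        driver.find_element(By.TAG_NAME, 'body').send_keys(Keys.SPACE)

        for entry in driver.get_log('performance'):
            params = json.loads(entry['message'])['message'].get('params', {})
            match = MP4_PATTERN.search(str(params))
            if match:
                return match.group()
        return None  # not captured yet; the caller may retry
    finally:
        # Close the browser whether or not a URL was found
        driver.quit()


# geturls(url='https://www.itlaoqi.com/chapter/1507.html')
print(get_mp4_url('https://www.itlaoqi.com/chapter/1453.html'))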

In testing, this correctly returns the real video URL.
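One design note: `get_mp4_url` reads the performance log only once, right after playback is triggered, so when the mp4 request has not been issued yet it returns `None` and the retry loop in `geturls` launches a whole new Chrome. A possible alternative (my own untested suggestion, not the method tested above) is to keep one browser open and poll the log for a few seconds. The helper below is a hypothetical, generic sketch; with Selenium it might be called as `poll_until(lambda: first_mp4(driver.get_log('performance')))`, where `first_mp4` stands for any function that scans a batch of entries and returns a URL or `None`.

```python
import time


def poll_until(fetch, timeout=10.0, interval=0.5):
    """Call fetch() until it returns something other than None, or until timeout seconds pass.

    Returns the first non-None result, or None if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Stand-in for reading the log: the URL only "appears" on the third call
attempts = iter([None, None, 'https://video.itlaoqi.com/sv/demo/demo.mp4'])
print(poll_until(lambda: next(attempts), timeout=5, interval=0.01))
```

Polling inside one session avoids paying Chrome's startup cost on every retry, at the price of holding the browser open until the timeout when no video is found.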

Xiaole's Blog (小乐的博客)
