pip install "selenium<4"
Script

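#!/usr/bin/python
# Note: webdriver.PhantomJS was removed in Selenium 4, so this script needs Selenium 3.x.
from selenium import webdriver

# Start a headless PhantomJS browser
browser = webdriver.PhantomJS()
browser.get("http://www.site-digger.com/html/articles/20110516/proxieslist.html")
# Collect every table row from the rendered page
trs = browser.find_elements_by_tag_name('tr')
# Skip the header row and print the first space-separated field of each row
for tr in trs[1:]:
    print(tr.text.split(' ')[0])
browser.quit()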

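Running it raises: selenium.common.exceptions.WebDriverException: Message: 'phantomjs' executable needs to be in PATH

Selenium cannot find the PhantomJS binary, so download it and link it into a directory on the PATH: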

wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
tar -xvjf phantomjs-2.1.1-linux-x86_64.tar.bz2 
cp -R phantomjs-2.1.1-linux-x86_64 /usr/local/share/ 
ln -sf /usr/local/share/phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/local/bin/

Done.
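
To confirm that the symlink took effect before rerunning the script, a quick check along these lines may help. It is only a sketch using the standard library; the helper name locate_executable is made up for illustration and is not part of Selenium.

```python
import shutil


def locate_executable(name="phantomjs"):
    """Return the absolute path of `name` if it is on PATH, else None.

    Hypothetical helper for illustration; it only wraps shutil.which.
    """
    return shutil.which(name)


path = locate_executable()
if path is None:
    # This is the situation in which Selenium reports that the
    # 'phantomjs' executable needs to be in PATH.
    print("phantomjs is not on PATH yet")
else:
    print("phantomjs found at", path)
```

If this prints a path under /usr/local/bin, the script above should be able to start PhantomJS.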

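Advanced: grab the HTML source directly after the JavaScript has run, then extract the data some other way, for example with regular expressions or Beautiful Soup.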

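import time
from selenium import webdriver

browser = webdriver.PhantomJS()
browser.get("http://www.site-digger.com/html/articles/20110516/proxieslist.html")
# Some content is loaded lazily, so sleep here to give the JavaScript time to finish
time.sleep(5)
content = browser.page_source
print(content)
browser.quit()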
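
As a rough illustration of the regular-expression route, the sketch below pulls the first cell out of every table row in an HTML string. SAMPLE_HTML is a made-up stand-in for browser.page_source, and the function name first_cells is hypothetical; the real page's markup may differ, so the patterns would likely need adjusting.

```python
import re

# Hypothetical sample standing in for browser.page_source;
# the markup of the real page is an assumption.
SAMPLE_HTML = """
<table>
  <tr><th>IP:Port</th><th>Type</th></tr>
  <tr><td>10.0.0.1:8080</td><td>HTTP</td></tr>
  <tr><td>10.0.0.2:3128</td><td>HTTP</td></tr>
</table>
"""

ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.S | re.I)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")


def first_cells(html):
    """Return the text of the first <td> in every row that has one."""
    results = []
    for row in ROW_RE.findall(html):
        cells = CELL_RE.findall(row)
        if cells:
            # Strip any nested tags and surrounding whitespace from the cell
            results.append(TAG_RE.sub("", cells[0]).strip())
    return results


print(first_cells(SAMPLE_HTML))
```

In real use, content from the snippet above would be passed in place of SAMPLE_HTML. Regular expressions are fragile on messy HTML, so a parser such as Beautiful Soup is probably the safer choice for anything beyond a simple table.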

If you repost this, please credit the source: https://www.isres.com/python/185.html
