Scraping a DWR-Framework Website with Python

Category: BLOG Python

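I recently had to scrape data from a competing channel. While writing the Python code, I found that the target site was built on the DWR (Direct Web Remoting) framework. One look in Chrome DevTools showed a POST request carrying a whole pile of dictionary-style parameters, and my head started to spin.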

The site's request headers and request body:

[Screenshots: the request headers and request payload in Chrome DevTools]

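The Request Payload is clearly a dictionary of key=value pairs, but this was my first time dealing with this kind of request and I had no idea how to assemble the POST params. There is very little material about it online; in the end Google turned up the answer, which looks like this: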

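# Date to query. Assumption: use the exact date format shown in the captured request.
dateSte = '2018-01-01'

params = {
    # Batch header: one remote call in this request
    'callCount': '1',
    # Page that made the call, URL-encoded as in the captured payload
    'page': '%2Fmall%2Findex.htm',
    'httpSessionId': '',
    # Copied from the captured browser request
    'scriptSessionId': 'E0C452A0E81994294E5E8241DAA72AB4655',
    # Remote class and method that DWR exposes, for call number 0
    'c0-scriptName': 'TvDwrController',
    'c0-methodName': 'selectTvSchedule',
    'c0-id': '0',
    # c0-e1 ... c0-e82 hold the fields of the single argument object in
    # "type:value" form; c0-param0 below maps each field name to one of them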
    'c0-e1': 'null:null',
    'c0-e2': 'null:null',
    'c0-e3': 'null:null',
    'c0-e4': 'null:null',
    'c0-e5': 'null:null',
    'c0-e6': 'null:null',
    'c0-e7': 'null:null',
    'c0-e8': 'null:null',
    'c0-e9': 'string:'+dateSte,
    'c0-e10': 'null:null',
    'c0-e11': 'null:null',
    'c0-e12': 'null:null',
    'c0-e13': 'string:'+dateSte,
    'c0-e14': 'null:null',
    'c0-e15': 'null:null',
    'c0-e16': 'null:null',
    'c0-e17': 'null:null',
    'c0-e18': 'null:null',
    'c0-e19': 'null:null',
    'c0-e20': 'null:null',
    'c0-e21': 'null:null',
    'c0-e22': 'null:null',
    'c0-e23': 'null:null',
    'c0-e24': 'null:null',
    'c0-e25': 'null:null',
    'c0-e26': 'null:null',
    'c0-e27': 'null:null',
    'c0-e28': 'null:null',
    'c0-e29': 'null:null',
    'c0-e30': 'null:null',
    'c0-e31': 'null:null',
    'c0-e32': 'null:null',
    'c0-e33': 'null:null',
    'c0-e34': 'null:null',
    'c0-e35': 'null:null',
    'c0-e36': 'null:null',
    'c0-e37': 'null:null',
    'c0-e38': 'null:null',
    'c0-e39': 'null:null',
    'c0-e40': 'null:null',
    'c0-e41': 'null:null',
    'c0-e42': 'null:null',
    'c0-e43': 'null:null',
    'c0-e44': 'null:null',
    'c0-e45': 'null:null',
    'c0-e46': 'null:null',
    'c0-e47': 'null:null',
    'c0-e48': 'number:0',
    'c0-e49': 'null:null',
    'c0-e50': 'null:null',
    'c0-e51': 'null:null',
    'c0-e52': 'null:null',
    'c0-e53': 'null:null',
    'c0-e54': 'null:null',
    'c0-e55': 'null:null',
    'c0-e56': 'null:null',
    'c0-e57': 'null:null',
    'c0-e58': 'null:null',
    'c0-e59': 'null:null',
    'c0-e60': 'null:null',
    'c0-e61': 'null:null',
    'c0-e62': 'null:null',
    'c0-e63': 'null:null',
    'c0-e64': 'null:null',
    'c0-e65': 'null:null',
    'c0-e66': 'null:null',
    'c0-e67': 'null:null',
    'c0-e68': 'null:null',
    'c0-e69': 'null:null',
    'c0-e70': 'null:null',
    'c0-e71': 'null:null',
    'c0-e72': 'null:null',
    'c0-e73': 'null:null',
    'c0-e74': 'null:null',
    'c0-e75': 'null:null',
    'c0-e76': 'number:0',
    'c0-e77': 'null:null',
    'c0-e78': 'null:null',
    'c0-e79': 'null:null',
    'c0-e80': 'null:null',
    'c0-e81': 'null:null',
    'c0-e82': 'null:null',
    # The argument object: one string, split across lines only for readability
    'c0-param0': (
        'Object_Object:{advNotiYn:reference:c0-e1, afterRn:reference:c0-e2, '
        'articleGb:reference:c0-e3, articleNo:reference:c0-e4, bbsDiv:reference:c0-e5, '
        'bbsGb:reference:c0-e6, bbsGbName:reference:c0-e7, bbsNo:reference:c0-e8, '
        'bdBDate:reference:c0-e9, bdBTimeE:reference:c0-e10, bdBTimeS:reference:c0-e11, '
        'bdDate:reference:c0-e12, bdEDate:reference:c0-e13, bdTime:reference:c0-e14, '
        'beforeRn:reference:c0-e15, bestItemYn:reference:c0-e16, brandCode:reference:c0-e17, '
        'businessCategory:reference:c0-e18, businessType:reference:c0-e19, buyQty:reference:c0-e20, '
        'ceoName:reference:c0-e21, commentNo:reference:c0-e22, communeGb:reference:c0-e23, '
        'compAddr:reference:c0-e24, compName:reference:c0-e25, compPostNo:reference:c0-e26, '
        'content:reference:c0-e27, contentNo:reference:c0-e28, credential:reference:c0-e29, '
        'delyMethod:reference:c0-e30, dir:reference:c0-e31, emailAddr:reference:c0-e32, '
        'fromDate:reference:c0-e33, giftHasYn:reference:c0-e34, homepageIntro:reference:c0-e35, '
        'homepageName:reference:c0-e36, homepageUrl:reference:c0-e37, inboundGb:reference:c0-e38, '
        'inboundNo:reference:c0-e39, insertDate:reference:c0-e40, insertId:reference:c0-e41, '
        'internetId:reference:c0-e42, itemCode:reference:c0-e43, itemName:reference:c0-e44, '
        'itemType:reference:c0-e45, itemType2:reference:c0-e46, keyField:reference:c0-e47, '
        'limit:reference:c0-e48, managerHp:reference:c0-e49, managerName:reference:c0-e50, '
        'managerTel:reference:c0-e51, modifyDate:reference:c0-e52, modifyId:reference:c0-e53, '
        'msale_code:reference:c0-e54, newItemYn:reference:c0-e55, openYn:reference:c0-e56, '
        'orderBy:reference:c0-e57, procDate:reference:c0-e58, procId:reference:c0-e59, '
        'procStatus:reference:c0-e60, progCode:reference:c0-e61, rank:reference:c0-e62, '
        'readCount:reference:c0-e63, recommCount:reference:c0-e64, regNo:reference:c0-e65, '
        'reply:reference:c0-e66, request:reference:c0-e67, rownum:reference:c0-e68, '
        'rsalePrice:reference:c0-e69, saveamt:reference:c0-e70, searchCon:reference:c0-e71, '
        'sitemCode:reference:c0-e72, sno:reference:c0-e73, snoList:reference:c0-e74, '
        'sort:reference:c0-e75, start:reference:c0-e76, title:reference:c0-e77, '
        'toDate:reference:c0-e78, totalCnt:reference:c0-e79, url:reference:c0-e80, '
        'useYn:reference:c0-e81, vodItemYn:reference:c0-e82}'
    ),
    'batchId': '1'
}
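Rather than typing all 82 entries by hand, the same payload can be generated from a dict of field names. This is a minimal sketch, assuming the DWR plain-call conventions visible in the captured request: each field of the argument object becomes a `c0-eN` entry in `type:value` form (`null:null`, `string:...`, `number:...`), and `c0-param0` ties each field name to its entry by reference. The `boolean:` prefix does not appear in this request and is an assumption, as are the hypothetical helper names `dwr_value`, `build_dwr_call` and `to_plain_body`. String values are inserted unescaped, as in the hand-written dict; values with special characters may need encoding.

```python
def dwr_value(value):
    """Encode one Python value in DWR's "type:value" notation."""
    if value is None:
        return 'null:null'
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return 'boolean:' + ('true' if value else 'false')
    if isinstance(value, (int, float)):
        return 'number:' + str(value)
    # Strings are passed through unescaped, like the hand-written params dict
    return 'string:' + str(value)


def build_dwr_call(script_name, method_name, fields, page, script_session_id,
                   batch_id=1):
    """Build the params dict for one DWR call whose single argument is an object.

    fields maps each property name of the argument object to a Python value;
    its iteration order decides the c0-eN numbering.
    """
    params = {
        'callCount': '1',
        'page': page,
        'httpSessionId': '',
        'scriptSessionId': script_session_id,
        'c0-scriptName': script_name,
        'c0-methodName': method_name,
        'c0-id': '0',
    }
    refs = []
    for i, (name, value) in enumerate(fields.items(), start=1):
        key = 'c0-e%d' % i
        params[key] = dwr_value(value)
        refs.append('%s:reference:%s' % (name, key))
    # The argument object points at each c0-eN entry by reference
    params['c0-param0'] = 'Object_Object:{' + ', '.join(refs) + '}'
    params['batchId'] = str(batch_id)
    return params


def to_plain_body(params):
    """Serialize params as newline-separated key=value lines."""
    return ''.join('%s=%s\n' % (k, v) for k, v in params.items())


if __name__ == '__main__':
    # Hypothetical, shortened field list for illustration only
    fields = {'bdBDate': '2018-01-01', 'bdEDate': '2018-01-01',
              'limit': 0, 'start': 0, 'title': None}
    params = build_dwr_call('TvDwrController', 'selectTvSchedule', fields,
                            page='%2Fmall%2Findex.htm',
                            script_session_id='E0C452A0E81994294E5E8241DAA72AB4655')
    print(to_plain_body(params))
```

For this site, `fields` would list every name from `c0-param0` in the same order, with `None` for the empty ones. Whether the server accepts a shorter field list than the browser sends is untested, so mirroring the captured request is the safer choice.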

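With these parameters the request returned the data I wanted. The response is a plain string, so a few re.findall regular expressions were enough to extract the fields.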
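The fetch-and-parse step can be sketched with the standard library alone. None of the following is confirmed by the post: the host in `DWR_URL` is a placeholder, the path follows DWR's usual `/dwr/call/plaincall/<scriptName>.<methodName>.dwr` pattern, and the reply is assumed to be in the DWR 2.x style where each returned object is written as `sN.field="...";` statements with `\uXXXX` escapes. DWR 3 replies use JavaScript object literals instead, so the regex would need adjusting there. `fetch_dwr` and `parse_dwr_objects` are hypothetical names, and only string-valued fields are extracted.

```python
import re
from urllib.request import Request, urlopen

# Placeholder host (assumption); the path follows DWR's plaincall URL pattern
DWR_URL = ('http://example.com/dwr/call/plaincall/'
           'TvDwrController.selectTvSchedule.dwr')

# Matches string assignments such as s0.title="..."; allows escaped quotes
FIELD_RE = re.compile(r'(s\d+)\.(\w+)="((?:[^"\\]|\\.)*)";')


def fetch_dwr(url, params):
    """POST params as a text/plain key=value body; assumes a UTF-8 reply."""
    body = ''.join('%s=%s\n' % (k, v) for k, v in params.items())
    req = Request(url, data=body.encode('utf-8'),
                  headers={'Content-Type': 'text/plain'}, method='POST')
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode('utf-8')


def parse_dwr_objects(text):
    """Group sN.field="value" assignments into one dict per sN variable."""
    objects = {}
    for var, field, raw in FIELD_RE.findall(text):
        # Undo JavaScript-style escapes such as \uXXXX and \"
        value = raw.encode('latin-1', 'backslashreplace').decode('unicode_escape')
        objects.setdefault(var, {})[field] = value
    return list(objects.values())
```

With the `params` dict from the post, usage would be `rows = parse_dwr_objects(fetch_dwr(DWR_URL, params))`, giving one dict per returned object.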

Reference link:
