## Problem
You want to post spam 500 times with a single click.
## Solution
' open "http://weibo.com/kevpp"
' play the macro 500 times
EVENTS TYPE=KEYPRESS SELECTOR=".input_detail" CHARS="Kev++到此一游({{!NOW:yyyy-mm-ddThh:nn:ss}})\n"
TAG POS=1 TYPE=A ATTR=TXT:发布
WAIT SECONDS=5
## Result
abc-spider
Kev++到此一游(2013-04-28T16:42:11)
10 minutes ago, via Sina Weibo
abc-spider
Kev++到此一游(2013-04-28T16:42:05)
10 minutes ago, via Sina Weibo
abc-spider
Kev++到此一游(2013-04-28T16:41:58)
10 minutes ago, via Sina Weibo
abc-spider
Kev++到此一游(2013-04-28T16:41:47)
10 minutes ago, via Sina Weibo
abc-spider
Kev++到此一游(2013-04-28T16:41:41)
## Links
- http://weibo.com/kevpp
- https://addons.mozilla.org/en-US/firefox/addon/imacros-for-firefox/?src=search
2013-04-28
How to post spam 500 times?
2013-04-24
How to scrape an obfuscated site? (Part 1)
## Problem
How do you scrape an obfuscated site (such as `www.hidemyass.com`)?
As you can see below, random HTML tags are injected into the markup. I have broken it up into multiple lines, with indentation, for readability.
You need to strip out the hidden elements to recover the real IP address (rendered in the browser as `88.200.222.238`).
> Some people (including me), when confronted with a problem, think
> “I know, I'll use regular expressions.” Now they have two problems.
s = '''<span>
<style>
.n8jQ{display:none}
.p1Qr{display:inline}
.E3lv{display:none}
.I0ja{display:inline}
.oRy_{display:none}
.FYOA{display:inline}
.oldO{display:none}
.NQ2o{display:inline}
</style>
<span class="n8jQ">54</span>
<span></span>
<div style="display:none">60</div>
<span class="p1Qr">88</span>
<span style="display:none">143</span>
<span class="oRy_">143</span>
<span></span>
<span class="n8jQ">160</span>
<div style="display:none">160</div>
.
<span style="display:none">41</span>
<span class="oldO">41</span>
<div style="display:none">41</div>
<span class="NQ2o">200</span>
<span class="I0ja">.</span>
<span style="display:none">27</span>
<span style="display:none">63</span>
<div style="display:none">63</div>
<span style="display:none">178</span>
<span style="display:none">191</span>
<span class="47">222</span>
.
<div style="display:none">34</div>
<span style="display:none">45</span>
<span class="n8jQ">45</span>
<span class="oldO">229</span>
<span></span>
<span style="display: inline">238</span>
</span>'''
## Solution
import re

def parse_ipaddr(s):
    # Normalize tags: treat every <div> as a <span>, then wrap bare text
    # (digits or dots followed by an opening tag) in a visible <span>.
    txt = re.sub(r'\bdiv\b', 'span', s)
    txt = re.sub(r'(?<=>)\s*([.0-9]+)\s*((?=<)(?!</)|(?=</span>$))',
                 r'<span style="display:inline">\g<1></span>', txt)
    # Extract the style sheet: map each class name to True if it is visible.
    css = {}
    l, r = s.find('<style>'), s.rfind('</style>')
    for i in s[l+7:r].strip().splitlines():
        m = re.search(r'\.(?P<key>[^{]+)\{display:(?P<val>none|inline)\}', i)
        if m:
            d = m.groupdict()
            css[d['key']] = d['val'] == 'inline'
    # Collect IP parts: keep spans whose class is visible (unknown classes
    # default to visible) or whose inline style contains "inline".
    ip_parts = []
    for j in re.findall(r'<span (class|style)="([^"]+)">([^<>]+)</span>', txt):
        if j[0] == 'class' and css.get(j[1], True):
            ip_parts.append(j[2])
        elif j[0] == 'style' and 'inline' in j[1]:
            ip_parts.append(j[2])
    return ''.join(ip_parts)
## Result
>>> parse_ipaddr(s)
'88.200.222.238'
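For comparison, here is a minimal sketch of the same idea without regex-based tag matching, using the standard library's `html.parser`. It is not the method used above, just one possible alternative. The names `VisibleTextParser`, `visible_text`, and the shortened `sample` string are made up for illustration, and the sketch assumes single-class attributes, one-line `display:none` rules, and no void elements such as `<br>`.

```python
from html.parser import HTMLParser
import re


class VisibleTextParser(HTMLParser):
    """Collect text a browser would display, skipping hidden elements."""

    def __init__(self):
        super().__init__()
        self.hidden_classes = set()  # class names declared display:none
        self.stack = []              # one visibility flag per open element
        self.in_style = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'style':
            self.in_style = True
            return
        a = dict(attrs)
        style = (a.get('style') or '').replace(' ', '')
        hidden = ('display:none' in style
                  or a.get('class') in self.hidden_classes)
        self.stack.append(not hidden)

    def handle_endtag(self, tag):
        if tag == 'style':
            self.in_style = False
        elif self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.in_style:
            # Remember every class whose rule is display:none.
            self.hidden_classes.update(
                re.findall(r'\.([\w-]+)\{display:none\}', data))
        elif all(self.stack):
            # Keep text only if every enclosing element is visible.
            self.parts.append(data.strip())


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts)


# Hypothetical, shortened input in the same shape as `s` above.
sample = ('<span><style>\n.a1{display:none}\n.b2{display:inline}\n</style>'
          '<span class="a1">54</span><div style="display:none">60</div>'
          '<span class="b2">88</span>.<span></span>200'
          '<span class="b2">.</span><span class="47">222</span>.'
          '<span style="display: inline">238</span></span>')
print(visible_text(sample))
```

Tracking visibility with a stack means text nested inside a hidden element stays hidden, which the flat regex pass does not have to consider for this particular markup.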
## Links
- http://www.hidemyass.com/proxy-list/
- http://regex.info/blog/2006-09-15/247
2013-04-11
Calculating the average age of top programmers
Tags: json
## Problem
What's the average age of StackOverflow top users?
## Solution 1
URL='https://api.stackexchange.com/2.1/users?order=desc&sort=reputation&site=stackoverflow'
curl -s "$URL" |
gunzip |
jsawk 'return $$.items' |
jsawk 'return $$.age' -a 'return $$.reduce(function(x,y){return x+y})/$$.length'
## Solution 2
URL='https://api.stackexchange.com/2.1/users?order=desc&sort=reputation&site=stackoverflow'
curl -s "$URL" |
gunzip |
jq '[.items[].age | select(.)] | add/length'
## Result
36.72
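The same pipeline can be sketched in Python with only the standard library. This is an illustration rather than a third official solution: `average_age` and `fetch_users` are hypothetical names, and the sketch assumes the response still carries an `age` field on each user, as it did when the shell versions were written. The sample data is made up so the averaging step can be checked without network access.

```python
import gzip
import json
from urllib.request import urlopen

URL = ('https://api.stackexchange.com/2.1/users'
       '?order=desc&sort=reputation&site=stackoverflow')


def average_age(payload):
    """Mean of the non-null `age` values under payload['items']."""
    ages = [u['age'] for u in payload.get('items', [])
            if u.get('age') is not None]
    return sum(ages) / len(ages) if ages else None


def fetch_users(url=URL):
    """Download and decode one page of users."""
    with urlopen(url) as resp:
        raw = resp.read()
    # Mirror the `gunzip` step; skip it if the body is not gzip data.
    if raw[:2] == b'\x1f\x8b':
        raw = gzip.decompress(raw)
    return json.loads(raw)


# Offline check with made-up numbers (not real API output).
sample = {'items': [{'age': 30}, {'age': 40}, {'display_name': 'no age'}]}
print(average_age(sample))
```

Calling `average_age(fetch_users())` would run the full pipeline against the live API. Like the `jq` version, users without an age are left out of both the sum and the count.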
## Links
- https://github.com/micha/jsawk
- http://stedolan.github.io/jq/manual/