HJKL: April 2013

2013-04-28

How to post spam 500 times?



## Problem

You want to post spam 500 times with a single click.


## Solution

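    ' Open "http://weibo.com/kevpp" first,
    ' then play this macro in a loop 500 times.
    ' Type the message "Kev++到此一游" ("Kev++ was here") plus a timestamp, then a newline.
    EVENTS TYPE=KEYPRESS SELECTOR=".input_detail" CHARS="Kev++到此一游({{!NOW:yyyy-mm-ddThh:nn:ss}})\n"
    ' Click the 发布 ("Publish") link and pause before the next iteration.
    TAG POS=1 TYPE=A ATTR=TXT:发布
    WAIT SECONDS=5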
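
For readers who do not use iMacros, here is a rough Python sketch of the text that the `{{!NOW:yyyy-mm-ddThh:nn:ss}}` placeholder produces. It assumes that `nn` stands for minutes in the iMacros format string, which matches the timestamps in the result below; `make_message` is a hypothetical helper name, not part of the macro.

```python
from datetime import datetime


def make_message(now):
    # Mirror {{!NOW:yyyy-mm-ddThh:nn:ss}}: iMacros "nn" (minutes) is %M here.
    stamp = now.strftime("%Y-%m-%dT%H:%M:%S")
    return "Kev++到此一游({})".format(stamp)


# Build the text for one fixed point in time (value chosen for illustration).
example = make_message(datetime(2013, 4, 28, 16, 42, 11))
```

Because the timestamp changes every second, each of the 500 posts gets a different body.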

## Result

    abc-spider
    Kev++到此一游(2013-04-28T16:42:11)
    | 转发| 收藏| 评论
    10分钟前 来自新浪微博
    abc-spider
    Kev++到此一游(2013-04-28T16:42:05)
    | 转发| 收藏| 评论
    10分钟前 来自新浪微博
    abc-spider
    Kev++到此一游(2013-04-28T16:41:58)
    删除| | 转发| 收藏| 评论
    10分钟前 来自新浪微博
    abc-spider
    Kev++到此一游(2013-04-28T16:41:47)
    | 转发| 收藏| 评论
    10分钟前 来自新浪微博
    abc-spider
    Kev++到此一游(2013-04-28T16:41:41)

(Each post reads "Kev++ was here" followed by a timestamp. The links under a post are Delete, Repost, Favorite and Comment, and the last line says "10 minutes ago, from Sina Weibo".)

## Links

- http://weibo.com/kevpp
- https://addons.mozilla.org/en-US/firefox/addon/imacros-for-firefox/?src=search

2013-04-24

How to scrape an obfuscated site? (Part 1)


## Problem

How do you scrape an obfuscated site (such as `www.hidemyass.com`)?

As the sample below shows, random HTML tags are injected into each IP address. I have broken the markup into multiple indented lines for readability.
You need to strip out the hidden elements to recover the real IP address (displayed in the browser as `88.200.222.238`).

> Some people (including me), when confronted with a problem, think
> “I know, I'll use regular expressions.”   Now they have two problems. 

    s = '''<span>
        <style>
            .n8jQ{display:none}
            .p1Qr{display:inline}
            .E3lv{display:none}
            .I0ja{display:inline}
            .oRy_{display:none}
            .FYOA{display:inline}
            .oldO{display:none}
            .NQ2o{display:inline}
        </style>
        <span class="n8jQ">54</span>
        <span></span>
        <div style="display:none">60</div>
        <span class="p1Qr">88</span>
        <span style="display:none">143</span>
        <span class="oRy_">143</span>
        <span></span>
        <span class="n8jQ">160</span>
        <div style="display:none">160</div>
        .
        <span style="display:none">41</span>
        <span class="oldO">41</span>
        <div style="display:none">41</div>
        <span class="NQ2o">200</span>
        <span class="I0ja">.</span>
        <span style="display:none">27</span>
        <span style="display:none">63</span>
        <div style="display:none">63</div>
        <span style="display:none">178</span>
        <span style="display:none">191</span>
        <span class="47">222</span>
        .
        <div style="display:none">34</div>
        <span style="display:none">45</span>
        <span class="n8jQ">45</span>
        <span class="oldO">229</span>
        <span></span>
        <span style="display: inline">238</span>
    </span>'''

## Solution

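    import re

    def parse_ipaddr(s):
        # normalize tags: treat every <div> as a <span>
        txt = re.sub(r'\bdiv\b', 'span', s)
        # wrap bare text nodes (digits and dots) in a visible <span>
        txt = re.sub(r'(?<=>)\s*([.0-9]+)\s*((?=<)(?!</)|(?=</span>$))', r'<span style="display:inline">\g<1></span>', txt)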
    
        # extract style sheet
        css = {}
        l, r = s.find('<style>'), s.rfind('</style>')
        for i in s[l+7:r].strip().splitlines():
            m = re.search(r'\.(?P<key>[^{]+)\{display:(?P<val>none|inline)\}', i)
            if m:
                d = m.groupdict()
                css[d['key']] = d['val'] == 'inline'
    
        # collect ip parts
        ip_parts = []
        for j in re.findall(r'<span (class|style)="([^"]+)">([^<>]+)</span>', txt):
            if j[0]=='class' and css.get(j[1], True):
                ip_parts.append(j[2])
            elif j[0]=='style' and 'inline' in j[1]:
                ip_parts.append(j[2])
            else:
                pass
    
        return ''.join(ip_parts)

## Result

    >>> parse_ipaddr(s)
    '88.200.222.238'
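
The regex solution above works on this sample. As an alternative, here is a minimal sketch of the same idea using Python 3's standard `html.parser` instead of regexes over raw HTML. This is not the method used above. The names `VisibleText` and `visible_text` are made up for illustration, and the sketch assumes that the `<style>` block comes before the elements that use it, that each element has at most one class, and that the only hiding mechanism is `display:none`. The sample is a shortened version of `s`.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is not inside a display:none element."""

    def __init__(self):
        super().__init__()
        self.hidden_classes = set()  # class names declared display:none
        self.stack = []              # one flag per open element: hidden?
        self.in_style = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True
            return
        a = dict(attrs)
        style = (a.get("style") or "").replace(" ", "")
        hidden = "display:none" in style or a.get("class") in self.hidden_classes
        self.stack.append(hidden)

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        elif self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.in_style:
            # Remember every class whose rule hides it.
            self.hidden_classes.update(
                re.findall(r"\.([\w-]+)\{display:none\}", data))
        elif not any(self.stack):
            text = data.strip()
            if text:
                self.parts.append(text)


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Shortened version of the sample above.
sample = '''<span>
    <style>
        .n8jQ{display:none}
        .p1Qr{display:inline}
        .I0ja{display:inline}
        .oldO{display:none}
        .NQ2o{display:inline}
    </style>
    <span class="n8jQ">54</span>
    <span></span>
    <div style="display:none">60</div>
    <span class="p1Qr">88</span>
    .
    <span class="oldO">41</span>
    <span class="NQ2o">200</span>
    <span class="I0ja">.</span>
    <span style="display:none">27</span>
    <span class="47">222</span>
    .
    <span class="n8jQ">45</span>
    <span></span>
    <span style="display: inline">238</span>
</span>'''

print(visible_text(sample))  # 88.200.222.238
```

Tracking visibility with a stack means that text nested inside a hidden element is skipped as well, a case the flat regex does not cover.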

## Links

- http://www.hidemyass.com/proxy-list/
- http://regex.info/blog/2006-09-15/247

2013-04-11

Computing the average age of top programmers

Tags: json


## Problem

What's the average age of StackOverflow top users?

## Solution 1

    URL='https://api.stackexchange.com/2.1/users?order=desc&sort=reputation&site=stackoverflow'

    curl -s "$URL" |
      gunzip |
        jsawk 'return $$.items' |
          jsawk 'return $$.age' -a 'return $$.reduce(function(x,y){return x+y})/$$.length'

## Solution 2

    URL='https://api.stackexchange.com/2.1/users?order=desc&sort=reputation&site=stackoverflow'

    curl -s "$URL" |
      gunzip |
        jq '[.items[].age | select(.)] | add/length'

## Result

    36.72
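
The jq filter in Solution 2 can be written out in Python roughly as follows. This is only a sketch: `average_age` is a hypothetical helper, and the sample payload is made up to show the shape of the response (an `items` list whose entries may or may not carry an `age`), not real API output.

```python
import json


def average_age(payload):
    # Like jq's `select(.)`: keep only users whose age is present and not null.
    ages = [u["age"] for u in payload["items"] if u.get("age") is not None]
    return sum(ages) / len(ages)


# Made-up sample in the shape of the API response.
sample = json.loads(
    '{"items": [{"user_id": 1, "age": 30},'
    ' {"user_id": 2},'
    ' {"user_id": 3, "age": 40}]}'
)

print(average_age(sample))  # 35.0
```

Users without an `age` are dropped before averaging, so they do not count towards the denominator.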

## Links

- https://github.com/micha/jsawk
- http://stedolan.github.io/jq/manual/
