Talks and Notes

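左耳朵耗子's View on Performance Reviews

The purpose of setting goals and performance measures is not to evaluate people, but to improve the results and efficiency of the organization and its people.

People are complex and their state fluctuates. You should never write a person off lightly; a performance review should assess the work, not the person.

The biggest problem with evaluating "values" is that it very easily escalates into matters of principle, is very easily turned into political infighting, and very easily stifles dissenting ideas. Frankly, to a large extent it is a form of brainwashing: controlling how people think by making them tense or afraid.

KPIs suit industries that use people as machines, while OKRs suit innovative industries where everyone is treated as a member of the company.

YF:
The yardstick used for evaluation must be a long one, in order to…

In a company, many new ideas come up every day, and not every one of them gets implemented: some people have no time to work on theirs, some ideas are pure flights of fancy, some run into other problems that make them infeasible, and sometimes the proposer only had a rough idea while the person who implemented it turned it into something remarkable.

If something is important, or simply must be done, put it in writing and give it a schedule; only then will it actually get done. Of course, once you are in a management role, do not write everything down, because a good part of it is the take-it-or-leave-it kind of work mentioned above. Give everyone freedom so they can play to their strengths.

So use OKRs, not KPIs.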

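Reading Instagram's Python Upgrade Article

As Instagram's user base grew rapidly, performance problems appeared anyway: the growth rate of the server count had gradually overtaken the growth rate of users.

So they decided to skip the clumsy asynchronous I/O implementations of Python 2 (poor gevent, tornado, twisted and friends), upgrade straight to Python 3, and explore what the standard library's asyncio module could offer.

At Instagram, the Python 3 migration had to satisfy two preconditions:

– No downtime: no service may become unavailable because of it
– No impact on the development of new product features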

Dropbox CEO’s speech

Bill Gates’s first company made software for traffic lights. Steve Jobs’s first company made plastic whistles that let you make free phone calls. Both failed, but it’s hard to imagine they were too upset about it. That’s my favourite thing that changes today. You no longer carry around a number indicating the sum of all your mistakes. From now on, failure doesn’t matter: you only have to be right once.

So that’s how 30,000 ended up on the cheat sheet. That night, I realised there are no warmups, no practice rounds, no reset buttons. Every day we’re writing a few more words of a story. And when you die, it’s not like “here lies Drew, he came in 174th place.” So from then on, I stopped trying to make my life perfect, and instead tried to make it interesting. I wanted my story to be an adventure — and that’s made all the difference.

And today on your commencement, your first day of life in the real world, that’s what I wish for you. Instead of trying to make your life perfect, give yourself the freedom to make it an adventure, and go ever upward. Thank you.

It took me a while to get it, but the hardest-working people don’t work hard because they’re disciplined. They work hard because working on an exciting problem is fun. So after today, it’s not about pushing yourself; it’s about finding your tennis ball, the thing that pulls you. It might take a while, but until you find it, keep listening for that little voice.

Fortunately, it doesn’t matter. No one has a 5.0 in real life. In fact, when you finish school, the whole notion of a GPA just goes away. When you’re in school, every little mistake is a permanent crack in your windshield. But in the real world, if you’re not swerving around and hitting the guard rails every now and then, you’re not going fast enough. Your biggest risk isn’t failing, it’s getting too comfortable.

Honestly, I don’t think I’ve ever been “ready.” I remember the day our first investors said yes and asked us where to send the money. For a 24 year old, this is Christmas — and opening your present is hitting refresh over and over on bankofamerica.com and watching your company’s checking account go from 60 dollars to 1.2 million dollars. At first I was ecstatic — that number has two commas in it! I took a screenshot — but then I was sick to my stomach. Someday these guys are going to want this back. What the hell have I gotten myself into?

They say that you’re the average of the 5 people you spend the most time with. Think about that for a minute: who would be in your circle of 5? I have some good news: MIT is one of the best places in the world to start building that circle. If I hadn’t come here, I wouldn’t have met Adam, I wouldn’t have met my amazing cofounder, Arash, and there would be no Dropbox.

And now your circle will grow to include your coworkers and everyone around you. Where you live matters: there’s only one MIT. And there’s only one Hollywood and only one Silicon Valley. This isn’t a coincidence: for whatever you’re doing, there’s usually only one place where the top people go. You should go there. Don’t settle for anywhere else. Meeting my heroes and learning from them gave me a huge advantage. Your heroes are part of your circle too — follow them. If the real action is happening somewhere else, move.

One thing I’ve learned is surrounding yourself with inspiring people is now just as important as being talented or working hard. Can you imagine if Michael Jordan hadn’t been in the NBA, if his circle of 5 had been a bunch of guys in Italy? Your circle pushes you to be better, just as Adam pushed me.

I was thrilled for him, but it was a shock for me. Here was my faithful beer pong partner and my little brother in the fraternity, two years younger than me. I was out of excuses. He was off to the Super Bowl and I wasn’t even getting drafted. He had no idea at the time, but Adam had given me just the kick I needed. It was time for a change.

Lecture by Turing Award Winner John Hopcroft

# revolutions

agricultural revolution 10000 BC

industrial revolution 1700 AD

information revolution 2015 AD

# jobs

there used to be elevator operators, but that job has disappeared; so will drivers

what if only 25% of the workforce is needed to produce all the goods and services?

we are living in a changing world; jobs in the future will require a sophisticated education well beyond that available today

# China’s education

# deep learning
many-layered network: the first layers learn that it’s an image, the later layers learn the style and content but lose the image pixels

SVM is a big advance, deep networks are a big advance, but we don’t understand deep networks

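If I Could Meet Myself Fresh on the Job (from 小道消息)

0. Improving your skills matters more than a raise, even though you are badly short of money. Who isn't short of money when they start working, unless they were born rich? Don't compare salaries with colleagues. "That idiot is technically worse than me, so why is he paid more?" Idiot, comparing won't make your salary go up any sooner.

1. Your boss is not a capitalist, and he or she is not exploiting you. "Work is exploitation" is probably the most career-damaging junk idea in the education that my generation, and several generations, received. Delete it from your head right now and never think of it again.

2. Mind your appearance. Even though you are an engineer, don't dress too sloppily: wear clean clothes, wash your hair every day, and shave.

3. Your colleagues are not fools and your customers are not idiots; you are the idiot. So be understanding about their mistakes: software your colleagues write will have defects, and if customers can't use it, that is because the software isn't good enough.

4. In front of a customer, you represent the company. Don't complain to customers that the company's product isn't good enough; that is just stupid.

5. Do small things well. Get the details of every small task right, review what you did poorly, and find the cause so you can improve. Trust me, at this stage you really aren't cut out for big things.

6. Build a good habit, such as keeping a daily work log and reviewing your day. After you change jobs you will find you are ahead of a lot of people. One good habit puts you ahead of most people.

7. From now on, everyone you meet may work with you someday: they may become your colleague, your subordinate, your boss, or your partner. Try to make a good first impression on them.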

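ad talks

Ad retrieval acts as prior knowledge; once that prior knowledge proves effective, the restriction can be loosened

Targeting is visible to advertisers, haha

Hard targeting and soft targeting..

If the slave stays in standby for a long time, will it turn out to be unusable when we actually fail over to it? Should we continuously monitor whether the slave is healthy?

Ad budget pacing depends on how closely the estimated CPM matches the actual CPM; delivery starts to slow down as spend approaches the pre-deducted amount.

PID is the same old approach, but the parameter tuning can now be done by machine from historical data, which is nice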
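To make the PID note concrete, here is a minimal sketch of how a textbook PID controller might drive pacing. Everything specific in it is an assumption for illustration and not something from the talk: the `PIDController` and `next_serve_rate` names, the idea of a "serve rate" between 0 and 1, the relative-shortfall error, and the gain values.

```python
class PIDController:
    """Textbook discrete PID controller (gains are tuning parameters)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt=1.0):
        """Return the control output for the latest error sample."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no slope is available on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def next_serve_rate(current_rate, target_spend, actual_spend, pid):
    """Nudge a hypothetical serve rate (0..1) toward the spend target.

    The error is the relative shortfall, so it is negative when overspending.
    """
    error = (target_spend - actual_spend) / target_spend
    return max(0.0, min(1.0, current_rate + pid.update(error)))


pid = PIDController(kp=0.5, ki=0.0, kd=0.0)  # illustrative gains only
# Overspending (120 spent against a target of 100) lowers the rate ...
slower = next_serve_rate(0.8, target_spend=100.0, actual_spend=120.0, pid=pid)
# ... and underspending (150 against 200) raises it again.
faster = next_serve_rate(slower, target_spend=200.0, actual_spend=150.0, pid=pid)
```

Tuning "by machine" would then amount to searching for `kp`, `ki`, and `kd` against replayed historical traffic, which this sketch does not attempt.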

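Reading Web Scraping with Python

#Chapter I Introduction

## Why write a crawler?

1. Every website should provide an API, but that is never going to happen
2. Even when an API is provided it is often rate-limited, so you may be better off finding the underlying endpoints yourself

Pay attention to what is already given (robots.txt and sitemap.xml)

1. robots.txt may contain traps
2. The sitemap may provide important links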
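As a rough sketch of using both files together, the snippet below pulls the `<loc>` links out of a sitemap and drops the ones robots.txt forbids. The sample robots.txt and sitemap contents, the `/trap` path, the `MyCrawler` user agent, and the two helper function names are made up for illustration; only `xml.etree.ElementTree` and `urllib.robotparser` are real standard-library modules.

```python
import xml.etree.ElementTree as ET
from urllib import robotparser

# Hypothetical sample data standing in for a real site's files.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /trap
"""

SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/view/1</loc></url>
  <url><loc>http://example.com/trap</loc></url>
  <url><loc>http://example.com/view/2</loc></url>
</urlset>
"""


def sitemap_links(xml_text):
    """Return the text of every <loc> element, ignoring the XML namespace."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter()
            if el.tag.rsplit('}', 1)[-1] == 'loc' and el.text]


def allowed_links(links, robots_text, user_agent):
    """Keep only the links that robots.txt permits for this user agent."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_text.splitlines())  # parse in-memory text, no network
    return [link for link in links if rp.can_fetch(user_agent, link)]


links = sitemap_links(SAMPLE_SITEMAP)
crawlable = allowed_links(links, SAMPLE_ROBOTS, 'MyCrawler')
```

Against a live site, `set_url()` plus `read()` would replace the in-memory `parse()` call.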

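## Estimating the size of a website

A simple way is to run a site:example.com search query, though this does not work well for large sites

## Identifying the technology a website uses

1. builtwith module

```
# pip install builtwith
import builtwith

builtwith.parse(url)  # url is the site to inspect; returns a dict
```

2. python-whois module
```
# pip install python-whois
import whois

whois.whois(url)  # url is the site to look up
```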

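## Downloader

A downloader needs to provide several features:

1. Retry on error: retry only when the returned error is a 500-class error; 400-class errors can generally be treated as unrecoverable pages
2. Spoof the UA (User-Agent)
3. Strategy
   a. Crawl the sitemap
   b. Iterate over IDs
      i. IDs may not be contiguous, for example when a record has been deleted
      ii. After n consecutive failed ID lookups, the iteration can be considered complete
4. Convert relative links; lxml's make_links_absolute function can do this
5. Handle robots.txt; the standard library's robotparser module can do this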

```
from urllib import robotparser  # Python 3; the module is named robotparser in Python 2

rp = robotparser.RobotFileParser()
rp.set_url('path_to_robots.txt')
rp.read()
rp.can_fetch('UA', 'url')  # returns True or False
```

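6. Proxy support
7. Download throttling, preferably with per-site granularity
8. Avoid crawler traps, especially the case where the last page links to itself
   a. Track link depth

Example: https://bitbucket.org/wswp/code/src/chpter01/link_crawler3.py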
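Two of the features above, retrying only on server errors and throttling per site, can be sketched without touching the network. This is not the book's code: `fetch` is a hypothetical callable returning `(status_code, body)` that stands in for a real HTTP client, and the `Throttle` and `download` names, the default retry count, and the injectable clock are all assumptions made for illustration.

```python
import time
from urllib.parse import urlparse


class Throttle:
    """Enforce a minimum delay between requests to the same domain."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock    # injectable so the class can be tested
        self.sleep = sleep
        self.last_access = {}  # domain -> time of the last request

    def wait(self, url):
        domain = urlparse(url).netloc
        last = self.last_access.get(domain)
        if last is not None:
            remaining = self.delay - (self.clock() - last)
            if remaining > 0:
                self.sleep(remaining)
        self.last_access[domain] = self.clock()


def download(url, fetch, throttle=None, num_retries=2):
    """Fetch url, retrying only on 5xx responses.

    `fetch` is a hypothetical callable returning (status_code, body).
    Returns the body, or None for any response with status >= 400.
    """
    if throttle is not None:
        throttle.wait(url)
    status, body = fetch(url)
    if 500 <= status < 600 and num_retries > 0:
        # Server errors may be transient, so try again.
        return download(url, fetch, throttle, num_retries - 1)
    # 4xx errors (and exhausted retries) are treated as unrecoverable.
    return body if status < 400 else None
```

Keeping the throttle keyed by domain means a slow site never holds up requests to a different one.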

#Chapter II Scraping

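## Ways to extract data

1. Regular expressions
   Not suited to matching page structure (structure-based), because whitespace and the like are irrelevant to the structure yet can break a regex
   Well suited when the data itself follows a pattern, such as IP addresses or dates (content-based)
2. XPath and CSS
   Suited to matching a page's structural information (structure-based). lxml's CSS selectors are translated to XPath internally, and CSS is far less flexible than XPath
3. BeautifulSoup: slow, and I have never seen it in production code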
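The structure-based versus content-based distinction for regular expressions can be seen in a small sketch. The HTML snippets, the `area` class name, and the deliberately loose IPv4 and date patterns are invented for illustration.

```python
import re

# Structure-based: tied to the exact markup, so harmless whitespace breaks it.
STRUCTURAL = re.compile(r'<td class="area">(.*?)</td>')
compact = '<td class="area">244,820</td>'
reformatted = '<td  class="area" >244,820</td>'  # same cell, extra spaces

structural_hit = STRUCTURAL.findall(compact)
structural_miss = STRUCTURAL.findall(reformatted)

# Content-based: the data has its own shape, whatever the markup around it.
# These patterns check the shape only, not whether the values are valid.
IPV4 = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')
ISO_DATE = re.compile(r'\b\d{4}-\d{2}-\d{2}\b')

html = '<td>  Last seen:\n 2016-03-01 </td><td>from 192.168.0.1</td>'
ips = IPV4.findall(html)
dates = ISO_DATE.findall(html)
```

The structural pattern matches the compact cell but returns nothing for the reformatted one, while the content patterns still find the address and the date.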

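The second step after downloading is to hand the fetched page to an Extractor that pulls out the content. This can be done by passing a callback to the download function, but that couples the two too tightly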

#Chapter III Downloader Cache

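* The book's cache stores every response, including 500 error responses; in practice it would be better simply not to cache those..
* The book's disk cache also folds the URL-normalization logic in here, which feels messy
* Note that a disk file cache is limited by the number of files allowed in a single directory, which is not large even on ext4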
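One way to act on these three notes is sketched below. It is not the book's cache: the `DiskCache` name, the pickle format, and the choice to spread files across sub-directories by hash prefix are assumptions introduced here. Server errors are skipped, URL normalization is left to the caller, and no single directory ends up holding every file.

```python
import hashlib
import os
import pickle
import tempfile


class DiskCache:
    """Cache responses on disk, sharded by URL hash (illustrative sketch)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir

    def path_for(self, url):
        # The caller is expected to pass an already-normalized URL.
        digest = hashlib.sha1(url.encode('utf-8')).hexdigest()
        # Two levels of up to 256 sub-directories keep each directory small.
        return os.path.join(self.cache_dir, digest[:2], digest[2:4], digest)

    def get(self, url):
        path = self.path_for(url)
        if not os.path.exists(path):
            return None
        with open(path, 'rb') as fp:
            return pickle.load(fp)

    def set(self, url, status, body):
        if status >= 500:
            return False  # server errors are transient, so do not cache them
        path = self.path_for(url)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'wb') as fp:
            pickle.dump({'status': status, 'body': body}, fp)
        return True


# Usage with a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    cache = DiskCache(tmp)
    stored_ok = cache.set('http://example.com/a', 200, 'hello')
    stored_err = cache.set('http://example.com/b', 500, 'oops')
    hit = cache.get('http://example.com/a')
    miss = cache.get('http://example.com/b')
    depth = len(os.path.relpath(cache.path_for('http://example.com/a'), tmp).split(os.sep))
```

Hashing the URL also sidesteps file-name length and illegal-character problems, at the cost of cache files that are no longer human-readable.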

#Chapter IV

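Estimating download time also matters: how long each link takes, and how long the whole run takes
A multithreaded download example that emulates a thread pool by hand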

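```
import threading
import time

def process_queue():
    pass  # pop URLs from crawl_queue and download them

# crawl_queue, max_threads and some_time are defined elsewhere
threads = []
while threads or crawl_queue:
    # drop the threads that have finished (iterate over a copy)
    for thread in threads[:]:
        if not thread.is_alive():
            threads.remove(thread)
    # start new threads up to the limit while URLs remain
    while len(threads) < max_threads and crawl_queue:
        thread = threading.Thread(target=process_queue, daemon=True)
        thread.start()
        threads.append(thread)
    time.sleep(some_time)
```

Performance does not grow linearly with the number of threads and processes but roughly logarithmically, because switching between them takes time and, in the end, bandwidth is the limit

#Chapter V Dynamic Content

## Reverse engineering the endpoints

Ajax-driven sites look more complex, but they are actually simpler because the data is separated from the presentation layer. Reverse engineering, however, does not readily yield a general method. How could one build a helper tool? It could mark which parts of the page are loaded dynamically, list the JS global variables, and list likely JSONP requests

When using an Ajax endpoint, exploit the edge cases, for example setting the search term to empty, to *, or to .

## Rendering dynamic pages

Use Qt, Selenium, or PhantomJS; at this point things like attaching cookies become serious problems

#Chapter VI Form Interaction

* Login forms often carry hidden parameters, such as a form_key that prevents duplicate submission, and may also require cookie validation
* Wow, you can load cookies straight from the browser with the browsercookie module

#Chapter VII Solving CAPTCHAs by Machine

Pillow and pytesseract can recognize CAPTCHAs, although tesseract was not designed for that

## A sharpening method

```
import pytesseract

gray = img.convert('L')  # img is a PIL Image of the CAPTCHA; convert to grayscale
bw = gray.point(lambda x: 0 if x < 1 else 255, '1')  # keep only pure-black pixels
pytesseract.image_to_string(bw)
```

Recognition accuracy can also be improved by restricting the character set

Human CAPTCHA-solving services are another option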