Startups

Study Notes on Netflix's Company Culture

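When I was at Toutiao, I often heard Yiming say "context, not control." At first I assumed he had coined it; only later did I learn that it was "stolen" from Netflix. At the time I didn't really grasp what it meant. It wasn't until later, at a startup, that it came to mind by chance and everything suddenly clicked. So I tracked down the complete Netflix culture deck, studied it, and took some notes.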

Values are what we value

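A company's values show in what it actually values. Almost every company has a set of lofty-sounding values, but words that are all polish on the outside are worthless.

  1. A company's real values show in which behaviors it rewards, whom it promotes, and whom it lets go.
  2. A company's values also show in the behaviors and skills its employees value.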

High Performance

Freedom & Responsibility

Context, not Control (My favorite part :p )

Highly Aligned, Loosely Coupled

Pay Top of Market

Promotions and Development

What are some profitable one-person online businesses?

Here is a list of examples I collected from Hacker News, posted by people who say they run a successful one-person online business. You can find the original threads at the end of this post.

Quiz Website

I run a popular quiz website. I make around $6,000 per month from Google AdSense. I work 2-3 hours a week, usually posting quiz links on my Pinterest page. My only expense is hosting, which is around $20 per month (DigitalOcean). I have never advertised my website, and it gets all of its traffic organically from Pinterest. For comparison, I’m an IT administrator in my day job and make $400 per month. I live in Ethiopia 🙂 I thought this might inspire my fellow HNers. Good day.

  • How do you manage to kickstart something like this? You mentioned you get your traffic organically via Pinterest, but there had to be something you did initially that set off that growth.

My website started five years ago. It didn’t get any traffic for the first three years, until one of my quizzes went viral. Now I have around 70k followers on Pinterest.

  • This is important. I have seen this a lot. Persistence. Many people keep pushing and pushing even if there is no positive feedback loop for a long time. After a while, they beat time. Kudos.

Cursive-IDE

I develop and sell Cursive (https://cursive-ide.com), which has paid my bills nicely for a couple of years now. Currently I make more than I made in my last job at Google. I never thought I’d be able to make a living selling developer tools, much less into a niche market, but I’m constantly amazed by how well Cursive does.

The work is a mix of fun and boring slog, like most jobs I guess. A lot of my time is spent on support, both technical and sales, so when I work less I actually end up getting more frustrated because a higher percentage of the work is not as fun as writing new features. I’ve also had a bad year of having to work around IntelliJ bugs, but normally I like the actual development work a lot. I have friendly enthusiastic users who constantly make my day. It’s a pretty sweet gig, and being able to decide how I spend my time, and which bits of my time I spend working, is priceless.

I got started during a sabbatical from my last job, just building something that I wanted myself. It turns out that lots of other people wanted it too.

Product Pix

I own Product Pix (https://www.proproductpix.org). It removes the background from product photos; the intended audience is mostly people who sell stuff online and need to set their product against a white background.

It makes $1300/month right now, up from $0 6 months ago. Living in the Bay Area, that would put me well below the poverty line if it were my sole source of income, so I’m not gonna call it "successful" just yet.

How I got started: I do machine learning, and I methodically searched for places where people buy a service transactionally on platforms like Fiverr and that I think can be automated away (or greatly automated with human reviewers in the loop) with state of the art machine learning models. There are hundreds or thousands of such opportunities that individuals can solve on their own.

I’ll be more comfortable giving sage advice once I’ve crossed the $10K/month threshold, but still I’d say a willingness to try a lot of shit out and get digging on stuff you have 0 familiarity with is mandatory. In this project I’ve had to learn JavaScript, frontend, photography, Google Ads campaign management, etc.

Another tip I wish someone had told me is, build a pricing page from day one. The temptation to get some signal you’re useful to people will drive you to offer stuff for free, but that will end up getting you a lot of unwanted attention from people who will never ever pay.

Instagram Posting Service

My business is in my bio, don’t want to link it here. Pays about the same as my previous job at Microsoft did, but with a lot less involvement — I haven’t touched the main code in about a year now. I probably spend about two or three hours a week on customer support, that’s it, really. No marketing spend, all word-of-mouth and Google.

The idea came about when I wanted to post to Instagram, but the API didn’t allow it. So I spent about a week trying to automate the process using a phone, with screenshot OCR and a state machine. After a lot of messing around with it, I had a working prototype. Made a website, added a $5/month Stripe plan to see if people were willing to pay for it, sent it to a few friends, posted it on Twitter, and eventually, people signed up and tried it out. It worked, then it didn’t work, then I fixed it, then it worked again, this went on and on for a few weeks until it became quite useable.

About two months in, local offices of Toyota and Samsung signed up, and they loved it, money wasn’t an issue. That was the moment I realized it may be worth doing it properly.

It grew organically, and I bought lots and lots of Android phones, which are simple workers getting jobs off a queue, and host them in two locations roughly. Phones last for about two years, then I buy new ones (<$100 a phone). Each phone pays for itself in less than a month, server costs are less than $200 a month.
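The phones are described above as simple workers getting jobs off a queue. A minimal sketch of that pattern, using threads to stand in for phones, could look like the following. The names `phone_worker` and `run_pool` are hypothetical, and the step where a real worker would drive a phone is replaced by recording the job; nothing here reflects the author's actual code.

```python
import queue
import threading


def phone_worker(phone_id, jobs, results):
    """Pull posting jobs off the shared queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: no more work for this worker
            jobs.task_done()
            break
        # A real worker would drive the phone here; this sketch only records the job.
        results.append((phone_id, job))
        jobs.task_done()


def run_pool(job_list, num_phones=3):
    """Distribute jobs across num_phones workers and return the completed jobs."""
    jobs = queue.Queue()
    results = []                 # list.append is thread-safe in CPython
    workers = [
        threading.Thread(target=phone_worker, args=(i, jobs, results))
        for i in range(num_phones)
    ]
    for w in workers:
        w.start()
    for job in job_list:
        jobs.put(job)
    for _ in workers:
        jobs.put(None)           # one sentinel per worker
    jobs.join()
    for w in workers:
        w.join()
    return results
```

Because every worker pulls from the same queue, adding capacity in this sketch just means starting more workers, which matches the "buy more phones" scaling described in the comment.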

Facebook tried to sue me after I filed for a trademark; we figured it out (I rebranded). It’s been going steady ever since, but I expect it to be shut down by yet another Instagram move sooner or later. Then again, I said that after 3 weeks of running it, and it’s been almost five years, I think.

I made it a point to not use any private Instagram APIs, like all my competitors did — instead, I don’t emulate the Instagram app, I emulate the person tapping the phone, and use only the official app for it. I think that let me survive this long.
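The prototype is described above as screenshot OCR plus a state machine that taps through the official app like a person would. A rough sketch of that idea follows. Everything in it is an assumption for illustration: the `Screen` states, the keyword matching in `classify`, the `TRANSITIONS` table, and the action names are hypothetical, and plain strings stand in for real OCR output.

```python
from enum import Enum, auto


class Screen(Enum):
    """Hypothetical screens the automation might recognize from OCR text."""
    HOME = auto()
    PICK_PHOTO = auto()
    CAPTION = auto()
    DONE = auto()


# Hypothetical transition table: current screen -> (action, expected next screen).
TRANSITIONS = {
    Screen.HOME: ("tap_new_post", Screen.PICK_PHOTO),
    Screen.PICK_PHOTO: ("select_photo", Screen.CAPTION),
    Screen.CAPTION: ("type_caption_and_share", Screen.DONE),
}


def classify(ocr_text):
    """Map recognized screen text to a state (stand-in for real OCR output)."""
    text = ocr_text.lower()
    if "share" in text:
        return Screen.CAPTION
    if "gallery" in text:
        return Screen.PICK_PHOTO
    if "posted" in text:
        return Screen.DONE
    return Screen.HOME


def plan_actions(ocr_frames):
    """Return the action chosen for each screenshot until the post is done."""
    actions = []
    for frame in ocr_frames:
        state = classify(frame)
        if state is Screen.DONE:
            break
        action, _expected_next = TRANSITIONS[state]
        actions.append(action)
    return actions
```

Deciding the next tap from what is currently on screen, rather than from a fixed script, is what lets this kind of automation recover when the app shows something unexpected; the sketch simply re-classifies every frame.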

Updown.io

I have run https://updown.io, a website monitoring service I created, since 2012. I work about 5-10 hours per week on it. It makes about $6,000 per month and is still growing linearly. I also keep a full-time job alongside for now as an engineering manager. The key for me is to take your time, make something useful, delight your clients, and not try to become Uber or Airbnb.

Sports App Business

I am mainly in sports apps. I think it is still possible to have success, but it requires a lot of patience. Don’t focus on the revenue part, and don’t try to build a new hype: there is a very slim chance you’ll build the next Angry Birds. Instead, try to build a product that is based on an already successful, specific category or product. It’s very important that you understand your customer and genuinely try to make a product that is better than the competition. You should love your own product. The good thing is that bigger companies tend to destroy their own products with too many ads, notifications, irrelevant features, etc.

Furthermore, I believe it’s important that your product’s content can be automated without too much manual work. After all, you are one person with only so much time. I know a guy who created a fitness diet app. He cooked and photographed more than a thousand meals. He wrote many articles. In the end he gave up. It took him 80 hours per week to maintain and update all the content. His app was making maybe 100 a month. I know another guy who created a successful Formula 1 live app. He uses paid data feeds and scrapes a lot of additional data. Everything is automated. He spends about 10 hours a week maintaining things and makes about 100k a month. It’s a similar story for a guy who created a popular weather app. In essence, the only thing they do is aggregate data and present it in a relatively simple app.

Also, don’t spend too much time on analytics, SEO, and other optimizations. It may take 2 years before you get traction anyway. First the product, then (if it’s worth it) the optimization.

One concrete product where I think you can still have success is a baby monitor that uses 2 phones. There are only a couple of good apps, all premium priced. Not too difficult technically. I don’t have time for it, so go for it 🙂

Page Flows

I run https://pageflows.com and have been living off it full time for a little over a year.

The business makes a bit more than what I was earning a few years ago as a junior developer in London, so it’s not a huge amount of money, but it’s enough.

It’s a fairly boring business to run and not as predictable or sexy as some sort of micro SaaS, but I’m happy with how things have been so far. Happy to answer any questions you have.

Most customers are indeed businesses. Great shout on some sort of team/business plan – it’s on my to-do list!

I’ve commented in a response below with the business model, but yeah, I just started trialling a freemium model yesterday, so I need to update the rest of the site with clearer pricing plans etc.

Until yesterday there was no freemium access, it was just paid up-front to access all the content. $39 per quarter or $99 per year.

Your use case is kinda where the idea came from, most product people do something similar. The hard part is adding enough relevant content on Page Flows for enough people!

WakaTime

Seven years ago I solo-started an automatic time tracker for programmers called WakaTime [1] and launched here on HN [2]. Partly from listening to developers too much, I waited way too long (almost a year) before adding a paid plan, but now it generates more MRR than an SF developer salary not including stock options. Technically I make more from RSUs and stock from past startups as a regular employee, but if I wasn’t lucky with those then it would be my highest income stream.

[1]: https://wakatime.com/about
[2]: https://news.ycombinator.com/item?id=6046227

For anyone thinking it’s egregiously difficult to start a solo-project: You’re right, but if you stick with it your persistence will pay off. For solo-products, I think grit is the deciding factor between success and failure.

There were several stages of MVP. The first usable version took a month and a half to build, and the public launch, with 2 IDEs supported, came two and a half months after I started building.

May 3 2013 – Started development Flask website & Vim plugin (https://wakatime.com/blog/1-why-i-built-wakatime)
June 25 2013 – Finished Vim plugin and Website (https://github.com/wakatime/vim-wakatime/commit/4346a055e301…)
July 1 2013 – Started Sublime plugin (https://github.com/wakatime/sublime-wakatime/commit/b7fe36f8…)
July 15 2013 – Finished Sublime plugin and public launch (https://news.ycombinator.com/item?id=6046227)

Unfortunately I don’t have WakaTime data from before I finished the Vim plugin, but for everything after that I can see how long the actual coding took by dogfooding.

ERP plugins

I have a one-person lifestyle business. I like it primarily because it gives me the flexibility to live anywhere in the world. I hated my old desk job and the idea of 2 weeks vacation every year.

I run a SaaS product that integrates with ERPs. I pretend to my customers that I have a team (so much so that I have multiple email addresses to people that don’t exist that actually just forward to me). One of our customers thinks they’re paying for a team of 6, but it’s actually just me.

My monthly billings last month were 73k USD. I am a tax resident of a tax haven, although I do live 3-6 months at a time in a different country.

The only advice I’d give anyone looking to build a lifestyle business is to keep your ambitions, and by extension your product’s feature set, in check. I know several other people who operate like me, and the common thread is we have businesses that can easily take VC funds, hire, and expand. But for lifestyle priorities, we chose not to.

A lot of people I’ve met (particularly in Chiang Mai, Thailand) copy popular, common, and easy online businesses such as drop shipping, social media XYZ, or coding. Unless you live in a really low cost area, it’s not a good life. The key is to have a very specific niche that can be scaled upwards if you want, but where you always have the option not to. Those are the ideas and businesses that seem to provide the ideal balance in lifestyle.

EDIT: The product came about at my last job, where I built it to make my own job easier. Essentially it did 95% of that job, which at the time enabled me to be the "best performer" while not actually working that hard.

Flowx.io

I develop Flowx [https://www.flowx.io], an Android weather app. It makes around $2,500 USD/month with about $500/month in costs excluding my time. It covers about 60% of my total costs including my time, which is 40+ hours a week. I cover my remaining costs through contract work. This might not seem like a success, but the business allowed us to move to Rarotonga, Cook Islands, from Auckland, New Zealand. Lifestyle-wise and building-a-business-wise, I think it’s a success.

Just an added note. I started Flowx as a side-project in 2012. In 2016, we moved to Rarotonga and we decided then to try to grow it into a business. It was making ~$100/month at that stage. Since then, it has grown to $2,500/month through added pro features and better subscription pricing.
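The figures in this comment can be unpacked with a little arithmetic. The sketch below assumes one particular reading: that "covers about 60% of my total costs" means revenue divided by total monthly costs (direct costs plus the value the author puts on his time). The function name `implied_costs` and that interpretation are assumptions, not something the comment spells out.

```python
def implied_costs(revenue, coverage, direct_costs):
    """Back out total costs, the monthly gap, and the implied value of time.

    Assumes coverage = revenue / total_costs, which is one reading of
    "covers about 60% of my total costs including my time".
    """
    total = revenue / coverage          # total monthly costs implied by the coverage
    gap = total - revenue               # what contract work would need to cover
    time_cost = total - direct_costs    # portion of costs attributed to the author's time
    return total, gap, time_cost


total, gap, time_cost = implied_costs(revenue=2500, coverage=0.60, direct_costs=500)
print(f"total ~ ${total:,.0f}, gap ~ ${gap:,.0f}, time ~ ${time_cost:,.0f}")
```

Under that reading, total costs come to a bit over $4,000 a month, leaving a gap in the region of $1,700 for contract work to fill.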

Stay focused

I sell freemium software that blocks distractions on your computer so that you can focus on doing work. Unlike my competitors, it’s a one-time payment business model.

The idea for my product first came to me when a friend in university had trouble staying focused on writing papers. He was constantly playing World of Warcraft and needed a way to temporarily block himself from playing the game. So I quickly made a little VB.NET app and service that would watch for the game executable and kill the process if it started. It did the job well enough and he ended up graduating 🙂
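The watch-and-kill idea described above can be sketched in a few lines. This is not the author's VB.NET code: it is a Python illustration in which the process list and the kill action are passed in as plain callables, so the example stays self-contained. The names `processes_to_kill`, `watch`, and `game.exe` are hypothetical; a real tool would query the operating system for running processes and terminate them through an OS call.

```python
import time


def processes_to_kill(running, blocked_names):
    """Return PIDs whose executable name is on the block list (case-insensitive).

    running is an iterable of (pid, executable_name) pairs.
    """
    blocked = {name.lower() for name in blocked_names}
    return [pid for pid, exe in running if exe.lower() in blocked]


def watch(list_processes, kill, blocked_names, polls=1, interval=0.0):
    """Poll the process list a fixed number of times, killing blocked processes.

    list_processes and kill are injected so the OS-specific parts stay outside
    this sketch.
    """
    for _ in range(polls):
        for pid in processes_to_kill(list_processes(), blocked_names):
            kill(pid)
        time.sleep(interval)
```

A usage example with fake data: `watch(lambda: [(101, "game.exe")], print, ["GAME.EXE"])` would call `print(101)` once, since the name match ignores case.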

At that point, some other students approached me and asked for my little app to help them study. That’s when, half-way through university (2010), I made a website for my app and had it available for free. I continued to maintain it and over 4 years, added more features including: blocking websites, adding breaks, scheduling, and passwords.

In 2014, I split the product into a free and paid tier. It wasn’t an easy decision, but I was spending a lot of time on it by this point and customer support was also starting to take a serious hit on my personal time. In about two years (2016), I was making more money from the paid product than my well paying government day job. So, I decided to quit my job and work on my business full time.

Although I felt it was risky, the alternative was passing up an opportunity many people dreamed of having. I never planned to start a business in the first place and I kind of felt/still feel imposter syndrome. For now, I’m just enjoying my new found freedom and continue to be thankful for my new job. I’m going to keep it a lifestyle business for now, but I wouldn’t be opposed to selling it as my exit plan.

I’ve spent (effectively) $0 in advertising since developing it and I’d say my customers come from organic search, external links, and word-of-mouth.

browserless.io

I run a headless browser service called browserless.io. Got started due to lack of a comparable service, and all others seemed more geared for testing.

It’s been around two years now, and makes more than any prior engineering job I’ve ever had. You do have a lot of other stresses you might not otherwise have, but you’ll also work a lot less than at a traditional job!

I’m working on a few interviews for some sites, which go more into the details, and will post here when they’re done.

EDIT: feel free to comment here on anything or email me at joel at browserless dot io

Website builder

I created a SaaS website builder for a small niche market. I’ve been running it for about 8 years. I gross a bit over $14,000 per month with about $500 in expenses for servers and third-party APIs. I work 1-2 hours per week answering customer support emails. Basically, I automated away my old job as a web designer 🙂

The smartest decision I made was targeting a small niche market that larger businesses wouldn’t bother with. I often get kind emails from customers thanking me for helping their industry. I kept things simple, didn’t add features unless I really believed customers needed them, and didn’t try to generalize the solution. I think those are the main reasons why the product worked.

By far the hardest part was/is marketing. I’m still bad at it. I’ve tried many things. Most failed or were too hard to sustain. Some succeeded, like Facebook ads, but those successes were often hard to recreate. At this point it’s mostly word of mouth.

Working alone can be psychologically challenging. When I have a problem, there’s no one to help because no one else knows how the platform works. With no one to bounce ideas off of, it’s easy to get stuck in a rut going round and round the same set of possible solutions. And I really have to monitor myself to ensure that I don’t get too isolated. This was an issue in the early years, but now I have a routine that gets me up and out and into the world every day. I would strongly advise anyone considering the solo route to carefully consider the social and mental health aspects of working alone.

I feel very grateful to my former self for doing the hard work that pays my bills today. And I’m tremendously grateful for open source tools and resources like Stack Overflow without which I would never have made it this far alone.

Meme T-shirts

I ran a Shopify site selling meme shirts for 3 years.

You might recognize classics such as "Legalize 4Loko 2020" and "BREAD" as featured in Elle magazine.

All on-demand printing. Order goes through Shopify’s API to the supply center, order gets fulfilled, shipped. No inventory. Kinda pricey, but zero maintenance. Set and forget.

Find the most extremely dank and niche memes possible so you hit the little nugget inside of someone’s brain that makes them want to spend $15-30 on a t-shirt.

A good print would net me somewhere like 300 orders a month. A sweatshirt could go for $50-60. You have options.

ML golf prediction

I made https://www.golfforecast.co.uk – an ML algorithm to predict golf.

After 5 years it’s making enough from subscriptions for me to live off (3K GBP/mo). The algorithm is always a work in progress, but it’s seeing consistent returns now, so I’m making money from that too 🙂 Plus it makes golf a lot more entertaining.

IPINFO.IO

I started https://ipinfo.io as a side project, and then ran it fulltime as a one-person SaaS app for over a year. We’re now a team of 8, profitable, and growing quickly. We’re still 100% bootstrapped, and I have zero plans to raise any outside funding.

We started with a simple IP geolocation API, which now handles over 20 billion API requests per month. We’ve added new data to that service, such as IP type classification (hosting, isp, or business, and soon education too), IP to company, and carrier detection. And we’ve also launched some other products, like hosted domain API (all domains hosted on an IP, sometimes called reverse IP), IP ranges belonging to an organization, and an ASN API. We’ve got a lot in the pipeline too, including some domain related offerings (see https://host.io for an early preview).

So it’s definitely possible 🙂 What sort of SaaS product are you thinking of launching? Would be happy to chat! Shoot me an email at ben@ipinfo.io

NanaGram

I’m working on NanaGram (https://nanagram.co) solo and bootstrapped. Although I’m not making a full-time income yet, it’s generating a profit. It’s mostly automated.

NanaGram is the 3rd greatest generator of happiness and fulfillment in my life (after my wife and my dog). I get a constant stream of good vibes from customers, most recently voicemails from grandmothers! (https://nanagram.co/blog/feedback-by-vm)

Good luck 🙂

PhantomJSCloud

I run https://PhantomJsCloud.com

I started it as a free MVP about 2 years ago while in Thailand, and given that I was attracting a slow but steady stream of users, I decided to build out a commercial v1 from it.

The freemium SaaS went live in March and it’s growing monthly. If I still lived in Thailand I would consider it very successful, but I am in the Seattle area now so it’s ramen profitable.

The biggest surprise I got was how slow organic growth is. Every month I gain more users + MRR, but discovery seems to be the biggest problem. I tried Google AdWords in June, but Google decided to charge me upwards of $5/click for basic keyword targeting, so I gave that up. I tried AdWords again in November and now Google thinks I’m more relevant, so I pay starting at $0.20/click for the same keywords that cost $5/click 6 months previously. I am currently doing experiments to see if the acquisition cost justifies that spend.
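Whether a given cost per click is worth paying depends on how many clicks turn into paying customers. A small sketch of that calculation follows; the 2% conversion rate and the $60 customer value used in the example are purely hypothetical numbers chosen for illustration, not figures from the comment, and the function names are made up for this sketch.

```python
def cost_per_customer(cost_per_click, conversion_rate):
    """Average ad spend needed to acquire one paying customer."""
    if not 0 < conversion_rate <= 1:
        raise ValueError("conversion_rate must be in (0, 1]")
    return cost_per_click / conversion_rate


def spend_is_justified(cost_per_click, conversion_rate, customer_value):
    """True if a customer's expected value exceeds the cost of acquiring them."""
    return customer_value > cost_per_customer(cost_per_click, conversion_rate)


# Hypothetical: 2% of clicks become paying customers, each worth $60 over time.
for cpc in (5.00, 0.20):
    cac = cost_per_customer(cpc, 0.02)
    print(f"${cpc:.2f}/click -> about ${cac:.0f} per customer, "
          f"justified: {spend_is_justified(cpc, 0.02, 60)}")
```

With those assumed numbers, the $5 clicks cost far more per customer than the customer is worth, while the $0.20 clicks come out comfortably ahead, which is the kind of comparison the experiments mentioned above would be settling with real data.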

From an effort perspective, the SaaS API + backend itself was about 50% of the effort. The subscription service + user dashboard was the other 50%.

From a skills perspective, I think doing a SaaS as a solo founder is only practical if you have extremely broad skillsets: Business management, UX, full-stack webdev, devops, sales, marketing, support. Thankfully I have some experience in all those (except sales) so I was able to either do or fake everything required. If you don’t have all those skills, you are going to be increasingly reliant on luck, which isn’t a winning strategy.

I solicit users to email whenever they have a question/comment/issue and reply to everything. Overall I think I have provided email support to approx. 50% of my paying customers, and maybe half of the support was provided before they decided to pay, so it is very important 🙂

It’s actually a really great way to understand your customers’ needs and your product’s actual (in the eyes of the user) deficiencies. I also use UserVoice to help the highly-desired features/requests to "bubble up", but if a customer asks for something and it’s an easy enhancement, I go ahead and implement it. Likewise, if the same problem is annoying a bunch of people, I need to either document a workaround or make it easier.

Yes, "inbound marketing" (a blog) is probably the biggest accelerator to growth I can (and should) do. I’m holding off for now though, as I need to make the product more friendly to business users first. Right now PhantomJsCloud is focused on developers, so I need to make some non-dev friendly tooling first. That’s my excuse at least.

Regarding StackOverflow, yes, that’s actually how I validated the free MVP (answering SO questions and, if my product might be beneficial, providing a link to my product), but generally those traffic sources don’t seem to scale very well past MVP validation. I haven’t tried Quora though; I will add that to my todo list 🙂

SignalBox

I’m running https://SignalBox.ai alone. I wrote all of the software and am working on partnering and sales right now.

I also have 2 earlier startups, one in media monitoring and one in forex.

The media monitoring is B2B only. The forex trading is automated and run from my home research cluster.

Both are generating enough revenue to live off (media monitoring 120k, forex 60-80k).

I guess they fit the definition of solo founder and online, but they have no public facing websites (except SignalBox)

EDIT: I also run a slack group for Solo Founders, If you would like an invite, please email me

  • Sales are a big struggle for me. Where did you find these partnerships?

Network. Go to the meetups.
Don’t rely on serendipity, we can do better than that. Use your programming skills.

Pull the meetup list, get all of their Twitter profiles, search everyone’s last 1000 tweets for topics you are interested in. Pull all of their code on GitHub. Push it through the profiler and find the talent.

Mirror GitHub if you have to. Pull the whole darn thing; it’s only a couple of hundred gigs (if you don’t pull the code). Profile everyone based on their stars, contributions, watchers, and pull requests.

How many other meetups do they go to? What’s their history like on other forums?

Put the pics of these people on your phone, and then go and find them at the meetup. Pull their customer lists / testimonials and any other publicly available data.

Look at their company DNS records. Pull their company filings if they’re available. Know their revenue, know their customers. Who’s making the decisions at this company? Who is signing the cheques?

Scientia potentia est

dropshipping guitars on shopify

I run an ecommerce store from Shopify which fulfills the orders by drop-shipping through AliExpress.
This is definitely doable for one person, and it isn’t technically challenging for a software developer–but the hardest part (at least for me) is marketing, creating content, advertising, and so on.

Actually running a Shopify store and fulfilling by drop-shipping is simple. I would definitely recommend that as a good place to start, one person can do it.

There are a few good ways, but it really helps if you know the products well. For me, my site sells guitar parts and DIY kits. And I’ve been playing guitar since I was about 10 years old, so that helps a ton.
I had a few other stores before this that didn’t sell well at all, and I have to say that’s because I just didn’t know the products, or what the end users really wanted/needed/cared about.

Great ways to pick products:

  • Terapeak (http://www.terapeak.com/), but this is paid
  • eBay completed listings
  • Or, simplest of all (and what I use): once you know your products, search AliExpress and sort by "best-selling". That’s my go-to.

Feel free to check my store for ideas (or if you want to buy something!). URL is: http://modshop.guitars/

Pinboard

I run Pinboard, $257K in gross revenue for 2016. A ton of money for one person, not quite enough for two people.

  • Do you have a writeup on your marketing?

My marketing is, I spend all my time talking smack on Twitter.

BugMuncher

I run BugMuncher (https://www.bugmuncher.com), it started as a side-project 5 years ago, then in September 2015 I packed in freelancing to focus on BugMuncher full time.

As of November 2016 BugMuncher reached profitability – ie: it’s my sole source of income, and covers all of my living expenses.

price comparison

I run https://www.fortsu.es (also https://www.fortsu.co.uk, https://www.fortsu.de and https://www.fortsu.com), a price comparison website for running shoes. The original site focuses on the Spanish market, and I’m expanding into other interesting ones.

It started as side project some years ago when I wanted to buy running shoes online and it has been improved over the time. To-Do list never ends 😉

At the beginning it was basically word of mouth and niche related forums on the internet.
Then I started reading about SEO and advertising. Organic search more or less works, but I got almost no traffic from a couple of banners I ran on related pages over a few months. I didn’t try advertising networks like AdWords.

I don’t see big brands as my competition. I have partnered with some, but I don’t think they sell much on the internet (typically higher price tags) compared to fully equipped city centre stores.

iOS app

I started selling macOS (and now iOS) software on my own website (https://clickontyler.com) back in 2007. My original goal was to earn enough money to refinish the hardwood floors in my house. Since then, however, it’s taken on a life of its own and become a suite of three main products. It enables me to live comfortably in the Nashville suburbs.

webhosting reviews

1 man startup – http://reviewsignal.com/webhosting/compare

I do web hosting reviews. Not the scummy pay-for-placement stuff you see, but an actual review site. It tracks what people are saying about hosting companies on Twitter and publishes the results.

The story is told a bit here: http://techcrunch.com/2012/09/25/web-hosting-reviews-are-a-c… I was just tired, after 10 years, of still relying exclusively on my experience and the experiences of people I knew. I figured there must be a better way, and I had been working with Twitter data for my thesis and saw this opportunity.

selling online

I started selling online, total sales so far over $300k. Multiple sources, some retail, some wholesale.
What I’ve learned:

  1. Not all rules matter. A large part of my business is stretching certain rules, either from the marketplace, or from the source (e.g. a store that doesn’t allow resale). That said, you can’t get away with breaking rules unless you have a very good understanding of why the rule exists, who’s motivated to uphold it, and generally what the risks are. Don’t screw over customers.

  2. There’s a lot more to be made by taking risks than there is to be lost. I’ve easily lost over $1k multiple times in various ways, but when I "win" it’s to the tune of 10 or 30 times that. Take smart risks, only where the realistic upside justifies it.

  3. Be willing to pay for information. There are courses out there on almost any topic. Personally I’ve largely carved my own path and paid very little, but I’d still recommend courses for others. Also read a lot of whatever free information is out there, and network with people who have more experience.

  4. Don’t do too many things at once. It will kill you. I’ve been full time in college and it’s extremely tough to balance everything. Delegate as soon as you can afford to: pay people to do anything that doesn’t take a lot of brains.

  5. Don’t be afraid to scale, but do it slowly. My first purchase of over 10k was 6 months after I started, iirc.

(Several of these are probably specific to this kind of business, may not be generally applicable. Startups have a much different road where profitability isn’t the most important at first.)

selling open source software to government

I was working for Automattic after an acqui-hire thing. After a year there, I found that I missed working in security. I found a full-scope penetration testing gig three blocks from my apartment.

In my spare time, I started to tinker with a few ideas and released them as an open source project. Said project saw a lot of interest within the hacker community very quickly. I didn’t expect this. Folks formed an opinion on it pretty quickly. Some people hate it. Others love it. Of those who know it, very few are in-between.

I left my pen testing job with a decent amount of money saved up. I didn’t know exactly what I would go and do afterwards. I spent some time tinkering with Android, just for giggles.

I was very reluctant to start a business that used my "successful?" open source project. Partially because it leverages another open source project owned by another company.

I was at a conference in 2011 and someone from a US government agency asked if I was selling anything. I said no. He said that was too bad, because he had end of year money, and he liked my open source stuff. It was then that I decided to look at expanding my open source kit into a commercial product.

April will mark the two year anniversary of my first customer. My customers are well known organizations and they trust my software to assess how well they protect their networks. I’m constantly in awe of this.

website counter

https://www.improvely.com and https://www.w3counter.com

Five figures a month, just me, I’ve written about my solo business a couple times in other Ask HN threads. Ten years ago (almost to the day), in my college dorm, I was looking at the Webalizer web stats report my web host provided for my blog, and thought "I could do something much cooler than this". So I did. I had built a few educational sites and threw some ads on them for a couple years before that, but W3Counter was the first service I actually charged a subscription for, and now I make a living building and selling this stuff.

vintage computer hardware

I don’t know exactly how you define "successful online business," but I am currently a university student making $500 – $2000 a month at about 5 to 10 hours a week.
Basically, there is a market for vintage computer hardware, so I post some ads offering to take away old office items that companies can’t just throw away, such as old keyboards, terminals, etc. They pay me a nominal fee ($1 – $5 per item, depending) to rid them of their "trash". I then clean those items up a bit and resell them at extremely high profit margins: $35 – $120 for 20 minutes of work (since I was paid to take away the trash).

One of the things I did was sold Model M keyboards which I made USB compatible: http://austingwalters.com/keyboards/

Another way I make money is by tutoring or helping out with programming. I used to help out local people, but I have since switched over to Google Helpouts. Usually, it’s just explaining some algorithms and writing some C code. Pretty easy, no real upkeep, and I can set whatever hours I want.

pinegrow

Just launched Pinegrow Web Designer (http://pinegrow.com) two months ago. The company is actually run by my wife and me, but I do all the work with Pinegrow while she is taking care of our other projects.
Pinegrow has been paying most of our bills since launch and I have a lot of expansions in the pipeline: full support for Foundation alongside Bootstrap, developer edition that’ll work with templates, a similar app for designing emails…

cramfighter

I run a small business called Cram Fighter (http://cramfighter.com) that is targeted at students (mostly medical) who are preparing for standardized exams. I got the idea after watching my wife prepare for her board exams, and it seemed like a perfect little project for learning iOS programming. Initially my goal was to earn maybe $5k annually, but now I’m on track to surpass my salary as a senior developer by next year.
You’ll find a lot of one-person businesses targeting tiny, but profitable, niches like mine. What’s great about it is that often when you find a tiny opportunity, it opens up a lot of other problems that need solving that you would never find otherwise. It’s also a great way to learn the skills of running a business in a relatively stress-free way (at least compared to running a startup).

The only downside is if you’re anything like me, you’ll get antsy working on small projects and yearn to tackle bigger, more ambitious problems. Sometimes 1-person companies have the potential for turning into a company with startup-like growth, sometimes not. I’m still trying to figure out how far I can take my company.

office snapshots

I run http://officesnapshots.com which publishes photos of office design projects from around the world.
I started it in 2007 as a side project to teaching history. I’m no longer teaching and it is the majority of my income.

laptop battery meter

I sell a laptop battery meter (http://batterybarpro.com). It’s not income replacing; it makes about $1,000 per month, but it’s been crucial in saving enough for down payments on two houses.
I’ve tried to get the revenue numbers up, but I’ve never been able to break a $2,000 month.

robots everywhere

http://www.robots-everywhere.com I used to employ two people, but I automated them away. I am successful in the sense that I have clear title to my home at age 33, if that counts.

excel version control

I run https://www.spreadgit.com, a hosted version control system for Excel. Doing this solo and full time. It’s been a hell of a ride so far but I love it.

write a book

I wrote a book that generates about $2k of revenue per month. Not quite your definition of success, but it’s given me a taste. I’m now in the beta testing process for my next thing.

smart shooter

I develop and sell Smart Shooter.
http://kuvacode.com

It’s a traditional desktop app (Windows, Mac), but only sold online via our own website or the Mac App Store. I created it about 4 years ago, and work on it solely in my spare time. In fact I’m employed full time at a major tech company, but I keep this separate.

To claim it’s profitable is a bit misleading, because of course the major cost in developing such software is my own time. I’ve incorporated as a limited company here in Finland but do not pay myself a salary, so the only costs to the business are web hosting and occasional hardware purchases (computers, cameras).

I started this as a project for personal interest; at the time I was working as a software engineer developing financial trading software. Smart Shooter was a good way to develop something that covered both my interests in graphics programming and digital photography, to alleviate the boredom from my day job.

So for me it’s been successful: it’s still a pleasurable hobby, gives me an excuse to play around with the latest cameras, and brings in some pocket money. It doesn’t generate enough revenue that I could quit my main job, but the possibilities could be there if situations change.

payment gateway

I am running a complete payment gateway that supports VISA and MasterCard and mobile payments by SMS.
The name of the service is: https://www.bizify.me

For an introduction to the service: https://www.bizify.me/hacker-news/

asterisk consulting

I sell Asterisk reporting sw for win @samreports.com. It makes about $1000 a month in revenues. I also work as iOS developer for the man. I have a free iOS app on the AppStore (HRTecaj), soon to be commercial, when I add ATMs. I was Asterisk integrator, and learned a lot about the system, made software to present call reports in customisable and pleasant way. SAMReports has been selling, consistently, for 4 years. I made a few updates, but now I’m working on a major update.

sell books

I run http://lsathacks.com, and have a related book series
I sell e-books on my site and through affiliates, and sell print books on amazon. All told I make around $3,000 a month in passive revenues. I also make $4000-$5000 more in tutoring revenues.

However, the site is fairly new (I just sold the books through affiliates/print previously). As I grow the site I expect I may be able to get over $10,000 per month passive.

The LSAT is an admission exam for American and Canadian law schools. My materials/lessons teach people how to do better on it.

radio community

I own and operate http://www.radioreference.com and http://www.broadcastify.com. I do all the development, business management, and support.
I have a team of community volunteers that do a lot of day to day moderation and member management.

I got started simply building a set of community resources for the radio communications and hobbyist market.

We’re very profitable and these businesses provide the majority of my family’s income.

electrician calculator pro

I developed the Electrician Calculator Pro, a National Electrical Code compliant calculator for engineers, electricians, lighting designers, etc:
http://www.electriciancalculator.com

I first created the Android version about 3 years ago, then the iOS version about 1 year ago. It currently makes just enough to cover some bills, although I believe it has a greater potential. I’m currently looking for ways to make this a recurring revenue stream instead of a one time payment gig.

vlad studio – wallpapers

I’ve been running http://www.vladstudio.com (where I publish my wallpapers and other stuff) for several years, and for quite some time, it was my primary source of income. Unusual, because my premium accounts are not really a "product", but just a way to "like" or "donate".

Ref

  1. https://news.ycombinator.com/item?id=21332072
  2. https://news.ycombinator.com/item?id=19701783

Baelish: An Introspection II

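I built an entire system completely on my own and came away with plenty of gains and lessons. I am writing them down quickly before I forget.

Premature optimization

The service I first had in mind was far too big: I wanted to build an all-encompassing, super-impressive thing. So from the very start I split it into many repos, turned every module into its own microservice with RPC calls in between, and built a pile of different images on every release, which also made deployment painful. The real problem was that I did not understand the reasoning behind the architecture and copied it mechanically, a blind imitation.

Splitting libraries was a big problem. Early on I kept wanting to pull code out into a foundational component library, ended up with lots of libraries, and even split the proxy and extraction code into their own libraries. In practice there is no need to keep the code that pure; this is a mistake I make often. It made image builds painful, forced me to switch between repositories constantly, and meant installing the same dependencies many times over. It is not too late to extract a component when someone else actually needs to reuse it; splitting things as finely as npm does is not good either.

Even when splitting into separate repositories does make sense, there is no need to build many images. When one person maintains several images, it is easy to forget what changed in each version of each image.

Another problem was classic "premature optimization". Early on I abstracted many parts that merely held state and did CRUD into separate services. Wrapping them behind one interface and reading from redis would have been fine; with monitoring in place, it is not too late to optimize once redis can no longer keep up. In the early days of a project a monolith is perfectly good: extract what needs extracting, and leave everything else where it is. Again the problem was not understanding the reasoning and copying architecture mechanically. I had read a thousand articles and still could not get an architecture right.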

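Forcing in freshly learned techniques

This mainly showed up in the Frontier and later the Scheduler. For a targeted crawler a Frontier is not required at all, so building one was wasted effort. The Scheduler did not need the token bucket algorithm either; a heap is the best fit. That said, a token bucket or leaky bucket is still necessary. I also overthought this part: a single-node deployment was actually enough. A single master with multiple slaves looks like a single point of failure, but it really is the simplest and most efficient model.

Technology selection

The selection problems were mainly in the message queue, the logging service, and the container platform. The monitoring choice, by contrast, was right. The monitoring choice was completely wrong; Prometheus is the only correct answer. The underlying problem, again, was stitching concepts together without a complete theoretical framework.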

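Message queue

~~At first I confused caches with queues: the crawler's different tasks need to go into separate caches rather than straight into a single queue, which cannot be scheduled.~~ The issue here was that my understanding of message queues was not deep enough.

For the queue I first tried rabbitmq. But rabbitmq has no good Python client: the officially blessed one, pika, sits at too low a level of abstraction and offers only a very raw wrapper. rabbitmq itself was also very unstable for me; it would die for no apparent reason without leaving any exception logs. I lost at least half a month to rabbitmq.

Then I tried redis stream, since I was already familiar with kafka's concepts and redis itself is fairly stable. It still burned me badly. redis stream had only just been released and the Python client did not support it yet, so I had to parse some responses myself; that took a lot of time and the result was still not very stable. redis stream borrows kafka's concepts but differs in many places, and some behavior was not clearly specified, so the implementation ended up riddled with small bugs. Most importantly, having redis implement a kafka-style API is fundamentally heading in the wrong direction: kafka can let consumer groups replay messages because it accumulates messages well on disk, whereas redis, as an in-memory database, is bound to be poor at accumulating messages. Merely imitating kafka's API is pointless.

What I use now is celery, and it still has problems. celery offers too few concurrency models; only prefork and gevent are barely usable, and gevent in turn causes severe memory leaks. A crawler needs a large number of concurrent requests, so celery became a bottleneck. The retry mechanism for failed tasks is also unclear in celery: it wraps things in so many layers that catching exceptions became a real problem, and since we cannot retry forever, some tasks were permanently dropped after reaching the maximum number of retries. This is closely tied to crawling as a workload, since downloads fail at a high rate.

In the end I decided to use kafka after all. At first I felt kafka was too hard to set up and would waste too much time. In fact redis would probably have been fine at the start, switching to kafka once performance became a problem. With kafka both problems above go away: writing my own client lets me choose any concurrency model, and I can define a custom retry strategy for links that fail to fetch.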
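As a rough illustration of the custom retry strategy mentioned above, the sketch below retries a failed download with capped exponential backoff and parks the URL in a dead-letter list instead of silently dropping it. The names `backoff_delays`, `fetch_with_retry`, and `dead_letter` are hypothetical and belong neither to Baelish nor to any Kafka client, and the backoff policy is an assumption, since the post does not say which policy was used.

```python
import time


def backoff_delays(max_retries, base=1.0, cap=60.0):
    """Yield the wait before each retry: base * 2**attempt, capped at `cap`."""
    for attempt in range(max_retries):
        yield min(cap, base * 2 ** attempt)


def fetch_with_retry(fetch, url, max_retries=3, sleep=time.sleep, dead_letter=None):
    """Call fetch(url); on failure retry with backoff, then park the URL.

    `fetch` is any callable that raises on failure (hypothetical here).
    Returns the fetch result, or None once all attempts are used up.
    """
    last_error = None
    # One immediate attempt, followed by up to `max_retries` delayed retries.
    for delay in [0.0, *backoff_delays(max_retries)]:
        if delay:
            sleep(delay)
        try:
            return fetch(url)
        except Exception as exc:  # a real crawler would catch narrower errors
            last_error = exc
    if dead_letter is not None:
        # Keep the failed URL around for inspection or a later re-crawl.
        dead_letter.append((url, repr(last_error)))
    return None
```

Keeping exhausted URLs in a dead-letter list (or topic) is one way to avoid the "permanently dropped" tasks described for celery.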

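Logging service

When deploying multiple instances, log collection is a critical step that arguably has to be finished before scaling out, and I had overlooked it. Logs matter a great deal when debugging, so their absence slowed development down.

The root problem here, again, was stitching an architecture together rather than having a unified design philosophy.

Alibaba Cloud's log service was another big pitfall: it cannot even do basic full-text search, and I do not know what its flashy extras are good for. Plain old grep is the real tool for tracking down problems. For now it looks like I may still need loki + kafka for this.

I will write a separate post on the difference between business logs and application logs.

Container platform

At first I did not think much about it and deployed directly with ansible; the downside is that only one instance can be deployed per machine. When the crawler needed to scale, I had to run multiple instances on one machine, which calls for a container orchestration platform, and logs needed collecting too. I first considered kubernetes but found it too complex, with too many concepts, and felt I would not need it. So I chose nomad, which turned out to be another big pitfall. nomad often could not see running containers and mysteriously failed to find them. Its log collection was also problematic, with no good solution. Most importantly, nomad's ecosystem is too niche, so when problems came up I could not find community solutions. In the end I moved to kubernetes; once past the initial learning curve, k8s is actually quite simple. Another point is that k8s solves service discovery nicely through cluster IP: there is no need to register services by hand, which saved a fair amount of code and the work of maintaining consul.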

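RPC

For the RPC framework I mainly went back and forth between thrift and gRPC. I spent some time learning and comparing the two, and in the end I feel it was worth it. Still, I adopted it too early: at the very beginning there was no need for RPC at all.

Monitoring

Using influxdb for monitoring now looks like a fairly sound choice, but I took some detours by not discovering statsd early. Learning about time-series databases along the way was not a waste of time, though.

influxdb and statsd are actually two big pitfalls. Many characteristics and requirements of time-series data are not mentioned in influxdb's documentation, so you only learn them by trial and error. statsd essentially ignores tags altogether, which makes the aggregated results completely wrong.

Database

My choice and use of the database exposed my ignorance of mysql performance. At first I did not think about the number of connections, which locked MySQL up. Then I did not think about how to do batch inserts, so a lot of inserted data was lost. This is not entirely my own fault, though: storing semi-structured data in MySQL is a rather odd choice to begin with.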

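Overall, there were two main causes:

  1. Insufficient knowledge; I really do need to learn more
  2. Choosing overly niche technology with too many pitfalls

At the core, I still had no reasoning of my own and was stitching things together. This improved greatly after I read an article by a Facebook employee.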

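Business logic

There was also plenty to improve on the business-logic side.

Changing rules

For my part, I had not worked through the overall business logic clearly, and my schedule estimates were inaccurate. As a result the rules the crawler had to execute kept changing, causing a lot of rework. Extraction rules, for example, started out defined as page-level fields and only later were unified into row-level fields. At first I thought writing yaml by hand would be enough, but in the end I went back and built a GUI.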

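Chaotic management

Among the CXOs, only FBJ had the idea of building a general-purpose crawler; the rest were still thinking in terms of linear growth. Caring only about short-term results without long-term planning also hurt crawler development. In a technology company, whether or not you write code yourself, you need curiosity about and respect for the technology; if you care only about results, it is hard to be efficient.

The CEO's biggest problem was spending too little time at the company and having too little grasp of what was happening in it. Meeting customers constantly is not necessarily useful; patiently polishing the product is the right path.

  1. People were not aligned, and I was not given enough authority to build the crawler platform. Among many options it is not always clear which is better, but one has to be picked; endless pointless debate gets nowhere.

  2. Others lacked the ability, and I really could not carry them: they did not know kafka, gRPC, or metrics. The root cause is the previous point, the lack of alignment; it is astonishing that I had to persuade them at all, when whoever lacks a skill should simply go and learn it.

In building Baelish, one mistake was overcomplicating the problem. I had read plenty of startup books and knew perfectly well that I should ship an MVP, yet I could not do it; I kept wanting to build something big and complete, with far too much premature optimization. I should really have just deployed on a single machine and started a thousand gevent threads to get it running. That way, even at 20 s per request, throughput could still reach about 50 requests per second. A single machine might still not have been enough, but there was no need for Kafka, for two main reasons:

  1. The initial volume did not need Kafka; adopting it once the scale grew would have been fine
  2. The team was not familiar with Kafka, so it would take time to teach them, a waste of time and goodwill.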
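The single-machine estimate above (a thousand concurrent workers, 20 s per request) can be checked with a one-line calculation: steady-state throughput is the number of in-flight requests divided by the time each one takes. The function name is hypothetical, and the sketch assumes every worker is always busy.

```python
def requests_per_second(concurrent_workers, seconds_per_request):
    """Steady-state throughput when every worker always has a request in flight."""
    return concurrent_workers / seconds_per_request


print(requests_per_second(1000, 20))  # 50.0
```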

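Colleagues' skill level too low

Even for very simple things nobody could follow my reasoning. I explained repeatedly, they still did it the inefficient way, and in the end it had to be redone anyway.

Monitoring is one example: it was an obvious, well-defined problem that existing tools could handle well, yet they insisted on writing their own, and the result was poor.

For caching there are mature approaches that can be used directly, but given everyone's level they could not understand them or abstract to that level, and in the end the repeated-extraction problem remained unsolved.

There was too much reliance on Alibaba Cloud and other third-party services, and a lack of in-house development and exploration. Services such as 灵犀 and jumpserver are very hard to use, while open source tools can do the job well; spending time on such trivial things did not produce good results in the end. Alibaba Cloud's log service, k8s service, es service, and so on are not very pleasant to use, and could even be called very hard to use. Yet the overall R&D approach, especially on the side of CD, whom F trusted heavily, was to use Alibaba Cloud wherever possible, with no spirit of exploration.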

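Strategic drift

The company blindly pursued data that was both broad and complete, yet could not guarantee data quality and never did any one area really well. For example:

  1. E-commerce data. The most basic crawling problem was never solved, or rather this data is simply impossible to get: does anyone think Alibaba's risk-control team sits idle? There is also the matter of legal risk.

  2. Bidding and tender data. There is a great deal that can be done here. As a first-hand source, government websites will never block crawlers, whereas crawling second-hand sources requires elaborate measures against their anti-crawling defenses.

On back-end data cleaning, the company's data governance was still at the level of linear accumulation rather than building a platform that scales horizontally. Research reports, news, and tender notices, for instance, need a shared underlying article store, but today each has its own processing pipeline, all inefficient, and nobody thinks about consolidating them. By contrast, Toutiao had the idea of building a recommendation engine very early on.

Crawling was even more of a "script kiddie" affair: every project wrote its own crawler, and most of the effort went into the anti-crawling defenses of individual sites, which is baffling. Apart from the e-commerce data that was a crawling priority, no site should have had particularly complex anti-crawling logic. Crawlers written one-off are also hard to maintain; it amounts to contracting the work out to a single employee, which is too risky for the business. It even happened regularly that one person's badly written script brought down the whole cluster.

Summary

Have your own reasoning. A technology company still has to be technology-driven; the advocates of "non-technology-driven" thinking can give it a rest!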

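Why is communication at small companies actually less efficient?

The common view is that large companies communicate inefficiently because of their long chains of process, while small companies are more efficient because with few people it is easy to pass the message along. In fact, communication at a small company is sometimes the less efficient of the two.

The core logic: many problems that large companies face do not vanish because a company is small. Nor do they arise only because a company has grown; some arise because hiring quality dropped as it grew. If a small company hires people who are not good enough, it will have all sorts of problems from the start.

Insufficient skill among small-company staff is an important reason. A large company has many layers, but because everyone has studied the problems in some depth, simple questions get settled quickly and executed at once. In a small company many people lack experience, so things get discussed over and over, nobody can make the call, and execution is delayed.

A small company is not necessarily a faster path to growth either. There you get tangled up in all kinds of chores and cannot think deeply. People grow best as cross-shaped (十字形) talent: deep roots make for lush leaves, and real depth in one area matters.

Nor is a large company only about being a cog. You stand on the shoulders of giants, on a higher platform, thinking about more abstract problems and doing more challenging work.

Another trap at small companies is founders who grow too slowly. At the start a founder may be able to handle things alone, but once the business gets going the founder may not keep up and becomes a drag instead. We may be used to the stories of Bill Gates and Zuckerberg, but CEOs who can grow with their companies are rare finds.

Small companies can catch big-company diseases too. Air that could flow freely can be blocked artificially: whether management is well-meaningly copying big-company processes or maliciously enjoying playing boss, they can create all sorts of obstacles inside the company. A startup is best off with Context, not Control. Without enough Context, even when people are sincerely asked for their opinions, they do not know what to say. Yet management is often not decisive enough, insists on taking everyone's opinion into account, and dithers over every decision, so efficiency ends up low.

In short, all else being equal, a small company is certainly more efficient. In reality, though, a small company may be less efficient because of capability problems.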

Baelish: An Introspection

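baelish is a configuration-driven crawler system whose goal is to let annotators crawl data through a visual interface. I have spent most of the past year writing it. Along the way I fell into countless pits and killed a lot of brain cells before finally producing a barely usable demo.

The problem with my overall mindset was that I kept wanting to use every tool I knew, just to see whether it was fun, instead of choosing by what the project needed. I had long known this was wrong, yet once I was actually in charge of a project's technology choices and architecture I could not resist. At least, having known it was wrong all along, I will not make this mistake on future projects.

This post mainly summarizes the mistakes I made, for future reference.

Project organization

At first I split the project into several repositories: baelish handled scheduling and downloading, jaqen managed proxies, bolton handled parsing and storage, inf held the base library code, futile held utilities unrelated to crawling, app_common held the database ORM and the Django admin, conf held configuration files, and idl held the protobuf code. For a small project that is obviously too many repositories. I eventually removed most of them, keeping only baelish, app_common, idl, conf, and futile, and I now plan to remove the rest and keep just baelish and futile. Docker builds also package everything into a single image, which makes deployment easier.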

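Choosing infrastructure components

Choosing a container orchestration platform

At first I wanted to deploy straight to several machines with ansible and use consul for service discovery. Along the way I found it awkward to run multiple replicas of the same service on one machine. In a rush of enthusiasm I went looking for a real orchestration platform.

During last year's National Day holiday I studied k8s for a few days. There were simply too many concepts, my head was spinning, and I gave up on it. Having already picked consul, I noticed nomad from the same company. nomad bills itself as a lightweight scheduler that ships as a single binary and integrates seamlessly with consul. It was a disaster. First, its scheduling had problems, especially with one of its more distinctive features, the parameterized job, which as the name suggests launches a job with different parameters. Those jobs kept failing to start, and sometimes their logs could not be seen at all. nomad's community is small, with fewer than ten thousand stars on GitHub, so when something went wrong all I could find were a few unresolved issues, and then I was stuck.

In the end I chose Alibaba Cloud's managed k8s. It costs a bit more, but for a company that money really is nothing. By then several months had passed since I first studied k8s concepts, and with time to settle, some of the hard parts had gradually become clear. Since moving to k8s there have been no major problems.

The advantages of service discovery on k8s deserve a special mention. In a traditional cluster that uses, say, zk or consul for service discovery, one pattern is for each service to register its IP and port with the registry and deregister on exit. The downside is that this is fairly intrusive, and the client has to resolve service addresses itself. On k8s, services are registered in etcd, and internal callers obtain the IP through DNS resolution. This raises a question: language runtimes and systems may or may not cache DNS, so how can a caller always reach the right address when a service moves around the cluster? The k8s answer is rather neat: the clusterIP is virtual and does not change for the lifetime of the service. The DNS name and the IP are therefore fixed, and it no longer matters whether the service layer caches DNS.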
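Kubernetes publishes each Service under a DNS name of the form `<service>.<namespace>.svc.<cluster-domain>`, where the cluster domain defaults to `cluster.local`. A minimal sketch of how a client might build and resolve that name follows. The function names are hypothetical, the service name `jaqen` and namespace `crawler` in the usage note are only illustrative, and `resolve` can succeed only from inside a cluster.

```python
import socket


def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(service, namespace="default"):
    """Resolve the Service name; inside a cluster this yields its ClusterIP.

    Because the ClusterIP stays the same for the lifetime of the Service,
    a cached answer remains valid even when the backing pods move.
    """
    return socket.gethostbyname(service_dns_name(service, namespace))
```

For example, `service_dns_name("jaqen", "crawler")` gives `jaqen.crawler.svc.cluster.local`, so no registration or deregistration code is needed in the service itself.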

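Choosing a message queue

At first I thought kafka was just too heavy. I knew kafka well, but given the burden of operating it myself I wanted a lightweight tool. I tried Redis first, but because all messages pile up in memory, it ran out of memory (OOM) quickly whenever the consumers blocked.

Next I tried the more "industrial-grade" rabbitmq, which at least comes with a management UI. After a week of fiddling, rabbitmq kept mysteriously exiting on its own. From what I could find it might have been an Erlang VM problem, and there were no further log messages, so I gave up. rabbitmq also lacks a good python client: there is one called pika, but it is basically a toy with nothing in it, and you have to write everything yourself.

After that, redis happened to release version 5.0 with redis stream, billed as following the same design as kafka, so I gave it a try. I hit two problems. First, redis-py had not caught up yet, so I had to talk to redis through a lower-level python client, which meant much more work. Second, the ack semantics were unclear, which led to messages being consumed repeatedly. I gave up on it in the end.

Because I could never get ack right and wanted a more batteries-included solution, celery came into view. celery is an asynchronous task framework where you only write worker functions, and the broker can be rabbitmq or redis. Since I had never got rabbitmq to run, I chose redis. I was fairly happy with it for about a month. celery does support redis, but through the kombu library, which wraps redis behind the AMQP protocol, that is, uses it as if it were rabbitmq, so changing anything is quite complicated. redis is also still an in-memory database, so the OOM problem mentioned at the start was never fully solved, and I began to think about switching again.

Finally I thought of kafka again. I read the kafka documentation through carefully, looked at the official examples, and found that running a simple kafka cluster is not as hard as I had imagined. The company behind kafka is now called confluent, and they provide official kafka-docker images. I ended up using docker-compose to deploy kafka and zk as a single node each. That does not sound highly available, but so far there has not been a single problem. Once traffic grows I will of course need a cluster, but that only means changing a few parameters in the compose file. For the kafka client, I wrapped my own using confluent-kafka plus ThreadPoolExecutor.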
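The post does not show the hand-rolled client, so the following is only a guess at its general shape: a loop that pulls messages and hands each one to a `ThreadPoolExecutor`. To keep the sketch runnable without a broker, the message source is an injected `poll` callable (hypothetical); in a real client it could wrap a confluent-kafka consumer's `poll` call. Stopping when `poll` returns `None` is a simplification for the demo, since a real consumer would treat that as "nothing yet" and keep polling.

```python
from concurrent.futures import ThreadPoolExecutor


def consume(poll, handle, max_workers=8, max_messages=None):
    """Pull messages with poll() and run handle(msg) on a thread pool.

    Returns the handler results in the order the messages were pulled.
    """
    pulled = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        while max_messages is None or pulled < max_messages:
            msg = poll()
            if msg is None:  # demo-only stop condition, see the note above
                break
            futures.append(pool.submit(handle, msg))
            pulled += 1
        # result() re-raises handler exceptions, so failures are not lost.
        return [future.result() for future in futures]
```

Owning this loop is what makes the concurrency model a free choice: the thread pool could be swapped for processes or greenlets without touching the queue.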

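Choosing RPC and microservices

At my previous employer I always used thrift, but thrift does not support uint64, which always bothered me. I had also heard that thrift's serialization performance is considerably worse than protobuf's. So, after studying the pros and cons of thrift and gRPC for a while, I firmly chose gRPC.

The catch is that gRPC, good as it is, was of no use yet. I imagined that proxying, parsing, downloading, and so on might all become microservices, but none of that happened, because operating several microservices costs too much. When short-handed, a monolith is better; do not slice too finely. There was also little traffic at the start, so it would have been better to get going with a quick and dirty http service. In addition, the Python version of gRPC still does not support a multi-process mode at the time of writing, which is another reason for caution.

Besides gRPC, I also used protobuf to define a few objects passed through the whole system, and those are about to be removed too. The original thinking was that these objects might eventually be persisted, in which case protobuf would be ideal for serialization. For communication inside an application, the language's own objects are the best choice; protobuf is entirely unnecessary, like painting legs on a snake.

Storage system design

It turns out I did not understand mysql well enough. I have only read 50% of the book High Performance MySQL so far. At the time I actually believed a transaction would make a batch of rows go into the database as a bulk insert. How naive.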
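To make the distinction concrete: a transaction makes a group of statements atomic, but each `INSERT` is still a separate statement, whereas a bulk insert packs many rows into one multi-row `INSERT ... VALUES (...), (...)`. The sketch below shows the multi-row form using the standard-library `sqlite3` module as a stand-in so that it runs anywhere. The function and table names are hypothetical, and MySQL drivers typically use `%s` rather than `?` as the placeholder.

```python
import sqlite3


def insert_rows(conn, table, columns, rows, chunk_size=100):
    """Insert rows with multi-row INSERT statements, chunk_size rows at a time.

    `table` and `columns` are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    Returns the number of INSERT statements executed.
    """
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    statements = 0
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        sql = (
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
            + ", ".join([one_row] * len(chunk))
        )
        # Flatten [(a, b), (c, d)] into [a, b, c, d] to match the placeholders.
        params = [value for row in chunk for value in row]
        conn.execute(sql, params)
        statements += 1
    conn.commit()  # one commit for the whole batch
    return statements
```

With 250 rows and the default chunk size this issues three statements instead of 250.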

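Monitoring system

There was a lot I did not understand here, but I got it right in the end and learned a great deal. I spent about a month first learning what time-series data is, then systematically surveying the pros and cons of time-series databases and monitoring solutions such as opentsdb, influxdb, and prometheus, and finally chose influxdb + grafana. One pitfall is aggregating data that carries various tags: none of the tools supports it well, and even telegraf, influxdb's own child, misinterprets the data. I could only write my own metrics library around the business needs and do the aggregation on the client side.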
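The post does not include the metrics library, so this is only a sketch of what client-side aggregation with tags can look like: counters are keyed by the metric name plus the full, order-independent tag set, and each flush emits one pre-aggregated point per distinct combination. `TaggedCounter` and its methods are hypothetical names.

```python
from collections import defaultdict


class TaggedCounter:
    """Sum counter increments per (metric name, tag set) before sending them."""

    def __init__(self):
        self._counts = defaultdict(int)

    def incr(self, name, value=1, **tags):
        # Sorting the tags makes site=a,status=ok equal to status=ok,site=a.
        key = (name, tuple(sorted(tags.items())))
        self._counts[key] += value

    def flush(self):
        """Return one point per tag combination and reset the counters."""
        points = [
            {"name": name, "tags": dict(tags), "value": value}
            for (name, tags), value in self._counts.items()
        ]
        self._counts.clear()
        return points
```

Each flushed point can then be written to the time-series database as it stands, without relying on an intermediate agent to merge tagged series correctly.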

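Because telegraf burned me along the way, monitoring still has some gaps, but filling them is simple; it is just a matter of work.

Business logic

Scheduling

Just before starting the project I had read the MIT book Introduction to Information Retrieval. It mentions a crawler's frontier component, so I imitated it and wrote a scheduling component, but that was overthinking it. The scheduling algorithm in the book targets whole-web crawling, that is, search-engine-scale crawling, which is a different problem from the semi-targeted crawling I needed to solve. I wasted about a month implementing it, it turned out to be of no real use, and I eventually threw it away.

A very important part of scheduling is rate control. I knew an algorithm called token_bucket and badly wanted to use it here, but I was proven wrong once again. When you initiate the requests yourself and control their frequency, the best approach is simply to sleep.

But sleep always feels as if it might be inefficient, so I then thought of the priority algorithms used for process scheduling in operating systems. As you may have guessed, I fell into another pit. The scheduling problem here is not the same problem as process scheduling at all, and forcing a priority algorithm onto it achieved nothing beyond leaving many tasks not running.

The approach I finally adopted is for each thread to schedule N crawlers by simple polling, which is stable and efficient.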
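A minimal sketch of that polling loop, as one thread might run it for its share of crawlers: on every tick it fires each crawler whose interval has elapsed, then sleeps. The function name, the `(interval, fetch)` layout, and the fixed tick are assumptions for illustration; the clock and sleep functions are injectable only so the loop can be exercised without real waiting.

```python
import time


def poll_crawlers(crawlers, ticks, tick=0.1, now=time.monotonic, sleep=time.sleep):
    """Round-robin over crawlers, firing each one when its interval has elapsed.

    crawlers maps a name to (interval_seconds, fetch_callable).
    """
    next_due = {name: now() for name in crawlers}
    for _ in range(ticks):
        for name, (interval, fetch) in crawlers.items():
            if now() >= next_due[name]:
                fetch()
                # Rate control is just "not before now + interval".
                next_due[name] = now() + interval
        sleep(tick)
```

In a multi-threaded setup each thread would call `poll_crawlers` with its own N crawlers, which keeps the per-site rate limit local and easy to reason about.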

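Downloading and parsing

This is arguably the one part of the project whose design was essentially right from the start. It uses a pipeline pattern in which each step is abstracted as a stage, somewhat like django middleware, and the stages together complete the fetch of one page.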
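A small sketch of the stage pattern described above. The `Pipeline` class and the two demo stages are hypothetical; in particular `download` returns canned HTML instead of issuing a request, so that the example is self-contained.

```python
class Pipeline:
    """Run a task through a list of stages; each stage is a callable."""

    def __init__(self, stages):
        self.stages = list(stages)

    def run(self, task):
        for stage in self.stages:
            task = stage(task)
            if task is None:  # a stage may drop the task, ending the run early
                break
        return task


def download(task):
    # Placeholder: a real stage would fetch task["url"] over HTTP here.
    return {**task, "html": "<title>demo</title>"}


def parse(task):
    html = task["html"]
    start = html.find("<title>") + len("<title>")
    end = html.find("</title>")
    return {**task, "title": html[start:end]}


result = Pipeline([download, parse]).run({"url": "http://example.com"})
```

Because every stage has the same signature, steps such as rule loading, proxy selection, or storage can be added or removed without touching the others.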

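The only pitfall was that I initially wanted rule loading, proxying, and parsing each to be an RPC service; I later found I had no energy for that and dropped it.

Caching

The design was too complicated. It considered two kinds of time, cache load time and cache expiry time, which left everyone confused. It eventually turned out that the vast majority of projects need no cache at all, so I removed it.

Proxies

I originally wanted to build my own cluster from Alibaba Cloud or adsl machines, but the IPs I could set up myself were not enough for the current scenario, and building it was too complicated. Buying is simply better.

Management

Management at a small company really does have very big problems.

No long-term planning

For a company that depends on crawled data, there was no research or thinking at all about planning and building the crawler system. When I proposed building a crawler platform, nobody but the CEO could see the point.

In a company's early stage it should of course take small, fast steps and focus on meeting business needs quickly. Once things have progressed to a certain point, though, maintainability should become the more important metric.

No unified architecture

The company had four people working on crawlers in total, and two frameworks. Nobody had the final say, and without a unified framework the code could neither be reused nor maintained by others. It reminded me of how Toutiao pushed TCE through: every service moved to the cloud platform whether it suited it or not, so that any feature someone came up with, once improved, benefited everyone. After all, "the more you sharpen a knife, the sharper it gets."

Summary

  1. Do not use overly niche infrastructure components, such as celery or nomad. Prefer systems that are simple enough and proven, such as kubernetes and Kafka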

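What business is Bloomberg actually in?

In one sentence: selling Bloomberg terminals.

Why does everyone buy Bloomberg terminals when the interface is so dated?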

I think most of your questions can be answered by realizing that Bloomberg was founded in 1981, and they basically got a monopoly in financial data provision because there were no other options in 1981. That is why they have a custom monitor & keyboard: in the days before the IBM PC, everyone had a custom monitor & keyboard, because these things were not standardized. Bloomberg was a technologist & businessman before he was a politician; his business success gave him the money to run for office, his office doesn’t force people to pay for Bloomberg.

The reason they’re still a monopoly is because knowing how to navigate a Bloomberg is a critical skill for most finance professionals, and now that they have that skillset, they can be very productive moving around in it. A different (better?) UI would require they re-learn everything, which is not going to happen. And when financial professionals are making half a million a year, paying $24k/year for a terminal so that they can be productive isn’t a bad investment.
(Source: have a couple friends at Bloomberg. One is in their UI department, and keeps having his proposals for better UIs shot down for business reasons. Also married a financial professional who had to use a Bloomberg in her days as a bond trader.)

The best way to think about the Bloomberg terminal is a web browser that connects you to a private network. (Bloomberg actually is the largest known private network.) Once connected, you have access to thousands of “web apps” – which Bloomberg users call “functions”. Instead of a URL, you use a short 2 to 5 letter mnemonic code for each function, such as “MSG” for email, or “TOP” for top news. These different functions provide all sorts of various functionality – most of them are of course related to financial information. Functions like “CDSW” are for analyzing credit default swaps, “SDLC” gives you supply chain data for different companies, other functions analyze or curate Twitter, others correlate news events with historical stock data, etc. There are also many non-financial functions as well that reflect the “social network” aspect of the Bloomberg terminal, such as “POSH” which is basically a high-end Craigs list, or “DINE” which is a high-end Yelp.
All-in-all, the Bloomberg Terminal is like a private Internet for financial professionals.

The core dependency is probably Bloomberg chat, effectively the social network of the finance world.

“I think Facebook is the best comparison,” Ayzerov says. “If Facebook had only one fourth of your friends, you wouldn’t use it. The advantage of Bloomberg is that every financial person has it.”

Wow, great comments about Bloomberg. It makes me wonder, is there similar system for Cryptocoins traders?

Bloomberg would be very hard to dethrone. Its key selling point is all of the data they have access to, and it takes forever to set up integration with all of those providers. They continually expand data sources as well so it’s not like they are asleep at the wheel.
Finally, Bloomberg chat has strong network effects so even if you had all of the same data, many traders still wouldn’t switch to you because they can’t communicate with others still on Bloomberg.

1. It is an all-in-one news source. There are a lot of features that allow you to monitor the news from many different sources in real time.
2. It is a social network. The built-in chat and email service is _really_ basic. But, just about every one working in Finance is on it, with their contact details and resumes. As a trader, you can legally close financial transactions on the Bloomberg chat, as one would over the phone.
3. It is a data sharing platform. Banks and other market participants contribute to Bloomberg data by sending information that is normally not visible in the market. For instance FX volatilities are quoted by banks on bloomberg in real time. This information is only available in few places.
4. It is an API that allows its users to use its data for custom analytics.
5. It is an execution platform, where you can book trades, follow their values and risk when the market moves, etc.
6. It is open to 3rd parties: some banks and other data vendors have their own pages on bloomberg (which I never had access to).
7. It has many many other stuffs. There is a restaurant review system. There is a classified section. There are things to monitor the weather. It has videos, maps, it’s just huge.

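A few tips for promotion on Zhihu

Reposted from: [姑婆那些事儿](https://mp.weixin.qq.com/s?__biz=MzAwMDA3ODc2NQ==&mid=2650452037&idx=2&sn=626e4f2d597e2664a9163ea99300449d&chksm=82e07687b597ff91f6d6d7e13f9621f2baafcfd58b076e4aa591e54679b5f2c5b302f131255a&mpshare=1&scene=1&srcid=120442SC5YQtAnPiRwsPpCpR#rd)

Article open rates keep falling, and getting an official account off the ground from zero is often the hardest gap for an operator to cross. What do you do when nobody follows a brand-new WeChat official account? How do you get the first 1,000 followers?

Because WeChat is a closed loop, large numbers of followers have to be brought in from outside channels. Follower sources are not limited to one channel, so I can use a strategy to attract followers on other self-media platforms at the same time. Suppose the half-year goal is 10,000 followers: that works out to 55 a day, and with Zhihu, Toutiao (今日头条), and WeChat Moments as the main sources, each channel only needs to bring in about 18 a day.

Zhihu, as a gathering place for high-quality traffic, is a very good choice for drawing followers. Many people do not yet tap Zhihu's traffic well, so here are some basic methods.

# Day-to-day operations

4. Keep a highly upvoted answer to roughly 800-1,500 characters. Favor illustrated answers, data-analysis answers, and answers that cite supporting sources; Zhihu users value logic over emotion.

Zhihu users are especially averse to advertorials. What wins them over is content that combines solid practical value, personal experience, a humorous and teasing tone, a brilliant closing paragraph, light and fresh logical reasoning, and arguments laid out in sections and points.

Only answers like this earn upvotes, bookmarks, and comments, and they are also easily pushed to the top answer under a topic. At that point you have won readers' approval; they follow you and naturally reach you through your contact details.

5. Always pick popular Zhihu topics when answering. They have heavy traffic and many followers; if you only answer obscure topics, nobody will see you for a long time.

Answers must be debatable and valuable; even a witty one-liner has to set a new bar. Make full use of the comment section under your answer and keep interaction with followers high.

6. Try not to reuse the same answer text multiple times. If the system detects it you are likely to be penalized: a first violation brings a ban of one day or seven days, or even the removal or hiding of all the user's answers. You can leave a link in the answer to direct users elsewhere instead.

7. Do not leave overly conspicuous contact details or an official-account QR code in answers. Zhihu tacitly allows big accounts (大V) to leave contact details, but ordinary users or small accounts who are caught or reported get muted.

8. Once you have a certain number of answers on Zhihu and have built up some followers, you can start preparing to funnel traffic. Edit your profile at this point and add your contact details, but keep them few to avoid looking like marketing.

9. One quirk of the Zhihu system: if you post more than ten answers in a day, or too many over a long period, it warns that you are operating too frequently. To avoid unnecessary trouble, the ideal is 5-7 answers a day.

10. Zhihu users are most active from 12:00 to 14:00 and from 19:00 to 23:00. Answers posted in these windows are generally seldom penalized for violations.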

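WeChat development notes

You can learn WeChat development with a WeChat test account:
http://mp.weixin.qq.com/debug/cgi-bin/sandbox?t=sandbox/login

When an official account handles messages, WeChat's servers effectively act as a forwarding proxy, delivering them to the account's back-end server. Once the user opens a web page, however, the browser talks to the server directly. WeChat delivers messages to the server with POST.

Message handling involves a signature step, which lets the back-end server tell whether a message really came from WeChat and so prevents the API from being abused or hijacked.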
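The signature check WeChat documents for server verification can be sketched as follows: sort the shared token, the timestamp, and the nonce as strings, concatenate them, take the SHA-1 hex digest, and compare it with the `signature` parameter of the request. The function name is hypothetical, and how the four values are read from the request depends on the web framework in use.

```python
import hashlib
import hmac


def check_signature(token, signature, timestamp, nonce):
    """Return True if `signature` matches SHA-1 over the sorted token, timestamp, nonce."""
    joined = "".join(sorted([token, timestamp, nonce]))
    digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(digest, signature)
```

A request whose signature fails this check should be rejected before any message handling runs.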

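Chores like these are better left to a framework.

APPID/APPSECRET are effectively the official account's username and password. The pair is exchanged for an access_token used for day-to-day calls. The access_token has an expiry time, so even if it leaks while being sent in plaintext the damage is limited.

The catch is that the server has to remember to refresh this token, so this too is best left to a framework.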
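What such a framework does for the token can be sketched as a small cache that refreshes shortly before expiry. Everything here is illustrative: `AccessTokenCache` is a hypothetical name, and `fetch` stands for whatever call exchanges APPID/APPSECRET for a token, assumed to return the token together with its lifetime in seconds.

```python
import time


class AccessTokenCache:
    """Cache an access token and refresh it a little before it expires."""

    def __init__(self, fetch, now=time.time, margin=300):
        self._fetch = fetch    # hypothetical: returns (token, expires_in_seconds)
        self._now = now
        self._margin = margin  # refresh this many seconds ahead of expiry
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._now() >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._now() + expires_in
        return self._token
```

Callers then use `cache.get()` everywhere and never deal with expiry themselves. With several server processes the token would normally live in shared storage rather than in process memory, which this sketch does not cover.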

WeChat provides a JS SDK that gives access to images, voice, maps, and a range of other features. Nice.

Some commonly used meta tags

1.  
2.  
3.  
4.  

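When a site is visited directly in the browser on iOS, navigator.standalone is false; when the webapp is launched from the home screen, navigator.standalone is true.
Mobile webkit provides the autocapitalize attribute for input elements; setting autocapitalize="off" turns off the keyboard's default capitalization of the first letter.
The target attribute the developer specified then no longer takes effect, but setting the element's -webkit-touch-callout style property to none stops iOS from popping up these buttons.

Likewise, setting -webkit-touch-callout to none on an img tag stops the device from showing the list of buttons, so users cannot save or copy your images.
Setting the -webkit-user-select property of a text element to none prevents iOS users from selecting text.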

The Problem with Pocket

I started using Pocket about six years ago. Back then it was a Firefox plugin named Read It Later. I saved 1000+ articles to read later, but only now, having quit my job, did I finally get some time to read them. Then I got this:

![](https://ws2.sinaimg.cn/large/006tKfTcly1fq9zaefzlnj31kw0zmdqu.jpg)

I thought that Pocket’s server would retrieve the article for me and store a copy so that I could read it whenever I had time. It turns out I was wrong: Pocket only fetches the article on the computer or phone, and stores nothing but the URL in the cloud. That’s too bad, because a lot of my saved pages have since gone 404.

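Some thoughts on being a CTO

I came across a good article on Hacker News and took some notes; the link to the original is at the end.

# On technology

You can never learn new things as fast as new problems appear.
A startup can pivot flexibly, but the technical architecture you pick first is not so easy to change.
Technologies have short lifespans these days, and even very popular ones may go out of date quickly, so whatever you choose, be prepared to leave technical debt behind.
Every line of code you write may also live for a long time, so write it as well as you can, even if that is slower.
Do not keep wanting to stop and refactor; write more tests instead.

# On hiring

Hire only when you urgently need the person.
Hire to keep up with growth, not to create growth.
Know what needs doing before you hire.

In general, if you are not sure whether you need to hire, you usually do not.

Managing people has always been simple: communicate candidly, score performance openly, and say what is good and what is bad. This also lets people who are doing poorly prepare; if someone is not a fit, they will have realistic expectations about how they are doing. Most important of all, give everyone a good personal development plan.

Finally, as the company grows, seeing newcomers develop and even become leaders is very exciting.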

https://medium.com/sketchdeck-developer-blog/what-i-wish-i-knew-when-i-became-cto-fdc934b790e3?token=e-Jk1uh8fiXG6w_Z