Month: June 2017

docker basics

Docker is a container for a process, not a virtual machine. It isolates the filesystem, network, and environment variables for that process. A container should ideally run one and only one process, rather than multiple tasks.

Docker is best suited to running stateless services, which are easy to scale horizontally. For stateful services, it is recommended to mount the state out of the container.

Use cases

  1. Creating isolated environments for applications with different requirements, e.g. deploying two scripts where one needs Python 2.7 and the other Python 3.6
  2. Microservices
  3. Process management, i.e. a role similar to systemd or supervisord
  4. Protecting the host system from harm

Image vs Container

A container is a running instance of an image; each time you run an image, a new container is created. You can commit a container back into an image, though doing so is a little controversial.

Image name format: user/image:tag

basic usage

  • docker run OPTIONS IMAGE COMMAND creates a container from the given image and starts it.
    • the most used options are -d (detach)
    • and -it (interactive terminal)
    • --restart=always to always restart the container
    • --name=NAME to name the container
  • docker start CONTAINER_ID restarts a stopped container; note that this reuses the options and command given when docker run was issued
  • then use docker attach CONTAINER_ID to reattach to the given container
  • docker exec OPTIONS CONTAINER COMMAND runs an extra command in a running container

Note: docker is all about stdio. If you want to read something, read it from stdin; if you want to output something, write it to stdout.

building docker images

two ways:
* commit each change
* using dockerfiles

Commands

Container related

run

Each docker run creates a brand-new container from the image; use docker start or docker attach to connect to a container that already exists. The relationship between an image and a container is roughly that between a program and a process.

Syntax:

docker run [options] [image name] [command]
docker exec -it [container] bash can be used as an ssh equivalent
-d detaches the container and runs it in the background
-p sets port mapping [host:container]
--name sets the name
--rm cleans up the container after it exits
--net sets the network to connect to
-w sets the working dir
-e sets an env variable
-u sets the user
-v sets a volume hostpath:containerpath:options

status

docker ps -a lists all containers, including stopped ones (without -a, only running containers are shown)

Image related

docker pull
docker images
docker search
docker build -t user/image [dir]

Network related

Basic commands

docker network ls   list networks
docker network inspect  show details of a network
docker network create/rm    create/remove a network
docker network connect/disconnect [net] [container] connect/disconnect a container to/from a network

By setting up a network, docker automatically creates an /etc/hosts file inside each container, and you can use container names to reach the other containers.

Docker has two commonly used network modes:

Bridge mode

Started with docker run --net="bridge". This mode routes traffic through the virtual interface docker0 with a layer of NAT forwarding, so it is relatively inefficient. The upside is that you don't have to change code that binds a fixed port: docker assigns a random port on the host, avoiding conflicts.

Host mode

Started with docker run --net="host". The host and the container share the same network stack, e.g. eth0.

Docker containers are, generally speaking, stateless. Besides persisting state to a database, you can also use volumes to keep container state outside the container:

docker volume create --name hello
docker run -d -v hello:/container/path/for/volume containerimage mycommand

Logs

You can use docker logs [container] to view the stdout logs. Logs written to /var/log/*.log, however, stay inside the container by default.

Remove stopped containers
docker rm $(docker ps -aq)

Using docker without sudo

sudo gpasswd -a ${USER} docker

Then log out and log back in as the current user.

References

  1. https://blog.talpor.com/2015/01/docker-beginners-tutorial/

Dockerfile basics

A Dockerfile lists reproducible steps for building a docker image. Compared with crafting an image step by step with docker commit, a Dockerfile is a better fit for CI, automated testing, and similar systems.

Dockerfile instructions

  • FROM: the base image
  • MAINTAINER: the author; suggested format (Jon Snow <jonsnow@westros.com>)
  • EXPOSE: ports to expose; in practice -p is still used to specify the port mapping
  • USER: the user to run as
  • WORKDIR: the working directory of the process
  • COPY: copy files into the image
  • RUN: run a shell command
  • CMD: the command used to start the process
  • ENTRYPOINT: the entry point of the image; defaults to /bin/sh -c
  • ENV: set environment variables
  • VOLUME: declare a volume

ENV key=value foo=bar
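Putting these instructions together, a minimal Dockerfile might look like the sketch below; the script name, port, and dependency file are made-up placeholders:

```dockerfile
# Hypothetical Dockerfile for a small Python app
FROM python:3.6

MAINTAINER Jon Snow <jonsnow@westros.com>

ENV APP_ENV=production

WORKDIR /app
COPY requirements.txt /app/
RUN pip install -r requirements.txt
COPY . /app

EXPOSE 8000
USER www-data
CMD ["python", "app.py"]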

A few instructions that are easy to confuse

COPY vs ADD

ADD automatically extracts archives. When you don't need that special behavior, prefer COPY.

ENTRYPOINT vs CMD

ENTRYPOINT specifies the binary the image runs (possibly with arguments), while CMD specifies the arguments passed to that binary. Because the default entrypoint is /bin/sh -c, in practice what CMD specifies is also the command to run.

Also, when docker run is given command-line arguments, those are executed instead of the CMD contents. Passing /bin/bash as the command replaces CMD, letting you step into the image and inspect what was actually built.

Personally, I prefer to use only CMD and never ENTRYPOINT.
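A common pattern when both are used (my own illustration, not from the original post): ENTRYPOINT fixes the binary, and CMD supplies default arguments that docker run arguments override:

```dockerfile
ENTRYPOINT ["ping", "-c", "3"]
CMD ["localhost"]

# docker run image           runs: ping -c 3 localhost
# docker run image 8.8.8.8   runs: ping -c 3 8.8.8.8
```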

Understanding the VOLUME instruction

VOLUME in a Dockerfile declares an anonymous docker volume: at docker run time, docker mounts the corresponding directory onto an anonymous volume. If you use -v to specify a mount target or a volume name, the anonymous volume is not used.

Build images with a Dockerfile or with commit?

Use a Dockerfile whenever possible, since it is reproducible.

I’ve been wondering the same thing, and my impression (which could be totally wrong) it that it’s really the same case as with VMs –> you don’t want to not know how to recreate the vm image. In my case I have regular .sh scripts to install, and am wondering why I can’t just maintain these, run docker and effectively call these, and create the golden version image that way. My scripts work to get it installed on a local PC, and the reason I want to use docker is to deal with conflicts of multiple instances of programs/have clean file system/etc
https://stackoverflow.com/questions/26110828/should-i-use-dockerfiles-or-image-commits

References

  1. https://stackoverflow.com/a/34245657/1061155
  2. https://stackoverflow.com/questions/41935435/understanding-volume-instruction-in-dockerfile

Deploying daemons with systemd

Most Linux distributions have adopted systemd as their process manager. I had planned to deploy my services with supervisord, but after some thought realized I might as well use systemd directly. This post is a brief introduction to systemd.

Example

Let's start with an example. Say we have the following Go program:

package main

import (
    "fmt"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hi there!")
}

func main() {
    http.HandleFunc("/", handler)
    http.ListenAndServe(":8181", nil)
}

Compile it to /opt/listen/listen. First, add a user to run our service:

adduser -r -M -s /bin/false www-data

Remember this command; whenever you need a dedicated user to run a service, this is the one to use.

Unit files

A unit file defines a systemd service. /usr/lib/systemd/system/ holds unit files installed by packages, while /etc/systemd/system/ holds unit files created by the administrator. Let's edit /etc/systemd/system/listen.service:

[Unit]
Description=Listen

[Service]
User=www-data
Group=www-data
Restart=on-failure
ExecStart=/opt/listen/listen
WorkingDirectory=/opt/listen

Environment=VAR1=whatever "VAR2=something else"
EnvironmentFile=/path/to/file/with/variables

[Install]
WantedBy=multi-user.target

Then:

systemctl enable listen
systemctl status listen
systemctl start listen

Other common operations include:

systemctl start/stop/restart    
systemctl reload/reload-or-restart  
systemctl enable/disable    
systemctl status    
systemctl is-active 
systemctl is-enabled
systemctl is-failed
systemctl list-units [--all] [--state=]    
systemctl list-unit-files
systemctl daemon-reload 
systemctl cat [unit-name]   
systemctl edit [unit-name]
systemctl list-dependencies [unit]

Dependency management

In that case add Requires=B and After=B to the [Unit] section of A. If the dependency is optional, add Wants=B and After=B instead. Note that Wants= and Requires= do not imply After=: if After= is not specified, the two units will be started in parallel. In short, if your service depends on another service, use Requires= + After=, or Wants= + After=.
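As a sketch (unit names here are placeholders), if service A must start after service B and cannot run without it, A's unit file would include:

```ini
# A.service (illustrative)
[Unit]
Description=A
Requires=B.service
After=B.service
```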

Open questions: how do I make an installed service start at boot — is it by changing WantedBy, and if so, to what value? And how should daemon services like nginx be managed?

Type

Type: simple / forking. For the meaning of each field, see this article.

Viewing logs with journalctl

First, a complaint: why use such an awkward word as "journal"? Wouldn't logctl have been a better name?

journalctl -u service-name.service

You can also add -b to see only logs since the last boot.

Running multiple instances

  1. https://unix.stackexchange.com/questions/288236/have-systemd-spawn-n-processes
  2. http://0pointer.de/blog/projects/instances.html


django tips

Running the development server

python manage.py runserver [host:]port

You can specify the IP address to bind to.

Creating users and changing passwords

python manage.py createsuperuser  # create a superuser
python manage.py changepassword username

Opening the project shell

In this Python shell you can use the project's Django models directly:

python manage.py shell

timezone aware time

When saving a datetime field to the database, Django often warns that timezone info is missing. Use Django's built-in timezone.now(), which returns an aware datetime:

from django.utils import timezone
now_aware = timezone.now()
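The distinction is just whether tzinfo is set. A stdlib-only sketch of naive vs aware datetimes (Django's timezone.now() returns the aware kind when USE_TZ is on):

```python
from datetime import datetime, timezone

naive = datetime.utcnow()           # tzinfo is None: saving this triggers Django's warning
aware = datetime.now(timezone.utc)  # tzinfo is set: safe for timezone-aware fields

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC
```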

The Thrift RPC framework

Thrift is a full-stack RPC framework that includes both an interface definition language (IDL) and the RPC runtime, roughly matching the functionality of protobuf + gRPC.

Installation

You can install it using the script in https://github.com/yifeikong/install

Thrift types and the IDL

The base types are bool, byte/i8, i16, i32, i64, double, string, and binary.

  • Annoyingly, Thrift does not support uint; the rationale is that many languages have no native unsigned types
  • binary corresponds to bytes in some languages
  • string is UTF-8 encoded
  • byte and i8 are the same type, and both are signed

Composite types (struct)

A struct, like a struct or class in a programming language, defines a custom type. Note that every field in a Thrift definition is tagged with a number; this ordering enables more efficient serialization.

Note the required and optional keywords: required fields must be set, optional fields can be omitted. For compatibility, declare fields optional whenever possible.

struct Cat {
    1: required i32 number=10;  // fields can have default values
    2: optional i64 big_number;
    3: double decimal;
    4: string name="thrifty";  // strings can have defaults too
}

exceptions

Thrift can also define exceptions, using the exception keyword; the syntax is otherwise the same as struct.

typedef

Thrift supports C/C++-style typedefs:

typedef i32 MyInteger   // 1
typedef Tweet ReTweet   // 2

Enums

enum Operation {
    ADD = 1;
    SUB = 2;
    MUL = 3;
    DIV = 4;
}

Container types

Thrift provides the common container types: list, set, and map.

  • list<t1>: an ordered array of t1 elements
  • set<t1>: an unordered set of t1 elements
  • map<t1,t2>: a dictionary with keys of type t1 and values of type t2
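For instance (an illustrative struct of my own, assuming the Cat struct defined earlier), containers compose with other types:

```thrift
struct Zoo {
    1: optional list<Cat> cats;          // ordered
    2: optional set<string> keepers;     // unordered, no duplicates
    3: optional map<string, i32> ages;   // keeper name -> age
}
```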

Constants

Define constants with const:

const i32 INT_CONST = 1234;    // 1
const map<string,string> MAP_CONST = {"hello": "world", "goodnight": "moon"}

Comments

Thrift supports Python-style and C/C++-style comments:

# This is a valid comment.

/*
 * This is a multi-line comment.
 * Just like in C.
 */

// C++/Java style single-line comments work just as well.

Namespaces

Each thrift file should declare a namespace for every target language:

namespace py tutorial
namespace java tutorial

include

include "other.thrift"

Services

A service is like an interface: you define it in Thrift, then implement it in concrete code on top of the files Thrift generates.

Note the oneway keyword: the client does not wait for a response.

service StringCache {
    void set(1:i32 key, 2:string value),
    string get(1:i32 key) throws (1:KeyNotFound knf),
    oneway void delete(1:i32 key)
}

Generated code

Thrift's network stack is layered as follows:

The generated code sits in the blue layer. Transport moves binary data; we can carry it over TCP, HTTP, and so on. Protocol defines how Thrift's in-memory structures are serialized to binary and parsed back, using encodings such as JSON or compact. Processor reads a request from a Protocol, invokes user code, and writes the response. Servers come in many flavors, for example multi-threaded or multi-process.

The Processor interface is defined as:

interface TProcessor {
    bool process(TProtocol in, TProtocol out) throws TException
}

What a Server does:

  • create a Transport for moving data
  • create input and output Protocols for that Transport
  • create a Processor on top of those Protocols
  • wait for client requests, hand each incoming request to the Processor, and loop forever
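The division of labor above can be sketched in a few lines of Python. This is a toy model with made-up class names (JSON standing in for Thrift's binary protocols), not the real Thrift API:

```python
import io
import json

class MemoryTransport:
    """Toy Transport: reads from and writes to in-memory buffers."""
    def __init__(self, data=b""):
        self.reader = io.BytesIO(data)
        self.writer = io.BytesIO()
    def read(self):
        return self.reader.read()
    def write(self, data):
        self.writer.write(data)

class JsonProtocol:
    """Toy Protocol: (de)serializes messages over a transport."""
    def __init__(self, transport):
        self.transport = transport
    def read_message(self):
        return json.loads(self.transport.read())
    def write_message(self, obj):
        self.transport.write(json.dumps(obj).encode())

class EchoProcessor:
    """Toy Processor: read a request, invoke handler logic, write the response."""
    def process(self, iprot, oprot):
        request = iprot.read_message()
        oprot.write_message({"echo": request["value"]})

# One request/response cycle, as the server loop would run it:
itrans = MemoryTransport(b'{"value": "hello"}')
otrans = MemoryTransport()
EchoProcessor().process(JsonProtocol(itrans), JsonProtocol(otrans))
print(otrans.writer.getvalue())  # b'{"echo": "hello"}'
```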

Compilation

thrift -r --gen py file.thrift

The generated files are placed in the gen-py directory.

  • -r means compile recursively, following includes
  • --gen specifies the target language

An example

The handler implements the service; the server then uses the handler.

A Python server and client.

Common questions

YN: thread safety

  1. Thrift ships with thread/process based server types, so you need to consider the thread safety of your handler
  2. The thrift client is not thread-safe; be careful when using it from multiple threads (http://grokbase.com/t/thrift/user/127yhv7wmx/is-the-thrift-client-thread-safe)
  3. Take a look at how pyutil uses it...

When do you need a thrift service, rather than a wrapper class or DAL?

  1. Calls across languages or across codebases
  2. A service that holds a heavyweight resource

If you stay within one language and only need to read and write a database or the like, a wrapper class is enough.

Where should consts be defined?

If a constant is used during calls, define it in thrift; if it is stored in the database, define it as a constant in code.

Thrift vs http api

A few reasons other than speed:

  1. Thrift generates the client and server code completely, including the data structures you are passing, so you don’t have to deal with anything other than writing the handlers and invoking the client. and everything, including parameters and returns are automatically validated and parsed. so you are getting sanity checks on your data for free.
  2. Thrift is more compact than HTTP, and can easily be extended to support things like encryption, compression, non blocking IO, etc.
  3. Thrift can be set up to use HTTP and JSON pretty easily if you want it (say if your client is somewhere on the internet and needs to pass firewalls)
  4. Thrift supports persistent connections and avoids the continuous TCP and HTTP handshakes that HTTP incurs.

Personally, I use thrift for internal LAN RPC and HTTP when I need connections from outside.

References

  1. https://stackoverflow.com/questions/9732381/why-thrift-why-not-http-rpcjsongzip
  2. https://thrift-tutorial.readthedocs.io/en/latest/usage-example.html#a-simple-example-to-warm-up
  3. http://thrift-tutorial.readthedocs.io/en/latest/index.html
  4. https://diwakergupta.github.io/thrift-missing-guide/
  5. http://thrift.apache.org/tutorial/py

SSL Pinning and how to break it

What is SSL Pinning

To view https traffic, you can sign your own root CA and perform a mitm attack to view the traffic. HPKP (HTTP Public Key Pinning) stops this sniffing by trusting only a given CA, so your self-signed certs will be considered invalid. To make a given app trust your certs, you have to modify the apk file.

How to break it?

Introducing Xposed

Decompiling, modifying, and recompiling an apk can be very difficult, so it is better to hook an API so that the app you are trying to intercept trusts your certs. Xposed offers exactly this ability; moreover, an Xposed module called JustTrustMe has already done the tedious work for you. Just install Xposed and JustTrustMe and you are good to go. Here are the detailed steps:

  1. Install Xposed Installer. For Android 5.0 and above, use the Xposed installer. NOTE: on MIUI, you need to search for the MIUI-specific edition of the Xposed installer (Xposed 安装器 MIUI 专版).

  2. Install Xposed from the Xposed installer; note that you have to grant root privileges to the installer.

  3. Install JustTrustMe.

uWSGI and the wsgi protocol

uWSGI is a web server that runs Python web frameworks. uwsgi (lower case) is the protocol it uses to communicate with front-end web servers (e.g. nginx).
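As a sketch of that split (module name, port, and paths are placeholders): uWSGI loads the WSGI callable and listens on a socket speaking the uwsgi protocol, while nginx forwards requests to it:

```ini
; uwsgi.ini (illustrative)
[uwsgi]
module = myapp:application   ; the WSGI callable to serve
socket = 127.0.0.1:3031      ; speaks the uwsgi protocol, not HTTP
processes = 4
```

On the nginx side, a matching block would be: location / { include uwsgi_params; uwsgi_pass 127.0.0.1:3031; }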

The wsgi protocol

YN:

Note that wsgi actually defines a synchronous model: each client request invokes a synchronous function, so it cannot take advantage of asynchronous execution.

Two minimal examples

Implementing the simple_app function below amounts to implementing the wsgi protocol. Three points deserve attention:

  1. the variables contained in the environ dict
  2. the arguments of start_response
  3. the calling order and return value of simple_app

HELLO_WORLD = b"Hello world!\n"

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = "200 OK"
    response_headers = [("Content-type", "text/plain")]
    start_response(status, response_headers)
    return [HELLO_WORLD]

class AppClass:
    """Produce the same output, but using a class
    (Note: "AppClass" is the "application" here, so calling it
    returns an instance of "AppClass", which is then the iterable
    return value of the "application callable" as required by
    the spec.
    If we wanted to use *instances* of "AppClass" as application
    objects instead, we would have to implement a "__call__"
    method, which would be invoked to execute the application,
    and we would need to create an instance for use by the
    server or gateway.
    """
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response
    def __iter__(self):
        status = "200 OK"
        response_headers = [("Content-type", "text/plain")]
        self.start(status, response_headers)
        yield HELLO_WORLD

On the server/gateway side, each incoming HTTP request triggers one call to this application callable:

import os, sys

enc, esc = sys.getfilesystemencoding(), "surrogateescape"

def unicode_to_wsgi(u):
    # Convert an environment variable to a WSGI "bytes-as-unicode" string
    return u.encode(enc, esc).decode("iso-8859-1")

def wsgi_to_bytes(s):
    return s.encode("iso-8859-1")

def run_with_cgi(application):
    environ = {k: unicode_to_wsgi(v) for k, v in os.environ.items()}
    environ["wsgi.input"]        = sys.stdin.buffer
    environ["wsgi.errors"]       = sys.stderr
    environ["wsgi.version"]      = (1, 0)
    environ["wsgi.multithread"]  = False
    environ["wsgi.multiprocess"] = True
    environ["wsgi.run_once"]     = True

    if environ.get("HTTPS", "off") in ("on", "1"):
        environ["wsgi.url_scheme"] = "https"
    else:
        environ["wsgi.url_scheme"] = "http"

    headers_set = []
    headers_sent = []

    def write(data):
        out = sys.stdout.buffer

        if not headers_set:
            raise AssertionError("write() before start_response()")
        elif not headers_sent:
            # Before the first output, send the stored headers
            status, response_headers = headers_sent[:] = headers_set
            out.write(wsgi_to_bytes("Status: %s\r\n" % status))
            for header in response_headers:
                out.write(wsgi_to_bytes("%s: %s\r\n" % header))
            out.write(wsgi_to_bytes("\r\n"))

        out.write(data)
        out.flush()

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    # Re-raise original exception if headers sent
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None     # avoid dangling circular ref
        elif headers_set:
            raise AssertionError("Headers already set!")

        headers_set[:] = [status, response_headers]

        # Note: error checking on the headers should happen here,
        # *after* the headers are set.  That way, if an error
        # occurs, start_response can only be re-called with
        # exc_info set.

        return write

    result = application(environ, start_response)
    try:
        for data in result:
            if data:    # don't send headers until body appears
                write(data)
        if not headers_sent:
            write(b"")   # send headers now if body was empty
    finally:
        if hasattr(result, "close"):
            result.close()
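The application callable can also be exercised without any server at all. The helper below is my own sketch (call_wsgi_app is not part of the spec): it fakes an environ, captures what the app passes to start_response, and joins the body:

```python
HELLO_WORLD = b"Hello world!\n"

def simple_app(environ, start_response):
    """The same minimal WSGI app as above."""
    start_response("200 OK", [("Content-type", "text/plain")])
    return [HELLO_WORLD]

def call_wsgi_app(app, environ):
    """Drive a WSGI app once and return (status, headers, body)."""
    captured = {}
    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
    body = b"".join(app(environ, start_response))
    return captured["status"], dict(captured["headers"]), body

status, headers, body = call_wsgi_app(
    simple_app, {"REQUEST_METHOD": "GET", "PATH_INFO": "/"})
print(status, body)  # 200 OK b'Hello world!\n'
```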

References

  1. https://bottlepy.org/docs/dev/async.html
  2. http://uwsgi-docs-cn.readthedocs.io/zh_CN/latest/WSGIquickstart.html
  3. https://www.digitalocean.com/community/tutorials/how-to-deploy-python-wsgi-applications-using-uwsgi-web-server-with-nginx

squid proxy

Install squid

plain old apt-get update && apt-get install squid3 apache2-utils -y

Basic squid conf

Put this in /etc/squid3/squid.conf instead of the super bloated default config file:

# note that on ubuntu 16.04, use squid instead of squid3
auth_param basic program /usr/lib/squid3/basic_ncsa_auth /etc/squid3/passwords
auth_param basic realm proxy
acl authenticated proxy_auth REQUIRED
http_access allow authenticated
forwarded_for delete
http_port 0.0.0.0:3128

Please note the basic_ncsa_auth program instead of the old ncsa_auth

Setting up a user

Run sudo htpasswd -c /etc/squid3/passwords username_you_like (on 16.04 it's squid, not squid3), enter a password twice for the chosen username, then:

sudo service squid3 restart

see: https://stackoverflow.com/questions/3297196/how-to-set-up-a-squid-proxy-with-basic-username-and-password-authentication

centos

I have to use centos, since adsl providers are not capable of providing ubuntu

check out this wonderful article: https://hostpresto.com/community/tutorials/how-to-install-and-configure-squid-proxy-on-centos-7/

yum install -y epel-release
yum install -y squid
yum install -y httpd-tools
systemctl start squid
systemctl enable squid
touch /etc/squid/passwd && chown squid /etc/squid/passwd
htpasswd -c /etc/squid/passwd root

edit /etc/squid/squid.conf

auth_param basic program /usr/lib64/squid/basic_ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid Basic Authentication
auth_param basic credentialsttl 2 hours
acl auth_users proxy_auth REQUIRED
http_access allow auth_users
http_port 3128

A small gotcha

By default, squid only allows proxying https traffic to port 443 and rejects CONNECT requests to other ports. You need to change the config file.

To fix this, add your port to this line in the config file:

acl SSL_ports port 443

so it becomes

acl SSL_ports port 443 4444

Squid also denies CONNECT to anything other than SSL_ports by default:

http_access deny CONNECT !SSL_ports # delete this line