Month: July 2017

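Reading the python-readability source code

readability is a library that extracts an article from a cluttered web page, stripped of special formatting and suitable for re-typesetting and reading. The reader mode in common mobile browsers is largely built on this library, and applications such as Evernote's Web Clipper rely on similar libraries. All versions of readability derive from readability.js. I tried reading the JS version before, but it has too many unrelated helper functions, and the JS DOM API is far from elegant, which made it hard to follow. On Sunday I finally had time to read through the python-readability code.

The core of readability is the Document class, which represents an HTML file and can output a cleaned-up document.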

# Core methods and concepts

## summary

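summary is the core method; it extracts an article. It may take several extraction passes to get an acceptable article. The core idea of the method is:

1. The first attempt sets ruthless, i.e. aggressive mode, which may remove some tags by mistake
2. Parse the given input once and store it in self.html, removing all script and style tags, because they do not contribute article content
3. In ruthless mode, use remove_unlikely_candidates to drop unlikely candidates
4. transform_misused_divs_into_paragraphs converts misused div tags into p tags, so that divs no longer need special handling; this step is actually quite important. Some other clean-up is applied here as well.
5. Use score_paragraphs to score each paragraph
6. Use select_best_candidate to get the best candidate
7. If a best candidate was found, call get_article to extract the article
8. If none was found, fall back to non-ruthless mode and try again; if that also fails, return the html as is
9. Clean the article with the sanitize method
10. If the resulting article is too short, fall back to non-ruthless mode and retry once

The only difference between ruthless and non-ruthless mode is whether remove_unlikely_candidates is called.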
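The retry logic in the steps above can be sketched roughly as follows. This is not the library's code: `summarize`, the `extract` callable and the `min_length` default of 250 are hypothetical names and values chosen for illustration.

```python
from typing import Callable, Optional


def summarize(html: str,
              extract: Callable[[str, bool], Optional[str]],
              min_length: int = 250) -> str:
    """Sketch of the ruthless / non-ruthless retry loop.

    `extract(html, ruthless)` is a hypothetical callable that returns the
    cleaned article, or None when no best candidate was found.
    """
    ruthless = True
    while True:
        article = extract(html, ruthless)
        good_enough = article is not None and len(article) >= min_length
        if good_enough or not ruthless:
            # Either we are satisfied, or the gentle pass was the last chance.
            return article if article is not None else html
        ruthless = False  # fall back to non-ruthless mode and try once more
```

The function tries at most twice, and only returns the raw html when even the non-ruthless pass finds nothing.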

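The core steps above are enough for most reasonably well-formed pages, but there are still quite a few misidentifications. The improvements made internally at my company are:

1. The transform step fixes more kinds of errors.
2. After the best node is found, its xpath is recorded, which makes it easier to extract the content of the next page.
3. The article length is checked after extraction.
4. If extraction fails, return empty instead of returning the article
5. Merging of paginated articles
6. More regular expressions related to article extraction

The methods are described below in the order they appear in summary.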

## remove_unlikely_candidates

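Matches each tag's class and id, and removes some nodes based on the two expressions unlikelyCandidatesRe and okMaybeItsACandidateRe.

unlikelyCandidatesRe: combx|comment|community|disqus|extra|… These are clearly words for peripheral content
okMaybeItsACandidateRe: and|article|body|column|main|shadow… These are clearly words that mostly denote the main text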
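A minimal sketch of this check, assuming the two patterns contain only the alternatives quoted above (the real lists are longer) and that class and id are simply joined with a space. `is_unlikely_candidate` is a name made up for this note.

```python
import re

# Abbreviated patterns: only the alternatives quoted in the text are included.
UNLIKELY_RE = re.compile(r"combx|comment|community|disqus|extra", re.I)
MAYBE_RE = re.compile(r"and|article|body|column|main|shadow", re.I)


def is_unlikely_candidate(class_attr: str, id_attr: str) -> bool:
    """Return True if a node with this class/id would be dropped."""
    s = f"{class_attr} {id_attr}"
    if len(s) < 2:  # neither class nor id is set
        return False
    # Drop the node only if it looks peripheral AND nothing hints at main text.
    return bool(UNLIKELY_RE.search(s)) and not MAYBE_RE.search(s)
```

The second pattern acts as a veto: a node with class `comment` but id `main-column` is kept.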

## transform_misused_divs_into_paragraphs

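1. For every div node, if it contains none of the tags in the divToPElementsRe expression, convert it into a p
2. For the remaining div tags, if the div has text of its own, wrap that text in a p tag and insert it into the current node; if a child tag has tail text, insert that into the current node as a p tag too
3. Remove br tags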

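## score_node

1. Based on the tag's class and id: subtract 25 points if they match the negative-word regex, add 25 points if they match the positive-word regex
2. div: +5 points; pre, td, blockquote: +3 points
3. address, ol, ul, dl, dd, dt, li, form: -3 points
4. h1-h6, th: -5 points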
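The scoring table above can be written down as a small function. This is a sketch under assumptions: the two regexes are shortened, hypothetical word lists, class and id are weighted separately, and `score_node` here takes plain strings instead of a DOM element.

```python
import re

# Hypothetical, shortened word lists for illustration only.
NEGATIVE_RE = re.compile(r"comment|footer|sidebar|sponsor", re.I)
POSITIVE_RE = re.compile(r"article|body|content|entry|main|post", re.I)

# Per-tag adjustments taken from the list above.
TAG_SCORES = {"div": 5, "pre": 3, "td": 3, "blockquote": 3}
TAG_SCORES.update(dict.fromkeys(
    ["address", "ol", "ul", "dl", "dd", "dt", "li", "form"], -3))
TAG_SCORES.update(dict.fromkeys(
    ["h1", "h2", "h3", "h4", "h5", "h6", "th"], -5))


def class_weight(class_attr: str = "", id_attr: str = "") -> int:
    """-25 for a negative match, +25 for a positive match, per attribute."""
    weight = 0
    for value in (class_attr, id_attr):
        if not value:
            continue
        if NEGATIVE_RE.search(value):
            weight -= 25
        if POSITIVE_RE.search(value):
            weight += 25
    return weight


def score_node(tag: str, class_attr: str = "", id_attr: str = "") -> int:
    """Initial score of a node: class/id weight plus the tag adjustment."""
    return class_weight(class_attr, id_attr) + TAG_SCORES.get(tag.lower(), 0)
```

For example, `score_node("div", "article-body")` gives 30 and `score_node("ul", "sidebar")` gives -28.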

## score_paragraphs

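1. First define a constant, MIN_LEN, the minimum text length for a paragraph
2. For every p, pre and td tag, find its parent and grandparent tags; tags whose text is shorter than MIN_LEN are skipped
3. Score the parent tag (score_node) and add it to the ordered candidates
4. Score the grandparent tag as well and add it to the ordered candidates
5. Compute the content score (content_score) of the current node: a base score of 1, plus 1 point per comma-separated segment, plus 1 point per 100 characters, up to 3 points
6. Add the current element's score to the parent, and half of it to the grandparent
7. Link density: links / (text + links)
8. Final score: previous score * (1 - link density)

Note that the current tag itself is not added to candidates; only its parent and grandparent are.
Scores accumulate: if an element has several p children, the content scores of all of them are added to it.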
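The arithmetic of steps 5 to 8 fits in a few lines. This sketch assumes the 100-character bonus is capped at 3 points and that "text" in the link-density formula means the non-link text; all function names are made up for this note.

```python
def content_score(text: str) -> int:
    """Score of one paragraph: base 1, +1 per comma segment, +1 per 100 chars."""
    score = 1                           # base score
    score += len(text.split(","))       # one point per comma-separated segment
    score += min(len(text) // 100, 3)   # length bonus, assumed capped at 3
    return score


def link_density(text_len: int, link_len: int) -> float:
    """Share of link text: links / (text + links)."""
    total = text_len + link_len
    return link_len / total if total else 0.0


def final_score(raw_score: float, text_len: int, link_len: int) -> float:
    """Scale a candidate's accumulated score down by its link density."""
    return raw_score * (1 - link_density(text_len, link_len))
```

A candidate whose text is one quarter links keeps three quarters of its score, so navigation blocks made mostly of links end up near zero.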

## select_best_candidate

It simply picks the highest-scoring entry in ordered.

## get_article

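Tags around the best candidate are given a chance to be revived, so that parts separated by ads are not dropped. The threshold is 10 points or one fifth of the best candidate's score. A p tag is kept if node_length > 80 and link_density < 0.25, or if it is shorter than 80, has no links, and ends with a period.

# Thoughts

readability works largely because the HTML itself is a document: the data is already in the HTML, and the article is obtained by manipulating the DOM. With front-end frameworks developing rapidly and React, Vue and others on the rise, more and more sites load content dynamically; the actual article lives in the page's JS or even has to be fetched via AJAX. Using readability.js in the browser still works in that case (the browser has already built the DOM), but for crawling you have to execute the JS and obtain the rendered DOM before the article can be extracted. It would be great to have an algorithm that identifies large blocks of text directly rather than relying on the DOM.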

Busy waiting

```
while True:
    pass
```

This is busy waiting.

```
import time

while True:
    time.sleep(10)
```

This is not busy waiting, because the CPU is free to do other things and the loop needs only trivial CPU cycles.

http://stackoverflow.com/questions/529034/python-pass-or-sleep-for-long-running-processes

Busy waiting is considered an anti-pattern, but using it in a spinlock is OK.

Use select if we need to wait for something.
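A small sketch of waiting with select instead of spinning. It assumes a POSIX system, where `select.select` accepts pipe file descriptors; `wait_for_data` is a helper name made up for this note.

```python
import os
import select


def wait_for_data(fd: int, timeout: float) -> bool:
    """Block in select() until fd is readable or the timeout expires.

    The process sleeps in the kernel while waiting, so no CPU is burned.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    print(wait_for_data(read_end, 0.1))   # nothing written yet
    os.write(write_end, b"x")
    print(wait_for_data(read_end, 0.1))   # data is now available
```

The timeout bounds the wait, so the caller can still do periodic work without falling back to a `pass` loop.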

https://en.wikipedia.org/wiki/Busy_waiting

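HTML Node vs Element

* Node vs Element

An HTML document consists of different kinds of nodes; an element is one type of node.

* NodeList vs HTMLCollection

A NodeList is a collection of nodes; an HTMLCollection is a collection of elements.

* HTMLCollection vs NodeList vs Array

An HTMLCollection is live: it changes as the DOM changes. A NodeList can be live (for example childNodes) or static (for example the result of querySelectorAll).

An Array is a static data structure.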

make and premake

Rule

target: dependencies (separated by spaces)
    command(s) to run to build the target from dependencies

Basic usage

In most projects we have a .h file for function interfaces, and the whole program depends on it.
Typically, all .o files are compiled from the corresponding .c source files.

CC = gcc              # macro
prom = calc
deps = calc.h         # the one .h to rule them all
obj = main.o getch.o getop.o stack.o
$(prom): $(obj)
    $(CC) -o $(prom) $(obj)
%.o: %.c $(deps)      # pattern rule: each .o depends on the .c of the same name and on $(deps)
    $(CC) -c $< -o $@ # $< is the first prerequisite and $@ is the target

Functions

We can make the file even smarter by using makefile functions.
Makefile function syntax: $(func params)

CC = gcc
prom = calc
deps = $(shell find ./ -name "*.h") # find all header files using the builtin shell function
src = $(shell find ./ -name "*.c")
obj = $(src:%.c=%.o)
$(prom): $(obj)
    $(CC) -o $(prom) $(obj)
%.o: %.c $(deps)
    $(CC) -c $< -o $@
clean:                              # target with no prerequisites, used only to run a command
    rm -rf $(obj) $(prom)

Reference

http://www.epubit.com.cn/article/546

Autotools

Autotools is a collection of three tools:

  1. autoconf — This is used to generate the “configure” shell script. As I mentioned earlier, this is the script that analyzes your system at compile-time. For example, does your system use “cc” or “gcc” as the C compiler?
  2. automake — This is used to generate Makefiles. It uses information provided by Autoconf. For example, if your system has “gcc”, it will use “gcc” in the Makefile. Or, if it finds “cc” instead, will use “cc” in the Makefile.
  3. libtool — This is used to create shared libraries, platform-independently.

$ autoscan                    # --> creates the `configure.scan` file
$ mv configure.scan configure.ac
$ autoconf                    # --> uses configure.ac to create the `configure` script
# we need a `Makefile.am` as the input for automake
$ automake                    # --> uses `Makefile.am` to create `Makefile.in`
$ autoheader                  # --> generates `config.h.in`
$ ./configure                 # --> generates the Makefile and config.h
$ make && make install        # hooray!

To be continued at:
http://markuskimius.wikidot.com/programming:tut:autotools:5

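Basic premake usage

premake can generate a makefile:
premake gmake
The generated makefile supports:
make                        default build
make help                   show the help text
make config=release         build the release configuration
make clean                  clean the build
make config=release clean   clean the release build

A premake5 script is named premake5.lua and is essentially a Lua script. Every line is a function call; because the argument happens to be a string or a table, the parentheses can be omitted.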

-- premake5.lua
workspace "HelloWorld"
   configurations { "Debug", "Release" }

project "HelloWorld"
   kind "ConsoleApp"
   language "C"
   targetdir "bin/%{cfg.buildcfg}"

   files { "**.h", "**.c" }

   filter "configurations:Debug"
      defines { "DEBUG" }
      flags { "Symbols" }

   filter "configurations:Release"
      defines { "NDEBUG" }
      optimize "On"

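Common functions

workspace       equivalent to a vs solution
project         equivalent to a vs project
kind            type of the build target: ConsoleApp, WindowedApp, SharedLib, StaticLib
location        directory for the build target
defines         define constants
files           add files; accepts file names, *.ext, **.ext
removefiles     exclude files
links           libraries to link
libdirs         add library directories
configurations  declare the build configurations; use filter {"configurations:<name>"} to set options for a specific one
platforms       declare the platforms; similar to vs platforms, and also configured through filter
includedirs     add include directories
optimize        optimization setting: Off, On
buildoptions    compiler options, e.g. -std=c99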

Scope

Settings are inherited by scope. workspace '*' or project '*' selects all workspaces or all projects.

premake install

newaction {
   trigger     = "install",
   description = "Install the software",
   execute = function ()
      -- copy files, etc. here
   end
}

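Compiling, debugging, and static checking of C

# Common Clang/gcc options

```
-std=c11 use the C11 standard
-Wall enable the common set of warnings
-O2 level 2 optimization, usually enough
-march=native use every instruction set the local CPU supports
-g needed if you want to debug with gdb

-Dmacro=value define a macro
-Umacro undefine a macro
-Ipath add path to the include search path
-llibrary link against liblibrary.a
-Lpath add path to the library search path

-c compile only, do not link
-S generate assembly, but no machine code
-E preprocess only

-fopenmp enable OpenMP support
-pthread add pthread support
-Werror treat all warnings as errors
```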

# How to build static and dynamic libraries

see:

1. http://www.adp-gmbh.ch/cpp/gcc/create_lib.html
2. http://stackoverflow.com/questions/2734719/how-to-compile-a-static-library-in-linux

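## Static libraries

A static library is created by archiving object files together, so there are two steps:

1. gcc -c -o mean.o mean.c
2. ar rcs libmean.a mean.o

The library file name gains the lib prefix and the .a suffix.

Usage: `gcc -static main.c -L. -lmean -o a.out`

## Dynamic libraries

A dynamic library has to be built as PIC (position-independent code).

Whatever follows -Wl, is passed on to the linker.

```
gcc -c -fPIC calc_mean.c -o calc_mean.o # upper-case -fPIC is more portable than -fpic, although there is no difference on x86
gcc -shared -Wl,-soname,libmean.so.1 -o libmean.so.1.0.1 calc_mean.o
```

Usage: `gcc main.c -o a.out -L. -lmean` (the linker looks for libmean.so and the loader for libmean.so.1, so both names need to be symlinked to libmean.so.1.0.1)

```
LD_LIBRARY_PATH=. ./a.out
```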

## Most common commands

cc -Wall -std=c11 source.c -o executable
g++ -Wall -std=c++11 source.cc -o executable

# tips

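Use uint8_t rather than char whenever possible when handling binary data.

Prefer `void*` for function parameter types (interfaces).
Don't do this:

```
void processAddBytesOverflow(uint8_t *bytes, uint32_t len) {
    for (uint32_t i = 0; i < len; i++) {
        bytes[0] += bytes[i];
    }
}
```

Do this:

```
void processAddBytesOverflow(void *input, uint32_t len) {
    uint8_t *bytes = input;

    for (uint32_t i = 0; i < len; i++) {
        bytes[0] += bytes[i];
    }
}
```

Unify the formatting before committing to the repository instead of worrying about it while writing.

Do not use malloc; always use calloc, because the cost of zeroing is tiny, whereas forgetting to initialize is common. I have doubts about this one.

Try to write the release code at the same time as the code that acquires the memory.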

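## Tracking down memory leaks

The core idea: malloc/free calls that are not paired

On Windows, use:

```
#define _CRTDBG_MAP_MALLOC
#include <crtdbg.h>

_CrtDumpMemoryLeaks();
```

On Linux, use mtrace to trace allocations at run time,
or run the program under valgrind, which also checks it at run time:
valgrind --leak-check=full ./a.out
Pay attention to "definitely lost" and "possibly lost".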

## Debugging tools

- valgrind: track down memory problems
- strace/ltrace: inspect system calls and library calls
- pmap: inspect memory usage

## Testing

## rr

Mozilla's rr is a promising tool to replace gdb. It can replay the recorded execution of a program, so you can replay it until you find the bug.

http://rr-project.org/

Points to note in PEP 8

# spaces

If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator.

```
Yes:
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)
No:
i=i+1
submitted +=1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)
```

# comments

Compound statements (multiple statements on the same line) are generally discouraged.

Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!

Comments should be complete sentences. If a comment is a phrase or sentence, its first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!).

If a comment is short, the period at the end can be omitted. Block comments generally consist of one or more paragraphs built out of complete sentences, and each sentence should end in a period.
You should use two spaces after a sentence-ending period.
When writing English, follow Strunk and White.
Python coders from non-English speaking countries: please write your comments in English, unless you are 120% sure that the code will never be read by people who don’t speak your language.

Conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257.
* Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the def line.
* PEP 257 describes good docstring conventions. Note that most importantly, the """ that ends a multiline docstring should be on a line by itself, e.g.:
```
"""Return a foobang

Optional plotz says to frobnicate the bizbaz first.
"""
```
For one liner docstrings, please keep the closing """ on the same line.

# Naming Conventions

Overriding Principle
Names that are visible to the user as public parts of the API should follow conventions that reflect usage rather than implementation.

Names to Avoid
Never use the characters ‘l’ (lowercase letter el), ‘O’ (uppercase letter oh), or ‘I’ (uppercase letter eye) as single character variable names.
In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use ‘l’, use ‘L’ instead.

Exception Names
Because exceptions should be classes, the class naming convention applies here. However, you should use the suffix “Error” on your exception names (if the exception actually is an error).

When implementing ordering operations with rich comparisons, it is best to implement all six operations ( __eq__ ,__ne__ , __lt__ , __le__ , __gt__ , __ge__ ) rather than relying on other code to only exercise a particular comparison.
To minimize the effort involved, the functools.total_ordering() decorator provides a tool to generate missing comparison methods.
PEP 207 indicates that reflexivity rules are assumed by Python. Thus, the interpreter may swap y > x with x < y, y >= x with x <= y, and may swap the arguments of x == y and x != y. The sort() and min() operations are guaranteed to use the < operator and the max() function uses the > operator. However, it is best to implement all six operations so that confusion doesn't arise in other contexts.
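As a minimal illustration of the functools.total_ordering() shortcut mentioned above, the sketch below defines only `__eq__` and `__lt__` and lets the decorator supply the rest. The `Version` class is a made-up example, not something from PEP 8.

```python
from functools import total_ordering


@total_ordering
class Version:
    """Toy version number that compares by (major, minor)."""

    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        # Defining __eq__ removes the default hash, so restore a matching one.
        return hash(self._key())
```

With this, `Version(1, 3) >= Version(1, 2)` works even though `__ge__` was never written by hand.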

# lambda

Always use a def statement instead of an assignment statement that binds a lambda expression directly to an identifier.
Yes:
def f(x): return 2*x
No:
f = lambda x: 2*x
The first form means that the name of the resulting function object is specifically 'f' instead of the generic '<lambda>'. This is more useful for tracebacks and string representations in general. The use of the assignment statement eliminates the sole benefit a lambda expression can offer over an explicit def statement (i.e. that it can be embedded inside a larger expression).

# exceptions

Design exception hierarchies based on the distinctions that code catching the exceptions is likely to need, rather than the locations where the exceptions are raised. Aim to answer the question “What went wrong?” programmatically, rather than only stating that “A problem occurred” (see PEP 3151 for an example of this lesson being learned for the builtin exception hierarchy)

Class naming conventions apply here, although you should add the suffix “Error” to your exception classes if the exception is an error. Non-error exceptions that are used for non-local flow control or other forms of signaling need no special suffix.

Use exception chaining appropriately. In Python 3, “raise X from Y” should be used to indicate explicit replacement without losing the original traceback.

When deliberately replacing an inner exception (using “raise X” in Python 2 or “raise X from None” in Python 3.3+), ensure that relevant details are transferred to the new exception (such as preserving the attribute name when converting KeyError to AttributeError, or embedding the text of the original exception in the new exception message).
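A short sketch of explicit chaining with "raise X from Y" in Python 3. `ConfigError` and `load_port` are hypothetical names used only for this example.

```python
class ConfigError(Exception):
    """Raised when a configuration value is missing or invalid."""


def load_port(settings: dict) -> int:
    """Read the 'port' setting, translating low-level errors."""
    try:
        return int(settings["port"])
    except KeyError as exc:
        # Keep the original exception reachable as __cause__.
        raise ConfigError("missing required setting 'port'") from exc
    except ValueError as exc:
        # Carry the offending value over into the new message.
        raise ConfigError(f"invalid port: {settings['port']!r}") from exc
```

The caller only needs to catch ConfigError, which answers "what went wrong?", while the traceback still shows the KeyError or ValueError that caused it.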

python signal

Any thread can perform an alarm(), getsignal(), pause(), setitimer() or getitimer(); only the main thread can set a new signal handler, and the main thread will be the only one to receive signals

# default handlers:

signal.SIG_DFL

This is one of two standard signal handling options; it will simply perform the default function for the signal. For example, on most systems the default action for SIGQUIT is to dump core and exit, while the default action for SIGCHLD is to simply ignore it.

signal.SIG_IGN

This is another standard signal handler, which will simply ignore the given signal.

# assign handlers:
signal.signal(signalnum, handler)

Set the handler for signal signalnum to the function handler. handler can be a callable Python object taking two arguments

handler: handler(signum, frame)

# example

```
import signal, os

def handler(signum, frame):
    print('Signal handler called with signal', signum)
    raise OSError("Couldn't open device!")

# Set the signal handler and a 5-second alarm
signal.signal(signal.SIGALRM, handler)
signal.alarm(5)

# This open() may hang indefinitely
fd = os.open('/dev/ttyS0', os.O_RDWR)

signal.alarm(0)  # Disable the alarm
```

Reference:

1. https://docs.python.org/2/library/signal.html

Shell programming tutorial

# Variables and values

Variables are referenced by $var or ${var}. Global variables are visible to all child bash sessions and are often called environment variables; local variables are visible only in the current session, not in child sessions.

Global variables can be listed with `env` and created with `export`.

TEST=testing; export TEST # or
export TEST=testing # NOTE: no $

## Useful variables

```
HOME Same as ~
IFS
PATH Search path
EUID User id
GROUPS Groups for current user
HOSTNAME Hostname
LANG
LC_ALL
OLDPWD
PWD
```

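## Defining and using variables

When defining a variable, note that because shell syntax uses spaces to separate command words, there must be no spaces around the equals sign.

```
FOO=bar
```

To use a variable, prefix it with `$`.

```
echo $FOO
```

Variables can be interpolated directly inside double-quoted strings; add braces to mark where the variable name starts and ends.

```
echo "${FOO}xxx"
```

By default a variable is visible only in the current session and is not passed to invoked commands as an environment variable. Use export to export the variable, or put the assignment in front of the command.

```
-> % cat env.py
import os
print('FOO env variable is:', os.environ.get('FOO'))

-> % python3 env.py
FOO env variable is: None

-> % FOO=bar python3 env.py
FOO env variable is: bar
```

Using export

```
-> % export FOO=bar
-> % python3 env.py
FOO env variable is: bar
```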
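The same behaviour can be observed from Python: a child process sees only what is in the environment it is started with. The sketch below is a rough Python counterpart of `FOO=bar python3 env.py`; `child_sees` is a helper name made up for this note.

```python
import os
import subprocess
import sys


def child_sees(name, extra_env=None):
    """Run a child Python process and report what it sees for `name`."""
    env = dict(os.environ)
    env.pop(name, None)          # start from an environment without the variable
    if extra_env:
        env.update(extra_env)    # like putting FOO=bar in front of the command
    code = f"import os; print(os.environ.get({name!r}))"
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Here `child_sees("FOO")` returns the string "None", while `child_sees("FOO", {"FOO": "bar"})` returns "bar".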

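## Some useful built-in variables

```
$HOME home directory, e.g. /home/kongyifei
$IFS default field separators, closely tied to for loops
$PATH search path; when you run ls, the shell looks for the ls command in the directories listed here
$EUID effective user ID of the current user
$LANG
$LC_ALL
$OLDPWD previous working directory
$PWD current working directory
```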

## Arrays

Use parentheses to define an array; for loops are covered later.

```
A=(1 2 3)

for el in "${A[@]}"; do
    echo "$el"
done
```

## String operations

Strings inside braces are expanded into separate strings

```
% echo {1,2,3,4}
1 2 3 4
% mkdir -p test/{a,b,c,d}{1,2,3,4}
% ls test/
a1  a2  a3  a4  b1  b2  b3  b4  c1  c2  c3  c4  d1  d2  d3  d4
% mv test/{a,c}.conf # this command means: mv test/a.conf test/c.conf
```

Slice: `${string:start:length}`

Default value: `${var:-default}`

Assign default: `${var:=default}`

Length: `${#var}`


# Redirection

input: <, output: >, append: >>

cat > file << EOF
this line will be redirected to file
EOF

# pipe

Piped commands run simultaneously, but the second command waits for input from the first.

# Sub shell

use $(expression)

# Control statements

## Conditionals

An if statement takes its branch when `expr` returns exit status 0.

```
if expr; then
    statement;
elif expr; then
    statement;
else
    statement;
fi
```

## test command

Any command can serve as the condition, but in practice we usually use the `[` command. Note that `[` is not syntax but a command. Because the ancient `[` command is so limited, `[[` is generally used for conditions nowadays.

```
if [[ "a" == "b" ]]; then
    echo "wtf"
else
    echo "meh"
fi
```

`[[` supports the following conditions:

1 Numeric comparison, integers only; note that symbols such as `>` and `<` cannot be used.

```
n1 -eq n2 equal
n1 -ge n2 greater or equal
n1 -gt n2 greater
n1 -le n2 less or equal
n1 -lt n2 less
n1 -ne n2 not equal
```

2 String comparison

Note: variables may contain spaces, so the best way to compare is to add quotes: `"$var1" = "$var2"`

```
str1 == str2 equal
str1 != str2 not equal
str1 < str2 less
str1 > str2 greater
-z str zero length
-n str not zero length
```

3 file comparison

-d is directory?
-e exist?
-f is regular file?
-r exist and readable?
-s exist and has content
-w exist and writable
-x exist and executable
-O exist and owned by the current user
-G exist and in same group
file1 -nt file2 newer than
file1 -ot file2 older than

## case

case "$var" in
pattern1 | pattern2) commands;;
pattern3) commands2;;
*) default commands;;
esac

## Loops

### foreach statement

for var in list; do
    echo $var
done

Here list can be an array or a string that is split on $IFS. By default $IFS is " \t\n", which includes the space.

To override IFS, the usual pattern is:

OLDIFS=$IFS
IFS=$'\n' # new separator
# do things
IFS=$OLDIFS

### while-loop

until/while expr; do
# commands
done

### pipe

The output of a for loop can be redirected or piped to another command

```
for city in beijing shanghai; do
    echo $city is big
done > cities.txt
# will save the result in cities.txt
```

# Input and output

## Command-line arguments

Parameters to a script are available as $1, $2, $3, and so on. $0 is the script name. Remember to check whether a parameter is empty. $# is the number of parameters (not counting the script name).

```
$0 script name / function name
$1…$x command line arguments / parameters
$# number of arguments (without $0)
$* all parameters as a single string
$@ all parameters as a list of strings
```

### shift

Processing parameters using shift:

while [ -n "$1" ]; do
    case "$1" in
    -a) echo "option -a" ;;
    --) shift
        break ;;
    *) echo "$1 is not an option" ;;
    esac
    shift
done

## read

read OPTIONS VARNAME reads input into variables

- read -p Prompt
- read -t timeout
- read -s hide input

We can use read to read from a file or stdin.

## redirection

2> redirect STDERR
m>&n redirect fd m to the file that fd n is associated with

Note: you have to write command >> command.log 2>&1 (put 2>&1 at the end), since 2>&1 means "redirect fd 2 to whatever fd 1 currently points to".

In a script:
exec 2> filename # reopen stderr to filename

# Signal

`trap command signal` is used to handle signals in shell

# Functions

There are two ways to define a function

```
function name {
    # function body
}

foo() {
    # function body
}
```

To call the function above, just type

```
foo
```

and that's it.

## return

A shell function behaves like a small script and does NOT return a computed value. It returns an exit code between 0 and 255. If no return is specified, the exit code of the last command is returned.

You can read the return value from $?, as with any normal command.

To actually return a value from a function, echo the value and call the function in a subshell (command substitution):

```
function foo {
    # do some computation
    echo "$result"
}

retval=$(foo)
```

Note: anything echoed in the function body is captured, so avoid printing anything else to stdout.

## parameters

As in a shell script, $1 … $9 hold the parameters and $# is the number of parameters. $0 holds the function name in zsh; bash keeps the script name in $0 and exposes the function name as ${FUNCNAME[0]}.

## local variables

use `local` to declare local variables

# alias

```
alias new_name='command string'
$ \command # bypass alias
```

# debugging

DEBUG macro

# multiprocess

PID_ARRAY=()
for file in $filelist; do
    md5sum "$file" &
    PID_ARRAY+=("$!")
done
wait "${PID_ARRAY[@]}"

Docker tips

# Remove all stopped containers

docker rm `docker ps -aq`

# Remove all untagged images

docker rmi `docker images | grep "^<none>" | awk '{print $3}'`

# Enter a running container

docker exec -it CONTAINER_NAME bash

# Kill all running containers

docker kill $(docker ps -q)

# Remove all images

docker rmi $(docker images -q)