Browser extensions

How is an XPath generator implemented?

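When writing a crawler, the one thing that ultimately resists automation is specifying the XPath of the elements to extract. Crawling a specific site nearly always boils down to finding the XPath of the next-page link and of the data elements. If XPath generation could be handed to operations colleagues who do not program, it would free up a great deal of programmer time.

XPath is, after all, a DSL, and it is fairly hard for people who do not program. Plenty of PMs write SQL fluently, but an operations colleague who can write XPath is hard to find. Every field has its specialty, and the problems operations staff face are quite different from ours as programmers. After years of experience, I feel that teaching them YAML is already about the limit…

So can a graphical tool generate XPath? Clearly yes: Chrome ships with a built-in XPath generator, as shown below: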

![chrome xpath](https://ws4.sinaimg.cn/large/006tNc79ly1flhwpu64uvj31f20keaj4.jpg)

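The XPath generated in this screenshot is: `//*[@id="fc_B_pic"]/ul[1]/li[1]/a[1]`

However, Chrome's XPath generation has several drawbacks:

1. Chrome only walks up the tree looking for an ancestor with an id, whereas in practice an ancestor with a class is often enough to produce a correct XPath.
2. Chrome tries to make the generated XPath match exactly one element, so when you want an XPath that selects multiple elements, Chrome cannot help and you still have to rewrite it by hand.
3. After generation, there is no convenient graphical way to verify the result.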
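
A minimal sketch of a generator that works around the first two drawbacks. This is not Chrome's algorithm: `buildXPath`, `positionAmongSameTag`, and the `multiple` option are hypothetical names, and the code only assumes that each node exposes `tagName`, `id`, `className`, `parentElement`, and `previousElementSibling`, as DOM elements do. It anchors on the nearest ancestor that has an id or a class, and with `multiple: true` it drops the positional indices so the path can match sibling items as well.

```javascript
// Count the node's position among preceding siblings with the same tag (1-based).
function positionAmongSameTag(node) {
  var index = 1;
  var sibling = node.previousElementSibling;
  while (sibling) {
    if (sibling.tagName === node.tagName) index += 1;
    sibling = sibling.previousElementSibling;
  }
  return index;
}

// Hypothetical helper: walk up from `element` until a node with an id or a
// class is found, and anchor the XPath at that node.
function buildXPath(element, options) {
  var multiple = Boolean(options && options.multiple);
  var steps = [];
  for (var node = element; node && node.tagName; node = node.parentElement) {
    var tag = node.tagName.toLowerCase();
    var classes = typeof node.className === "string" ? node.className.trim() : "";
    if (node.id) {
      steps.unshift(tag + '[@id="' + node.id + '"]');
      break;
    }
    if (classes) {
      // Match the first class as a whole word inside the class attribute.
      var firstClass = classes.split(/\s+/)[0];
      steps.unshift(tag + '[contains(concat(" ", normalize-space(@class), " "), " ' + firstClass + ' ")]');
      break;
    }
    // Without an anchor, record the tag; keep the index only in single-match mode.
    steps.unshift(multiple ? tag : tag + "[" + positionAmongSameTag(node) + "]");
  }
  return "//" + steps.join("/");
}
```

For the third drawback, a generated path can be checked by hand in the DevTools console with `$x(path)`, or in a script with `document.evaluate`.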

To be continued.

Steal focus from the Chrome omnibox on a new tab

Chrome sets focus to the omnibox when you create a new tab. Although there is an API to replace the new tab page, you cannot simply steal the focus from the omnibox on that page. There are two workarounds.

# If you are creating the new tab programmatically
https://stackoverflow.com/questions/42178723/chrome-extension-creating-new-tab-and-taking-focus-to-page

# If the new tab is created by clicking the new tab button
https://stackoverflow.com/questions/17598778/how-to-steal-focus-from-the-omnibox-in-a-chrome-extension-on-the-new-tab-page

Running a Chrome extension from the command line

https://stackoverflow.com/questions/22193369/run-chrome-extensions-using-command-prompt

Where Chrome/Firefox extension source code is stored (macOS paths):

1. Chrome `~/Library/Application Support/Google/Chrome/Default/Extensions//`
2. Firefox  `~/Library/Application Support/Firefox/Profiles/PROFILE_ID/extensions/EXTENSION_ID/`

Chrome Extension storage

# Basic Concepts

There are three storage areas for Chrome extensions: `sync`, `local`, and `managed`. The `sync` area is synced with the cloud. The `managed` area is read-only.

All of your extension's scripts, including content scripts, share the same storage. The data does not belong to the `localStorage` of the page's domain.

# Usage

```
chrome.storage.local.get('key', function(data) {});
chrome.storage.local.get(["KEY1", "KEY2"], function(data) {});

chrome.storage.local.set(data, function() {}); // data is an object of key-value pairs to store

chrome.storage.local.remove('key', function() {});
chrome.storage.local.remove(["KEY1", "KEY2"], function() {});
chrome.storage.local.clear(function() {});
```
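
To make the data shapes concrete: `get` hands the callback an object keyed by the requested keys, and `set` takes an object of the same shape. The sketch below is only an illustration under those assumptions. `incrementCounter` and `createMemoryArea` are hypothetical helpers, the in-memory area is a stand-in so the code can be tried outside the browser, and `chrome.runtime.lastError` handling is left out.

```javascript
// Hypothetical in-memory stand-in with the same get/set callback shape
// as chrome.storage.local.
function createMemoryArea() {
  var store = {};
  return {
    get: function (keys, callback) {
      var list = Array.isArray(keys) ? keys : [keys];
      var result = {};
      list.forEach(function (key) {
        if (key in store) result[key] = store[key]; // missing keys are simply absent
      });
      callback(result);
    },
    set: function (items, callback) {
      Object.keys(items).forEach(function (key) { store[key] = items[key]; });
      if (callback) callback();
    }
  };
}

// Read a number stored under `key`, add one, write it back, report the new value.
function incrementCounter(area, key, done) {
  area.get(key, function (items) {
    var next = (items[key] || 0) + 1;
    var update = {};
    update[key] = next;
    area.set(update, function () { done(next); });
  });
}

// Use the real storage area inside an extension, the stand-in elsewhere.
var counterArea = (typeof chrome !== "undefined" && chrome.storage)
  ? chrome.storage.local
  : createMemoryArea();
incrementCounter(counterArea, "visits", function (count) {
  console.log("visits:", count);
});
```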

# Events
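
The storage API exposes a `chrome.storage.onChanged` event. Its listener receives an object mapping each changed key to `{oldValue, newValue}`, plus the name of the area that changed. A minimal sketch, where `summarizeChanges` is a hypothetical helper used only to format the payload:

```javascript
// Turn an onChanged payload into readable lines, one per changed key.
function summarizeChanges(changes, areaName) {
  return Object.keys(changes).map(function (key) {
    var change = changes[key];
    return areaName + "." + key + ": " +
      JSON.stringify(change.oldValue) + " -> " + JSON.stringify(change.newValue);
  });
}

// Register the listener only when running inside an extension.
if (typeof chrome !== "undefined" && chrome.storage) {
  chrome.storage.onChanged.addListener(function (changes, areaName) {
    summarizeChanges(changes, areaName).forEach(function (line) {
      console.log(line);
    });
  });
}
```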

Chrome extension cookies

# permissions

Declare the `cookies` permission together with the host patterns whose cookies you want to access. Note that `permissions` is an array.

```
"permissions": [
  "cookies",
  "*://*.example.com/"
]
```

# type
## cookie
A plain object with fields such as `{name, value, domain…}`.

## CookieStore
Normal mode and incognito mode use different cookie stores.

# read

get: `chrome.cookies.get({url: URL, name: COOKIE_NAME, storeId: COOKIE_STORE_ID}, function(cookie) {})`

get all: `chrome.cookies.getAll({domain: DOMAIN}, function(cookies) {})` NOTE: there are other filters not listed here.

set: `chrome.cookies.set({url, name, value}, function(cookie) {})` If the call fails, the callback receives null.
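
As a small illustration of the `getAll` result, the sketch below folds the returned array of cookie objects into a name-to-value map. `cookiesToMap` is a hypothetical helper and `example.com` is a placeholder domain. The extension call is guarded so the snippet is harmless outside an extension.

```javascript
// Fold an array of cookie objects ({name, value, ...}) into {name: value}.
// If two cookies share a name, the later one wins.
function cookiesToMap(cookies) {
  var map = {};
  cookies.forEach(function (cookie) {
    map[cookie.name] = cookie.value;
  });
  return map;
}

// Inside an extension with the "cookies" permission for this host:
if (typeof chrome !== "undefined" && chrome.cookies) {
  chrome.cookies.getAll({ domain: "example.com" }, function (cookies) {
    console.log(cookiesToMap(cookies));
  });
}
```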

Chrome extension development

A Chrome extension can inject scripts into a page; these are called content scripts.

https://developer.chrome.com/extensions/getstarted
https://developer.chrome.com/extensions/content_scripts
https://developer.chrome.com/extensions/messaging
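
A rough sketch of how a content script and the rest of the extension can talk, using `chrome.runtime.onMessage` and `chrome.runtime.sendMessage` from the messaging API linked above. The `ping`/`pong` message types and the `handleMessage` function are made up for illustration.

```javascript
// Background side: answer "ping" messages. The `sender.tab` field is present
// when the message comes from a content script running in a tab.
function handleMessage(message, sender, sendResponse) {
  if (message && message.type === "ping") {
    var tabId = sender && sender.tab ? sender.tab.id : null;
    sendResponse({ type: "pong", tabId: tabId });
  }
}

// Register the handler only when running inside an extension.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener(handleMessage);
}

// Content script side (runs in the page; shown as a comment because it
// belongs in a different file):
//   chrome.runtime.sendMessage({ type: "ping" }, function (response) {
//     console.log(response.type, response.tabId);
//   });
```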

The grayed-out icon problem

Add `browser_action.default_icon` to your manifest.json file.

```
{

  "browser_action": {
    "default_icon": "icons/icon-32.png"
  },


}
```

Greasemonkey tutorial

Learning Greasemonkey/Tampermonkey

# Header directives

* `@name`: script name
* `@namespace`: namespace
* `@version`: version
* `@author`: author
* `@description`
* `@homepage`
* `@icon`
* `@updateURL`
* `@downloadURL`
* `@include`
* `@exclude`
* `@resource key url`
* `@require`: include scripts
* `@connect`: domains the script may reach with cross-origin requests: `self` (the current domain), `localhost`, or `*`
* `@run-at`: when to run the script: `document-start`, `document-body`, `document-end`, `document-idle`, or `context-menu`
* `@grant`: whitelist `GM_*` functions. If no `@grant` tag is given, Tampermonkey guesses what the script needs.
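
A small userscript showing how some of these directives fit together. Everything specific here is made up for illustration: the script name, the namespace URL, the `@include` pattern, and the `buildOutlineCss` helper.

```javascript
// ==UserScript==
// @name         Outline links (hypothetical example)
// @namespace    https://example.com/userscripts
// @version      0.1
// @description  Draw an outline around every link on the page
// @include      https://example.com/*
// @run-at       document-end
// @grant        GM_addStyle
// ==/UserScript==

// Build a CSS rule that outlines everything matching `selector`.
function buildOutlineCss(selector, color) {
  return selector + " { outline: 2px solid " + color + "; }";
}

// GM_addStyle only exists inside the userscript manager, and only because
// the header grants it.
if (typeof GM_addStyle === "function") {
  GM_addStyle(buildOutlineCss("a", "red"));
}
```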

# Functions

```
GM_addStyle(css)
GM_getValue / GM_setValue / GM_deleteValue
GM_listValues()
GM_getResourceText(name)
GM_getResourceURL(name)      // get the resource as a base64-encoded URI
GM_openInTab(url)
GM_getTab(cb)                // get an object that persists as long as this tab is open
GM_getTabs(cb)               // get all tab objects as a hash, to communicate with other script instances
GM_setClipboard(data, info)  // set the clipboard

GM_xmlhttpRequest            // can make cross-domain requests
```

Using it in `$.ajax`: https://gist.github.com/yifeikong/9e93cc38297cce989ffbef5587ad2f39
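
Independently of the gist, here is a minimal sketch of calling `GM_xmlhttpRequest` directly. The `fetchText` wrapper is hypothetical. It takes the request function as a parameter so it can be tried with a stand-in, and it relies only on the `method`, `url`, `onload`, and `onerror` fields of the details object and on `status` and `responseText` of the response. In a real script the header would also need `@grant GM_xmlhttpRequest` and a matching `@connect` entry.

```javascript
// Issue a GET request and pass the body to `onText` on a 2xx status;
// anything else goes to `onFail`.
function fetchText(gmRequest, url, onText, onFail) {
  gmRequest({
    method: "GET",
    url: url,
    onload: function (response) {
      if (response.status >= 200 && response.status < 300) {
        onText(response.responseText);
      } else {
        onFail(response);
      }
    },
    onerror: onFail
  });
}

// Inside a userscript that was granted GM_xmlhttpRequest:
if (typeof GM_xmlhttpRequest === "function") {
  fetchText(GM_xmlhttpRequest, "https://example.com/", function (text) {
    console.log("received " + text.length + " characters");
  }, function (failure) {
    console.log("request failed", failure);
  });
}
```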