An Introduction to Cambridge Linguaskill

Posted on 2020-06-13 Edited on 2023-05-19 In English Word count in article: 2.7k Reading time ≈ 10 mins.

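Disclaimer: everything below reflects only my own understanding, experience and opinions. I cannot guarantee that the information is fully accurate, so please judge it for yourself. If there are any mistakes, do not attribute them to any other person or organisation.

Cambridge Linguaskill

What is Linguaskill?

Developed by Cambridge Assessment English, Linguaskill is a quick and convenient online test that helps organisations assess candidates' English listening, speaking, reading and writing. It is backed by artificial intelligence and combines easy administration, fast results, accuracy and reliability, and detailed score reports.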

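General or Business?

Linguaskill is aimed at candidates aged 16 and over. Depending on the goal and the kind of English being assessed, it comes in two versions: Linguaskill General and Linguaskill Business. Both share the same format and task types and consist of three modules: Reading and Listening, Writing, and Speaking.

Linguaskill General tests everyday English. Typical uses include school admissions, progress tests and graduation assessment, as well as routine recruitment by companies.

Linguaskill Business focuses on English used in business and corporate contexts. It suits organisations that expect candidates to be familiar with professional business language or that operate internationally.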

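Test format

Linguaskill can be taken at any time and place. All that is needed is a computer, an internet connection, a microphone and a headset.

Reading and Listening results are generated instantly, and the full assessment is completed within 48 hours. Under the current remote-proctoring arrangement, the score report is released after the proctoring review, so candidates receive it within 72 hours of finishing the test.
Linguaskill is backed by artificial intelligence and uses adaptive testing: the difficulty of each question is adjusted according to the candidate's previous answer.
Results are presented in a clear individual or group report that shows the candidate's level on the Common European Framework of Reference (CEFR) and on the Cambridge English Scale.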

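Test content

The test has three modules: Reading and Listening, Writing, and Speaking. Candidates can take the modules separately or together, as needed.

Reading and Listening

Duration: about 60 to 85 minutes

The Reading and Listening module is adaptive, so the number of questions is not fixed. With every answer the computer learns more about the candidate's level. The test ends once enough questions have been answered for the computer to determine that level accurately.

At present, reading and listening questions are presented separately, and there is no explicit time limit per question.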

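Task types -- Reading

Read and select: candidates read a notice, diagram, label, memo or letter containing a short text, then choose from three options the sentence or phrase that best matches its meaning.
Gapped sentences: candidates choose the correct word from four options to fill the gap in a sentence.
Multiple-choice gap-fill: candidates choose the correct word or phrase from four options to fill the gaps in a short text.
Open gap-fill: candidates write the missing word in each gap of a short text.
- I find this the hardest reading task, because you have to come up with the word yourself.

Extended reading: candidates read a longer text and answer a series of multiple-choice questions. The questions follow the order of the information in the text.
- Each question corresponds roughly to one paragraph, which narrows down where to look.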

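Task types -- Listening

Listen and select: candidates listen to a short recording and choose the correct answer from three options.
- There is no time limit for reading the question, so there is plenty of time to understand the question and the options.
- Each recording can be played twice, and the candidate decides how long to wait between the two plays.
- Nearly all the options are mentioned in the recording, so you cannot guess from the words you happen to hear. You need to understand the whole dialogue and reason your way to the right answer.

Extended listening: candidates listen to a longer recording and answer a series of related multiple-choice questions. The questions follow the order of the information in the recording.
- The recording is long and comes with several questions. Again, it can be played twice.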

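Writing

Duration: 45 minutes (two parts)

The writing test is marked by computer. Candidates type their answers on a keyboard.

Task types:

The test has two parts.
1. Candidates read a short message, usually an email, and then write an email of at least 50 words using the information in the message and three given bullet points. This part takes about 15 minutes.
2. Candidates read a short text outlining a scenario, and then write a letter or report of at least 180 words using the information in the scenario and three given bullet points. This part takes about 30 minutes.

- This resembles Task 1 and Task 2 in IELTS writing. The two tasks share a single timer, so budget your time carefully.
- The examiner's sample answers are far longer than the required word count, so my guess is that writing somewhat more may earn extra marks.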

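Speaking

Duration: 15 minutes (five parts)

The Speaking module requires a computer, a microphone and a headset. Questions appear on screen and are played through the headset, and the candidate's answers are recorded for examiners to mark.

Task types

The module covers five task types: Interview, Reading aloud, Presentation 1, Presentation 2 and Communication activity. Each counts for 20% of the total mark.

Interview: candidates answer eight questions about themselves (the first two are not scored).

- Stay focused and answer each question as the recording plays it.
- The first two questions ask you to say and spell your given name and family name.

Reading aloud: candidates read eight sentences aloud.

- Some sentences contain abbreviations such as AM or Mrs.
- Before reading, analyse the sentence structure and read in sense groups rather than word by word.
- The sentences get progressively longer; the last two usually have 20 or more words.

Presentation 1: candidates speak for one minute on a given topic. Preparation time is 40 seconds.

- Pen and paper are currently not allowed, so you cannot jot down ideas or keywords.

Presentation 2: candidates speak for one minute about one or more given graphics (for example a chart, diagram or information sheet). Preparation time is one minute.

- This is usually a comparison of two items, such as the material, price and function of two pieces of clothing, or the size, rent and interior design of two houses.

Communication activity: candidates give brief opinions on five questions about a topic. Preparation time is one minute.

- The keywords of the five questions are shown in advance, so you can start preparing early.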

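Fees

All modules

CNY 698.

Single modules

- Reading and Listening: CNY 398.00
- Writing: CNY 398.00
- Speaking: CNY 398.00

Registration notes

Linguaskill must be booked at least one week in advance, so plan your test date accordingly. For example, if you register on a Monday, the earliest session you can choose is the following Monday.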

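Recognition

Global Recognition

Recognising institutions

Monash University

Monash English language proficiency

Monash University accepts Cambridge Linguaskill General results when other language test centres are closed.

For courses that require IELTS 6.5 (no band below 6.0), the Linguaskill requirement is 176 overall (at least 169 in each of listening, writing, speaking and reading), which corresponds to CEFR level B2.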

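Registration

Cambridge Linguaskill -- all modules

The registration platform accepts WeChat Pay only.

Test dates can be chosen from the 7th day after successful payment through the following 30 calendar days.

To register, follow the information on the official website: scan the QR code, fill in your personal details and select the modules. After paying the fee you will receive an email confirming the registration; follow the instructions in that email to take the test. The test is taken online and is proctored remotely throughout on a dedicated online proctoring platform.

I registered at 9 a.m. and received the confirmation email at 5 p.m. the same day.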

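Common European Framework of Reference for Languages

The Common European Framework of Reference for Languages (CEFR) is an international standard for describing learners' language ability. It is used worldwide.

CEFR chart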


Notes on Playing Atari with Deep Reinforcement Learning

Posted on 2020-06-04 Edited on 2023-05-19 In paper Word count in article: 3.7k Reading time ≈ 14 mins.

Playing Atari with Deep Reinforcement Learning

Paper download link

Abstract

The paper presents the first deep learning model that learns control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network trained with a variant of Q-learning. It takes raw pixels as input and outputs a value function estimating future rewards.

The deep learning model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.

Introduction

State of the art: many RL applications rely on hand-crafted features. Their performance depends heavily on the quality of the feature representation, and they cannot learn features on their own.
> Most successful RL applications that operate on these domains have relied on hand-crafted features combined with linear value functions or policy representations. Clearly, the performance of such systems heavily relies on the quality of the feature representation.

Given the progress deep learning (DL) has made in computer vision and speech recognition, the paper asks whether similar methods can be applied to controlling agents with reinforcement learning (RL).
> Recent advances in deep learning have made it possible to extract high-level features from raw sensory data, leading to breakthroughs in computer vision and speech recognition.
- DL --> extract high-level features
> It seems natural to ask whether similar techniques could also be beneficial for RL with sensory data.

The challenges that RL poses are then discussed from the DL perspective.

DL versus RL
- RL poses several challenges that standard DL methods do not handle out of the box.

| | Deep Learning | Reinforcement Learning |
|---|---|---|
| Data source | requires large amounts of hand-labelled training data | learns from a scalar reward signal that is frequently sparse, noisy and delayed (actions and rewards) |
| Data features | assumes the data samples to be independent | typically sequences of highly correlated states |
| Data distribution | a fixed underlying distribution | data distribution changes as the algorithm learns new behaviours |

Contributions

A convolutional neural network can learn control policies from raw video input in complex RL environments.

> This paper demonstrates that a convolutional neural network can overcome these challenges to learn successful control policies from raw video data in complex RL environments.
- Training uses a variant of Q-learning.
  - The network is trained with a variant of the Q-learning algorithm, with stochastic gradient descent to update the weights.
- To alleviate correlated data and non-stationary distributions, an experience replay mechanism randomly samples previous transitions, which smooths the training distribution over past behaviours.
  - To alleviate the problems of correlated data and non-stationary distributions, we use an experience replay mechanism which randomly samples previous transitions, and thereby smooths the training distribution over many past behaviors.
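
The experience replay idea can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the names `Transition` and `ReplayMemory`, the `done` flag and the tiny capacity are all assumptions made for the example.

```python
import random
from collections import deque, namedtuple

# One stored experience e_t = (s_t, a_t, r_t, s_{t+1}); `done` marks episode end.
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayMemory:
    """Fixed-size buffer that keeps only the most recent `capacity` transitions."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out automatically
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive time-steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3, seed=0)
for t in range(5):
    memory.push(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
batch = memory.sample(2)  # two transitions drawn from the three most recent
```

Because the buffer holds only three items here, the two oldest transitions are discarded, and every minibatch mixes experience generated at different times.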

Background

Task description

Environment
> We consider tasks in which an agent interacts with an environment E, in this case the Atari emulator, in a sequence of actions, observations and rewards.

Actions: legal actions are passed to the emulator and change its internal state and the game score.
> At each time-step the agent selects an action \(a_t\) from the set of legal game actions, \(A = \{1, \ldots, K\}\). The action is passed to the emulator and modifies its internal state and the game score.

Observations: screen pixels
> The emulator’s internal state is not observed by the agent; instead it observes an image from the emulator, which is a vector of raw pixel values representing the current screen.

Rewards: game score
> In addition it receives a reward representing the change in game score.

Characteristics of the interaction
- Feedback on actions is delayed
  > Note that in general the game score may depend on the whole prior sequence of actions and observations; feedback about an action may only be received after many thousands of time-steps have elapsed.
- The agent maximises the score
  > The goal of the agent is to interact with the emulator by selecting actions in a way that maximises future rewards.

Problem formulation

graph TB
A[Question & Model]-->B(the optimal action-value function)
A-->I(learn game strategies from sequences of actions and observations)
B-->C(Intuition)
B-->J(the maximum expected reward achievable by following any strategy)
C-->D(Bellman equation)
C-->K(current reward plus the optimal value of the sequence at the next time-step)
D-->E(generalisation)
D-->L(estimate the action-value function by using the Bellman equation as an iterative update)
E-->M(the above action-value function is without any generalisation)
E-->N(function approximator)
N-->F(linear function approximator)
N-->G(non-linear function approximator)
G-->H(neural network)
H-->O(Q-network)

Learning from sequences of actions and observations
> We therefore consider sequences of actions and observations, \(s_t = x_1, a_1, x_2, \ldots, a_{t-1}, x_t\), and learn game strategies that depend upon these sequences.
- Sequences terminate in finite time
  > All sequences in the emulator are assumed to terminate in a finite number of time-steps.
- The task becomes an MDP
  > This formalism gives rise to a large but finite Markov decision process (MDP) in which each sequence is a distinct state.
- Standard reinforcement learning handles the MDP
  > We can apply standard reinforcement learning methods for MDPs, simply by using the complete sequence \(s_t\) as the state representation at time t.

Optimal action-value function: once the sequence and action \((s_t, a_t)\) are fixed, different policies \(\pi\) yield different expected returns. The policy that achieves the maximum expected return is optimal.
- We define the optimal action-value function \(Q^*(s, a)\) as the maximum expected return achievable by following any strategy, after seeing some sequence s and then taking some action a, \(Q^*(s, a) = \max_\pi \mathbb{E}[R_t \mid s_t = s, a_t = a, \pi]\), where \(\pi\) is a policy mapping sequences to actions (or distributions over actions).

Bellman equation
- The optimal action-value function obeys an important identity known as the Bellman equation.
- Intuition: the problem is reduced to choosing the best action at the current step.
- Because the optimal value \(Q^*(s', a')\) of the sequence \(s'\) at the next time-step is not known, the problem is simplified further.
  > The basic idea behind many reinforcement learning algorithms is to estimate the action-value function, by using the Bellman equation as an iterative update, \(Q_{i+1}(s, a) = \mathbb{E}[r + \gamma \max_{a'} Q_i(s', a') \mid s, a]\).
  - Convergence:
    > Such value iteration algorithms converge to the optimal action-value function.

Generalisation
- In practice the basic approach estimates the action-value function separately for each sequence s, so it does not generalise.
  > In practice, this basic approach is totally impractical, because the action-value function is estimated separately for each sequence, without any generalisation.
- Instead, a function approximator is used to estimate the action-value function.
  > It is common to use a function approximator to estimate the action-value function.
  - The approximator can be linear or non-linear, for example a neural network.
    > In the reinforcement learning community this is typically a linear function approximator, but sometimes a non-linear function approximator is used instead, such as a neural network.
- Reformulation: through the Bellman equation, the original optimisation problem is turned into a Q-learning problem.

Training -- Q-network
- A Q-network can be trained by minimising a sequence of loss functions that changes at each iteration.
- Note that the targets depend on the network weights; this is in contrast with the targets used for supervised learning, which are fixed before learning begins.
- Differentiating the loss function with respect to the weights we arrive at the gradient.
- [Another simplification] Rather than computing the full expectations in the above gradient, it is often computationally expedient to optimise the loss function by stochastic gradient descent.
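
For reference, the chain of equations described above can be written out as follows. This is my transcription of the paper's notation: \(\rho(s, a)\) is the behaviour distribution over sequences and actions, \(\theta_i\) are the network weights at iteration \(i\), and the `align` environment assumes the amsmath package.

```latex
\begin{align}
Q^*(s, a) &= \mathbb{E}_{s' \sim \mathcal{E}}\left[ r + \gamma \max_{a'} Q^*(s', a') \,\middle|\, s, a \right] \\
Q_{i+1}(s, a) &= \mathbb{E}\left[ r + \gamma \max_{a'} Q_i(s', a') \,\middle|\, s, a \right] \\
L_i(\theta_i) &= \mathbb{E}_{s, a \sim \rho(\cdot)}\left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right],
\qquad y_i = \mathbb{E}_{s' \sim \mathcal{E}}\left[ r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\middle|\, s, a \right] \\
\nabla_{\theta_i} L_i(\theta_i) &= \mathbb{E}_{s, a \sim \rho(\cdot);\, s' \sim \mathcal{E}}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i) \right) \nabla_{\theta_i} Q(s, a; \theta_i) \right]
\end{align}
```

The first line is the Bellman equation, the second the iterative update, the third the loss minimised at iteration \(i\), and the fourth its gradient. Replacing the expectations with single samples gives the familiar Q-learning update.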

Summary
- model-free: it solves the reinforcement learning task directly using samples from the emulator E, without explicitly constructing an estimate of E.
- off-policy: it learns about the greedy strategy a = maxa Q(s, a; θ), while following a behaviour distribution that ensures adequate exploration of the state space.
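
To make "model-free" and "off-policy" concrete, here is a small tabular Q-learning sketch on a toy five-state chain. It is not the paper's setup: the environment `step`, the constants and the purely random behaviour policy are assumptions chosen so the example stays tiny and deterministic.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA, ALPHA = 0.9, 0.5


def step(state, action):
    """Toy deterministic emulator: reward 1 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=2000, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Behaviour policy: uniformly random, used purely for exploration.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # The target uses the greedy value of the next state (off-policy)
            # and only a sampled transition, never a model of `step` (model-free).
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


q = q_learning()
# Greedy action per non-terminal state, read off the learned table.
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Although the agent behaves randomly throughout, the learned table describes the greedy strategy, which is to move right from every state.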

Timeline

1995: TD-gammon used a model-free reinforcement learning algorithm similar to Q-learning, and approximated the value function using a multi-layer perceptron with one hidden layer.
- Early attempts to follow up on TD-gammon were less successful.
- 1997: It was shown that combining model-free reinforcement learning algorithms such as Q-learning with non-linear function approximators, or indeed with off-policy learning could cause the Q-network to diverge.
Subsequently, the majority of work in reinforcement learning focused on linear function approximators with better convergence guarantees.
More recently, there has been a revival of interest in combining deep learning with reinforcement learning.
- Deep neural networks have been used to estimate the environment; restricted Boltzmann machines have been used to estimate the value function [2004]; or the policy [2012].
- The divergence issues with Q-learning have been partially addressed by gradient temporal-difference methods.
- Pros
  - These methods are proven to converge when evaluating a fixed policy with a nonlinear function approximator [2009]; or when learning a control policy with linear function approximation using a restricted variant of Q-learning [2010].
- Cons
  - However, these methods have not yet been extended to nonlinear control.
2005: NFQ optimises the sequence of loss functions, using the RPROP algorithm to update the parameters of the Q-network.
- NFQ has also been successfully applied to simple real-world control tasks using purely visual input, by first using deep autoencoders to learn a low dimensional representation of the task, and then applying NFQ to this representation.
  - Our approach applies reinforcement learning end-to-end, directly from the visual inputs; as a result it may learn features that are directly relevant to discriminating action-values.
- Cons
  - It uses a batch update that has a computational cost per iteration that is proportional to the size of the data set, whereas we consider stochastic gradient updates that have a low constant cost per iteration and scale to large data-sets.
1993: Q-learning has also previously been combined with experience replay and a simple neural network.
- But again starting with a low-dimensional state rather than raw visual inputs.
2013: The Atari 2600 emulator was introduced as a reinforcement learning platform, with standard reinforcement learning algorithms applied using linear function approximation and generic visual features.
- The HyperNEAT evolutionary architecture has also been applied to the Atari platform, where it was used to evolve (separately, for each distinct game) a neural network representing a strategy for that game.

Deep Reinforcement Learning

Existing deep learning work on vision and speech motivates this paper.
> Recent breakthroughs in computer vision and speech recognition have relied on efficiently training deep neural networks on very large training sets. The most successful approaches are trained directly from the raw inputs, using lightweight updates based on stochastic gradient descent. By feeding sufficient data into deep neural networks, it is often possible to learn better representations than handcrafted features.

The goal of the paper is to connect RL with deep neural networks.
> Our goal is to connect a reinforcement learning algorithm to a deep neural network which operates directly on RGB images and efficiently process training data by using stochastic gradient updates.

Deep Q-learning with Experience Replay
1. We store the agent’s experiences at each time-step.
2. We apply Q-learning updates, or minibatch updates, to samples of experience, drawn at random from the pool of stored samples.
3. After performing experience replay, the agent selects and executes an action according to an ε-greedy policy.

DQN (off-policy) vs online Q-learning
- Each step of experience is potentially used in many weight updates, which allows for greater data efficiency.
- Learning directly from consecutive samples is inefficient, due to the strong correlations between the samples; randomizing the samples breaks these correlations and therefore reduces the variance of the updates.
- When learning on-policy the current parameters determine the next data sample that the parameters are trained on.
  - It is easy to see how unwanted feedback loops may arise and the parameters could get stuck in a poor local minimum, or even diverge catastrophically.
  - Note that when learning by experience replay, it is necessary to learn off-policy (because our current parameters are different to those used to generate the sample), which motivates the choice of Q-learning.
- By using experience replay the behavior distribution is averaged over many of its previous states, smoothing out learning and avoiding oscillations or divergence in the parameters.
- In practice, our algorithm only stores the last N experience tuples in the replay memory, and samples uniformly at random from D when performing updates.
  - A more sophisticated sampling strategy might emphasize transitions from which we can learn the most, similar to prioritized sweeping.

Preprocessing and Model Architecture

Reduce the input dimensionality
- [Gray-scale conversion and down-sampling] The raw frames are preprocessed by first converting their RGB representation to gray-scale and down-sampling it to a 110×84 image.
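
The two preprocessing steps can be sketched in plain Python as below. Several details are assumptions rather than facts from these notes: the luminance weights, the nearest-neighbour resize and the 210×160 raw frame size. A real pipeline would use an image library instead of nested lists.

```python
def to_grayscale(frame):
    """Convert an H x W grid of (r, g, b) tuples to luminance values."""
    # Assumed weights (ITU-R BT.601 luma); the notes do not specify them.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def downsample(gray, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grid to out_h x out_w."""
    in_h, in_w = len(gray), len(gray[0])
    return [
        [gray[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def preprocess(frame, out_h=110, out_w=84):
    """Gray-scale first, then shrink to 110 rows by 84 columns."""
    return downsample(to_grayscale(frame), out_h, out_w)


# A dummy all-white RGB frame of the assumed raw size (210 rows x 160 columns).
raw = [[(255, 255, 255)] * 160 for _ in range(210)]
small = preprocess(raw)
```

The output has roughly a tenth of the values of the raw RGB frame, which is the point of the step.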
How to parameterize Q using a neural network?
- We instead use an architecture in which there is a separate output unit for each possible action, and only the state representation is an input to the neural network.
  - The outputs correspond to the predicted Q-values of the individual action for the input state.
- The main advantage of this type of architecture is the ability to compute Q-values for all possible actions in a given state with only a single forward pass through the network.
Deep Q-Networks
- Input: state representation
  > The input to the neural network consists of an 84 × 84 × 4 image.
- Three hidden layers, each followed by a rectifier nonlinearity.
- Output: Q-values for actions
  > The output layer is a fully-connected linear layer with a single output for each valid action.
- We refer to convolutional networks trained with our approach as Deep Q-Networks (DQN).

Experiments

Generality and robustness: no game-specific information is required.
> We use the same network architecture, learning algorithm and hyperparameters settings across all seven games, showing that our approach is robust enough to work on a variety of games without incorporating game-specific information.

Adaptations for the games
- [Reward clipping] Since the scale of scores varies greatly from game to game, we fixed all positive rewards to be 1 and all negative rewards to be −1, leaving 0 rewards unchanged.
- [frame-skipping technique] The agent sees and selects actions on every kth frame instead of every frame, and its last action is repeated on skipped frames.
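
Both adaptations are simple enough to sketch. In the code below, `env_step` is a hypothetical single-frame emulator interface, `k=4` is only an assumed default, and summing the raw rewards over skipped frames is my own choice rather than something stated in the notes.

```python
def clip_reward(reward):
    """Map any positive reward to 1, any negative reward to -1, and keep 0."""
    return (reward > 0) - (reward < 0)


def step_with_frame_skip(env_step, action, k=4):
    """Repeat `action` for k frames; return the last observation and summed reward.

    `env_step(action) -> (observation, reward, done)` is a hypothetical
    single-frame emulator interface.
    """
    total, observation, done = 0.0, None, False
    for _ in range(k):
        observation, reward, done = env_step(action)
        total += reward
        if done:
            break
    return observation, total, done


# Tiny fake emulator: every frame yields a reward of 10 and the frame count so far.
frames = []


def fake_env_step(action):
    frames.append(action)
    return len(frames), 10.0, False


obs, total, done = step_with_frame_skip(fake_env_step, action=2, k=4)
clipped = clip_reward(total)
```

The agent decides once, the emulator advances four frames with the same action, and the learning signal is reduced to its sign.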

Training and Stability

How training is evaluated
> Since our evaluation metric is the total reward the agent collects in an episode or game averaged over a number of games, we periodically compute it during training.
- The curves showing how the average total reward evolves during training are indeed quite noisy, giving one the impression that the learning algorithm is not making steady progress.
- A more stable metric is the policy’s estimated action-value function Q, which provides an estimate of how much discounted reward the agent can obtain by following its policy from any given state.
- Convergence has no theoretical guarantee
  > This suggests that, despite lacking any theoretical convergence guarantees, our method is able to train large neural networks using a reinforcement learning signal and stochastic gradient descent in a stable manner.

Visualizing the Value Function

Although theoretical support is limited, the paper argues for the method's effectiveness by relating the predicted value function to the actual game screens.
> Figure 3 demonstrates that our method is able to learn how the value function evolves for a reasonably complex sequence of events.

Main Evaluation

Comparison with existing algorithms and approaches (in theory and in practice)
> We compare our results with the best performing methods from the RL literature.
- The paper points out weaknesses of the other methods and shows where this method improves on them.
  > Note that both of these methods incorporate significant prior knowledge about the visual problem by using background subtraction and treating each of the 128 colors as a separate channel. In contrast, our agents only receive the raw RGB screenshots as input and must learn to detect objects on their own.

  > Our approach (labeled DQN) outperforms the other learning methods by a substantial margin on all seven games despite incorporating almost no prior knowledge about the inputs.

Human versus machine
> In addition to the learned agents, we also report scores for an expert human game player and a policy that selects actions uniformly at random.

Comparison with evolutionary policy search
> We also include a comparison to the evolutionary policy search approach from [8] in the last three rows of table.

Conclusion

Contributions
> This paper introduced a new deep learning model for reinforcement learning, and demonstrated its ability to master difficult control policies for Atari 2600 computer games, using only raw pixels as input.

> We also presented a variant of online Q-learning that combines stochastic minibatch updates with experience replay memory to ease the training of deep networks for RL.


How to add pinyin or English annotations above text

Posted on 2020-06-02 Edited on 2023-05-19 In html Word count in article: 708 Reading time ≈ 3 mins.

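In articles from the WeChat official account Linux 中国, I noticed that Chinese technical terms usually carry the corresponding English term above them, which makes for a very pleasant reading experience.

After asking privately, I learned that this is done with the HTML5 ruby element. That means there is no longer any need to put the English term in parentheses after a technical term, like this:

强化学习(reinforcement learning)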

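Background

What is HTML?

HTML is a language for describing web pages.
- HTML stands for Hyper Text Markup Language
- HTML is not a programming language but a markup language
- A markup language is a set of markup tags
- HTML uses markup tags to describe web pages

HTML tags

HTML markup tags are usually called HTML tags.
- HTML tags are keywords surrounded by angle brackets, such as <html>
- HTML tags usually come in pairs, such as <b> and </b>
- The first tag in a pair is the start tag and the second is the end tag
- Start and end tags are also called opening tags and closing tags

HTML document = web page

HTML documents describe web pages
HTML documents contain HTML tags and plain text
HTML documents are also called web pages

A web browser reads HTML documents and displays them as web pages. The browser does not display the HTML tags; it uses them to interpret the content of the page.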

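What is HTML5?

HTML5 is the latest HTML standard.
HTML5 is designed to deliver rich web content without extra plugins.
HTML5 has new semantic, graphics and multimedia elements.
The new elements and APIs in HTML5 make web applications easier to build.
HTML5 is cross-platform and designed to run on different kinds of hardware (PCs, tablets, phones, TVs and so on).

Ruby

Ruby is a typographic annotation system: short text placed above horizontally set base text, used mainly to give pronunciation guidance for East Asian languages.

The elements involved are ruby, rt and rp. Use ruby to mark the base expression, then use rt to supply the annotation. The rt part is displayed above the expression.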

Examples

Example 1

<ruby>
强化学习<rt>reinforcement learning</rt>
</ruby>
有近几十年的研究历史，是 <ruby>
人工智能<rt>artificial intelligence</rt>
</ruby>
的一个研究方向。

强化学习 reinforcement learning 有近几十年的研究历史，是人工智能 artificial intelligence 的一个研究方向。

Example 2

<ruby>汉<rt>Hàn</rt></ruby>
<ruby>字<rt>zì</rt></ruby>
<ruby>拼<rt> pīn</rt></ruby>
<ruby>音<rt>yīn</rt></ruby>

汉 Hàn 字 zì 拼 pīn 音 yīn
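
When many terms need annotating, the markup in the examples above can be generated instead of typed by hand. The helper below is a small sketch; the function names `ruby` and `annotate` are my own, and the optional rp fallback follows the common pattern of wrapping parentheses for browsers without ruby support.

```python
from html import escape


def ruby(base, annotation, with_fallback=False):
    """Return HTML that shows `annotation` above `base`."""
    rt = f"<rt>{escape(annotation)}</rt>"
    if with_fallback:
        # rp wraps parentheses that only browsers without ruby support display.
        rt = f"<rp>(</rp>{rt}<rp>)</rp>"
    return f"<ruby>{escape(base)}{rt}</ruby>"


def annotate(pairs):
    """Annotate a sequence of (base, annotation) pairs, e.g. characters with pinyin."""
    return "".join(ruby(base, note) for base, note in pairs)


pinyin = annotate([("汉", "Hàn"), ("字", "zì"), ("拼", "pīn"), ("音", "yīn")])
term = ruby("强化学习", "reinforcement learning", with_fallback=True)
```

Escaping both parts keeps the output valid even if an annotation contains characters such as `<` or `&`.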


Getting a website indexed by Google and Baidu

Posted on 2020-05-31 Edited on 2023-05-19 In website Word count in article: 1.2k Reading time ≈ 4 mins.

In Google or Baidu, a query in the following format searches your own domain directly. If results come back, the site has been indexed; if not, it has not.

// site:your_website
site:http://www.waylon.one
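
The same check can be turned into a ready-made link. The sketch below only builds the search URLs and does not send any request; the endpoint paths and the query parameter names (`q` for Google, `wd` for Baidu) are assumptions based on how the two search pages are commonly addressed.

```python
from urllib.parse import urlencode


def site_query_url(domain, engine="google"):
    """Build a search URL for the query `site:<domain>`."""
    # Assumed endpoints and query parameter names for each engine.
    endpoints = {
        "google": ("https://www.google.com/search", "q"),
        "baidu": ("https://www.baidu.com/s", "wd"),
    }
    base, param = endpoints[engine]
    return f"{base}?{urlencode({param: 'site:' + domain})}"


google_url = site_query_url("www.waylon.one")
baidu_url = site_query_url("www.waylon.one", engine="baidu")
```

Opening either URL in a browser shows whether the engine currently lists any pages for the domain.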

If your domain has not been indexed, the steps below get the site's content indexed by Baidu and Google and improve its SEO.

Google

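Register with webmasters

webmasters entry page

Add a property

Search Console currently supports URL-prefix properties and Domain properties.
- URL-prefix property: includes only URLs with the specified prefix (including the protocol, http/https).
- Domain property: a domain-level property that includes all subdomains (m, www and so on) and multiple protocols (http, https, ftp).
- If you want the property to match any protocol or subdomain (http/https/www./m. and so on), consider adding a Domain property instead. Note that Domain properties support DNS record verification only.

After you choose a Domain property, a dialog asks you to verify ownership through DNS.

Log in to your domain provider's website (my DNS is hosted at Dynadot).
Copy the verification TXT record into the DNS configuration of the site being verified.
Note: DNS changes may take a while to take effect. If Search Console does not find the record immediately, wait one day and try verifying again.

Add a sitemap

A sitemap is a page that lists links to all the pages on a site that search engines should crawl. It tells search engines which pages are available so that they can crawl the site more intelligently.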

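Configure the sitemap

Install the sitemap generator plugin.

npm install hexo-generator-sitemap --save

Edit the configuration file

Edit the Hexo site configuration file _config.yml and add the following:

# Generate the sitemap automatically
sitemap:
  path: sitemap.xml

Run the generate and deploy commands, then open the generated output directory and check that it contains sitemap.xml. This is the generated sitemap. It lists links to all the pages on the site, and Google's crawler uses it to crawl the site.

hexo g && hexo d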
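
A quick way to confirm that the generated file really lists your pages is to parse it. The sketch below reads the `<loc>` entries from a sitemap in the sitemaps.org format; the sample document is hand-written for illustration and stands in for the real sitemap.xml.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


# Hand-written sample in the sitemaps.org format, standing in for the generated file.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.waylon.one/blog/uncategorized/hello-world/</loc></url>
  <url><loc>http://www.waylon.one/blog/uncategorized/dynadot-buy-domain/</loc></url>
</urlset>"""

urls = sitemap_urls(sample)
```

For the real file, read its contents from disk and pass the text to `sitemap_urls`.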

Submit the sitemap

Under "Add a new sitemap", enter your sitemap and submit it. Google will then index the site's content periodically.

Baidu

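Submit the site

Submit the site URL through Baidu's site submission page.
After submitting, the third step is site verification, which proves that you own the domain.
- I chose file verification. The verification file must be placed in the root directory of the configured domain, which corresponds to hexo_file_path in Hexo.
- Once the site is published, go back and complete the verification.
- To keep the verified status, do not delete the HTML file after verification succeeds.

Resource submission

The regular indexing tool lets you push resources to Baidu Search proactively, which shortens the time the crawler needs to discover the site's links.
For resource submission I chose active push (real time), which has these advantages.
- Timely discovery: it shortens the time Baidu's crawler needs to find new links, so newly published pages can be indexed as soon as possible.
- Protecting original content: active push notifies Baidu quickly about the newest original content, so Baidu discovers it before it is reposted elsewhere.

Push configuration

Following the article on Baidu active push for Hexo, set up automatic push with a plugin.

Install the hexo-baidu-url-submit plugin

In the Hexo site root directory, run the following command to install the plugin.

npm install hexo-baidu-url-submit --save

Add the baidu-url-submit settings to the site configuration file (_config.yml)

In the active push tool you will see the token in the API call address. The token is a 16-character string of letters and digits.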

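# Baidu active push
baidu_url_submit:
  count: 200  # e.g. 200 submits the latest 200 links
  host: your_website # the domain registered on the Baidu webmaster platform
  token: your_token # this is your secret key, so do not publish the blog source in a public repository!
  path: baidu_urls.txt # path of the text file (the default file)

Add a new deploy entry

Find the deploy section in the site configuration file and add the new pusher as follows: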

deploy:
- type: git
  repo: XXX
  branch: master
- type: baidu_url_submitter

Verify the configuration

Redeploy the site and check the output in git-bash. It should look similar to this.

INFO  Deploying: baidu_url_submitter
INFO  Submitting urls
http://www.waylon.one/blog/uncategorized/dynadot-buy-domain/
http://www.waylon.one/blog/uncategorized/hello-world/
{"remain":2998,"success":2}
INFO  Deploy done: baidu_url_submitter

The API response confirms that the push succeeded.

{"remain":2998,"success":2}
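
If you want a deploy script to check this automatically, the response can be parsed as JSON. The sketch below assumes, from the field names alone, that `success` is the number of accepted URLs and `remain` is the remaining quota; the helper name `push_succeeded` is hypothetical.

```python
import json


def push_succeeded(response_text, submitted_count):
    """Return True when every submitted URL was accepted."""
    result = json.loads(response_text)
    # `success`: number of accepted URLs; `remain`: remaining quota
    # (interpretation assumed from the field names).
    return result.get("success", 0) == submitted_count


ok = push_succeeded('{"remain":2998,"success":2}', submitted_count=2)
partial = push_succeeded('{"remain":2998,"success":1}', submitted_count=2)
```

Here `ok` corresponds to the log above, where two URLs were submitted and two were accepted.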


Buying and configuring a domain from the registrar Dynadot

Posted on 2020-05-30 Edited on 2023-05-19 In website Word count in article: 903 Reading time ≈ 3 mins.

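A domain name is how visitors reach a website, and it also serves as the site's unique ID. Once we decide to build a website, a domain name is indispensable.

What is a domain registrar?

A domain registrar is a business that provides domain registration services. There are two main kinds. The first holds ICANN accreditation itself; these are generally large, dependable companies, because the bar for ICANN accreditation is fairly high. The second kind is the reseller, which has no ICANN accreditation of its own but offers domain registration by reselling the services of an accredited registrar.

The registrar covered here, Dynadot, is an established registrar founded in 2002 and headquartered in San Mateo, California. It is an ICANN-accredited domain registrar and web hosting company.

Dynadot now has a Chinese website, accepts Alipay, and offers online customer support in Chinese.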

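Why Dynadot?

Dynadot is a long-established ICANN-accredited registrar and a debt-free private company, which supports its stable operation.
Unlike GoDaddy, which charges for privacy protection, Dynadot provides it for free.
- When you register a domain, ICANN requires the registrant's personal information (email address, phone number and so on) to be published in the Whois database, so anyone can look up your contact details through Whois.
  
  When you register a domain, ICANN, the non-profit regulatory body that is responsible for managing the domain name system (DNS), requires that your personal contact information including your email address, phone number, and mailing address be placed into a Whois database that is publicly available online. This gives anyone - including spammers and scammers - the ability to easily find your personal contact information simply by doing a Whois lookup on your domain.
Dynadot offers many special top-level domains, such as .io and .one, and its prices are usually lower than GoDaddy's.
- Domain registration fees: TLD prices
- Taking .one as an example, Dynadot charges ￥58 each for registration, renewal and transfer, while GoDaddy charges ￥94.07 for registration and renewal.

Limited-time offer
- If you create an account through the link http://www.dynadot.com/?st6XZ6K6t8G7X7q and spend ¥65 within 48 hours of creating it, we will each receive ¥32 in DynaDollars, which can be used for later domain renewals.
- See the refer-a-friend page for the details of the offer.

After comparing prices, I registered the domain waylon.one with Dynadot and renewed it for 10 years.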

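Registration

Open the registration link http://www.dynadot.com or http://www.dynadot.com/?st6XZ6K6t8G7X7q (￥32 rebate during the promotion).
The site supports Chinese: choose [中文] in the top-right corner to switch the interface to Chinese.
During registration, note that the CAPTCHA is case-sensitive.

Purchase

After choosing a domain, select CNY as the currency in the top-right corner. You can then pay with Alipay or WeChat Pay.

DNS resolution for the domain

An IP address is the numeric address that identifies a site on the network. Domain names stand in for IP addresses because they are easier to remember. Domain name resolution is the process of translating a domain name into an IP address, and it is carried out by DNS servers. Dynadot currently provides free DNS hosting.

After logging in, click My Domains | Manage Domains | DNS Settings in the top-left corner and choose custom DNS.

I have already bought a VPS and plan to deploy the site on it, so the domain resolves to the VPS's IP address. Note that DNS changes take some time to take effect.
- How do I set up DNS for my domain?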


Notes on Autonomous Helicopter Flight via RL

Posted on 2020-04-25 Edited on 2023-05-19 In paper Word count in article: 3.7k Reading time ≈ 13 mins.

Autonomous Helicopter Flight via Reinforcement Learning

Paper download link

Abstract

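Helicopter flight is widely regarded as a challenging control problem. The paper describes the successful application of reinforcement learning to designing a controller for autonomous helicopter flight. It first fits a stochastic, nonlinear model of the helicopter dynamics, then uses that model to learn to hover in place and to fly complex maneuvers taken from an RC helicopter competition.
> In this paper, we describe the successful application of reinforcement learning to designing a controller for autonomous helicopter flight.

CM (Comment): these notes focus on how the helicopter model is built and on how a control policy is learned from the model.

INTRODUCTION

Why hovering is hard: when the main rotor turns clockwise, the helicopter body tends to rotate anticlockwise. A tail rotor is added to counteract this rotation, but it introduces sideways drift. As a result, a hovering helicopter is usually tilted slightly to the right.
> So, for a helicopter to hover in place, it must actually be tilted slightly to the right, so that the main rotor’s thrust is directed downwards and slightly to the left, to counteract this tendency to drift sideways.

Readers without much background in helicopter control should first watch a popular-science video on how helicopters fly and are controlled, and revisit how Newton's three laws apply to helicopter control. Otherwise the control learning that follows is hard to understand.

After watching the video, you will presumably appreciate the ingenious solutions in helicopter design. Even so, the nonintuitive dynamics make helicopters difficult to control.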

Autonomous Helicopter

Hardware

Yamaha R-50 helicopter (3.6 m long, 20 kg payload)
- an Inertial Navigation System (INS)
- a differential GPS system (a resolution of 2cm)
- a digital compass
- an onboard navigation computer

Finally, a Kalman filter combines these sensors (the position estimates given by the Kalman filter) and outputs digital state estimates at 50 Hz: position, orientation, velocity and angular velocities.

As the video makes clear, most helicopters are controlled through a 4-dimensional action space: tilting the rotor plane forwards/backwards or sideways, changing the pitch of the main rotor blades, and changing the pitch of the tail rotor blades.

Cyclic pitch: The longitudinal (front-back) and latitudinal (left-right) cyclic pitch controls. By tilting this plane (the helicopter’s rotors rotate) either forwards/backwards or sideways, these controls cause the helicopter to accelerate forward/backwards or sideways.

Collective pitch: The (main rotor) collective pitch control. By varying the tilt angle of the rotor blades, the collective pitch control affects the main rotor’s thrust.

Yaw motion: The tail rotor collective pitch control. Using a mechanism similar to the main rotor collective pitch control, this controls the tail rotor’s thrust.

In summary, the task is to pick good control actions every 1/50 of a second (50 Hz), based on the position estimates from the Kalman filter.
> Using the position estimates given by the Kalman filter, our task is to pick good control actions every 50th of a second.

Model identification

Fit the helicopter’s dynamics model

Input data: a human pilot's flight is recorded (12-dimensional helicopter state & 4-dimensional helicopter control inputs), and the flight data are used for model fitting.
> We began by asking a human pilot to fly the helicopter for several minutes, and recorded the 12-dimensional helicopter state and 4-dimensional helicopter control inputs as it was flown.

Using helicopter body coordinates reduces the number of parameters to identify.
> in which the x, y, and z axes are forwards, sideways, and down relative to the current position of the helicopter.

Our model is identified in the body coordinates, which has four fewer variables than the spatial (world) coordinates.

Locally weighted linear regression (局部加权线性回归)

Background on locally weighted linear regression
s(t+1) = f(s(t), a(t), noise)
> By applying locally-weighted regression with the state s(t) and action a(t) as inputs, and the one-step differences of each of the state variables in turn as the target output, this gives us a non-linear, stochastic, model of the dynamics, allowing us to predict s(t+1) as a function of s(t) and a(t) plus noise.
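
To show what locally weighted linear regression does, here is a one-dimensional sketch in plain Python. It is a toy, not the paper's model: the Gaussian kernel, the bandwidth `tau` and the scalar input are assumptions, whereas the paper regresses one-step state differences on the full state and action vectors.

```python
import math


def lwr_predict(xs, ys, x_query, tau=1.0):
    """Locally weighted linear regression for 1-D inputs.

    Fits y = b0 + b1 * x by weighted least squares, where points near
    x_query get larger Gaussian weights (bandwidth tau), then evaluates
    the fitted line at x_query.
    """
    ws = [math.exp(-((x - x_query) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    denom = sw * swxx - swx ** 2
    b1 = (sw * swxy - swx * swy) / denom
    b0 = (swy - b1 * swx) / sw
    return b0 + b1 * x_query


# Toy data: a target value as a function of a scalar input.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
linear = [2 * x + 1 for x in xs]
curved = [x * x for x in xs]
pred_linear = lwr_predict(xs, linear, 2.5)            # exact on linear data
pred_curved = lwr_predict(xs, curved, 2.5, tau=0.5)   # local fit on curved data
```

Because a separate line is fitted around each query point, the overall predictor is non-linear even though every local fit is linear. On the curved data, the local prediction lands much closer to the true value of 6.25 than a single global line would.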
In addition, prior knowledge about helicopters is used to refine the model and reduce the number of parameters to fit.
> Similar to the use of body coordinates to exploit symmetries, there is other prior knowledge that can be incorporated.

Similar reasoning allows us to conclude that certain other parameters should be 0, 1/50 (50Hz) or g (gravity), and these were also hard-coded into the model.
Three unobserved variables are added to model latencies in the response to the controls.

Finally, we added three extra (unobserved) variables to model latencies in the responses to the controls.
Plots of the mean-squared error over longer time scales make the differences between models easy to see. They help in choosing a model and in judging how accurately it fits over long horizons.
> Our main tool for choosing among the models was plots.

The plots show the mean-squared error between the estimated position and the true position at longer time scales.

For a model, the mean-squared error (as measured on test data) between the helicopter’s true position and the estimated position at a certain time in the future.
Why a good model matters: it makes the controller more reliable later in the real world.
> We wanted to verify the fitted model carefully, so as to be reasonably confident that a controller tested successfully in simulation will also be safe in real life.

For noise that is not modeled, the concern is that it may exceed what the model predicts. To check this, plots are used to see whether the helicopter's predicted output stays within the error bounds.

One of our concerns was the possibility that unmodeled correlations in noise might mean the noise variance of the actual dynamics is much larger than predicted by the model.

Reinforcement learning: The PEGASUS algorithm

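Description of PEGASUS

The utilities of policies are hard to compute exactly, but the dynamics of the MDP can be simulated on a computer.

Learning to Hover

Attempts with other controller designs (such as μ-synthesis) illustrate how difficult and subtle the problem is.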

These comments should not be taken as conclusive of the viability of any of these methods; rather, we take them to be indicative of the difficulty and subtlety involved in learning a helicopter controller.

Policy learning starts with hovering.
> We began by learning a policy for hovering in place.
- Neural network design for hover control
  
  The picture inside the circles indicate whether a node outputs the sum of their inputs, or the tanh of the sum of their inputs. Each edge with an arrow in the picture denotes a tunable parameter. The solid lines show the hovering policy class. The dashed lines show the extra weights added for trajectory following.
  - Clearly, this requires theoretical and practical knowledge of helicopter control.
  Each of the edges in the figure represents a weight, and the connections were chosen via simple reasoning about which control channel should be used to control which state variables.
Designing the reward and penalty terms for the specific problem and goal requires domain knowledge and practical experience beyond computer science and reinforcement learning.
- Quadratic reward function (stay near the target and keep the velocity small)
  > This encourages the helicopter to hover near target position, while also keeping the velocity small and not making abrupt movements.
- Quadratic penalty (encourages smooth control and small actions)
  > To encourage small actions and smooth control of the helicopter, we also used a quadratic penalty for actions.
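
A quadratic reward with a quadratic action penalty can be sketched as below. The weights `w_pos`, `w_vel` and `w_act`, the choice of state components and the function name `hover_reward` are all hypothetical; the paper's actual coefficients and terms are not reproduced in these notes.

```python
def hover_reward(position, velocity, action, target,
                 w_pos=1.0, w_vel=0.1, w_act=0.01):
    """Negative quadratic cost: 0 at the target with zero velocity and zero action."""
    pos_cost = sum((p - t) ** 2 for p, t in zip(position, target))
    vel_cost = sum(v ** 2 for v in velocity)
    act_cost = sum(a ** 2 for a in action)
    return -(w_pos * pos_cost + w_vel * vel_cost + w_act * act_cost)


target = (0.0, 0.0, 5.0)
# Hovering exactly at the target with no control input.
at_rest = hover_reward(target, (0.0, 0.0, 0.0), (0.0,) * 4, target)
# One metre off target, drifting, with a small control input.
drifting = hover_reward((1.0, 0.0, 5.0), (0.5, 0.0, 0.0), (0.2, 0.0, 0.0, 0.0), target)
```

Squaring each term means large deviations are punished much more than small ones, which pushes the learned policy toward gentle corrections.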
基于辨识的模型 (model identified)，使用 PEGASUS 算法来获得实用策略 (the utilities of policies) 的近似值 \(\[\hat{U}(\pi)\]\)。 > Using the model identified, we can now apply PEGASUS to define approximations \(\[\hat{U}(\pi)\]\) to the utilities of policies.
策略和动态特性的可连续和平滑性。

Since policies are smoothly parameterized in the weights, and the dynamics are themselves continuous in the actions, the estimates of utilities are also continuous in the weights.

To find the policy's weights, hill-climbing algorithms are applied to maximize \(\hat{U}(\pi)\); the authors tried both a gradient ascent algorithm and a random-walk algorithm. > We may thus apply standard hillclimbing algorithms to maximize \(\hat{U}(\pi)\) in terms of the policy’s weights.
The most expensive step in policy learning is the repeated Monte Carlo evaluation needed to obtain \(\hat{U}(\pi)\); the paper speeds it up with a parallelized implementation:
- Monte Carlo evaluations using different samples run on different computers;
- the results are then aggregated to obtain \(\hat{U}(\pi)\).
The most expensive step in policy search was the repeated Monte Carlo evaluation to obtain \(\hat{U}(\pi)\). To speed this up, we parallelized our implementation, and Monte Carlo evaluations using different samples were run on different computers, and the results were then aggregated to obtain \(\hat{U}(\pi)\).
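The following sketch illustrates the idea on a toy problem. PEGASUS evaluates every policy on the same fixed set of random scenarios, so the utility estimate becomes a deterministic function of the weights that a hill climber can maximize. Everything concrete here is an assumption for illustration: the 1-D point-mass dynamics, the linear policy standing in for the neural network, the reward weights, and all function names.

```python
import random


def rollout(weights, seed, steps=50, dt=0.1):
    """Simulate one noisy episode of a toy 1-D point mass and return its total reward."""
    rng = random.Random(seed)           # a fixed seed plays the role of one fixed scenario
    k_pos, k_vel = weights
    x, v = 1.0, 0.0                     # start 1 m from the target, at rest
    total = 0.0
    for _ in range(steps):
        a = -k_pos * x - k_vel * v      # linear policy (stand-in for the network)
        v += (a + rng.gauss(0.0, 0.1)) * dt
        x += v * dt
        total -= x * x + 0.1 * v * v + 0.01 * a * a   # quadratic reward
    return total


def estimate_utility(weights, seeds):
    """Monte Carlo estimate of the utility, averaged over the fixed scenarios."""
    return sum(rollout(weights, s) for s in seeds) / len(seeds)


def random_walk_search(weights, seeds, iters=200, step=0.2, search_seed=0):
    """Random-walk hill climbing: keep a perturbed candidate only if it scores higher."""
    rng = random.Random(search_seed)
    best, best_u = list(weights), estimate_utility(weights, seeds)
    for _ in range(iters):
        cand = [w + rng.gauss(0.0, step) for w in best]
        u = estimate_utility(cand, seeds)
        if u > best_u:
            best, best_u = cand, u
    return best, best_u


SEEDS = list(range(10))                 # the same scenarios are reused for every policy
start = [0.0, 0.0]
best, best_u = random_walk_search(start, SEEDS)
```

Each call to `rollout` is independent of the others, which is what makes it natural to run the evaluations for different scenarios on different machines and average the results afterwards.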
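How is the control policy evaluated? The paper compares the learned policy with the performance of a human pilot to demonstrate the stability of the algorithm. > We also compare the performance of our learned policy against that of our human pilot trained and licensed by Yamaha to fly the R-50 helicopter.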

Flying competition maneuvers

The authors took on the annual RC helicopter competition organized by the AMA (the Academy of Model Aeronautics), attempting three very difficult maneuvers, such as the pirouette (turning in place).

We took the first three maneuvers from the most challenging, Class III, segment of their competition.

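So far the paper has only learned how to hover. How can the helicopter fly along a trajectory? One approach the authors describe is to move the set point slowly along a sequence of points on the trajectory, with a control policy essentially the same as the one for hovering.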

How does one design a controller for flying trajectories? Given a controller for keeping a system’s state at a point, one standard way to make the system move through a particular trajectory is to slowly vary along a sequence of set points on that trajectory.
Another approach is to retrain the parameters so that the trajectory is followed accurately. > Retrain the policy’s parameters for accurate trajectory following.
- trajectory following vs trajectory tracking
  
  Path/Trajectory following is all about following a predefined path which does not involve time as a constraint. Thus, if you are on the path and following it with whatever speed you have reached your goal. On the contrary, trajectory tracking involves time as a constraint. Meaning that you have to be at a certain point at a certain time.
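- Refine the network model based on the theory of helicopter dynamics.
  
  Since we are now flying trajectories and not only hovering, we also augmented the policy class to take into account more of the coupling between the helicopter’s different sub-dynamics. / For example, when turning, tail-rotor control causes drift, and climb or descent control induces rotation. This is covered in the video at the beginning.
- Specify the reward function for trajectory following.
  - Stay close to the trajectory (penalize deviation): penalize the position error, computed as the distance between the helicopter's current position and its projection onto the idealized trajectory.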
    
    “tracked” position: the “projection” of the helicopter’s position onto the path of the idealized, desired trajectory.
    
    the learning algorithm pays a penalty that is quadratic between the actual position and the “tracked” position on the idealized trajectory.
  - Reward for making progress: use a potential function that increases along the trajectory, so that the helicopter receives positive reward whenever it moves forward.
    
    Since, we are already tracking where along the desired trajectory the helicopter is, we chose a potential function that increases along the trajectory. Thus, whenever the helicopter’s makes forward progress along this trajectory, it receives positive reward.
  - Decoupled definition (decouple our definition) > Finally, our modifications have decoupled our definition of the reward function from (x, y, z, w) and the evolution of (x, y, z, w) in time.
    - (x, y, z, w) represents a desired hovering position and orientation.
    We considered several alternatives, but the main one used ended up being a modification for flying trajectories that have both a vertical and a horizontal component (such as along the two upper edges of the triangle in III.1).
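  - Model-based control is not a one-off job: it needs manual intervention and continual adjustment based on real tests. For example, a delay is artificially added in the z direction to avoid "bowed-out" or "bowed-in" trajectories. > the z (vertical)-response of the helicopter is very fast. In contrast, the x and y responses are much slower.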
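The deviation penalty and the progress reward described in the list above can be sketched as follows. This is a simplified 2-D illustration under stated assumptions: the path is a polyline without zero-length segments, the potential function is taken to be the arc length of the "tracked" position, and the weights and function names are hypothetical rather than taken from the paper.

```python
import math


def project_onto_path(p, path):
    """Return (closest point on the polyline, arc length travelled up to that point)."""
    best = None
    travelled = 0.0
    for (ax, ay), (bx, by) in zip(path, path[1:]):
        dx, dy = bx - ax, by - ay
        seg_len = math.hypot(dx, dy)
        # Parameter of the orthogonal projection, clamped to stay on the segment.
        t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / (seg_len * seg_len)
        t = max(0.0, min(1.0, t))
        qx, qy = ax + t * dx, ay + t * dy
        dist = math.hypot(p[0] - qx, p[1] - qy)
        if best is None or dist < best[0]:
            best = (dist, (qx, qy), travelled + t * seg_len)
        travelled += seg_len
    return best[1], best[2]


def step_reward(prev_pos, pos, path, w_dev=1.0, w_prog=1.0):
    """Progress along the path minus a quadratic penalty on the deviation from it."""
    _, s_prev = project_onto_path(prev_pos, path)
    tracked, s_now = project_onto_path(pos, path)
    deviation_sq = (pos[0] - tracked[0]) ** 2 + (pos[1] - tracked[1]) ** 2
    return w_prog * (s_now - s_prev) - w_dev * deviation_sq


path = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
tracked, s = project_onto_path((5.0, 1.0), path)       # 1 m off the first edge
on_path = step_reward((4.0, 0.0), (5.0, 0.0), path)    # moves forward along the path
drifting = step_reward((4.0, 0.0), (5.0, 1.0), path)   # same progress, but 1 m off
```

Note that time does not appear anywhere in the reward: only where the helicopter is relative to the path matters, which matches the path-following (rather than trajectory-tracking) reading given above.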


Repost - Reflections on 《小狗钱钱》 (A Dog Named Money) - More Than Personal Finance

Posted on 2020-03-17 Edited on 2023-05-19 In Reading Word count in article: 2k Reading time ≈ 7 mins.

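I quite like several of this article's points that are not about finance. The article is reposted from "Practical Advice".

Reading 《小狗钱钱》 gives us many ideas and much knowledge about money and personal finance, but also a lot of practical advice. I believe that applying this advice in daily life can help us improve and reach our goals.

For the book's views and advice on money and personal finance, see my other review, "Reflections on 《小狗钱钱》 - About Money".

Practical advice

You have only two choices: do it or don't; there is no "I'll try"

The book's principle for doing anything: you have only two choices, do it or don't do it. There is no "I'll try".

When the dog Money (钱钱) proposes the dream piggy bank and the dream album, Kira (吉娅) says she will give it a try, and Money tells her:

"You have only two choices: do it or don't. 'I'll give it a try' is not one of them."

If you only go in with a "let's see" attitude, you will end in failure and achieve nothing. Trying is purely an excuse: before you have even started, you have already prepared your way out.

Reflecting on ourselves: when facing a task, don't we either hesitate for a long time and then choose not to do it, or say "I'll give it a try"? At that moment we have already allowed ourselves not to finish, and may even have accepted failure as the outcome. To avoid this, we can do as Kira does. We have no dog to supervise us, but we can supervise ourselves: every time we say "I'll give it a try", we note it in a notebook, and aim to be left with only two options, "do" and "don't".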

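Think more about how good it will be to reach the goal, and less about why things might fail

Money points out a flaw in how Kira looks at problems: she always thinks first of the reasons something cannot be done, instead of imagining how good it will be once the goal is reached.

You must imagine that you already have these things. Only then does your small wish turn into a strong desire. The more you imagine, the stronger the desire becomes, and you will start looking for opportunities to realize your dream.

Based on this, Money's practical suggestion is to create a dream album: Kira saves pictures related to her dreams, "an exchange trip to the United States" and "a laptop", in the album and looks at them often to strengthen her desire.

I have tried the dream piggy bank and the dream album myself. I took out a notebook and listed ten wishes that need money to fulfil, picked three, and set a source of funds for each. My 咕咕机 printer has just arrived, so I plan to print the pictures related to these wishes and to write the dreams down in more detail. When I achieve one of them, I will share it with you.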

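Self-confidence determines whether you can succeed

The book also stresses the importance of self-confidence, and holds that being overconfident is far better than lacking confidence.

Your level of confidence determines whether you believe in your abilities and in yourself. If you do not believe you can do something, you will not even start, and if you never start, you will gain nothing.

This passage resonates deeply with me. I have low self-esteem and am never confident. Because I have no faith that I will succeed, I do not put in the corresponding effort, so many things fail just as expected. These failures are like snowflakes on the ground: the snowball of low self-esteem grows with every flake it picks up, and then picks up flakes even more easily. Low self-esteem, lack of effort, and failure form a perfect cycle that I cannot escape.

On how to build confidence, Money's advice to Kira is to keep a "success journal": record at least 5 personal achievements every day. Any small thing counts, and there is no need to agonize over whether something qualifies as an achievement; just give yourself the benefit of the doubt. I had heard this advice elsewhere but never put it into practice. This time I started a "success journal" and have kept it up for 17 days without a break, and I find that I have come to like it. Most of what I record is small, but every night before bed I take out my little notebook and recall the day's achievements, which makes me happy. At the very least it is a positive signal to myself, and I have noticed that I am more willing to finish my daily task plan. Perhaps the journal deserves some of the credit.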

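Focus your energy on what you know, what you can do, and what you have

When Kira decides to start earning money but does not know where to begin, Money tells her a story that carries two pieces of advice:

Solve a difficult problem for someone else, and you can earn a lot of money;

Focus your energy on what you know, what you can do, and what you have.

If you also plan to find new sources of income, I think these two pieces of advice are very good: consider what others need and what you have. The second is probably not limited to earning money; it applies to a career or a job as well. Our energy is limited. Focus on what you know, can do, and have, rather than following the crowd and doing whatever everyone else does, which may leave you good at nothing. Finding what you are capable of and mastering one area may bring greater returns overall.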

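Keep doing the important things

Money is a dog that Kira found. When Kira locates the dog's owner, she is so frightened and upset at the thought of losing Money that she stops writing her success journal and maintaining her dream album. Money is very angry about this and says:

Difficulties keep coming up. Even so, every day you must keep doing, without interruption, the things that matter greatly for your future. They may not take much time, but it is this time that makes all the difference.

This view resembles the attitude toward "important but not urgent" matters in the four-quadrant rule: such things may not be urgent, but they are highly important and should be kept up no matter what. Is there any good advice for doing so? Money's advice is to do these things regularly at a fixed time, which prevents interruptions caused by other urgent matters or by mood.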

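The 72-hour rule

Money tells Kira that when she wants to do something, she should do it within 72 hours and not put it off too long; otherwise she may never do it at all.

This rule suits the ideas and inspirations we have in everyday life. Such things are entirely voluntary and depend largely on momentary impulse and interest. If we start too late or take too long, it is easy to give up halfway, and in the end nothing comes of it.

Closing remarks

These are the pieces of practical advice I have sorted out. I hope you can find the ones that suit you and put them into practice. After practicing for a while, I will also share my growth and what I have gained.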

A Concise Guide to the Mercator Projection Coordinate System (From Principle to Implementation)

Posted on 2020-02-08 Edited on 2023-05-19 In USV Word count in article: 1.4k Reading time ≈ 5 mins.

[toc]

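Preface

FYI: the first draft of this post was written in 2017. The content is kept up to date on my personal website under the same title; please read the latest version there.

This post covers the basic principle and an implementation of the Mercator projection coordinate system. Because the Earth is not a perfect ellipsoid, converting between latitude/longitude and metric coordinates is complicated. The algorithm given here has considerable error and is suitable only for preliminary validation.

The Mercator projection coordinate system

The Mercator projection is a conformal cylindrical projection tangent at the equator ("等角正切圆柱投影"), devised by the Flemish cartographer Gerardus Mercator in 1569. The usual intuitive picture is this: imagine the Earth enclosed in a hollow cylinder that touches it along the equator, place a lamp at the center of the Earth, project the figures on the sphere onto the cylinder, and unroll the cylinder. The result is a world map in the "Mercator projection" whose standard parallel is at zero degrees (the equator).
The Mercator projection still plays a very important role in navigation today. Countries around the world still widely use it for nautical charts, and the International Hydrographic Bureau (IHB) stipulates: "Except in special cases, all countries shall draw nautical charts with the Mercator projection." The General Bathymetric Chart of the Oceans (《大洋水深总图》) published by the IHB divides the world into 24 sheets, and those between 72° north and 72° south are drawn with the Mercator projection.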

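Properties of the Mercator projection

Because the meridians and parallels of the Mercator projection are stretched by the same factor with increasing distance from the equator, it is also called the projection of increasing latitude (渐长投影). Because it is a cylindrical projection with the conformal property, it is also called the conformal cylindrical projection. Note: this projection is unsuitable for high latitudes and is usually not used above 60° latitude.

The Mercator projection has a special property: all rhumb lines, or loxodromes (courses that cross every meridian at the same angle, also called lines of constant bearing), are straight lines on the map. This makes the projection very important in navigation. Note: the stretching of meridians and parallels is proportional to the secant of the latitude. It grows sharply with latitude and becomes infinite at the poles. The enlargement of area is even more pronounced: at 60° latitude, areas are enlarged four times. As the figure below shows, circles of equal radius on the ground appear visibly larger at high latitudes.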
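The secant relationship above is easy to check numerically. The short sketch below assumes a spherical Earth; the function name `mercator_scale` is only illustrative.

```python
import math


def mercator_scale(lat_deg: float) -> float:
    """Linear scale factor of the spherical Mercator projection: sec(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))


# Lengths stretch by sec(lat) in both directions, so areas stretch by sec(lat)^2.
for lat in (0, 30, 60, 80):
    k = mercator_scale(lat)
    print(f"latitude {lat:2d} deg: length x{k:.2f}, area x{k * k:.2f}")
```

At 60° the secant is 2, so lengths double and areas grow by a factor of four, which matches the figure quoted in the text.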

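The Mercator projection is obtained by modifying the perspective cylindrical projection so that it satisfies the conformal condition. Conformal (等角, also called 保形) means that the scale at any point on the map is the same in all directions, that is, shapes are preserved locally. The perspective cylindrical projection is shown in Figure 1. On a Mercator map, meridians at equal longitude intervals are equally spaced parallel straight lines, parallels are also parallel straight lines, and meridians and parallels are perpendicular to each other. The modification Mercator makes to the perspective cylindrical projection is this: for the cylindrical projection to be conformal, the factor by which the meridians are stretched from the equator toward the poles must equal the factor by which the parallel at the corresponding latitude is enlarged.

Figure 1: Perspective cylindrical projection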

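Equations of the Mercator projection

The Mercator projection covers the whole world, with the equator as the standard parallel and the prime meridian as the central meridian. Their intersection is the coordinate origin, with east and north positive and west and south negative. The South and North Poles lie at the bottom and top of the map, and west and east at the left and right. Because the projection tends to infinity near the poles, it cannot show the whole world: the highest latitude on the map is 85.05° (solving the inverse formula at the limit of the y range gives 85.05112877980659°). To simplify the calculation, we use a spherical model rather than an ellipsoid.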
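The 85.05° limit can be reproduced with a few lines. The sketch assumes the spherical model with R = 6378137 m and assumes the y range is clipped to ±πR, the same extent as the x range, so that the map is square; both are assumptions about the setup rather than statements from the text above.

```python
import math

R = 6378137.0                     # assumed sphere radius in metres


def lat_from_y(y: float) -> float:
    """Inverse spherical Mercator: latitude in degrees for a projected y in metres."""
    return math.degrees(2.0 * math.atan(math.exp(y / R)) - math.pi / 2.0)


# x spans [-pi*R, pi*R]; clipping y to the same range gives the latitude limit.
max_lat = lat_from_y(math.pi * R)
print(max_lat)
```

The printed value agrees with the 85.05112877980659° quoted above to within floating-point rounding.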

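For the full derivation, see the paper by 程光举 listed under Reference.

Using the conformal condition m = n, the derivation proceeds in three steps:
1. From m = n, obtain the relation between the differential line elements on the Earth's surface and their projections on the plane.
2. Treat the Earth as a sphere: let point A on the Earth's surface have longitude and latitude (λ, Φ) and projected coordinates (x, y), take the equator as the standard parallel, and let R be the Earth's radius;

The Mercator projection equations, in the spherical form that the code in the implementation section uses (λ and Φ in radians), are:

\[x = R\lambda, \qquad y = R \ln \tan\left(\frac{\pi}{4} + \frac{\Phi}{2}\right)\]

Treating the Earth as an ellipsoid of revolution

Forward and inverse formulas of the Mercator projection:

For the derivation, see the same paper by 程光举.

Implementation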

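// Lite version, suitable only for preliminary validation.
// Converts between longitude/latitude and Mercator coordinates, treating the Earth as a sphere.
#include <math.h>

// Value of pi used by the conversions below.
static const double Pi = 3.14159265358979323846;

typedef struct Point
{
	double x;  // longitude in degrees, or Mercator x in metres
	double y;  // latitude in degrees, or Mercator y in metres
}WayPoint;

// Longitude/latitude to Mercator.
// 20037508.34 m is half the equatorial circumference (Pi * 6378137 m).
WayPoint lonLat2Mercator(WayPoint lonLat)
{
	WayPoint mercator;
	double x = lonLat.x * 20037508.34 / 180;
	double y = log(tan((90 + lonLat.y) * Pi / 360))/(Pi / 180);
	y = y * 20037508.34 / 180;
	mercator.x = x;
	mercator.y = y;
	return mercator;
}

// Mercator to longitude/latitude
WayPoint Mercator2lonLat(WayPoint mercator)
{
	WayPoint lonLat;
	double x = mercator.x / 20037508.34 * 180;
	double y = mercator.y / 20037508.34 * 180;
	y = 180 / Pi * (2 * atan(exp(y * Pi / 180)) - Pi / 2);
	lonLat.x = x;
	lonLat.y = y;
	return lonLat;
}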
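A quick way to sanity-check the two conversions is a round trip. The sketch below is a Python port of the same formulas written for this check; the function names and the sample point are illustrative and not part of the original code.

```python
import math

HALF_EQUATOR = 20037508.34        # same constant as above: pi * 6378137 m


def lonlat_to_mercator(lon: float, lat: float) -> tuple:
    """Degrees to metres, spherical Mercator."""
    x = lon * HALF_EQUATOR / 180.0
    y = math.log(math.tan((90.0 + lat) * math.pi / 360.0)) / (math.pi / 180.0)
    return x, y * HALF_EQUATOR / 180.0


def mercator_to_lonlat(x: float, y: float) -> tuple:
    """Metres back to degrees, inverse of lonlat_to_mercator."""
    lon = x / HALF_EQUATOR * 180.0
    lat = y / HALF_EQUATOR * 180.0
    lat = 180.0 / math.pi * (2.0 * math.atan(math.exp(lat * math.pi / 180.0)) - math.pi / 2.0)
    return lon, lat


x, y = lonlat_to_mercator(120.0, 30.0)      # arbitrary sample point
lon, lat = mercator_to_lonlat(x, y)
```

The round trip should return the starting longitude and latitude up to floating-point error. It only confirms that the two functions are inverses of each other; it says nothing about the error of the spherical approximation relative to the real ellipsoid.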

Reference

程光举. 墨卡托投影与大圆投影的构成及其在确定航线、计算航程与航向方面的应用 [J]. 辽宁师院学报 (自然科学版),1980 (01):13-28.
Source code and background literature for converting between the Mercator projection and latitude/longitude

Dynamically updating matplotlib figures with interactive mode (interactive plotting)

Posted on 2020-02-08 Edited on 2023-05-19 In Python Word count in article: 816 Reading time ≈ 3 mins.

[toc]

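I have recently been studying obstacle-avoidance algorithms for dynamic obstacles. When simulating the algorithm in Python, I need to display the current positions and trajectories of the obstacles and the moving object in real time. Using the Anaconda Python distribution, I implemented dynamic path display and interactive plotting (similar to Matlab) in Spyder with Python 3.5 and matplotlib.

The first draft of this post was written in 2017. Maintaining several platforms is hard, so updates are made on my personal website; please read the latest version there.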

Background

Anaconda

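Anaconda is a Python distribution for scientific computing. It supports Linux, Mac, and Windows and provides package and environment management, which makes it easy to keep several Python versions side by side, switch between them, and install third-party packages.

Anaconda uses the conda tool/command to manage packages and environments, and it already includes Python and the related tools.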

matplotlib

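matplotlib is the best-known plotting library for Python. It provides a set of command APIs similar to matlab's and is well suited to interactive plotting. It can also easily be embedded in GUI applications as a plotting widget. Its pyplot sub-library provides a matlab-like plotting API for quickly drawing 2D charts. The documentation is quite complete, and the Gallery page has hundreds of thumbnails, each with its source code.

Interactive plotting with matplotlib

Principle

While surveying ways to draw curves dynamically with matplotlib, I found that, as in matlab, there are two options: the animation approach and interactive plotting. The animation approach is not very flexible and is poorly suited to real-time display of paths, so this post uses interactive mode; see "Using matplotlib in a python shell".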

The interactive property of the pyplot interface controls whether a figure canvas is drawn on every pyplot command. If interactive is False, then the figure state is updated on every plot command, but will only be drawn on explicit calls to draw(). When interactive is True, then every pyplot command triggers a draw.

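Practice

Adding pl.ion() to the plotting code turns interactive mode on. The Python interpreter then shows a figure after executing the commands but does not end the session; it waits for further input. If you add more statements and run them, the figure updates in real time. Adding pl.ioff(), or never calling pl.ion(), leaves interactive mode off. In that case you must call pl.show() at the end of the code to display the figure. After the interpreter has executed all commands, it shows the figure and ends the session. Statements added afterwards have no effect unless you close the current figure and run again.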

Example

# -*- coding: utf-8 -*-
"""
Created on Sat Mar 25 23:28:29 2017
@author: wyl
original link: https://www.yanlongwang.net/Python/python-interactive-mode/
"""

import matplotlib.pyplot as plt
from matplotlib.patches import Circle
import numpy as np
import math
    
plt.close()  # clf() clears the figure, cla() clears the axes, close() closes the window
fig=plt.figure()
ax=fig.add_subplot(1,1,1)
ax.axis("equal")  # use the same scale on the X and Y axes
plt.grid(True)  # add a grid
plt.ion()  # interactive mode on
IniObsX=0
IniObsY=4000
IniObsAngle=135
IniObsSpeed=10*math.sqrt(2)   # metres per second
print('Simulation started')
try:
    for t in range(180):
        # trajectory of the obstacle vessel
        obsX=IniObsX+IniObsSpeed*math.sin(IniObsAngle/180*math.pi)*t
        obsY=IniObsY+IniObsSpeed*math.cos(IniObsAngle/180*math.pi)*t
        ax.scatter(obsX,obsY,c='b',marker='.')  # scatter plot
        #ax.lines.pop(1)  # remove the trajectory
        # figure below: distance between the two vessels
        plt.pause(0.001)
except Exception as err:
    print(err)
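The trajectory formula in the loop uses sin for x and cos for y, which corresponds to a heading measured clockwise from north (the +Y axis), as is usual for ship courses. The sketch below isolates that calculation so it can be checked without opening a figure; the function name `obstacle_position` is introduced here only for illustration.

```python
import math


def obstacle_position(x0, y0, speed, heading_deg, t):
    """Position after t seconds on a straight course.

    heading_deg is measured clockwise from north (+Y), so east is 90 degrees.
    """
    heading = math.radians(heading_deg)
    return (x0 + speed * math.sin(heading) * t,
            y0 + speed * math.cos(heading) * t)


# Same initial values as the example: start at (0, 4000), heading 135 degrees
# (south-east), speed 10*sqrt(2) m/s, i.e. 10 m/s east and 10 m/s south.
x, y = obstacle_position(0.0, 4000.0, 10 * math.sqrt(2), 135.0, 180)
```

After 180 seconds the obstacle has moved about 1800 m east and 1800 m south, ending near (1800, 2200).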

How to bring Google AdSense to a Hexo website

Posted on 2020-02-08 Edited on 2023-05-19 In website Word count in article: 1.9k Reading time ≈ 7 mins.

[toc]

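FYI: this post is kept up to date on my personal website under the title "Hexo 网站引入 google AdSense"; please read the latest version there.

Preface

My personal website www.yanlongwang.net has been running for nearly a year. Daily page views keep rising and now stay in the double digits, so I plan to carry a few ads for some passive income to sustain day-to-day operation, hopefully covering the server and domain costs.

I currently serve ads in my site's ad slots through Google AdSense. Below I explain the basic steps from registering for Google AdSense to setting up ads after approval.

Visit www.yanlongwang.net to try it. AdSense ads should appear in the lower part of the sidebar. If they do not, check whether your browser has an ad-blocking extension such as AdBlock installed, and disable it temporarily.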

// System
- Next Theme
// In the hexo directory, run "hexo version" to see the versions of Next and its dependencies
- hexo: 3.8.0
- hexo-cli: 1.1.0
- ...

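Choosing an ad network

In China, the well-known ad networks, such as the Baidu ad network (百度广告联盟), require the site to have an ICP filing (备案); otherwise it cannot carry ads. My small site has no filing yet, so I can only choose an overseas ad network.

Among overseas ad networks, Google AdSense is the obvious choice, with a relatively low bar for approval. A site that has been running for a few months and has a dozen or so articles usually passes review fairly easily. AdSense ads are also of better quality than those of the Baidu network, so I chose Google AdSense.

Open the registration site and log in (a proxy is needed to reach it from mainland China).
Click Get started in the upper right corner to register; if you already have a Google account, click sign in directly.

After you fill in the information, the page generates a code snippet that you must place in the head section of your site. Note that once review passes and "Auto ads" is enabled, this same snippet serves ads automatically.

For the Next Theme, copy the code into themes\_partials.swig, below any of the script blocks.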

<!-- layout example in head.swig -->
{% if theme.google_site_verification %}
  <meta name="google-site-verification" content="{{ theme.google_site_verification }}" />
{% endif %}

<script data-ad-client="ca-pub-612**56" async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>

{% if theme.bing_site_verification %}
  <meta name="msvalidate.01" content="{{ theme.bing_site_verification }}" />
{% endif %}

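After adding the snippet, click confirm in Google AdSense, and Google will visit your site to check and verify it.
I applied on the morning of 2020.02.05 and received the approval email on the morning of 2020.02.06.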

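Configuring ad slots

After receiving the approval email, you can log in to Google AdSense to choose and design the ad placements on your site. AdSense currently offers two ways to add ads: Auto ads and ad units.

Auto ads

Auto ads are a format that Google AdSense introduced recently. By analyzing the layout of your blog, it automatically inserts suitable ads into your site. Both the content and the size of the ads are matched to the site, so this is a relatively high-quality kind of ad.

Open the AdSense home page and go to Ads. In the Overview you can set up Auto ads for each site.

If Auto ads are enabled for your site, you will see ads on it within a few hours of approval and start accumulating revenue (the verification code inserted earlier also connects the site to AdSense Auto ads).

Auto ads are shown relatively rarely and are less effective on PC. If your site supports mobile viewing, responsive mobile ads are served automatically.

Ad units

To make the most efficient use of a blog's ad space, AdSense provides three kinds of fixed ad slot.

Text and display ads (fixed slots such as the sidebar or the comment area)
In-feed ads (inserted within a feed of content)
In-article ads (inserted at the beginning, middle, or end of each article; shown most often, strongly recommended)

Because my site is a blog, I do not use the second kind, in-feed ads, and the third kind hurts the reading experience. So I currently use mainly the first kind and show ads in the sidebar.

The procedure is: in AdSense, choose Ad units -> create a new ad unit -> choose the ad type -> generate the ad code.

Inserting ad units

I have enabled Auto ads and use only the first kind of ad unit, showing ads in the sidebar and the comment area. Below, however, are several places where ad unit code can be inserted in the Hexo Next Theme.

Inserting into the comment area

Insert the ad code generated for the ad unit at the end of _partials.swig.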

  {% elseif theme.valine.appid and theme.valine.appkey %}
    <div class="comments" id="comments">
    </div>
  {% endif %}

<!-- Insert google ad blocks -->
<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- Comment_Below -->
<ins class="adsbygoogle"
     style="display:block"
     data-ad-client="ca-pub-61***56"
     data-ad-slot="83***73"
     data-ad-format="auto"
     data-full-width-responsive="true"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script>

{% endif %}

Inserting into the sidebar

Insert the code at the very bottom of the \_macro.swig file.

      {% if theme.sidebar.b2t %}
        <div class="back-to-top">
          <i class="fa fa-arrow-up"></i>
          {% if theme.sidebar.scrollpercent %}
            <span id="scrollpercent"><span>0</span>%</span>
          {% endif %}
        </div>
      {% endif %}

<!-- Insert google ad blocks -->
<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- Comment_Below -->
<ins class="adsbygoogle"
     style="display:block"
     data-ad-client="ca-pub-6129496365361356"
     data-ad-slot="8323610673"
     data-ad-format="auto"
     data-full-width-responsive="true"></ins>
<script>
     (adsbygoogle = window.adsbygoogle || []).push({});
</script>

    </div>
  </aside>
{% endmacro %}

Inserting at the top of posts

In the \_custom directory, create google_adsense.swig and put the ad code provided by AdSense in it, then insert

{% include '../_custom/google_adsense.swig' %}

into the \_macro.swig file.

        {% else %}
          {% if post.type === 'picture' %}
            <a href="{{ url_for(post.path) }}">{{ post.content }}</a>
          {% else %}
            {{ post.content }}
          {% endif %}
        {% endif %}
      {% else %}
<!-- Insert google ad blocks -->
        {% include '../_custom/google_adsense.swig' %}

        {{ post.content }}
      {% endif  %}
    </div>
    {#####################}
    {### END POST BODY ###}
    {#####################}

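I do not recommend placing several ads on one page: the ads tend to be similar and seriously hurt the reading experience. Isn't one ad at the side of the page good enough?

Caveats

Once you have joined AdSense, Google uses several signals and kinds of data to judge whether ad clicks are fraudulent, and may close your account as a result. So do not count on luck; publishing good original articles and improving the quality of the site is the right way.

The fraudulent clicker's IP address is the same as the IP address used to log in to your AdSense account
The CTR of the fraudulent clicks is too high
The fraudulent clickers' IP addresses come from the same geographic area
Fraudulent AdSense clicks are identified from cookies
The fraudulent clicker stays on the page too briefly
The ad click-through rate of direct visitors is too high
Traffic is low but the ad click-through rate is high
Text on the page asks or urges visitors to click the ads

Last but not least

Income from AdSense is not high at present, especially for blog sites, but placing a few ads in spare space can still enrich the blog.

Since we are willing to spend time and energy on building a personal blog, there must be purposes other than money. I hope we all stay true to that original intention and do not put the cart before the horse. Ad income is nice to have, but there is no need to be discouraged without it; just do your best.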

