GIFs are an average of 5-10 times larger than an efficiently encoded
MP4 video. This difference in size means that GIFs waste a large amount
of bandwidth and also load at a slower rate leading to a bad user
experience.
Brief summary: earlier approaches used feature-matching techniques to describe and imitate pedestrian trajectories, but because features vary from person to person, the generated trajectories were unsatisfactory. This paper instead uses deep reinforcement learning to obtain a time-efficient navigation policy that respects social norms.
This work notes that while it is challenging to directly specify the
details of what to do (precise mechanisms of human navigation), it is
straightforward to specify what not to do (violations of social norms).
Specifically, using deep reinforcement learning, this work develops a
time-efficient navigation policy that respects common social norms.
Moreover, this paper builds on the authors' earlier work on multiagent collision avoidance with deep reinforcement learning, introducing socially aware behaviors into multiagent systems; the main contribution is how social behaviors are introduced and fused into CADRL. Put simply, SA-CADRL = SA (socially aware) + CADRL (collision avoidance with deep reinforcement learning).
This work extends the collision avoidance with deep reinforcement
learning framework (CADRL) to characterize and induce socially aware
behaviors in multiagent systems.
Since these methods do not capture human behaviors, they sometimes
generate unsafe/unnatural movements, particularly when the robot
operates near human walking speed.
More sophisticated motion models have been proposed that reason about nearby pedestrians' hidden intents to generate a set of predicted paths. Subsequently, classical path planning algorithms are employed to generate a collision-free path for the robot.
Drawback: splitting navigation into disjoint prediction and planning steps can cause the freezing robot problem, where the robot finds no feasible action because the predicted trajectories render most of the space untraversable.
Separating the navigation problem into disjoint prediction and
planning steps can lead to the freezing robot problem, in which the
robot fails to find any feasible action because the predicted paths
could mark a large portion of the space untraversable.
My comment (MC): although the authors consider this approach flawed, it is currently popular in industry. Splitting navigation into layered modules keeps the interfaces between upstream and downstream transparent, which makes the system easy to port and debug.
A key to resolving this problem is to account for cooperation, that
is, to model/anticipate the impact of the robot’s motion on the nearby
pedestrians.
Model-based approaches are typically extensions of multiagent
collision avoidance algorithms, with additional parameters introduced to
account for social interactions.
Learning-based approaches aim to develop a policy that emulates human
behaviors by matching feature statistics. In particular, Inverse
Reinforcement Learning (IRL) has been applied to learn a cost function
from human demonstration (teleoperation), and a probability distribution
over the set of joint trajectories with nearby pedestrians.
In short, existing methods attempt to model or replicate the detailed mechanisms of social compliance, which remain hard to quantify because of the stochasticity of pedestrian behavior.
In short, existing works are mostly focused on modeling and
replicating the detailed mechanisms of social compliance, which remains
difficult to quantify due to the stochasticity in people’s
behaviors.
The authors observe that humans follow a set of simple social norms, such as passing on the right. They therefore characterize these behaviors in a reinforcement learning framework and find that human-like navigation conventions emerge from solving a cooperative collision avoidance problem.
Building on a recent paper, we characterize these properties in a
reinforcement learning framework, and show that human-like navigation
conventions emerge from solving a cooperative collision avoidance
problem.
Symmetries in multiagent collision
avoidance
BACKGROUND
Collision
Avoidance with Deep Reinforcement Learning
First, multiagent collision avoidance can be formulated as a sequential decision making problem in a reinforcement learning framework.
A multiagent collision avoidance problem can be formulated as a
sequential decision making problem in a reinforcement learning
framework.
The unknown state-transition model takes into account the uncertainty
in the other agent’s motion due to its hidden intents.
Solving this RL problem then amounts to finding the optimal value function, which encodes an estimate of the expected time to reach the goal; the optimal policy can be recovered from the value function.
Solving the RL problem amounts to finding the optimal value function
that encodes an estimate of the expected time to goal.
The main challenge in finding the optimal value function is that the joint state is a continuous, high-dimensional vector, making it infeasible to discretize and enumerate the state space.
A major challenge in finding the optimal value function is that the joint state is a continuous, high-dimensional vector, making it impractical to discretize and enumerate the state space.
Recent work addresses this by using deep neural networks to represent value functions in high-dimensional spaces, achieving human-level performance.
Recent advances in reinforcement learning address this issue by using
deep neural networks to represent value functions in high-dimensional
spaces, and have demonstrated human-level performance on various complex
tasks.
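To make the function-approximation idea concrete, here is a minimal numpy sketch: a tiny randomly initialized network stands in for the trained value net, and the greedy policy just picks the action whose successor state scores best. The dimensions, the two-layer architecture, and all weights are invented for illustration; the paper's trained network is different.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up dimensions: the joint state stacks both agents' positions,
# velocities, radii, and the goal -- a continuous vector, so a table over
# discretized states is infeasible and a function approximator is used.
STATE_DIM, HIDDEN = 14, 32

# Randomly initialized two-layer MLP standing in for the trained value net.
W1 = rng.standard_normal((STATE_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal(HIDDEN) * 0.1
b2 = 0.0

def value(joint_state: np.ndarray) -> float:
    """Scalar value estimate (e.g. expected time-to-goal) for a joint state."""
    h = np.maximum(joint_state @ W1 + b1, 0.0)  # ReLU hidden layer
    return float(h @ W2 + b2)

def best_action(joint_state, candidate_next_states):
    """Greedy policy: pick the action whose resulting state has best value."""
    return int(np.argmax([value(s) for s in candidate_next_states]))
```

The point is only that a smooth parametric function replaces an intractable lookup table over the continuous joint state.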
MC: up to this point the authors have been reviewing their own prior work, the collision avoidance with deep reinforcement learning framework (CADRL); next they build on it to introduce socially aware behaviors among multiple agents.
Rather than trying to quantify human behaviors directly, this work
notes that the complex normative motion patterns can be a consequence of
simple local interactions.
Hence the paper further conjectures that, rather than arising from a set of precisely defined procedural rules, social norms emerge from a time-efficient, reciprocal collision avoidance mechanism.
Thus, we conjecture that rather than a set of precisely defined
procedural rules, social norms are the emergent behaviors from a
time-efficient, reciprocal collision avoidance mechanism.
Reciprocity implicitly encodes a model of the other agents’ behavior,
which is the key for enabling cooperation without explicit
communication.
A somewhat philosophical point: reciprocity in local collision avoidance gives rise to so-called social navigation norms. The authors further show experimentally that even rule-free CADRL exhibits certain navigation conventions. (This could serve as a research hypothesis.)
While no behavioral rules were imposed in the problem formulation, the CADRL policy exhibits certain navigation conventions.
Existing works have reported that human navigation (or teleoperation
of a robot) tends to be cooperative and time-efficient. This work notes
that these two properties are encoded in the CADRL formulation through
using the min-time reward function and the reciprocity assumption.
However, the cooperative behaviors emerging from a CADRL solution are
not consistent with human interpretation. The next section will address
this issue and present a method to induce behaviors that respect human
social norms.
APPROACH
This section first describes how to shape normative behaviors for two agents in the RL framework, and then generalizes the method to multiagent scenarios.
We first describe a strategy for shaping normative behaviors for a
two-agent system in the RL framework, and then generalize the method to
multiagent scenarios.
This work notes that social norms are one of the many ways to resolve
a symmetrical collision avoidance scenario. To induce a particular norm,
a small bias can be introduced in the RL training process in favor of
one set of behaviors over others.
The advantage of this approach is that violations of a particular
social norm are usually easy to specify; and this specification need not
be precise. This is because the addition of a penalty breaks the
symmetry in the collision avoidance problem, thereby favoring behaviors
respecting the desired social norm.
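A toy sketch of such a symmetry-breaking penalty: add a small extra cost whenever the oncoming agent is on the "wrong" side, biasing training toward right-handed passing. All geometry, constants, and function names here are invented; the paper's actual reward shaping is more involved.

```python
# Toy norm-inducing reward: a base min-time step cost, plus a small penalty
# when the oncoming agent sits on the robot's right (for a right-handed
# norm, the other agent should end up passing on the robot's left).
STEP_COST = -0.01      # encourages time-efficient paths
NORM_PENALTY = -0.05   # small bias; the specification need not be precise

def side_of(heading, offset):
    """+1 if `offset` points to the left of `heading`, -1 if to the right."""
    cross = heading[0] * offset[1] - heading[1] * offset[0]
    return 1 if cross > 0 else -1

def reward(heading, other_offset):
    r = STEP_COST
    if side_of(heading, other_offset) < 0:  # oncoming agent on our right
        r += NORM_PENALTY                   # violates right-handed passing
    return r
```

The penalty only has to break the tie between the two symmetric avoidance maneuvers; its exact magnitude matters little as long as training converges.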
Finally, the training results show that the learned policies resemble human behaviors, e.g. left-handed and right-handed passing norms.
As long as training converges, the penalty sets’ size does not have a
major effect on the learned policy. This is expected because the desired
behaviors are not in the penalty set.
Since training was solely performed on a two-agent system, it was
difficult to encode/induce higher order behaviors, such as accounting
for the relations between nearby agents. This work addresses this
problem by developing a method that allows for training on multiagent
scenarios directly.
To capture the multiagent system's symmetrical structure, a neural network with weight-sharing and max-pooling layers is employed.
Network structure for multiagent
scenarios
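A minimal sketch of why weight-sharing plus max-pooling gives the needed symmetry: every neighbor is encoded with the same weights, and pooling collapses the orderless neighborhood into a fixed-size feature, so permuting the agents cannot change the output. Dimensions and weights are invented; the paper's network differs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented dimensions for a per-agent observation and pooled feature.
AGENT_DIM, FEAT = 6, 16
W_shared = rng.standard_normal((AGENT_DIM, FEAT)) * 0.1  # shared across agents
W_out = rng.standard_normal(FEAT) * 0.1

def value(neighbors: np.ndarray) -> float:
    """neighbors: (n_agents, AGENT_DIM) array, in any order."""
    feats = np.maximum(neighbors @ W_shared, 0.0)  # same weights per agent
    pooled = feats.max(axis=0)                     # symmetric max-pooling
    return float(pooled @ W_out)

# Permuting the neighboring agents leaves the output unchanged:
x = rng.standard_normal((3, AGENT_DIM))
assert np.isclose(value(x), value(x[[2, 0, 1]]))
```

Max-pooling also lets the same network accept a variable number of neighbors.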
During training, trajectories are generated first and then converted into experience sets.
> The trajectories are then turned into state-value pairs and
assimilated into the experience sets.
Differences between CADRL and SA-CADRL training:
- Two experience sets are used to distinguish between trajectories that reached the goals and those that ended in a collision.
- During the training process, trajectories generated by SA-CADRL are reflected in the x-axis with probability.
  - This procedure exploits symmetry in the problem to explore different topologies more efficiently.
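The reflection step can be sketched like this, assuming a trajectory is stored as (T, 2) xy points (the layout and the probability value are guesses, not the paper's exact setup):

```python
import numpy as np

def maybe_reflect(traj: np.ndarray, rng, p: float = 0.5) -> np.ndarray:
    """Reflect a (T, 2) xy trajectory in the x-axis with probability p.

    Negating every y-coordinate turns e.g. a left-side pass into a
    right-side pass, so both topologies appear in the experience sets.
    """
    if rng.random() < p:
        traj = traj * np.array([1.0, -1.0])  # flip y, keep x
    return traj
```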
The authors already include a switch during training (a binary flag indicating whether the other agent is real or virtual), so an n-agent network can also handle scenarios with p (p ≤ n) agents.
An n-agent network can be used to generate trajectories for scenarios
with fewer agents.
RESULTS
Computational
Details (online performance and offline training)
The model converges well and runs in real time with time-efficient paths.
The size and connections in the multiagent network are tuned to
obtain good performance (ensure convergence and produce time-efficient
paths) while achieving real-time performance.
Simulation Results
Three comparative experiments: one without the social-norm reward function, and two with left-handed and right-handed norm rewards respectively.
Three copies of four-agent SA-CADRL policies were trained, one
without the norm inducing reward, one with the left-handed, and the
other with the right-handed.
Hardware Experiment
Hardware setup
The differential-drive vehicle is outfitted with a Lidar for
localization, three Intel Realsenses for free space detection, and four
webcams for pedestrian detection.
A hardware demonstration video can be found here.
CONCLUSION
Contribution
In a reinforcement learning framework, a pair of simulated agents navigate around each other to learn a policy that respects human navigation norms, such as passing on the right and overtaking on the left in a right-handed system.
This approach is further generalized to multiagent (n > 2)
scenarios through the use of a symmetrical neural network
structure.
Moreover, SA-CADRL is implemented on robotic hardware, which enabled
fully autonomous navigation at human walking speed in a dynamic
environment with many pedestrians.
Future work
Consider the relationships between nearby pedestrians, such as a
group of people who walk together.
Sidewalks present a unique yet challenging environment in that the
navigable space combines elements of both roads and free indoor spaces.
Often sidewalk motion is restricted to two linear directions and the
resulting navigable space is limited, like on roads.
The difficulty is that pedestrian motion is fairly stochastic, and people may cluster and walk together in groups.
However, pedestrians generally do not walk in perfect queues.
Instead, people tend to walk in groups of variable sizes and speeds and
move along with a general self-organizing crowd flow.
Compared to autonomous road navigation, sidewalk navigation must also
account for stochastic human movement that necessitates dynamic obstacle
avoidance. Furthermore, certain social rules, such as walking in lanes
or affording more space in the direction of walking than in the
perpendicular direction, are rules that a robot should follow as
well.
Here, the aforementioned approaches may be less effective as they do
not account for the physical sidewalk boundaries, or how robot movement
will affect pedestrian flow.
So the paper's key research question is how to exploit nearby pedestrian behaviors and flows so that the robot ultimately reaches its goal.
The key research question this paper considers is how mobile robots
can utilize nearby pedestrian behaviours and flows to navigate towards a
global goal.
When our navigation stack detects people moving towards the robot’s
goal, a ‘group surfing’ behaviour is used. This allows the robot to
imitate and participate in pedestrian social behaviours.
Core idea and goal: imitate natural human behaviors, including walking in lanes, avoiding collisions with other pedestrians or obstacles, waiting at intersections to cross, and not walking into traffic. It is a kind of biomimicry, except that the creature being imitated is the human.
Filter Candidate Groups
Filter out pedestrian groups that are moving away from the waypoint.
> Filter out groups moving away from the waypoint.
A note on the meaning of \(v_{G_i} \cdot x_I\): in my view it has no deep physical meaning; the authors simply use the sign of the dot product to decide whether a group is moving away from the waypoint. P.S. the sign of a dot product is determined by the cosine of the angle between the vectors: as long as that angle is under 90 degrees (i.e. moving toward the goal), the product is positive.
> If this value is non-positive, discard Gi as a subgoal candidate.
Schematic diagram
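The sign test above can be sketched as follows (the data layout and function names are my own invention, not the paper's code):

```python
import numpy as np

def filter_groups(groups, waypoint_dir):
    """Keep groups whose velocity has a positive component toward the waypoint.

    groups: list of (group_id, velocity) pairs; waypoint_dir: vector from the
    robot toward the waypoint. v_Gi . x_I > 0 iff the angle between them is
    under 90 degrees, i.e. the group is moving toward the waypoint.
    """
    kept = []
    for gid, v in groups:
        if float(np.dot(v, waypoint_dir)) > 0.0:
            kept.append(gid)
        # non-positive dot product -> moving away; discarded as a candidate
    return kept
```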
Smart Group Selection
Core idea and goal: from the filtered groups, select as the optimal group to follow the one whose average speed is below, but closest to, the robot's maximum speed. The current position of the group member nearest the robot is chosen as the subgoal.
Once we have filtered out unsuitable groups, the algorithm selects
the optimal group to follow.
Path planning and collision avoidance then reduce to planning a collision-free path between the robot's current position and the nearest pedestrian of the optimal group.
We intentionally select the closest person as a subgoal as attempting
to reach the average group position could lead to path planning through
pedestrians located between the average group position and the robot’s
current position.
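The selection rule can be sketched like this (data structures are invented; the 0.8 m/s limit is the PowerBot speed reported later in these notes):

```python
import numpy as np

ROBOT_MAX_SPEED = 0.8  # m/s, the PowerBot's max speed

def select_subgoal(groups, robot_pos):
    """Pick the group whose average speed is below but closest to the robot's
    max speed, then return the position of its member nearest the robot.

    groups: list of dicts with 'avg_speed' and 'members' (list of xy points).
    """
    feasible = [g for g in groups if g["avg_speed"] <= ROBOT_MAX_SPEED]
    if not feasible:
        return None
    best = max(feasible, key=lambda g: g["avg_speed"])  # closest from below
    members = np.asarray(best["members"], dtype=float)
    dists = np.linalg.norm(members - np.asarray(robot_pos, float), axis=1)
    return members[int(np.argmin(dists))]  # nearest member as the subgoal
```

Using the nearest member (rather than the group centroid) matches the paper's stated rationale: a centroid subgoal could force planning straight through pedestrians.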
Curb Following
Core idea and goal: collect a point cloud with a 3D laser sensor, then use the Random Sample Consensus (RANSAC) algorithm to detect the curb.
We make use of contextual knowledge; sidewalks are normally
surrounded by streets and buildings or empty space. Our robot first
acquires a surrounding point cloud using a 3D laser sensor and filters
out points that are at the same height as or above the plane defined by
the robot wheel contacts.
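A simplified sketch of that pipeline: drop points at or above the wheel-contact plane, then RANSAC-fit a 2D line to the remaining below-plane points. The thresholds, the flat z = 0 plane, and the line model are my assumptions; the real system works on richer 3D data.

```python
import numpy as np

rng = np.random.default_rng(0)

def detect_curb(points, n_iters=200, tol=0.05):
    """points: (N, 3) cloud in the robot frame, wheel contacts at z = 0.

    Returns ((point_on_line, unit_direction), inlier_count) for the best
    RANSAC line through the below-plane points projected into the xy plane.
    """
    low = points[points[:, 2] < 0.0][:, :2]      # filter: keep below-plane pts
    best_inliers, best_model = 0, None
    for _ in range(n_iters):
        i, j = rng.choice(len(low), size=2, replace=False)
        p, q = low[i], low[j]
        d = q - p
        n = np.linalg.norm(d)
        if n < 1e-9:
            continue
        normal = np.array([-d[1], d[0]]) / n     # unit normal of the line
        dist = np.abs((low - p) @ normal)        # point-to-line distances
        inliers = int((dist < tol).sum())
        if inliers > best_inliers:
            best_inliers, best_model = inliers, (p, d / n)
    return best_model, best_inliers
```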
Collision Avoidance
Human-Aware Collision Avoidance
For both group surfing and curb following, an existing learning-based method, Socially-Aware Collision Avoidance with Deep Reinforcement Learning (SA-CADRL), serves as the collision avoidance algorithm.
> We use Socially-Aware Collision Avoidance with Deep Reinforcement Learning (SA-CADRL) as the collision avoidance component of our navigation stack. The collision avoidance system navigates to a local subgoal generated by either the group surfing or the curb following approach.
Social reward functions are introduced to encourage social behaviors.
> The reinforcement training process induces social awareness through social reward functions, which give higher values to actions that follow social rules.
Static Obstacle Avoidance:
把静态障碍物作为静态的行人,仍然使用 SA-CADRL 算法来处理。 > We
also use SA-CADRL to avoid these static obstacles by adding “static
pedestrians” to the state vector.
SIMULATION DEMONSTRATION AND EXPERIMENTS
The simulation environment is built with the ROS and Gazebo toolchain.
> We use the Robot Operating System (ROS) and Gazebo simulator suite. To simulate pedestrians, we use the Pedsim ROS library, which relies on the social force model.
In evaluating our navigation system, our main goal was to show that
the system successfully navigates the robot to its final goal through a
socially-acceptable path. That is, the path that our robot takes to the
goal is similar to what a pedestrian would take to the same goal.
First, the proposed algorithm drives the robot to the goal in the virtual environment; as a baseline, a shortest-path method is also run to the same goal.
We tracked the path taken by the robot and the path taken by a
simulated pedestrian. We also tracked the shortest path that the robot
could take within the confines of the sidewalk.
There are now three trajectories: the real pedestrian trajectory, the proposed algorithm's trajectory, and the shortest-path trajectory. The paper applies an independent samples t-test and related statistics to compare how similar the two robot trajectories are to the human trajectory, showing that the proposed algorithm better matches human behavior. MC: credit where due, this is a well-grounded comparison.
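As a numerical aside, a Welch-style independent-samples t-statistic can be computed directly from per-run deviations of each planner's path from the human path. The deviation samples below are made up for illustration; they are not the paper's data.

```python
import numpy as np

def welch_t(a, b):
    """Welch's independent-samples t-statistic (unequal variances)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return float((a.mean() - b.mean()) / np.sqrt(va + vb))

# Invented per-run deviations (meters) from the human trajectory:
ours = [0.21, 0.25, 0.19, 0.23, 0.22]
shortest = [0.61, 0.58, 0.66, 0.63, 0.60]
t = welch_t(ours, shortest)  # strongly negative: "ours" deviates less
```

A large-magnitude t indicates the two planners' deviations differ far more than sampling noise would explain.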
HARDWARE DEMONSTRATION
Hardware Setup
The robot configuration is as follows:
We use the PowerBot from Omron Adept Mobile Robots as our
differential drive mobile base. The robot is equipped with multiple
sensors: a Velodyne VLP-16 3D LiDAR sensor; a SICK LMS-200 2D laser
sensor; a RealSense RGB-D sensor, and GPS and IMU sensors. Our
PowerBot’s max speed is 0.8 m/s, which limits its ability to follow faster pedestrian groups.
Areas for improvement:
- For the group surfing component, one main area for improvement is in the selection process of groups to imitate.
  - Criteria: group velocity; group trajectory; group size.
  - External observers of the group surfing behaviour will be interviewed to gauge if the imitation behaviour is socially acceptable.
- For collision avoidance, a more specialized technique would allow for more efficient navigation.
  - We hope to decouple static collision avoidance from dynamic collision avoidance.
- For curb following, our approach only works for sidewalks that border the street directly, ignoring the common tree belt, median, hellstrip, etc. Our future plan is to introduce detection and recognition of these non-traversable areas and incorporate them into our navigation module.
Background
Recently, I updated my Ubuntu version from 16.04 to 18.04, and I'd like to share my experience setting up a working or study environment.
This blog mainly introduces how to install Sogou pinyin in
non-Chinese versions of Ubuntu.
How to install Sogou pinyin
Firstly, you should install fcitx (e.g. via `sudo apt install fcitx`), a lightweight input method framework aimed at providing environment-independent language support for Linux.
Recall that: > remove - remove is identical to install except that
packages are removed instead of installed. Note that removing a package
leaves its configuration files on the system. If a plus sign is appended
to the package name (with no intervening space), the identified package
will be installed instead of removed.
purge - purge is identical to remove except that packages are removed
and purged (any configuration files are deleted too). This of course,
does not apply to packages that hold configuration files inside the
user's home folder.
What does sudo apt autoremove actually do? Whenever you install an application (using apt-get), the system also installs the software that the application depends on. It is common in Ubuntu/Linux for applications to share the same libraries. When you remove the application, the dependencies stay on your system. So apt-get autoremove removes those dependencies that were installed with applications and are no longer used by anything else on the system.
You can subsequently open Language Support to double-check that Keyboard input method system has changed from ibus to fcitx, and you can change it manually if it has not. You can ignore the reminder that "The language support is not installed completely" and click the "Remind Me Later" button. Importantly, you must reboot your system after that.
After that, you can download the deb package directly from the official website, 搜狗输入法 for Linux, or here (also an official link).
You can install it in Ubuntu Software by double-clicking the package, as shown below.
Importantly, you must
reboot your system again.
You can open the fcitx config tool after rebooting, as shown below. You should see Sogou Pinyin in the list; if not, you can add it by clicking the + mark, choosing Sogou Pinyin, and rebooting the system again.
Now you should be able to input Chinese characters. In other words, you can use it as conveniently as on Windows, for instance with the Shuangpin input method. You can learn more about Shuangpin in 双拼学习.
How
to install rtags for vim in Ubuntu 18.04 / 如何在 Ubuntu 18.04
vim 上安装 rtags 插件
This blog introduces how to install the best cross-reference tool, rtags, that I have ever
used in vim, step by step. I hope it helps.
What's rtags
Rtags is a client/server application that indexes C/C++ code and keeps a persistent file-based database of references, declarations, definitions, symbol names, etc. It allows you to find symbols by name (including nested class and namespace scope). Most importantly, it gives you proper follow-symbol and find-references support.
Rtags comes with emacs support but there are projects supporting
other IDEs: vim-rtags
and sublime-rtags.
In this blog, we will install vim-rtags later.
How to install rtags
First, you need clang (e.g. via `sudo apt install clang`), a compiler front end for C, C++, Objective-C, and other languages. It uses the LLVM compiler infrastructure as its back end and has been part of the LLVM release cycle since LLVM 2.6.
Secondly, build and install rtags from source:

```
# Commands in an Ubuntu terminal
git clone --recursive https://github.com/Andersbakken/rtags.git
cd rtags                                   # in the rtags directory
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=1 .
make
sudo make install
```
Thirdly, you should install vim-rtags in Vundle.vim. I think
it's the easiest way to install rtags plugin in vim. You need to add the
following line to .vimrc if you have installed Vundle, and then run
:PluginInstall in vim.
```
Plugin 'lyuts/vim-rtags'   " install in vim
```
Finally, the last but essential step is forcing cmake to output compile_commands.json (via -DCMAKE_EXPORT_COMPILE_COMMANDS=1) and feeding it to rtags for your project.
```
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=1 [***] your_src_path
rc -J your_build_folder_path   # use compile_commands.json in your build folder
rdm &                          # launch the rtags server
```
You should test your configuration at:
https://www.ssllabs.com/ssltest/analyze.html?d=your.site
Q: Renewal fails with the following error:
> Attempting to renew cert (www.xx.net) from /etc/letsencrypt/renewal/www.xx.net.conf produced an unexpected error: 'ascii' codec can't decode byte 0xe8 in position 57: ordinal not in range(128). Skipping.
In this paper, a navigation method for a small hopping rover with mobility advantages is discussed, considering uncertainties caused by the jumping behavior and measurement error.
As in conventional path planning, the paper first extracts obstacles from environmental data and builds a triangular mesh (triangular polygons) of the environment, then plans a safe path with the A* algorithm. The algorithm pays particular attention to collision risk with obstacles, roughness of terrain, and failures of the hopping action.
By extracting obstacles from environmental data and constructing triangular polygons, it is possible to form paths. The algorithm considers the risk of collision with obstacles, the roughness of terrain, and failures of the hopping action, and can then generate a safer path based on the A* algorithm.
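The planning step can be sketched as plain A* over a graph whose edge costs already fold in the risk terms above; the toy graph, the cost values, and the zero heuristic here are all invented, and the paper's triangular mesh is not reproduced.

```python
import heapq
from itertools import count

def a_star(graph, h, start, goal):
    """graph: {node: [(neighbor, edge_cost), ...]}; h: admissible heuristic.

    Edge costs are assumed to mix path length with risk terms (collision
    risk, terrain roughness, hop-failure risk).
    """
    tie = count()  # tiebreaker so heap never compares nodes/parents
    frontier = [(h(start), next(tie), 0.0, start, None)]
    came_from, g_cost = {}, {start: 0.0}
    while frontier:
        _, _, g, node, parent = heapq.heappop(frontier)
        if node in came_from:
            continue                      # already expanded via a cheaper path
        came_from[node] = parent
        if node == goal:                  # reconstruct the path backwards
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1], g
        for nbr, cost in graph.get(node, []):
            ng = g + cost
            if ng < g_cost.get(nbr, float("inf")):
                g_cost[nbr] = ng
                heapq.heappush(frontier, (ng + h(nbr), next(tie), ng, nbr, node))
    return None, float("inf")

# Toy mesh: the direct hop A->C is shorter but riskier, so it costs more.
graph = {"A": [("B", 1.0), ("C", 3.0)], "B": [("C", 1.0)], "C": []}
path, cost = a_star(graph, lambda n: 0.0, "A", "C")
```

With risk folded into edge weights, the "safer" detour A→B→C wins over the risky direct hop.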
The authors propose a new approach: multiple light and compact exploration robot agents work together, jointly forming one larger exploration system.
One possibility is the introduction of light and compact exploration robot agents, with multiple types of agents working together in one system. Various roles (functions) can be assigned to various kinds of equipment, and together they constitute one exploration system.
By allocating the same function (equipment) to several or many agents, the system can tolerate the loss of some fraction of them; risk is thus distributed across the system and the mission, with a good chance of higher overall efficiency.
Multi Exploration
Although a small rover has many advantages owing to its size and weight, its traversability and measurement ability are limited.
However, its size causes problems on its traversability and
measurement ability.
Two types of rovers are considered in the exploration system. One is a ground-based agent, assigned a stochastic existence region within the search area, which contributes to surveying the ground surface. The other is a hopping rover, which plans paths by exploiting sensing from high places while moving through the exploration area together with the ground rover.
Compared with a wheeled robot, in a low gravitational environment the hopping rover can jump over obstacles and thus adopt a short-cut path.
Especially in a low gravitational environment, such as another planet or a satellite, it offers higher performance: it can jump long distances over terrain and obstacles, adopt a short-cut path without the detours of a wheeled rover, and also measure the environment from a higher position along its jumping trajectory.
For a hopping rover, although many jumping hardware designs have been studied, its software, e.g. navigation algorithms, has hardly been discussed. As a result, no navigation method has yet been established that takes advantage of hopping mobility, such as jumping over obstacles or across long distances.
In this paper, a navigation method for a small size hopping rover
with advantages on its mobility is discussed with some risk
considerations on its mobility and measured data.
PATH PLANNING FOR HOPPING
MOBILITY
Selection of Jumping
Target Position
The uncertainty factors of the hopping rover's jumping motion are initial speed variation, jump distance, jumping direction, bounce after landing, and failure of the leap.
Because of the uncertainty in the jumping motion, a target landing point must be selected carefully.
How to model the environment: each obstacle captured by sensing is connected, and the observation area is divided into triangles.
The error is generally modeled as a normal distribution applied to the initial speed, the jumping angle, and the direction angle, respectively.
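A Monte Carlo sketch of how those normal errors spread the landing point: perturb the initial speed, the jumping (elevation) angle, and the direction angle, then propagate simple flat-ground ballistics. The gravity value, nominal launch parameters, and sigmas below are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

G = 1.62  # m/s^2, e.g. lunar gravity (an assumed value)

def sample_landings(v0, elev, azim, sig_v, sig_e, sig_a, n=1000):
    """Sample landing points (x, y) under Gaussian launch errors."""
    v = rng.normal(v0, sig_v, n)     # initial speed error
    e = rng.normal(elev, sig_e, n)   # jumping (elevation) angle error
    a = rng.normal(azim, sig_a, n)   # direction (azimuth) angle error
    r = v**2 * np.sin(2 * e) / G     # ballistic range on flat ground
    return np.column_stack([r * np.cos(a), r * np.sin(a)])

pts = sample_landings(v0=2.0, elev=np.radians(45), azim=0.0,
                      sig_v=0.1, sig_e=np.radians(2), sig_a=np.radians(3))
```

The resulting scatter around the nominal landing point is exactly the dispersion a landing-point selector has to budget for.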