Paper Review: Planning and Decision-Making for Autonomous Vehicles

[TOC]
The decision-making algorithms in use today, based on state machines or partially folded into an optimization cost function, will in my view soon hit a bottleneck, so I looked into some decision-making algorithms at the research frontier and summarize them in this post. Paper link

Purpose and Summary

Figure1_schema.png

The scope of the paper's discussion is shown in the figure above.
What interests me most are the three paradigms that map from the environment directly to ego-vehicle behavior:

  1. Sequential planning: the traditional pipeline, with clear-cut module boundaries and almost no learning-based components; decisions rely mainly on state machines or are partially folded into an optimization cost function.
  2. Behavior-aware planning: decision-making and planning are fused, using approaches closer to machine learning such as MDPs, game theory, POMDPs, and reinforcement learning.
  3. End-to-end planning: one shot, skipping the intermediate stages and learning ego-vehicle behavior directly from the sensors.

Throughout this post, numbers in parentheses are reference citations and match the numbering in the original paper.

Abstract

This review covers three topics:

  1. Traditional and machine-learning-based planning and control methods, with the focus on the latter:
    emphasize recent approaches for integrated perception and planning and for behavior-aware planning (many of which rely on machine learning)
  2. A brief touch on verification of the algorithms (personally not my cup of tea):
    touch upon the question of verification and safety.
  3. A short overview of algorithms for operating intelligent vehicle fleets, especially ride-pooling:
    discuss the state of the art and remaining challenges for managing fleets of autonomous vehicles.

1. INTRODUCTION

Questions the paper raises about planning and control for autonomous driving:

  • How to decide?
    how the vehicles decide where to go next
  • How to use sensor data over short and long horizons?
    how vehicles use the data provided by their sensors to make decisions with short and long time horizons
  • How to interact with other traffic participants?
    how the interaction with other vehicles affects what to do
  • How to learn from past experience and from human driving?
    how vehicles can learn how to drive from their history and from human driving
  • How to ensure safety?
    how to ensure that the vehicle control and planning systems are correct and safe
  • How to coordinate shared mobility?
    how to ensure that multiple vehicles on the road at the same time coordinate and are managed to move people and packages to their destinations in the most effective way

2. MOTION PLANNING AND CONTROL

This section covers classical planning and control.

2.1. Vehicle Dynamics and Control

Low speed

  • PID
  • feedback linearization (14)
  • MPC

High speed

  • A full vehicle dynamics model, including tire mechanics:
    full dynamic model of the vehicle, tire forces (15–17)
  • Nonlinear control, MPC, and feedback–feedforward control:
    nonlinear control (18), model predictive control (19), or feedback–feedforward control (20)
  • Optimization-based and learning-based system identification:
    optimization-based and learning-based techniques for system identification (21)
  • Online system identification for time-varying systems:
    online model identification and lifelong system identification (22)
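
To make the low-speed/high-speed split concrete: the kinematic bicycle model ignores tire slip, which is exactly why simple controllers such as PID suffice at low speed, while the high-speed items above need full tire dynamics. Below is a minimal sketch, not from the paper; the wheelbase, gains, and speed are illustrative assumptions.

```python
import math

class PID:
    """Textbook PID controller; the gains below are illustrative, not tuned."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def kinematic_bicycle_step(x, y, yaw, v, steer, wheelbase=2.7, dt=0.05):
    """One Euler step of the kinematic bicycle model (no tire slip)."""
    x += v * math.cos(yaw) * dt
    y += v * math.sin(yaw) * dt
    yaw += v / wheelbase * math.tan(steer) * dt
    return x, y, yaw

# Track a constant target heading with PID steering at low speed (3 m/s).
pid = PID(kp=1.5, ki=0.0, kd=0.2, dt=0.05)
x, y, yaw, v = 0.0, 0.0, 0.0, 3.0
target_yaw = math.radians(20)
for _ in range(200):
    steer = max(-0.5, min(0.5, pid.step(target_yaw - yaw)))  # steering limit [rad]
    x, y, yaw = kinematic_bicycle_step(x, y, yaw, v, steer)
print(f"final heading: {math.degrees(yaw):.1f} deg")
```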

2.2. Parallel Autonomy

This topic was the subject of my master's thesis. Such fond memories.

Autonomy takes three forms:

  1. Series autonomy: the vehicle drives itself in specific scenarios, the human drives otherwise
    series autonomy, in which the human orders the vehicle to execute a function, which is similar to most self-driving approaches to date;
  2. Interleaved autonomy: human and machine take turns at the controls
    interleaved autonomy, in which the human driver and the autonomous system take turns operating the vehicle;
  3. Parallel autonomy (shared control): the autonomous system acts as a guardian
    parallel autonomy (also referred to as shared control), in which the autonomous system functions as a guardian angel

Research on the third form includes:

  1. Linear fusion of human and machine inputs: Anderson et al. (23)
    linear combination of human input with the output of a safety system; the blending ratio varies with the threat level: the human input was combined with a computed trajectory based on the severity of the threat.

  2. Assistance that stays close to the driver's intent: Alonso-Mora et al. (25)
    minimize the deviation of the autonomous system's plan from the driver's intent.

Problem: these approaches are limited to a single-step look-ahead.

Solution 1: Shia et al. (26) predict the human's input over multiple steps and
minimize the difference in steering-wheel angle from the predicted human control input.

Solution 2: nonlinear optimizers (28)
optimize simultaneously over steering angle and velocity or throttle input to achieve minimal intervention.
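
To illustrate the Anderson et al. (23) idea of blending human input with a safety system's output according to threat severity, here is a rough sketch. The linear blending law and the time-to-collision-based severity mapping are my own illustrative assumptions, not the authors' exact formulation.

```python
def blend_control(u_human, u_safe, severity):
    """Linear arbitration between the human's input and the safety system's
    command: severity 0 gives the human full authority, severity 1 gives
    the safety system full authority."""
    k = max(0.0, min(1.0, severity))
    return (1.0 - k) * u_human + k * u_safe

def severity_from_ttc(ttc, ttc_critical=1.0, ttc_safe=4.0):
    """Assumed threat metric: map time-to-collision (s) to severity in [0, 1]."""
    if ttc >= ttc_safe:
        return 0.0
    if ttc <= ttc_critical:
        return 1.0
    return (ttc_safe - ttc) / (ttc_safe - ttc_critical)

# Human steers slightly right while the safety system wants to evade left.
steer = blend_control(u_human=0.10, u_safe=-0.30, severity=severity_from_ttc(2.0))
print(f"blended steering command: {steer:.3f}")
```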

2.3. Motion Planning for Autonomous Vehicles

For classical planning, see recent reviews (29, 30).
Three broad families (the paper's authors seem rather dismissive of classical algorithms; A*-style search methods are not even summarized…):

  1. Lattice planners: simple, and well suited to highways
    input space discretization with collision checking, such as lattice planners (e.g., 31, 32) or road-aligned primitives (e.g., 33), whose main advantage is their simplicity and effectiveness, especially in highway scenarios.
  2. RRT (a minimal sketch follows this list)
    randomized planning, such as rapidly exploring random trees (RRT) (e.g., 34, 35).
  3. Optimization: if the problem is non-convex, the optimum is hard to find
    constrained optimization and receding-horizon control (e.g., 19, 36).
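
The RRT family from item 2 fits in a few dozen lines. A minimal 2D sketch under strong simplifying assumptions: a point robot, circular obstacles, straight-line steering, and a naive collision check on new nodes only.

```python
import math
import random

def rrt(start, goal, obstacles, x_max=50.0, y_max=50.0,
        step=2.0, goal_tol=2.0, max_iters=5000):
    """Minimal 2D RRT for a point robot among circular obstacles (cx, cy, r)."""
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample a random point, with a small goal bias.
        sample = goal if random.random() < 0.05 else (
            random.uniform(0, x_max), random.uniform(0, y_max))
        # Find the nearest tree node and steer one step toward the sample.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        theta = math.atan2(sample[1] - near[1], sample[0] - near[0])
        new = (near[0] + step * math.cos(theta), near[1] + step * math.sin(theta))
        if any(math.dist(new, (cx, cy)) <= r for cx, cy, r in obstacles):
            continue  # new node collides; discard it
        parent[len(nodes)] = i_near
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:  # goal reached: walk back up the tree
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None  # no path found within the iteration budget

path = rrt(start=(1.0, 1.0), goal=(45.0, 45.0), obstacles=[(25.0, 25.0, 8.0)])
print(f"path with {len(path)} nodes" if path else "no path found")
```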

How to handle the fact that, among a large set of rules, some occasionally must be violated?

  1. Tune the cost function.
    Kuwata et al. (38) computed a cost map of the drivable space of the car and employed the RRT approach to find the path with the lowest cost.
  2. Use logic functions (state machines).
    Specify the rules as logic functions and utilize automatic control synthesis. Tumova et al. (39) described a method to synthesize the motion that violates only the lowest-priority rule.
  3. Syntactically co-safe linear temporal logic (scLTL).
    Vasile et al. (40) specified the desired behavior of the vehicle and employed an RRT*-based motion planner to obtain a provably minimum-violation trajectory for a single-vehicle, single-trip scenario. (A toy weighted-violation sketch follows this list.)
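
The minimum-violation idea can be caricatured as a weighted cost over candidate trajectories, where a much larger weight on a higher-priority rule approximates lexicographic ordering. The rules, weights, and trajectory flags below are illustrative assumptions; this is not the scLTL machinery of Vasile et al. (40).

```python
# Hypothetical rules ordered by priority; a much larger weight on a
# higher-priority rule approximates lexicographic minimum violation.
RULES = [
    ("no_collision",     1e6, lambda t: t["collides"]),
    ("keep_lane",        1e3, lambda t: t["crosses_solid_line"]),
    ("obey_speed_limit", 1e1, lambda t: t["over_speed_limit"]),
]

def violation_cost(traj):
    """Sum the weights of all rules the candidate trajectory violates."""
    return sum(w for _, w, violated in RULES if violated(traj))

candidates = [
    {"name": "stay in lane", "collides": True,
     "crosses_solid_line": False, "over_speed_limit": False},
    {"name": "swerve", "collides": False,
     "crosses_solid_line": True, "over_speed_limit": False},
]
best = min(candidates, key=violation_cost)
print(best["name"])  # "swerve": violating lane keeping beats colliding
```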

3. INTEGRATED PERCEPTION AND PLANNING

The third planning paradigm: perception and planning fused together.

3.1. From Classical Perception to Current Challenges in Neural Network–Based Perception Systems

For earlier work, see the survey (41).
Almost everything below is the state of the art in perception.

  • The classical approach is manually designed features (a short ORB feature-matching sketch follows this list):
    Classical perception systems extract information in the form of manually designed features from raw sensory data.
    Examples:

    1. SIFT (Scale-Invariant Feature Transform) (44, 45)
    2. BRISK (Binary Robust Invariant Scalable Keypoints) (46)
    3. SURF (Speeded-Up Robust Features) (47, 48)
    4. ORB (Oriented FAST and Rotated BRIEF) (49, 50)
    5. purely on vision: ORB-SLAM2 (50)
    6. purely on vision: SVO (Semidirect Visual Odometry) 2.0 (52)
    7. purely on vision: LSD-SLAM (Large-Scale Direct Monocular SLAM) (53)
    8. combination of vision and lidar: (51)
  • deep neural network architectures

  1. Bar Hillel et al. (54): a survey on road and lane detection
  2. Faster R-CNN (Faster Regional Convolutional Neural Network) (56)
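
As promised above, a short sketch of the classical hand-designed-feature pipeline: ORB detection and matching with OpenCV (assuming opencv-python is installed; the image paths are placeholders).

```python
import cv2

# Classical hand-designed-feature pipeline in a few lines: detect ORB
# keypoints/descriptors on two frames and match them.
img1 = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance is the appropriate metric for ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matches; best distance = {matches[0].distance}")
```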

Problem 1: processing high-resolution images in real time.
Solutions:

  1. ENet (Efficient Neural Network) (59) achieved a 13-ms runtime on 1,024 × 2,048-pixel images with 58% mIoU (mean intersection over union) on the Cityscapes data set (43)
  2. ICNet (Image Cascade Network) (60) achieved 70% mIoU at 33 ms
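
For reference, mIoU is simply the per-class intersection-over-union averaged over classes. A minimal NumPy sketch:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Per-class IoU = |pred ∩ target| / |pred ∪ target|, averaged over
    classes that appear in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred   = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 2]])
print(f"mIoU = {mean_iou(pred, target, num_classes=3):.3f}")
```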

Problem 2: where does the large amount of training data come from?
Deep neural network architectures rely on large amounts of data.

  1. Real-world data sets: the Cityscapes data set (43)
  2. Synthetic data sets: the SYNTHIA data set (61); Johnson-Roberson et al. (62) used scenes from the game GTA to speed up learning

Problem 3: insufficient feedback of uncertainty.

  1. Monte Carlo dropout sampling (65)
  2. A principled Bayesian framework:
    McAllister et al. (66) estimate and propagate uncertainty from every component throughout the entire system pipeline.
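
Monte Carlo dropout (65) keeps dropout active at test time and treats the spread of repeated stochastic forward passes as a rough epistemic-uncertainty estimate. A minimal PyTorch sketch; the toy network and sample count are arbitrary choices.

```python
import torch
import torch.nn as nn

# Toy regression network; only the Dropout layer matters for this technique.
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.5),
                    nn.Linear(64, 1))

def mc_dropout_predict(model, x, n_samples=50):
    """Keep dropout active at inference time and treat the spread of
    repeated stochastic forward passes as an uncertainty estimate."""
    model.train()  # .train() keeps dropout ON; .eval() would disable it
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

x = torch.randn(1, 10)
mean, std = mc_dropout_predict(net, x)
print(f"prediction {mean.item():.3f} +/- {std.item():.3f}")
```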

3.2. End-to-End Planning

Since my grasp of machine learning still goes no further than Andrew Ng's Machine Learning course, the notes below are mostly excerpts for later reference.

  • Pioneered end-to-end driving:
    ALVINN (Autonomous Land Vehicle in a Neural Network) (70)

  • Behavior reflex approaches:
    Chen et al. (67); learning to avoid off-road obstacles from raw stereo-camera inputs (71).

  • deep convolutional neural network

    1. NVIDIA (72) trained a deep convolutional neural network to map raw images from a front-facing camera directly to steering commands. The training input is deliberately augmented: during training, random shifts and rotations are applied to the original input image and virtual human interventions are simulated to artificially increase the number of training samples. (A schematic sketch of such a network follows this list.)

    2. Bojarski et al. (73) showed that the network was capable of learning features resembling lane markings, road boundaries, and shapes of other vehicles, in an effort to explain the resulting behavior.

    3. Gurghian et al. (74) allowed for a close-up and uncluttered view of the lane by using two laterally mounted down-facing cameras and estimated the position inside the lane in an end-to-end fashion.

    4. Xu et al. (75) used a large-scale driving video data set to train an end-to-end model at a task-based level on continuous driving behaviors

    5. SafeDAgger (76) is a query-efficient extension to DAgger (Dataset Aggregation) (77)

  • In a simulated environment, failure cases can be explored with reinforcement learning:
    Wolf et al. (81) presented an approach for learning to steer a vehicle in a simulation environment using a Deep Q-Network.
    Problem: the gap between simulated and real-world images.
    The gap can be closed (82) by first segmenting the virtual image from the simulator with a segmentation network and then translating it into a realistic-looking image employing a generative network.

  • deep reinforcement learning
    Lillicrap et al. (83) proposed an actor–critic and model-free algorithm that is based on the deterministic policy gradient and relies on deep reinforcement learning.
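
As referenced in item 1 above, the NVIDIA-style end-to-end mapping is, at its core, a convolutional network regressing a single steering scalar from a camera image. A schematic PyTorch sketch; the layer sizes are illustrative and do not reproduce the published PilotNet configuration.

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    """Schematic end-to-end network: front-camera image in, one steering
    scalar out. Layer sizes are illustrative, not the published PilotNet."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(
            nn.Linear(64, 100), nn.ReLU(),
            nn.Linear(100, 1))  # single scalar: the steering command

    def forward(self, img):
        return self.head(self.features(img))

net = SteeringNet()
batch = torch.randn(8, 3, 66, 200)  # 66x200 crops, as in the NVIDIA setup
steer = net(batch)                  # trained with, e.g., MSE against human steering
print(steer.shape)                  # torch.Size([8, 1])
```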

4. BEHAVIOR-AWARE MOTION PLANNING

The second planning paradigm: decision-making fused with planning.
Classical decision-making requires perfect trajectory information for the other traffic participants and then decides with a state machine (as in the DARPA Urban Challenge, e.g., 10, 11, 84). This approach fits only a limited set of scenarios and does not account for interaction:
such pipelines expect a prediction over the future trajectories of other traffic participants.

The algorithms below take interaction into account: interactive and cooperative decision-making.

4.1. Cooperation and Interaction

Without interaction, the freezing-robot problem (85) appears: the robot stalls, not knowing what to do.

The root cause is uncertainty that cannot be resolved by prediction alone; here are four ways to attack the problem:

  1. Build as complete a model of the environment dynamics as possible:
    find a better description of the dynamics of the environment, e.g., partially closed-loop receding-horizon control (86). But even with perfect trajectory prediction, the freezing-robot problem cannot always be prevented (85).

  2. Model cooperation based on a conditional formulation that models how the agents react to the robot's actions (as in 87). The premise is the assumption of full control over the other agents.

  3. Model cooperation via joint distributions:
    joint probability distributions (85) and joint cost distributions (88).

  4. V2X approaches:
    a survey of cooperative planning (90)

4.2. Game-Theoretic Approaches

Differences between game-theoretic and probabilistic approaches:

  1. Maximum likelihood or maximum a posteriori
    In probabilistic approaches, instead of optimizing for lowest cost, the vehicle's controls are expected to follow the rule of maximum likelihood or maximum a posteriori.
  2. An assumption of indirect control over the other vehicle
    Whether through a joint cost/distribution or in a two-player game, the problem is always that each agent maximizes its own expected reward under interaction.

Challenges for game-theoretic approaches:

  • modeling interactions
  • dimensionality: dealing with the increased complexity

Approaches:

  1. Discretize and search exhaustively
    The simplest approach is to discretize the action space by motion primitives and to exhaustively search through all possible options (89).
  2. Tree search instead of brute force
    Build a tree-type structure and apply a search over the tree (91).
  3. Going further: Monte Carlo tree search (92)
  4. To cut computational complexity, assume the lead vehicle dominates and shrink the search space accordingly
    Schwarting & Pascheka (93) assumed that the following vehicles' actions are dominated by their predecessors.
  5. Stackelberg game (again assuming the lead vehicle dominates); a toy leader–follower sketch follows this list
    Li et al. (94) modeled decision-making in autonomous driving as a Stackelberg game: not all vehicles' actions are interdependent with all other vehicles' actions, so complexity grows only linearly with the number of agents.
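
Item 5 can be illustrated with a toy Stackelberg interaction over discretized motion primitives: the leader commits first, the follower best-responds, and the leader optimizes against that response. The action set and payoff table are made up. With one follower per leader rather than full interdependence, adding agents multiplies rather than exponentiates the search.

```python
ACTIONS = ["brake", "keep", "accelerate"]  # discretized motion primitives

def reward(leader_a, follower_a):
    """Made-up joint payoffs (leader_reward, follower_reward); both vehicles
    accelerating into the same gap is penalized as a near-collision."""
    if leader_a == "accelerate" and follower_a == "accelerate":
        return (-10.0, -10.0)
    r = {"brake": 0.0, "keep": 1.0, "accelerate": 2.0}
    return (r[leader_a], r[follower_a])

def stackelberg(actions):
    """Leader commits to the action maximizing its reward, assuming the
    follower observes it and then plays its own best response."""
    best = None
    for la in actions:
        fa = max(actions, key=lambda a: reward(la, a)[1])  # follower best response
        value = reward(la, fa)[0]
        if best is None or value > best[0]:
            best = (value, la, fa)
    return best

value, leader_action, follower_action = stackelberg(ACTIONS)
print(f"leader: {leader_action}, follower: {follower_action}, value: {value}")
```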

4.3. Probabilistic Approaches

  • MDP with a simple Bayes model (covering just one action of the other vehicle: accelerating or decelerating)
    Wei et al. (95) associated social behavior with a simple Bayes model: other vehicles are more likely to yield if decelerating and less likely to yield if accelerating. No reciprocal reward-based model is employed.

  • Intelligent driver model (the other vehicles' accelerations enter a joint cost function; an IDM sketch follows this list)
    The intelligent driver model describes the dynamics of the longitudinal positions and velocities of single vehicles. It incorporates cooperative behavior by including other vehicles' efforts (accelerations) in a joint cost function and therefore achieves a certain level of cooperation.

  • Particle filter to estimate the other vehicles' behavior
    Hoermann et al. (97) used a particle filter to estimate the intelligent driver model's behavior parameters: maximum acceleration, desired acceleration, desired velocity, minimum distance, and desired time gap.

  • Probabilistic graphical model
    Dong et al. (98) predict other vehicles' intentions.

  • Interacting Gaussian processes (85)
    Each agent's trajectory is modeled via a Gaussian process. Individual Gaussian processes are coupled through an interaction potential that models cooperation between different agents' trajectories. Terms for affordance, for progress, and to penalize close distances to other agents can also be included in their joint cost function (88).
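
The intelligent driver model referenced above has a closed-form acceleration law. Below is the standard formulation, with parameter names matching those estimated by Hoermann et al. (97); the values are illustrative defaults, not fitted ones.

```python
import math

def idm_acceleration(v, v_lead, gap,
                     a_max=1.5,  # maximum acceleration [m/s^2]
                     b=2.0,      # comfortable deceleration [m/s^2]
                     v0=30.0,    # desired velocity [m/s]
                     s0=2.0,     # minimum distance [m]
                     T=1.5):     # desired time gap [s]
    """Standard intelligent-driver-model (IDM) acceleration law."""
    dv = v - v_lead  # closing speed
    s_star = s0 + v * T + v * dv / (2 * math.sqrt(a_max * b))  # desired gap
    return a_max * (1 - (v / v0) ** 4 - (s_star / gap) ** 2)

# Ego at 25 m/s closing on a lead at 20 m/s, 40 m ahead: IDM commands braking.
print(f"a = {idm_acceleration(v=25.0, v_lead=20.0, gap=40.0):.2f} m/s^2")
```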

4.4. Partially Observable Markov Decision Processes (POMDPs)

Because the other vehicles' intentions are not directly observable, the MDP is upgraded to a POMDP:
the intentions and replanning procedures of the other agents are not directly observable and are encoded in hidden variables.

  • Solving POMDP models offline
    Calculate the best possible action not for the current belief state but rather for every imaginable belief state.
    Solving takes far too long: several minutes to hours.

  • Approximate POMDP solutions (not offline)

    1. (99): planning all vehicles' motions on preplanned paths reduces the dimensionality of the state space; trade-offs between exploration and exploitation.

    2. Use the road context to reduce the dimensionality of the other vehicle's intention, and discretize it
      Liu et al. (100) integrate the road context and the motion intention of another vehicle. A reference vehicle behavior corresponding to the road context is defined, and the other vehicle's reaction is inferred by observing its deviation from the reference behavior.

    3. No interactions, only the current belief state, pruned down to a reachable-state search
      Without interactions, over specific regions of interest (101) instead of the whole set of other vehicles, and only for the current belief state. This is typically done by a look-ahead search in the belief state space that explores only those belief states actually reachable from the current one.

    4. Use the scenario to reduce the dimensionality of action selection
      Ulbrich & Maurer (101) applied a tree-based policy evaluation with planning horizons under 10 s; domain knowledge can be incorporated into the action selection process.

    5. Since the right discretization granularity is hard to choose, go continuous
      Brechtel et al. (103) presented a continuous POMDP with incremental learning of an efficient space representation during value iteration. While reasoning about potentially hidden objects and observation uncertainty, they also consider the interactions of road users.
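
Common to all of these is maintaining a belief over hidden variables such as the other driver's intention. A minimal discrete Bayes-filter sketch; the intention set and observation likelihoods are made up, in the spirit of the yield/accelerate model of Wei et al. (95).

```python
# Hidden intention of another vehicle; the belief is a discrete distribution.
belief = {"yield": 0.5, "not_yield": 0.5}

# Made-up observation model: P(observed longitudinal action | intention).
OBS_LIKELIHOOD = {
    "decelerate": {"yield": 0.8, "not_yield": 0.2},
    "accelerate": {"yield": 0.1, "not_yield": 0.9},
    "keep_speed": {"yield": 0.5, "not_yield": 0.5},
}

def belief_update(belief, observation):
    """Bayes update: posterior is proportional to likelihood times prior."""
    posterior = {i: OBS_LIKELIHOOD[observation][i] * p for i, p in belief.items()}
    z = sum(posterior.values())
    return {i: p / z for i, p in posterior.items()}

for obs in ["decelerate", "decelerate"]:
    belief = belief_update(belief, obs)
print(belief)  # belief mass shifts toward "yield" after repeated decelerations
```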

4.5. Learning-Based Approaches

  • Decouple decision-making and planning
    Vallon et al. (105) trained a support vector machine for lane-change decision-making (a toy sketch closes this section).

  • Gaussian mixture models parameterized by neural networks
    Gaussian mixture models parameterized by neural networks, with features based on the ego vehicle's and the surrounding vehicles' states, past actions, and specifications as well as the road geometry, were trained by Lenz et al. (106).

  • nonparametric prediction architecture (107)

  • Inverse reinforcement learning (IRL)
    The cost function is learned rather than hand-tuned: the reward function is learned via feature-based IRL from expert demonstrations.

    1. Learn the reward function without a set of expert trajectories and predefined labels
      The weights of the reward function can also be found by having a human driver iteratively choose a preferred trajectory from a set of two candidate trajectories (109). This allows the vehicle to learn the reward function without a set of expert trajectories and predefined labels.
    2. Approximate-inference models
      Huang et al. (110): approximate-inference models; humans will not be exact in their IRL inference.
    3. Markov decision processes with an unknown reward function (111)
      Abbeel et al. (112) produced humanlike trajectories in parking lots.
    4. Maximum-entropy IRL
      Ziebart et al. (113) applied the principle of maximum-entropy IRL.
    5. Continuous inverse optimal control with locally optimal examples (120) may be used to handle continuous states and actions (Levine & Koltun, 120).
    6. Risk-sensitive IRL
      Majumdar et al. (121) devised a framework for risk-sensitive IRL.
    7. The maximum-entropy deep IRL framework (122)
    8. Generative adversarial imitation learning (125)
      Kuefler et al. (124) demonstrated the effectiveness of this method.
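
As promised above, a toy lane-change classifier in the spirit of Vallon et al. (105), using scikit-learn. The feature set, the synthetic labeling rule standing in for human demonstrations, and all numbers are invented for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic training data; each row is a hypothetical feature vector:
# [closing speed on lead (m/s), gap to lead (m), free gap in target lane (m)]
rng = np.random.default_rng(0)
X = rng.uniform([-10, 5, 5], [10, 120, 120], size=(500, 3))
# Made-up labeling rule standing in for human demonstrations: change lanes
# when closing fast on a nearby lead and the target-lane gap is large.
y = ((X[:, 0] > 2) & (X[:, 1] < 50) & (X[:, 2] > 40)).astype(int)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)

# Query: closing at 5 m/s, 30 m behind the lead, 80 m free in the target lane.
print("change lane" if clf.predict([[5.0, 30.0, 80.0]])[0] else "keep lane")
```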

5. VERIFICATION AND SYNTHESIS

I did not really understand this part, and it is not of much interest to me either…
For an overview of the literature, see recent studies (126).

Traditional validation: simulation and case-based testing do not provide sufficient guarantees.

Correct-by-construction synthesis (e.g., for ACC): safe controllers can be produced by model-based correct-by-construction synthesis, which has been developed for low-complexity tasks such as adaptive cruise control (127) and control of signalized vehicular networks (128).

The counterpart of correct-by-construction synthesis is formal verification, which examines the complete reachable state space of a model.
Online verification can be achieved with reachability analysis (131).
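
Reachability analysis propagates sets of states rather than single trajectories. A crude sketch for a 1D double integrator with bounded acceleration, over-approximating the reachable set with axis-aligned intervals; real tools (e.g., 131) use far tighter set representations such as zonotopes, and all numbers here are illustrative.

```python
def reachable_intervals(p0, v0, a_bounds, dt=0.1, steps=20):
    """Forward reachable set of a 1D double integrator, over-approximated
    by axis-aligned intervals on position and velocity."""
    (p_lo, p_hi), (v_lo, v_hi) = p0, v0
    a_lo, a_hi = a_bounds
    sets = []
    for _ in range(steps):
        p_lo, p_hi = p_lo + v_lo * dt, p_hi + v_hi * dt
        v_lo, v_hi = v_lo + a_lo * dt, v_hi + a_hi * dt
        sets.append(((p_lo, p_hi), (v_lo, v_hi)))
    return sets

# The ego plan is "verified" (in this toy sense) if its position never falls
# inside the other vehicle's reachable position interval at the same step.
other = reachable_intervals(p0=(30.0, 32.0), v0=(8.0, 10.0), a_bounds=(-3.0, 2.0))
ego_positions = [15.0 * 0.1 * k for k in range(1, 21)]  # ego at constant 15 m/s
safe = all(not (lo <= p <= hi) for p, ((lo, hi), _) in zip(ego_positions, other))
print("plan safe:", safe)
```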

6. FLEET MANAGEMENT

Considering only ride sharing, without pooling requests (i.e., without combining riders headed to different destinations):

  1. fluid approximations (136)
  2. queuing-based formulations (137)

The ride-pooling problem
The challenge lies in strong spatiotemporal coupling, which makes the complexity explode.
It is more related to the vehicle-routing problem and the dynamic pickup and delivery problem (139–141), where spatiotemporally distributed demand must be picked up and delivered within prespecified time windows. A major challenge when addressing this problem is the need to explore a very large decision space while computing solutions fast enough to provide users with the experience of real-time booking and service.

  • Anytime-optimal method (a toy request-assignment sketch follows this list)
    Alonso-Mora et al. (143) introduced an anytime-optimal method for request matching and dynamic vehicle routing in low- and high-capacity vehicles. The method, which consists of three steps (pruning of feasible trip combinations, assignment of trips to vehicles, and fleet rebalancing), showed that large-scale operation of vehicle fleets is possible.

  • Stochastic routing
    A recent review of stochastic routing highlighted state-of-the-art works in this area (146).

  • Congestion-aware routing
    Zhang et al. (149) described a constrained optimization method.
    Levin (150) introduced a fluid-approximation approach that also accounts for vehicle sharing.

  • Accounting for charging-station locations
    Vehicles charge their batteries at a finite number of locations (151).
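
The assignment step of such pipelines can be caricatured as a minimum-cost matching of requests to vehicles. A sketch using SciPy's Hungarian solver with straight-line pickup distances as costs; positions are made up, and the real method in (143) additionally prunes feasible trip combinations and rebalances the fleet.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Made-up 2D positions of idle vehicles and waiting requests.
vehicles = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]])
requests = np.array([[1.0, 1.0], [8.0, 2.0]])

# Cost matrix: straight-line pickup distance from each vehicle to each request.
cost = np.linalg.norm(vehicles[:, None, :] - requests[None, :, :], axis=-1)

# Hungarian algorithm: minimum-total-distance matching; with more vehicles
# than requests, the surplus vehicles simply stay unassigned.
rows, cols = linear_sum_assignment(cost)
for v, r in zip(rows, cols):
    print(f"vehicle {v} -> request {r} (pickup distance {cost[v, r]:.2f})")
```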

Author: cx
Published: 2021-12-08
Updated: 2022-07-16