Common Accelerator Framework Warpdrive Update - BKK19-401
Zhangfei Gao, Linaro
2019.04
Agenda
● Background
● Target: Provide Accelerator
● Investigation
● Warpdrive
● Performance
Background
1. More and more hardware accelerators, such as compressors/decompressors, encryptors/decryptors, and AI engines, are being introduced to the market. Most of them need to be used from user space, so we need software infrastructure to support these applications.
2. This is especially important for ARM-based solutions, because ARM is well suited to domain-specific customization.
General platform
[Diagram: CPUs with MMU, memory, IOMMU, and Root Complex, with three classes of device: non-SVA-capable devices (e.g. legacy devices), discrete devices (e.g. PCIe-attached devices), and integrated devices (e.g. processor accelerators).]
Target
[Diagram labels: Kunpeng920; zip/gzip; HiAccQM; on-chip PCIe interface; accelerator engine.]
● Applications call the accelerator engine directly.
● 1024 queues, supporting up to 1024 processes.
● SR-IOV: supports a PF and 63 VFs.
● Accelerators: zip/gzip, hpre, SM3/4, sec, poe
Investigation #1: no-iommu
[Diagram: CPU and device share host/physical memory. On the CPU side, VAs are translated to PAs through the CPU page tables (dma_alloc_coherent, dma_mmap_coherent); the device side has no IOMMU.]
Limitations:
a. Contiguous memory is required.
b. Reserved memory (CMA) may be required.
Investigation #2: iommu
[Diagram: CPU and device share host/physical memory. The CPU translates VA to PA through the CPU page tables; the device translates IOVA to PA through the IOMMU page tables.]
Limitations:
a. IOMMU_DOMAIN_UNMANAGED mode has to be used to avoid IOVA conflicts with vfio, so the DMA API cannot be used.
b. Multi-process is not supported.
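The contiguous-memory limitation of investigation #1, and how an IOMMU relaxes it, can be illustrated with a toy model. This is only a sketch: physical memory is reduced to numbered page frames, and `find_contiguous`, `map_scattered`, and the frame numbers are hypothetical names and values, not any kernel API.

```python
# Toy model: physical memory as numbered page frames; all names are hypothetical.

def find_contiguous(free_frames, n):
    """no-iommu case: return the first run of n physically contiguous free frames, or None."""
    run = []
    for frame in sorted(free_frames):
        if run and frame == run[-1] + 1:
            run.append(frame)       # extends the current contiguous run
        else:
            run = [frame]           # gap found, start a new run
        if len(run) == n:
            return run
    return None


def map_scattered(free_frames, n, iova_base=0x1000):
    """iommu case: map any n free frames to a contiguous range of IOVA page numbers."""
    frames = sorted(free_frames)[:n]
    if len(frames) < n:
        return None
    return {iova_base + i: frame for i, frame in enumerate(frames)}


# Fragmented memory: four free frames, but no two are adjacent.
free = {3, 7, 12, 20}
print(find_contiguous(free, 2))     # no contiguous pair exists
print(map_scattered(free, 2))       # scattered frames, contiguous IOVA range
```

In the sketch, the device without an IOMMU cannot get a two-page buffer from fragmented memory, while the IOMMU case presents scattered frames to the device as one contiguous IOVA range.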
Investigation #3: iommu SVA
[Diagram: CPU and device share host/physical memory and use the same VA. With SVA, the IOMMU translates VA to PA through the CPU page tables.]
Pros:
a. The IOMMU uses the CPU page tables directly, so user-space addresses can be recognized by the kernel.
b. A malloc buffer can be used for kernel DMA, thanks to device page faults.
c. Multi-process is supported through PASID.
d. The kernel dma_api can still be used, in which case cd=0 is used.
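Points a to c can be sketched with a toy model in which each process owns one page table that both the CPU and the device use, selected by PASID, and in which a device access to a not-yet-backed page is resolved by a fault. The classes `Process` and `ToySvaIommu` and all their fields are hypothetical and only illustrate the idea; they do not mirror the kernel SVA interface.

```python
# Toy model of SVA; class and method names are hypothetical, not a kernel API.

class Process:
    def __init__(self, pasid):
        self.pasid = pasid
        self.page_table = {}    # virtual page -> physical frame, shared by CPU and device
        self.reserved = set()   # pages handed out by malloc but not yet backed


class ToySvaIommu:
    def __init__(self):
        self.bound = {}         # pasid -> Process
        self.next_frame = 0
        self.faults = 0

    def bind(self, proc):
        """Bind a process address space so the device can reach it via its PASID."""
        self.bound[proc.pasid] = proc

    def device_access(self, pasid, vpage):
        """Translate a device access using the process's own page table."""
        proc = self.bound[pasid]
        if vpage not in proc.page_table:
            if vpage not in proc.reserved:
                raise MemoryError("invalid address")
            # Device page fault: back the page on demand, then complete the access.
            self.faults += 1
            proc.page_table[vpage] = self.next_frame
            self.next_frame += 1
        return proc.page_table[vpage]


iommu = ToySvaIommu()
proc_a, proc_b = Process(pasid=1), Process(pasid=2)
iommu.bind(proc_a)
iommu.bind(proc_b)
proc_a.reserved.add(0x40)       # "malloc" in process A
proc_b.reserved.add(0x40)       # same virtual page in process B
frame_a = iommu.device_access(1, 0x40)
frame_b = iommu.device_access(2, 0x40)
```

The same virtual page resolves to different frames for the two PASIDs, and no buffer has to be pinned or mapped before the device touches it.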
Shared Virtual Address (SVA)
[Diagram: the STE points to a CD table. With ssv = 0 the kernel io-pgtable is used; with ssv = 1 & ssid = x the process io-pgtable is used.]
● DMA accesses to the control queue are performed with ssv = 0.
● DMA accesses to the data queue are performed with ssv = 1 & ssid = x (pasid).
Shared Virtual Address (SVA)
[Diagram: the SID selects an STE in the stream tables, the SSID selects a CD in the context (CD) table, and the page table entries (PTE) translate IOVA to IPA.]
SVA native enabling on ARM platform (Jean-Philippe Brucker, ARM): https://lkml.org/lkml/2019/2/20/518
SID: stream ID, identifies a device
SSID: substream ID, identifies an address space (pasid)
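The lookup chain from the two SVA slides (SID to STE, SSID to CD, CD to page table) can be sketched as nested dictionaries. The table contents, the SID value `0x2a`, the SSID value `5`, and the `translate` helper are all made up for illustration; real stream and context tables are hardware-defined structures.

```python
# Toy SMMU-style lookup; the table contents below are made up for illustration.

KERNEL_PGTABLE = {0x10: 0x900}    # IOVA page -> output page (kernel io-pgtable)
PROCESS_PGTABLE = {0x10: 0x500}   # same IOVA page, different address space

stream_table = {
    # SID -> STE; each STE points to a CD table indexed by SSID.
    0x2a: {"cd_table": {0: KERNEL_PGTABLE, 5: PROCESS_PGTABLE}},
}


def translate(sid, iova_page, ssv=0, ssid=0):
    """Walk SID -> STE -> CD -> page table. With ssv=0 the SSID is ignored and CD 0 is used."""
    ste = stream_table[sid]
    cd_index = ssid if ssv else 0
    pgtable = ste["cd_table"][cd_index]
    return pgtable[iova_page]


control = translate(0x2a, 0x10)                 # control queue: ssv=0, kernel io-pgtable
data = translate(0x2a, 0x10, ssv=1, ssid=5)     # data queue: ssv=1, process io-pgtable
```

The same device (one SID) reaches two different address spaces depending on whether the access carries a substream ID.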
Warpdrive
1. Accelerator framework for user-space applications, proposed by Kenneth Lee from HiSilicon ([email protected]).
2. Includes kernel (uacce) and user (warpdrive lib) facilities.
3. Based on the IOMMU; protects the kernel and other applications by bounding the hardware access range.
4. In particular, uses the iommu-sva feature to maintain a unified address space between the process and the hardware context.
5. Multi-process and multi-queue support.
6. Compatibility (no-iommu, non-SVA-capable, SVA-capable).
7. SR-IOV support; vfio-pci for virtual machines.
Warpdrive
[Diagram labels: app, zip app; library; uacce (sys interface, chrdev interface, get/put_queue, register, helper); crypto; qm; SM3/4, hpre, zip, SEC, POE.]
Warpdrive kernel
1. Current status
   a. RFC v3 was sent by Kenneth, directly using the iommu interface: https://lkml.org/lkml/2018/11/12/1951
   b. Supports no-iommu, non-SVA-capable, and SVA-capable devices.
   c. Jean-Philippe Brucker's SVA v4 patch set has been verified, using platform device stall mode.
      i. https://lkml.org/lkml/2019/2/20/518
      ii. git://linux-arm.org/linux-jpb.git sva/current
   d. Supports zip/gzip, hpre.
2. Plan
   a. Support in progress: SM3/4, SEC, POE
Warpdrive user
1. Current status
   a. Supports zip/gzip, hpre.
   b. Supports multi-queue.
   c. Supports async mode and batch processing.
2. Plan
   a. OpenSSL interface.
   b. Provide patches for zlib compatibility, falling back to the built-in zlib if no hardware is found.
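The planned zlib fallback could look roughly like the following. This is a minimal sketch of the pattern only: `hw_compress` is a hypothetical stand-in for an accelerator path (here it always reports that no hardware is present), and the real warpdrive library interface is not shown in the slides.

```python
import zlib


def hw_compress(data):
    """Hypothetical hardware path; this stand-in always reports a missing accelerator."""
    raise OSError("no accelerator found")


def compress(data, hw=hw_compress):
    """Try the accelerator first and fall back to the built-in zlib."""
    try:
        return hw(data)
    except OSError:
        return zlib.compress(data)


payload = b"warpdrive " * 100
compressed = compress(payload)          # falls back to software zlib here
restored = zlib.decompress(compressed)
```

Keeping the hardware path behind the same function signature means callers do not need to know which path produced the output.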
Challenge
1. SVA patches are still in review: https://lkml.org/lkml/2019/2/20/518
   a. SVA only shares stage-1 page tables with the CPU; sharing stage-2 is not supported yet.
   b. SVA supports the PF, but VFs are not yet considered.
   c. Platform devices using the stall feature need quirks.
Performance

|             | cpu    | no-iommu | iommu: non-sva | iommu: sva | iommu: sva(2q) |
|-------------|--------|----------|----------------|------------|----------------|
| real(s)     | 1.91s  | 0.01s    | 0.01s          | 0.02s      | 0.01s          |
| user(s)     | 1.91s  | 0.00s    | 0.00s          | 0.00s      | 0.00s          |
| sys(s)      | 0.00s  | 0.01s    | 0.01s          | 0.01s      | 0.01s          |
| speed(M/s)  | 12.565 | 2400     | 2400           | 1200       | 2400           |

cmd: time gzip <data.super> del
     time ./test_hisi_zip -g <data.super> dst
size: data.super 24M
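The speed row appears to be the file size divided by the real time. A short check under that assumption (the `speed` helper and the dictionary layout are illustrative, not part of the test tool):

```python
SIZE_MB = 24  # data.super is 24M

# "real" times from the table above, in seconds
real_seconds = {
    "cpu": 1.91,
    "no-iommu": 0.01,
    "iommu: non-sva": 0.01,
    "iommu: sva": 0.02,
    "iommu: sva(2q)": 0.01,
}


def speed(size_mb, seconds):
    """Throughput in M/s, computed as size divided by elapsed real time."""
    return size_mb / seconds


for name, seconds in real_seconds.items():
    print(f"{name:15s} {speed(SIZE_MB, seconds):10.3f} M/s")
```

Rounded, this gives the table's 12.565, 2400, and 1200 M/s. Because the timings are reported with 0.01 s resolution, the accelerator figures are coarse: a one-tick difference halves or doubles the computed speed.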
Welcome to join
Kernel: https://github.com/Kenneth-Lee/linux-kernel-warpdrive.git
User: https://github.com/Kenneth-Lee/warpdrive.git