Discussion:
[PATCH v9 net-next 0/4] load imm64 insn and uapi/linux/bpf.h
Alexei Starovoitov
2014-09-03 03:17:22 UTC
Permalink
Hi,

V8->V9
- rebase on top of Hannes's patch in the same area [1]
- no changes to patches 1 and 2
- added patches 3 and 4 as RFC to address Daniel's concern.
patch 3 moves eBPF instruction macros and
patch 4 split macros which are shared between classic and eBPF
into bpf_common.h
3 and 4 will be used by verifier testsuite and will be reposted later
during stage III

V8 thread with 'why' reasoning and end goal:
https://lkml.org/lkml/2014/8/27/628

Original set [2] of ~28 patches I'm planning to present in 4 stages:

I. this 2 patches to fork off llvm upstreaming
II. bpf syscall with manpage and map implementation
III. bpf program load/unload with verifier testsuite (1st user of
instruction macros from bpf.h and 1st user of load imm64 insn)
IV. tracing, etc

[1] http://patchwork.ozlabs.org/patch/385266/
[2] https://lkml.org/lkml/2014/8/26/859
Alexei Starovoitov
2014-09-03 03:17:23 UTC
Permalink
add BPF_LD_IMM64 instruction to load 64-bit immediate value into a register.
All previous instructions were 8-byte. This is first 16-byte instruction.
Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction:
insn[0].code = BPF_LD | BPF_DW | BPF_IMM
insn[0].dst_reg = destination register
insn[0].imm = lower 32-bit
insn[1].code = 0
insn[1].imm = upper 32-bit
All unused fields must be zero.

Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM
which loads 32-bit immediate value into a register.

x64 JITs it as single 'movabsq %rax, imm64'
arm64 may JIT as sequence of four 'movk x0, #imm16, lsl #shift' insn

Note that old eBPF programs are binary compatible with new interpreter.

It helps eBPF programs load 64-bit constant into a register with one
instruction instead of using two registers and 4 instructions:
BPF_MOV32_IMM(R1, imm32)
BPF_ALU64_IMM(BPF_LSH, R1, 32)
BPF_MOV32_IMM(R2, imm32)
BPF_ALU64_REG(BPF_OR, R1, R2)

User space generated programs will use this instruction to load constants only.

To tell kernel that user space needs a pointer the _pseudo_ variant of
this instruction may be added later, which will use extra bits of encoding
to indicate what type of pointer user space is asking kernel to provide.
For example 'off' or 'src_reg' fields can be used for such purpose.
src_reg = 1 could mean that user space is asking kernel to validate and
load in-kernel map pointer.
src_reg = 2 could mean that user space needs readonly data section pointer
src_reg = 3 could mean that user space needs a pointer to per-cpu local data
All such future pseudo instructions will not be carrying the actual pointer
as part of the instruction, but rather will be treated as a request to kernel
to provide one. The kernel will verify the request_for_a_pointer, then
will drop _pseudo_ marking and will store actual internal pointer inside
the instruction, so the end result is the interpreter and JITs never
see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that
loads 64-bit immediate into a register. User space never operates on direct
pointers and verifier can easily recognize request_for_pointer vs other
instructions.

Signed-off-by: Alexei Starovoitov <***@plumgrid.com>
---
Documentation/networking/filter.txt | 8 +++++++-
arch/x86/net/bpf_jit_comp.c | 17 +++++++++++++++++
include/linux/filter.h | 18 ++++++++++++++++++
kernel/bpf/core.c | 5 +++++
lib/test_bpf.c | 21 +++++++++++++++++++++
5 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index c48a9704bda8..81916ab5d96f 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -951,7 +951,7 @@ Size modifier is one of ...

Mode modifier is one of:

- BPF_IMM 0x00 /* classic BPF only, reserved in eBPF */
+ BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
BPF_ABS 0x20
BPF_IND 0x40
BPF_MEM 0x60
@@ -995,6 +995,12 @@ BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and
2 byte atomic increments are not supported.

+eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM which consists
+of two consecutive 'struct bpf_insn' 8-byte blocks and interpreted as single
+instruction that loads 64-bit immediate value into a dst_reg.
+Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
+32-bit immediate value into a register.
+
Testing
-------

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 39ccfbb4a723..06f8c17f5484 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -393,6 +393,23 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
EMIT1_off32(add_1reg(0xB8, dst_reg), imm32);
break;

+ case BPF_LD | BPF_IMM | BPF_DW:
+ if (insn[1].code != 0 || insn[1].src_reg != 0 ||
+ insn[1].dst_reg != 0 || insn[1].off != 0) {
+ /* verifier must catch invalid insns */
+ pr_err("invalid BPF_LD_IMM64 insn\n");
+ return -EINVAL;
+ }
+
+ /* movabsq %rax, imm64 */
+ EMIT2(add_1mod(0x48, dst_reg), add_1reg(0xB8, dst_reg));
+ EMIT(insn[0].imm, 4);
+ EMIT(insn[1].imm, 4);
+
+ insn++;
+ i++;
+ break;
+
/* dst %= src, dst /= src, dst %= imm32, dst /= imm32 */
case BPF_ALU | BPF_MOD | BPF_X:
case BPF_ALU | BPF_DIV | BPF_X:
diff --git a/include/linux/filter.h b/include/linux/filter.h
index c78994593355..bf323da77950 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -166,6 +166,24 @@ enum {
.off = 0, \
.imm = IMM })

+/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
+#define BPF_LD_IMM64(DST, IMM) \
+ BPF_LD_IMM64_RAW(DST, 0, IMM)
+
+#define BPF_LD_IMM64_RAW(DST, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_DW | BPF_IMM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = (__u32) (IMM) }), \
+ ((struct bpf_insn) { \
+ .code = 0, /* zero is reserved opcode */ \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = ((__u64) (IMM)) >> 32 })
+
/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */

#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b54bb2c2e494..2c2bfaacce66 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -242,6 +242,7 @@ static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
[BPF_LD | BPF_IND | BPF_W] = &&LD_IND_W,
[BPF_LD | BPF_IND | BPF_H] = &&LD_IND_H,
[BPF_LD | BPF_IND | BPF_B] = &&LD_IND_B,
+ [BPF_LD | BPF_IMM | BPF_DW] = &&LD_IMM_DW,
};
void *ptr;
int off;
@@ -301,6 +302,10 @@ select_insn:
ALU64_MOV_K:
DST = IMM;
CONT;
+ LD_IMM_DW:
+ DST = (u64) (u32) insn[0].imm | ((u64) (u32) insn[1].imm) << 32;
+ insn++;
+ CONT;
ALU64_ARSH_X:
(*(s64 *) &DST) >>= SRC;
CONT;
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index 9a67456ba29a..413890815d3e 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -1735,6 +1735,27 @@ static struct bpf_test tests[] = {
{ },
{ { 1, 0 } },
},
+ {
+ "load 64-bit immediate",
+ .u.insns_int = {
+ BPF_LD_IMM64(R1, 0x567800001234L),
+ BPF_MOV64_REG(R2, R1),
+ BPF_MOV64_REG(R3, R2),
+ BPF_ALU64_IMM(BPF_RSH, R2, 32),
+ BPF_ALU64_IMM(BPF_LSH, R3, 32),
+ BPF_ALU64_IMM(BPF_RSH, R3, 32),
+ BPF_ALU64_IMM(BPF_MOV, R0, 0),
+ BPF_JMP_IMM(BPF_JEQ, R2, 0x5678, 1),
+ BPF_EXIT_INSN(),
+ BPF_JMP_IMM(BPF_JEQ, R3, 0x1234, 1),
+ BPF_EXIT_INSN(),
+ BPF_ALU64_IMM(BPF_MOV, R0, 1),
+ BPF_EXIT_INSN(),
+ },
+ INTERNAL,
+ { },
+ { { 0, 1 } }
+ },
};

static struct net_device dev;
--
1.7.9.5
Alexei Starovoitov
2014-09-03 03:17:25 UTC
Permalink
move instruction macros (like BPF_MOV64_REG or BPF_ALU32_IMM)
from linux/filter.h into uapi/linux/bpf.h
so that userspace programs can use them.

verifier testsuite (in later patches) will be using them.

Signed-off-by: Alexei Starovoitov <***@plumgrid.com>
---
include/linux/filter.h | 226 ----------------------------------------------
include/uapi/linux/bpf.h | 226 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 226 insertions(+), 226 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ff77842af3e1..c220b27b10df 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -37,232 +37,6 @@ struct bpf_prog_info;
/* BPF program can access up to 512 bytes of stack space. */
#define MAX_BPF_STACK 512

-/* Helper macros for filter block array initializers. */
-
-/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
-
-#define BPF_ALU64_REG(OP, DST, SRC) \
- ((struct bpf_insn) { \
- .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = 0, \
- .imm = 0 })
-
-#define BPF_ALU32_REG(OP, DST, SRC) \
- ((struct bpf_insn) { \
- .code = BPF_ALU | BPF_OP(OP) | BPF_X, \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = 0, \
- .imm = 0 })
-
-/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
-
-#define BPF_ALU64_IMM(OP, DST, IMM) \
- ((struct bpf_insn) { \
- .code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
- .dst_reg = DST, \
- .src_reg = 0, \
- .off = 0, \
- .imm = IMM })
-
-#define BPF_ALU32_IMM(OP, DST, IMM) \
- ((struct bpf_insn) { \
- .code = BPF_ALU | BPF_OP(OP) | BPF_K, \
- .dst_reg = DST, \
- .src_reg = 0, \
- .off = 0, \
- .imm = IMM })
-
-/* Endianess conversion, cpu_to_{l,b}e(), {l,b}e_to_cpu() */
-
-#define BPF_ENDIAN(TYPE, DST, LEN) \
- ((struct bpf_insn) { \
- .code = BPF_ALU | BPF_END | BPF_SRC(TYPE), \
- .dst_reg = DST, \
- .src_reg = 0, \
- .off = 0, \
- .imm = LEN })
-
-/* Short form of mov, dst_reg = src_reg */
-
-#define BPF_MOV64_REG(DST, SRC) \
- ((struct bpf_insn) { \
- .code = BPF_ALU64 | BPF_MOV | BPF_X, \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = 0, \
- .imm = 0 })
-
-#define BPF_MOV32_REG(DST, SRC) \
- ((struct bpf_insn) { \
- .code = BPF_ALU | BPF_MOV | BPF_X, \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = 0, \
- .imm = 0 })
-
-/* Short form of mov, dst_reg = imm32 */
-
-#define BPF_MOV64_IMM(DST, IMM) \
- ((struct bpf_insn) { \
- .code = BPF_ALU64 | BPF_MOV | BPF_K, \
- .dst_reg = DST, \
- .src_reg = 0, \
- .off = 0, \
- .imm = IMM })
-
-#define BPF_MOV32_IMM(DST, IMM) \
- ((struct bpf_insn) { \
- .code = BPF_ALU | BPF_MOV | BPF_K, \
- .dst_reg = DST, \
- .src_reg = 0, \
- .off = 0, \
- .imm = IMM })
-
-/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
-#define BPF_LD_IMM64(DST, IMM) \
- BPF_LD_IMM64_RAW(DST, 0, IMM)
-
-#define BPF_LD_IMM64_RAW(DST, SRC, IMM) \
- ((struct bpf_insn) { \
- .code = BPF_LD | BPF_DW | BPF_IMM, \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = 0, \
- .imm = (__u32) (IMM) }), \
- ((struct bpf_insn) { \
- .code = 0, /* zero is reserved opcode */ \
- .dst_reg = 0, \
- .src_reg = 0, \
- .off = 0, \
- .imm = ((__u64) (IMM)) >> 32 })
-
-#define BPF_PSEUDO_MAP_FD 1
-
-/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
-#define BPF_LD_MAP_FD(DST, MAP_FD) \
- BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
-
-/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
-
-#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \
- ((struct bpf_insn) { \
- .code = BPF_ALU64 | BPF_MOV | BPF_SRC(TYPE), \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = 0, \
- .imm = IMM })
-
-#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM) \
- ((struct bpf_insn) { \
- .code = BPF_ALU | BPF_MOV | BPF_SRC(TYPE), \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = 0, \
- .imm = IMM })
-
-/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
-
-#define BPF_LD_ABS(SIZE, IMM) \
- ((struct bpf_insn) { \
- .code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \
- .dst_reg = 0, \
- .src_reg = 0, \
- .off = 0, \
- .imm = IMM })
-
-/* Indirect packet access, R0 = *(uint *) (skb->data + src_reg + imm32) */
-
-#define BPF_LD_IND(SIZE, SRC, IMM) \
- ((struct bpf_insn) { \
- .code = BPF_LD | BPF_SIZE(SIZE) | BPF_IND, \
- .dst_reg = 0, \
- .src_reg = SRC, \
- .off = 0, \
- .imm = IMM })
-
-/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
-
-#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
- ((struct bpf_insn) { \
- .code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = OFF, \
- .imm = 0 })
-
-/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
-
-#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
- ((struct bpf_insn) { \
- .code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = OFF, \
- .imm = 0 })
-
-/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
-
-#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
- ((struct bpf_insn) { \
- .code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
- .dst_reg = DST, \
- .src_reg = 0, \
- .off = OFF, \
- .imm = IMM })
-
-/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
-
-#define BPF_JMP_REG(OP, DST, SRC, OFF) \
- ((struct bpf_insn) { \
- .code = BPF_JMP | BPF_OP(OP) | BPF_X, \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = OFF, \
- .imm = 0 })
-
-/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
-
-#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
- ((struct bpf_insn) { \
- .code = BPF_JMP | BPF_OP(OP) | BPF_K, \
- .dst_reg = DST, \
- .src_reg = 0, \
- .off = OFF, \
- .imm = IMM })
-
-/* Function call */
-
-#define BPF_EMIT_CALL(FUNC) \
- ((struct bpf_insn) { \
- .code = BPF_JMP | BPF_CALL, \
- .dst_reg = 0, \
- .src_reg = 0, \
- .off = 0, \
- .imm = ((FUNC) - __bpf_call_base) })
-
-/* Raw code statement block */
-
-#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
- ((struct bpf_insn) { \
- .code = CODE, \
- .dst_reg = DST, \
- .src_reg = SRC, \
- .off = OFF, \
- .imm = IMM })
-
-/* Program exit */
-
-#define BPF_EXIT_INSN() \
- ((struct bpf_insn) { \
- .code = BPF_JMP | BPF_EXIT, \
- .dst_reg = 0, \
- .src_reg = 0, \
- .off = 0, \
- .imm = 0 })
-
#define bytes_to_bpf_size(bytes) \
({ \
int bpf_size = -EINVAL; \
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 13bd4bf3b100..97edd23622fc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -54,6 +54,232 @@ enum {
/* BPF has 10 general purpose 64-bit registers and stack frame. */
#define MAX_BPF_REG __MAX_BPF_REG

+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define BPF_ALU64_REG(OP, DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU64_IMM(OP, DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_ALU32_IMM(OP, DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Endianess conversion, cpu_to_{l,b}e(), {l,b}e_to_cpu() */
+
+#define BPF_ENDIAN(TYPE, DST, LEN) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_END | BPF_SRC(TYPE), \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = LEN })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+#define BPF_MOV32_REG(DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV64_IMM(DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_MOV32_IMM(DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
+#define BPF_LD_IMM64(DST, IMM) \
+ BPF_LD_IMM64_RAW(DST, 0, IMM)
+
+#define BPF_LD_IMM64_RAW(DST, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_DW | BPF_IMM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = (__u32) (IMM) }), \
+ ((struct bpf_insn) { \
+ .code = 0, /* zero is reserved opcode */ \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = ((__u64) (IMM)) >> 32 })
+
+#define BPF_PSEUDO_MAP_FD 1
+
+/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
+#define BPF_LD_MAP_FD(DST, MAP_FD) \
+ BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
+
+/* Short form of mov based on type, BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32 */
+
+#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_SRC(TYPE), \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_SRC(TYPE), \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Indirect packet access, R0 = *(uint *) (skb->data + src_reg + imm32) */
+
+#define BPF_LD_IND(SIZE, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_SIZE(SIZE) | BPF_IND, \
+ .dst_reg = 0, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
+
+#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
+
+#define BPF_JMP_REG(OP, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Function call */
+
+#define BPF_EMIT_CALL(FUNC) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_CALL, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = ((FUNC) - __bpf_call_base) })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
+ ((struct bpf_insn) { \
+ .code = CODE, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN() \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_EXIT, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = 0 })
+
struct bpf_insn {
__u8 code; /* opcode */
__u8 dst_reg:4; /* dest register */
--
1.7.9.5
Alexei Starovoitov
2014-09-03 03:17:26 UTC
Permalink
userspace programs that wish to use eBPF instruction macros
(like BPF_MOV64_REG or BPF_ALU32_IMM) need to include two files:
uapi/linux/filter.h and uapi/linux/bpf.h
Move common macro definitions that are shared between
classic BPF and eBPF into uapi/linux/bpf_common.h

Signed-off-by: Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/***@public.gmane.org>
---
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/bpf.h | 1 +
include/uapi/linux/bpf_common.h | 55 ++++++++++++++++++++++++++++++++++++++
include/uapi/linux/filter.h | 56 +--------------------------------------
4 files changed, 58 insertions(+), 55 deletions(-)
create mode 100644 include/uapi/linux/bpf_common.h

diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index fb3f7b675229..abd0a85cedc7 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -68,6 +68,7 @@ header-y += binfmts.h
header-y += blkpg.h
header-y += blktrace_api.h
header-y += bpf.h
+header-y += bpf_common.h
header-y += bpqether.h
header-y += bsg.h
header-y += btrfs.h
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 97edd23622fc..df9c9bdaa8a0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -8,6 +8,7 @@
#define _UAPI__LINUX_BPF_H__

#include <linux/types.h>
+#include <linux/bpf_common.h>

/* Extended instruction set based on top of classic BPF */

diff --git a/include/uapi/linux/bpf_common.h b/include/uapi/linux/bpf_common.h
new file mode 100644
index 000000000000..e2d8b1ca3730
--- /dev/null
+++ b/include/uapi/linux/bpf_common.h
@@ -0,0 +1,55 @@
+#ifndef _UAPI__LINUX_BPF_COMMON_H__
+#define _UAPI__LINUX_BPF_COMMON_H__
+
+/* Instruction classes */
+#define BPF_CLASS(code) ((code) & 0x07)
+#define BPF_LD 0x00
+#define BPF_LDX 0x01
+#define BPF_ST 0x02
+#define BPF_STX 0x03
+#define BPF_ALU 0x04
+#define BPF_JMP 0x05
+#define BPF_RET 0x06
+#define BPF_MISC 0x07
+
+/* ld/ldx fields */
+#define BPF_SIZE(code) ((code) & 0x18)
+#define BPF_W 0x00
+#define BPF_H 0x08
+#define BPF_B 0x10
+#define BPF_MODE(code) ((code) & 0xe0)
+#define BPF_IMM 0x00
+#define BPF_ABS 0x20
+#define BPF_IND 0x40
+#define BPF_MEM 0x60
+#define BPF_LEN 0x80
+#define BPF_MSH 0xa0
+
+/* alu/jmp fields */
+#define BPF_OP(code) ((code) & 0xf0)
+#define BPF_ADD 0x00
+#define BPF_SUB 0x10
+#define BPF_MUL 0x20
+#define BPF_DIV 0x30
+#define BPF_OR 0x40
+#define BPF_AND 0x50
+#define BPF_LSH 0x60
+#define BPF_RSH 0x70
+#define BPF_NEG 0x80
+#define BPF_MOD 0x90
+#define BPF_XOR 0xa0
+
+#define BPF_JA 0x00
+#define BPF_JEQ 0x10
+#define BPF_JGT 0x20
+#define BPF_JGE 0x30
+#define BPF_JSET 0x40
+#define BPF_SRC(code) ((code) & 0x08)
+#define BPF_K 0x00
+#define BPF_X 0x08
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+#endif /* _UAPI__LINUX_BPF_COMMON_H__ */
diff --git a/include/uapi/linux/filter.h b/include/uapi/linux/filter.h
index 253b4d42cf2b..47785d5ecf17 100644
--- a/include/uapi/linux/filter.h
+++ b/include/uapi/linux/filter.h
@@ -7,7 +7,7 @@

#include <linux/compiler.h>
#include <linux/types.h>
-
+#include <linux/bpf_common.h>

/*
* Current version of the filter code architecture.
@@ -32,56 +32,6 @@ struct sock_fprog { /* Required for SO_ATTACH_FILTER. */
struct sock_filter __user *filter;
};

-/*
- * Instruction classes
- */
-
-#define BPF_CLASS(code) ((code) & 0x07)
-#define BPF_LD 0x00
-#define BPF_LDX 0x01
-#define BPF_ST 0x02
-#define BPF_STX 0x03
-#define BPF_ALU 0x04
-#define BPF_JMP 0x05
-#define BPF_RET 0x06
-#define BPF_MISC 0x07
-
-/* ld/ldx fields */
-#define BPF_SIZE(code) ((code) & 0x18)
-#define BPF_W 0x00
-#define BPF_H 0x08
-#define BPF_B 0x10
-#define BPF_MODE(code) ((code) & 0xe0)
-#define BPF_IMM 0x00
-#define BPF_ABS 0x20
-#define BPF_IND 0x40
-#define BPF_MEM 0x60
-#define BPF_LEN 0x80
-#define BPF_MSH 0xa0
-
-/* alu/jmp fields */
-#define BPF_OP(code) ((code) & 0xf0)
-#define BPF_ADD 0x00
-#define BPF_SUB 0x10
-#define BPF_MUL 0x20
-#define BPF_DIV 0x30
-#define BPF_OR 0x40
-#define BPF_AND 0x50
-#define BPF_LSH 0x60
-#define BPF_RSH 0x70
-#define BPF_NEG 0x80
-#define BPF_MOD 0x90
-#define BPF_XOR 0xa0
-
-#define BPF_JA 0x00
-#define BPF_JEQ 0x10
-#define BPF_JGT 0x20
-#define BPF_JGE 0x30
-#define BPF_JSET 0x40
-#define BPF_SRC(code) ((code) & 0x08)
-#define BPF_K 0x00
-#define BPF_X 0x08
-
/* ret - BPF_K and BPF_X also apply */
#define BPF_RVAL(code) ((code) & 0x18)
#define BPF_A 0x10
@@ -91,10 +41,6 @@ struct sock_fprog { /* Required for SO_ATTACH_FILTER. */
#define BPF_TAX 0x00
#define BPF_TXA 0x80

-#ifndef BPF_MAXINSNS
-#define BPF_MAXINSNS 4096
-#endif
-
/*
* Macros for filter block array initializers.
*/
--
1.7.9.5
Alexei Starovoitov
2014-09-03 03:17:24 UTC
Permalink
allow user space to generate eBPF programs

uapi/linux/bpf.h: eBPF instruction set definition

linux/filter.h: the rest

This patch only moves macro definitions, but practically it freezes existing
eBPF instruction set, though new instructions can still be added in the future.

These eBPF definitions cannot go into uapi/linux/filter.h, since the names
may conflict with existing applications.

Full eBPF ISA description is in Documentation/networking/filter.txt

Signed-off-by: Alexei Starovoitov <***@plumgrid.com>
---
include/linux/filter.h | 56 +-------------------------------------
include/uapi/linux/Kbuild | 1 +
include/uapi/linux/bpf.h | 65 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 67 insertions(+), 55 deletions(-)
create mode 100644 include/uapi/linux/bpf.h

diff --git a/include/linux/filter.h b/include/linux/filter.h
index bf323da77950..8f82ef3f1cdd 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -10,58 +10,12 @@
#include <linux/workqueue.h>
#include <uapi/linux/filter.h>
#include <asm/cacheflush.h>
+#include <uapi/linux/bpf.h>

struct sk_buff;
struct sock;
struct seccomp_data;

-/* Internally used and optimized filter representation with extended
- * instruction set based on top of classic BPF.
- */
-
-/* instruction classes */
-#define BPF_ALU64 0x07 /* alu mode in double word width */
-
-/* ld/ldx fields */
-#define BPF_DW 0x18 /* double word */
-#define BPF_XADD 0xc0 /* exclusive add */
-
-/* alu/jmp fields */
-#define BPF_MOV 0xb0 /* mov reg to reg */
-#define BPF_ARSH 0xc0 /* sign extending arithmetic shift right */
-
-/* change endianness of a register */
-#define BPF_END 0xd0 /* flags for endianness conversion: */
-#define BPF_TO_LE 0x00 /* convert to little-endian */
-#define BPF_TO_BE 0x08 /* convert to big-endian */
-#define BPF_FROM_LE BPF_TO_LE
-#define BPF_FROM_BE BPF_TO_BE
-
-#define BPF_JNE 0x50 /* jump != */
-#define BPF_JSGT 0x60 /* SGT is signed '>', GT in x86 */
-#define BPF_JSGE 0x70 /* SGE is signed '>=', GE in x86 */
-#define BPF_CALL 0x80 /* function call */
-#define BPF_EXIT 0x90 /* function return */
-
-/* Register numbers */
-enum {
- BPF_REG_0 = 0,
- BPF_REG_1,
- BPF_REG_2,
- BPF_REG_3,
- BPF_REG_4,
- BPF_REG_5,
- BPF_REG_6,
- BPF_REG_7,
- BPF_REG_8,
- BPF_REG_9,
- BPF_REG_10,
- __MAX_BPF_REG,
-};
-
-/* BPF has 10 general purpose 64-bit registers and stack frame. */
-#define MAX_BPF_REG __MAX_BPF_REG
-
/* ArgX, context and stack frame pointer register positions. Note,
* Arg1, Arg2, Arg3, etc are used as argument mappings of function
* calls in BPF_CALL instruction.
@@ -322,14 +276,6 @@ enum {
#define SK_RUN_FILTER(filter, ctx) \
(*filter->prog->bpf_func)(ctx, filter->prog->insnsi)

-struct bpf_insn {
- __u8 code; /* opcode */
- __u8 dst_reg:4; /* dest register */
- __u8 src_reg:4; /* source register */
- __s16 off; /* signed offset */
- __s32 imm; /* signed immediate constant */
-};
-
#ifdef CONFIG_COMPAT
/* A struct sock_filter is architecture independent. */
struct compat_sock_fprog {
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 24e9033f8b3f..fb3f7b675229 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -67,6 +67,7 @@ header-y += bfs_fs.h
header-y += binfmts.h
header-y += blkpg.h
header-y += blktrace_api.h
+header-y += bpf.h
header-y += bpqether.h
header-y += bsg.h
header-y += btrfs.h
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
new file mode 100644
index 000000000000..479ed0b6be16
--- /dev/null
+++ b/include/uapi/linux/bpf.h
@@ -0,0 +1,65 @@
+/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#ifndef _UAPI__LINUX_BPF_H__
+#define _UAPI__LINUX_BPF_H__
+
+#include <linux/types.h>
+
+/* Extended instruction set based on top of classic BPF */
+
+/* instruction classes */
+#define BPF_ALU64 0x07 /* alu mode in double word width */
+
+/* ld/ldx fields */
+#define BPF_DW 0x18 /* double word */
+#define BPF_XADD 0xc0 /* exclusive add */
+
+/* alu/jmp fields */
+#define BPF_MOV 0xb0 /* mov reg to reg */
+#define BPF_ARSH 0xc0 /* sign extending arithmetic shift right */
+
+/* change endianness of a register */
+#define BPF_END 0xd0 /* flags for endianness conversion: */
+#define BPF_TO_LE 0x00 /* convert to little-endian */
+#define BPF_TO_BE 0x08 /* convert to big-endian */
+#define BPF_FROM_LE BPF_TO_LE
+#define BPF_FROM_BE BPF_TO_BE
+
+#define BPF_JNE 0x50 /* jump != */
+#define BPF_JSGT 0x60 /* SGT is signed '>', GT in x86 */
+#define BPF_JSGE 0x70 /* SGE is signed '>=', GE in x86 */
+#define BPF_CALL 0x80 /* function call */
+#define BPF_EXIT 0x90 /* function return */
+
+/* Register numbers */
+enum {
+ BPF_REG_0 = 0,
+ BPF_REG_1,
+ BPF_REG_2,
+ BPF_REG_3,
+ BPF_REG_4,
+ BPF_REG_5,
+ BPF_REG_6,
+ BPF_REG_7,
+ BPF_REG_8,
+ BPF_REG_9,
+ BPF_REG_10,
+ __MAX_BPF_REG,
+};
+
+/* BPF has 10 general purpose 64-bit registers and stack frame. */
+#define MAX_BPF_REG __MAX_BPF_REG
+
+struct bpf_insn {
+ __u8 code; /* opcode */
+ __u8 dst_reg:4; /* dest register */
+ __u8 src_reg:4; /* source register */
+ __s16 off; /* signed offset */
+ __s32 imm; /* signed immediate constant */
+};
+
+#endif /* _UAPI__LINUX_BPF_H__ */
--
1.7.9.5
Daniel Borkmann
2014-09-03 15:46:39 UTC
Permalink
Post by Alexei Starovoitov
allow user space to generate eBPF programs
uapi/linux/bpf.h: eBPF instruction set definition
linux/filter.h: the rest
This patch only moves macro definitions, but practically it freezes existing
eBPF instruction set, though new instructions can still be added in the future.
These eBPF definitions cannot go into uapi/linux/filter.h, since the names
may conflict with existing applications.
Full eBPF ISA description is in Documentation/networking/filter.txt
Ok, given you post the remaining two RFCs later on this window as
you indicate, I have no objections:

Acked-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+***@public.gmane.org>
Daniel Borkmann
2014-10-13 17:21:49 UTC
Permalink
On 09/03/2014 05:46 PM, Daniel Borkmann wrote:
...
Post by Daniel Borkmann
Ok, given you post the remaining two RFCs later on this window as
Ping, Alexei, are you still sending the patch for bpf_common.h or
do you want me to take care of this?

Cheers,
Daniel
Alexei Starovoitov
2014-10-13 21:49:53 UTC
Permalink
Post by Daniel Borkmann
...
Post by Daniel Borkmann
Ok, given you post the remaining two RFCs later on this window as
Ping, Alexei, are you still sending the patch for bpf_common.h or
do you want me to take care of this?
It's not forgotten.
I'm not sending it only because net-next is closed
and it seems to be -next material.
Daniel Borkmann
2014-10-14 07:34:30 UTC
Permalink
Post by Alexei Starovoitov
Post by Daniel Borkmann
...
Post by Daniel Borkmann
Ok, given you post the remaining two RFCs later on this window as
Ping, Alexei, are you still sending the patch for bpf_common.h or
do you want me to take care of this?
It's not forgotten.
I'm not sending it only because net-next is closed
and it seems to be -next material.
Well, the point was since it's UAPI you're modifying, that it needs
to be shipped before it first gets exposed to user land ...

I think that should be reason enough ... there's no point in doing
this at a later point in time.
Alexei Starovoitov
2014-10-14 08:43:17 UTC
Permalink
Post by Daniel Borkmann
Post by Alexei Starovoitov
Post by Daniel Borkmann
...
Post by Daniel Borkmann
Ok, given you post the remaining two RFCs later on this window as
Ping, Alexei, are you still sending the patch for bpf_common.h or
do you want me to take care of this?
It's not forgotten.
I'm not sending it only because net-next is closed
and it seems to be -next material.
Well, the point was since it's UAPI you're modifying, that it needs
to be shipped before it first gets exposed to user land ...
I think that should be reason enough ... there's no point in doing
this at a later point in time.
Moving common #defines from filter.h into bpf_common.h can
be done at any point in time. For the sake of argument if
there is an app that includes both filter.h and bpf.h, it will
continue to work just fine.
Anyway, since you insist, I will send it now, though it definitely
can wait.
Daniel Borkmann
2014-10-14 20:27:45 UTC
Permalink
Post by Alexei Starovoitov
Post by Daniel Borkmann
Post by Alexei Starovoitov
Post by Daniel Borkmann
...
Post by Daniel Borkmann
Ok, given you post the remaining two RFCs later on this window as
Ping, Alexei, are you still sending the patch for bpf_common.h or
do you want me to take care of this?
It's not forgotten.
I'm not sending it only because net-next is closed
and it seems to be -next material.
Well, the point was since it's UAPI you're modifying, that it needs
to be shipped before it first gets exposed to user land ...
I think that should be reason enough ... there's no point in doing
this at a later point in time.
Moving common #defines from filter.h into bpf_common.h can
be done at any point in time. For the sake of argument if
there is an app that includes both filter.h and bpf.h, it will
continue to work just fine.
Correct, but the argument was that we can _avoid_ this from the
very beginning. Thus, user space applications making use of eBPF
only need to include <linux/bpf.h>, nothing more.

Doing this at any later point in time will just lead to the need
to include both headers.

Alexei Starovoitov
2014-09-04 05:41:48 UTC
Permalink
Post by Daniel Borkmann
Post by Alexei Starovoitov
allow user space to generate eBPF programs
uapi/linux/bpf.h: eBPF instruction set definition
linux/filter.h: the rest
This patch only moves macro definitions, but practically it freezes
existing
eBPF instruction set, though new instructions can still be added in the
future.
These eBPF definitions cannot go into uapi/linux/filter.h, since the names
may conflict with existing applications.
Full eBPF ISA description is in Documentation/networking/filter.txt
Ok, given you post the remaining two RFCs later on this window as
David,

I see that all 4 patches are marked as rfc, whereas I tagged
only 3 and 4 as 'rfc' and mentioned in cover letter that they
will be coming for real in stage III. Here they support patch 2
and show the future changes to bpf.h file.
Patches 1 and 2 are good to go.
Note that they're on top of Hannes's patch:
http://patchwork.ozlabs.org/patch/385266/
Do you want me to resubmit just first two?

Thanks
David Miller
2014-09-04 21:49:15 UTC
Permalink
From: Alexei Starovoitov <ast-uqk4Ao+rVK5Wk0Htik3J/***@public.gmane.org>
Date: Wed, 3 Sep 2014 22:41:48 -0700
Post by Alexei Starovoitov
Do you want me to resubmit just first two?
Yes, you can't submit hodge-podge path series, either it's all to
be applied or it's all RFC material.
Loading...