00:46 karolherbst: "atom add u32 %r36 g[%r2d+0x0] %r37" .. :)
00:46 karolherbst: imirkin: I guess we don't do atomics on normal global memory yet?
00:47 karolherbst: mhh.. but we do that on ssbos.. uhm..
00:47 karolherbst: how does that work then
00:47 karolherbst: ohh, it's just an assert
00:48 karolherbst: seems like it should just work (tm)
00:52 imirkin: yeah, we do atomics on g[] just fine
00:52 imirkin: dunno which assert you're talking about
00:53 karolherbst: ohh, it's CL stuff
00:53 karolherbst: I hit something inside the atom lowering
00:54 karolherbst: it doesn't handle getting global memory atoms directly yet
00:54 imirkin: in CL, or in nv50_ir?
00:54 karolherbst: nv50_ir
00:54 imirkin: oh hm
00:54 karolherbst: case FILE_MEMORY_GLOBAL: return true; fixes it :p
00:54 imirkin: ah :)
00:54 karolherbst: anyway, CTS test passes, so I guess that should be fine
00:54 imirkin: shared memory is a bit trickier
00:55 imirkin: esp pre-maxwell
00:55 karolherbst: yeah.. but this is about global memory only now
00:55 karolherbst: I run the SVM OpenCL CTS tests
00:55 karolherbst: (just want to check if my implementation is actually correct)
00:56 imirkin: sure
00:58 karolherbst: it's still a bit weird to do atomics on the GPU on malloced memory...
01:01 imirkin: :)
01:14 karolherbst: ehh.. bug inside the CTS uff
01:15 karolherbst: or... my bug
01:19 karolherbst: my bug :)
13:35 uis: imirkin
13:37 uis: What about NVE4 shader compiler?
13:58 kherbst: uhh "gr: M2MF 80000002 [PUSH_NOT_ENOUGH_DATA]"
13:59 imirkin: that means you've been very bad.
14:10 kherbst: apparently
14:10 kherbst: it seems to happen when enqueuing a kernel
14:12 kherbst: imirkin: any idea what it might complain about? const buffer not getting enough data or something?
14:12 kherbst: but.. that would be weird
14:15 kherbst: ohh, the push buffers are quite small
14:16 kherbst: imirkin: do we have a pushbuffer decoder somewhere?
14:16 kherbst: a string based one
14:17 imirkin: so
14:17 imirkin: all the m2mf/p2mf stuff has to happen in one go
14:17 imirkin: you can't split the command/data across pushbufs
14:18 imirkin: which is why we have these big PUSH_SPACE things at the top
14:18 imirkin: NOT_ENOUGH_DATA means you told it you'd give it X values, but gave it Y
14:18 imirkin: the push data isn't wrong in itself (you'd get a different error otherwise)
14:18 imirkin: they just don't match
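The mismatch imirkin describes (a command header promises X data words, but the submitted buffer only holds Y) can be illustrated with a toy model. This is only a sketch under stated assumptions: the header layout below (count in the upper 16 bits) and the names `TOY_HDR`, `TOY_HDR_COUNT`, and `toy_pushbuf_check` are hypothetical and are not the real hardware encoding or any Mesa/nouveau API.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy encoding, NOT the real hardware format: the upper 16 bits of a
 * header word hold the number of data words that follow it. */
#define TOY_HDR(count)     ((uint32_t)(count) << 16)
#define TOY_HDR_COUNT(hdr) ((hdr) >> 16)

/* Walk one submitted buffer of `len` words.
 * Returns 0 if every command has all of its promised data words inside
 * this buffer, or -1 if a command runs off the end, which is the toy
 * analogue of the NOT_ENOUGH_DATA condition discussed above. */
static int toy_pushbuf_check(const uint32_t *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        size_t count = TOY_HDR_COUNT(buf[i]);

        i++; /* step past the header word */
        if (count > len - i)
            return -1; /* header promised more data than the buffer holds */
        i += count;    /* step past this command's data words */
    }
    return 0;
}
```

In this model, a header announcing three words followed by only two data words is rejected, which mirrors why command and data cannot be split across pushbufs and why enough space has to be reserved up front.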
14:22 kherbst: mhhh
14:23 kherbst: "0x1000f010" is the last entry
14:23 kherbst: but that looks fine as well
14:23 kherbst: so the push buffer itself contains less than the hardware got told
14:27 kherbst: ohh wait, that's just part of the last command anyway
14:30 kherbst: imirkin: any idea on how to debug that besides checking that we fill them up correctly?
14:35 kherbst: ohhhhhhh
14:35 kherbst: it's that error
14:36 kherbst: imirkin: https://github.com/karolherbst/mesa/commit/4fa20aa0ba5e3b650c85fc24f8880124aa6b9171
14:37 kherbst: mhh.. still getting an error :/
14:38 kherbst: ohh, that's just a runtime bug I think
14:41 kherbst: wait.. that error disappeared
14:41 kherbst: now I still have a different one: gr: GPC0/TPC0/TEX: 80000041
14:43 kherbst: okay.. this is some CFG stuff I mess up
14:44 kherbst: uff.. yes
14:44 kherbst: the shader is totally bonkers
14:45 kherbst: https://gist.githubusercontent.com/karolherbst/9013894fa96c0c60436a96515ba1dfc2/raw/31004966456b6ae4d6bfa5028e2a9b03e6e3900b/gistfile1.txt
14:46 kherbst: or maybe that's fine?
14:46 kherbst: I am worried about the ssy stuff
14:49 kherbst: ehhh
14:49 kherbst: "set u8 $p0 eq s8 $r0 c0[0x10]"
14:49 kherbst: uff
14:49 kherbst: yeah, that's not gonna fly
14:50 imirkin_: i recently added some support for PSETP
14:50 imirkin_: (not pushed yet)
14:50 imirkin_: although that's not quite what you want
14:51 kherbst: ohh, the problem is rather the 8 bit operation?
14:51 kherbst: it's real 8 bit data
14:51 kherbst: not a 1 bit predicate
14:51 imirkin_: right, so PSETP is not quite what you want :)
14:51 kherbst: yeah.. but that's still not causing the issue
14:51 imirkin_: ok. probably not helping though.
14:52 imirkin_: what else is wrong?
14:52 kherbst: now I have this: https://gist.githubusercontent.com/karolherbst/1df791c3fc512458982851b36a4098cf/raw/bfa41a8c99838f7bca90847b017e36ec2decd168/gistfile1.txt
14:52 imirkin_: ok
14:52 kherbst: "gr: GPC0/TPC0/TEX: 80000041" and "fifo: fault 00 [READ] at 00007f675a333000 engine 00 [GR] client 01 [GPC0/T1_0] reason 00 [PDE] on channel 2 [00ffbae000 test_svm[8500]]"
14:52 kherbst: mhh
14:52 kherbst: out-of-bounds memory read maybe?
14:52 imirkin_: does the LD become a LDG?
14:53 imirkin_: T1_0 i think means "texture"
14:53 imirkin_: or something that happens via the texture data path
14:53 kherbst: mhhhh
14:53 kherbst: we don't do ldg/stg yet, do we?
14:53 imirkin_: which LDG may?
14:53 imirkin_: we might
14:53 imirkin_: i forget
14:53 kherbst: we don't
14:54 kherbst: but that's supposed to work anyway
14:54 kherbst: would have noticed earlier
14:54 imirkin_: oh
14:54 imirkin_: hm
14:54 imirkin_: ld e u8 $r0 g[$r0] 0x1
14:54 imirkin_: 8-bit load
14:54 imirkin_: that hasn't been exactly tried much in the past.
14:54 imirkin_: also 99% sure that the i2i after is implied
14:54 imirkin_: by the load
14:55 kherbst: should be fine
14:55 imirkin_: can you run this through nvdisasm and make sure it decodes to something similar?
14:55 kherbst: LD.E.U8 R0, [R0] ;
14:55 imirkin_: i mean the whole thing
14:56 kherbst: mhhh, I replaced the load by a constant and now it's not trapping anymore (but yeah, the nvdisasm output looked identical)
14:57 imirkin_: silly question... what's the literal value of r0 and r1 at that point?
14:57 kherbst: ld b64 $r0 c0[0x0]
14:57 imirkin_: yes
14:57 imirkin_: and what is that equal to
14:58 kherbst: yeah.. let me dump the data
14:58 imirkin_: thanks
14:58 imirkin_: i'm wondering if the address isn't weird somehow
14:58 imirkin_: and/or in that goddamn idiotic memory window space
14:58 kherbst: probably it is
14:58 kherbst: ohh it's the SVM pointer :)
14:58 imirkin_: right
14:59 imirkin_: but that still has to be a value that the GPU can understand and has mapped...
14:59 kherbst: it's mapped
14:59 kherbst: 0x44ffa000
14:59 kherbst: the entire address space is mapped
15:00 imirkin_: ok, pretty sure that's not where the window is
15:00 imirkin_: i think we stuck it at the top of 32-bit space
15:00 kherbst: the entire application address space is mapped
15:00 imirkin_: is the high word 0?
15:01 kherbst: ohh, I messed up my printf.. :D
15:01 kherbst: 0x7f5d26ca8000
15:02 imirkin_: ok, so that's within 48 bits, so that's good
15:03 imirkin_: that looks like the kind of address it's faulting at
15:03 imirkin_: so either it's not sufficiently mapped
15:03 imirkin_: or more needs to be done elsewhere
15:03 kherbst: well
15:03 kherbst: the pointer is a result of malloc essentially
15:04 kherbst: and that's supposed to work
15:04 kherbst: unless there is something funky going on
15:05 kherbst: ohhhhhh, the heck
15:05 kherbst: I see it now
15:05 kherbst: stupid pbuf vs pBuf
15:05 kherbst: it's not a SVM pointer
15:06 kherbst: the application creates a cl buffer object based on that, and the buffer gets mapped
15:06 kherbst: but I am sure there is no real buffer as everything just messes up internally
15:06 kherbst: uff
15:06 imirkin_: yeah, can't help you with that.
15:12 kherbst: okay.. passing in the SVM pointer directly just works
15:12 kherbst: okay.. so how does this buffer stuff work :)
15:12 imirkin_: cool!
15:12 kherbst: I only know that mapping the buffer should return the SVM pointer, but yeah
15:12 kherbst: this needs some additions
15:13 kherbst: imirkin_: anyway, something like this is needed: https://github.com/karolherbst/mesa/commit/4fa20aa0ba5e3b650c85fc24f8880124aa6b9171
15:14 kherbst: memcpy is kind of too much here...
15:14 imirkin_: ffs
15:14 imirkin_: you can just run over
15:14 kherbst: well
15:14 imirkin_: and push the full extra word
15:15 imirkin_: er
15:15 kherbst: yeah...
15:15 imirkin_: full extra bytes
15:15 kherbst: this commit is from april 2018 :D
15:17 kherbst: imirkin_: https://github.com/karolherbst/mesa/commit/232697a60720dcef7feffb829248a57bd34bbc83
15:17 imirkin_: sgtm
15:18 imirkin_: someone will bitch about accessing info->input past the end
15:18 imirkin_: and invariably it'll trigger some dumb pagefault
15:18 kherbst: it's an int32_t array, no?
15:18 imirkin_: i'd say the ST should ensure a word-size allocation
15:18 imirkin_: oh, it is? nice.
15:18 kherbst: mhh, let me check
15:18 imirkin_: i assumed void *, but what do i know
15:19 kherbst: yeah.. it's void*
15:21 kherbst: "std::vector<uint8_t> input" backs the pointer
15:21 imirkin_: bleargh
15:23 kherbst: at least shrink_to_fit is never called
15:32 kherbst: imirkin_: maybe we should just have a PUSH_DATAbp variant or something which does byte based addressing...
15:32 kherbst: sounds like the best fix here
15:34 imirkin_: sure
15:34 imirkin_: PUSH_DATAb :)
15:37 kherbst: :)
15:38 kherbst: we have that for the pre nv30 driver :)
15:38 kherbst: ohh no
15:38 kherbst: that's for boolean
15:40 kherbst: mhh... I think we can ignore that naming collision ...
15:40 kherbst: any better idea for a name?
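The byte-addressed push variant floated above could look roughly like the following. This is a minimal sketch, not the actual driver change: the name `toy_push_bytes` is hypothetical, and a plain `uint32_t` array stands in for the push buffer. It assumes the input size may not be a multiple of four, zero-pads the last word, and never reads past the end of the source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `size` bytes from `src` into the 32-bit word array `dst`,
 * zero-padding the final word when `size` is not a multiple of 4.
 * `dst` must have room for (size + 3) / 4 words.
 * Returns the number of words written.
 * Hypothetical helper for illustration only. */
static size_t toy_push_bytes(uint32_t *dst, const void *src, size_t size)
{
    size_t words = (size + 3) / 4; /* round up to whole words */

    if (words == 0)
        return 0;

    dst[words - 1] = 0;     /* clear the tail so padding bytes are zero */
    memcpy(dst, src, size); /* copies exactly `size` bytes, no overread */
    return words;
}
```

Compared with reading the input one word at a time, this keeps the source access within `size` bytes, which sidesteps the concern about touching a `std::vector<uint8_t>` past its end.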
15:58 kherbst: "FAILED 7 of 13 tests." progress..
17:41 uis: What arch is used in the nve4 falcon?
17:42 uis: RV?
17:44 karolherbst: nope, custom ISA
17:45 HdkR: Falcon ISA :)
17:46 uis: Is it supported by r2?
17:47 uis: radare2
17:49 HdkR: https://github.com/radareorg/radare2 Grepping falcon there didn't bring anything up
17:50 HdkR: https://github.com/aquynh/capstone/tree/master/arch and I don't see Falcon there
17:56 uis: And should nve4 falcon code be signed?
18:04 karolherbst: no
18:44 HdkR: karolherbst: Blah. Next time I'll make sure to order from amazon.de. Customs is terrible
18:46 karolherbst: :D
18:46 HdkR: Apparently the case hasn't even shipped yet
18:46 karolherbst: I also got some email internally..
18:46 HdkR: hah :D
18:47 karolherbst: I'd prefer to get around customs though
18:47 karolherbst: no idea what you could do on your end
18:47 HdkR: Yea, amazon.de will avoid that since they ship there
18:47 karolherbst: ohh, I meant for the current order
18:47 HdkR: ah
18:48 HdkR: Nah, I've had this problem in the past, typically means the receiver has to handle it except in extreme cases
18:50 HdkR: If there is anything I can help with let me know though
18:50 karolherbst: I'll take a look tomorrow, otherwise I pay the customs myself
18:51 karolherbst: shouldn't be more than 25% of the price or so
18:51 HdkR: Blah. This is why amazon charged me more. They should have handled that :/
18:51 HdkR: Need to slot a gift card in to the next one or something :P