00:46karolherbst: "atom add u32 %r36 g[%r2d+0x0] %r37" .. :)
00:46karolherbst: imirkin: I guess we don't do atomics on normal global memory yet?
00:47karolherbst: mhh.. but we do that on ssbos.. uhm..
00:47karolherbst: how does that work then
00:47karolherbst: ohh, it's just an assert
00:48karolherbst: seems like it should just work (tm)
00:52imirkin: yeah, we do atomics on g just fine
00:52imirkin: dunno which assert you're talking about
00:53karolherbst: ohh, it's CL stuff
00:53karolherbst: I hit something inside the atom lowering
00:54karolherbst: it doesn't handle getting global memory atoms directly yet
00:54imirkin: in CL, or in nv50_ir?
00:54imirkin: oh hm
00:54karolherbst: case FILE_MEMORY_GLOBAL: return true; fixes it :p
00:54imirkin: ah :)
00:54karolherbst: anyway, CTS test passes, so I guess that should be fine
00:54imirkin: shared memory is a bit trickier
00:55imirkin: esp pre-maxwell
00:55karolherbst: yeah.. but this is about global memory only now
00:55karolherbst: I run the SVM OpenCL CTS tests
00:55karolherbst: (just want to check if my implementation is actually correct)
00:58karolherbst: it's still a bit weird to do atomics on the GPU on malloced memory...
01:14karolherbst: ehh.. bug inside the CTS uff
01:15karolherbst: or... my bug
01:19karolherbst: my bug :)
13:37uis: What about NVE4 shader compiler?
13:58kherbst: uhh "gr: M2MF 80000002 [PUSH_NOT_ENOUGH_DATA]"
13:59imirkin: that means you've been very bad.
14:10kherbst: it seems to happen when enqueuing a kernel
14:12kherbst: imirkin: any idea what it might complain about? const buffer not getting enough data or something?
14:12kherbst: but.. that would be weird
14:15kherbst: ohh, the push buffers are quite small
14:16kherbst: imirkin: do we have a pushbuffer decoder somewhere?
14:16kherbst: a string based one
14:17imirkin: all the m2mf/p2mf stuff has to happen in one go
14:17imirkin: you can't split the command/data across pushbufs
14:18imirkin: which is why we have these big PUSH_SPACE things at the top
14:18imirkin: NOT_ENOUGH_DATA means you told it you'd give it X values, but gave it Y
14:18imirkin: the push data's aren't wrong in themselves (you'd get a diff error otherwise)
14:18imirkin: they just don't match
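(Editor's sketch of the count mismatch described above: a command header promises X data words, and the buffer delivers Y. The header layout used here, with the count in the upper 16 bits, and the names `toy_header` and `toy_find_short_command` are hypothetical and chosen only for illustration; this is not the real hardware pushbuffer encoding and not a Mesa or libdrm API.)

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy encoding, NOT the real hardware format:
 * bits 31..16 = number of data words that follow, bits 15..0 = method. */
static uint32_t toy_header(uint16_t count, uint16_t method)
{
    return ((uint32_t)count << 16) | method;
}

/* Walk the buffer command by command. Returns -1 if every header is
 * followed by as many data words as it promises, otherwise the index
 * of the first header whose data runs past the end of the buffer. */
static long toy_find_short_command(const uint32_t *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        size_t count = buf[i] >> 16;

        /* Words remaining after this header: len - i - 1. */
        if (count > len - i - 1)
            return (long)i;

        i += 1 + count; /* skip the header and its data */
    }
    return -1;
}
```

(A checker like this only catches a command that is cut off at the end of one buffer, which is the case imirkin describes for commands split across pushbufs; it says nothing about whether the data words themselves are valid.)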
14:23kherbst: "0x1000f010" is the last entry
14:23kherbst: but that looks fine as well
14:23kherbst: so the push buffer itself contains less than the hardware got told
14:27kherbst: ohh wait, that's just part of the last command anyway
14:30kherbst: imirkin: any idea on how to debug that besides checking that we fill them up correctly?
14:35kherbst: it's that error
14:36kherbst: imirkin: https://github.com/karolherbst/mesa/commit/4fa20aa0ba5e3b650c85fc24f8880124aa6b9171
14:37kherbst: mhh.. still getting an error :/
14:38kherbst: ohh, that's just a runtime bug I think
14:41kherbst: wait.. that error disappeared
14:41kherbst: now I still have a different one: gr: GPC0/TPC0/TEX: 80000041
14:43kherbst: okay.. this is some CFG stuff I mess up
14:44kherbst: uff.. yes
14:44kherbst: the shader is totally bonkers
14:46kherbst: or maybe that's fine?
14:46kherbst: I am worried about the ssy stuff
14:49kherbst: "set u8 $p0 eq s8 $r0 c0[0x10]"
14:49kherbst: yeah, that's not gonna fly
14:50imirkin_: i recently added some support for PSETP
14:50imirkin_: (not pushed yet)
14:50imirkin_: although that's not quite what you want
14:51kherbst: ohh, the problem is rather the 8 bit operation?
14:51kherbst: it's real 8 bit data
14:51kherbst: not a 1 bit predicate
14:51imirkin_: right, so PSETP is not quite what you want :)
14:51kherbst: yeah.. but that's still not causing the issue
14:51imirkin_: ok. probably not helping though.
14:52imirkin_: what else is wrong?
14:52kherbst: now I have this: https://gist.githubusercontent.com/karolherbst/1df791c3fc512458982851b36a4098cf/raw/bfa41a8c99838f7bca90847b017e36ec2decd168/gistfile1.txt
14:52kherbst: "gr: GPC0/TPC0/TEX: 80000041" and "fifo: fault 00 [READ] at 00007f675a333000 engine 00 [GR] client 01 [GPC0/T1_0] reason 00 [PDE] on channel 2 [00ffbae000 test_svm]"
14:52kherbst: out-of-bounds memory read maybe?
14:52imirkin_: does the LD become a LDG?
14:53imirkin_: T1_0 i think means "texture"
14:53imirkin_: or something that happens via the texture data path
14:53kherbst: we don't do ldg/stg yet, do we?
14:53imirkin_: which LDG may?
14:53imirkin_: we might
14:53imirkin_: i forget
14:53kherbst: we don't
14:54kherbst: but that's supposed to work anyway
14:54kherbst: would have noticed earlier
14:54imirkin_: ld e u8 $r0 g[$r0] 0x1
14:54imirkin_: 8-bit load
14:54imirkin_: that hasn't been exactly tried much in the past.
14:54imirkin_: also 99% sure that the i2i after is implied
14:54imirkin_: by the load
14:55kherbst: should be fine
14:55imirkin_: can you run this through nvdisasm and make sure it decodes to something similar?
14:55kherbst: LD.E.U8 R0, [R0] ;
14:55imirkin_: i mean the whole thing
14:56kherbst: mhhh, I replaced the load by a constant and now it's not trapping anymore (but yeah, the nvdisasm output looked identical)
14:57imirkin_: silly question... what's the literal value of r0 and r1 at that point?
14:57kherbst: ld b64 $r0 c0[0x0]
14:57imirkin_: and what is that equal to
14:58kherbst: yeah.. let me dump the data
14:58imirkin_: i'm wondering if the address isn't weird somehow
14:58imirkin_: and/or in that goddamn idiotic memory window space
14:58kherbst: probably it is
14:58kherbst: ohh it's the SVM pointer :)
14:59imirkin_: but that still has to be a value that the GPU can understand and has mapped...
14:59kherbst: it's mapped
14:59kherbst: the entire address space is mapped
15:00imirkin_: ok, pretty sure that's not where the window is
15:00imirkin_: i think we stuck it at the top of 32-bit space
15:00kherbst: the entire application address space is mapped
15:00imirkin_: is the high word 0?
15:01kherbst: ohh, I messed up my printf.. :D
15:02imirkin_: ok, so that's within 48 bits, so that's good
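(Editor's sketch of the two checks discussed here: whether a 64-bit pointer fits in 48 bits, and how to print its high and low words without truncating it in printf. The helper names `fits_in_48_bits` and `format_addr` are hypothetical; the 48-bit limit is taken from imirkin's remark above, not from any hardware documentation.)

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* True if no bit at position 48 or above is set. */
static bool fits_in_48_bits(uint64_t addr)
{
    return (addr >> 48) == 0;
}

/* Format the address as separate high and low 32-bit words.
 * Using the PRIx32 macros avoids the classic mistake of passing a
 * 64-bit value to a plain "%x" conversion. Returns snprintf's result. */
static int format_addr(char *out, size_t n, uint64_t addr)
{
    return snprintf(out, n, "hi=0x%08" PRIx32 " lo=0x%08" PRIx32,
                    (uint32_t)(addr >> 32), (uint32_t)addr);
}
```

(For the faulting address from the fifo message earlier in the log, 0x00007f675a333000, this reports a non-zero high word that still fits within 48 bits.)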
15:03imirkin_: that looks like the kind of address it's faulting at
15:03imirkin_: so either it's not sufficiently mapped
15:03imirkin_: or more needs to be done elsewhere
15:03kherbst: the pointer is a result of malloc essentially
15:04kherbst: and that's supposed to work
15:04kherbst: unless there is something funky going on
15:05kherbst: ohhhhhh, the heck
15:05kherbst: I see it now
15:05kherbst: stupid pbuf vs pBuf
15:05kherbst: it's not a SVM pointer
15:06kherbst: the application creates a cl buffer object based on that, and the buffer gets mapped
15:06kherbst: but I am sure there is no real buffer as everything just messes up internally
15:06imirkin_: yeah, can't help you with that.
15:12kherbst: okay.. passing in the SVM pointer directly just works
15:12kherbst: okay.. so how does this buffer stuff work :)
15:12kherbst: I only know that mapping the buffer should return the SVM pointer, but yeah
15:12kherbst: this needs some additions
15:13kherbst: imirkin_: anyway, something like this is needed: https://github.com/karolherbst/mesa/commit/4fa20aa0ba5e3b650c85fc24f8880124aa6b9171
15:14kherbst: memcpy is kind of too much here...
15:14imirkin_: you can just run over
15:14imirkin_: and push the full extra word
15:15imirkin_: full extra bytes
15:15kherbst: this commit is from april 2018 :D
15:17kherbst: imirkin_: https://github.com/karolherbst/mesa/commit/232697a60720dcef7feffb829248a57bd34bbc83
15:18imirkin_: someone will bitch about accessing info->input past the end
15:18imirkin_: and invariably it'll trigger some dumb pagefault
15:18kherbst: it's an int32_t array, no?
15:18imirkin_: i'd say the ST should ensure a word-size allocation
15:18imirkin_: oh, it is? nice.
15:18kherbst: mhh, let me check
15:18imirkin_: i assumed void *, but what do i know
15:19kherbst: yeah.. it's void*
15:21kherbst: "std::vector<uint8_t> input" backs the pointer
15:23kherbst: at least shrink_to_fit is never called
15:32kherbst: imirkin_: maybe we should just have a PUSH_DATAbp variant or something which does byte based addressing...
15:32kherbst: sounds like the best fix here
15:34imirkin_: PUSH_DATAb :)
15:38kherbst: we have that for the pre nv30 driver :)
15:38kherbst: ohh no
15:38kherbst: that's for boolean
15:40kherbst: mhh... I think we can ignore that naming collision ...
15:40kherbst: any better idea for a name?
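(Editor's sketch of the byte-based idea discussed above: pack an arbitrary number of bytes into 32-bit words and zero-pad the final partial word, so nothing is read past the end of the source buffer. `pack_bytes_padded` is a hypothetical standalone helper, not the PUSH_DATAb/PUSH_DATAbp macro from the discussion and not an existing Mesa function; a real variant would write into the pushbuffer rather than a caller-supplied array.)

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `size` bytes from `src` into the word
 * array `dst`, which must have room for (size + 3) / 4 words.
 * Returns the number of 32-bit words written. */
static size_t pack_bytes_padded(uint32_t *dst, const void *src, size_t size)
{
    const uint8_t *bytes = src;
    size_t full = size / 4; /* complete words */
    size_t rest = size % 4; /* leftover bytes, 0..3 */

    if (full)
        memcpy(dst, bytes, full * 4);

    if (rest) {
        /* Build the last word in a zeroed temporary so that only the
         * bytes that really exist in `src` are ever read. */
        uint32_t tail = 0;
        memcpy(&tail, bytes + full * 4, rest);
        dst[full++] = tail;
    }
    return full;
}
```

(The trade-off compared with "just run over and push the full extra word" is one small extra copy for the tail, in exchange for never touching memory beyond the end of the `std::vector<uint8_t>` that backs the input.)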
15:58kherbst: "FAILED 7 of 13 tests." progress..
17:41uis: What arch is used in the nve4 falcon?
17:44karolherbst: nope, custom ISA
17:45HdkR: Falcon ISA :)
17:46uis: Is it supported by r2?
17:49HdkR: https://github.com/radareorg/radare2 Grepping falcon there didn't bring anything up
17:50HdkR: https://github.com/aquynh/capstone/tree/master/arch and I don't see Falcon there
17:56uis: And should nve4 falcon code be signed?
18:44HdkR: karolherbst: Blah. Next time I'll make sure to order from amazon.de. Customs is terrible
18:46HdkR: Apparently the case hasn't even shipped yet
18:46karolherbst: I also got some email internally..
18:46HdkR: hah :D
18:47karolherbst: I'd prefer to get around customs though
18:47karolherbst: no idea what you could do on your end
18:47HdkR: Yea, amazon.de will avoid that since they ship there
18:47karolherbst: ohh, I meant for the current order
18:48HdkR: Nah, I've had this problem in the past, typically means the receiver has to handle it except in extreme cases
18:50HdkR: If there is anything I can help with let me know though
18:50karolherbst: I'll take a look tomorrow, otherwise I pay the customs myself
18:51karolherbst: shouldn't be more than 25% of the price or so
18:51HdkR: Blah. This is why amazon charged me more. They should have handled that :/
18:51HdkR: Need to slot a gift card in to the next one or something :P