00:14 karolherbst: mhh
00:14 karolherbst: if I increase the WrDepBar value by 1, it seems to be fixed
00:20 imirkin_: problem solved!
00:20 karolherbst: uhhhhh
00:20 imirkin_: ;)
00:20 karolherbst: findFirstUse doesn't check for the Flags
00:20 karolherbst: only GPR and predicate
00:20 imirkin_: heh
00:20 karolherbst: and my guess is, it should check for flags as well
00:21 karolherbst: and it only checks def(0) as well
00:21 karolherbst: which also seems kind of wrong
00:21 karolherbst: what if there are two defs
00:23 imirkin_: then the second def is a flag ;)
00:23 karolherbst: always?
00:23 imirkin_: probably not
00:23 imirkin_: but usually
00:23 karolherbst: ;)
00:23 karolherbst: well
00:24 karolherbst: for that code, even one exception might cause troubles nobody can debug, because of stuff like this
00:24 imirkin_: yeah, i wouldn't be money on that statement :)
00:24 imirkin_: but it's probably true.
00:27 karolherbst: mhh
00:38 karolherbst: imirkin_: how can I check if a Value of FILE_FLAGS is writing the CC bit?
00:38 karolherbst: or well, or how can I check if two Values of FILE_FLAGS are considered equal?
00:39 imirkin_: easy. if flagsDef >= 0, then it's writing CC
00:39 imirkin_: if flagsSrc >= 0, it's reading CC
00:39 karolherbst: :D okay
00:39 imirkin_: for nv50, there are actually multiple flags registers
00:39 imirkin_: but for nvc0+, it's just the one.
00:39 karolherbst: fixed
00:40 karolherbst: yay
00:40 karolherbst: and I don't even see that as a hack
00:40 karolherbst: wow
00:40 karolherbst: maybe not correct, but not a hack either
00:40 karolherbst: imirkin_: https://github.com/karolherbst/mesa/commit/61d25776c7dae762f1290a6ef871ccd1a64e3335
00:41 karolherbst: hakzsam: does that sound like something we have to do for you? ^
00:41 karolherbst: uhm, phrasing...
00:41 karolherbst: *to you
00:42 imirkin_: sounds reasonable to me.
00:42 karolherbst: well right, but it isn't correct
00:42 karolherbst: well I should be more precise in the FILE_FLAGS case
00:42 karolherbst: but
00:42 karolherbst: even if they write to different bits...
00:43 karolherbst: I mean you don't want to have reads from there if something writes into it, right?
00:43 imirkin_: there's only one flag.
00:43 karolherbst: ohh, okay
00:43 imirkin_: on nv50 it's a more complex register
00:43 karolherbst: lol
00:43 karolherbst: "-Sigyn- Your actions in #nouveau tripped automated anti-spam measures, but were ignored based on your time in channel; stop now, or automated action will still be taken. If you have any questions, please don't hesitate to contact a member of staff"
00:43 imirkin_: but on nvc0+, it's a single-bit thing
00:43 imirkin_: yeah, you talk too much.
00:43 karolherbst: ....
00:43 imirkin_: :p
00:43 karolherbst: like I care
00:44 imirkin_: well, you'll care when you're banned from freenode
00:44 pmoreau: Wait, how did you manage to trigger that? karolherbst :-D
00:44 imirkin_: probably all the URLs he pastes
00:44 imirkin_: about how to make tons of money while working from home
00:44 karolherbst: :D
00:44 pmoreau: @-D
00:44 pmoreau: :-D
00:45 karolherbst: like that message already assumes it is right... "stop now", yeah, like I would...
00:46 karolherbst: anyway, I should sleep, I worked too much today
04:20 TheXzoron: I just tried to swap from my gtx 970 on the proprietary drivers to a gtx 780ti with nouveau but I can't get direct rendering
04:20 TheXzoron: I'm guessing a package wasn't cleanly uninstalled or uninstalled
04:20 TheXzoron: I have mesa and nouveau
04:21 TheXzoron: but glxinfo spits Xlib: extension "GLX" missing on display ":0" and Error: couldn't find RGB GLX visual or fbconfig
04:22 imirkin: yeah, that most likely means you're loading nvidia's libglx
04:22 imirkin: pastebin xorg log?
04:22 TheXzoron: https://zoronunderground.club/f/Xorg.0.log
04:22 TheXzoron: yeah it looks like it
04:22 imirkin: [ 624.756] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
04:23 imirkin: [ 624.758] (II) Module glx: vendor="NVIDIA Corporation"
04:24 TheXzoron: should I just delete this file
04:24 TheXzoron: oh wait nvidia-libs is still there
04:24 TheXzoron: guess I missed that package
04:26 imirkin: well, you still want a libglx.so, just one supplied by xorg
04:27 imirkin: depending on your distro, that's a symlink somewhere
04:27 TheXzoron: yeah I'm installing libGL libEGL and libGLES right now after force removing and seeing what they needed
04:28 imirkin: libglx is part of xorg
04:28 imirkin: it's installed as part of xorg-server on my distro
04:30 TheXzoron: it was for me aswell but when I originally intended on using the non-free driver I overwrote with nividia's implementations
04:30 TheXzoron: well replaced the package
04:33 TheXzoron: Now for a non-improper nvidia driver uninstallation related problem. The monitor I have attached to my displayport registers but does not show any output on the device itself
04:34 TheXzoron: I have had problems with it consantly but typically after selecting the input it would resolve itself
04:34 TheXzoron: just a minor annoyance on first startup
04:34 TheXzoron: that doesn't work now
04:35 TheXzoron: not sure if the displayport input on the monitor is bad or just bad cable
04:36 TheXzoron: either way it's odd that with nouveau it doesn't even fix itself
04:36 TheXzoron: like with the proprietary driver it would flash then go back to saying "no input"
04:37 TheXzoron: then after selecting it again it would turn on
04:38 TheXzoron: *selecting on the monitor
04:39 TheXzoron: any idea or is this not worth looking into?
04:40 imirkin: you could manually turn it off and on
04:40 imirkin: do you see link training failures in the logs?
04:40 imirkin: DP is well-known for being finicky
04:40 imirkin: (and not just with nouveau)
04:41 TheXzoron: logs in dmesg or xorg or both?
04:41 imirkin: dmesg
04:43 TheXzoron: https://zoronunderground.club/f/dmesg.txt
04:43 TheXzoron: not that I can tell
04:44 imirkin: yea looks fine
04:46 TheXzoron: I guess I'll use the dvi port for now as I have a displayport to dvi adapter
04:47 TheXzoron: when I used to dual boot windows with my gtx 970 it got to the point this monitor just would never work with displayport at all
04:47 TheXzoron: but on linux with the blob drivers it would be fine after selecting an input
04:49 TheXzoron: Wish I could have some understanding of why this was
05:40 TheXzoron: https://zoronunderground.club/f/xorg.conf
05:40 TheXzoron: is there something else that would be good to add into my xorg.conf
05:40 TheXzoron: this is all that seemed useful to me in the man page
05:41 TheXzoron: DRI is redundant now that I have glamor
05:43 TheXzoron: which might I add should be added in the man page as it's open mentioned in the DRI section
05:46 airlied: if you want to use glamor I'd just use the modeseting driver and then you don't need an xorg.conf
05:50 TheXzoron: how would I do that?
05:52 airlied: TheXzoron: deinstall nouveau or set driver "modesetting" in the xorg.conf
05:55 TheXzoron: now I don't know much about this kind of stuff
05:56 TheXzoron: but would modesetting mean worse performance since it's no accellerated
05:58 airlied: TheXzoron: it uses glamor for accel
06:03 TheXzoron: which is accelerated by the gpu?
06:04 TheXzoron: I dunno I guess I need better knowledge of how all these things work together
06:07 TheXzoron: What would be something to read to really get a grip on this sort of thing
06:07 airlied: TheXzoron: well you were selecting glamor for the nouveau driver
06:07 airlied: modesetting uses the same accel
06:08 TheXzoron: I dunno I googled it and it seemed like that method was to replace the other method
06:08 airlied: it's a generic acceleration that uses the 3D driver
06:12 TheXzoron: but what I'm trying to understand is why modesetting which the man page says is not accelerated would be preferable to nouveau for a driver
06:13 airlied: TheXzoron: the manpage is wrong
06:18 TheXzoron: oh
06:18 TheXzoron: so what would be the difference then between the two driver definitions
06:18 airlied: nouveau's glamor support probably isn't as good as modesetting's
06:19 airlied: so if you want to use glamor, it's usually a good idea to use modesetting
06:20 skeggsb: nouveau's ddx doesn't even support glamor anymore...
06:21 skeggsb: exactly because of that reason ;)
06:23 TheXzoron: ah so nouveau never completed the glamor implementation because modesetting works just as well as it would? if I get this right
06:25 airlied: pretty much
06:31 TheXzoron: I see
06:31 TheXzoron: I found the article and bugzilla with the discussion now
06:38 TheXzoron: so I take it I do not even the the xf86-video-nouveau now as that's for much older hardware than my 780ti
06:38 TheXzoron: so I could just uninstall the package so it doesn't load
06:39 airlied: should work
06:43 TheXzoron: now is there a directive I could pass into xorg.conf to remove tearing or is a compositor better https://zoronunderground.club/f/xorg.conf
08:58 chillfan: Any recent improvements for kepler? Just checking really, I find it to be working very well with the later versions of mesa
09:23 pmoreau: karolherbst: Testing in which order the BBs were traversed and fix the forward to cross was what I needed to check as well yesterday. :-/
09:24 karolherbst: pmoreau: right, but that involves a feature which isn't currently inside mesa anyway ;) So no pressure on those ones
09:24 pmoreau: karolherbst: Which feature is that?
09:25 karolherbst: spirv to nvir?
09:25 pmoreau: Yeah OK, but I have access to that feature, surprisingly :-p
09:26 karolherbst: well, here is how I see this issue: either we have a bug inside codegen if we assume correct DFS semantics or your code didn't corespond to codegens form of DFS
09:26 pmoreau: And I would love to be able to get loops/control flow working properly as those are used in a couple of CTS tests.
09:27 karolherbst: right
09:28 karolherbst: but still, there is no pressure on this. In the worst case I just look into it after I took care of all the other issues :p
09:28 pmoreau: :-D
09:29 hakzsam: karolherbst: could be, yeah
09:30 karolherbst: hakzsam: well, I am sure we a least have to iterate over all defs here, not just 0 in any case
09:30 karolherbst: *at
09:33 hakzsam: you should check what blob is doing for the carry flag also
09:34 hakzsam: if I remember correctly, I have never seen any barriers for that, but I could be wrong
09:35 karolherbst: well, it seems to help at least
09:36 hakzsam: maybe you are just lucky because you are inserting extra barrierS?
09:36 karolherbst: all the extra barriers I insert are for the writes to CC
09:36 karolherbst: I checked with just the loop
09:36 karolherbst: and this didn't fix it
09:37 karolherbst: and yeah
09:37 karolherbst: inserting extra barriers is kind of the motivation of that patch in the first place, right?
09:38 hakzsam: yeah but maybe it's just useless and the issue is somewhere else
09:38 hakzsam: would be very interesting to check blob first
09:38 karolherbst: hakzsam: well this is the issue mainly: if there are isntructions a, b and c, and a writes to a src of c, b writes to CC and c uses CC
09:38 karolherbst: in the current code, there is no dependency considered between b and c
09:38 karolherbst: this sounds horriby wrong
09:38 karolherbst: c has to wait until b writes into CC
09:39 karolherbst: I don't see how it could work any other way
09:39 karolherbst: I could check what the blob does, though
09:39 hakzsam: so, just for the latency to be 6 when b writes to CC
09:39 hakzsam: *force
09:40 karolherbst: that sounds like a hack
09:40 hakzsam: one thing you might want to do is to force all instructions to wait 15 cycles
09:40 karolherbst: well, it didn't help
09:40 karolherbst: it only help when I changed the write barrier to max for all
09:40 karolherbst: that's how I even came to this patch in the first place
09:41 hakzsam: I see but does the patch fix the issue?
09:41 karolherbst: yes
09:43 hakzsam: I'm still a bit suprised that we need a barrier for the CC flag to be honest
09:43 karolherbst: why?
09:43 karolherbst: I mean you have to wait until the instruction writes to it right?
09:43 hakzsam: because barriers are for instructions that don't have a fixed latency
09:43 karolherbst: you could have a situation where an instructions srcs are written to 5 or 6 instructions ago, and CC in insn->prev
09:43 hakzsam: all ALU ops have a fixed latency, ie 6
09:44 hakzsam: can't you double check with blob?
09:44 hakzsam: I know that is not as simple because blob is re-ordering instructions
09:45 karolherbst: well, I will check first, what changes in the mesa produce code, maybe it is just changing the wait indeed
09:45 karolherbst: ohh, I have some ptx code which may apply here
09:45 karolherbst: or produces usable sass
09:46 karolherbst: hakzsam: in mesa
09:46 karolherbst: old: 000000c0: 60460226 015f8401 sched (st 0x6 wr 0x1 rd 0x2 ru 0x3) (st 0x2 wr 0x0 rd 0x3 wt 0x1) (st 0x1 wt 0xa)
09:46 karolherbst: new: 000000c0: 20460226 005f8403 sched (st 0x6 wr 0x1 rd 0x2 ru 0x3) (st 0x2 wr 0x0 rd 0x1 wt 0x3) (st 0x1 wt 0x2)
09:47 karolherbst: instructions: https://gist.githubusercontent.com/karolherbst/3293353d7df91d8136fe3188a8824b2b/raw/8036efa0146266e1b8d61ef6cba83bcd97345f4c/gistfile1.txt
09:50 hakzsam: karolherbst: this is because imad requires barriers
09:50 hakzsam: unrelated to cc (I think)
09:50 karolherbst: mhh
09:50 hakzsam: or my first analysis was wrong and this is why imad requires barriers
09:50 karolherbst: well the second imad depends on the first one, right?
09:50 hakzsam: because all situations need the cc flag
09:51 hakzsam: sure
09:51 karolherbst: and the current code doesn't keep track of this dependency, afaik
09:52 karolherbst: and this method is only checked if need_wr_bar is true in the first place
09:53 hakzsam: this is a good point for you :)
09:53 karolherbst: I have some cubin file with "IADD R8.CC, R8, R9; IADD.X R3, R3, RZ"
09:53 hakzsam: but again, only blob can tell us what we should do
09:53 karolherbst: but I don't know how to extract the sass and put it through envydis
09:53 karolherbst: or print the sched opcodes in nvdisasm
09:53 hakzsam: I don't remember
09:53 hakzsam: sorry, I have other things to do right now, but I could have a second look later on
09:55 karolherbst: meh...
09:56 karolherbst: on opt level 0 all the scheds have this: sched (st 0xf wt 0x3) (st 0xf wt 0x3) (st 0xf wt 0x3)
09:56 karolherbst: or most have
09:57 karolherbst: and on 1 it uses xmad and no .CC stuff anymore
10:05 karolherbst: it seems like for iadd they only use st, but I just don't get any imads, because they are always opting it into xmad
10:06 karolherbst: ohh, now I did
10:06 karolherbst: hakzsam: if this helps, it is on opt level 0 though: https://gist.githubusercontent.com/karolherbst/8c722bbd071332412ab6949b94c2a296/raw/34b7e5efd089a93f3a759e4cc81749ea3cd12618/gistfile1.txt
10:07 karolherbst: seems like hardcoded default scheds values to me though
10:08 karolherbst: but with that patch we are more close to those values in the first place
10:21 karolherbst: mhh, dual issueing on maxwell, still a todo
10:43 karolherbst: pmoreau: are you aware of any fails int he basic test? it takes a while to run that
10:50 pmoreau: karolherbst: Yes I am. (I feel like I am repeating myself, but) there are multiple tests within basic that use loops, and given the existing bug, that makes the GPU hang for a bit each time before it recovers.
10:51 pmoreau: Hence why I want that bug to be fixed relatively soon ;-)
10:51 karolherbst: ahh right
10:51 pmoreau: karolherbst: Also, I track the ones that succeed here: https://trello.com/c/mHuV36bJ/20-get-testbasic-from-the-opencl-cts-to-pass
10:52 karolherbst: ohh okay
10:52 karolherbst: yeah it just takes ages to run those...
10:52 pmoreau: loop, local_arg_def, local_kernel_def and local_kernel_scope fail because of the loop bug. The constants one, I haven’t tried, and the images one won’t work since images aren’t supported.
10:52 karolherbst: but now I compile stuff optimized and hope that at least the compilation is much faster
10:52 pmoreau: There are some more that aren’t in this list, which I haven’t even tried.
10:56 karolherbst: ohhh yeah
10:56 karolherbst: now it is fast
10:56 karolherbst: like 10x
11:32 pmoreau: karolherbst: Did you comment those tests out?
12:15 karolherbst: pmoreau: I changed the cflags to -O3
12:15 pmoreau: OK
12:16 pmoreau: It was in debug mode before I guess?
12:16 karolherbst: yeah
12:16 karolherbst: currently running every test now
12:23 karolherbst: pmoreau: 35 tests pass in total
12:23 pmoreau: Of basic or the whole CTS?
12:24 karolherbst: basic_test
12:24 pmoreau: Could you please give a link to the run output please?
12:24 karolherbst: https://gist.githubusercontent.com/karolherbst/25cd2fd77ce8989dc73ab09308d40dfb/raw/87d51ac4f415b9cc5c8889718472f47fef483cfe/tmp.patch
12:24 karolherbst: those tests I have disabled
12:24 karolherbst: well
12:24 karolherbst: those are the tests which fail
12:24 pmoreau: 35, nice! So there are 21 more that passes than what I did test.
12:26 karolherbst: well some of the fails are due to issues within the spirv to nvir pass, some are RA caused
12:26 karolherbst: and I think this covers around 75% of those fails
12:27 karolherbst: stuff like "Unsupported builtin 33"
12:27 pmoreau: get_global_offset is something new in CL1.2 IIRC, so it needs to be added to clover first, but should be easy to do so I think.
12:27 pmoreau: What was wrong with the kernel_memory_alignment_* ones?
12:28 karolherbst: Unsupported opcode 117
12:28 karolherbst: for local
12:28 pmoreau: None of the image ones are going to work for now, which represents quite a few of the tests you commented out.
12:29 pmoreau: OpConvertPtrToU
12:29 karolherbst: same for the other two
12:29 karolherbst: mhh
12:29 karolherbst: OpConvertPtrToU sounds trivial to implement, right?
12:29 pmoreau: Shouldn’t be too hard to add
12:29 pmoreau: Yeah
12:29 karolherbst: well, I will try to find codegen issues first
12:30 pmoreau: And I expect it also uses OpConvertUToPtr, which also needs to be implemented.
12:30 karolherbst: mhh
12:30 karolherbst: global_work_offsets doesn't print any compiler errors
12:30 karolherbst: ERROR: Map value (0,0,1) was erroneously returned ...
12:30 pmoreau: I am not sure what that one is doing.
12:31 karolherbst: calculations as it seems
12:31 karolherbst: reads from c7
12:32 karolherbst: pmoreau: https://github.com/awatry/OpenCL-CTS/blob/cl12_trunk/test_conformance/basic/test_global_work_offsets.cpp#L20
12:33 karolherbst: uhhhhh
12:33 pmoreau: global_work_offset is the one I was thinking of, but it looks like it is an OpenCL 1.1 feature, not 1.2.
12:33 karolherbst: well, it seems like some weird things are happening
12:34 pmoreau: So, I might be the one that failed to implement it.
12:34 pmoreau: https://phabricator.pmoreau.org/diffusion/MESA/browse/nouveau_spirv_support/src/gallium/drivers/nouveau/codegen/nv_ir_from_spirv.cpp;a38ca48958316c23ccd93a34f1eeb91c0e876c58$3304
12:35 karolherbst: uhm... yeah
12:35 pmoreau: But I don’t know if we are configuring GID_OFF before launching the kernel, or just passing (0,0,0) to it.
12:37 karolherbst: [ERROR] At word No.22: "ID 12 has not been defined"
12:37 karolherbst: there are also a few errors like this in some tests
12:37 pmoreau: Ok, and the get_global_offset test just retrieves those offset. So, those are two tests that can be easily fixed.
12:38 karolherbst: Unsupported opcode 42
12:38 karolherbst: in kernel_preprocessor_macros
12:39 karolherbst: Unsupported opcode 157 in kernel_limit_constants
12:39 pmoreau: That sounds like the validator complaining (The ”[ERROR]” message). Maybe it didn’t recognise one of the instructions, and therefore couldn’t track its output which was being used somewhere else?
12:40 pmoreau: Opcode 42 == OpConstantFalse, should be easy to add it as well as OpConstantTrue.
12:40 karolherbst: ahh nice
12:40 karolherbst: where are the opcode thingies?
12:41 karolherbst: parameter_types: https://gist.githubusercontent.com/karolherbst/6dba590fd348932463a94870ab66c9ea/raw/cada31523fba5075b587228dcbc24749a4f0eb43/gistfile1.txt
12:41 karolherbst: that is the one with ERROR
12:41 pmoreau: Opcode 157 == OpIsInf. Maybe easy to add? Dunno if there is a hardware instruction for that, or just do some manual testing.
12:41 pmoreau: https://www.khronos.org/registry/spir-v/specs/1.0/SPIRV.htm has all the opcodes
12:41 karolherbst: I meant in your code
12:42 karolherbst: well, I will find it
12:42 pmoreau: You mean that loop? https://phabricator.pmoreau.org/diffusion/MESA/browse/nouveau_spirv_support/src/gallium/drivers/nouveau/codegen/nv_ir_from_spirv.cpp;a38ca48958316c23ccd93a34f1eeb91c0e876c58$1680
12:42 pmoreau: s/loop/switch
12:42 karolherbst: right
12:42 pmoreau: Could you dump the SPIR-V for parameter_types please?
12:43 karolherbst: how?
12:43 pmoreau: CLOVER_DEBUG=spirv CLOVER_DEBUG_FILE=somefile
12:44 pmoreau: That will give you a somefile.spvasm (not 100% sure about the extension
12:44 karolherbst: pmoreau: https://gist.githubusercontent.com/karolherbst/a16c5355bec43884c0326f90a63a514c/raw/f10c6d2d82d54f72b04909acc3934621dc139f43/gistfile1.txt
12:46 karolherbst: pmoreau: that spirv spec site returns 404 for me
12:46 pmoreau: Add an “l” at the hand, I failed when copy/pasting
12:47 pmoreau: *end
12:47 pmoreau: Gosh, I can’t type
12:52 karolherbst: ahh, okay, nice
12:53 karolherbst: "A function (10) may not be targeted by both an OpEntryPoint instruction and an OpFunctionCall instruction."
12:53 karolherbst: well, apperantly it can
12:58 imirkin: TheXzoron: yeah, if you want glamor, use modesetting. it'll use the GL driver for accel. i'd recommend sticking to the nouveau DDX though because it's a much simpler and more stable piece of code.
12:58 karolherbst: pmoreau: one test also needs the builtin WorkDim
13:05 TheXzoron: imirkin: so using the nouveau driver is better performant if i don't use glamor?
13:05 TheXzoron: or just more stable as my computer just had a kernel panic after getting up this morning
13:06 imirkin: TheXzoron: no idea about performance -- imho that doesn't really matter since you have an overpowered GPU for what X needs to handle
13:06 imirkin: TheXzoron: should be more stable though
13:06 imirkin: skeggsb: it'd be nice if you could test my DP-MST support patches for the nouveau ddx btw
13:06 TheXzoron: Well I Was running f-zero GX in dolphin-emu and it seemed to be alot more slow than the blob was
13:07 TheXzoron: like it couldn't maintain 60 with no enhancements at the opening
13:07 imirkin: TheXzoron: that has nothing to do with what DDX you're running
13:08 imirkin: TheXzoron: are you reclocking your GPU? if not, it boots to lowest perf level
13:08 TheXzoron: I saw a boost by clocking it to 0F but it was still choppy
13:09 TheXzoron: like more often on the opening it was below 60
13:09 imirkin: ok
13:09 imirkin: well - the expectation is that nouveau performs somewhere between 50-80% of the blob, clock-for-clock
13:09 imirkin: depending on the game
13:10 imirkin: skeggsb: in case you lost the link, https://people.freedesktop.org/~imirkin/patches/0001-drmmode-update-logic-for-dynamic-connectors-paths-an.patch
13:13 TheXzoron: I expected it to be not as performant but able to play wii/gc w/o major enhancements but I guess not. so I guess this is another thing that will go to the passthrough gnulinux VM
13:14 TheXzoron: might be more ideal to get a lower power gpu for monitors as this gpu has a large power draw
13:15 karolherbst: pmoreau: mhh, that opencl to spir-v compiler thing also has a few bugs here and there
13:15 karolherbst: input.cl:6:16: error: call to 'min' is ambiguous
13:16 karolherbst: min(x[tid], y[tid]);
13:16 karolherbst: __kernel void test_fn(__global double *x, __global double *y, __global double *dst)
13:16 karolherbst: it calls min(double, double), no idea why the compiler doesn't use this
13:16 karolherbst: ohh, maybe there is no double
13:17 karolherbst: due to cl_khr_fp64
13:24 karolherbst: imirkin: ohh by the way, the CTS kind of requires the driver to handle allocations up to 100GB or more
13:25 imirkin: lol
13:26 karolherbst: wait, did I say 100GB?
13:26 karolherbst: that wasn't correct, this is: Attempting to allocate a 983040MB array and fill with blocking writes.
13:27 karolherbst: well it also tries 1.04858e+06MB, but gives up about that
13:27 karolherbst: bo_new(0, 0): -22. smart bo_new
13:28 karolherbst: I think this depends on CL_DEVICE_MAX_MEM_ALLOC_SIZE and CL_DEVICE_GLOBAL_MEM_SIZE
13:28 karolherbst: and we might not want to expose too high numbers there
13:28 karolherbst: dunno
13:29 imirkin: yeah
13:29 imirkin: perhaps exposing the 1TB there isn't such a great idea.
13:30 karolherbst: well, if we just wait long enough, even 1TB will be fine at some point
13:31 pmoreau: I want a GPU with 1TB of VRAM now!!
13:32 pmoreau: karolherbst: Hum, weird for the function thing. I am not sure how they handle device-side kernel launch
13:32 karolherbst: mhh
13:32 karolherbst: 8: cvt u64 %r7d u32 %r6
13:32 karolherbst: this sounds okayish, or isn't it?
13:32 imirkin: i forget where we ended up with that
13:32 pmoreau: On the other hand, that might be an OpenCL 2.0 or 2.1 feature, so those might be SPIR-V 1.1 or 1.2 only features.
13:33 karolherbst: pmoreau: currently trying to implement those convert opcoddes
13:33 imirkin: but pmoreau and i talked a lot about whether that was legal or not
13:33 karolherbst: *opcodes
13:33 pmoreau: karolherbst: it sounds okayish, but it is not.
13:33 karolherbst: meh
13:33 pmoreau: Last I tried, my GK107 was not happy with them
13:33 karolherbst: well, nv50_ir::GCRA::makeCompound complains
13:33 imirkin: well, the literal cvt's definitely don't work
13:34 imirkin: but should it be ok in the IR and then have something lower them away? dunno
13:34 pmoreau: And looking at the blob, it does not use cvt at all and does the conversion manually.
13:34 karolherbst: mhhh
13:34 karolherbst: we do it as well
13:34 pmoreau: (for those conversions, it might use them somewhere else)
13:34 karolherbst: that cvt gets converted to...
13:34 pmoreau: I have a patch on my branch which replaces the cvt
13:34 karolherbst: mov+merge
13:35 pmoreau: But I think my patch is partly buggy.
13:35 karolherbst: mhh
13:35 karolherbst: weird
13:35 karolherbst: the shader looks fine
13:35 karolherbst: https://gist.githubusercontent.com/karolherbst/f93bfde1aacf1de3f6117964fcc97e48/raw/b31d8be3afef4dd2179a432d6d5bb45e51873cd3/gistfile1.txt
13:36 karolherbst: void nv50_ir::GCRA::makeCompound(nv50_ir::Instruction*, bool): Assertion `base == size' failed.
13:37 karolherbst: uhm
13:37 karolherbst: merge u64 %r74d %r72d %r47
13:37 karolherbst: this is wrong, right?
13:38 karolherbst: heh, cvt u64 %r26d %r25d
13:41 pmoreau: There is no cvt in the shader you linked
13:42 karolherbst: it got lowered away
13:42 karolherbst: that is basically prior RA
13:42 pmoreau: But the shl line 14 and 15 look weird
13:42 karolherbst: how so?
13:43 karolherbst: well, anyway, that merge and that cvt I quoted are the problem
13:43 karolherbst: a cvt without a sType?
13:43 karolherbst: shouldn't happen, right?
13:43 pmoreau: Yeah
13:44 pmoreau: What kind of shl is it with a dType of u32 and an stype of u64? And only the imm is 64-bit, the registers are 32-bit ones
13:46 pmoreau: And I don’t think the first shl(add?) supports having an immediate as first source + it could be automatically computed
13:46 karolherbst: mhh
13:46 karolherbst: that cvt should have gotten a u64
13:46 karolherbst: ohh
13:46 karolherbst: u64 to u64
13:46 karolherbst: we might not handle that quite well
13:47 karolherbst: pmoreau: ahh, your code :p
13:47 pmoreau: That could be lowered to a MOV. How does one end up with a cvt u64 to u64? oO
13:47 pmoreau: What, my code? :-D
13:48 karolherbst: well
13:48 karolherbst: no clue?
13:48 karolherbst: I just added those convert ops
13:48 karolherbst: and then it happen
13:49 karolherbst: okay, yeah
13:49 karolherbst: changing the op to MOV seems to work
13:49 karolherbst: will post a patch soon
13:50 pmoreau: I’m not sure I am following what you did. “I just added those convert ops” which ops, and where?
13:50 karolherbst: ptrToU and UToPtr
13:51 karolherbst: pmoreau: v
13:51 karolherbst: https://github.com/karolherbst/mesa/commit/6e48cff2e4581200ef862e816c53608925c260b0
13:54 pmoreau: Sounds about right. If the result type is not the same size as the input, it should properly zero extend or truncate.
13:54 karolherbst: whatever the spec is saying
13:55 pmoreau: That what the spec is saying; I was just thinking out loud, that the code most likely follows the spec in those situations already and you don’t need to make further changes.
13:55 pmoreau: *That’s
13:55 karolherbst: ahh
14:04 karolherbst: mhh, that one fix regresses a test
14:15 karolherbst: ....
14:16 karolherbst: pmoreau: seems legit, right? https://gist.githubusercontent.com/karolherbst/c9cd9d451022e43499caeca5cff7ef2d/raw/ef08333106ad7fbd03201fb05ebe08767f7e417e/gistfile1.txt
14:16 karolherbst: last line
14:17 pmoreau: Woot oo
14:17 karolherbst: anyway, send out a patch for that OP_MOV thing
14:18 pmoreau: I thought you added OpConvertPtrToU, why is it still complaining?
14:18 karolherbst: this tests regressed
14:18 karolherbst: seems like there is some random issue going on
14:18 karolherbst: but it behaved kind of weird
14:18 karolherbst: the more often I run that test, the more it passes
14:18 karolherbst: well with the support for OpConvertPtrToU
14:19 karolherbst: some Vector size 1 failed: 0x3 is not properly aligned. issues
14:19 karolherbst: don't know
14:19 karolherbst: I assume broken state
14:19 karolherbst: or we write into wrong locations in memry
14:22 pmoreau: karolherbst: You do realise the patch you just does not apply on master as my initial patch was never merged? Master is still using those 64-bit cvts even if the hardware does not support them.
14:22 pmoreau: *you just sent
14:22 karolherbst: ohhh crap
14:22 pmoreau: ;-p
14:23 karolherbst: yeah then the patch doesn't matter
14:25 karolherbst: mov u64 %r30d (0)
14:25 karolherbst: this looks wrong
14:25 pmoreau: Yes
14:26 karolherbst: initially it was 6: cvt u64 %r4d (0)
14:27 pmoreau: Where did the src (or res) disappear?
14:28 karolherbst: fromt he beginning
14:29 pmoreau: It never had one? Oo
14:29 karolherbst: yeah
14:29 karolherbst: maybe in the spir-v it had one
14:29 karolherbst: yep
14:29 karolherbst: spirv
14:30 karolherbst: mkCvt(OP_CVT, dstTy, res, srcTy, src.value[i].value)->saturate = saturate;
14:30 karolherbst: Value *res = getScratch(elemByteSize); this seems to return NULL
14:31 karolherbst: %36 = OpConvertPtrToU %ulong %test_mem0
14:31 karolherbst: ?
14:31 karolherbst: %test_mem0 = OpVariable %_ptr_Workgroup__arr_uchar_ulong_3 Workgroup
14:31 karolherbst: uhm wait
14:31 karolherbst: not res is NULL
14:32 karolherbst: src.value[i].value should be
14:33 pmoreau: It could be, since it’s a pointer
14:33 karolherbst: ahh
14:33 karolherbst: how to I get the pointer then?
14:34 pmoreau: .pointer instead, IIRC
14:35 pmoreau: Nope, it’s .symbol
14:35 karolherbst: okay, trying that
14:35 pmoreau: But that’s going to give you a Symbol*, not a Value*
14:36 karolherbst: mhh
14:37 pmoreau: Give me 40 min, I’ll just get home since I’m not making any progress here.
14:38 pmoreau: I’m surprised %test_mem0 does not go through an OpLoad or OpCopyMemory instruction before being used.
14:39 karolherbst: well, seems like using Symbol is kind of fine
14:39 karolherbst: at least it compiles
14:39 karolherbst: well
14:39 karolherbst: still NULL
14:39 pmoreau: *kind of* until it horribly breaks.
14:39 pmoreau: That seems weird
14:40 karolherbst: ohh wait
14:40 karolherbst: mhh
14:40 karolherbst: I am sure there is a proper way of ding that
14:41 karolherbst: cvt u64 %r4d s[0x0]
14:41 karolherbst: heh
14:41 karolherbst: sure that .symbol is correct?
14:41 pmoreau: Yeah, that seems better. But I would still assume there should be a load before it being use
14:41 karolherbst: yeah
14:42 karolherbst: the emiter doesn't like this
14:42 karolherbst: mov u32 $r4 s[0x0]
14:42 pmoreau: Maybe that’s the job of that OpConvertPtrToU :-D
14:42 karolherbst: weird thing is, that it is always the same address
14:42 pmoreau: gtg, bbiab
14:43 karolherbst: k
14:43 imirkin_: karolherbst: you have to use a OP_LOAD
14:43 karolherbst: imirkin_: ahh
14:43 karolherbst: cool, thanks
14:43 imirkin_: you can't just move this stuff around
14:44 imirkin_: and you can't put s[0] as an arg to anything other than an OP_LOAD
14:44 imirkin_: at least not on input into the IR
14:44 imirkin_: same goes for consts, global memory, etc
14:44 karolherbst: okay
14:45 karolherbst: ohh and a ptr value
14:45 karolherbst: mhh, I see
14:47 karolherbst: using .symbol and .indirect sounds like a good choice here...
14:47 karolherbst: allthough indirect should be NULL as well, weird
14:53 karolherbst: uhm... mhh
14:54 karolherbst: imirkin_: how do we get the address of that s[0x0] thing?
14:55 karolherbst: aka the address of the symbol
14:55 karolherbst: not what is stored inside it
15:20 pmoreau: karolherbst: Try something like: mkMov(res, mkImm(.symbol->reg.data.offset)); if (.indirect != nullptr) mkOp2(OP_ADD, res, res, .indirect)
15:22 karolherbst: sounds reasonable, let me check
15:23 pmoreau: And then for the opposite convert, it would be something like: if (src.asImm()) { Symbol = createSymbol(src.asImm()->reg.data.u64); indirect = nullptr} else { .symbol = createSymbol(0); .indirect = src }
15:23 karolherbst: doesn't look correct though
15:23 karolherbst: this is basically what I get: mov u32 %r4d 0x00000000
15:24 pmoreau: It is address 0 in shared space
15:24 karolherbst: it is always 0
15:24 pmoreau: If you had s[0x4], it should be 4
15:24 karolherbst: here is the spirv: https://gist.githubusercontent.com/karolherbst/d62a79dac2a85955530c381324fe51b4/raw/ca663ae49ef37da3299a170ea9fecd54f49824b1/gistfile1.txt
15:25 karolherbst: well it complains about alignment in the end anyway
15:25 karolherbst: let me check what that test is actually testing
15:26 karolherbst: also mov u32 %r4d 0x00000000
15:26 karolherbst: but meh
15:28 karolherbst: there is a mkMov by the way
15:28 karolherbst: ohh wait...
15:28 karolherbst: third arg is the type
15:28 karolherbst: we should set it as well
15:32 karolherbst: mhh
15:36 pmoreau: I know there is a mkMov, I even had here: “16:20:40 pmoreau │ karolherbst: Try something like: mkMov(res, mkImm(.symbol->reg.data.offset)); if (.indirect != nullptr) mkOp2(OP_ADD, res, res, .indirect)“ :-p
15:36 karolherbst: well kernel_memory_alignment_global passes though
15:37 pmoreau: So, progress?
15:37 karolherbst: seems like it
15:37 karolherbst: https://github.com/karolherbst/mesa/commit/e69e75504fca1a547f7fd564454b7f630c146d05
15:37 pmoreau: :-)
15:38 karolherbst: ernel_memory_alignment_private fails if I run it with all those kernel tests
15:38 karolherbst: not if I run it alone
15:38 karolherbst: but only the ulong subtest fails there
15:38 karolherbst: super weird
15:38 karolherbst: in the constant test: At word No.25: "ID 11 has not been defined"
15:39 pmoreau: Hum, it only works by luck I would guess
15:39 pmoreau: OK, I’ll have a look at that one
15:39 karolherbst: %11 = OpConstantNull %_arr_v2uchar_ulong_3
15:39 pmoreau: BTW, how should we proceed regarding commits: do we cherry-pick commits from each others’ tree?
15:40 karolherbst: I rebase on top of yours
15:40 pmoreau: I tend to rebase quite often to not lag too much behind master.
15:40 pmoreau: OK
15:40 pmoreau: I’ll cherry-pick your new commits then, to have them as well
15:40 karolherbst: k
15:41 karolherbst: mhh
15:41 karolherbst: that OpConstantNull thing should be easy to fix
15:41 karolherbst: any idea why it isn't found?
15:41 karolherbst: the code basically goes like this:
15:41 karolherbst: %11 = OpConstantNull %_arr_v2uchar_ulong_3
15:41 karolherbst: %mem2 = OpVariable %_ptr_UniformConstant__arr_v2uchar_ulong_3 UniformConstant %11
15:42 pmoreau: If the validator does not handle OpConstantNull, for some reason
15:42 karolherbst: mhh
15:42 karolherbst: I'll put an assert
15:43 karolherbst: heh
15:44 karolherbst: word No.25 means 25th line?
15:44 pmoreau: Seems weird, but, maybe?
15:44 karolherbst: mhh
15:44 karolherbst: is this even your error message?
15:45 pmoreau: It’s in the SPIR-V validator, i.e. spirv-val from SPIRV-Tools
15:45 karolherbst: mhh
15:45 pmoreau: (I would guess)
15:46 karolherbst: wow
15:46 karolherbst: that spirv-dis thing makes it easier to read
15:46 pmoreau: It does :-)
15:47 karolherbst: it also reorders stuff
15:47 karolherbst: mhhh
15:47 pmoreau: test_constant fails for me, but I don’t get that error you have
15:47 karolherbst: kernel_memory_alignment_constant ?
15:48 pmoreau: Ah!! I thought you were talking about test_constant
15:49 pmoreau: (or just constant it seems)
15:50 pmoreau: OK, now I get it :-)
15:56 karolherbst: any ideas?
15:56 pmoreau: I’m looking into it
15:57 karolherbst: okay, I will try to add supports for other ops we are missing then
16:05 pmoreau: Could be a bug in the linker: I get this error in the validation done after the linker is run.
16:10 karolherbst: mhh
16:10 karolherbst: interesting
16:10 karolherbst: %11 is removed after linking
16:11 pmoreau: Oh yeah, I know that bug! It was reported on SPIRV-Tools
16:11 karolherbst: :)
16:12 pmoreau: I wrote a fix but haven’t upstreamed it yet cause I also need to write unit tests for the handling of all decorations
16:12 karolherbst: where is the fix?
16:12 karolherbst: then I would apply it locally
16:12 karolherbst: on your master branch?
16:12 pmoreau: I will add it to the for_nouveau branch
16:12 karolherbst: ahh
16:12 karolherbst: nice, thanks
16:13 pmoreau: The master branch is tracking master. I have “feature”/fix branches for each and every pull request/fix.
16:13 karolherbst: k
16:13 karolherbst: opcode 42 or 157 now...
16:15 karolherbst: ahh right, 42 was OpConstantFalse
16:15 karolherbst: should be easy enough
16:15 pmoreau: Yup
16:16 pmoreau: Go for OpConstantTrue while you are at it :-)
16:16 pmoreau: And 157 is OpIsInf
16:23 pmoreau: karolherbst: Fixed pushed onto for_nouveau
16:23 pmoreau: *Fix
16:24 karolherbst: opcodes 42, 157 and builtins 30, 33 missing
16:24 karolherbst: I think this should be all
16:24 pmoreau: But for some reason I’m still triggering that bug when not using the command line tool :-/
16:24 karolherbst: mhh
16:27 pmoreau: Oh, BuiltIn 30 (WorkDim), that’s the dimension of the grid/work: 1D, 2D, 3D.
16:28 pmoreau: Dunno if there’s a special reg for that, but it’s easy to compute on the fly
16:28 pmoreau: s/special reg/system value
16:29 pmoreau: SV_WORK_DIM meh, that should do the job :-D
16:33 karolherbst: :D
16:35 karolherbst: pmoreau: wondering what we should do if something asks for a 8bit bool
16:35 karolherbst: int
16:37 karolherbst: 1 was int true, 0 was int false, 0xffffffff was float false and 0x00000000 wasa float true?
16:38 karolherbst: or are the float ones swapped?
16:45 pmoreau: 8-bit bool or 8-bit int? I am getting confused
16:45 karolherbst: well if we have a 8 bit int value
16:45 karolherbst: but mhh, the value doesn't change
16:45 karolherbst: so meh
16:46 karolherbst: I am only confused about those float bools anyway
16:47 karolherbst: pmoreau: does this look okay? https://github.com/karolherbst/mesa/commit/97344a849b171b2b775f455a5ea1af27c947d07a
16:48 karolherbst: uhm, I can remove line 220
16:49 pmoreau: This looks over complicated TBH just do the same thing as generateConstant for TypeInt and TypeFloat
16:50 karolherbst: generateConstant takes some spirv value, right?
16:50 pmoreau: Rather, look at what OpConstant is doing
16:50 pmoreau: It does take the parsed instruction
16:52 pmoreau: So, just override generateConstant for TypeBool
16:53 karolherbst: that won't do
16:53 karolherbst: the target type matters
16:53 karolherbst: int bool != float bool
16:53 karolherbst: they have different values
16:54 pmoreau: Why are we talking about int bool vs float bool? It’s just a TypeBool
16:54 karolherbst: mhh
16:54 karolherbst: ohh, I see
16:54 pmoreau: It does not say anything about them being ints or floats or whatever
16:54 karolherbst: mhh
16:55 karolherbst: well, converting that to nvir may be a bit more complicated then
16:55 pmoreau: I have been representing them using predicates
16:55 karolherbst: ohh, okay
16:56 pmoreau: I *think* TypeBool can’t be used willy-nilly by other opcodes: they first have to be explicitly casted.
16:56 karolherbst: ohh, okay
16:57 karolherbst: well, I can't use generateConstant though, because that still operates on sources and so on
16:58 pmoreau: It doesn’t look like there is an existing system value for GlobalOffset. Will need to do some investigation on how it should work
16:58 karolherbst: parameter_types hits an assert in the emiter now
16:59 pmoreau: “because that still operates on sources and so on” sorry, I didn’t understand that
16:59 karolherbst: it takes an operandIndex parameter
16:59 pmoreau: yes
17:00 karolherbst: well, we have a constant, no value or something like this
17:00 karolherbst: well
17:00 karolherbst: not even a constant
17:00 karolherbst: it isn't bool(false), but false()
17:00 imirkin_: i don't think pointers were particularly contemplated when making the nv50 ir, so symbol is the best you have. it has a base, but no indirect info which is kept as part of the ValueRef
17:01 pmoreau: karolherbst: Well, false() would return you a bool which would be set to false, no?
17:02 karolherbst: I mean we don't have operands in the first place
17:02 karolherbst: and generateConstant is parsing operands
17:02 karolherbst: generateConstant uses mkImm(Value*), but we need mkImm(int)
17:03 karolherbst: I am sure I could hack around that, but...
17:03 pmoreau: generateConstant uses generateImmediate which uses mkImm(int)
17:03 karolherbst: we could have a different method, where we can just put constants into the code
17:03 karolherbst: and rework the generateNullConstant to use this one
17:03 karolherbst: or so
17:04 karolherbst: pmoreau: are you sure about that?
17:04 pmoreau: I am reading the code ;-) (plus I do now it a bit)
17:04 karolherbst: ohh wait...
17:04 karolherbst: well, there is still the getOperand call
17:05 pmoreau: OK, it’s not necessarily mkImm(int): it could be mkImm(float) if we are creating a float, or mkImm(unsigned int), etc.
17:05 karolherbst: it doesn't fit, that's what I mean
17:05 karolherbst: sure we could use it somehow
17:05 karolherbst: but it wasn't the purpose of this method
17:05 karolherbst: it parses an operand
17:05 karolherbst: and makes an immediate value out of it
17:05 pmoreau: If you are overriding TypeBool::generateConstant, *you* decide what it is doing
17:05 karolherbst: but we don't have an operand here
17:05 karolherbst: no, the interface does decide it
17:05 pmoreau: So, you could decide to not call generateImmediate
17:06 karolherbst: ad the interface states it parses an operand
17:06 karolherbst: well, "uint16_t &operandIndex"
17:06 pmoreau: The interface does not say it has to call generateImmediate
17:06 karolherbst: no, but it takes an operandIndex
17:06 karolherbst: we don't have any operands here
17:06 pmoreau: As for the operandIndex, I think it is only? for ConstantComposite
17:07 karolherbst: okay, here is what I mean: this method doesn't take a constant value, but an operandIndex
17:07 karolherbst: so it isn
17:07 karolherbst: 't usable for constant values
17:07 karolherbst: or shouldn't be
17:07 karolherbst: I could just put 0 and 1 as the index and be done with it
17:08 karolherbst: but we shouldn't start doing bad code in this stage already
17:10 pmoreau: That operandIndex only has a valid meaning for TypeStruct::generateConstant and TypeVector::generateConstant
17:10 karolherbst: and for TypeInt
17:10 karolherbst: and for TypeFloat
17:10 karolherbst: ...
17:11 pmoreau: Well, because it can be called from the TypeStruct
17:11 karolherbst: okay, well, suppose I use that function, how should I differ between false and true?
17:11 karolherbst: read the opcode out of parsedInstruction?
17:11 pmoreau: Yes
17:12 pmoreau: I don’t see why you would need an extra argument for that.
17:13 pmoreau: I wouldn’t be against redesigning the whole Type* class system, but haven’t really come up with a better idea.
17:13 karolherbst: somehow this doesn't feels like good design, because we would move the logic into several places where opcodes are parsed
17:14 karolherbst: dunno though
17:14 karolherbst: doesn't feel like a clean solution
17:14 pmoreau: It definitely isn’t
17:15 pmoreau: I was thinking of having a struct which would store typeOpcode, numElements, elementsType, size, alignment, etc and use that for all types, but it won’t work for functions and images.
17:16 karolherbst: mhh
17:16 karolherbst: ./../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp:313: void nv50_ir::CodeEmitterGM107::emitCBUF(int, int, int, int, int, const nv50_ir::ValueRef&): Assertion `!(s->reg.data.offset & ((1 << shr) - 1))' failed.
17:16 karolherbst: cvt f32 $r2 u8 c0[0x1]
17:17 pmoreau: what’s shr?
17:17 karolherbst: pmoreau: mhh, yeah, dunno. I don't know enough yet
17:17 karolherbst: 2
17:18 karolherbst: ohhh
17:18 karolherbst: it has to be 0x4 aligned
17:18 karolherbst: ...
17:19 karolherbst: ld u8 %r1 c0[0x1]
17:19 karolherbst: whatever produces that
17:19 pmoreau: That should work
17:19 imirkin_: karolherbst: yeah, c0[0x1] can't work
17:19 pmoreau: Oh, misread
17:19 imirkin_: it has to be aligned by 4
17:19 karolherbst: yeah..
17:19 imirkin_: if it's a u8, there are subops to do the I2F
17:19 karolherbst: I just didn't see it before
17:19 pmoreau: I thought ld u8 %r1 g[0x1]
17:19 imirkin_: so you can do like I2F R0, C[0x0][0x0].B2
17:19 imirkin_: or something
17:20 imirkin_: which will pick out the third byte
17:20 karolherbst: I am currently more interested why that happens at all
17:20 pmoreau: Why not use BFE?
17:20 imirkin_: pmoreau: I2F
17:20 karolherbst: pmoreau: are those the OpFunctionParameter thingies?
17:20 pmoreau: Isn’t that int -> float?
17:20 karolherbst: in parameter_types
17:20 imirkin_: it is.
17:20 imirkin_: BFE just does bit extract
17:21 imirkin_: while this picks out the right byte directly and converts to float
17:21 pmoreau: I had forgotten it was supposed to be converted later on
17:21 karolherbst: uhh
17:21 karolherbst: char%s c, uchar%s uc, short%s s, ushort%s us, int%s i, uint%s ui, float%s f
17:21 karolherbst: as method parameters
17:21 karolherbst: yeah, right
17:21 pmoreau: That I2F is really neat! Can it extracts any byte?
17:21 imirkin_: yeah
17:21 imirkin_: and half-word
17:22 pmoreau: Awesome stuff!
17:22 imirkin_: i.e. .H0/H1 and .B0/B1/B2/B3
17:22 imirkin_: i have opts which detect such things and adjust
17:22 pmoreau: I can see why you recommend to use it! :-)
17:23 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp#n2007
17:24 karolherbst: well
17:24 karolherbst: we don't need to convert them
17:24 pmoreau: I think I know where it comes from
17:24 karolherbst: well
17:24 karolherbst: we must not convert them
17:24 karolherbst: imirkin_: we really just need bit extraction
17:25 karolherbst: pmoreau: or do we have to convert anything here for the function parameters?
17:25 pmoreau: karolherbst https://phabricator.pmoreau.org/diffusion/MESA/browse/nouveau_spirv_support/src/gallium/drivers/nouveau/codegen/nv_ir_from_spirv.cpp;a38ca48958316c23ccd93a34f1eeb91c0e876c58$1986
17:25 karolherbst: well, in fact we just need to fixup load();, right?
17:26 pmoreau: Yeah, I think so
17:26 karolherbst: okay
17:26 pmoreau: It should be fine for global, but the other ones need to be fixed
17:26 karolherbst: yeah
17:26 karolherbst: I will comment that test out for now...
17:27 karolherbst: want to do a complete run before I head home
17:29 karolherbst: pmoreau: 15 fails out of 37 :)
17:29 karolherbst: uhm
17:29 karolherbst: *52
17:30 karolherbst: so 37 passes
17:30 pmoreau: Two more than earlier today
17:30 karolherbst: yeah
17:30 karolherbst: + the fuzzy one
17:30 karolherbst: but yeah
17:31 pmoreau: I’ll do a rebase + cherry-pick your patches and push everything, before doing some more cleanup.
17:31 karolherbst: okay
17:31 karolherbst: well
17:31 karolherbst: that 37 is without the false/true opcodes
17:32 karolherbst: I think two got fixed by the linker fixes
18:04 imirkin_: note that stuff like BFE is nvc0+ -- if you want it to work on nv50, you'll either need to add lowering, or just feed it in as AND + SHR, which we have a pass to recognize and convert back into BFE's
18:09 pmoreau: AND+SHR should be what I’m currently doing IIRC, but it looks like I’m missing it in some places.
18:22 karolherbst: pmoreau: the test which is using those constant opcodes is failing due to linker issues...
18:23 pmoreau: Which test is that again?
18:23 karolherbst: kernel_preprocessor_macros
18:24 pmoreau: OK, will have a look once I have rebased everything
18:24 karolherbst: awespme
18:24 karolherbst: *awesome
18:33 pmoreau: umad24 seems completely borked now
18:33 karolherbst: mhhh
18:33 pmoreau: karolherbst: Your patches seem to break umad24 here.
18:33 pmoreau: Gonna have a look at them
18:33 karolherbst: mhhh annoying
18:33 karolherbst: okay
18:34 pmoreau: I am currently getting “umad24: FAIL: 0xc0c9d83f instead of 0x5129d83f”
18:34 karolherbst: pmoreau: ohh wait
18:34 karolherbst: I know the issue
18:34 karolherbst: I fixed it
18:34 pmoreau: But haven’t pushed it to your branch yet?
18:34 karolherbst: wait, I will rebase that stuff
18:34 pmoreau: OK
18:36 karolherbst: pushed it
18:37 pmoreau: Thanks
18:37 karolherbst: yeah I did -2 intead of +2
18:39 pmoreau: Meh, -2/+2, it’s just the same! :-D
18:41 pmoreau: Still failing
18:41 karolherbst: mhh
18:41 karolherbst: it shouldn't, let me check it here
18:41 pmoreau: And still with the same value
18:45 karolherbst: are smad and umad failing?
18:46 pmoreau: smad is fine
18:46 karolherbst: interesting
18:46 pmoreau: (at least with the current test)
18:47 karolherbst: can you check if the commit before my rework is fine?
18:47 pmoreau: umad24 works if I use (U24, U16L, 32) like I had originally
18:47 karolherbst: ahhh
18:48 karolherbst: can you check with nvdisasm and envydis?
18:48 pmoreau: Was going to
18:48 karolherbst: I am sure I emit the correct opcodes
18:48 karolherbst: but maybe I screwed up
18:48 pmoreau: envydis says mad b32 $r2 u24 $r2 u16l c0[0xc] u32 $r3
18:49 karolherbst: with your fix or with my code?
18:49 pmoreau: with my fix
18:49 karolherbst: ...
18:49 karolherbst: meh
18:49 karolherbst: why though
18:49 karolherbst: this is a bit odd I have to say
18:49 karolherbst: why should it be u16l?
18:52 karolherbst: pmoreau: what if you put things like 0x1 * 0x54321 + 0x12345678
18:53 pmoreau: nvidsasm agrees with it being u16l
18:53 karolherbst: yeah, I was quite sure we do the correct encoding
18:55 pmoreau: Pass with my fix, fails otherwise
18:55 karolherbst: 0x1 * 0x54321 + 0x12345678?
18:55 karolherbst: how is that even possible
18:55 pmoreau: umad24: FAIL: 0x12349999 instead of 0x12399999
18:56 karolherbst: mhhh
18:56 karolherbst: I say that instruction is broken
18:56 karolherbst: could you put S24?
18:56 karolherbst: uhm...
18:56 karolherbst: yeah,
18:57 karolherbst: anyway, I am kind of away for the next hour or so
18:58 pmoreau: 1*0x6262f3f3+0 = 0xf3f3 with your patch
18:58 karolherbst: with S24=
18:58 karolherbst: ?
18:59 karolherbst: mhh, maybe I set it up oin my kepler and figure out what the hell is wrong here
18:59 pmoreau: 0xfffff3f3
18:59 karolherbst: that sounds even more wrong
18:59 karolherbst: but envydis and nvdisasm agree with me, right?
19:00 karolherbst: and I am sure I didn't change the actual encoding in my patch at all
19:00 karolherbst: except for the third source
19:01 pmoreau: They do agree with the encoding
19:01 karolherbst: *sigh* so either hw error
19:01 karolherbst: or nvdisasm is wrong
19:13 pmoreau: karolherbst: I’ll revert to U16L for now. Might want to completely drop it later.
19:14 karolherbst: k
19:16 pmoreau: And pushed
20:05 karolherbst: pmoreau: what is funny though is that the s24 and s16L bits work as expected
20:05 karolherbst: so you bascially got u16L, s24, u24, s16L as the order
20:08 pmoreau: Weird stuff
20:12 karolherbst: now imagine the current code being broken...
20:13 karolherbst: but luckily only the u versions were used
20:13 karolherbst: oh wait ;)
22:21 Mortiarty: so I did NOT buy nvidia card - instead i bought a new pc without graphic card and still use my gtx650ti with nouveau which works great.
22:39 TheXzoron: experienced another kernel panic with modesetting
22:40 TheXzoron: so I'm switching to default nouveau
22:44 pmoreau: karolherbst: Hum, the CFG of the SPIR-V binary is slightly different from the one I generate.
22:44 karolherbst: mhh
22:44 karolherbst: weird
22:45 pmoreau: Run `spirv-cfg loop_with_if.spv | dot -Tpng -o loop_with_if.png` to visualise it
22:49 karolherbst: pmoreau: doesn't it look bascially the same?
22:50 pmoreau: I have some extra BBs
22:50 karolherbst: right
22:50 karolherbst: and some are either empty or have just a bra
22:50 karolherbst: I think the if_else BB should be empty
22:51 pmoreau: They have a forward edge whereas we have a cross one, but yeah.
22:51 pmoreau: I could be nice to not have those empty BBs
22:59 karolherbst: well right, but it shouldn't cause us such troubles