13:07 rhyskidd: any other review comments on VBIOS DisplayPort table support? See here: https://github.com/envytools/envytools/pull/125
14:08 karolherbst: imirkin: :( fail vs hang in a shader, with the flattening pass it hangs, otherwise it just fails: https://gist.github.com/karolherbst/8fba08ef220d86e7db68d14106d64cf5
14:09 karolherbst: difference is a "not $p0 bra BB:5 (8); break BB:3 (8)" vs "$p0 break BB:3 (8)"
14:09 karolherbst: and this is with TGSI
14:11 karolherbst: "nvdisasm error : Unaligned instruction found" oops
14:16 karolherbst: ohh right, I need full 4-instruction blocks
14:37 imirkin: karolherbst: indeed you do. flattening pass is a bit ... sensitive
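The alignment requirement mentioned above can be sketched as a padding step. This is a minimal illustration, assuming only what the log states (code must come in full blocks of 4 slots); the `nop` filler string and the function name are hypothetical and do not correspond to any nouveau or nvdisasm API.

```python
BLOCK_SLOTS = 4  # assumption taken from the log: full blocks of 4 slots
NOP = "nop"      # hypothetical placeholder, not a real encoding


def pad_to_full_blocks(instructions, block_slots=BLOCK_SLOTS, filler=NOP):
    """Return a copy padded with filler so its length is a multiple of block_slots."""
    remainder = len(instructions) % block_slots
    if remainder == 0:
        return list(instructions)
    return list(instructions) + [filler] * (block_slots - remainder)
```

A stream whose length is not a multiple of the block size is the kind of input that would plausibly trigger the "Unaligned instruction found" error quoted above.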
14:53 karolherbst: imirkin: well right, but the changes it did are looking fine
14:54 karolherbst: the test fails anyway, so I assume it just changes things a little so they even hang the GPU or something
14:55 karolherbst: imageAtomicMin/imageAtomicMax and imageAtomicCompSwap are failing
14:56 karolherbst: mhh opt level 0: 1536.000000 vs 864.000000 opt level 1: 1536.000000 vs 846.000000
14:56 karolherbst: for swap
15:01 karolherbst: PTX has no sured.cas or sured.swap ...
15:04 imirkin: which platform is this?
15:05 imirkin: iirc fermi/kepler need something special
15:05 imirkin: or maybe that's for shared...
15:09 karolherbst: maxwell
15:09 karolherbst: well pascal
15:09 karolherbst: but I doubt that makes a difference
15:11 imirkin: hmmmm
15:11 imirkin: iirc SURED.CAS exists there
15:11 karolherbst: yeah..
15:11 imirkin: but it's a diff encoding or something
15:12 karolherbst: well, PTX doesn't allow it
15:12 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp#n3074
15:12 karolherbst: but the PTX sured gets translated into suclamp+subfm+sueau+prmt+RED?
15:12 karolherbst: yeah, I know it exists in the hw ISA
15:12 imirkin: oh. PTX. yeah i dunno about ptx.
15:12 karolherbst: I checked with nvdisasm
15:13 imirkin: i was talking about nvdisasm.
15:13 karolherbst: I figured
15:13 imirkin: i think the PTX thing happens because it's more efficient
15:13 imirkin: it's probably done in such a way that only a single lane takes effect or something
15:13 karolherbst: mhh
15:13 karolherbst: I see
15:14 karolherbst: VSETP.EQ.U8.U16.AND.... with -O0
15:14 karolherbst: that's interesting
15:31 pendingchaos: imirkin: is there any reason why no nouveau code seems to call the FIRMWARE method?
15:31 pendingchaos: is there some better way to change the conservative rasterization dilation?
15:32 imirkin: pendingchaos: no, there isn't. we've just never had to worry about that. there's always been a regular method we could call.
15:33 imirkin: pendingchaos: as the name suggests, it relies on the ctxsw fw to do certain things. the nouveau-made one does not implement those methods.
15:33 imirkin: however for GM20x+, we use the nvidia one, and i *presume* (but haven't verified) that those same methods are supported
17:10 imirkin: mslusarz: i remember you had an archive of a bunch of traces you used to test demmt with?
17:24 Subv: when does nvc0 call the NVC0_3D_RT method? (0x0800)
17:25 imirkin: nvc0_validate_fb
17:29 imirkin: (note that it's a group of methods that define an RT, and there are 8 such groups)
18:09 Subv: i see
18:10 Subv: another question, what would happen if you use method CB_DATA without first calling CB_ADDRESS_HIGH/LOW? ie, what happens to the data passed to CB_DATA when CB_ADDRESS is 0
18:10 imirkin: it tries to write to VA 0...
18:14 Subv: i would think that's an error, but i've got a game doing exactly that :D (unless of course it sets CB_ADDRESS via some unimplemented macro)
18:15 imirkin: i think the blob driver does set CB_ADDRESS via macro for some odd reason actually
19:04 Subv: mm, how does nvc0 handle storage buffers? are they bound as normal const buffers that you can use special instructions to read/write to?
19:21 karolherbst: Subv: you mean ssbos right?
19:21 Subv: yes
19:21 karolherbst: those are buffers
19:21 karolherbst: not const buffers
19:21 karolherbst: const buffers are read only
19:22 karolherbst: it is basically just global memory
19:24 karolherbst: Subv: in nouveau we store the meta information of each ssbo inside the driver const buffer (c15) and do direct load/stores against global memory
19:25 karolherbst: and we do basically base + offset math in the shader
19:25 karolherbst: + bounds check
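The scheme described above (per-SSBO metadata in a driver const buffer, then base + offset math with a bounds check against global memory) can be modelled roughly as follows. This is a sketch under stated assumptions: the `(base, size)` slot layout, the function names, and the choice to return 0 for out-of-bounds reads are all hypothetical and are not taken from nouveau.

```python
import struct


def load_u32(global_mem, driver_cb, slot, offset):
    """Read a 32-bit value from SSBO `slot` at byte `offset`."""
    base, size = driver_cb[slot]  # hypothetical metadata layout: (base, size)
    if offset < 0 or offset + 4 > size:
        return 0  # assumption: out-of-bounds reads yield 0
    return struct.unpack_from("<I", global_mem, base + offset)[0]


def store_u32(global_mem, driver_cb, slot, offset, value):
    """Write a 32-bit value; return False if the bounds check rejects it."""
    base, size = driver_cb[slot]
    if offset < 0 or offset + 4 > size:
        return False  # assumption: out-of-bounds writes are dropped
    struct.pack_into("<I", global_mem, base + offset, value & 0xFFFFFFFF)
    return True
```

Here `global_mem` stands in for global memory as a `bytearray`, and `driver_cb` plays the role of the driver const buffer, mapping a slot index to its metadata.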
19:26 Subv: ah
19:27 Subv: that makes sense, the binary i'm trying out now has a function called BindStorageBuffer that sets the current constbuffer to some address, and writes buffer addresses to it via CB_DATA
19:27 karolherbst: Subv: I give you an example
19:28 Subv: i suppose this constbuffer address is where the ssbo information goes
19:29 karolherbst: Subv: roughly this happens: https://gist.githubusercontent.com/karolherbst/9f309bf1102f8d2b1bb1b01f58250b1d/raw/c7d3160017a320ef269f05468480e97068427e86/gistfile1.txt
19:29 Subv: the address of the const buffer it uses is stored in the (undocumented) 0x3460 register, the macro reads this register and sets it as the current constbuffer
19:29 karolherbst: well inside our shader compiler
19:29 karolherbst: we translate the buffer access to b[0x0] in the first step
19:30 karolherbst: then convert it to global access later on
19:30 karolherbst: you might see something similar inside the shader you got there
19:31 karolherbst: Subv: where stuff is stored is up to the driver of course
19:31 Subv: ah i see
19:31 karolherbst: and can be anywhere inside a const buffer
19:31 karolherbst: but usually the last one (c15 for non-compute, c7 for compute) is the driver const buffer, used to store driver-specific stuff
19:31 Subv: thanks!
19:32 karolherbst: I think nvidia might use the same const buffs for that stuff
19:32 karolherbst: but different offsets
19:33 Subv: heh, it's bound to c0[]
19:33 feaneron: hi folks. after upgrading to linux 4.15.10, sometimes nouveau yields an error message that looks concerning:
19:33 feaneron: [ 7.137025] nouveau 0000:03:00.0: bus: MMIO read of 00000000 FAULT at 612004 [ !ENGINE ]
19:33 karolherbst: yeah, then it is c0. I don't know
19:34 feaneron: what is the appropriate place to report this bug?
19:34 karolherbst: feaneron: it is more or less concerning. It just means we read something which isn't there
19:34 karolherbst: shouldn't do any harm
19:34 karolherbst: but yeah, it is a bit annoying
19:34 Subv: sometimes i get confused about what's a hardware requirement, and what's not required but simply nouveau convention
19:37 karolherbst: Subv: I am actually wondering if some other devs trying to get a switch emulator working are actually reverse engineering the shader ISA....
19:37 feaneron: karolherbst: right; but every time i see that, mesa apparently stops recognizing the nvidia card. instead, running it reports some llvm pasta
19:37 karolherbst: feaneron: interesting
19:37 feaneron: they might not be related. i honestly don't know enough to assert that.
19:37 karolherbst: feaneron: something else in dmesg?
19:39 feaneron: well, the latest mutter was awakening the dgpu every time it had to render something
19:39 karolherbst: uhhh
19:39 feaneron: i saw some crashes flying by, but unfortunately i didn't copy them
19:39 karolherbst: seriously?
19:39 feaneron: yes
19:39 karolherbst: :(
19:39 karolherbst: well, I guess booting with nouveau.runpm=0 might fix that issue
19:39 karolherbst: but yeah
19:39 karolherbst: that's not great
19:39 feaneron: well, good news: this happens no more
19:40 karolherbst: nice...
19:40 feaneron: https://gitlab.gnome.org/GNOME/mutter/merge_requests/51
19:40 Subv: karolherbst: what do you mean?
19:41 karolherbst: Subv: well, usually you need to understand the shader ISA to emulate stuff, right?
19:41 Subv: yeah
19:42 feaneron: in wayland, that meant a very annoying stutter every time the dgpu was put to sleep. and to help that, libinput was duplicating events every stutter.
19:42 karolherbst: I am curious whether there are some who don't know about nouveau or the CUDA disassembler and just go ahead and try to reverse engineer the entire ISA from scratch
19:42 feaneron: a freaking chaos
19:42 karolherbst: feaneron: .... sounds like a bug in mutter or so
19:43 feaneron: might be indeed. again, not lectured enough to assert that.
19:43 feaneron: anyway, let me see if i can get something more useful
19:43 Subv: heh, i hope nobody goes through that pain
19:44 karolherbst: I would even say it is impossible on the switch
19:44 karolherbst: because you can't really just run your own code and see what happens
19:45 karolherbst: this just hit me: who would have thought that all the work done on nouveau would at some point help with writing an emulator, where most of the reverse engineering work is actually already done, which results in less pain, but also less fun
19:45 karolherbst: not sure if I should be sorry or happy for those :p
19:47 karolherbst: imirkin: what is the deal with preex? I thought it has to be put before doing an actual ex2?
20:18 feaneron: i was wrong. that error did not affect anything
20:18 feaneron: looks like something else is broken, and calling glxinfo under wayland reads llvm as the renderer
20:19 feaneron: or something like that
20:30 imirkin: karolherbst: it does.
20:33 karolherbst: imirkin: interesting
20:34 karolherbst: imirkin: I am sure that if I put preex2 before ex2 in my nir converter, stuff fails
20:34 imirkin: preex2 does some kind of transformation
20:34 imirkin: and then ex2 consumes that result
20:34 imirkin: and produces the exp2() of the original input
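The two-step relationship described above can be illustrated with a toy model. The log does not say what transformation preex2 actually performs, so the integer/fraction split below is purely an assumption chosen for illustration. The only property it is meant to show is that the second step consumes the first step's output and yields exp2() of the original input.

```python
import math


def preex2(x):
    """Hypothetical pre-step: split x into integer and fractional parts."""
    whole = math.floor(x)
    return whole, x - whole


def ex2(prepared):
    """Consume the prepared value and return 2**x for the original x."""
    whole, frac = prepared
    # 2**x == 2**frac * 2**whole; ldexp applies the power-of-two scale.
    return math.ldexp(2.0 ** frac, whole)
```

In this model, calling `ex2` on a raw, unprepared value makes no sense, which matches the point that the pre-step has to come first.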
20:35 karolherbst: imirkin: ohh, preex2 is added by codegen automatically
20:35 imirkin: yes, as a lowering thing iirc
20:35 karolherbst: but not presin
20:35 karolherbst: this is what confused me
20:36 imirkin: really?
20:36 karolherbst: yeah
20:36 karolherbst: presin is entered in from_tgsi
20:36 karolherbst: added
20:36 imirkin: hm ok
20:37 karolherbst: mhh, maybe I can just move it to where I do the 64 -> 32 bit lowering, and later we can also move all the lowerings that are the same for all chipsets there, while converting to SSA
20:37 karolherbst: just I doubt it is worth the effort really