00:02karolherbst: imirkin: can 64 bit reg access be unaligned? Like can you write to $r1d or does it have to be $r0d?
00:02karolherbst: I've got a bad feeling about that suredb instruction
00:05karolherbst: imirkin: uhm... shouldn't the address be a 64 bit value?
00:11karolherbst: ohh wait it is internal global image offset + 32 bit offset in the op
00:26karolherbst: uhh okay, so imageAtomicMin takes a different amount of time depending on whether the value changed or not
00:27karolherbst: imirkin: I think the executions are just running out of sync
00:29karolherbst: I guess I can't just add joins without joinats?
00:29karolherbst: and dealing with that on a CFG level?
00:43karolherbst: imirkin: nvidia fails those tests
02:09pabs3: pmoreau: any further thoughts re the GPU lockup issues with my GT21x (not bisectable)? happy to provide debug info if nouveau has an equivalent of intel-gpu-tools for GPU hang dumping
05:50Onionnion: piece of shit
05:50HdkR: What a great bot
12:54pmoreau: pabs3: You can always activate some debug output by setting nouveau.debug=debug or nouveau.debug=trace (see https://nouveau.freedesktop.org/wiki/KernelModuleParameters/); you could also do something like nouveau.debug="debug,PDISP=trace".
13:07pabs3: thanks, I'll try that tomorrow
14:17karolherbst: imirkin: okay, further analysis: the test depends on the fact that for each loop iteration the value stored in the image is decreased by 1, and that the old value != -iteration_count. So if you are at the 5th iteration and the thread changes the value from -4 to -5, it exits; if not (because a different thread already updated it) it tries again next iteration. So what we have is that sometimes (not deterministically) the value stored in
14:17karolherbst: the image is decreased by 2, not 1 per loop iteration
14:17karolherbst: now the question: either the test depends on undefined behaviour or we have to synchronize the threads somehow
14:18karolherbst: the problem is, if the value was reduced by 2 at least once, the loop will go forever, because old value < -iteration_count and hence the loop never ends
14:19karolherbst: I would say the test is depending on undefined behaviour here and should use a different way of checking whether imageAtomicMin is indeed atomic
14:20robclark: karolherbst, check what CTS and/or deqp does?
14:23karolherbst: good idea
14:27karolherbst: robclark: no loop
14:28karolherbst: robclark: here is what we do currently: https://gist.github.com/karolherbst/743a498aaf4fd89dc06863a805d90b35#file-a-frag
14:28robclark: I would tend to think hw vendors would make sure CTS didn't depend on unspecified behavior ;-)
14:28karolherbst: and it doesn't look like a good idea
14:29karolherbst: robclark: imageAtomicMin with write faster than others, because of syncing. Threads diverge so much that some threads are in iteration 2 some in 1
14:29karolherbst: and some threads get the old value from a different thread already finished with iteration 2
14:30karolherbst: so old = -1 and v = -2
14:30karolherbst: but mhh
14:30karolherbst: at some point it should be fine again
14:31robclark: too bad we don't have full git history of CTS test to see if they already had this discussion
14:32karolherbst: anyway, I guess with a lot of threads it might take a while to repair all the damage
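A minimal sketch of the loop as karolherbst describes it above, to make the "repair" argument concrete. It is a sequential Python model, not the piglit shader: `atomic_min`, `iterations_until_exit` and the one-element list standing in for the image are hypothetical names, and the exit condition (old value == -(iteration - 1)) is an assumption read off the "-4 to -5 at the 5th iteration" example.

```python
def atomic_min(image, value):
    """Model of imageAtomicMin on one texel: store the minimum, return the old value."""
    old = image[0]
    image[0] = min(old, value)
    return old


def iterations_until_exit(image, max_iters=100):
    """Run one thread's loop on its own; return the iteration at which it exits."""
    for i in range(1, max_iters + 1):
        old = atomic_min(image, -i)
        if old == -(i - 1):  # this thread moved the value from -(i-1) to -i
            return i
    return None  # gave up: the thread never saw the old value it expected


# A thread that is first to touch the image exits immediately.
print(iterations_until_exit([0]))
# A thread that starts after other threads already pushed the value to -2.
print(iterations_until_exit([-2]))
```

In this model a thread that arrives after the stored value has already dropped to -2 does not spin forever; it exits once its own iteration count has caught up. Whether the real shader behaves the same way depends on its actual exit condition and on how the hardware schedules the threads.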
14:34karolherbst: robclark: but anyhow, we shouldn't have tests which can hang the machine just because something isn't implemented the right way
14:34karolherbst: annoys people doing tests and devs
14:35robclark: well, I mean the ideal answer is better hang recovery, since games/etc might do something like that which causes shader to infinite loop..
14:35karolherbst: robclark: I am sure glsl doesn't guarantee that all locs are always executed in parallel, or is this somehow required?
14:36robclark: you mean divergent threads in a warp? It is almost guaranteed that they don't execute in parallel
14:36robclark: I think the only exception is some recent nv gpus
14:36karolherbst: yeah but I mean if two threads follow the same path
14:37karolherbst: robclark: recent nv gpus can do full preemption, maybe even selected threads
14:37karolherbst: and fault recovery
14:37robclark: in that case, I'd expect the atomic op to at least be dispatched in parallel.. but tbh I'm not sure what spec says about it
14:37karolherbst: so more like more diverge
14:37karolherbst: robclark: yeah, but this is kind of the point here. The test depends on this
14:39karolherbst: I think
14:39robclark: karolherbst, read what spec says, if it isn't clear see what ilia/ian have to say.. maybe send patch to piglit list to make it work similar to cts test ;-)
14:40karolherbst: well first I want to have a decent explanation of what is going wrong here
14:40robclark: btw, check what happens w/ nv blob and this test?
14:42karolherbst: blob fails all tests
14:43karolherbst: but it seems like it just optimized the loop away
14:43karolherbst: dunno, I would have to trace it
17:08imirkin: karolherbst: hm, i wonder if the current code would allow for 2 threads to fight each other
17:08imirkin: in a way that no one ever wins
17:13imirkin: pmoreau, pabs3: it's "disp" now since kernel 4.3
17:13imirkin: (rather than PDISP)
17:18imirkin: karolherbst: fwiw the atomicity test passes for me on a GF108
17:52ejsi: is there any documentation on how optimus is handled in windows?
17:57ejsi: Is there a channel/mailing list/whatever specifically on nvidia driver reverse engineering?
17:58imirkin: not that i'm aware of
18:00ejsi: Alright, I'll ask here then:
18:01ejsi: I'm working with a few other people on getting optimus cards working with pci passthrough on windows vms
18:02ejsi: After implementing the _ROM call in an ssdt and putting the rom data in ovmf, the driver loads properly
18:03ejsi: (And fairly close to baremetal performance, based on the passmark gpu compute benchmark I just ran)
18:04ejsi: You don't get output to a physical display, but you can use rdp with remotefx
18:05ejsi: But with
18:05ejsi: Oops, accidental enter
18:08ejsi: But the directx benchmarks, for example, fail with "invalid mode"
18:09ejsi: And when I try to start up the nvidia control panel program, which normally gives an error when run on a display that's not connected to an nvidia gpu, it just displays nothing when I run it from an rdp connection with all other gpu drivers disabled
18:10ejsi: What's going wrong?
18:13ejsi: (I can provide data from envytools etc. from an identically configured linux guest if there's anything in particular that would be useful)
18:15imirkin: ejsi: well, i suspect that DX wants real modes to be settable
18:15imirkin: if the GPU has no display / connectors
18:15imirkin: then you can add a second fake-o video card into your VM
18:16ejsi: I tried passing through a full intel igpu
18:16ejsi: The driver loads
18:17imirkin: you mean both the intel and nvidia gpus?
18:17ejsi: Yeah they're both loaded
18:17imirkin: and it didn't help?
18:18ejsi: I have 3 video cards on this vm now
18:19imirkin: yeah, i have no clue how windows drivers work
18:19imirkin: (in general, or specifically wrt nvidia)
18:19ejsi: Virgl as the primary (00:02.0?), upt mode intel hd 530 as secondary (00:18.0) and the gtx 960m as secondary (tertiary?) on 01:00.0
18:19imirkin: i stopped using windows around win98, and haven't looked back ;)
18:20imirkin: and the virgl gpu presumably exposes real modes?
18:20ejsi: Yeah, it does
18:20ejsi: And the remote output seems to use that somehow when the driver is loaded
18:21ejsi: And it pretends the 960m doesn't exist
18:21ejsi: The same thing with the intel gpu
18:21imirkin: sorry, this is way out of my area of knowledge
18:21imirkin: you need someone who knows how windows works
18:22ejsi: I was hoping there'd be someone here who reverse engineered the nvidia optimus parts to reimplement bits of it on linux
18:22imirkin: optimus is just the software integration
18:22imirkin: there's no hw really
18:23imirkin: the nvidia driver could behave differently with a mobile part
18:23imirkin: but this is all just speculation
18:23ejsi: I keep seeing the words "Optimus Copy Engine" being thrown around a lot in the optimus whitepaper
18:24ejsi: No technical documentation ofc, in keeping with nvidia tradition
18:25imirkin: presumably that's just the regular copy engine
18:25imirkin: which is an interface to request that the gpu copy data around
18:25imirkin: from place to place
18:26imirkin: basically a built-in dma copy engine...
18:26imirkin: helpful for optimus, as well as regular card operations like evicting memory from vram, putting it back, etc
18:28ejsi: Got it, maybe I should look more at dx documentation first
18:28imirkin: or wddm
19:10karolherbst: imirkin: interesting
19:13imirkin: and yes, depending on whether it returns a result or not, it might have a diff encoding
19:14imirkin: on nvc0/gk110 iirc it's the same, just RZ at the end
19:14imirkin: but on gm107 RED.asdf and ATOM.adsf have diff encodings
19:14imirkin: (which was the cause of quite some bugs in the initial bringup)
19:55plutoo: what does writing to FIRMWARE do?
19:55plutoo: invoke the microcode blob?
20:02imirkin: such writes cause interrupts to trigger and get handled by the grctx fw
20:03imirkin: usually it's used to write to context-switched registers
20:03imirkin: that aren't otherwise accessible
20:03imirkin: via regular graph methods
20:08plutoo: how do you know all this stuff
20:08plutoo: i'm seeing addresses like 0x00418e40/0x00418e58/... being written to FIRMWARE
20:08imirkin: try it and see? i dunno, i probably read it / it was explained / ...
20:08imirkin: yeah, so those are pgraph registers
20:09imirkin: the actual values are passed in via scratch usually
20:09imirkin: and the method is similar to a mask method
20:09plutoo: yeah that's exactly what i'm seeing
20:09imirkin: i.e. read, mask, or, write
20:09plutoo: macro_14f(0x00418e40, 7, 0xf);
20:09plutoo: macro_14f(0x00418e58, 0x842, 0xffff);
20:09plutoo: ... etc
20:09plutoo: looks like arg0 is the register address, arg1 is value, arg2 is mask
20:10imirkin: right. so that means "read register 418e40, zero out the low 4 bits, or in 7, write back to 418e40"
20:10imirkin: the special trick with this is that these are context-switched registers
20:10imirkin: so the writes have to be done as part of the command stream
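The read, mask, or, write sequence imirkin describes can be sketched as below. This is a plain Python model over a dict of fake registers, not envytools or firmware code: `masked_write`, the `regs` dict and the starting register contents are made up for illustration, and 32-bit registers are assumed.

```python
def masked_write(regs, addr, value, mask):
    """Read regs[addr], clear the bits set in mask, OR in value, write back."""
    old = regs.get(addr, 0)                    # read (unset registers read as 0 here)
    new = (old & ~mask & 0xFFFFFFFF) | value   # mask, then or
    regs[addr] = new                           # write
    return new


# Hypothetical starting contents, chosen only to make the effect visible.
regs = {0x418e40: 0xABCD1234, 0x418e58: 0x12345678}

# Same argument order as the macro_14f calls quoted above: address, value, mask.
masked_write(regs, 0x418e40, 7, 0xf)
masked_write(regs, 0x418e58, 0x842, 0xffff)

print(hex(regs[0x418e40]), hex(regs[0x418e58]))
```

Only the bits covered by the mask change; the upper bits of each register keep whatever the context held before. The sketch leaves out the part that matters on real hardware, namely that the write is issued through the command stream so it lands in the context-switched copy of the register.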
20:12plutoo: are the registers at 0x00418000 documented somewhere?
20:13imirkin: rnndb. those specific ones aren't.
20:13imirkin: $ ~/src/envytools/rnn/lookup -a 120 418e40
20:25}8]: hey fellas. i'm trying to start xorg and get "Failed to load module "nouveau" (module does not exist, 0)", but i have nouveau compiled (as a module, kernel 4.15.14) and it's actually already loaded: nouveau 1449984 0
20:25}8]: any ideas what's going wrong?
20:26}8]: n/m, i needed xserver-xorg-video-nouveau
20:29}8]: is xserver-xorg-video-fbdev required for nouveau?
21:14orbea: }8]: is it blacklisted?
21:15orbea: is there a problem with your xorg.conf? (try removing it)
21:44imirkin: if you still need help, pastebin dmesg and xorg logs
21:46karolherbst: wow, I like this kind of rewrite: 48396 -> 48485 Pass rate
21:51karolherbst: just piglit silliness though
22:08karolherbst: imirkin: sometimes tests get disabled because others crashed/failed
22:28pmoreau: imirkin: Ah, good to know. We should update the wiki some day.