11:18 cosurgi: On some remote server I am using:
11:18 cosurgi: export DISPLAY=:22
11:18 cosurgi: Xvfb :22 -screen 7 1024x768x8 &
11:18 cosurgi: xfce4-session &
11:18 cosurgi: x11vnc -display :22 -bg -nopw -xkb
11:18 cosurgi: There's an nvidia card, and a nouveau driver loaded.
11:19 cosurgi: The problem is that I want to open several instances with working opengl graphics inside Xvfb
11:19 cosurgi: I don't care if that's software rendered.
11:19 cosurgi: But I could not find a way to force Xvfb to use software rendering.
11:20 cosurgi: When I launch a second one I get an error about not being able to use /dev/dri/card0
11:20 cosurgi: How can I force Xvfb to use software rendering?
11:20 cosurgi: I only need this remote server to draw some super simple opengl stuff.
11:21 cosurgi: Those several instances are for several different users. All of them need simpe opengl to work.
11:21 cosurgi: *simple.
11:22 gnarface: you sure it isn't a permissions problem?
11:22 gnarface: oh, maybe xvfb has something to do with it
11:22 gnarface: i don't know
12:10 leidurleo: I Do not precisely know how the scheduling is still done, in multi SIMT core environment, I have attilagpu but miaow does not implement texture units which are responsible for scheduling stuff to certain compute units.
12:12 leidurleo: So in opencl and compute based stuff, there are abstractions for scheduling, this corresponds to some methods of texture units and arbitration on graphics shading, i have my ideas how to do this, but i may not be accurate what the real hw does.
12:16 leidurleo: what i have gathered so far, on compute workloads, based of the abstraction in the compute shader the chip allocates wavefronts dynamically to the compute units
12:17 leidurleo: but i think that sort of thing does not happen with texture units, where workgroups in fragment shading are allocated by the fixed function hw instead
12:26 leidurleo: i just do not understand the concept what is shared resource there, since the end goal is executing the same instruction with different operands and fragment on different compute units in parallel
12:27 leidurleo: i assume on opencl this is done via cache coherency protocol of instruction cache somehow
12:28 leidurleo: on earlier chips if this type of thing was possible it should had been some hw dependent command buffer methods on graphics shading
12:43 diogenes_: Hello guys, where can i find all available nouveau.config= options?
12:46 diogenes_: ok found it
12:49 leidurleo: the correct solution would be, the warp that spawns other warps stores its queue entry where others will have the broadcasted pc entries which are temporary, then arbiter chooses another king warp, which spawns all other warps, and only king stores the queue entry etc.
12:52 leidurleo: otherwise chip has duplicated queue entries instead of little lock-step serialized logic
12:52 leidurleo: which is definitely not good
14:32 leidurleo: ouh, actually the chip once again does that
14:34 leidurleo: I am not sure how the same PC is issued in fixed function for all the Compute Units, except for some amd cards, but it can be done with playing a little bit with texture units
14:36 leidurleo: instr_info_table.v and wave.c in miaow code, kinda refer to this, when f_decode_wfid == f_vgpr_alu_done_wfid reload new stuff from cache
15:40 leidurleo: rereading separate shader objects arb spec
15:49 leidurleo: this is crap like expected, well it is sure, that old GPUs did not have multiple compute units
15:51 leidurleo: i'd expect something like sm3.0 to have them as the first ones
16:07 leidurleo: the hw dispatcher has some shared tables , however i would not do things that way to be honest it seems pretty heavyweight adding latency a bit
16:31 Hopland: Hey folks. The nouveau driver freezes my system completely (i.e kernel panic). I'm unsure of how to collect logs. I'm running fedora 29. Is the magic SysRq key the way to go?
16:32 karolherbst: Hopland: mhh, is there anything special you do or just something randomly happening?
16:32 karolherbst: _but_
16:32 karolherbst: I think it's just an engine hang and the system would unfreeze the moment the process gets killed
16:32 karolherbst: Hopland: do you have a second machine you can use to ssh into your machine to check that?
16:33 Hopland: Unfortunately not
16:34 karolherbst: Hopland: what you could do is to have a timer doing a sync from time to time (1 minute interval or something) and then after the next freeze you can check the logs of the last boot
16:34 karolherbst: maybe it's still in the log
16:34 leidurleo: Hopland: skeggsb uses netconsole , serial terminal would be ok, but all that requires another machine
16:34 karolherbst: then you could check with "journalctl --boot -1" whats in there
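A minimal sketch of the "sync from time to time" idea suggested above, assuming a plain POSIX shell. `periodic_sync` is a hypothetical helper name, not something from the log, and flushing buffers only raises the chance that the last journal entries reach the disk before a hard freeze.

```shell
#!/bin/sh
# periodic_sync INTERVAL [COUNT]
# Hypothetical helper: flush dirty filesystem buffers every INTERVAL seconds.
# COUNT=0 (the default) means run until interrupted; the number of completed
# flushes is printed when a finite COUNT is reached.
periodic_sync() {
    interval=${1:-60}
    count=${2:-0}
    done_runs=0
    while [ "$count" -eq 0 ] || [ "$done_runs" -lt "$count" ]; do
        sync
        done_runs=$((done_runs + 1))
        sleep "$interval"
    done
    echo "$done_runs"
}

# Usage, in a spare terminal before reproducing the freeze:
#   periodic_sync 60
# After the forced reboot, read the log of the previous boot:
#   journalctl --boot -1
```

The one-minute interval is the value mentioned in the conversation; a shorter interval trades more disk writes for a smaller window of lost messages.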
16:35 leidurleo: so you have not gotten the context scheduler still into shape, to ban corrupted bits?
16:36 Hopland: karolherbst: nothing substantial. There are a few entries in regards to nouveau however - but no crash dump.
16:36 karolherbst: Hopland: well, mind pasting the last few lines nouveau prints?
16:37 Hopland: The last few lines seem to be common or standard outputs, but there are a few interesting lines at initial load...
16:37 Hopland: nouveau 0000:01:00.0: DRM: BIT table 'A' not found
16:37 Hopland: nouveau 0000:01:00.0: DRM: BIT table 'L' not found
16:37 Hopland: not sure if that means anything
16:37 leidurleo: Hopland: what is the app triggering this...?
16:38 Hopland: what app? gdm... I guess. It's different though. On some liveusb's I got into the desktop, but after a short while.. freeze
16:38 leidurleo: long time ago, i had serious locking due to lack of multithreading of contexts, and i gave up on it
16:38 leidurleo: Hopland: well that is encouraging :D, what chip?
16:39 Hopland: for instance: on elementary OS liveusb the system would instantly freeze if I tried to open settings or the user menu on the top right
16:39 Hopland: Well I've been going on for two weeks now blacklisting the nouveau driver, but I thought I might jump down that rabbit hole
16:39 leidurleo: Hopland: i do not think it is kepler then, kepler was little more stable when it came to glamor, some new chip then
16:40 leidurleo: *than
16:40 Hopland: 1060
16:40 karolherbst: Hopland: hard to say what's the cause without having anything in the logs
16:40 Hopland: It should also be noted that I'm not sure if the acpi on this system fully conforms to the linux kernel
16:41 Hopland: I know
16:41 Hopland: I think I got a kernel panic log in ubuntu, but now I'm running fedora
16:41 Hopland: right now I'm just trying to figure out how I could collect that data
16:42 leidurleo: https://www.techpowerup.com/gpu-specs/nvidia-gp106.g797
16:43 leidurleo: 4.4billion transistor bundled pascal
16:43 Hopland: Are there any kernel boot parameters for nouveau that might help isolate the problem?
16:44 leidurleo: is that chip suppose to give 3d hardware acceleration those days too, did they release those signatures i mean NVIDIA?
16:45 Hopland: It should be noted that this is an ASUS ROG gaming laptop (specifically the Zephyrus M, GM501)
16:45 Hopland: There might be some BIOS trickery going on.. but nothing I can alter, as the BIOS is a bit strict about what one can do
16:46 leidurleo: as i remember ben i think did the 3d engine code, but prolly the firmware of reclocking would not work on this
16:46 leidurleo: i do not know nothing about such card
16:47 Hopland: can't find anything about it in the bug tracker
16:50 leidurleo: i vaguely remember reading something about it, i think maybe pascal was the first chip to remove the GigaThread engine, that sits between the pcie and decode units
16:54 Hopland: Should I try to reboot with the nouveau.debug="PTHERM=debug,PTIMER=debug" boot parameter? the kernel config in /boot says that the nouveau driver has been built with the CONFIG_NOUVEAU_DEBUG=5 flag
16:55 Hopland: and what about NvI2C?
16:56 Hopland: nothing in i2cdetect -l shows nvidia related i2c entries...
17:02 karolherbst: Hopland: that won't help
17:02 Hopland: k
17:02 leidurleo: Hopland: seems none of the experts are coming into this, sorry i dunno anything about NVIDIA cards
17:02 Hopland: installing kdump now btw
17:02 karolherbst: you really want to be able to get the last dmesg from the last boot
17:03 karolherbst: Hopland: "journalctl --boot -1 --dmesg" might give you a more filtered result
17:03 karolherbst: and there might be some errors at the end
17:03 leidurleo: Hopland: is the firmware loaded on this card for acceleration?
17:03 karolherbst: or "engine resets" or something
17:04 Hopland: 1) the dmesg is not very informative. I've already showed you everything that was off in regards to nouveau. 2) I'm not sure if the firmware is loaded for acceleration
17:04 leidurleo: https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-Signed-Firmware-Pascal
17:05 karolherbst: Hopland: mhhh, so probably the journal wasn't flushed out to the drive :/
17:05 karolherbst: Hopland: what does the freeze look like? nothing happens, but you can still move the cursor?
17:05 karolherbst: leidurleo: whether the firmwares are loaded or not is completely irrelevant to this issue
17:05 karolherbst: spoiler: they are
17:05 Hopland: cursor freezes as well. I can't even ctrl-alt-f1 into tty
17:06 karolherbst: mhh
17:06 leidurleo: *plonk*
17:06 karolherbst: Hopland: mhhh, so maybe it is indeed a hard crash, so yeah...
17:07 karolherbst: Hopland: do you have pstore mounted? /sys/fs/pstore/
17:07 karolherbst: there _might_ be some files in there
17:07 karolherbst: but... it seems all distributions are configuring it wrong
17:07 karolherbst: so there won't be any
17:08 Hopland: karolherbst: nothing there
17:08 karolherbst: yeah...
17:08 karolherbst: crappy distributions :P
17:08 karolherbst: none of them does it
17:08 karolherbst: Hopland: "/sys/module/pstore/parameters/backend" contains "(null)", right?
17:09 Hopland: Jepp
17:09 Hopland: It shure does dere, buddy
17:10 karolherbst: mhhhh
17:10 leidurleo: karolherbst: i followed your vision about graphics workload warp scheduling, and was it such, that on graphics you get parallelism on only one core if the compute unit is not being used , it sounded amusing, but it looks like someone is also talking about it?
17:10 Hopland: installing kernel-debuginfo and crash right now
17:10 Hopland: See if I can't get a proper crashdump
17:10 karolherbst: Hopland: I know that there are some systems where writing into the uefi can brick the machine though :/
17:10 karolherbst: but I think those issues are resolved
17:10 karolherbst: and only covered samsung laptops
17:10 karolherbst: or something
17:10 karolherbst: Hopland: yeah.. well, the thing is, when the kernel crashes for real, there is nothing the kernel can do
17:10 karolherbst: not even writing into the fs
17:10 karolherbst: because it could corrupt it
17:11 karolherbst: pstore can be used to store the current dmesg into the uefi
17:11 karolherbst: and read out on next boot
17:11 Hopland: I see
17:12 karolherbst: Hopland: with pstore.backend=efi it can be enabled though... but again, there were some systems out there which were allergic to that kind of stuff :/
17:12 karolherbst: best alternative would be netconsole
17:12 karolherbst: but that requires a second machine
17:13 karolherbst: there is also some weirdo memory backend for pstore
17:13 karolherbst: but I have no idea how to use that one
17:14 Hopland: I'm going to test some boot parameters - brb
17:15 karolherbst: Hopland: ohh, you could also just use your phone
17:15 karolherbst: and use that as a ssh client
17:16 karolherbst: and just run "dmesg -w"
17:16 karolherbst: this might be enough to print something
17:19 Hopland: Watchdog BUG: soft lockup
17:20 karolherbst: ahh
17:20 karolherbst: is there also a stacktrace?
17:24 Hopland: Yup ^^
17:25 Hopland: https://paste.fedoraproject.org/paste/NBvwn4pEz~Od2K1-TrkVSA
17:25 Hopland: BAM!
17:27 karolherbst: huh
17:27 karolherbst: ahh, ignoring the first stacktrace
17:27 karolherbst: ohhh
17:27 karolherbst: Hopland: this is a laptop right?
17:27 Hopland: Indeed
17:27 karolherbst: ahh, heh.. known issue
17:28 karolherbst: mhhh
17:28 Hopland: My beautiful multi-coloured ASUS ROG Zephyrus M GM501 ^^
17:28 Hopland: Really? Cool! That means I can just wait :D
17:28 karolherbst: well...
17:28 karolherbst: "wait" is a bit overstressing it
17:28 Hopland: or...?
17:29 karolherbst: we have no idea on how to fix it, and nvidia is "working on it"
17:29 Hopland: crudd
17:29 karolherbst: the painful part is, the nvidia driver doesn't do that runtime d3cold stuff on linux
17:29 karolherbst: so we can't really reverse engineer it
17:29 leidurleo: hmm what, means this card is not working on nvidia binary too?
17:29 karolherbst: Hopland: nouveau.runpm=0 "fixes" it
17:30 Hopland: So I'm guessing if it'll ever be fixed it'll be fixed in the form of a firmware update
17:30 karolherbst: but.. your GPU is always on then
17:30 Hopland: Yeha
17:30 Hopland: yeah*
17:30 HdkR: "working on it"
17:30 karolherbst: HdkR: well, there _was_ an update
17:30 HdkR: Oh snap
17:30 Hopland: Even on Ubuntu, there's this bug since 16.04 where some dGPUs will be running - DESPITE the fact that the module has been unloaded
17:31 karolherbst: Hopland: well, that's the default
17:31 Hopland: Not on Windows :(
17:31 karolherbst: Hopland: what you could do is to blacklist nouveau and let something enable the runpm features through sysfs
17:31 Hopland: there it turns off quite nicely
17:31 karolherbst: Hopland: even on windows, it needs driver support for turning the gpu off
17:31 karolherbst: well right...
17:31 karolherbst: the issue is not turning it off
17:31 Hopland: Of course it does
17:31 karolherbst: the issue is turning it back on
17:31 Hopland: ah
17:31 karolherbst: this fails... for whatever reasons
17:31 karolherbst: and essentially hangs your system
17:32 karolherbst: but...
17:32 karolherbst: if there is no driver doing stuff, it all turns off/on quite nicely
17:32 Hopland: Which is why my default is to not have the nvidia driver installed and blacklisting nouveau from boot parameters
17:32 karolherbst: Hopland: I have a kernel module which only does that: https://github.com/karolherbst/pci-stub-runpm
17:32 karolherbst: needs tweaking for other devices
17:32 karolherbst: and such
17:33 karolherbst: but you could also let tlp or laptop-mode-tools handle that
17:33 karolherbst: doesn't really matter
17:33 karolherbst: Hopland: mind pasting your lspci -t ?
17:33 karolherbst: uhm..
17:33 karolherbst: with nouveau blacklisted
17:33 Hopland: In the presence of GODS one must comply
17:34 Hopland: https://paste.fedoraproject.org/paste/4DWIirSY5A3RXLhyWMtI5A
17:34 karolherbst: ahh, that makes it easy
17:35 leidurleo: shit. i would like to know how the SM in glsl is being scheduled to
17:35 karolherbst: echo "auto" > /sys/bus/pci/devices/0000\:01\:00.0/power/control
17:35 karolherbst: echo "auto" > /sys/bus/pci/devices/0000\:00\:01.0/power/control
17:35 karolherbst: Hopland: ^^
17:35 karolherbst: with that your GPU should turn off and on
17:35 karolherbst: well... it wouldn't really turn on all that much, because no driver
17:35 karolherbst: but still
17:35 Hopland: hmm
17:35 karolherbst: shouldn't cause any hangs
17:36 karolherbst: cat /sys/bus/pci/devices/0000\:01\:00.0/power/runtime_status
17:36 karolherbst: tat should print "suspended"
17:36 karolherbst: *that
17:36 Hopland: active
17:36 karolherbst: did you invoke the echo calls?
17:36 karolherbst: both of them?
17:36 Hopland: nope
17:36 Hopland: should I? ^^;
17:36 karolherbst: if you do, then runtime_status should flip over to suspended
17:37 karolherbst: yeah
17:37 karolherbst: well, if you care about using your laptop on battery that is :p
17:37 karolherbst: also, might spin down the fans a little
17:37 Hopland: I'm having a weird bug now where I can't sudo anymore
17:38 karolherbst: you can't
17:38 karolherbst: because
17:38 karolherbst: you try to write the output of "sudo echo auto" into the file as your user ;)
17:38 karolherbst: "echo "auto" | sudo tee ..." is a workaround
17:38 Hopland: no no - I can't sudo anything
17:38 karolherbst: or root shell
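The redirection pitfall described above can be sketched like this, assuming a POSIX shell. `write_as_root` and the `SUDO` override are hypothetical names introduced for illustration; the device paths are the ones quoted in the log.

```shell
#!/bin/sh
# write_as_root VALUE FILE
# Hypothetical helper: the file is opened by tee, which runs under sudo,
# instead of by a redirection that the unprivileged shell would perform.
# SUDO is an assumed override, e.g. SUDO="" when already in a root shell.
write_as_root() {
    printf '%s\n' "$1" | ${SUDO-sudo} tee "$2" >/dev/null
}

# Fails as a normal user, because the shell opens the target file itself
# before sudo ever runs:
#   sudo echo auto > /sys/bus/pci/devices/0000:01:00.0/power/control
# Works, because only tee touches the file:
#   write_as_root auto /sys/bus/pci/devices/0000:01:00.0/power/control
#   write_as_root auto /sys/bus/pci/devices/0000:00:01.0/power/control
```

A root shell, as mentioned in the conversation, avoids the problem entirely since the redirection is then performed with root privileges.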
17:38 karolherbst: ohh
17:38 karolherbst: mhh
17:38 karolherbst: that's a problem then
17:39 karolherbst: Hopland: sudo -s?
17:39 Hopland: At least not in wayland/gdm
17:39 karolherbst: well worst case
17:39 karolherbst: su root
17:39 karolherbst: and root password
17:39 Hopland: in tty it works ^^;
17:40 karolherbst: ...
17:40 karolherbst: weird
17:41 Hopland: indeed it is
17:41 Hopland: Going to try and reboot - brb
17:42 Hopland: reboot did the trick!
17:42 karolherbst: huh?
17:42 karolherbst: what do you mean?
17:43 Hopland: must've been some systemd cockup
17:43 Hopland: in any case, I ran both those echos and upon cat'ing runtime status: suspended
17:43 karolherbst: Hopland: with nouveau blacklisted?
17:43 Hopland: Yep
17:44 karolherbst: okay, nice
17:44 karolherbst: "lspci" just works?
17:44 Hopland: no nvidia driver either
17:44 Hopland: jepp
17:44 karolherbst: lspci might cause some delay though
17:44 karolherbst: okay
17:44 karolherbst: I guess that should work out alright then
17:44 karolherbst: we are trying to get a proper fix for that, but that's not as easy :/
17:45 karolherbst: I have some ideas what the issue is, but nothing where I'd say I really understand the issue
17:48 Hopland: You are so far above me on that one
17:49 Hopland: Created a bash script and a systemd service to echo those values on boot
17:49 Hopland: Thanks :D
17:49 Hopland: I'm going to stick around on this channel, so if you want me to test something let me know
17:50 Hopland: Just don't brick my system ^^;;
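The boot-time script and service mentioned above were not pasted, so the following is only a guess at their shape. The script path, the unit name, the `--apply` switch and the `SYSFS_ROOT` override are all assumptions made for this sketch; the two PCI addresses are the ones from the log and will differ on other machines.

```shell
#!/bin/sh
# Hypothetical boot script, e.g. /usr/local/sbin/gpu-runpm.sh (assumed path).
# SYSFS_ROOT exists only so the function can be tried on a scratch directory.
enable_runpm() {
    root=${SYSFS_ROOT:-/sys}
    status=0
    # Device addresses taken from the log; adjust for other systems.
    for dev in 0000:01:00.0 0000:00:01.0; do
        ctl="$root/bus/pci/devices/$dev/power/control"
        if [ -w "$ctl" ]; then
            echo auto > "$ctl"
        else
            echo "cannot write $ctl" >&2
            status=1
        fi
    done
    return "$status"
}

# Only touch the real sysfs when asked to (assumed switch).
if [ "${1:-}" = "--apply" ]; then
    enable_runpm
fi

# Matching unit, e.g. /etc/systemd/system/gpu-runpm.service (assumed name):
#   [Unit]
#   Description=Enable runtime PM for the unused GPU
#
#   [Service]
#   Type=oneshot
#   ExecStart=/usr/local/sbin/gpu-runpm.sh --apply
#
#   [Install]
#   WantedBy=multi-user.target
```

Whether it worked can be checked the same way as in the conversation, by reading `power/runtime_status` for the GPU and expecting "suspended".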
17:56 leidurleo: all that i understand from the hw dispatcher that this is also round robin
17:56 leidurleo: between CUs but...there are details missing still
18:05 leidurleo: current understanding is that dispatch latency is just killing the performance, it is not worth it, so running things on single core for the first pixel would be just fine
18:27 leidurleo: one nvidia guy says so in stackoverflow though, that when one workgroup is finished or threadblock, newer nvidia cards raise an interrupt and go to the next
18:28 leidurleo: https://stackoverflow.com/questions/6605581/what-is-the-context-switching-mechanism-in-gpu
19:05 leidurleo: https://devtalk.nvidia.com/default/topic/736932/what-happen-to-shared-memory-on-block-preemption-/
19:33 leidurleo: http://3dvision.princeton.edu/courses/COS598/2014sp/slides/lecture08_GPU.pdf
19:34 leidurleo: ok I am done, it seems finally I was able to agree with karolherbst too, in the beginning it sounded a bit weird to me, that on GLSL only one CU is scheduled at time
19:45 leidurleo: so now you should understand right, instead of reusing the filled in queue slots after the first pixel is done which would give all the performance in the world, you do fetch anything again with full pipeline, therefore the performance is bad also
20:00 armadi: Hi all
20:00 armadi: has anyone had any luck with a laptop with nvidia optimus + displayport monitors via a tb3 dock?
20:02 armadi: Manjaro works out-of-the-box on the laptop screen and Kubuntu (with nouveau) works out-of-the-box on the dp monitors
20:02 armadi: but nothing I have tried has made them both work (Besides the windows that the laptop came with)
20:03 karolherbst: armadi: mhh, weird
20:03 karolherbst: it should just work, more or less
20:03 karolherbst: I'd imagine that kubuntu does something weird if the laptop screen doesn't work
20:03 karolherbst: because there is no reason why it wouldn't
20:03 armadi: The manjaro install (my preferred one) doesn't seem to like loading nouveau, it's never loaded at boot
20:04 armadi: if I try to load it, it errors and hangs
20:04 karolherbst: mhhh
20:04 karolherbst: what is the error?
20:04 armadi: the first relevant one is `nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 409800 [ TIMEOUT ]`
20:04 karolherbst: doesn't matter
20:04 armadi: the hanging is from a billion `nouveau 0000:01:00.0: i2c: aux 0004: begin idle timeout bad00100`
20:05 karolherbst: ufff
20:05 karolherbst: that sounds like an ugly one
20:05 karolherbst: does nouveau load alright without the tb3 display connected?
20:06 armadi: sort of, there are still errors but it doesn't actually hang
20:06 armadi: brb let me try again
20:07 leidurleo: anyhow: karolherbst: what i did there was, add the xilinx block_ram module and fixed the vgpr.v and issue testbenches minorly, and called the issue_tb.v from vgpr_tb.v with minor modification, issue_tb.v alone would had been fine too
20:08 leidurleo: and then simulated stuff with verilator, since i do not have Synopsys, and there wr_decode_data flops were dumped in the Trace.cpp file
20:08 Armadi_: Ok connected on my phone so when it hangs I don't drop
20:08 leidurleo: and final thing is the code...but that is very easy
20:10 Armadi_: It hangs a little and then loads with the same errors
20:10 Armadi_: And then tries to load Nvidia which I have blacklisted ?????
20:10 leidurleo: block_ram module for the vgpr register file was needed to be added and regfile implementation changed based of the scratch aka miaow2.0, since the old one did not simulate or compile
20:13 armadi3: karolherbst: any idea why nvidia will get loaded if I have it blacklisted?
20:14 armadi3: I mean it's not loaded but it threw some messages in the kernel log ???
20:14 leidurleo: ah heck now i remember, that issue_tb.v was not enough but, i just called the issue module from vgpr_tb.v, i can give the trace though
20:16 armadi3: by booting, loading nouveau with runpm=0, and *then* connecting the dock, I can get xrandr to show the ports, but they show as disconnected
20:16 armadi3: I've been messing with it for a few days now
20:18 leidurleo: karolherbst: Finnish hackers from Tampere have the full opencl implementation called POCL as you know, it is 2.0 compliant, they use it as a front-end to their TCE processors
20:18 karolherbst: armadi3: userspace
20:18 karolherbst: if applications do cuda stuff eg
20:18 karolherbst: or CL
20:18 karolherbst: the nvidia userspace tries to load the nvidia kernel module
20:19 karolherbst: armadi3: mhh, interesting
20:19 armadi3: do you think I should try uninstalling then?
20:19 armadi3: nvidia blobs I mean
20:19 karolherbst: armadi3: I know we had some issues there, but it should work with newest software
20:19 karolherbst: armadi3: shouldn't matter
20:19 karolherbst: nouveau.runpm=0 is definitely a good idea to debug this
20:20 karolherbst: armadi3: what's the version of the kernel you are running?
20:20 armadi3: 5.0.1, the kubuntu that worked on the DP monitors was 4.19
20:20 armadi3: or maybe 4.18 idr
20:20 karolherbst: mhhh
20:21 karolherbst: can you try to plug them out and in again?
20:21 karolherbst: also content of your /var/log/Xorg.0.log should be helpful
20:22 armadi3: nothing in the kernel log from unplugging/replugging the DP
20:22 armadi3: I'll get a paste for the X log
20:22 leidurleo: it is distribution specific, but the nvidia driver binary today, has also some KMS console plumbing?
20:23 armadi3: karolherbst: https://pybin.pw/8Jiu
20:23 leidurleo: armadi3: otherwise just unload the nvidia module, and change the glx xorg module against mesa stuff
20:23 leidurleo: armadi3: which distribution?
20:24 armadi3: leidurleo: manjaro
20:24 leidurleo: manjaro, is arch based right, i can not remember how that worked
20:24 armadi3: leidurleo: according to lsmod, nvidia isn't actually loaded
20:24 leidurleo: i use mint linux
20:24 armadi3: leidurleo: yes it is basically arch
20:25 leidurleo: what i think maybe the usual problem could be that there are remains of the nvidia ddx glx modules somewhere
20:26 armadi3: how would I go about checking/fixing that?
20:26 karolherbst: armadi3: uff, it uses modeset for both devices... mhh
20:27 leidurleo: armadi3: well i managed to open your log, and it does not prove my theory
20:27 leidurleo: you are trying to load an intel module there
20:27 leidurleo: probably cause you have also an integrated gpu
20:27 armadi3: leidurleo: intel needs to be loaded for optimus, doesn't it?
20:27 karolherbst: armadi3: sadly there aren't many with a TB3 display yet.. although that should simply be DP
20:28 karolherbst: Lyude: any ideas?
20:28 leidurleo: armadi3: well i can't remember there was something like reverse prime which yeah might had needed to be loaded indeed
20:29 Lyude: karolherbst: drm.debug=0x106
20:29 karolherbst: armadi3: listen to Lyude
20:29 karolherbst: DP expert :p
20:29 armadi3: ok, stand by for reboot
20:30 leidurleo: i think that line figures that out probably
20:30 leidurleo: if there is output mux available
20:30 Armadi_: I can't get into grub lol, it boots too fast
20:31 Armadi_: It's tab to interrupt? Or lshift
20:31 karolherbst: Lyude: debug can be changed at runtime, right?
20:31 Armadi_: Oh well let's do that then
20:31 karolherbst: figuring out hotplugging might be even easier
20:31 karolherbst: Armadi_: hotplugging also doesn't work, right?
20:32 karolherbst: Armadi_: as root "echo 0x106 > /sys/module/drm/parameters/debug"
20:32 karolherbst: and then hotplug the display
20:33 karolherbst: there should be new stuff in dmesg then
20:33 Lyude: karolherbst: yes
20:33 Armadi_: Yes indeedy
20:33 Lyude: Armadi_: left shift/f8/esc
20:34 Lyude: there's also a command to make it go to the grub menu on the next boot
20:34 Armadi_: Ok there is new stuff in dmesg, here's a paste of it in whole: https://pybin.pw/GH6K
20:35 Lyude: Armadi_: you'll need to unplug the monitors in question and plug them back in, as that debug option I gave you logs the AUX channel transactions going between the GPU and the displays
20:36 armadi4: Lyude: I did that
20:36 Lyude: so I can see the actual detection sequence
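For readers wondering what 0x106 selects, here is a small worked sketch. The category bit values are quoted from memory of the kernel's DRM debug definitions and should be treated as an assumption; `drm_debug_mask` is a hypothetical helper name.

```shell
#!/bin/sh
# DRM debug category bits (assumed values, as recalled from drm_print.h):
DRM_UT_DRIVER=0x02   # driver-specific messages
DRM_UT_KMS=0x04      # modesetting messages
DRM_UT_DP=0x100      # DisplayPort helper messages, including AUX traffic

# Combine the three categories into one mask and print it in hex.
drm_debug_mask() {
    printf '0x%x\n' $(( $DRM_UT_DRIVER | $DRM_UT_KMS | $DRM_UT_DP ))
}

drm_debug_mask   # prints 0x106

# As root, at runtime (from the log), then replug the monitor:
#   echo 0x106 > /sys/module/drm/parameters/debug
#   dmesg -w
```

The same value can be given at boot as `drm.debug=0x106`, which is the form first suggested in the conversation.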
20:36 Lyude: o_O
20:36 Lyude: you're sure?
20:36 Lyude: mind just trying it one more time for hahas?
20:36 armadi4: Yes
20:36 armadi4: just did, no new messages
20:36 karolherbst: Armadi_: mind giving us your full dmesg?
20:37 karolherbst: I ... have a bad feeling
20:37 armadi4: that last link should be it
20:37 Lyude: Oh
20:37 Lyude: Might also be the fact you don't have nouveau loaded
20:37 armadi4: literally piped dmesg into curl for that
20:37 karolherbst: ohh
20:37 armadi4: oh you're right lol
20:37 karolherbst: :/
20:37 Armadi_: Ok we'll try that again
20:38 Armadi_: Oh wait no
20:38 Armadi_: :| gotta reboot now
20:38 leidurleo: :)
20:40 leidurleo: Lyude: btw. what this AUX transaction log should reveal/show?
20:40 leidurleo: cause again of course i know nothing about that
20:41 Armadi_: Ok, set drm debug, now loading nouveau...
20:42 leidurleo: already googled, yeah insanity
20:42 Armadi_: Ok so I did it right this time and there's still no new messages (besides the 4 that you already have)
20:43 Armadi_: New dmesg: http://pybin.pw/7rMS
20:45 leidurleo: i can see there totally different messages about nvidia outputs
20:46 leidurleo: since prolly you loaded the nouveau this time too
20:46 armadi5: yes, I set drm.debug, then loaded nouveau, then plugged in the dock, then unplugged/replugged one of the DP cables
20:46 armadi5: idk if that's the best order
20:47 karolherbst: ufff
20:47 armadi5: still not sure why nouveau isn't loading at boot
20:47 karolherbst: big ufff
20:47 leidurleo: [ 126.703146] NVRM: No NVIDIA graphics adapter probed!
20:47 karolherbst: armadi5: what version of linux-firmware do you have?
20:47 karolherbst: leidurleo: please don't interfere with debugging if you have no clue
20:47 armadi5: karolherbst: 20190212.28f5f7d-1
20:48 karolherbst: mhhhhh
20:48 karolherbst: big uff
20:48 karolherbst: the accel engine doesn't come up
20:49 karolherbst: it might be the bug I have on my gp107 as well... but that's a firmware bug nvidia has to fix
20:49 armadi5: cool cool cool cool
20:49 armadi5: I mean worst case I have to use kubuntu and have 33% fewer monitors than I'd like
20:49 karolherbst: yeah.. and the display engine is off
20:49 karolherbst: but
20:50 karolherbst: why does it work with kubuntu
20:50 karolherbst: armadi5: nvidia driver with kubuntu?
20:50 armadi5: Do you think it would help if I booted up kubuntu to get some logs from there?
20:50 armadi5: karolherbst: nope, that's using nouveau as well.
20:50 karolherbst: mhhhh
20:50 karolherbst: yeah... a dmesg would help
20:50 armadi5: okay
20:50 karolherbst: and /var/log/Xorg.0.log
20:52 Armadi_: Keep in mind this is booting from a live USB so things might be weird
20:52 Lyude: hm
20:52 Lyude: karolherbst: yeah I was just about to say before I scrolled down that I think DP probably isn't the issue here
20:53 Lyude: Armadi_: don't worry, linux is smart enough to not really make any difference when booting off a USB
20:53 Lyude: unlike some other OSs
20:53 Armadi_: :)
20:54 leidurleo: interact what? i can just read the log, there is something bad with memory engine maybe
20:55 armadi7: also, interestingly, I don't need to authorize the tb3 dock for dp to work from kubuntu
20:55 Lyude: yeah that's expected
20:55 leidurleo: https://fuse.wikichip.org/news/1224/a-look-at-nvidias-nvlink-interconnect-and-the-nvswitch/
20:55 karolherbst: it's a firmware setting though
20:55 Lyude: DP kinda goes onto its own lane iirc
20:55 Lyude: karolherbst: for DP? I don't believe so
20:55 karolherbst: ohh, DP, yes
20:56 karolherbst: Lyude: I think on my XPS I can require auth for DP as well... not quite sure
20:56 karolherbst: but there are like multiple settings
20:56 karolherbst: just turned it oj a while ago :)
20:56 karolherbst: well the basic TB stuff
20:56 Lyude: karolherbst: yeah, I'm not entirely sure myself either after I noticed the other day my razer laptop doesn't notice the MST hub connected to my TB3 dock until it's authorized
20:56 Lyude: could just be mst being slow though
20:56 armadi7: Xorg.0.log: https://pybin.pw/oO_f
20:57 karolherbst: Lyude: well, that's TB then ;)
20:57 armadi7: dmesg: https://pybin.pw/6nfC
20:57 Lyude: karolherbst: maybe, I'd like to actually look at the kernel output to be sure because if it is that means I'm probably going to try at some point to see if we can hook bw info from TB into drm again
20:57 karolherbst: heh :D
20:57 Lyude: once my plate is like, not overflowing
20:57 karolherbst: right :)
20:58 karolherbst: I also have a nasty issue to debug right now, and that's nasty
20:58 karolherbst: "nouveau 0000:01:00.0: gr: init failed, -22"
20:58 karolherbst: so there is that
20:59 karolherbst: but apparently we don't fail as fatally as we do on newer kernels?
20:59 armadi7: karolherbst: yes but this is kubuntu where the dp monitors *do* work
20:59 karolherbst: yeah, exactly
20:59 armadi7: ohh
20:59 karolherbst: the gr engine shouldn't be required for displaying stuff
21:00 karolherbst: but.. maybe we changed something somehow and now we fail to bring the display stuff up when gr init fails
21:00 karolherbst: skeggsb should know
21:00 Lyude: karolherbst: it may cause problems or be a sign of a larger underlying issue though
21:00 karolherbst: well
21:00 karolherbst: gr fails
21:00 karolherbst: that's a point where I am already willing to stop debugging and blame nvidia for shipping us crappy firmware
21:01 Lyude: armadi7: if you have the knowledge and/or willpower: a bisect of this issue would help a ton
21:01 karolherbst: and crappy means that there is no "error" in order for us to debug it anyway
21:01 armadi7: Lyude: this is my new setup and I'm on the second day of my job still trying to get my monitors working, so I'll see if I have time
21:02 armadi7: Lyude: tbh I'm not even sure it's the version mismatch causing the difference (unless you are), it could be a config thing
21:03 Lyude: I'd imagine it's a version mismatch, there isn't any config that can make nouveau just not turn on like that :p
21:03 Lyude: s/version mismatch/regression/
21:03 armadi7: in any case, thanks to the whole crew here for giving it a try
21:04 armadi7: fun fact: if I boot up kubuntu without the dock, the laptop screen still doesn't work. There's just no display output, so that's super cool :|
21:04 armadi7: So it looks like there might be an intel bug as well
21:05 karolherbst: :/
21:05 karolherbst: armadi7: well that seems to be fixed with a newer kernel though
21:05 Lyude: that's very suspicious
21:05 karolherbst: or userspace stack
21:08 karolherbst: Lyude: or well.. kubuntu uses the modesetting ddx, where the other one used intel
21:08 Lyude: ooooh, yeah that could be it
21:08 armadi7: all right, it's quitting time here so I'm out. Thanks again for your help. Lyude, I will try to bring back some useful info if I ever find the time to bisect (if I do before it's fixed)
21:09 Lyude: armadi7: np, thank you for reporting the issue! :)
21:48 lebomees: maybe someone can read it... dealt a month with those simulators.
21:48 lebomees: http://dpaste.com/2A2HR6W
21:50 lebomees: i extracted a part from the trace file, it is from instr_info_table.v and includes only f_vgpr_alu_wr_done_wfid mux and the starts of or some part of f_vgpr_lsu_wr_done_wfid
21:51 lebomees: there are those hex numbers in the right hand side, and 40*40 instances approximately, it will take some time to read for the best too, maybe glisse can manage to do that
21:53 lebomees: the whole trace froze the dpaste site on my computer, cause it was a half a million lines approximately
22:01 lebomees: i can't make it more clear, i had trouble reading the code myself too, maybe this dump helps, anyhow those the issue queues
22:03 lebomees: there are "7 Vtemp" variables from Vtemp2990 to Vtemp3091 each having 7*40 different wfid combinations or so
22:06 lebomees: this linux console stuff is pretty neat, it always ticks reliably on extra big files, and the simulator was also neat to dig those flops out
22:07 lebomees: although pretty nuts stuff, i do not pretend to be any good at this, but the code is easy
22:07 lebomees: the final code
22:14 lebomees: i think i mostly explained how this works too, i dunno if you can only understand
22:15 lebomees: hw wise it does not cost much to have such central flops in the chip, but performance wise this does nuts stuff in low power mode
22:16 lebomees: so designers have planted such queues there, and i explained shortly already how to run them, so i need to go now