01:09 karolherbst: mslusarz: mhh, I get the feeling that mmt suddenly stops tracing
01:09 karolherbst: any ideas?
01:09 karolherbst: https://gist.githubusercontent.com/karolherbst/396d897449e4c4282b110686afce77f5/raw/c8be192d2f06387f81f3a25d5b51cd901628b365/gistfile1.txt
01:09 karolherbst: are the last lines
01:10 karolherbst: ohh wait
01:11 karolherbst: seems like it just stops parsing stuff
01:24 karolherbst: anyway, seems there is actually a lot of missing
11:59 karolherbst: mslusarz: mhh, somehow we fail to detect all ib pointers
12:00 karolherbst: https://gist.githubusercontent.com/karolherbst/69dec1c6cfa5504c3402ff91dd86ebaa/raw/5cb733ae01837b29718c51665b384dce4cc0ec2d/gistfile1.txt
12:00 karolherbst: the top one is getting parsed
12:00 karolherbst: the lower one, not
12:01 karolherbst: and the lower one looks obviously like it contains sane data
12:08 karolherbst: maybe this is more helpful to look at: https://gist.githubusercontent.com/karolherbst/76f06eb8fc232b0e532d1d37dbfddb6a/raw/b99ba6d71984b1bd9c810ad54a8ee9f707f58fa1/gistfile1.txt
12:08 karolherbst: I removed 0 read/writes
12:08 karolherbst: valid entries should start at "w 3:0x0000 (gpu=0x200600000), 0x20012000"
13:09 karolherbst: mslusarz: ohhhh, that thing is detected like 8000 lines below...
13:09 karolherbst: w 2:0x0000 (gpu=0x200400000), 0x00600000
13:09 karolherbst: w 2:0x0004 (gpu=0x200400004), 0x0002ae02 IB: address: 0x200600000, size: 171, not_main, buffer id: 3
15:58 karolherbst: ohhh wow
15:58 karolherbst: steam beta ships a wine fork
15:59 karolherbst: using dxvk
16:02 diogenes_: there should be winepak somewhere on the horizon
16:09 RSpliet: karolherbst: what's the status of valgrind-mmt, OpenCL and recent drivers?
16:09 karolherbst: RSpliet: quite good actualy
16:09 karolherbst: *actually
16:09 karolherbst: I am sure I don't get everything
16:10 karolherbst: but at least tons of GP104_COMPUTE. messages are decoded
16:10 RSpliet: That's the github/envytools/valgrind instance?
16:10 karolherbst: no
16:10 karolherbst: with those it doens't work at akk
16:10 karolherbst: *all
16:11 karolherbst: RSpliet: https://github.com/karolherbst/valgrind/commit/63203ebe3989ce6b591c301752afc09199463be0
16:11 karolherbst: https://github.com/karolherbst/envytools/commit/1b79037ac51b0a5d4e1f86cb4912a4cbc38900ce
16:12 RSpliet: Ok. Pre-UVM drivers should kinda sorta work with envytools/valgrind ?
16:12 karolherbst: yeah, I expect as much
16:13 RSpliet: Hmm no it's not quite as black and white as that
16:14 karolherbst: I am sure they had something like UVM before already
16:14 karolherbst: or maybe not
16:14 RSpliet: Yeah since 2013, but just didn't use it until recently
16:14 karolherbst: anyway, with the uvm based driver it didn't work on my kepler GPU as well
16:14 RSpliet: With 367.27 valgrind-mmt works, but that's the last version that does
16:15 karolherbst: ahh
16:15 karolherbst: so 367.27 it is
16:15 RSpliet: Yeah, I was hoping to find something slightly newer, as I don't want to ignore 2 years of shader compiler improvements
16:16 karolherbst: well
16:16 karolherbst: you should be able to grab the shaders with my patches on newer drivers
16:16 karolherbst: allthough I am sure there are still issues
16:16 karolherbst: currently at the end of the trace I get that: https://gist.githubusercontent.com/karolherbst/46a5c7181f653fd1ab4415ee863082db/raw/7b98061c06ff6bb840c1bb0eb836221ee892ae3a/gistfile1.txt
16:16 karolherbst: and the DATA is the OpenCL kernel I launched
16:17 RSpliet: Yeah, and sadly I don't have a lot of time for issues :-P Appreciate the WIP though!
16:17 karolherbst: yeah.. currently digging through how we find those pushbufs and decode them
16:17 karolherbst: for odd reasons I never found the launch command
16:18 karolherbst: but there are tons of other code uploaded as well
16:18 karolherbst: and I think it is the trap handler, which is like super huge
16:21 karolherbst: RSpliet: any idea what "SPA_VERSION' could refer to?
16:21 RSpliet: in what context?
16:21 karolherbst: 5.7 as the version I found inside the trace
16:21 karolherbst: uhm compute
16:21 karolherbst: RSpliet: PB: 0x00000507 GP104_COMPUTE.SPA_VERSION = { MINOR = 0x7 | MAJOR = 0x5 }
16:22 karolherbst: it should be there since kepler
16:23 karolherbst: RSpliet: but I found other golden things inside the open-gpu-docs: SHADER_LOCAL_MEMORY_WINDOW_ADDRESS, SHADER_SHARED_MEMORY_WINDOW_ADDRESS
16:24 RSpliet: Yeah those I knew the existence of... you can pick your virtual addresses for local/shared mem
16:24 karolherbst: those exist since Pascal though
16:24 pmoreau: Nice! Were you not looking for those at some point, when dealing with passing the base shared pointer from OpenCL to the kernels?
16:25 karolherbst: I am sure pre pascal there might be different methods
16:25 karolherbst: pmoreau: open-gpu-doc/Compute-Class-Methods/1/clc0c0.h :)
16:25 karolherbst: there is tons of good stuff in it
16:25 RSpliet: I darenth say what SPA stands for.
16:26 karolherbst: well what we call MEM_BARRIER
16:27 karolherbst: :O
16:27 karolherbst: "SET_SHADER_EXCEPTIONS"
16:27 karolherbst: maybe I need that for the trap handler?
16:28 karolherbst: RSpliet: why not?
16:29 RSpliet: Because I don't know :-D
16:29 karolherbst: :D
16:29 karolherbst: anyway
16:29 karolherbst: those header files are golden
16:30 karolherbst: they should be more or less complete
16:30 HdkR: Scalable Processor Array?
16:30 karolherbst: HdkR: what's that? sounds like a terrible marketing term :p
16:31 HdkR: haha :D
16:31 karolherbst: ohhh crap, it actually is
16:35 RSpliet: In that case I'd expect it to roughly correlate with the compute capabilities, and be a read-only value?
16:36 pendingchaos: karolherbst: there's also some launch descriptor documentation: https://download.nvidia.com/open-gpu-doc/qmd/1/
16:41 pmoreau: karolherbst: Can’t remember if I had a look or not at that document. I should do some day.
17:02 RSpliet: mslusarz: it appears I still require increasing "m8" in coregrind/m_initimg/initimg-linux.c:1007 to 128*m1 for valgrind-mmt to work with my NVE7 on NVIDIA 367.27
18:06 karolherbst: RSpliet: fun, the trap handlers address has to be 0x40 aligned, even on maxwell
18:06 karolherbst: so at least I know the GPU verifies whatever value I put into it
19:03 mslusarz: RSpliet: that's not mmt fault
19:08 mslusarz: karolherbst: I'm not sure if it's still an issue, but if you want me to help you with IB detection I need to see the full log
19:17 Lyude: karolherbst: do you know what ever happened to those patches for fixing module unload with nouveau_backlight_exit?
19:20 agd5f: Lyude, what's the context? the device is not reset after a reboot? Is this a Hybrid graphics system?
19:20 Lyude: agd5f: correct! it seems that occasionally on this Thinkpad P50, the bios may not actually be power cycling the GPU on reboots causing us to sometimes get a GPU at boot that's still turned on
19:22 agd5f: Lyude, not aware of any issues like that on AMD systems.
19:22 agd5f: Lyude, however,
19:22 agd5f: there are a lot of platform tricks to reduce latency
19:24 Lyude: agd5f: oh?
19:24 agd5f: e.g., on HG (_PR3) systems, since the OS is in control of the power state, it's assumed the OS will know the state of the device so on resume for example, the bios will just leave the device in whatever state it was previously in (off or on) rather than always turning it on
19:24 Lyude: !!!!!
19:25 Lyude: that means this is even more of a problem then I thought!
19:25 karolherbst: mslusarz: yeah.. I am not quite sure. If I look at what got parsed it looks like quite a lot of things are acutally missing: https://drive.google.com/open?id=1Q6MYPPplKrVWr_tH9YRTyg_R3gDStHk5
19:25 Lyude: skeggsb, jfyi ^
19:25 Lyude: so we absolutely do need a ->shutdown(), patches for that coming soon after I fix the backlight bug :)
19:26 gnarface: Lyude: i'm curious about bios settings related to "quick boot" or "quick POST"
19:26 agd5f: Lyude, the idea being that if the dGPU was not in use at suspend time, there's no need to power it up on resume just to power it down again after resume
19:27 gnarface: Lyude: i found some models of dell bioses actually retained ram content to speed reboots. it functioned for linux too, but i disabled it because of suspicion that it was causing weird problems like that
19:27 Lyude: agd5f: that counts for full machine restarts as well/
19:27 Lyude: gnarface: !!!
19:27 Lyude: karolherbst: ^ you are definitely going to be interested in that dell fact
19:27 Lyude: *as well?
19:27 agd5f: Lyude, that seems questionable since you'd need to do device discovery. but maybe windows handles that somehow?
19:27 Lyude: agd5f: hm, maybe...
19:28 gnarface: lemme see if i can dig up the exact name. it was a pre-efi machine but i doubt they removed the feature.
19:28 agd5f: Lyude, sounds more like a bios bug
19:28 karolherbst: Lyude: I never had issues with system reboots itself
19:28 Lyude: agd5f: yeah, it probably is
19:28 Lyude: I figure we may as well add a ->shutdown() callback anyhow, but i'm going to bring this up to lenovo as well
19:29 Lyude: karolherbst: yeah it's really just this P50, I was moreso curious if there were other vendors this had gotten seen on
19:29 agd5f: Lyude, windows driver may always do a gpu reset of some sort on load
19:29 agd5f: so they never notice
19:29 karolherbst: Lyude: well, in the end, all the ports are driven by the intel GPU on the Dell I have
19:30 Lyude: karolherbst: i was talking about the reboot thing
19:30 Lyude: erm, the thing gnarface mentioned
19:30 karolherbst: ohh mhh, yeah, dunno
19:30 karolherbst: I mean sometimes rebooting takes way longer, but...
19:31 Lyude: karolherbst: after hitting this bug, i'd say it's good to make sure :)
19:31 Lyude: (this bug meaning "not powering off on poweroff")
19:32 karolherbst: I know that on windows you have this weirdo feature that you can't switch a kernel when you simply "shutdown" as it does some weirdo save state to something thingy, so you have to reboot
19:33 gnarface: i'm having trouble finding it again. it wasn't the quick POST thing, it was called like "fast reboot" or something like that... i discovered the problem when a driver would crash, leaving a device in a bad state that persisted through reboots but reset after a full powerdown, unplug, count to 30s, then cold boot
19:33 gnarface: that's when i noticed that it also after the first boot would skip the entire post screen and everything before GRUB in record time
19:33 gnarface: like, so fast it didn't seem like it really rebooted
19:33 gnarface: but there i was, seeing GRUB again
19:35 mslusarz: karolherbst: where in this log you think demmt missed IB?
19:36 mslusarz: karolherbst: I still see [no data] in this trace...
19:36 karolherbst: mslusarz: ohh, really?
19:37 karolherbst: let me check
19:37 karolherbst: huh
19:37 karolherbst: ohhh
19:37 karolherbst: it is the old file, sry
19:39 karolherbst: mslusarz: https://drive.google.com/open?id=1W8jVkX0IiSb641s_w6Awon36A-uMPSBp
19:41 karolherbst: regarding missing IBs: I am not 100%, after getting a better understanding, I doubt anything is missing from the 2nd buffer. no idea if there are more buffers where IBs might be
19:41 karolherbst: anyway
19:41 karolherbst: I don't see the compute program being launched inside the trace
19:41 karolherbst: or I am missing something
19:44 karolherbst: mslusarz: that part looks super suspicous for example: https://gist.githubusercontent.com/karolherbst/0ac6b0fc8e1192aa463b089a0f7225fa/raw/613eeb60c63539615026daa05255ab8b2ddc36d0/gistfile1.txt
19:44 mslusarz: ok
19:45 karolherbst: I mean, it can always be that I messed something up parsing the UVM ioctls
19:45 karolherbst: or that I need to parse the UVM_REGISTER_CHANNEL ones as well
19:45 mslusarz: are you hanling all Pascal-specific COPY objects/methods in demmt?
19:46 karolherbst: I doubt that
19:46 karolherbst: allthough traces from graphic applications always looked good enough
19:46 karolherbst: except the missing compute shaders
19:48 karolherbst: mhh the 6th buffer seems to contain IB entries as well. There are a few others. 34 is also one with quite a few
19:48 mslusarz: ok, on older generations code like this: https://github.com/envytools/envytools/blob/master/demmt/object_g80_m2mf.c#L99 mattered for decoding
19:49 karolherbst: uhm, I mean pointers to IBs
19:51 karolherbst: ohh
19:52 karolherbst: I see. let me debug a bit further. Just got a better understanding again
20:02 mslusarz: karolherbst: hmm, I just noticed that the code for gk104_copy (and up) doesn't ever call gpu_mapping_register_write...
20:19 mslusarz: karolherbst: at line 67165 there's a fifo object created which clearly has an IB address matching buffer 6 address
20:22 mslusarz: decoding of ioctl's ptr is wrong because of size mismatch in decode_nvrm_ioctl_create (which should be fixed...), but the code in handle_nvrm_ioctl_create should handle that fine
20:27 mslusarz: I suspect nvrm_get_fifo is wrong
20:36 mslusarz: yeah, it assumes fifo object must be a parent of the object it evaluates; buffer 6 is object 0x5c00001c, which has a device as a parent and fifo object (0x5c000019) is in other hierarchy
20:38 mslusarz: to fix that we should probably look at all fifos registered for a device
20:43 mslusarz: huh, nvrm_get_fifo already does that in slightly convoluted way
20:43 mslusarz: I guess you have to debug it
20:48 karolherbst: yeah
20:49 karolherbst: mslusarz:
20:49 karolherbst: ..
20:49 karolherbst: mslusarz: any idea what might be inside buffer 66?
20:53 karolherbst: mslusarz: I think I found the issue
20:56 karolherbst: so, when parsing 0x200e00000, nvrm_get_pb_pointer_found returns true, but the bound mapping is kind of useless
20:56 karolherbst: the condition at line 148 inside demmt/buffer_decode.c is false
20:56 karolherbst: so it never even tries to decode the ob
20:56 karolherbst: *ib
20:57 karolherbst: mapping->ib.is is NULL actually
20:57 mslusarz: mapping->ib.is is int...
20:58 karolherbst: well, then it is 0
20:58 mslusarz: it's set in line 98, so something was wrong earlier
20:59 karolherbst: pb_pointer_found is true
20:59 karolherbst: mhh, let me try something
21:00 karolherbst: wondering what happens if I just set it to false
21:00 mslusarz: i don't see how that's possible
21:00 karolherbst: well, but nvrm_get_pb_pointer_found(dev) returns true
21:01 karolherbst: okay, I forced it to false inside gdb now
21:01 karolherbst: a fifo object is found
21:01 karolherbst: IB buffer: 6
21:02 karolherbst: w 6:0x0004 (gpu=0x200e00004), 0x00029e02 IB: address: 0x201000000, size: 167, not_main, buffer id: 7
21:02 karolherbst: and then that: https://gist.githubusercontent.com/karolherbst/d11ab7126d62c37ba5f09344128ceba8/raw/dc856bbadfd16e75457658bc9008c232df047477/gistfile1.txt
21:03 mslusarz: looks good
21:03 karolherbst: yeah
21:03 karolherbst: well, there is that "ERROR: Guessing handle 0x0000c1b5, driver forgot to call NVRM_MTHD_FIFO_IB_OBJECT_INFO?" thing
21:03 karolherbst: but I guess we can deal with that later
21:03 mslusarz: now you have to figure out when pb_pointer_found was set
21:04 karolherbst: yeah
21:11 karolherbst: mslusarz: happens when parsing "w 2:0x0000 (gpu=0x200400000), 0x00600000"
21:11 karolherbst: nvrm_get_device seems to return the same gpu_object
21:16 karolherbst: mhh
21:17 karolherbst: it simply traverse through the parents
21:17 karolherbst: or well
21:17 karolherbst: rather just goes up the parent chain
21:17 karolherbst: and returns the first obj->class_ == NVRM_DEVICE_0
21:18 mslusarz: well, that's fine
21:18 karolherbst: well, but we set pb_pointer_found on the parent object
21:19 mslusarz: there's only one device object in this trace anyway, so it doesn't matter
21:19 karolherbst: so now that's valid for all the children
21:19 karolherbst: or rather set
21:20 mslusarz: ok, I get it now
21:20 karolherbst: so I guess the first buffer resolving the IB wins and all the other one fail
21:21 mslusarz: the whole point of pb_pointer_found was to speed up lookups
21:21 karolherbst: yeah, I figured
21:22 karolherbst: I guess we should rather set it on the mapping or mapping->object?
21:22 mslusarz: the easiest fix is to remove it, but demmt will be uselessly slow
21:22 karolherbst: mapping I guess, because that's where the stuff is stored actually
21:23 karolherbst: or what do you think?
21:24 karolherbst: so move gpu_object->pb_pointer_found to cpu_mapping->pb_pointer_found?
21:24 karolherbst: or maybe even inside cpu_mapping.ib
21:25 mslusarz: from nvrm_object to gpu_mapping or gpu_object
21:26 mslusarz: I;m not sure which one yet, let me thinkk
21:28 karolherbst: well, the gpu_object is the same, so this wouldn't work
21:28 karolherbst: for both buffers
21:29 mslusarz: hmm
21:29 karolherbst: well we set the ib stuff on the cpu_mapping struct anyway
21:36 mslusarz: maybe gpu_object should have a reference to its fifo?
21:36 Lyude: cool, backlight fixed, now to try adding shutdown handling to nouveau :)
21:37 Lyude: (although, probably doing that after I clean up the backlight code a bit more...)
21:37 mslusarz: and the first time lookup fails we should mark an object as not connected to fifo
21:37 karolherbst: mslusarz: yeah, maybe
21:38 karolherbst: anyway, moving it into cpu_mapping seems to work and it's still quite fast
21:39 karolherbst: but if you have a much better idea, I would be happy to make the changes towards this. I am just focusing on something which works for now
21:39 mslusarz: cpu_mapping can be destroyed
21:40 karolherbst: :D
21:40 karolherbst: that struct is quite huge though
21:40 mslusarz: being connected to fifo is a property of an object, not its mapping
21:40 karolherbst: ahh, makes sense
21:41 mslusarz: by being "destroyed", I mean "unmapped"
21:41 karolherbst: yeah, I figured that out later
21:41 karolherbst: anyway, here is the new parsed output: https://drive.google.com/open?id=1xFA1boof1UZHfvnYtm8NsxJofdrPCMb-
21:41 karolherbst: looks _much_ better
21:42 karolherbst: now I also get methods on GP100_COPY :)
21:43 karolherbst: nice
21:43 karolherbst: I also get the CMD stuff :)
21:43 mslusarz: still no LAUNCH though
21:44 karolherbst: yeah...
21:51 karolherbst: mslusarz: I kind of get the feeling that mmt stopped tracing
21:51 mslusarz: what do you mean?
21:52 karolherbst: well, the application never called UVM_UNITIALIZE or cleaned up all the stuff it requested
21:52 karolherbst: or maybe it doesn't have to
21:52 karolherbst: UVM_DEINITIALIZE is it called, actually
21:54 karolherbst: and when I look at the end of the parsed output it kind of doesn't look like the application is done doing its stuff
21:54 karolherbst: but maybe it is fine, dunno
21:55 mslusarz: if manual cleanup is not needed many applications don;t bother to do it...
21:57 mslusarz: you are tracing application you have a source code for, right?
21:57 karolherbst: yes
21:59 mslusarz: then yoi can modify it to print something at the end and in mmt enable tracing of writes to stdout and stderr
21:59 mslusarz: --mmt-trace-stdout-stderr
22:00 karolherbst: mhh, it is actually there
22:00 karolherbst: ohhh, then I can also print something out before launching the kernel
22:00 karolherbst: nice
22:02 karolherbst: mslusarz: https://gist.githubusercontent.com/karolherbst/1cf723ba01581d6cfd1707b00640fafa/raw/2a458163719afe47e6355c38d1c439f6c8bf580c/gistfile1.txt
22:02 karolherbst: ahh. colors
22:03 karolherbst: https://gist.githubusercontent.com/karolherbst/1cf723ba01581d6cfd1707b00640fafa/raw/50d9423fb33ae1cded9c87f4fcf0f0d834f080b7/gistfile1.txt
22:05 karolherbst: yeah
22:05 karolherbst: that CMD stuff is when the kernel is launched
22:05 karolherbst: *QMD
22:06 mslusarz: so maybe last store triggers it?
22:06 karolherbst: maybe
22:06 karolherbst: skeggsb: do you know something about that?
22:07 karolherbst: mslusarz: anyway, many thanks for your help. Now we can finally trace nvidia-uvm enabled nvidia drivers :)
22:07 mslusarz: np :)
22:09 karolherbst: mhh the thing is, in mesa we call LAUNCH actually
22:10 karolherbst: ohhhh
22:10 karolherbst: mhh, no, no idea
22:11 karolherbst: mslusarz: ... there is a suspicious write
22:11 karolherbst: "w 9:0x028c, 0x00000004"
22:11 karolherbst: this happens right before reading out the result buffer
22:12 mslusarz: oh, I haven't noticed it :D
22:12 karolherbst: ;)
22:12 mslusarz: what is mapping 9?
22:12 skeggsb: nvidia use InlineQmd(), which launches as soon as it's got a whole one afaik
22:12 karolherbst: skeggsb: okay, makes sense
22:13 karolherbst: skeggsb: any idea what that "w 9:0x028c, 0x00000004" write is then?
22:13 karolherbst: that happens like literally right after the qmd
22:13 skeggsb: looks like a GP_PUT write to me
22:14 karolherbst: skeggsb: anyway, valgrind-mmt and demmt are now working with nvidia-uvm :)
22:14 karolherbst: mslusarz: that one, right? "LOG: mmap: address: 0x83ed000, length: 0x00001000, id: 9, offset: 0x00000000, fd: 11, cid: 0xc1d00016, handle: 0x5c000021 [class: 0xc06f NVRM_FIFO_IB_UNKC0]"
22:14 mslusarz: yeah
22:15 mslusarz: so software methid probably?
22:16 karolherbst: 0x5c000021 has three children, GP104_COMPUTE, GP100_COPY and GP104_COPY
22:17 karolherbst: but well
22:17 karolherbst: skeggsb already gave the answer
22:18 karolherbst: skeggsb: how do I know which CMD versions is getting used?
22:19 karolherbst: skeggsb: also, I am looking for a reliable way to figure out if the last "GP104_COMPUTE.UPLOAD.DATA" write is actually code. Current assumption is, if a "GP104_COMPUTE.FLUSH = { CODE }" happens, I can assume the last write to DATA was a shader?
22:21 karolherbst: skeggsb: btw, valve anounced today they start to ship a wine fork within their steam beta clients and they use dxvk
22:22 karolherbst: I don't know if they have an OpenGL fallback enabled
22:22 karolherbst: but at least that increases the priority for vulkan quite significantly :)
22:22 mslusarz: bed time for me, bye
22:22 HdkR: Need more devs bashing on Nouveau :P
22:22 karolherbst: they do some whitelisting on windows games which are suppose to run with their fork
22:23 HdkR: Bashing in to shape, not bad mouthing
22:24 airlied: karolherbst: don't think there is a GL fallback
22:25 karolherbst: yeah, perf is crappy in comparison
22:25 karolherbst: the dxvk stuff runs really well
22:25 karolherbst: got into the annoying trouble setting up a cross compiler :)
23:47 Lyude: skeggsb: btw; do you have any objections to having nouveau shutdown the card on kernel shutdown? (e.g. using the driver->shutdown callback)