05:49 mupuf: karolherbst: fun :D
08:12 karolherbst: mupuf: ... yeah. More or less
08:13 karolherbst: mupuf: anyway, I don't seem to find any proper I2C with the register information
08:14 karolherbst: mupuf: and this is my current patch: https://github.com/karolherbst/envytools/commit/b0460741cfac12c05befc25e4b3a9d6edeabeac5
08:16 karolherbst: mhh
08:16 karolherbst: I am sure we have a mask though
08:16 karolherbst: as we still have those VID GPIOs
08:17 karolherbst: anyway, this "-- Mode CHIL PWM, acceptable range [712500, 1150000] µV, base voltage 306250 µV (unk = 1), step 6250 µV" _should_ be correct
08:18 karolherbst: "306250 µV" is the minimum voltage the PWM can produce
08:18 karolherbst: and it has 6250 µV steps
08:18 karolherbst: and around 256 VID entries in total
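The VID relation described here is linear: voltage = base + VID × step. Below is a minimal C sketch that assumes exactly the figures quoted above (306250 µV base, 6250 µV step, 256 entries). `vid_to_uv` and `uv_to_vid` are hypothetical helpers, not envytools or nouveau functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Figures as quoted in the log for this CHIL PWM regulator. */
#define VID_BASE_UV  306250u  /* voltage produced by VID 0 */
#define VID_STEP_UV    6250u  /* voltage increment per VID step */
#define VID_COUNT       256u  /* approximate number of VID entries */

/* Voltage in µV produced by a given VID (hypothetical helper). */
static uint32_t vid_to_uv(uint32_t vid)
{
    assert(vid < VID_COUNT);
    return VID_BASE_UV + vid * VID_STEP_UV;
}

/* Smallest VID whose voltage is >= target_uv; false if out of range. */
static bool uv_to_vid(uint32_t target_uv, uint32_t *vid)
{
    uint32_t v;

    if (target_uv <= VID_BASE_UV) {
        *vid = 0;
        return true;
    }
    /* Round up so the regulator never undershoots the request. */
    v = (target_uv - VID_BASE_UV + VID_STEP_UV - 1) / VID_STEP_UV;
    if (v >= VID_COUNT)
        return false;
    *vid = v;
    return true;
}
```

With these numbers, the acceptable range [712500, 1150000] µV maps to VIDs 65 through 135, and VID 255 would correspond to 1900000 µV.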
16:17 Subv: hi, would it be too complicated for a nouveau beginner like me to implement the AMD_pinned_memory extension for nvc0? from what i can see in the code (nouveau_bufferobj.c) it should be pretty straightforward
16:23 gnarface: i think you should try it
16:23 gnarface: don't just wait for someone else to talk you out of it
16:23 HdkR: I'm guessing that implementing it similarly to coherent + persistently mapped buffers should make it fairly straightforward?
16:25 HdkR: er, coherent + persistent + client
16:26 karolherbst: Subv: I would wait until we land all that HMM stuff first (and we also have to change a few memory related things for vulkan)
16:26 karolherbst: ask skeggsb
16:26 HdkR: Woo HMM
16:27 Lyude: karolherbst: poke: I think I figured out a better way to fix the RPM issues (still figuring out the i2c stuff though, so that will have to wait) over the weekend
16:28 karolherbst: Lyude: okay. I left a few comments on your patches, don't know if you have seen them
16:28 HdkR: karolherbst: How do you feel about increasing the SSBO maximum size to stupid sizes? :)
16:28 karolherbst: HdkR: not great?
16:28 karolherbst: why?
16:29 HdkR: 4GB max would be nice to start doing silly mappings :)
16:29 karolherbst: uhm, no?
16:29 HdkR: No?
16:29 karolherbst: what happens if somebody wants to allocate a 4GB SSBO then?
16:29 HdkR: Go for it, eat up that 4GB of VRAM :D
16:29 karolherbst: ;)
16:30 karolherbst: we report stupid limits for VRAM anyway
16:31 karolherbst: HdkR: https://lists.freedesktop.org/archives/mesa-dev/2018-July/201331.html
16:31 HdkR: I think HMM would make this nicer anyway
16:31 Subv: what's wrong with allocating a 4gb SSBO? :(
16:31 HdkR: Since then I could do a 4GB SSBO + sparse_buffer to do sparse nonsense with it
16:32 HdkR: and then if pinned_memory was added to the mix...
16:32 HdkR: bwehehehe
16:33 HdkR: I'm guessing sparse won't be implemented before HMM is merged in anyway
16:33 Subv: full disclosure: i just want AMD_pinned_memory so i can zero-copy map my 4GB buffer of memory (that represents the Nintendo Switch's memory capacity) into the GPU, and do manual address translation inside of a shader
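Subv's plan to do manual address translation against one large zero-copy buffer amounts to a page-table lookup on every access. The following is a CPU-side C sketch of that lookup. It assumes a hypothetical single-level table with 4 KiB pages. The actual shader code and the guest's real page-table format are not part of this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                /* assumed 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define PAGE_MASK  (PAGE_SIZE - 1u)
#define NUM_PAGES  16u                /* tiny table for illustration */
#define INVALID    UINT64_MAX         /* marks an unmapped page */

/* page_table[vpn] = byte offset of that page inside the backing buffer */
static uint64_t page_table[NUM_PAGES];

static void pt_init(void)
{
    for (uint32_t i = 0; i < NUM_PAGES; i++)
        page_table[i] = INVALID;
}

static void pt_map(uint32_t vpn, uint64_t buffer_offset)
{
    assert(vpn < NUM_PAGES);
    page_table[vpn] = buffer_offset;
}

/* Translate a guest virtual address into an offset in the backing buffer. */
static bool translate(uint64_t vaddr, uint64_t *offset)
{
    uint64_t vpn = vaddr >> PAGE_SHIFT;

    if (vpn >= NUM_PAGES || page_table[vpn] == INVALID)
        return false;                 /* a real MMU would fault here */
    *offset = page_table[vpn] + (vaddr & PAGE_MASK);
    return true;
}
```

A shader version would perform the same shift, mask, and table lookup, reading the table from a buffer instead of a C array.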
16:34 HdkR: and I'm just insane and want to do insane things
16:34 gnarface: Subv: what are you trying to do??? turn your linux box into an external GPU for your Switch?
16:34 Subv: gnarface: working on a Nintendo Switch emulator
16:35 * JayFoxRox waves to Subv
16:35 * Subv waves back to JayFoxRox
16:36 HdkR: karolherbst: Basically the nvidia blob won't ever support pinned_memory, so I'm going to Nouveau to try and get my insanity satiated
16:49 nyef: For some reason, I keep parsing "HMM" as "Hidden Markov Model", and I'm fairly sure that that's not correct for this context.
16:49 karolherbst: it isn't
16:50 karolherbst: HdkR: why not?
16:50 nyef: "MM" is probably "Memory Manager"...
16:50 karolherbst: throw money at them
16:50 karolherbst: nyef: nope
16:50 karolherbst: well
16:50 karolherbst: not quite
16:50 karolherbst: "Heterogeneous Memory Management"
16:51 nyef: Meaning... managing different kinds of memory?
16:52 karolherbst: nyef: shared VM between CPU and GPU
16:52 karolherbst: nyef: like the same pointer can be used on the CPU and the GPU and it points to the same memory
16:52 karolherbst: nyef: like you can do clSetKernelArgSVMPointer(0, malloc(0x100)) and it works just like that
16:53 karolherbst: uhm... kernel arg, but you get the idea I hope
16:53 nyef: Hunh. Same pointer on a supervisor address space level, or on a user address space level?
16:54 HdkR: karolherbst: Why won't it support pinned_memory? No idea
16:55 karolherbst: HdkR: again, throw enough money at them :p
16:55 karolherbst: nyef: the full VM
16:55 karolherbst: but you kind of have to manage what goes into the shared stuff
16:56 karolherbst: you end up doing it in the driver, but you can just map the entire VM into the GPU if you want to
16:56 karolherbst: but usually you will only map whatever you have allocated inside your current process
16:57 nyef: Okay, that makes a certain amount of sense.
16:58 karolherbst: imirkin: any objections to that patch? https://lists.freedesktop.org/archives/mesa-dev/2018-July/201331.html
16:58 karolherbst: vram_size has special handling for GPUs without dedicated memory
16:59 karolherbst: so it is set to whatever we can allocate for the GPU
17:01 HdkR: karolherbst: I'm assuming the lack of public tests is what is really the killer
17:01 HdkR: Plus the whole AMD_* bit :P
17:04 Subv: mm, nouveau implements glBufferSubData as a memcpy to the mapped buffer address, are nouveau buffer mappings always coherent?
17:04 nyef: ... Thinking about it, the whole "it's all VRAM, we just steal some for the CPU" thing that was mentioned earlier about the Xbox reminds me of some emulator-breaking tricks that were invented for the NES. Stuff like causing the CPU to execute from I/O registers, or abusing bus capacitance to hold NOPs while executing from undecoded address space until a DMA process brought some control-flow instruction into play.
17:06 karolherbst: nyef: does it actually matter where memory is coming from?
17:08 nyef: Given that you effectively have a NUMA machine here? Yes.
17:08 nyef: Or, in some cases, DON'T have a NUMA machine.
17:09 Conmanx360: Wasn't that the difference between the PS3 and the Xbox 360? The Xbox 360 had shared memory with the CPU, although I think it was all a uniform memory. The PS3 didn't, and it had a performance penalty because of it
17:10 nyef: Conmanx360: You pay a penalty either way, it's just a different penalty.
17:10 gnarface: the PS3 only had 256MB of memory and it did something weird to free it and load it faster... something that trickled into the Bioshock 2 engine's PC port and that they can't seem to replicate in Wine...
17:11 Conmanx360: Sony must have decided it was a big enough issue that they went with shared CPU + GPU memory on the PS4, it uses GDDR5 (I think)
17:11 gnarface: it effectively causes a massive, gushing memory leak
17:11 karolherbst: well, GPU memory is usually faster than CPU memory anyway
17:11 nyef: If you share the same memory bus between your CPU and your GPU, that limits your total bandwidth. If they have separate busses, you have to pay penalties for accessing data that's on the other bus.
17:12 gnarface: with little knowledge of the real possibilities i suspected they must have had some trick to address 256MB of ram as more than 256MB
17:12 karolherbst: better than sharing CPU memory
17:12 Conmanx360: That bioshock 2 thing is interesting, will look into it.
17:12 karolherbst: and you don't have to copy between CPU and GPU memory anymore
17:12 karolherbst: that alone can save you a ton on a console
17:13 nyef: Things are a bit better with compute-bound or largely-idle processes as opposed to memory-bound processes.
17:14 gnarface: Conmanx360: https://bugs.winehq.org/show_bug.cgi?id=34658 they don't know wtf is up still and it hasn't been tested officially in forever, but i can confirm this is still happening as of the latest wine-staging 3.13
17:14 gnarface: Conmanx360: (it may NOT be relevant for that new "Bioshock 2 Remastered" release on steam right now)
17:15 gnarface: they put in a bunch of fixes trying to chase the leaked memory, but all they did is slow the leak in many cases. there's still certain areas of the game that leak it out all onto the floor real fast though.
17:16 gnarface: it may or may not be relevant, but the first real bad one is also the first underwater zone
17:17 gnarface: the flooding seemed to trigger it a lot, so i suspected it had something to do with reflections or liquids
17:17 gnarface: that might just be a red herring though
17:17 Conmanx360: Well, something is probably not being cleared after being allocated, or it just keeps allocating something because it doesn't know it got it.
17:17 gnarface: my suspicion was that there's something the PS3 doesn't have to free because it just moves it off the end of the physical memory in some cost-free manner
17:18 gnarface: or something like that
17:18 gnarface: so there's no way to figure out how it happened just from examining the game code
17:18 nyef: Something-something ringbuffer something?
17:18 gnarface: yea, something like that
17:18 Conmanx360: Hmm... that is interesting. It might be something possible to debug with wine tracing. I recently added a software texture conversion for DXT5 volume textures to Wine, and I run into memory issues
17:19 Lyude: karolherbst: oh hey, the new solution I came up with over the weekend actually works!
17:19 Conmanx360: I'm not sure if Wine adds some memory overhead, but when you're already constrained to 4GB in 32-bit, it runs out pretty quick.
17:19 karolherbst: Lyude: nice
17:20 Lyude: karolherbst: I realized that we can actually literally just use the runtime PM action itself as a barrier to stop hotplugs and fbcon events at the right time using pm_runtime_get() (not sync(), that part is important!) to determine whether or not the device is in the process of suspending. if it is, we just skip doing anything because pm_runtime_get() will have scheduled a wakeup anyway
17:20 karolherbst: Lyude: ahh
17:20 karolherbst: that makes sense
18:12 Lyude: karolherbst: btw: since this new method makes rolling back suspend unnecessary (I think it would still be nice for us to do this since it means faster resumes in hpd-during-pm-request situations, but it's not really needed now to fix these issues) i'm going to move some of the patches for fixing rollback in nouveau's suspend process to a separate series
18:12 karolherbst: Lyude: yeah, makes sense
18:14 Lyude: we may also need rollback to solve the i2c stuff, the only solution I've come up with in my head so far that doesn't fight against PM is to disable waiting on children before beginning suspend (so we can wake our i2c children in the parts we need them for), followed by attempting to runtime suspend all of them again and rolling back if any of the i2c devices got used by anyone else during that period
18:14 Lyude: anyway, will have the patch series out in just a moment
18:15 karolherbst: Lyude: currently thinking about what happens if we are in the middle of some I2C stuff
18:15 karolherbst: Lyude: like, calling sensors while the GPU suspends and we want to read out the power sensor
18:15 karolherbst: or while resuming
18:15 karolherbst: in the worst case it takes up to 3 reads to get all the data
18:15 karolherbst: and in the resume case we have to set the power sensor config as well
18:16 Lyude: karolherbst: the solution I just mentioned should cover that, actually. If we call pm_suspend_ignore_children() from within the runtime pmops resume callbacks for the main nouveau device, we can make it so that grabbing power references on our i2c devices doesn't deadlock since they won't wait for the pending pm reqs on nouveau to complete.
18:16 karolherbst: or is the hwmon stuff already protected?
18:16 Lyude: which means that codepaths that read from i2c should 'just work' even if they call pm_runtime_get_sync(i2c->dev) or whatever
18:16 karolherbst: Lyude: yeah.. I meant it more from a higher level perspective
18:17 karolherbst: Lyude: nvkm_iccsense_read_all
18:18 karolherbst: or maybe it really just works
18:18 Lyude: karolherbst: so, right now I think all of that should work. the real problem is access from userspace not waking things up, and the difficulties of adding low-level pm_runtime_get_sync() calls without deadlocking in nvkm
18:18 Lyude: since we have to use the i2c sensors during the actual suspend process
18:19 karolherbst: well, we kind of have to grab a reference when reading out I2C sensors, no? Because the data is pretty useless if a suspend/resume cycle happens between the reads, no?
18:19 Lyude: karolherbst: yep! that's why the ignore children part is important. So, by default our i2c devices are children of nouveau
18:20 karolherbst: like, reading out 2 lanes, suspend+resume, reading out the last lane
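The concern here is keeping a multi-read sensor transaction atomic. The toy C model below contrasts taking one runtime-PM-style reference per lane with holding a single reference across all lanes. `struct sensor`, `sensor_get` and `sensor_put` are hypothetical and are not nouveau's iccsense code. For simplicity, the model assumes the device suspends the instant its last reference is dropped. Real runtime PM usually delays this with autosuspend, so the bad interleaving only happens occasionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a runtime-PM-managed sensor (hypothetical, not nouveau). */
struct sensor {
    int usage;        /* outstanding references */
    bool active;      /* currently powered up? */
    unsigned epoch;   /* incremented on every resume */
};

static void sensor_get(struct sensor *s)
{
    if (s->usage++ == 0 && !s->active) {
        s->active = true;
        s->epoch++;   /* a resume: sensor config would be rewritten here */
    }
}

static void sensor_put(struct sensor *s)
{
    assert(s->usage > 0);
    if (--s->usage == 0)
        s->active = false;  /* simplification: suspend immediately */
}

/* Stand-in for one I2C lane read; encodes the epoch that served it. */
static int read_lane(const struct sensor *s, int lane)
{
    return (int)(s->epoch * 100u) + lane;
}

/* Problematic: a suspend/resume can happen between lane reads. */
static void read_lanes_separately(struct sensor *s, int lanes, int *out)
{
    for (int i = 0; i < lanes; i++) {
        sensor_get(s);
        out[i] = read_lane(s, i);
        sensor_put(s);
    }
}

/* Better: one reference is held across the whole multi-lane read. */
static void read_lanes_together(struct sensor *s, int lanes, int *out)
{
    sensor_get(s);
    for (int i = 0; i < lanes; i++)
        out[i] = read_lane(s, i);
    sensor_put(s);
}
```

Each sample records which resume "epoch" served it. Mixed epochs within one reading therefore expose exactly the "reading out 2 lanes, suspend+resume, reading out the last lane" case.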
18:20 karolherbst: ahh
18:20 Lyude: Which means if we grab a runtime PM ref on one, it starts by resuming the parents first
18:20 Lyude: Hence: if we try to just do things normally during s/r and grab a pm runtime ref on one of them, they will deadlock waiting for us (their parents) to finish
18:21 Lyude: but you can override that with pm_suspend_ignore_children(). The trick though, is that we generally do want the whole parent/child structure to be maintained by pm runtime automatically
18:21 Lyude: i.e. if our i2c devices are alive, so are we
18:21 nyef: This "ignore_children" bit reminds me of the phrase "children should be seen and not heard." (-:
18:22 Lyude: so if we just ignore children in part of the suspend/resume callbacks, we can just unset the ignore children option then resync with our children to see if something woke us up during that period of ignoring children
18:22 Lyude: nyef: hehehe
18:22 Lyude: karolherbst: that also means that pm_runtime_get_sync() for i2c devices we're the parents of will complete immediately instead of blocking, since they won't bother waiting for the parent
18:22 karolherbst: mhh
18:22 karolherbst: but what does that mean for i2c transactions?
18:23 karolherbst: do they fail?
18:23 karolherbst: or do we fail getting a ref?
18:23 nyef: Lyude: On the ACPI front, I've sortof come to the conclusion that I need to start looking for chipset manuals and taking a copy of the system bios and running it through a disassembler. Overall, enough work for little enough reward that it's getting deferred for the time being.
18:24 Lyude: karolherbst: nope; they actually succeed in both our runtime pmops callback and userspace, which is why it's a bit of a trick and why we need to resync after we're done using those devices in the pmops context
18:24 Lyude: So like, there's pm_runtime_suspend() as well which is what we'd use for that
18:24 Lyude: set ignore children to false again, loop through all our children and try pm_runtime_suspend(), we will have dropped refs that we grabbed ourselves in the pmops context so that if userspace didn't interrupt us, suspending each device should still work fine
18:25 nyef: Lyude: It seems that the main ACPI bits basically just ship a buffer around and issue some sort of escape-to-smm trap or similar.
18:25 Lyude: karolherbst: then in the case that we can't suspend one of them (due to userspace having suddenly grabbed a ref) we just roll back the suspend process
18:26 Lyude: so, that means all our i2c transactions work normally, and userspace transactions still force the GPU to stay awake, and there's no deadlocks and everyone's happy
18:27 Lyude: /and/ it's not hacky as hell
18:27 Lyude: (also; we can even get the benefit of moving all of the i2c acquire/release stuff into the actual pmops for our i2c children!)
18:28 Lyude: well, the actual enabling/disabling PADs part anyway
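The ignore-children, resync, and rollback sequence Lyude describes can be sketched as a toy C model. Everything in it is hypothetical: plain structs and counters stand in for `struct device`. Real code would use `pm_suspend_ignore_children()`, `pm_runtime_get_sync()`/`pm_runtime_put()` and `pm_runtime_suspend()` instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_CHILDREN 4

/* Toy stand-ins for the GPU (parent) and its i2c children. */
struct child {
    int usage;        /* references held by anyone (us or userspace) */
    bool suspended;
};

struct parent {
    bool ignore_children;
    int nchildren;
    struct child children[MAX_CHILDREN];
};

/* Runtime-suspend one child; fails if someone still holds a reference. */
static int child_suspend(struct child *c)
{
    if (c->usage > 0)
        return -EBUSY;
    c->suspended = true;
    return 0;
}

/* Parent's runtime-suspend path, following the plan from the log. */
static int parent_runtime_suspend(struct parent *p)
{
    int i, ret = 0;

    assert(p->nchildren <= MAX_CHILDREN);

    /* 1. Ignore children so taking refs on them doesn't wait on us. */
    p->ignore_children = true;

    /* 2. Use the i2c children (e.g. read sensors) under our own refs. */
    for (i = 0; i < p->nchildren; i++) {
        p->children[i].usage++;
        p->children[i].suspended = false;
    }
    /* ... i2c transactions would happen here ... */
    for (i = 0; i < p->nchildren; i++)
        p->children[i].usage--;

    /* 3. Stop ignoring children and resync: try to suspend each one. */
    p->ignore_children = false;
    for (i = 0; i < p->nchildren; i++) {
        ret = child_suspend(&p->children[i]);
        if (ret)
            break;
    }
    if (!ret)
        return 0;

    /* 4. Someone (e.g. userspace) grabbed a child meanwhile: roll back. */
    while (--i >= 0)
        p->children[i].suspended = false;
    return ret;
}
```

A child whose reference count is still nonzero after our own references are dropped models userspace grabbing that i2c device in the meantime. In that case the suspend fails with `-EBUSY`, and the children that were already suspended are brought back.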
18:35 karolherbst: mhh I see. Hope that works out in the end
18:35 karolherbst: not that it is critical for the power sensor stuff, but yeah
19:07 Lyude: karolherbst (also danvet, you will probably like this new version much more): new version of those patches for fixing the connector deadlocking is on the list now
19:08 karolherbst: Lyude: you didn't CC me, right?
19:08 karolherbst: okay, it is on the nouveau ML. k
19:08 Lyude: karolherbst: it should be cc'd
19:08 karolherbst: well, at least not me :p
19:08 Lyude: ?? it cc'd your gmail
19:09 karolherbst: weird
19:09 karolherbst: it doesn't show up
19:09 Lyude: huh
19:09 Lyude: RCPT TO:<karolherbst@gmail.com>
19:09 karolherbst: Lyude: yeah well, not in the mail I got
19:09 Lyude: strange
19:10 karolherbst: Lyude: "[PATCH v6 0/5] Fix connector probing deadlocks from RPM bugs", right?
19:10 Lyude: https://paste.fedoraproject.org/paste/lMNkglv766JQ8qpaYxXbMg and another one!
19:10 Lyude: karolherbst: yep
19:10 karolherbst: strange... anyhow, I will see if I can review it today, otherwise I will do so tomorrow
19:10 Lyude: cool, thank you!
19:34 karolherbst: Lyude: mhh, this time I wasn't in CC again :(, wondering what is up here (or didn't you include me this time?)
19:34 Lyude: karolherbst: i did include you each time o-o
19:34 Lyude: not sure what's going wrong
19:34 Lyude: i'll try your redhat email next time
19:34 karolherbst: I can give you a screenshot if you don't believe me :D
19:35 Lyude: hehe, I believe you, although I am interested in a screenshot anyhow
19:35 karolherbst: Lyude: I am not sure if the "TO:<karolherbst@gmail.com>" syntax _actually_ works
19:35 karolherbst: mind trying with a name or without the <>?
19:35 Lyude: karolherbst: well the ccs for that are picked up from my ml_cccmd
19:35 Lyude: so I don't think that's it
19:35 karolherbst: mhh, strange
19:36 karolherbst: that is quite the bummer, as everything which doesn't contain my address (or the gmail one) doesn't land in my inbox
19:36 karolherbst: so I might not look at those emails at all
19:37 Lyude: karolherbst: do you want me to resend with your RH email? (along with the many other patches I've got waiting on the ML?)
19:37 karolherbst: we could try
19:37 karolherbst: shouldn't matter as long as it gets to both addresses (either directly or via ML)
19:38 Lyude: yeah, it should definitely be on the ml
19:38 karolherbst: yeah, I mean I got them
19:38 karolherbst: but they don't appear in my inbox (as I filter everything away)
19:38 karolherbst: unless it was directly addressed to me
19:38 karolherbst: so that's why I am wondering actually
20:16 Lyude: Do we have demmio working with nouveau these days?
20:21 karolherbst: Lyude: yes
20:21 karolherbst: why?
20:22 karolherbst: maybe the chipset couldn't be detected or something? although I guess we added all of them to rnndb
20:22 Lyude: karolherbst: for debugging the disp init fail issues on this P50 mainly
20:22 karolherbst: okay, but do you have issues with demmio or did you simply want to know if it works?
20:22 Lyude: karolherbst: if it works
20:22 karolherbst: ahh
20:23 Lyude: karolherbst: it hasn't worked the previous times I tried with nouveau, only with the blob
20:23 karolherbst: mhh weird
20:23 karolherbst: demmio shouldn't cause any issues
20:23 karolherbst: I mean, we still have some issues, like we don't parse the repeat instructions and so on
20:24 karolherbst: Lyude: do you remember the issue you ran into?
20:24 karolherbst: I did a trace two weeks ago and it worked, that's all I know :p
20:24 Lyude: karolherbst: not in the slightest! iirc it was something as simple as "nothing got recorded"
20:24 karolherbst: uhhh
20:24 Lyude: this was like
20:24 Lyude: more than a year ago
20:25 karolherbst: I guess there was an issue with the trace itself then
20:25 Lyude: probably
20:25 karolherbst: anyway, I fixed the mmiotracer some months ago
20:25 karolherbst: I messed up implementing hugepage support, so mapping with offsets broke, but that should cause other issues
20:25 karolherbst: like your machine crashing
20:26 Lyude: yeah, I remember that one
20:26 Lyude: karolherbst: very strange question now: would demmio pick up the nvidia GPU if it was loaded with a pci stub driver
20:27 karolherbst: Lyude: yes
20:27 karolherbst: but
20:28 karolherbst: it doesn't autodetect the chipset
20:28 Lyude: ah, that should be good enough
20:28 karolherbst: or just map and read 0x0 ;)
20:28 karolherbst: well, I guess passing the chipset should be less painful
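Reading offset 0x0 returns the PMC_BOOT_0 register, and the chipset can be decoded from it by hand. Below is a C sketch of that decoding. The masks follow nouveau's device probe logic for NV10 and later chips as best we understand it, so double-check them against the nvkm sources. The value 0x137000a1 used in the check is purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Chipset ID from a PMC_BOOT_0 value (NV10+ encoding only; assumed). */
static uint32_t boot0_chipset(uint32_t boot0)
{
    if ((boot0 & 0x1f000000u) > 0)
        return (boot0 & 0x1ff00000u) >> 20;
    return 0; /* pre-NV10 encodings are not handled in this sketch */
}

/* Chip revision from the low byte of PMC_BOOT_0. */
static uint32_t boot0_chiprev(uint32_t boot0)
{
    return boot0 & 0x000000ffu;
}
```

For example, a BOOT_0 value of 0x137000a1 decodes to chipset 0x137, revision 0xa1. That chipset value is the kind of thing you would otherwise pass to demmio by hand.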
20:28 Lyude: karolherbst: my theory on the bizarre disp fail is that if we really are having our pushbuffer corrupted (I think that is what's happening, just based off my previous experience with lenovo laptops...), I should see mmio writes to the GPU even if there's no driver loaded
20:28 karolherbst: Lyude: the thing is, if you don't read from anything ioremapped, nothing shows up
20:29 Lyude: beyond the general pci writes you'd normally expect
20:29 karolherbst: okay
20:29 karolherbst: Lyude: I've pushed my runpm stub driver today: https://github.com/karolherbst/pci-stub-runpm
20:29 karolherbst: has a fixed vendor/device id matching though
20:30 Lyude: if that's the case, either the firmware is being seriously rude or there's another driver writing to a shared resource where our GPU is
20:30 karolherbst: or memory isn't zeroed out and we kick something by accident
20:30 karolherbst: or it is, and we still kick it
20:30 Lyude: karolherbst: oooh, good idea
20:30 karolherbst: but yeah, sounds like a good thing to test
22:08 Lyude: skeggsb: poke; you around? I've got some more questions about mst stuff with nvidia
22:10 skeggsb: Lyude: sure
22:10 Lyude: skeggsb: so; do you know if nv50_mstm_enable() does actually need to be called on every topology disable? asking because I see a couple of places where we disable the MST topology, but don't actually seem to call that
22:13 skeggsb: where?
22:13 Lyude: skeggsb: nv50_mstm_service, albeit it hasn't had any problems yet
22:14 skeggsb: ah, yeah, i'm not sure about that, nor even why it's necessary to do that there.. i think i stole that from intel as a "just in case"
22:15 Lyude: ahh
22:18 karolherbst: Lyude: third patch, why is -EACCES okay?
22:18 karolherbst: I mean, why is it okay to resume?
22:18 karolherbst: Does it simply mean the runpm stuff is "disabled" or whatever reason
22:18 karolherbst: ?
22:18 Lyude: karolherbst: -EACCES = disable_depth is nonzero so disabled, yeah
22:18 karolherbst: never actually checked what that error means in runpm
22:18 karolherbst: okay
22:19 karolherbst: Lyude: I am a bit worried about the ret == 0 case
22:19 Lyude: karolherbst: which, the fbcon one or the connector one?
22:19 karolherbst: 0 means _get was able to resume the GPU, right?
22:19 karolherbst: in nouveau_fbcon_output_poll_changed
22:20 Lyude: nope, 1 == device already active, 0 == pm_req successfully queued (i.e. it's not up yet, but it will be soon), < 0 == error
22:20 karolherbst: uhm...
22:20 karolherbst: weird
22:20 karolherbst: doesn't that make it a bit useless?
22:21 Lyude: karolherbst: actually it's super useful, because if we're ever in a state where our pm request had to be queued instead of it being already active, we're guaranteed that the current state of the GPU is either suspending or resuming
22:21 karolherbst: I mean, you call _get because you want to do something with the GPU; how would it make sense to return 0 before the GPU is completely up?
22:21 Lyude: karolherbst: because the request is still queued and the resume still happens!
22:21 Lyude: that's also what the noidle at those spots is for
22:22 Lyude: Additionally, hpd_work starts off with pm_runtime_get_sync() so we're guaranteed it will sync on the pm request as well
22:22 karolherbst: mhh ohh, wait. the _sync variant is with the wait...
22:22 karolherbst: *sigh*, doesn't that runpm stuff look a bit _too_ complex? Maybe it has to be that complex, but... oh well
22:23 Lyude: tbqh, it's the least complex it can go, and the way we're handling things seems to be how the runtime pm core says we should do it
22:23 karolherbst: well, I meant the runpm interface
22:23 Lyude: granted; the pm_runtime_get() trick there is not mentioned, but it still goes along with the general idea of "racing with pm is normal because rpm can't know when incoming io requests will happen"
22:24 karolherbst: sure
22:24 karolherbst: Lyude: okay, here is a thing. nouveau_fbcon_output_poll_changed called, pm_runtime_get returns 0, stalls for whatever reasons, GPU resumes in the meantime. Now we set "fbcon->hotplug_waiting = true;"
22:25 karolherbst: this might happen, or does something prevent that?
22:25 karolherbst: I mean, except that this is like super unlikely to happen
22:26 Lyude: karolherbst: nope! so; that's what the new hotplug_lock in the fbcon struct is for
22:26 karolherbst: ohhh, right
22:26 Lyude: on resume we call nouveau_fbcon_set_suspend(0) which starts a worker asynchronously to bring up fbcon, at the end of that is where we call nouveau_fbcon_hotplug_resume() and they both sync on that lock
22:26 karolherbst: yeah, I just forgot about the lock :)
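The return-value handling discussed above (1: already active, 0: request queued while suspending or resuming, -EACCES: runtime PM disabled) can be modeled in plain C. This is a sketch, not Lyude's patch. `struct fbcon_state`, `on_output_poll_changed` and `on_fbcon_resume` are hypothetical names, and comments mark where hotplug_lock would be held.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Outcome of the hotplug handler in this toy model. */
enum hpd_action {
    HPD_HANDLE_NOW,   /* GPU awake (or runtime PM disabled): process now */
    HPD_DEFER,        /* GPU mid-suspend/resume: replay on resume */
    HPD_ERROR,
};

struct fbcon_state {
    bool hotplug_waiting;  /* guarded by hotplug_lock in the real driver */
};

/* ret is what pm_runtime_get() returned, with the meanings from the log:
 *   1        device already active
 *   0        request queued (device is suspending or resuming)
 *   -EACCES  runtime PM disabled (disable_depth > 0)
 *   other <0 error                                                   */
static enum hpd_action on_output_poll_changed(struct fbcon_state *f, int ret)
{
    if (ret == 1 || ret == -EACCES)
        return HPD_HANDLE_NOW;
    if (ret == 0) {
        /* lock(hotplug_lock) */
        f->hotplug_waiting = true;
        /* unlock(hotplug_lock) */
        return HPD_DEFER;
    }
    return HPD_ERROR;
}

/* Runs at the end of the async fbcon resume worker. */
static bool on_fbcon_resume(struct fbcon_state *f)
{
    bool replay;

    /* lock(hotplug_lock) */
    replay = f->hotplug_waiting;
    f->hotplug_waiting = false;
    /* unlock(hotplug_lock) */
    return replay;  /* true: run the deferred hotplug now */
}
```

Because both paths touch `hotplug_waiting` under the same lock, the race karolherbst raises is closed. A deferred event is either seen by the resume worker or handled directly, never dropped. In the real driver, the reference taken by `pm_runtime_get()` must also be released on each path. That bookkeeping is omitted here.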
22:30 karolherbst: Lyude: so, the 5th patch does something similar, does it make sense to add a lock there as well?
22:31 karolherbst: mhh I don't know _that_ much about the kworker things
22:31 karolherbst: especially what happens if we schedule when the worker is inactive
22:31 karolherbst: I assume it gets scheduled and it triggers whenever we enable the worker again
22:31 karolherbst: ?
22:36 Lyude: karolherbst: luckily we don't need any extra lock there, the only logic that adds is that if we can't resume the device immediately, we just don't pay attention to the hotplug event
22:37 Lyude: erm
22:37 Lyude: sorry, we schedule hpd_work, which pays attention to it for us
22:37 Lyude: and hpd_work syncs on pm_runtime_get_sync(), so it won't start until the GPU is runtime resumed again
22:37 Lyude: if you schedule the kworker multiple times, it'll only execute once
22:38 Lyude: of course, if it executes, then gets scheduled again later it will reexecute
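The scheduling behavior Lyude describes comes from a per-item pending bit. Here is a toy C model of that bit. `struct work` and its helpers are hypothetical; the kernel's real implementation lives in the workqueue core and is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a work item's pending bit (hypothetical names). */
struct work {
    bool pending;
    int runs;
};

/* Queue the item; returns false if it was already pending, mirroring
 * schedule_work()'s boolean result. */
static bool work_schedule(struct work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/* What the worker thread does when it reaches the item. */
static void work_run_pending(struct work *w)
{
    if (!w->pending)
        return;
    w->pending = false;  /* cleared before the callback runs */
    w->runs++;           /* stand-in for calling hpd_work() */
}
```

The pending bit is cleared before the callback runs. As a result, a hotplug that arrives while hpd_work is already executing still queues another run, so the event is not lost.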
22:38 karolherbst: mhhh
22:39 karolherbst: yeah, I guess this looks fine, nouveau_display_hpd_work calls pm_runtime_get_sync so it waits until the resume finishes
22:40 karolherbst: Lyude: I guess we could replace the pm_runtime_put_sync with pm_runtime_put inside nouveau_display_hpd_work, no?
22:40 karolherbst: I don't really see a point in waiting there for a sync
22:40 Lyude: karolherbst: ah right! yes-I was meaning to do that myself, although you want pm_runtime_put_autosuspend() (for now)
22:40 karolherbst: yeah, sure
22:41 karolherbst: Lyude: anyway, that series: 1+2 acked, 3-5 reviewed
22:42 Lyude: yessssssss
22:42 karolherbst: maybe at some point I get to look into drm code itself :D
22:43 karolherbst: Lyude: the shorter series is also acked
22:43 Lyude: karolherbst: actually it's r-b'd by you but I forgot to re-add the tags :P
22:43 karolherbst: yeah, I know
22:45 karolherbst: I hope I am at least consistent with which tags I throw around :p
22:47 karolherbst: Lyude: anyway, with my stub driver my laptop is like super stable regarding suspend/resume on my XPS now
22:47 karolherbst: even when I switch to nouveau in the meantime (with disabled runpm)
22:47 Lyude: nice!
22:47 karolherbst: and back for suspending
22:47 Lyude: it's a start
22:47 karolherbst: yeah...
22:48 karolherbst: skeggsb: is there a nice way to disable engines/subdevs so that nouveau just skips those?
22:48 karolherbst: I would like to figure out what engine/subdev may cause those issues
22:55 Lyude: karolherbst: btw; do you know where the evo push kicking stuff happens?
22:56 karolherbst: uhm, in evo_kick or something?
22:56 karolherbst: let me check
22:56 karolherbst: yep
22:56 karolherbst: check ./nouveau/dispnv50/disp.c
22:57 karolherbst: sooooo
22:57 karolherbst: that comment keeps me thinking
23:11 Lyude: ooooh, well hey there
23:11 Lyude: making nouveau load before anything else seems to have some rather interesting effects
23:11 karolherbst: :D
23:11 karolherbst: I wouldn't know
23:12 Lyude: effects which are Extremely Suspicious even though I have been told they shouldn't be the cause of this problem, like
23:12 karolherbst: usually nouveau is like one of the last things to be loaded on my systems
23:12 karolherbst: Lyude: "shouldn't"
23:12 karolherbst: ;)
23:12 Lyude: hehe
23:13 Lyude: karolherbst: with rd.driver.pre=nouveau, see https://paste.fedoraproject.org/paste/j-S8aCQ0CPatGw9VXuQrXg 1.989284
23:13 karolherbst: :D
23:13 karolherbst: fun
23:14 Lyude: skeggsb: you're /sure/ those dpcd accesses at the top have nothing to do with disp failing? e.g. there's no chance that something somewhere isn't holding a lock that it should be/is being initialized too early?
23:16 karolherbst: Lyude: what happens if you put a ssleep(1); at the top of evo_kick?
23:16 Lyude: let's see
23:17 karolherbst: Lyude: I mean, it would be fun if the hardware would screw up because memory isn't really synced, wouldn't it?
23:17 Lyude: also: those disp returns look suspiciously similar to the values in drivers/gpu/drm/nouveau/nvkm/subdev/i2c/auxg94.c...
23:17 karolherbst: mhhhhhhh
23:22 Lyude: karolherbst: it still fails, but then again that disp fail happens before the first kick
23:23 karolherbst: right
23:23 karolherbst: I forgot
23:23 Lyude: i'm going to try putting a sleep on those first few dpcd accesses
23:24 karolherbst: Lyude: I am sure it is something nastier than that, but maybe you are lucky
23:24 Lyude: karolherbst: maybe...
23:56 Lyude: karolherbst: so;
23:57 Lyude: i realized I have no idea how to mmiotrace when the problem happens as early as this one does :s