12:30 pendingchaos: imirkin: could you review https://patchwork.freedesktop.org/patch/238984/ sometime?
12:42 imirkin: pendingchaos: + nvc0->state.uniform_buffer_bound[s] = 65536;
12:42 imirkin: should that be "true"?
12:42 pendingchaos: making uniform_buffer_bound bool was a mistake
12:42 pendingchaos: it was a change I forgot to revert
12:43 imirkin: but .... it basically is bool, right?
12:43 imirkin: since it's either 0 or 65536
12:43 pendingchaos: it probably could be
12:43 imirkin: i think bool is ok
12:43 imirkin: just need to adjust some of the usages
12:44 pendingchaos: I tried doing it but didn't end up doing it for some reason
12:44 imirkin: should be identical...
12:45 pendingchaos: I think I'll try again to create a version using bool
14:08 pendingchaos: imirkin: can you elaborate on "that won't end well ..."
14:32 pendingchaos: imirkin: oh, the size of cb_bindings is 5, not 6
14:32 pendingchaos: thought it was 6
14:32 pendingchaos: wait
14:32 pendingchaos: I'm thinking of the wrong thing
14:33 pendingchaos: 's' in nvc0_compute_validate_constbufs is used to access uniform_buffer_bound, not cb_bindings
15:04 karolherbst: airlied: mhh, this looks a bit odd: https://gist.githubusercontent.com/karolherbst/e8a5216bed03d62dff582f54e1a93b65/raw/f1b648618e8dd5c851395980c88a475c77bc425d/gistfile1.txt
15:05 karolherbst: PEG0 is the _PR3 resource
15:05 karolherbst: or well the parent
15:05 karolherbst: and PEGP is some GPU stuff where also the _ROM lives and the handles for D0/D1/D3hot states
15:14 karolherbst: mhhh
15:14 karolherbst: the pci subsystem reads the pci config space at 0x68
15:14 karolherbst: and there it is still at D3
15:25 karolherbst: okay, finally I get the feeling I understand what ACPI and pci are doing and what might get messed up specifically
15:28 mupuf: speaking about fw, /me heard about the Linux UEFI validation project today after bitching about the lack of such test framework
15:59 karolherbst: mhh, and pci_read_config_word returns 0xffff inside pci_raw_set_power_state
15:59 karolherbst: which is like super bad
17:44 karolherbst: so uhm
17:44 karolherbst: pci_read_config_word fails
17:44 karolherbst: but not like returning an error
17:45 karolherbst: just writing ~0 into the output and returning 0
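The failure mode described here (a config read that reports success but hands back all ones) can be modelled in a few lines. This is a rough Python sketch, not kernel code: `fake_read_config_word` and `looks_like_dead_read` are made-up names, and the all-ones pattern for an unreachable device is taken from the 0xffff value mentioned in the log.

```python
ALL_ONES_WORD = 0xFFFF  # value the log reports for the failed read


def fake_read_config_word(device_reachable, real_value):
    """Hypothetical stand-in for a 16-bit config-space read.

    Returns (status, value). The status is 0 ("success") in both cases;
    an unreachable device simply yields all ones, as described in the log.
    """
    value = real_value if device_reachable else ALL_ONES_WORD
    return 0, value & 0xFFFF


def looks_like_dead_read(status, value):
    """True when the read 'succeeded' but returned nothing but set bits."""
    return status == 0 and value == ALL_ONES_WORD
```

A caller that only checks the status would treat both cases as fine; only the value check tells them apart.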
17:59 mattst88: does any modern NVIDIA ISA have vector immediates?
18:00 mattst88: like, add 1,2,3,4 to separate channels?
18:00 mattst88: I expect not, given the "scalar" view
18:01 HdkR: mattst88: What about the FP16 instructions? :)
18:01 mattst88: also, how are the predicate registers set?
18:02 mattst88: an explicit comparison instruction, or can the comparison be done as part of the instruction producing the result?
18:04 HdkR: mattst88: Technically both? :P
18:05 mattst88: okay. can you explain it to me?
18:05 HdkR: I physically can't, no. I'm sure someone else can
18:20 pendingchaos: mattst88: there are comparison instructions that can write to a predicate register and ones that can write to GPRs (0xffffffff if true, 0 if false iirc)
18:21 pendingchaos: I hope that answers your question, but I'm not sure I fully understood it
18:21 mattst88: thank you :)
18:22 mattst88: i965 can put a 'conditional modifier' on most instructions -- all compare the result of the instruction with 0, and set the "flag" (predicate register) accordingly
18:22 mattst88: sounds like NVIDIA doesn't do that, and instead has comparison instructions whose purpose is to do that operation?
18:26 pendingchaos: (I assume you're talking about the Maxwell/Pascal ISA btw)
18:26 pendingchaos: I think so?
18:26 * pendingchaos disappears for a few minutes
18:28 HdkR: mattst88: Maxwell/Pascal has the CCReg for that
18:28 HdkR: Which isn't the same as the predicate registers
18:29 HdkR: I assume on the intel side it's something like x86 or ARM where most ALU ops can return the result and (optionally) update the flags register
18:30 HdkR: (With ARM instructions capable of being predicated on that flag)
19:05 mattst88: HdkR: yeah, very similar to the condition codes on x86
19:26 karolherbst: interesting
19:26 karolherbst: airlied: when we get the device out of d3cold, the pci pm cap states already d0
20:00 HdkR: mattst88: Nice
20:02 HdkR: mattst88: Repro'd, no idea if it is due to an app bug or driver bug
20:02 HdkR: ...
20:02 HdkR: That's completely unrelated uh, ignore :P
20:04 karolherbst: mattst88: there are some weirdo vectorized instructions though
20:04 karolherbst: but they operate on one register
20:05 karolherbst: you just put 4 or 2 values inside a register
20:05 nyef: SIMD Within A Register?
20:05 karolherbst: yes
20:05 karolherbst: in PTX those are the v* instructions
20:06 karolherbst: well, some of those are scalar though
20:06 karolherbst: vadd4 puts the results of the 4 byte-sized additions into the destination
20:07 mattst88: ahh, right
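The "SIMD within a register" idea can be illustrated with a small Python model: four byte lanes packed into one 32-bit word, each added independently. This is only a simplified sketch with a made-up function name; it wraps each lane on overflow and ignores any signedness, saturation or other options the real instruction may have.

```python
def packed_add4(a, b):
    """Add four unsigned 8-bit lanes packed into 32-bit words.

    Each lane wraps on overflow and never carries into its neighbour.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift
    return result
```

The difference from a plain 32-bit add shows up when a lane overflows: `packed_add4(0x000000FF, 0x00000001)` gives 0, whereas an ordinary add would carry into the next byte and give 0x100.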
20:07 karolherbst: mattst88: also, we have like 7 predicates we can use instead of sources/destinations for instructions that support it
20:07 karolherbst: the 8th one is always true
20:08 karolherbst: also CCreg != predicates
20:08 karolherbst: they don't replace each other
20:08 mattst88: what do you mean by use instead of src/dest? in the instruction word they go in the place of src/dest?
20:08 karolherbst: CC is just a flag bit set on the flag register
20:08 karolherbst: mattst88: different instruction encoding
20:08 mattst88: right, okay
20:08 karolherbst: you basically have some variants of the same instructions
20:08 karolherbst: and some have predicate variants
20:09 mattst88: cool, makes sense
20:09 mattst88: so you have 8 predicate registers, with one hardwired to true?
20:09 karolherbst: we basically have three types of boolean values: predicates, int bools, float bools
20:09 karolherbst: yeah
20:09 mattst88: nice. intel only has 4 :(
20:09 karolherbst: int bools: -1/0 float bools: 1.0/0
20:09 karolherbst: mattst88: well, predicates aren't that useful in general
20:10 mattst88: can you configure a compare instruction to return those differently typed bools or something?
20:10 karolherbst: you can use them to execute instructions conditionally though
20:10 karolherbst: mattst88: yeah, depends on the type set
20:10 karolherbst: some instructions even take two types
20:10 karolherbst: like most compare instructions have a source and a dest type
20:10 karolherbst: so you can compare two ints, and write a float bool
20:11 mattst88: yeah, that's great
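The three boolean flavours mentioned above (predicates, int bools as -1/0, float bools as 1.0/0) can be modelled as one compare whose result encoding depends on a destination type. A hedged Python sketch; `compare_lt` and its `dest_type` strings are invented for illustration and do not correspond to any real encoding.

```python
def compare_lt(a, b, dest_type):
    """Model a less-than compare with a selectable result encoding.

    dest_type is a hypothetical selector:
      "pred"  -> Python bool (stands in for a predicate register)
      "int"   -> 0xffffffff / 0, i.e. -1 / 0 as a 32-bit word
      "float" -> 1.0 / 0.0
    """
    result = a < b
    if dest_type == "pred":
        return result
    if dest_type == "int":
        return 0xFFFFFFFF if result else 0
    if dest_type == "float":
        return 1.0 if result else 0.0
    raise ValueError(f"unknown dest_type: {dest_type!r}")
```

Comparing two ints and asking for a float bool, as in the log, is then a single call: `compare_lt(1, 2, "float")`.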
20:11 karolherbst: HdkR: on volta we get even a second CC bit, or maybe a third one even? I forgot
20:11 HdkR: karolherbst: heh?
20:11 mattst88: we have to emit 2 instructions for b2f(x < y), whereas I guess that's one instruction on nvidia
20:11 karolherbst: iadd3 is kind of broken, because it can only set one CC bit pre volta
20:11 karolherbst: so you can get two overflows, but iadd3 can't tell you
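Why a single carry bit is not enough for a three-input add is plain arithmetic: the carry-out can be 0, 1 or 2. A small Python sketch (the function name is made up, and it models unsigned 32-bit operands only):

```python
MASK32 = 0xFFFFFFFF


def add3_with_carry(a, b, c):
    """Return (32-bit result, carry-out) for a three-input unsigned add."""
    total = (a & MASK32) + (b & MASK32) + (c & MASK32)
    return total & MASK32, total >> 32
```

With all three inputs at 0xffffffff the sum is 0x2fffffffd, so the carry-out is 2 and needs two bits to represent.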
20:12 karolherbst: mattst88: yeah, that's a set.f32.u/f32 dest src0 src1 for us
20:12 HdkR: karolherbst: Volta killed CCReg though?
20:12 karolherbst: HdkR: there was never a CCreg
20:12 karolherbst: there is the flags reg
20:12 karolherbst: which contains CC bits
20:12 HdkR: Correct
20:13 karolherbst: and volta has apparently two bits now
20:13 HdkR: in IADD3 you mean
20:13 karolherbst: generally
20:13 karolherbst: for most instructions it just doesn't make much sense
20:13 karolherbst: HdkR: volta has a 128 bit ISA
20:13 karolherbst: so you can be wasteful
20:13 HdkR: Aye
20:14 HdkR: karolherbst: Also don't forget about the psetp instruction ;)
20:14 HdkR: That one is awesome
20:14 karolherbst: you mean setp
20:14 karolherbst: but we also have selp
20:14 karolherbst: which is a slct, just without the compare
20:15 HdkR: Well, I'm talking about psetp specifically in this instance due to the number of predicates it can take
20:16 karolherbst: how does a psetp make any sense?
20:16 HdkR: https://github.com/yuzu-emu/yuzu/blob/master/src/video_core/renderer_opengl/gl_shader_decompiler.cpp#L1497 Sadly I'm terrible at grepping mesa :P
20:17 karolherbst: oh well
20:18 HdkR: six predicates, because why not
20:18 karolherbst: huh?
20:18 HdkR: :D
20:19 karolherbst: ohh, there is actually a psetp in the maxwell ISA
20:19 karolherbst: the heck
20:19 HdkR: yep
20:19 karolherbst: and/or/xor
20:19 karolherbst: oh well
20:20 HdkR: It's just a mad instruction
20:20 karolherbst: hu?
20:20 HdkR: mad = ridiculous
20:20 HdkR: I love it :D
20:20 karolherbst: huh?
20:21 karolherbst: how is that ridiculous?
20:21 HdkR: Because I love that it can almost consume every predicate register
20:21 karolherbst: well, 5
20:22 karolherbst: two are for predicated execution
20:22 karolherbst: or uhm
20:22 karolherbst: the semantics are a bit weird on maxwell
20:22 HdkR: six :P
20:23 karolherbst: it only takes two sources
20:24 karolherbst: or can it take three
20:24 karolherbst: ?
20:25 HdkR: Three input, two output, one execution guard
20:26 karolherbst: oh well
20:26 HdkR: https://github.com/yuzu-emu/yuzu/blob/master/src/video_core/engines/shader_bytecode.h#L375 <--- They don't encode the conditional execution predicate in their structs
20:27 HdkR: :)
20:27 HdkR: It's just super silly and I love it
20:27 karolherbst: I am still not quite sure what I should think of all the emulators emerging and they just basically redo what already exists
20:28 HdkR: Well they are redoing the reverse engineering and use Nouveau as a major reference
20:28 HdkR: obviously not for the sake of supporting the hardware but for emulation
20:28 karolherbst: I am not talking about the reverse engineering bits
20:29 HdkR: ?
20:29 karolherbst: HdkR: there are only 5 preds on that union
20:29 HdkR: They don't encode the execution predicate in their structs since it is in all instructions
20:30 karolherbst: ohh, mhh, right
20:35 HdkR: <3 quirky hardware
20:38 karolherbst: actually those emulators seem to be quite fast already. wondering how well they will run on the more demanding games
20:44 HdkR: It's coming up at a scary pace. Looks like they haven't hit any game doing crazy things with the GPU yet though
21:02 nyef: MCP89 HDMI audio not working with certain sinks (at least two known-incompatible sinks): It's not going to be on the audio codec side, as that code doesn't change when switching to the nvidia driver. It's not the infoframe, as suppressing that breaks things on more sinks. So it's going to be clock regeneration packets, isn't it?
21:05 nyef: Experimental protocol: Connect to an affected sink with nouveau, image the PDISPLAY register space, reboot to nvidia drivers, play sound through the system, image the PDISPLAY register space again. For each register that differs, temporarily set it to the nouveau value to see if it affects audio playback.
21:10 nyef: ACR packet contents depend on TMDS clock and audio sample rate, possibly audio bitrate (sample rate times sample size), so once one combination of TMDS clock and audio format has been determined, try others.
21:12 nyef: This... seems reasonable. Maybe.
21:13 nyef: Advantage: Requires no additional equipment. Disadvantage: Poor state-space coverage.
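The comparison step of that protocol (find the registers that differ between the two images) could look roughly like this in Python. It assumes each image has already been parsed into a dict of register offset to 32-bit value; the function name and that dict format are assumptions, not an existing tool.

```python
def differing_registers(nouveau_dump, nvidia_dump):
    """Return {offset: (nouveau_value, nvidia_value)} for registers that differ.

    Both dumps are assumed to be dicts mapping register offset -> value.
    Offsets present in only one dump are reported with None on the other side.
    """
    diffs = {}
    for offset in sorted(set(nouveau_dump) | set(nvidia_dump)):
        old = nouveau_dump.get(offset)
        new = nvidia_dump.get(offset)
        if old != new:
            diffs[offset] = (old, new)
    return diffs
```

Each offset in the result is then a candidate for the "temporarily set it to the nouveau value" test.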
21:25 karolherbst: HdkR: I don't think they tried yet
21:31 HdkR: karolherbst: They have. Hitting some features that are harder to emulate
21:31 karolherbst: HdkR: ahh
21:31 karolherbst: yeah
21:31 HdkR: Like LD/ST
21:31 karolherbst: there is some stuff you can kind of map 1 to 1 to OpenGL, but then there is stuff you can't
21:32 karolherbst: HdkR: huh? LD/ST to/from memory should be trivial, no?
21:32 HdkR: The buffer handling is more difficult
21:32 karolherbst: sure
21:32 karolherbst: but this is still something I would consider one of the simpler things
21:33 karolherbst: HdkR: in worst case you could just allocate 2GB of memory and place the global memory stuff in there
21:34 HdkR: Oh yea, there are definitely more nefarious things that they will hit, but this is one that won't be nice. Since it is a UMA system the games don't need to do anything to ensure coherency(if it is mapped coherently) and now the program needs to track buffer changes somehow
21:34 karolherbst: sure, right
21:34 karolherbst: but then you can just let the game point directly into your malloced buffer
21:35 karolherbst: but right
21:35 karolherbst: you kind of need to reupload that stuff on the OpenGL API level
21:35 HdkR: Well, you can't actually. Since it is emulated at a lower level, the game does all the memory management itself. It's not great
21:35 karolherbst: I have no idea how to do that fast
21:35 karolherbst: but I am sure a slow way of doing things isn't that hard
21:35 HdkR: and the API on the device just lets you point the GPU to pointers
21:36 karolherbst: sure
21:36 HdkR: Slow is possible
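One naive "slow way" to track buffer changes is to hash guest memory page by page and re-upload only the pages whose hash changed. The Python sketch below is purely illustrative: the page size, the names, and the approach itself are assumptions and not a description of what any emulator actually does.

```python
import hashlib

PAGE_SIZE = 4096  # assumed tracking granularity


def page_digests(memory):
    """Return one digest per PAGE_SIZE chunk of a bytes-like buffer."""
    return [
        hashlib.sha1(memory[start:start + PAGE_SIZE]).digest()
        for start in range(0, len(memory), PAGE_SIZE)
    ]


def dirty_pages(previous_digests, memory):
    """Return (indices of changed pages, fresh digests) for the buffer."""
    current = page_digests(memory)
    dirty = [
        index
        for index, digest in enumerate(current)
        if index >= len(previous_digests) or digest != previous_digests[index]
    ]
    return dirty, current
```

Hashing everything on every draw is exactly the kind of cost that makes this slow; it only shows that correctness is reachable without hardware coherency.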
21:36 HdkR: Really this is just a slightly harder problem that they can do. It'll get really obnoxious once they hit features that just aren't exposed to GL
21:36 karolherbst: yeah
21:36 HdkR: Like CUDA
21:36 HdkR: :P
21:37 karolherbst: well
21:37 karolherbst: that is also kind of trivial on the hardware level
21:37 karolherbst: you map memory, you run a kernel and read things back ;)
21:37 HdkR: Sure, but how do you emulate it when that API isn't available? :P
21:37 karolherbst: compute shaders?
21:37 karolherbst: and does switch allow cuda actually?
21:38 HdkR: Compute shaders aren't flexible enough to handle all cases, but it would probably be a good starting point
21:38 HdkR: Yes
21:38 karolherbst: well, if there are no interactions you could actually also use OpenCL, but this might be slow with a software thing
21:38 karolherbst: HdkR: I mean, you get the binary
21:38 karolherbst: on the hardware it is the same as compute shaders
21:38 karolherbst: or compute shaders are equal to kernels
21:38 karolherbst: doesn't really matter
21:38 HdkR: Sure, but how do you end up doing something like parallel kernel launches inside of compute shaders?
21:39 HdkR: er, is that what the feature is called...?
21:39 karolherbst: isn't there an extension?
21:39 HdkR: "Dynamic Parallelism"...That's a stupid thing to name a feature
21:39 karolherbst: yeah
21:39 HdkR: I don't think there is an extension for it in compute shaders
21:39 karolherbst: maybe there is no gl extension for that
21:40 karolherbst: but yeah
21:40 karolherbst: _that_ will be painful
21:40 HdkR: Becomes a hard problem once you hit features that the API doesn't expose :P
21:40 karolherbst: yes
21:40 HdkR: Technically mesa could gain cuda support and make their lives easier though
21:40 HdkR: ;)
21:41 karolherbst: :p
21:41 karolherbst: they are free to send patches
21:41 karolherbst: I think you can do that in OpenCL though
21:42 karolherbst: yeah CL 2.0 can do that
21:42 HdkR: Apparently it was added in CL 2.0
21:42 HdkR: yea
21:42 HdkR: So they could implement cuda via CL 2.0, which wouldn't work on Nvidia because they only support CL 1.x :P
21:42 karolherbst: fun will be if there are cuda <-> gl interactions on the switch
21:42 karolherbst: because then you need a GPU driver with gl <-> cl interactions as well
21:43 HdkR: aye
21:43 karolherbst: HdkR: :D right
21:43 karolherbst: or maybe vulkan can do something here?
21:43 karolherbst: dunno
21:44 HdkR: I don't think Vulkan Compute is much beyond GL compute
21:44 karolherbst: there is still this future "we merge CL into vulkan" thing
21:45 HdkR: Vulkan is actually a worse option for them as well since they lose a ton of GL extensions for random edge features
21:45 karolherbst: the good thing is, you don't really have to care about all those fancy CL opcs, because they don't exist on hw anyway
21:45 karolherbst: they can send patches
21:45 karolherbst: :p
21:45 HdkR: I'm looking forward to mesa implement a bunch of VK_YUZU_* extensions
21:46 HdkR: implementing*
21:46 karolherbst: well, somebody has to do it
21:46 karolherbst: but I doubt it would be that hard
21:47 mooch2: lmfao
21:53 karolherbst: Lekensteyn: I have a kind of working module which simply suspends a device with the _PR3 stuff
21:53 karolherbst: it seems to work and doesn't run into the issue the bbswitch branch has (I think)
21:54 HdkR: karolherbst: Time consuming though, which equates to hardness for a lot of people
21:54 karolherbst: HdkR: well, you can't have everything :p
21:54 karolherbst: mhhhhhh
21:54 karolherbst: I really want to hit that runpm bug with my stub module though
21:55 mooch2: i still hope that there's a template vulkan driver somewhere
21:55 karolherbst: pci_enable_device + pci_set_master seem fine to use actually
21:55 mooch2: or at least some sort of guide on how to make one >.>
21:55 karolherbst: mooch2: you don't need it with vulkan
21:55 mooch2: ?
21:55 mooch2: why not?
21:55 karolherbst: because there is the khronos loader
21:55 mooch2: no i mean
21:56 mooch2: a driver that is barebones for you to build on
21:56 mooch2: i want to make a nouveau vulkan driver
21:56 karolherbst: yeah
21:56 karolherbst: that is the vulkan loader
21:56 mooch2: uh
21:56 mooch2: how?
21:56 karolherbst: because the application loads it and you provide the backend
21:57 mooch2: yeah, but i want to make that backend *rolls eyes*
21:57 HdkR: mooch2: I assume you want something like a skeleton structure inside of mesa that shows how to do the initial code layout and gives information on what exactly you should be implementing there
21:57 mooch2: ya know, the thing that most people mean when they say vulkan driver
21:57 karolherbst: mooch2: then make the backend ;)
21:57 HdkR: Like a documentation to help out
21:57 mooch2: HdkR, yeah
21:57 mooch2: karolherbst, i don't know how
21:57 karolherbst: there is always the vulkan specification
21:57 karolherbst: mooch2: ask in #dri-devel
21:57 mooch2: aight
21:58 karolherbst: mooch2: anyway, there are a few things we need to do before anyway
21:58 karolherbst: mooch2: like moving codegen outside of gallium
21:58 HdkR: For a similar idea, when I wrote my first LLVM backend it would have been nice to have a skeleton structure to tell me wtf I'm doing instead of just "Write the backend"
21:58 HdkR: ;)
21:59 karolherbst: HdkR: I guess you can run vulkaninfo and it crashes
21:59 karolherbst: and then you implement that stuff
21:59 HdkR: haha
21:59 karolherbst: and then you run vulkan gears :p
22:00 HdkR: Time equates hardness again. For someone unfamiliar with the codebase, you don't even have an indication as to what you should be adding
22:01 karolherbst: HdkR: simple solution: run vulkan applications, they usually crash when you forget something :p
22:01 karolherbst: also
22:01 karolherbst: there is the vulkan CTS
22:01 HdkR: haha
22:02 karolherbst: ;)
22:02 karolherbst: change code until all tests pass, sounds easy, no?
22:03 HdkR: Yea, it's a one step plan to writing a vulkan driver
22:04 Lekensteyn: karolherbst: in that phoronix thread you said that the PCI subsystem fails to read, but the question is what leads to that?
22:04 karolherbst: Lekensteyn: silliness
22:04 Lekensteyn: does your module call _PR3 directly?
22:04 karolherbst: no
22:04 karolherbst: here is the thing
22:05 karolherbst: the pci subsystem asks the ACPI subsystem to do the d3cold -> d0 transition on the platform level
22:05 karolherbst: and this is fine
22:05 karolherbst: this works
22:05 karolherbst: I called the ACPI stuff
22:05 karolherbst: second thing: the device comes out from that in the D0 state
22:05 karolherbst: sooo
22:05 karolherbst: if you ask the pci config space what state the device is at, it responds with D0
22:06 karolherbst: Lekensteyn: now, if you read the code carefully, you see something nice inside pci_raw_set_power_state
22:07 karolherbst: "if (dev->current_state == state) return 0"
22:07 karolherbst: and that function just returns
22:07 karolherbst: _but_
22:07 karolherbst: current_state is fed from a call to pci_read_config_word
22:07 karolherbst: and this call puts 0xffff into the result
22:07 Lekensteyn: huh, didn't it have a special case for D3cold?
22:07 karolherbst: basically making the current state 0xffff & 0x3 (mask of the device state ) == D3
22:07 karolherbst: no
22:08 karolherbst: this is _acpi_s task
22:08 karolherbst: not pci
22:08 karolherbst: pci doesn't know about d3cold
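The "0xffff & 0x3 == D3" point is simple bit masking: the power state lives in the low two bits of the PM control/status word, so an all-ones read decodes as D3. A small Python sketch of that decode; the constant name mirrors the kernel's `PCI_PM_CTRL_STATE_MASK`, while the function and table are made up for illustration.

```python
PCI_PM_CTRL_STATE_MASK = 0x0003  # low two bits of the PM control/status word

STATE_NAMES = {0: "D0", 1: "D1", 2: "D2", 3: "D3hot"}


def decode_power_state(pmcsr):
    """Decode the power state field from a 16-bit PM control/status value."""
    return STATE_NAMES[pmcsr & PCI_PM_CTRL_STATE_MASK]
```

So a bogus 0xffff read makes the cached state look like D3 even though the device reports D0 by other means.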
22:08 karolherbst: here is what happens on my device on suspend:
22:08 karolherbst: set pm capability state to d3
22:09 karolherbst: call \_SB.PCI0.PEG0.PEGP._PS3
22:10 karolherbst: (now d3hot all the way)
22:10 karolherbst: call \_SB.PCI0.PEG0.PG00._OFF
22:10 karolherbst: now the device is in d3cold state
22:10 karolherbst: so on resume we do the opposite
22:10 karolherbst: call
22:10 karolherbst: \_SB.PCI0.PEG0.PG00._ON
22:10 karolherbst: call \_SB.PCI0.PEG0.PEGP._PS0
22:10 karolherbst: now ACPI responds that the device is in D0 state
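The suspend and resume sequences above can be written down as data and replayed, which makes it easier to see that resume mirrors suspend. A Python sketch under stated assumptions: the ACPI paths are the ones from the log (they are specific to that machine), the state after each step is only filled in where the log names one, and `final_state` is a made-up helper.

```python
# Each step: (action from the log, state the log names afterwards, or None)
SUSPEND = [
    ("write D3 to the PCI PM capability", None),
    (r"\_SB.PCI0.PEG0.PEGP._PS3", "D3hot"),
    (r"\_SB.PCI0.PEG0.PG00._OFF", "D3cold"),
]

RESUME = [
    (r"\_SB.PCI0.PEG0.PG00._ON", None),
    (r"\_SB.PCI0.PEG0.PEGP._PS0", "D0"),
]


def final_state(steps, initial):
    """Replay the steps and return the last state the log names."""
    state = initial
    for _action, resulting in steps:
        if resulting is not None:
            state = resulting
    return state
```

Replaying `SUSPEND` from "D0" ends in "D3cold", and replaying `RESUME` from there ends in "D0", matching what ACPI reports in the log.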
22:11 karolherbst: so now pci gets back to work and reads the pci pm capability state via pci_read_config_word
22:11 karolherbst: but now the troubles begin
22:11 karolherbst: pci_read_config_word returns 0, and writes 0xffff into the output parameter
22:11 karolherbst: which basically means it didn't read the actual value in the first place
22:11 karolherbst: but somehow the pci communication device <-> kernel is broken
22:12 karolherbst: the device is fine though, or at least the ACPI controller says it is
22:14 karolherbst: Lekensteyn: this is the code where resuming works: https://gist.githubusercontent.com/karolherbst/73e6d053ac38613329a75042a3c5b2af/raw/cc5793850269d8c9b8ac201d0418896a875f3aee/pci-stub-runpm.c
22:14 karolherbst: I am slowly adding pci stuff from nouveau over until it breaks
22:14 Lekensteyn: my device (and many others) run in an infinite loop in the ACPI firmware code (while PG00._ON is being executed). Is it possible that your firmware where you observe this does not loop forever until the device can be accessed again?
22:14 karolherbst: it doesn't
22:15 karolherbst: Lekensteyn: try with the linked code then
22:15 karolherbst: just change the pci ids
22:15 karolherbst: Lekensteyn: the thing is, we do too much inside the nouveau code, as all that pci crap can be removed in the _PR3 case from runtime_*
22:16 Lekensteyn: I can try this later as it is potentially fatal (for proper testing I need to remove the acpi_osi option too)
22:17 Lekensteyn: but shouldn't this be solved in the PCI core? Even without nouveau, but with runtime pm on the lockup is reproducible
22:17 karolherbst: mhh
22:17 karolherbst: not for me
22:17 karolherbst: if I have no driver loaded, the device suspends/resumes without issues
22:18 karolherbst: maybe there is even more to it in the end
22:18 Lekensteyn: are both the GPU and its parent PCIe port runtime enabled in that case?
22:18 karolherbst: but I have the situation: fails with nouveau, works without
22:18 karolherbst: Lekensteyn: no idea about the parent port
22:18 karolherbst: why?
22:18 karolherbst: but it seems like the parent stays in D0
22:18 Lekensteyn: (btw, if what you said before is true (trying to probe the PCI PM register while in D3), perhaps that pattern causes breakage)
22:19 karolherbst: uhm
22:19 karolherbst: it gets probed while the device is in d0
22:19 karolherbst: just the pci communication is screwed
22:19 karolherbst: or something else there
22:19 karolherbst: removing/rescanning the device/bus helps
22:20 karolherbst: but this is a bit hard to actually verify
22:20 Lekensteyn: as for the behavior without nouveau loaded, the pci port enables runtime pm by default (since a year ago or more). the GPU is still not runtime PM enabled by default, *unless* you load nouveau or override via sysfs
22:20 karolherbst: I know
22:20 karolherbst: but it seems the port stays in D0 for me
22:21 karolherbst: at least that one is set to on
22:21 karolherbst: not auto
22:21 karolherbst: anyway
22:21 Lekensteyn: should be "auto" to reproduce
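The "on" versus "auto" setting discussed here is exposed through each device's `power/control` file in sysfs. A minimal Python sketch for checking it; the helper names are made up, and the `sysfs_root` parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path


def runtime_pm_setting(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return the contents of power/control ('on' or 'auto') for a PCI device."""
    return (Path(sysfs_root) / bdf / "power" / "control").read_text().strip()


def can_runtime_suspend(bdf, sysfs_root="/sys/bus/pci/devices"):
    """True when runtime PM is allowed for the device (control is 'auto')."""
    return runtime_pm_setting(bdf, sysfs_root) == "auto"
```

For the reproduction described above, both the GPU and its parent port would need to report "auto", e.g. `can_runtime_suspend("0000:00:01.0")` for the port, with the address adjusted to the machine at hand.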
22:21 karolherbst: even if there is an issue involved with having the root port suspended, shouldn't we focus on the issues we have with the root port being on first?
22:21 karolherbst: ;)
22:22 Lekensteyn: not necessarily, PG00._OFF is only called when the root port is runtime suspended
22:23 karolherbst: uhm, no
22:23 karolherbst: or wait, let me check something
22:24 karolherbst: ohhhhhh, that makes sense
22:24 karolherbst: Lekensteyn: no, forget what I said, the root port gets suspended
22:24 karolherbst: it just happens when I unplug my power
22:24 karolherbst: ....
22:24 karolherbst: so that explains that
22:24 karolherbst: I was already wondering why the GPU only gets put into d3cold when I disconnected the power supply
22:25 karolherbst: so yeah
22:25 karolherbst: that works for me basically
22:25 Lekensteyn: unplugging power causes what? changing the power/control on <-> auto?
22:25 karolherbst: yes
22:25 karolherbst: and that's why my GPU gets put into d3cold
22:25 karolherbst: I was always wondering about that
22:25 karolherbst: I guess this is TLP doing stuff or something else
22:26 Lekensteyn: do you have something like TLP enabled? or some other daemon/udev(?) rule that changes this?
22:26 Lekensteyn: TLP to blame :)
22:26 karolherbst: I guess so
22:26 karolherbst: anyway
22:26 karolherbst: with my stub driver that works perfectly
22:27 karolherbst: Lekensteyn: https://gist.githubusercontent.com/karolherbst/e0ae12eea7fc2f84eca1663671f93cf6/raw/6b19afabaf075ee52af3e39d0e4ede06c60665d4/gistfile1.txt
22:27 karolherbst: I guess you know that that lspci call is usually fatal ;)
22:28 Lekensteyn: due to runtime resume while reading /config? not for me with a workaround :P (but yes otherwise)
22:28 karolherbst: ....
22:28 karolherbst: ;)
22:28 karolherbst: well, I don't have any workarounds now
22:29 karolherbst: but I also used to have some scripts to do that stuff
22:29 karolherbst: removing the GPU from pci, calling ACPI stuff and on resume the opposite
22:29 karolherbst: that is kind of the most stable I got
22:29 Lekensteyn: ok, I just dug up a debug log (kprobe on pci_bus_read_config_word and acpi method calls), do you want to see it as well? It should not contain accesses to the pci config before _ON is called
22:29 karolherbst: with my stub driver I kind of experienced sudden kernel crashes, but that could also be me tinkering in the pci subsystem
22:30 karolherbst: Lekensteyn: yeah, it shouldn't
22:30 karolherbst: but in the fatal case, you get some _after_ which write 0xffff into that value parameter
22:30 karolherbst: and return 0
22:31 karolherbst: 0xffffffff is also the fav value of nvapeek if you turn the gpu off with bbswitch in the early days
22:31 karolherbst: I am just not sure where that is coming from
22:31 karolherbst: or what causes that
22:31 karolherbst: maybe some mapped pci resources and the pci subsystem scews up?
22:31 karolherbst: *screws
22:31 karolherbst: we don't do that much pci stuff inside the driver actually, so
22:32 Lekensteyn: do you mean BARs by PCI resources?
22:32 karolherbst: yeah
22:32 karolherbst: or other stuff
22:33 karolherbst: I am not that familiar with all that
22:33 karolherbst: my current task is what pci stuff to do inside "pci_stub_runpm_probe" until resume doesn't work anymore :)
22:33 karolherbst: I hope it ain't that much
22:37 Lekensteyn: this is the dmesg interleaved with kprobes and shell commands (back from November 2016, 4.9.0-rc5testing-00340-g64a22d3) http://ix.io/1io6
22:38 Lekensteyn: it shows that no pci regs are accessed before _ON
22:38 karolherbst: right
22:38 Lekensteyn: except for reading register 0x84 of the pci port (00:01:0) right after _OFF (presumably the PM reg?)
22:39 karolherbst: depends on the device
22:39 karolherbst: it is 0x64 for me
22:39 karolherbst: it is on the root bus anyway
22:39 karolherbst: ;)
22:40 karolherbst: Lekensteyn: uhm
22:40 karolherbst: why aren't there any pci_bus_read_config_dword calls on the nvidia gpu?
22:40 karolherbst: or isn't it at 01:00.0?
22:40 karolherbst: ohhh
22:40 karolherbst: it is :0
22:40 Lekensteyn: there is one on line 510
22:41 karolherbst: not .0
22:41 karolherbst: mhh
22:41 karolherbst: Lekensteyn: did it work with that kernel or was it already failing back then?
22:43 karolherbst: ohhh, I see the issue
22:43 karolherbst: Lekensteyn: you didn't trace the read value in pci_bus_read_config
22:44 Lekensteyn: from my notes I see there might be issues with some inlined functions
22:44 karolherbst: :(
22:44 karolherbst: well
22:44 karolherbst: that is actually the interesting part
22:44 karolherbst: because it is just 0xffffffff
22:47 Lekensteyn: fwiw, I have uploaded the kprobe script (add-probes), the post-processing scripts (merge-trace-dmesg.py, namify-pci.py) and logs here: https://lekensteyn.nl/files/p651ra-acpi-debug/probe-debug/
23:12 pie_: hey guys, do you have any recommended reading material for gpu architecture or somesuch, so that i can actually make some sense of opengl and not just memorize apis? or is it too far removed from the hardware for that to make any sense
23:12 pie_: basically i want to gain something from this other than just a memorized API
23:22 karolherbst: Lekensteyn: mhh but anyway, I kind of know what fails, I simply don't know why
23:55 karolherbst: Lekensteyn: .... I think I found something and you probably won't like it
23:55 karolherbst: still need to verify it though