00:33 karolherbst: mhh or can we actually bind a const buffer per stage?
00:35 karolherbst: ohhh, I see what's going on now
00:35 karolherbst: we allocate one 0xd0000 sized bo for the uniforms + driver constbufs + runout at once and just use that one
00:36 karolherbst: I guess I can just extend that a little
01:07 HdkR: karolherbst: Yea, it is per stage for a reason. Nothing wrong with binding the same buffer to each stage though
02:51 letterrip: Using nouveau, I tend to hit a (presumably) 2d driver bug if I use Mate, Cinnamon, gedit, wine, etc. but as long as I use xfce only apps I don't tend to hit one
02:51 letterrip: do you want just the kernel dmesg stuff or x logs? or is there other useful stuff that you need.
02:52 letterrip: the most important bug I hit is that x freezes and i have to restart the lightdm service
02:52 letterrip: via ssh
02:53 letterrip: (can't use sysreq keycombos to kill it; can't switch to a virtual console, etc.)
02:54 letterrip: I'm using 4.20 kernel and mesa 19rc5
02:54 letterrip: from debian experimental since it was suggested those might be less buggy
02:59 gnarface: some sort of stack trace or mmio trace or something like that, i think is what they also ask for
02:59 gnarface: if you just identify the specific video card model # though, it might already be a known issue
03:20 gnarface: letterrip: oh, and make sure that it's actually a 2d bug; make sure you're not running a desktop compositor
07:06 letterrip: gnarface - thanks for the advice
07:06 gnarface: no problem
13:04 karolherbst: mhh, got it working for CL, but somehow it doesn't work for other types :/
13:11 karolherbst: HdkR: I just saw that fma(a, b, imm) benefits a lot from this as well
13:12 karolherbst: because src2 can be a const buffer as well
13:12 karolherbst: but no immediate
14:32 karolherbst: pendingchaos: maybe you have an idea what I'm doing wrong? I am trying to optimize trivial immediate movs by putting them into the driver const buffer, but for some reason it doesn't seem to work for non-compute workloads :/
14:32 karolherbst: https://github.com/karolherbst/mesa/commit/642095eced42c1de82c4cf9b2ff42e48200ceaa0
14:49 pendingchaos: karolherbst: I don't see anything that would make it work only for compute shaders
14:49 pendingchaos: but perhaps code should be moved from nvc0_program_validate() to nvc0_program_update_context_state() since programs don't seem to share immediates
14:49 pendingchaos: currently I think immediates would only be uploaded when the program is uploaded, not when it's bound
14:50 karolherbst: mhhh, maybe yes
14:50 karolherbst: let me try that
14:53 karolherbst: pendingchaos: that wasn't it either :/
14:54 karolherbst: let me try to do that at binding time
14:56 pendingchaos: are you sure using "prog->type" is correct? NVC0_CB_AUX_INFO(4) is for fragment shaders, but PIPE_SHADER_FRAGMENT is 1
14:57 karolherbst: mhhhhhhhh
14:57 karolherbst: that would be evil
14:59 pendingchaos: I suggested moving the upload code to nvc0_program_update_context_state() but I don't think that would be correct since it seems that isn't called for compute shaders
15:00 karolherbst: yeah.. and it's called after nvc0_program_validate generally
15:04 karolherbst: mhh, with the right number it doesn't work either :/
15:04 karolherbst: weird
15:08 karolherbst: pendingchaos: mhh, doing it inside nvc0_program_validate should be fine, because binding a new shader invalidates the state so we always end up there :/
15:12 karolherbst: pendingchaos: btw, info->type _is_ the PIPE type
15:12 karolherbst: :/
15:12 karolherbst: uff, I converted it the wrong way around
15:13 karolherbst: pendingchaos: now it works \o/
15:14 karolherbst: thanks for pointing that type thing out
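The type mix-up discussed above can be sketched as a lookup between two numbering schemes. This is only an illustration, not the code from the linked commit: the log confirms just one pair (PIPE_SHADER_FRAGMENT is 1, and the fragment slot used with NVC0_CB_AUX_INFO is 4). Every other value, plus the names `PIPE_TO_HW_SLOT`, `HW_SLOT_TO_PIPE`, and `aux_slot`, is an assumption made for the example.

```python
# Gallium shader types. The log states PIPE_SHADER_FRAGMENT == 1;
# the remaining values are assumptions for illustration.
PIPE_SHADER_VERTEX = 0
PIPE_SHADER_FRAGMENT = 1
PIPE_SHADER_GEOMETRY = 2
PIPE_SHADER_TESS_CTRL = 3
PIPE_SHADER_TESS_EVAL = 4
PIPE_SHADER_COMPUTE = 5

# Hypothetical table: Gallium type -> hardware stage slot.
# Only the fragment -> 4 entry comes from the log.
PIPE_TO_HW_SLOT = {
    PIPE_SHADER_VERTEX: 0,
    PIPE_SHADER_TESS_CTRL: 1,
    PIPE_SHADER_TESS_EVAL: 2,
    PIPE_SHADER_GEOMETRY: 3,
    PIPE_SHADER_FRAGMENT: 4,
    PIPE_SHADER_COMPUTE: 5,
}

# Inverse table, i.e. the conversion "the wrong way around".
HW_SLOT_TO_PIPE = {slot: pipe for pipe, slot in PIPE_TO_HW_SLOT.items()}


def aux_slot(pipe_type):
    """Return the hardware slot for a Gallium shader type."""
    return PIPE_TO_HW_SLOT[pipe_type]


right = aux_slot(PIPE_SHADER_FRAGMENT)             # converted: slot 4
unconverted = PIPE_SHADER_FRAGMENT                 # raw PIPE type used as a slot
backwards = HW_SLOT_TO_PIPE[PIPE_SHADER_FRAGMENT]  # inverse table applied by mistake
```

Under these assumed values, compute has the same number in both schemes, which would be consistent with the earlier report that only CL worked.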
15:15 karolherbst: now let's see how much of a perf difference it makes for piano
15:24 karolherbst: wow
15:24 karolherbst: 2185 -> 2201 points
15:24 karolherbst: HdkR: ^^
15:36 karolherbst: heh
15:36 karolherbst: I am getting CTXSW_TIMEOUT inside heaven as well now...
15:37 karolherbst: the heck...
15:41 karolherbst: slowly I think that's some kind of kernel regression
16:14 karolherbst: I'll go with kernel regression now
16:47 karolherbst: mhhh, it's not, it's one of my local patches
16:47 karolherbst: either reclocking or power gating
16:52 karolherbst: skeggsb: ever ran into an issue like that with your nvkm-as script? error: sha1 information is lacking or useless (drivers/gpu/drm/nouveau/nvkm/subdev/therm/base.c)
18:06 ajax: hm, no imirkin
18:08 karolherbst: ajax: what's up?
18:09 ajax: trying to finish up gitlab migration
18:09 ajax: nouveau ddx is still only in cgit
18:10 ajax: we can move the repo without moving bz, just wondering if there was some reason it's not been done yet
18:12 karolherbst: I guess nobody cared until now?
18:33 HdkR: karolherbst: So a 0.7% improvement? :p
18:34 karolherbst: HdkR: yeah
18:34 karolherbst: which is a lot for a compiler only optimization
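The 0.7% figure follows from the two benchmark scores quoted earlier (2185 and 2201 points). A minimal sketch of the arithmetic, where `percent_change` is a hypothetical helper name:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Scores quoted in the log for the piano benchmark.
gain = percent_change(2185, 2201)
print(f"{gain:.2f}%")
```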
18:37 karolherbst: HdkR: but something is still wrong.. unigine heaven doesn't render correctly anymore
18:40 HdkR: ah, so close
18:40 karolherbst: well, I think I don't reupload the buffer often enough or something
18:45 karolherbst: and fixed :)
18:59 karolherbst: HdkR: 6% in heaven
18:59 HdkR: Nice
19:00 karolherbst: but I have to benchmark more.. could be some best/worst case thing, but the max fps went up by a lot
19:02 HdkR: They must use a significant number of immediates
19:02 karolherbst: not really
19:03 HdkR: large shaders?
19:03 karolherbst: nope
19:03 karolherbst: no idea what's going on here
19:05 HdkR: Woo for surprising amount of perf I guess
19:06 karolherbst: I am sure it's something stupid... let me do a few more rounds of benchmarking
19:08 karolherbst: HdkR: maybe it still renders something incorrectly and I simply don't see it?
19:09 HdkR: Could be
19:37 karolherbst: mhhhhh
19:37 karolherbst: something is very odd
19:38 karolherbst: I thought those ctxsw timeouts were caused by the applied clockgating patches
19:38 karolherbst: but without those it's even worse
19:38 karolherbst: system is pretty unstable and I get various issues
19:39 karolherbst: mupuf: random thought: let's assume the power sensor is right on my GPU and the reported 150W is actually the real value, but drawing more power than nvidia doesn't necessarily mean the core has to be warmer, right? Could be the memory as well or something
19:40 mupuf: Could be true, but you don't control the fan
19:40 mupuf: Buy an external power sensor ;)
19:40 mupuf: They are super cheap
19:40 mupuf: Then get the battery out, and see what happens
19:41 * mupuf doubts you could dissipate 150W
19:41 karolherbst: although drawing double the budget is kind of intense
19:42 karolherbst: I highly doubt that as well
19:47 karolherbst: Lyude|PTO: by any chance, are you there?
19:48 karolherbst: mupuf: but in any case, that leaves us with the issue on how to program the power sensors...
19:49 Lyude|PTO: karolherbst: depends on the question
19:49 karolherbst: Lyude|PTO: I think your clockgating patches cause some issues on my system... and I saw your comment regarding the idle filters. Anything special I should do? like creating an mmiotrace?
19:50 Lyude|PTO: karolherbst: first try the patches for Kepler that have been awaiting review for a couple months now :P
19:50 Lyude|PTO: If that doesn't work yeah, mmiotraces I need are boot up, shut down, suspend and resume
19:50 karolherbst: does it help on a gm204 as well?
19:50 Lyude|PTO: All with the binary driver
19:51 karolherbst: I am testing it on a gm204 with your WIP patches
19:51 Lyude|PTO: karolherbst: probably not, see the wip branch I've got on gitlab. But I still need to get around to reordering where we write the cg packs for Maxwell to work fully
19:51 karolherbst: mupuf: did you do anything with your git repository? I can't access the vbios stuff anymore I think
19:51 Lyude|PTO: The thing is, with Maxwell and how things are restructured, the current code for writing the clockgate packs ends up executing after we program CG_CTRL
19:52 Lyude|PTO: We need to fix that so it happens before, not after
19:52 karolherbst: mhhh
19:52 karolherbst: I'd assume that this could indeed affect context switching
19:52 Lyude|PTO: Yes
19:52 karolherbst: *effect
19:52 karolherbst: mhhhhhhh
19:52 karolherbst: meh
19:52 karolherbst: but without those, the GPU isn't stable either
19:52 karolherbst: but the power sensor gives me way too high values
19:53 karolherbst: but there is a chance I am over the budget indeed
19:53 karolherbst: and this could also cause other issues
19:53 Lyude|PTO: I had been planning on fixing this on Christmas before my test machine at RH did something very dumb and made it so I couldn't login remotely to continue working on it
19:53 karolherbst: :/
19:53 Lyude|PTO: (turns out somehow the boot menu got set in BootNext)
19:54 Lyude|PTO: that being said the work should be p simple
19:55 Lyude|PTO: I think I've got most of the cg packs there already, and you can enable debug=therm=trace to see where it writes the cg packs and enables cg_ctrl
19:56 karolherbst: mupuf: mhhh, there are two bytes in the vbios table I can't make any sense of: 60 34
19:58 Lyude|PTO: karolherbst: wait, ooh, are you suggesting some of the ctx switching issues are coming from clockgating?
19:58 karolherbst: not 100% sure
19:58 Lyude|PTO: It should be very easy to check
19:58 karolherbst: I removed the patches and that issue disappeared, but now I've got others
19:58 Lyude|PTO: set everything in the CG_CTRL registers to always on
19:59 karolherbst: mupuf: "HW Only Slowdown Enable. On assertion HW will slowdown clocks (NVCLK, HOTCLK) using _EXT_POWER settings (use only with GPIO12). No software action will be taken. On deassertion HW will release clock slowdown." this GPIO exists on my GPU
19:59 Lyude|PTO: at that point I'm fairly sure the contents of the cg packs shouldn't matter
19:59 karolherbst: mupuf: I assume that's the one power capping my GPU on battery
19:59 Lyude|PTO: and if there are any cg issues that may fix it
20:00 karolherbst: mhhh
20:00 karolherbst: Lyude|PTO: which register was that?
20:00 karolherbst: the 20200 one?
20:01 Lyude|PTO: karolherbst: uhhhhhhh
20:02 Lyude|PTO: Look through drm/nouveau/nvkm/subdev/therm/gf119.c (might be an earlier gen than that)
20:02 Lyude|PTO: karolherbst: also all of those registers I documented in envytools
20:02 karolherbst: but that sensor is bothering me the most
20:02 karolherbst: without your clock gating patches, it reports 60W with 0xf idling
20:03 karolherbst: and the budget is like 80W
20:03 Lyude|PTO: Also remember cg isn't on by default
20:03 Lyude|PTO: karolherbst: that makes sense
20:04 Lyude|PTO: I wouldn't be surprised if incorrect cg settings cause more power usage
20:04 karolherbst: I think I will figure out what's wrong with the sensor
20:04 Lyude|PTO: as such incorrect settings might be inferior to whatever the vbios sets
20:04 karolherbst: Lyude|PTO: no, I am sure it's overreporting
20:04 Lyude|PTO: ahh
20:04 karolherbst: compared with nvidia more or less, but I really should get a real power meter
20:05 karolherbst: but I think it's like 50% above the actual value
20:05 Lyude|PTO: Anyway I've gotta go back to drinking with the "Scottish Consulate", any other questions you've got?
20:05 karolherbst: no, have fun on your PTO
20:05 Lyude|PTO: Thanks!
20:05 Lyude|PTO: See you later~
20:07 karolherbst: mupuf: any idea on how to trace how nvidia initializes the i2c power sensor?
20:07 karolherbst: I would want to fake the vbios, but it's a gm204 :/
20:09 karolherbst: it's also quite sad that the ina3221 manual doesn't contain the word "calibration"
20:14 karolherbst: ohhh
20:15 karolherbst: it's the mask/enable register
20:49 karolherbst: mupuf: okay, the value is definitely wrong. my battery reports a draw of around 75W for the entire system, but the GPU sensor reports 82W
20:50 karolherbst: CPU reports around 5W for the package
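The argument here is a consistency check: the components cannot draw more than the whole system does. A minimal sketch using the figures quoted above (about 75W for the whole system from the battery, 82W from the GPU sensor, about 5W for the CPU package), where `over_reports` is a hypothetical helper name:

```python
def over_reports(component_watts, system_watts):
    """True if the components claim more power than the whole system draws."""
    return sum(component_watts) > system_watts


gpu_sensor = 82.0     # GPU power sensor reading from the log
cpu_package = 5.0     # CPU package reading from the log
battery_total = 75.0  # whole-system draw reported by the battery

# Watts claimed by the two components beyond the measured system total.
excess = gpu_sensor + cpu_package - battery_total
inconsistent = over_reports([gpu_sensor, cpu_package], battery_total)
```

The check only shows that at least one reading is off; it does not say which one.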
20:51 HdkR: You have an 82w GPU in your laptop? :P
20:51 karolherbst: actually it's a 80W one
20:51 karolherbst: HdkR: but... I can get the power sensor to report 150W as well
20:51 karolherbst: just sounds way too high
20:52 karolherbst: mupuf: also, the system doesn't power cap, it's really just the clocks which get cut
20:52 karolherbst: different avg power consumption with different applications on battery
20:52 HdkR: Yea, 150w is like...Maximum power mobile GPU territory
20:52 karolherbst: yeah
20:52 HdkR: Did you stick a GTX 1080 in your laptop? :P
20:53 karolherbst: nope, 970m
20:53 karolherbst: I got the laptop super cheap
20:54 karolherbst: like roughly 900€ (and it has a nvme ssd and everything), usually you would pay like double for that
20:54 karolherbst: HdkR: anyway, that power sensor annoys me
20:55 HdkR: Considering it should never go past 75w for that chip, it's mad :P
20:55 karolherbst: the vbios reports 80W
20:56 karolherbst: HdkR: well, especially because the battery reports a smaller power consumption for the entire system ;)
20:56 karolherbst: and this includes the CPU and the HDDs
20:56 karolherbst: and the fans
20:56 HdkR: aye
20:57 karolherbst: anyway, I think that's good enough as a power meter
20:57 karolherbst: at some point I kind of figured out how to read out the ampere value from the EC... but I am quite sure it's correct
20:58 karolherbst: ufff, right
20:58 karolherbst: I reverse engineered it with the help of ACPI
20:59 karolherbst: EC.BPR0 and EC.BPV0
21:00 karolherbst: now I am wondering why the acpi hwmon driver doesn't use those values...
21:00 karolherbst: HdkR: new highscore: 160.37W
21:02 HdkR: I see how it is. You're just overclocking
21:02 HdkR: :P
21:02 karolherbst: :p
21:02 karolherbst: well I've enabled reclocking on that gm204 here
21:02 karolherbst: but there isn't really any space for overclocking
21:03 karolherbst: mhh, well there are some cstates actually
21:03 karolherbst: but with the highest legal values the core voltage is already 1.1V
21:04 karolherbst: HdkR: for whatever reason that GPU has no boost values
21:04 karolherbst: the boost and turbo boost base clocks are all the same
21:05 karolherbst: and the highest reachable cstate has the same clock
21:06 karolherbst: uhm, vpstate is how you call them
21:06 karolherbst: not base clocks
21:06 karolherbst: although there is this weird entry in the cstep table
21:07 karolherbst: 0x08 0x02 0x4b 0x4e pointing to a much higher cstate but with no pstate value
21:07 HdkR: huh
21:07 karolherbst: I am sure it has a special meaning
21:07 HdkR: Probably
21:07 karolherbst: _but_
21:07 karolherbst: nvidia doesn't make use of it either
21:08 karolherbst: at least not on linux
21:11 HdkR: :shruggie:
21:11 karolherbst: anyway, the driver isn't able to reach the higher clocks
21:11 karolherbst: already at the budget :)
21:11 karolherbst: but nvidia is nasty, goes a little over
21:11 HdkR: Oh no :P
21:14 karolherbst: soo, values
21:14 karolherbst: GPU fan: ~5600 rpm, temp: 83C/78C, power usage: 160W/85W
21:15 karolherbst: nouveau/nvidia
21:17 karolherbst: battery reports 85W, nvidia 50W
21:17 karolherbst: mhhh
21:17 karolherbst: 15W package
21:17 karolherbst: 20W are lost for various components which might make sense?
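The 20W remainder can be reproduced by subtracting the known consumers from the system total. This sketch reads the figures above as 85W for the whole system (battery), 50W for the GPU as reported under nvidia, and 15W for the CPU package; that reading is an assumption, and `unaccounted_watts` is a hypothetical helper name.

```python
def unaccounted_watts(system_watts, component_watts):
    """Power drawn by the system that no listed component accounts for."""
    return system_watts - sum(component_watts)


# Figures as read from the log (see the assumption stated above).
rest = unaccounted_watts(85.0, [50.0, 15.0])
print(rest)
```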
23:41 karolherbst: HdkR: mhh, with my newest version of the opt pass I get worse results in shader-db... I kind of fear that there are some instructions which can't take a full cb address :/ or something
23:41 karolherbst: currently I have implemented it with the driver const buf on maxwell as well
23:41 karolherbst: seemed easier
23:45 HdkR: I only know Maxwell+, no idea about older generation limitations :P
23:46 karolherbst: mhhh, I am testing against pascall
23:46 karolherbst: *pascal
23:46 karolherbst: but it's weird
23:46 karolherbst: I get worse results than before
23:46 karolherbst: no idea why
23:48 karolherbst: mhhh, maybe something super stupid