08:50gentlewonder: So the components of unified SIMD/VLIW core for fast (which is slightly more complex than specialized pipelined hw), there is inflight buffer fifo, that gets filled in wraparound manner from decode fed opcode operands, since zero can not be accounted when something bitwises against empty operand from graduation, this does not fill the inflight array, only matching entries are modified from decoded operands fifo. So the components are instruction
08:50gentlewonder: buffer, inflight buffer and issue queue entries hence.
08:51gentlewonder: everything in inflight buffer is absolutely addressed, so one possible implementation is branches or fetch from pseudo-ops relocations. Now we talk how those buffers need to be pinned in their contents.
08:52gentlewonder: difference being that branches flush the contents of instruction buffer like halt would when reset is not executed after halt, however relocations would not per pseudo-ops.
08:55gentlewonder: it executes in the following manner: first instruction is fetched (dispatcher needs to be brought down with giving zero threads to the core and start address as determined where one needs to start)
08:56gentlewonder: first it brings in the instruction which is added as first element out of 40 in inflight buffer, NOPs are not going to inflight buffer, later when inflight buffer is pinned you need to start either feeding NOPs in alternating steps or something that does not issue dependencies in inflight buffer
08:57gentlewonder: many options are possible, but the best case is to choose an implementation that is the lowest power and performing the fastest at the same time.
09:02gentlewonder: since for instance everything is absolutely addressed and lsu can target hence any reg, when the bases are authored respectively, first you'd go pinning 40 loads to the first column, each with a different base
09:03gentlewonder: rest of the 39 columns in this GCN model, belong to different type of functional units
09:08gentlewonder: first it starts from zero wfid, which is decoded as one later in the valid_entry it is added to the inflight buffer when it also does not issue, so first element is always added to the issue queues and inflight buffer
09:08gentlewonder: now when the arbiter chooses from scoreboard free lights or green ones, to also issue this instruction
09:09gentlewonder: what happens is valid goes down, and this a time to use NOP from the respective fetch or instruction buffer entry
09:10gentlewonder: cause then you will be staying at the first column
09:15gentlewonder: next off, when NOP technically does not issue anymore , it tries to bring in the next instruction, when this one issues things reverse
09:22gentlewonder: cause basically valid bits start off as zero, hence when nothing issues it toggles issue to be ready for the first element, and arbiter can decode and issue this, but when after this nop comes it is up to the first instruction to free the next operand
09:25gentlewonder: so when you had filled the queues and inflight buffer correctly, with providing NOP as after the issue...it will take the next instruction from the same column which is exactly what needs to be done to schedule into another column
09:43gentlewonder: because the VGPR_BASE pseudo ops, will not get routed through when dispatcher is off, they need stale lsu copies from issue queues to do that
09:51gentlewonder: in other words, when first element issues, then it gets a nop, when it graduates it gets a new instruction to free something from all the inflight buffer
09:51gentlewonder: which is 40 entries wide
09:54gentlewonder: so roughly there are two possibilities, same thing can be done with loop index indirect addressing or alike on sm2.0 pixel shaders of es2 for instance.
09:57karolherbst: gentlewonder: mind taking your nouveau unrelated write up somewhere else?
09:57gentlewonder: I will use pseudo-ops as this is more power friendly and even easier and many benefits.
10:04gentlewonder: it is related to nouveau very easily, and the instruction arbiter is the same on nouveau, it is barrel shifter, and priority encoder pair, all unified simd chips are basically the same.
10:05gentlewonder: the arbiter, in case of NOP is fed uses the gap between the last and current wfid, and adds one basically since priority encoder encodes one from non issued instruction to in-order or repeat the wfid of nop again.
10:06karolherbst: doesn't matter, nothing what you talk about is nouveau related, so do it elsewhere. Maybe even create a channel, dunno
10:06karolherbst: we also don't spam other drivers channel with details about nvidia GPUs
10:16gentlewonder: karolherbst: you are a known clown in red hat, i never had any respect towards your activity, same goes about Lyude, so yeah that is enough of a reason to leave indeed.
14:37alkisg: imirkin: hello! Did you find time to push that patch of yours upstream, and if yes, in which version will we find it? https://patchwork.freedesktop.org/patch/335616/
15:30raket: hi good folks, will a 980ti (nv120) work with a 27" 1440p 240hz monitor?
15:32RSpliet: raket: HDMI or DP?
15:33raket: RSpliet: DP
15:33RSpliet: There's a few factors at play. Even if I can't give you a definite answer, this is one of them ;-)
15:33RSpliet: Ok, good
15:34raket: 1440p 165hz works great with the 980ti and i patched the kernel to support reclocking, works great.. however some bugs in what i play, but i found a solution so doesn't matter.
15:34raket: then i will buy a 1440p 240hz monitor and see if it works :-)
15:34RSpliet: erm... this is a GM200. I don't think we can change the memory clock as we don't have any firmware running on the PMU
15:35RSpliet: Can you tell me the output of /sys/kernel/debug/dri/<card no>/pstate ?
15:35raket: RSpliet: i used karolherbst patches and it worked. i know it's unsupported but works anyway
15:35RSpliet: the AC line to be precise
15:35raket: 0f: core 595-1468 MHz memory 7200 MHz AC DC *
15:35RSpliet: no, the AC line
15:35raket: AC: core 1366 MHz memory 7200 MHz
15:35RSpliet: Ah thanks
15:36RSpliet: Ok, wow. cool. That takes away one barrier for such monitors
15:36raket: I mount own fans on the gpu connected to the motherboard, works great. i know skeggsb removed support for it.. but hey whatever, i want 165hz!
15:36RSpliet: karolherbst: did you hack up memory reclocking without PMU? I didn't know that!
15:36raket: RSpliet: it's a old patch!
15:37karolherbst: RSpliet: we can do memory reclocking on the PMU on maxwell2
15:37RSpliet: karolherbst: can we? I somewhat assumed that because it requires signed firmware, we don't bother uploading our own firmware at all...
15:37karolherbst: we don't, but the patches I wrote a long time ago just do that
15:38karolherbst: we still can't change the fan speeds though
15:38RSpliet: Ah, that's the missing link. Cool
15:38karolherbst: but that's totally fine on a laptop :)
15:38raket: karolherbst: yeah.. and i mounted my own external fans on the card. should i even mind reporting bugs from nouveau from it or should i just let it run with whatever works?
15:39RSpliet: raket: so the only other thing to work out is the pixel clock for a monitor with such a mode. my gut feeling would be that because it's DP, it should probably work fine. But obvs hasn't been tested
15:40RSpliet: Hmm... is that really a pixel clock of over 884MHz? Maybe...
15:40RSpliet: karolherbst: does the dual-link DP stuff work?
15:40RSpliet: Or w/e it's called
15:40RSpliet: Actually, Lyude is more of a display expert :-D
15:40RSpliet: (no offence ofc! ;-))
15:40gentlewonder: karolherbst: tell me why do you need to reclock anything? Since cache is already zero-cycle? the last time here probably, but you know 32inst cache they give nowadays can accommodate 8000words, when using this amount of cache as drivethrough, which little bit using more power through, the freakin algo is just very performant but too easy.
15:41gentlewonder: 32k cache i meant
15:41Lyude: not @ RSpliet, one second
15:41Lyude: *kill bill music starts playing*
15:41karolherbst: RSpliet: one display with two cables?
15:42Lyude: not good enough chanserv
15:43Lyude: RSpliet: so do you mean the 8 lane dp stuff?
15:45RSpliet: Lyude: well, I don't really know what I mean. raket wants a display with a pixelclock that back-of-the-envelope should be over 884MHz. I thought a single DP link was limited to like 570 or something like that.
15:46Lyude: raket: what kind of connector does this use?
15:46raket: Lyude: it's probably a DP cable to use 240hz with 1440p, i really don't know
15:46raket: i'm just thinking about to shop
15:47RSpliet: Wikipedia mentions the existence of HBR3 that should enable support of such a monitor. But I don't have a HBRx->NVIDIA generations mapping table :-D
15:48Lyude: raket: I mostly asked because I think tb3 supports some sort of dual link displayport thing and I don't know that we support that
15:48raket: Lyude: ok, so you suggest me to stay on 165hz 1440p till it's solved?
15:48Lyude: raket: not completely sure, could you send me some of the monitors you're looking at
15:49raket: Lyude: https://eu.aoc.com/en/gaming-monitors/ag273qz
15:51Lyude: ok - so I don't think it's the fancy 8 lane thing
15:52Lyude:does some math
15:52karolherbst: raket: does the mode you want to use even show up?
15:53RSpliet: Love that webpage by the way. Trying to sell 240Hz as being less blurry. But I'm definitely looking at a static picture. On my 60Hz laptop screen.
15:54Lyude: karolherbst: they're buying something, they don't have the monitor with them
15:59raket: karolherbst: i don't have the 1440p 27" monitor here, i'm planning to buy one. the 165hz monitor in 1440p works fine with 980ti/nv120
16:01Lyude: trying to figure out what the bpp for hdr is, but if we assume you aren't using hdr (since we don't have that working yet in nouveau) I think I'm getting like, 21.6GBit/s unless I'm doing my math wrong (2560×1440×30×240 then convert from bits to gbits), so we'd need HBR3 like RSpliet mentioned
16:01Lyude: which I think we do support
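The back-of-the-envelope figure above can be sketched in Python. This is a minimal sketch that ignores blanking intervals, so a real link needs somewhat more than these numbers; the function names and the choice of 24 and 30 bits per pixel are illustrative assumptions, not something nouveau computes this way.

```python
def uncompressed_rate_gbit(width, height, bpp, refresh_hz):
    """Raw pixel data rate in Gbit/s, ignoring blanking (an assumption)."""
    return width * height * bpp * refresh_hz / 1e9


def pixel_clock_mhz(width, height, refresh_hz):
    """Active-pixel clock in MHz, again ignoring blanking."""
    return width * height * refresh_hz / 1e6


# 2560x1440 at 240 Hz for 8 and 10 bits per colour channel
for bpp in (24, 30):
    rate = uncompressed_rate_gbit(2560, 1440, bpp, 240)
    print(f"{bpp} bpp: {rate:.2f} Gbit/s")

print(f"pixel clock: {pixel_clock_mhz(2560, 1440, 240):.1f} MHz")
```

Under these assumptions the mode needs roughly 21.2 Gbit/s at 24 bpp and 26.5 Gbit/s at 30 bpp. Both exceed what four HBR2 lanes carry after 8b/10b encoding (4 × 5.4 × 0.8 = 17.28 Gbit/s), which is consistent with needing HBR3.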
16:02RSpliet: Lyude: I skipped maths and went straight for https://en.wikipedia.org/wiki/DisplayPort#Refresh_frequency_limits_for_standard_video
16:03Lyude: oh, I should keep that in mind for the future, also I am fairly sure that's actually pretty easy to check the caps for since I think the max link speed is specified in the vbios
16:04Lyude: raket: let me confirm that and I should have a way for you to check if it'll work with your GPU, just remember you're limited by nouveau's lack of reclocking :P
16:05RSpliet: Lyude: no he isn't!
16:05RSpliet: I was surprised by that, but it's true :-D
16:05Lyude: RSpliet: oh huh
16:05RSpliet: Apparently some patches exist that for GM2xx just upload "our" PMU firmware and ignore the fact that the fan speed can't be controlled
16:06RSpliet: Which is like, not very desirable upstream. But if users know what they're doing (... well, I don't know what I'm doing half the time, and I RE'd some of this shit :-D) that could be workable.
16:07raket: RSpliet: i use that. i don't really care if the card would go broke. external fans <3
16:07Lyude:looks up how to check the dp capabilities
16:09Lyude: ok it does come from the vbios, raket mind sending me a copy of /sys/kernel/debug/dri/0/vbios_rom and /sys/kernel/debug/dri/0/strap_peek ?
16:10raket: where should i upload it?
16:10Lyude: sent you an email over pm
16:28Lyude: huh, did envytools stop building for anyone else? or do I maybe need a newer version of nouveau or something like that? https://paste.centos.org/view/8fdce076
16:42Lyude: karolherbst: do you have any ideas?
16:43Lyude: i'm guessing I have an older version of some nouveau header but I can't figure out how to tell cmake to include extra directories without having to manually edit the CMakeLists.txt file
16:47Lyude: every time I have to deal with something that's more complicated than generating build files with cmake i immediately wonder if I should just convert envytools over to meson
16:51mwk: ... feel free
16:52mwk: I tried to do it long ago, but failed for some dumb reason
16:52Lyude: good enough for me, i'll look into doing it today
17:09Lyude: ok, i think i see how to fix this now
17:19Lyude: raket: mhhh, I got nvbios built but it looks like we never implemented support for all of the DP speeds. I -think- yours only supports 5.4 GBit/s on the DP link, but could you get a dmesg log of booting your system up with drm.debug=0x6 with nouveau loaded? that should tell me for sure
17:20raket: Lyude: brb
17:25raket: Lyude: dmesg sent!
17:28Lyude: raket: yeah I was right, I don't think your gpu will support this. [ 7.128527] nouveau 0000:01:00.0: DRM: encoder: 4x540000
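A log line like the one above can be decoded with a few lines of Python. This is a sketch under two assumptions: that the line has the form `encoder: <lanes>x<rate>`, and that `<rate>` is in units of 10 kbit/s per lane, so that 540000 corresponds to the 5.4 GBit/s mentioned earlier. `parse_encoder_caps` is a hypothetical helper, not part of nouveau or envytools.

```python
import re


def parse_encoder_caps(line):
    """Return (lanes, per_lane_gbit) from an 'encoder: NxRATE' log line.

    Assumes RATE is in units of 10 kbit/s per lane (hypothetical reading).
    """
    m = re.search(r"encoder:\s*(\d+)x(\d+)", line)
    if m is None:
        raise ValueError("no encoder capability found in line")
    lanes = int(m.group(1))
    per_lane_gbit = int(m.group(2)) * 10_000 / 1e9
    return lanes, per_lane_gbit


line = "[ 7.128527] nouveau 0000:01:00.0: DRM: encoder: 4x540000"
lanes, per_lane = parse_encoder_caps(line)
raw = lanes * per_lane
payload = raw * 8 / 10  # 8b/10b line coding used by HBR-class links
print(f"{lanes} lanes x {per_lane:.1f} Gbit/s = {raw:.1f} raw, {payload:.2f} payload")
```

Read this way, the encoder tops out at four HBR2 lanes, which is the basis for the conclusion that the 240 Hz mode is out of reach.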
17:28raket: Do you think it will work if i upgrade to the firmware i posted in pm?
17:30Lyude: raket: i've never seen/heard someone updating the firmware on a GPU before, it's worth a shot and you can send me another dmesg after you do it
17:30Lyude: so i have no idea
17:30raket: ok. i will start windows and try!
17:54karolherbst: Lyude: ehh... I think I removed those from libdrm ...
17:54Lyude: karolherbst: yeah i just sent a MR to fix envytools
17:54karolherbst: yeah.. the kernel doesn't support those either anyway
17:56karolherbst: ahh.. this way
18:18raket: Lyude: here comes the dmesg after firmware upgrade from https://www.nvidia.com/en-us/drivers/nv-uefi-update-x64/ , see your mailbox!
18:19Lyude: raket: yeah I see no difference, sorry :(
18:20raket: but 5.4gbps isn't that less than 2560x1440 165hz? :-) whatever.. i will stick with this monitor from 2015 forever and ever :-)
18:20Lyude: raket: lemme check
18:21Lyude: raket: nope, according to https://en.wikipedia.org/wiki/DisplayPort#Refresh_frequency_limits_for_standard_video it doesn't appear so, HBR2 is enough for that
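As a rough check of that answer, the sketch below compares the two modes against four HBR2 lanes. It assumes 24 bits per pixel, ignores blanking, and applies the 80% efficiency of 8b/10b encoding; the helper name `fits_link` is made up for illustration.

```python
HBR2_GBIT_PER_LANE = 5.4   # DisplayPort HBR2 raw rate per lane
ENCODING_EFFICIENCY = 0.8  # 8b/10b line coding


def fits_link(width, height, refresh_hz, bpp=24, lanes=4,
              lane_gbit=HBR2_GBIT_PER_LANE):
    """True if the mode's raw pixel rate fits the link payload rate.

    Ignores blanking, so a mode close to the limit may still not fit.
    """
    needed = width * height * bpp * refresh_hz / 1e9
    available = lanes * lane_gbit * ENCODING_EFFICIENCY
    return needed <= available


print(fits_link(2560, 1440, 165))  # True: about 14.6 of 17.28 Gbit/s
print(fits_link(2560, 1440, 240))  # False: about 21.2 Gbit/s needed
```

The 5.4 Gbit/s figure is per lane, so with four lanes the 165 Hz mode fits comfortably while the 240 Hz mode does not.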
18:21raket: btw, what's vram_pushbuf?
18:23Lyude: raket: so, nvidia GPUs use something called a pushbuffer for controlling some various blocks on the GPU, for instance they're used for controlling the evo display controller (which handles modesetting). they can be allocated in vram or in system memory (depending on the nvidia generation, some generations don't work well with pushbuffers through GART because of addressing limitations), I think
18:23Lyude: that option just forces them to live in vram
18:23Lyude: pre-pascal iirc they live in vram, post-pascal they live in system memory
18:24raket: how do i enable it? /lib/modprobe.d/nouv.conf -> options nouveau config=NvClkMode=15,NvMemExec=1,NvBoost=2,vram_pushbuf=1 ?
18:24Lyude: yeah, but tbh you shouldn't really need to change that
18:24Lyude: it should already be in vram for maxwell
18:24raket: there's some bugs with the program i use which whines about pushbuf, in dmesg it says vram_pushbuf : 0
18:26Lyude:needs to look at nouveau for a sec
18:29Lyude: raket: ah ok I misspoke, I think that actually does apply to all channels -except- the evo/nvd channel, which always lives in vram for pascal. that being said though, I would file a bug anyway - if an application is causing a pushbuf error to get triggered, that sounds like a mesa problem to me
18:29raket: my previous statement was wrong, options nouveau config=NvClkMode=15,NvMemExec=1,NvBoost=2,vram_pushbuf=1 doesn't set vram_pushbuf to 1, how do i assign it?
18:30Lyude: raket: don't include it as part of config=, it's its own option
18:30raket: ... meaning it should be a blankspace between NvBoost=2 and vram_pushbuf=1 ?
18:30Lyude: raket: yeah
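The distinction between `config=` entries and standalone module options can be made concrete with a small Python sketch that assembles the modprobe line. The helper `nouveau_options_line` is hypothetical and only illustrates the layout described above: `config=` takes a comma-separated list, while `vram_pushbuf` is a separate, space-separated option.

```python
def nouveau_options_line(config, module_params):
    """Build an 'options nouveau ...' line for a modprobe.d file.

    config        -- entries that go inside the comma-separated config= list
    module_params -- standalone module options, separated by spaces
    """
    parts = ["options", "nouveau"]
    if config:
        parts.append("config=" + ",".join(f"{k}={v}" for k, v in config.items()))
    parts.extend(f"{k}={v}" for k, v in module_params.items())
    return " ".join(parts)


line = nouveau_options_line(
    {"NvClkMode": 15, "NvMemExec": 1, "NvBoost": 2},
    {"vram_pushbuf": 1},
)
print(line)
```

The resulting line keeps `vram_pushbuf=1` outside the `config=` list, which is the change that makes the option take effect.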
18:31raket: ok, reboot brb, trying to find the needle in the haystack
18:39raket: seems like vram_pushbuf=1 fixed the issue... well i will get back to you if i get any other nouveau spitting out errors when using it
18:40Lyude: raket: definitely should report that then
18:40Lyude: it's not really expected that users should have to change that by default
18:42raket: Lyude: well it could be a problem in the program/client i use. since i play my game daily i will report it as soon as possible if it happens again, i'm really into oldschool e-sports :-)
18:42raket: Btw, how do i take snapshots if the screen get scrambled?
18:43raket: and everything just crash?
18:43Lyude: raket: i wouldn't say so, the application shouldn't be interacting with the pushbuffer directly at all - mesa should be handling all of that
18:43Lyude: but maybe i'm wrong
18:44raket: (meaning; i need to press the powerbutton to shutdown the computer).. well yeah right, i need some scripting to take care of logs..
18:44Lyude: karolherbst: is it normal to need to set vram_pushbuf=1 to workaround a nouveau issue with pushbuffers?
18:44raket: Lyude: well, i will get back to you, to just see if the race condition is met again, if it work with vram_pushbuf=1 then whatever
18:45karolherbst: Lyude: nope
18:45Lyude: figured :P
18:45karolherbst: but depending on the bug it can solve some weird issues
18:46raket: i should just buy a gtx 960 and send it to karolherbst and instructions how to get the bug
18:46raket: however, seems to be fine now..
18:48karolherbst:should improve debuggability in that area
18:51Lyude: you know that wouldn't be the first time someone's randomly suggested they would buy a nouveau developer a gpu lol
18:52raket: well whatever, time to play some quake1 and see if the race condition is met again
18:52raket: it really seems to be fine now :-)
18:52raket: thanks <3
18:55karolherbst: I should also finish my mt fix :D
18:56karolherbst: now that I actually fixed it
19:09HdkR: Multithreading you say?
19:10karolherbst: I have a branch which passes all deqp EGL multi threading tests without crashing
19:10karolherbst: and those are super aggressive in testing all of that
19:10HdkR: Whoa, It actually has some race tests?
19:10karolherbst: creates like 20 contexts and does crazy shit
19:11karolherbst: HdkR: search for tests with multithread and multi_thread in the name
19:11karolherbst: those are all EGL ones though
19:11karolherbst: external/openglcts/data/mustpass/egl/aosp_mustpass/master/egl-master.txt contains them I think
19:12HdkR: I'll have to give it a clone quick,
19:30HdkR: Took awhile to clone from Google's servers
19:30karolherbst: HdkR: you can also just use the khronos one
19:31karolherbst: khronos is upstream, where google's is a downstream fork now :p
19:31HdkR: Well it's cloned now
19:31HdkR: Definitely doing some fun stuff there