00:00karolherbst: maybe there is hardware with bigger ones?
00:00karolherbst: or smaller
00:00imirkin: (on nvidia hw. other hw can be larger)
00:00endrift: is that the minimum necessary?
00:00endrift: because like, I care about Intel and AMD here too
00:00imirkin: yes. because that's what nvidia supported, they ensured it was sufficient to be compliant.
00:00karolherbst: imirkin: wasn't there some magic on the ISA?
00:00karolherbst: where you put in an offset and it just reads from the next UBO if it overflows?
00:00imirkin: karolherbst: you can indirect to another ubo.
00:00imirkin: that's how indirect accesses across ubo's work
00:00karolherbst: sure.. but if you address 112k you have to check the high bit
00:01endrift: the maximum offset you can address from a base I THINK is 64kiB
00:01karolherbst: I was thinking about c1[$r0] which reads from c2[$r0 - 64k]
00:01endrift: let me check
00:01karolherbst: imirkin: can we do something like that?
00:01karolherbst: normally we would have a load with two sources
00:01karolherbst: one for each indirection
00:02endrift: ok there's one shader that needs 75kB contiguous but that's easy to circumvent
00:02endrift: I can just upload it in two blocks
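Splitting one large buffer across several UBOs comes down to the same arithmetic as the `c1[$r0]` -> `c2[$r0 - 64k]` idea above: divide a flat offset by the block size to pick the buffer, and take the remainder as the offset inside it. The sketch below assumes a fixed per-UBO size (64 KiB per endrift's estimate, or 16 KiB as the guaranteed GL minimum); `ubo_location`, `blocks_needed` and `locate` are hypothetical helpers, not part of any GL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: where a flat byte offset lands once one large buffer is
 * split across several uniform blocks of block_size bytes each. */
typedef struct {
    size_t block; /* index of the UBO binding holding the byte */
    size_t local; /* byte offset inside that UBO */
} ubo_location;

/* How many blocks of block_size bytes are needed to hold total bytes. */
static size_t blocks_needed(size_t total, size_t block_size)
{
    assert(block_size > 0);
    return (total + block_size - 1) / block_size; /* round up */
}

/* Divide to pick the buffer, take the remainder as the offset within it. */
static ubo_location locate(size_t flat_offset, size_t block_size)
{
    assert(block_size > 0);
    ubo_location loc = { flat_offset / block_size, flat_offset % block_size };
    return loc;
}
```

For a 75000-byte buffer this gives 2 blocks at 64 KiB each (matching "two blocks") but 5 blocks at 16 KiB each, and byte 70000 lands at offset 4464 of the second 64 KiB block.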
00:02karolherbst: that was my idea
00:02karolherbst: and it would be constant for the entire frame, right?
00:02karolherbst: so just _one_ upload per frame
00:02endrift: it can change midframe
00:02karolherbst: but I guess it doesn't happen all that often?
00:03karolherbst: but you have an ubo
00:03karolherbst: you can do partial updates
00:03endrift: no, but I do have a pathological test
00:03endrift: I might be able to flip between two VRAM textures to avoid stalls
00:03endrift: I'm currently only using one
00:03karolherbst: you can use even more if you still stall :p
00:04endrift: yeah but it means more bandwidth
00:04endrift: that said, it's still a low amount of bandwidth
00:04endrift: and the reason it was slowing down was stalls, not latency
00:05karolherbst: why would it mean more bandwidth?
00:05karolherbst: you don't upload more data
00:05endrift: I do, because I need to make sure the data gets to both textures eventually
00:05endrift: I mean I could do a texture copy
00:05endrift: but that's a little annoying
00:06karolherbst: heh? why though? just bind the other texture instead
00:06endrift: unless I'm uploading the whole texture at once
00:06karolherbst: or well, if you don't reupload the entire texture every time then yeah.. you would need more
00:06endrift: that's what I'm doing
00:06endrift: I have it in 24 discrete subtextures
00:10endrift: I can see about doing a ring buffer of textures :P
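A texture ring avoids stalls because the GPU can keep sampling one slot while the next one is written. With partial updates, though, each change has to reach every slot eventually, which is the extra bandwidth endrift mentions. This minimal sketch tracks that with one dirty bitmask per slot over the 24 subtextures. The ring size of 2 and every name here are assumptions for illustration; the actual texture uploads (for example `glTexSubImage2D` per dirty subtexture) are left out.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 2     /* assumed: number of VRAM textures flipped between */
#define SUBTEXTURES 24  /* "24 discrete subtextures" */

typedef struct {
    uint32_t dirty[RING_SIZE]; /* bit i set: subtexture i is stale in that slot */
    unsigned next;             /* slot to upload into next */
} texture_ring;

/* A subtexture changed in emulated VRAM, so every slot is now stale for it. */
static void ring_mark_dirty(texture_ring *ring, unsigned sub)
{
    assert(sub < SUBTEXTURES);
    for (unsigned i = 0; i < RING_SIZE; i++)
        ring->dirty[i] |= UINT32_C(1) << sub;
}

/* Pick the slot for this frame, return the subtextures that must be
 * uploaded into it, and advance the ring. */
static uint32_t ring_acquire(texture_ring *ring, unsigned *slot)
{
    *slot = ring->next;
    uint32_t to_upload = ring->dirty[*slot];
    ring->dirty[*slot] = 0;
    ring->next = (ring->next + 1) % RING_SIZE;
    return to_upload;
}

/* Number of subtexture uploads a mask costs (population count). */
static unsigned upload_count(uint32_t mask)
{
    unsigned n = 0;
    while (mask) {
        mask &= mask - 1; /* clear lowest set bit */
        n++;
    }
    return n;
}
```

A single change to subtexture 3 is uploaded once into slot 0 and once more into slot 1 on the following frame. After that, both slots are clean again.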
00:43imirkin: skeggsb: so it looks like DRM is meant to work with 0..1 (which makes sense), so we have to make use of USE_GAIN_OFS... which is only on 507c, but is on both 827c/827d
00:44imirkin: i'm thinking only enable the format on G84+?
00:44imirkin: or add the requirement on G80 that it's all-or-nothing
00:45imirkin: i.e. all heads must have fp16 or none
00:45imirkin: yeah, that actually makes more sense.
00:49skeggsb: i think it's fine that only 507c has it, that's where you'd want to control it from anyway
00:49skeggsb: not sure the core channel control would have any effect the way we use evo anyway
00:52imirkin: hm ok
00:52imirkin: i still don't understand what any of this stuff is ...
00:53imirkin: like ... how does this even work? base507c_image_set
00:53imirkin: how does it know which head it's referring to with the width/height thing?
00:53imirkin: er hm. i guess there's one big image for everyone to scan out from...
00:54imirkin: (but that's not necessarily the case, is it...)
00:54skeggsb: each base channel is associated with a particular head, there's a static mapping
00:55imirkin: and the difference between "base" and "head" is?
00:55imirkin: i.e. NV827C_SET_PROCESSING vs NV827D_HEAD_SET_PROCESSING?
00:58skeggsb: the core channel can also scanout an image (until volta, anyway).. i believe the intention was for the desktop to be defined there, and the base channels get handed to a 3D application to control directly for fullscreen stuff etc
00:58skeggsb: the base channel image overrides the one on the core channel
00:58skeggsb: we only use the base channel, and just configure the core channel in such a way that it doesn't annoy the cross-channel error checks
01:06imirkin: and how does this base/core channel stuff map onto things like head?
01:06imirkin: i.e. 827d
01:07skeggsb: "head" is just "crtc".. the core channel controls all of them, the mode, output routing etc. the base channels are basically an overlay, one per head
01:07imirkin: ok. and does base507c map onto this "base" channel? or core? or?
01:08skeggsb: base channel
01:08imirkin: and head507d would be ... also base channel?
01:08skeggsb: nope, core
01:08imirkin: so how does it know which head is which?
01:09imirkin: oh, i see.
01:09imirkin: cl827d.h:#define NV827D_HEAD_SET_PROCESSING(a) (0x00000910 + (a)*0x00000400)
01:09skeggsb: what do you mean?
01:09skeggsb: yes :)
01:09imirkin: it knows based on the offset
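The macro quoted above encodes the head index in the method offset: a fixed base of 0x910 plus a 0x400 stride per head. The macro below is copied from the log. The inverse helper `head_from_method` is hypothetical and only shows how a head can be recovered from an offset.

```c
#include <assert.h>
#include <stdint.h>

/* As quoted from cl827d.h: per-head method = base + head * stride. */
#define NV827D_HEAD_SET_PROCESSING(a) (0x00000910 + (a)*0x00000400)

/* Hypothetical inverse: which head a per-head method offset targets,
 * or -1 if the offset is not this method for any head. */
static int head_from_method(uint32_t mthd, uint32_t base, uint32_t stride)
{
    assert(stride != 0);
    if (mthd < base || (mthd - base) % stride != 0)
        return -1;
    return (int)((mthd - base) / stride);
}
```

Head 0 uses 0x910, head 1 uses 0xD10, head 2 uses 0x1110, and so on.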
01:09imirkin: so the same setting can be effected via core and base channels
01:10imirkin: can't see how *that* might backfire...
01:10skeggsb: it's not the same setting, if the base channel is active, the core one will be irrelevant
01:10imirkin: is the base channel ever not active in nouveau's usage?
01:10skeggsb: it completely overrides the image-related stuff on the core channel
01:11skeggsb: no, we always use the base channel, it's got more "features" (control over swap interval, semaphores, ability to lock heads together to display simultaneously etc etc etc) than the core channel can manage
01:12imirkin: right ok
01:12skeggsb: also, NVDisplay's core channel has zero image-related controls, it's all in "window" (combined features of base/overlay) channels
01:12imirkin: and remind me again what "asy" is in "asyh", "asyw", etc?
01:13imirkin: [yeah, i've asked like 50 times already]
01:13skeggsb: it's the "assembly" state, ie. not yet active on hw
01:13skeggsb: those names come from nvidia register definitions for evo's double-buffered state fwiw
01:13imirkin: so ... basically like "pending"
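Double-buffered "assembly" state works like this: writes go into a pending copy, and the hardware only sees them when the whole update is latched at once. The sketch below models that idea. The `head_state` fields and the `head_commit` function are hypothetical illustrations, not nouveau's actual structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-head state; fields are illustrative only. */
typedef struct {
    unsigned width, height;
    bool fp16;
} head_state;

typedef struct {
    head_state asy;    /* assembly: pending, not yet active on hw */
    head_state active; /* what scanout is currently using */
} head;

/* Commit latches the whole pending state in one step, so scanout never
 * sees a half-updated configuration. */
static void head_commit(head *h)
{
    h->active = h->asy;
}
```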
01:25imirkin: skeggsb: hey, i don't think our overlays support alpha-based transparency, do they?
01:25imirkin: or am i doing something wrong?
01:26skeggsb: no, they don't
01:26imirkin: looks like source/dest color key, or plain opaque...
01:26skeggsb: i believe they do on NVDisplay, but not prior
01:26imirkin: ok. should probably remove the alpha formats from the overlay lists...
01:27imirkin: (NVD == volta, right?)
01:28imirkin: well, i probably won't be seeing any of that until a GT 1630 comes out...
01:28imirkin: and dell starts shipping it by default in their pc's, and we order such a pc at work :)
01:45imirkin: alright. let's see if this gain thing works as advertised. bbl.
02:11imirkin: skeggsb: it works =]
02:11imirkin: should test on the GK208 too ... hm
02:15imirkin: of course, it looks like the 907d+ displays (at least the GK208 one) really does want stuff in the range 0..1
02:15imirkin: and not 0..1024
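Here is a rough model of what the gain/offset stage is for. It rests on an assumption drawn only from the discussion, not from hardware documentation: the 50/82 bases treat fp16 scanout values on a 0..1024 scale, while DRM clients supply 0..1 and 907d+ already expects 0..1. Under that assumption, a per-channel `out = in * gain + offset` with gain 1024 on the older classes, and an identity elsewhere, would reconcile the two. All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-channel conversion: out = in * gain + offset. */
typedef struct {
    float gain;
    float offset;
} gain_ofs;

static float apply_gain_ofs(gain_ofs g, float in)
{
    return in * g.gain + g.offset;
}

/* Assumption: only the older bases need 0..1 rescaled to 0..1024. */
static gain_ofs fp16_conversion(int needs_rescale)
{
    gain_ofs identity = { 1.0f, 0.0f };
    gain_ofs to_1024 = { 1024.0f, 0.0f };
    return needs_rescale ? to_1024 : identity;
}
```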
02:17imirkin: skeggsb: suggestions? this is my current code -- https://hastebin.com/aresimerug.php
02:17imirkin: should i ... conditionally set conv_gain_offset? or make it internal to the 50/82 bases?
02:18imirkin: i'm thinking the latter
02:35imirkin: skeggsb: this seems to work, and isn't too intrusive: https://hastebin.com/himebusenu.php
02:35imirkin: obviously only limited testing, but ... meh
02:39imirkin: if you're happy with it, let me know, and i'll send a proper patch
02:45HdkR: `ok there's one shader that needs 75kB contiguous but that's easy to circumvent`
02:45HdkR: Does Nouveau not support the indirect UBO index mode that can index multiple cbufs?
02:45imirkin: not user-accessible though, unless they really have a ubo array and are doing indirect access in it
02:46HdkR: I guess it would actually be hard to do in GL since there is no way to do indirect selection of a UBO?
02:47imirkin: sure there is
02:47imirkin: and that's when you can do it
02:47imirkin: but if you're just doing an indirect ubo element access, we don't flip that mode on
02:47imirkin: so you have to be doing an indirect on the buffers themselves
02:47imirkin: LDC.IS iirc
02:47HdkR: Are you able to have an array of UBO bindings?
02:48HdkR: I guess that's how to do it then :P
02:48imirkin: GL 4.0 functionality, iirc (ARB_gpu_shader5)
02:48HdkR: ah that's why, I get fuzzy past GL 3.x
02:48imirkin: not 100% sure that ever came out in ES though... maybe OES_gpu_shader5 allows it?
02:49HdkR: Although if you're dynamically indexing, you don't get to inline any of the UBO accesses in the instructions. Have to use LDC.* for all
02:49imirkin: can't win 'em all
02:50HdkR: Time for Turing to add an inline indirect UBO access for every instruction
02:54skeggsb: imirkin: what classes did you manage to test? i'll check nv50,and gv100+ in a short while, assuming you don't have those
02:54imirkin: i tested with a G84 and GK208
02:54imirkin: let me push a modetest branch
02:54skeggsb: ok, i'll confirm 507c then, because nv50 is annoying like that sometimes.. and i swear i tested this on gv100 already and it didn't work, but i'll try again just in case
02:55imirkin: skeggsb: https://github.com/imirkin/drm.git 30bpp
02:55imirkin: build modetest -- that's what i was testing with. then run -s ...@XB4H or whatever
02:56skeggsb: ack, thanks
02:56skeggsb: i'll let you know how it goes a bit later on
02:57imirkin: skeggsb: well, if gv100 doesn't have the cap, then maybe more checking should be done (caps bits, whatever)
02:57imirkin: gv100 might be special though
02:57skeggsb: yeah indeed, it's also possible i messed up :P
02:57imirkin: i did have to fix up the depth, otherwise i just got EINVAL's
02:58imirkin: let me mail you a proper patch, so it's easier to apply
02:58imirkin: i think hastebin messes up tabs (or rather i mess them up when pasting in there)
05:29imirkin: skeggsb: i'm out for a while, but let me know if you hit any snags in your testing that you want me to address.
05:29imirkin: i'll be checking scrollback
05:30imirkin: [and email, obviously]
05:31skeggsb: imirkin: ack, i'm not far off taking a look at it
11:16coderobe: tried karolherbst's secboot_fixes on GP104 and at least it doesn't crash anymore, woo! https://shr.codero.be/ConfusedSilverGoldfish4.txt
11:17karolherbst: still secboot failing
11:17karolherbst: oh well, at least the runpm issue is resolved, which was the worse issue anyway
12:59PaulePanter: Lyude, karolherbst: Regarding the hang problem from some days ago.
12:59PaulePanter: I changed the cable, but the problem remains.
13:00PaulePanter: The DP monitor is turned off, so it should be fine that no EDID is sent.
13:00karolherbst: I assume the display firmware is actually broken
13:01karolherbst: if it is turned off, why would it want to communicate with the GPU?
13:01PaulePanter: That’s possible.
13:01karolherbst: what happens if you turn the display on?
13:02karolherbst: and what happens if you boot without the display connected and then connect it while it's off?
13:03PaulePanter: If the display is on, no errors are reported. Same with disconnected display.
13:04PaulePanter: The problematic thing is, with the turned off monitor, running xfce4-display-settings for example, the system “freezes”.
13:04PaulePanter: Usage increases over time, and in the end it reboots. How long that takes varies.
13:05PaulePanter: Unfortunately, I do not see the process responsible for the usage in top or ps output.
13:05PaulePanter: `sudo perf top` is also not very helpful.
13:07PaulePanter: Looks like the state of the turned-off display switches between connected and disconnected really fast(?)
13:07PaulePanter: xrandr can trigger it too.
13:08PaulePanter: There were no problems with the proprietary Nvidia driver 390.x. We switched to Nouveau some weeks ago.
13:17karolherbst: Lyude: ^^ enough information?
13:18karolherbst: airlied: ^^ any ideas on what to do here?
13:18karolherbst: sounds like something other drivers might have been running into already
18:25endrift: Had a dream last night about optimizing VRAM uploads. Oops
18:39karolherbst: endrift: btw, did you try the VRAM as UBO thing? would be interesting to see how much benefit that would give you
18:39karolherbst: or maybe using ssbo instead of texture
19:16endrift: karolherbst: not yet. And I don't want to use SSBOs because of the minimum version requirement
19:26karolherbst: SSBOs are VRAM anyway
19:28endrift: According to the Khronos wiki the guaranteed minimum UBO size is 16kiB, not 64
19:32endrift: And does that mean I can't use any other uniforms if I make a UBO that big?
19:44endrift: I may try doing 4x16kiB UBOs just to be safe
20:37karolherbst: endrift: you are required to have at least 8 UBOs
20:37karolherbst: + uniforms
20:38karolherbst: endrift: also, you can query the driver to know the size of an UBO
20:38endrift: Yeah I know but I don't want to bother with adjusting the shader at compile time
20:38karolherbst: for graphic shaders you have at least 14 ubos on nvidia
20:39endrift: I may have to. I don't have any computers of the right vintage to test on though
21:11endrift: I think I'm gonna try to do palettes as a UBO first, since VRAM updates, while they are annoying, are less impactful in the scenes where they are bad
21:12karolherbst: endrift: the point was that reading from an UBO is much faster than reading from a texture/VRAM
21:12karolherbst: and if you have few updates but still many reads, there can be a significant perf improvement
21:13endrift: well sure but this'll give me an easier test first, plus improve the performance in a bunch of games
21:13endrift: and if I can get that working I'll do VRAM next
21:13karolherbst: ohh, there is no benefit in UBOs over a uniform array
21:13karolherbst: so if you have the uniform stuff working you can just keep that
21:14endrift: No I mean, palettes are currently a 256 element array, as opposed to a 160x256 element array which lets me not have to worry about palette updates mid-frame
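A 160x256 palette table is a flattened two-dimensional array. The shader indexes it by `scanline * 256 + entry`, and its byte size depends on how the entries are packed. One caveat is certain: under std140, a plain scalar array gets a 16-byte stride per element, so tight packing (several entries per `uvec4`) changes the size a lot. The result then has to be compared against the queried `GL_MAX_UNIFORM_BLOCK_SIZE`, which is guaranteed to be at least 16384 bytes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Dimensions from the discussion: one 256-entry palette per scanline. */
#define SCANLINES 160
#define PALETTE_ENTRIES 256

/* Flattened [SCANLINES][PALETTE_ENTRIES] index, as a shader would compute
 * it from the current scanline and the palette index. */
static size_t palette_slot(unsigned scanline, unsigned entry)
{
    assert(scanline < SCANLINES && entry < PALETTE_ENTRIES);
    return (size_t)scanline * PALETTE_ENTRIES + entry;
}

/* Table size in bytes for a given per-entry stride: 16 for a plain
 * std140 scalar array, smaller when entries are packed into vectors. */
static size_t palette_table_bytes(size_t stride_bytes)
{
    return (size_t)SCANLINES * PALETTE_ENTRIES * stride_bytes;
}
```

The table has 40960 entries in total: 655360 bytes at a 16-byte stride, or 81920 bytes at 2 bytes per entry.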
21:14karolherbst: I see
21:14karolherbst: so the palette is fixed for each entire frame?
21:14endrift: midframe palette updates are a common technique
21:15endrift: no, I'm currently dividing frames up every time the palette is updated
21:15karolherbst: or would you be able to retrieve the entire palette for the entire frame before drawing?
21:15karolherbst: ahh, I see
21:15endrift: this would let me remove that dividing
21:15karolherbst: yeah, just tried to refresh my memory on that
21:15karolherbst: sounds like a good idea
21:15endrift: I need to figure out how to use UBOs first though
21:16endrift: doesn't look too hard
21:16endrift: and before that I need to get my cat off of my chair so I can sit down :P
21:16karolherbst: but yeah, that sounds like something which could speed up things a lot
21:16endrift: yep I have a pathological test case for it
21:17endrift: I have lots of pathological test cases :D
21:17karolherbst: might make sense to have two render paths. one for pre 3.0 hardware and one for modern
21:17karolherbst: maybe even three
21:17karolherbst: pre 3.0, 3.3 and 4.6
21:17endrift: is there a utility that can tell me the values of the various system glGets without me having to write it?
21:17endrift: I thought glxinfo did it but I can't find it
21:18endrift: I'm not supporting pre-3.0 for this
21:18endrift: I'm probably gonna bump the requirement from 3.0 to 3.2 Core
21:18endrift: maybe 3.3 Core
21:18endrift: The software renderer works fine if you have pre-3
21:18endrift: it just can't do fancy upscaling
21:21karolherbst: yeah.. on older hardware it might not even make much sense to use the GPU for that
21:21karolherbst: endrift: do you know how much faster GPU rendering is compared to the software one?
21:22endrift: not very
21:22endrift: if at all
21:22endrift: I'm doing this for the fancy upscaling
21:22karolherbst: ahh, I see
21:22wrl: endrift: 3.2 is a good target
21:23wrl: core, forward
21:23karolherbst: endrift: I was more referring to the frame splitting stuff as this is kind of a big difference for the rendering
21:23endrift: I thought Core was the replacement for forward
21:23karolherbst: but if you require 3.2 core you have UBOs anyway
21:24wrl: it might be
21:24wrl: i dunno
21:24endrift: oh, yeah, if I don't frame split it is faster
21:24wrl: i ask for core
21:24karolherbst: forward is something somebody requested without knowing why
21:24endrift: wrl: are you the wrl I think you are
21:24wrl: endrift: are there any other wrls i should know about
21:24endrift: depends on if you make music software
21:25karolherbst: apple is the only reason forward profiles exists
21:25karolherbst: I mean it
21:25wrl: endrift: it me
21:25wrl: i'm shipping 3.2 core and nobody's complained
21:25karolherbst: yeah, 3.2 core is good enough
21:25karolherbst: you could just require 3.3 though
21:26wrl: on mac 3.2 is all you can request
21:26endrift: Nothing I found supports 3.2 but not 3.3
21:26karolherbst: all hardware which can do 3.2 can also do 3.3
21:26wrl: there's just one define for it, rip
21:26karolherbst: well.. apple
21:26endrift: yeah I noticed that
21:26wrl: y e p
21:26endrift: you can request 3.2 or 4.1
21:26endrift: but nothing in between
21:26karolherbst: not that 3.3 is all that useful
21:26wrl: well... you can request 3.2, and if the OS is new enough you'll get 4.1
21:26karolherbst: it has GL_ARB_timer_query though
21:26wrl: it's the same constant
21:26endrift: my cat is making it hard to type
21:27karolherbst: ohh, GL_ARB_instanced_arrays
21:27karolherbst: okay, 3.3 is useful :D
21:27karolherbst: instanced drawing is the best anyway
21:27wrl: oh i guess there *is* a NSOpenGLProfileVersion4_1Core
21:27endrift: yes that's what I was talking about
21:27endrift: ack claws
21:28wrl: i could have sworn i looked for that the other day and it wasn't there
21:28wrl: ah, well, whatever. apple.
21:28karolherbst: endrift: with instanced drawing you can select texture in the vertex stage for the fragment stage
21:28karolherbst: and other weird shit
21:28karolherbst: can eliminate tons of draw calls
21:28endrift: that's actually huge
21:29endrift: means I can move a bunch of stuff from fragment stage to vertex stage
21:29karolherbst: endrift: https://www.khronos.org/opengl/wiki/Vertex_Rendering#Instancing
21:29endrift: my fragment shaders are on the heavy side
21:29endrift: though afaik they don't spill registers (thankfully)
21:29wrl: endrift: anyway i'm shipping 3.2 in cadmium and while i was expecting people to complain that their hw doesn't support it, nobody has
21:29wrl: there's been like... 1 person with a lappy from 2007 or sth
21:29wrl: intel integrated
21:30endrift: mmm GMAs
21:30wrl: literally the only report i've had and that was 4 years back
21:30wrl: you are well into the clear with 3.2 core
21:30karolherbst: endrift: mhh, maybe you need bindless_textures to make really use of all that though
21:31endrift: oddly I have someone complaining that they don't have working graphics on an old ATI Radeon that should support 3.3
21:31endrift: even though I'm only requesting 3.0
21:31endrift: but I don't have enough diagnostic info to debug
21:31wrl: endrift: windows?
21:31karolherbst: yeah.. they don't do fp64
21:31wrl: you're requesting 3.0 core?
21:31endrift: there's no such thing as 3.0 core
21:31endrift: I'm requesting 3.0 fc
21:32wrl: got it
21:32endrift: profiles were introduced in 3.2
21:32wrl: yeah i ran into some amd/ati issues on windows which were because i wasn't specifically asking for 3.2, i was just checking to make sure the context supported everything i needed
21:32endrift: yep that's what I'm doing
21:33endrift: do you expect it'll work if I request 3.2 core?
21:33karolherbst: endrift: ohh, wait. I mistook instanced drawing with bindless textures way of selecting textures
21:33wrl: usually it worked fine but amd gpus ended up in a "standards-whatever" mode. 3.0 shored up some Dumb Shit that GPUs could do and just requesting a 3.2 core profile context fixed my amd issues
21:33karolherbst: endrift: instanced drawing allows you to draw the same object multiple times at different locations
21:34endrift: ahaha if I request 3.2 core I get a context that doesn't work
21:34karolherbst: with one draw, instead of N
21:34endrift: but if I request 3.3 core it works
21:34wrl: it was something dumb like... unbound textures should effectively just be vec4(0.0) under >=3.0 but before that on amd the shader would crash intermittently
21:34wrl: which is Good
21:35karolherbst: instanced drawing is essentially geometry shaders done non-crappily
21:35wrl: (i wasn't even sampling from the texture either, it was behind a conditional in the fragment shader. if the call was even there then the shader would crash)
21:36wrl: anyway, i only brought it up because "amd on windows" has been my only outlier
21:36karolherbst: but you can still output a texture id if you have multiple textures bound, do an indirect texture access in the fragment shader, and select a different texture for each drawn object though
21:36karolherbst: bindless_texture just makes that less messy
22:40endrift: urgh I can't figure out this compiler error
22:40endrift: I wish mesa compiler errors were better :/
22:41imirkin: what's the error?
22:41endrift: ...the error was that I was spelling uniform wrong
22:41imirkin: then you get an unknown token error or something
22:41endrift: "error: syntax error, unexpected NEW_IDENTIFIER, expecting $end"
22:42imirkin: does it at least give you a line number?
22:42endrift: it was very difficult to turn that into "you spelled uniform wrong" without syntax highlighting
22:42endrift: but I'm a bit dyslexic
22:42imirkin: should narrow it down
22:42imirkin: anyways, patches welcome :)
22:42imirkin: turns out it's _really_ hard to make good errors
22:42endrift: oh I know
22:43endrift: hire me to work on Mesa and I'll write patches ;)
22:43imirkin: esp without a custom lexer/etc
23:52imirkin: skeggsb: btw, what do you think about the adjusted_mode vs mode thing?
23:52imirkin: or haven't gotten to that yet?
23:53skeggsb: i'll have a look shortly, i can't remember where that function fits into things exactly
23:53imirkin: and i never knew :)