01:44 JayFoxRox: what precision is the NV20 [NV2A actually] vertex shader output for the colors [front/back]? is it uint8_t[4]?
02:00 imirkin: JayFoxRox: check out the nvhw stuff that mwk has put together
02:00 imirkin: he has bit-accurate tests for a lot of the pipeline
02:01 imirkin: [in envytools]
02:03 JayFoxRox: will do, thanks for pointing me :)
02:04 imirkin: did you work out the overlay stuff?
02:07 JayFoxRox: I don't even remember :P I got distracted with other stuff. I have enough of it working to be useful anyway, but I still don't fully understand all of it
02:08 imirkin: https://github.com/envytools/envytools/blob/master/hwtest/xf.cc
02:08 imirkin: this handles all the opcodes
02:09 imirkin: hmmmm doesn't do the actual color output though
02:11 JayFoxRox: yeah, I'm wondering what the storage is between VS / FFP and reg combiners
02:12 JayFoxRox: m`wk once responded that he didn't know what the reg size is in the tex shaders / reg combiners already (if I remember correctly); but I knew he worked on the VS stuff - so I was hoping he knows how they end up in GPU cache (?)
02:13 JayFoxRox: we have a problem in XQEMU where OpenGL geometry shaders only allow 128 output components, but our output vertices are 34 components, and some of our GS emit 5 vertices (so a total of 170 output components; breaking the limit)
02:14 JayFoxRox: we do use vec4 for the 4 colors, so if we can reduce those 16 components to 4 components, then we will be at 22 components per vertex, for a total of 110 components
02:15 JayFoxRox: actually that might not even work, now that I think of it. because of interpolation?
02:19 imirkin: errrr
02:19 imirkin: what hardware is the limit 128 on?
02:20 imirkin: afaik the lowest limit is 1024
02:21 imirkin: the limit is 128 output components PER VERTEX
02:22 imirkin: but then a single shader can emit at least 1024 (subject to the max_vertices declaration)
02:22 imirkin: GL 3.2 requires MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS to be >= 1024
02:23 imirkin: and yes, you have to stick 32-bit float values into GL components if you want linear/perspective interpolation to work
02:23 imirkin: JayFoxRox: --^
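The component budget discussed above can be checked with a little arithmetic. The following is a minimal sketch, not XQEMU code: it takes the figures quoted in the conversation (34 components per vertex, 5 emitted vertices, four vec4 colors) and the GL 3.2 minimums mentioned for the two limits. The function name `gs_budget` and the dictionary layout are made up for illustration.

```python
# Minimum values guaranteed by GL 3.2, as quoted in the discussion.
MAX_GEOMETRY_OUTPUT_COMPONENTS = 128         # limit per emitted vertex
MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS = 1024  # limit per shader invocation


def gs_budget(components_per_vertex, max_vertices):
    """Return the per-vertex and total component counts and whether both fit."""
    total = components_per_vertex * max_vertices
    fits = (components_per_vertex <= MAX_GEOMETRY_OUTPUT_COMPONENTS
            and total <= MAX_GEOMETRY_TOTAL_OUTPUT_COMPONENTS)
    return {"per_vertex": components_per_vertex, "total": total, "fits": fits}


# Current layout: 34 components per vertex, up to 5 vertices emitted.
current = gs_budget(34, 5)

# Hypothetical layout: four vec4 colors (16 components) packed into 4.
packed = gs_budget(34 - 4 * 4 + 4, 5)

print(current)  # total is 170
print(packed)   # total is 110
```

Under these assumptions both layouts stay inside the guaranteed minimums, since 170 only exceeds the per-vertex limit of 128 if it is mistaken for the total limit.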
02:25 JayFoxRox: adjusts his imaginary glasses
02:25 JayFoxRox: let me look at the GL spec again
02:25 imirkin: https://people.freedesktop.org/~imirkin/glxinfo/
02:25 JayFoxRox: we did not get confirmation the actual values are this low by the way - because windows people traditionally suck at helping you to figure out problems ;P
02:25 imirkin: you can see the value reported there
02:26 JayFoxRox: only intel windows is affected
02:26 imirkin: ok, well, intel windows is buggy as shit
02:26 imirkin: also note that intel windows doesn't support geometry shaders at all on sandybridge
02:26 imirkin: while the linux drivers support it
02:27 imirkin: also, geometry shaders on intel are variously tricky and you can run into trouble even without any of this
02:28 imirkin: it probably took the linux intel team a year to get it all right.
02:28 JayFoxRox: "MAX_GEOMETRY_OUTPUT_COMPONENTS" is at least 128 according to GL spec. and the wording makes it sound like it's output components total. however, you are right in that the minimum for "max vertices" is at 256. which implies this is the output count per vertex?
02:28 imirkin: yes.
02:28 imirkin: it matches MAX_FRAGMENT_INPUT_COMPONENTS
02:28 JayFoxRox: I'm confused now, because the GL wiki also said the component limit is overall (https://www.khronos.org/opengl/wiki/Geometry_Shader#Output_limitations)
02:29 JayFoxRox: ohh wait TOTAL_OUTPUT
02:29 JayFoxRox: yeah.. I'm an idiot. you are right
02:29 imirkin: everything they say seems right.
02:30 imirkin: so in practice, if you do something like output a gl_Position, that actually takes up some varyings
02:30 imirkin: out of the 128 limit
02:30 JayFoxRox: yeah, ignore what I originally said - I simply didn't see there were 2 output limits
02:30 imirkin: yeah
02:30 imirkin: it's confusing.
02:30 JayFoxRox: I always skipped the "TOTAL_" part
02:31 imirkin: what's more confusing is how the adreno gpu designers thought ignoring the GL/DX "way" of doing geometry shaders was a good idea
02:31 imirkin: and instead they have their own model
02:31 JayFoxRox: the wording in the spec could be better imo and I'd probably have seen the correct one (reading the tables with my head tilted sideways probably had my thinking fluid in the wrong spot too)
02:31 imirkin: which requires O(N^2) invocations in the general case
02:31 imirkin: as if it wasn't enough of a bottleneck already.
02:32 JayFoxRox: I'm not too informed about any of this; but yeah, I've heard GS are a trainwreck in general. We need them to emulate GL_QUADS and friends
02:32 imirkin: what's wrong with trifan?
02:32 imirkin: [does that not work? i forget]
02:33 JayFoxRox: you'd have to loop over the draw calls on the CPU, wouldn't you?
02:33 imirkin: no
02:33 imirkin: just s/GL_QUADS/GL_TRIANGLE_FAN/
02:33 JayFoxRox: but if you have 8 vertices... ?
02:33 imirkin: might screw up the winding.
02:33 imirkin: oh. then you're in trouble.
02:33 imirkin: ;)
02:34 JayFoxRox: right :P and then you'd have to loop on the draw call ;)
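The trifan point above can be illustrated with index lists. This is a sketch of a different technique from the geometry-shader approach XQEMU uses: it builds an index list that draws each group of four vertices as two triangles, and compares it with what a single triangle fan over the same vertices would produce. Both function names are hypothetical, and the sketch does not address winding order.

```python
def quads_to_triangles(vertex_count):
    """Index list drawing each complete group of 4 vertices as two triangles."""
    indices = []
    # Ignore trailing vertices that do not form a complete quad.
    for base in range(0, vertex_count - vertex_count % 4, 4):
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return indices


def fan_to_triangles(vertex_count):
    """Triangles a single triangle fan produces: all share vertex 0."""
    indices = []
    for i in range(1, vertex_count - 1):
        indices += [0, i, i + 1]
    return indices


print(quads_to_triangles(4))  # [0, 1, 2, 0, 2, 3]
print(fan_to_triangles(4))    # [0, 1, 2, 0, 2, 3]
print(quads_to_triangles(8))  # [0, 1, 2, 0, 2, 3, 4, 5, 6, 4, 6, 7]
print(fan_to_triangles(8))    # every triangle still starts at vertex 0
```

For a single quad the two lists agree, which is why the substitution works for 4 vertices. With 8 vertices the fan keeps anchoring every triangle at vertex 0, so the second quad is not drawn correctly without splitting the draw call or supplying an index list like the one above.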
02:34 imirkin: i wonder if tessellation shaders would be more performant. that'd be funny if they were.
02:34 imirkin: wait, so how does it work now? what do you feed it in as?
02:35 imirkin: you must do some sort of transformation...
02:36 JayFoxRox: now.. if the GS limit is not the issue, I wonder why we run into trouble then :( however, that's probably even more offtopic than we already are. if anyone cares; we get this behaviour: https://github.com/xqemu/xqemu/issues/90 (we got a second report on IRC earlier, which will hopefully be added to the issue soon)
02:36 JayFoxRox: imirkin: yes.. with the GS?
02:36 imirkin: no, i mean on the vertices
02:36 imirkin: i.e. what vertices do you feed to glDraw
02:37 JayFoxRox: whatever the virtual nv2a GPU receives
02:37 imirkin: so basically the GL_QUADS vertices?
02:37 imirkin: so it goes 1 2 3 4 1 2 3 4 ?
02:37 JayFoxRox: yes, but instead of GL_QUADS we use some of the GS types (I'd have to look up which one makes sense). and then we emit 4 vertices or so
02:38 imirkin: that's what i mean - what primitive do you feed it in as?
02:38 JayFoxRox: this is our code https://github.com/xqemu/xqemu/blob/master/hw/xbox/nv2a/nv2a_shaders.c#L72
02:38 imirkin: anyways, if you want me to look at that issue, i'll need to see the shaders
02:38 imirkin: mmmm
02:38 imirkin: i don't think line_adj is what you want
02:38 imirkin: i thought about that first
02:38 imirkin: but i don't think it works...
02:39 JayFoxRox: the issue is probably not related to that code by the way - I had just assumed it's a problem with GS, and we used more than 128 components, which made me look into it. but this is probably not it
02:39 imirkin: sure.
02:39 JayFoxRox: our GS code works fine btw - so that's not the topic here
02:39 imirkin: anyways, if you can get me the shaders that don't link
02:39 imirkin: i can provide an opinion as to why
02:40 JayFoxRox: yeah, I'll try to make that happen tomorrow - but it's almost 5AM now. there's already a pastebin with a handful of shaders, but someone has to clean it up before it goes into the issue
02:42 imirkin: k
02:43 JayFoxRox: regardless of all this offtopic drama, I still want to know the VS output format ofc ;)
03:00 mwk: JayFoxRox: the VS outputs are in uint8_t for colors, yeah
03:01 mwk: there's a final conversion step that is not yet hwtested
03:01 mwk: which stores the outputs in the register file
03:02 mwk: I have mostly figured out that stuff, but it's only in my private notes so far
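To make the 8-bit color storage concrete, here is a rough sketch of quantizing a float color to four uint8 channels and packing them into one 32-bit word. It is not the NV2A's actual conversion, which is described above as not yet hwtested: the clamp to [0, 1], the round-to-nearest step, and the R-in-low-byte channel order are all assumptions, and both function names are hypothetical.

```python
def quantize_unorm8(value):
    """Clamp a float to [0, 1] and scale to 0..255 (rounding mode is assumed)."""
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255.0 + 0.5)


def pack_rgba8(r, g, b, a):
    """Pack four quantized channels into one 32-bit word (byte order is assumed)."""
    channels = [quantize_unorm8(c) for c in (r, g, b, a)]
    return channels[0] | channels[1] << 8 | channels[2] << 16 | channels[3] << 24


print(quantize_unorm8(0.5))                   # 128
print(hex(pack_rgba8(1.0, 0.0, 0.0, 1.0)))    # 0xff0000ff
```

A packed word like this would take one output component instead of four, but as noted earlier in the discussion, GL only interpolates float components, so a packed integer would not interpolate correctly across a primitive.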
03:52 art10001: Greetings
03:52 art10001: I have a question
03:53 art10001: How would I set anisotropic mip filter optimization to off?
03:54 art10001: The setting exists on windows but not on the proprietary blob
03:54 art10001: Could nouveau have it? I have searched but there's barely any info
03:55 imirkin: glTexParameter(GL_TEXTURE_MAX_ANISOTROPY_EXT, 0)?
03:55 imirkin: not sure what you're looking for ... are you writing an application?
03:56 art10001: Making one work
03:56 art10001: There are glitches without that setting
03:56 imirkin: and the application enables anisotropy, and you want to override that?
03:57 art10001: It is obscure and does not say if it has it on or not
03:57 art10001: Yes, override it
03:58 imirkin: i don't see an option to do that
03:58 imirkin: if you want, it's an easy edit to make
03:58 imirkin: https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/state_tracker/st_atom_sampler.c#n202
03:58 imirkin: just set that to always 0
04:01 art10001: Thank you
04:02 art10001: Wow
04:02 art10001: Nouveau supports so many options
04:03 imirkin: it's all part of how GL works
04:03 imirkin: the code i pointed you to is generic
04:03 imirkin: shared by all the gallium drivers
04:03 imirkin: GL is complicated.
04:03 imirkin: lots of options.
04:04 imirkin: i'd guess if you're seeing glitches as a result of anisotropy, then the max_lod setting is wrong on some samplers
04:04 imirkin: and it's accessing texture data from levels that haven't been properly set up
04:04 art10001: Another setting is "trilinear fi
04:05 art10001: oops
04:05 art10001: Accidentally tapped enter
04:05 imirkin: some of this texture quality stuff is not accessible at all
04:05 imirkin: i.e. we can configure it in various ways
04:06 imirkin: we just don't have a way for the user to control it
04:06 imirkin: so we pick some defaults
04:08 art10001: I see
04:16 rhyskidd: is there someone with a multi-gpu setup that could run a quick demmt test?
04:16 rhyskidd: https://github.com/Echelon9/envytools/commit/47a534c833ef16888699b1ee9fbed1b52cf14004
04:16 rhyskidd: i want to confirm the "cnt: 0x00000002" (if two gpus)
04:18 imirkin: you mean mmt...
04:18 rhyskidd: yeh, mmt
04:19 art10001: Wave
10:45 someosdev: rhyskidd: I've got an old mmt laying around of my double G84
10:47 someosdev: The output of the 0x00800280 method seems to depend on whether SLI mode is enabled or not.
11:03 rhyskidd: someosdev: so you're seeing a value of 2 for 0x00800280 on the system with two physical gpus, but only when SLI is enabled?
11:05 someosdev: I did a mmt of xorg and the value seems 0x2 in SLI mode and 0x1 in normal mode. Do you want the complete mmt?
11:05 rhyskidd: yes please
11:06 rhyskidd: so that's good -- perhaps the mthd is better named NVRM_DEVICE_GET_NUM_SLI_GPUS?
11:09 someosdev: Yeah that makes more sense. However if there are 3 GPUs and only 2 can be used in SLI mode, how does the userspace driver know which cards to use for SLI?
11:09 someosdev: I'd assume some sort of mask
11:10 someosdev: Here's the mmt without SLI: https://ufile.io/qcinh
11:10 rhyskidd: i've only seen integer values for that mthd, doesn't look like a mask or bitfield
11:13 someosdev: Yeah the values 0x1 and 0x2 wouldn't make sense as a mask. However I'd assume there is a more sophisticated way for getting the SLI configuration info than just the number of SLI GPUs.
11:16 someosdev: The SLI mmt: https://ufile.io/q3zw1
11:16 someosdev: Maybe it has to do sth with the 0x00000202 method that follows afterwards?
11:18 someosdev: And the 0x00000201 method
11:51 rhyskidd: i'll keep looking
11:51 rhyskidd: thanks for the mmt's
11:59 someosdev: You're welcome!
12:06 someosdev: Has anyone traced the access of nvrm to the PCI-PCI bridge both cards are connected to? I think it is needed for SLI.
14:46 Kevlar_Noir: hi
14:46 Kevlar_Noir: I've just installed the nouveau driver
14:47 Kevlar_Noir: I've got screen tearing, do I need a 20-nouveau.conf with specific parameter ?
14:47 Kevlar_Noir: do I need to make something for KMS ?
14:47 Kevlar_Noir: I'm using debian buster
14:48 Kevlar_Noir: thank you in advance
14:49 Kevlar_Noir: I'm using compton
14:58 Kevlar_Noir: problem solved
14:58 Kevlar_Noir: I edited the /etc/initramfs-tools/modules
14:58 Kevlar_Noir: no... in fact I still have tearing
22:14 Subv: hey, in GF100+, is there any particular reason why pushbufs would not get executed by the GPU immediately after being submitted? like some kind of queue that needs to be manually flushed or something