00:19imirkin: grrr. i hate it when i go 'how did this work before?' only to discover that it didn't work before =/
00:47william_: anyone working on SLI?
00:47pmoreau: No one as far as I know, and really few devs have access to a SLI setup.
00:48pmoreau: Apparently my laptop can do SLI, but… I have no time to look into it.
00:48william_: ok then
00:49pmoreau: If you want to have a try, you are really welcome! :-)
00:49william_: what do we know about SLI
00:50pmoreau: There are some regs that have been identified as being used by SLI, and in some commands you can specify a card ID.
00:51william_: tell or show me more
00:53pmoreau: Best is to grab the whole repo, and grep for SLI :D
00:53pmoreau: There are some mentions in the GPIOs part as well
00:54imirkin: the biggest challenge with SLI is actually the question of how to use it effectively
00:57william_: i don't know much about this project or GPUs
00:58william_: but i'm guessing SLI is some kind of inter-processor communication
01:00imirkin: SLI enables you to send a single command stream to multiple GPUs, but have each one only execute some of those commands
01:01imirkin: presumably one might tell one GPU to render one half of a buffer, and the other GPU to render the other half
01:02imirkin: but like i said, the biggest issue is that it's unclear how one would actually make effective use of SLI in the first place
01:02william_: how do we render buffer at the present time?
01:03imirkin: ... by sending commands to the gpu?
01:04imirkin: not sure what you're asking, but more importantly, not sure what your goal is
01:06william_: find out more about nouveau and get a greater understanding of how GPUs work, so i can submit code for this project
01:06imirkin: pick a smaller task than "SLI"
01:06william_: and get SLI support
01:06imirkin: here are some ideas: https://trello.com/b/ZudRDiTL/nouveau
01:09william_: when you say command buffers are you basically copying a program or straight up telling the gpu to do something
01:10imirkin: is there a difference?
01:10imirkin: unfortunately this stuff isn't super-well documented... there's no great "for beginners" type of thing
01:11imirkin: take a look at https://github.com/pathscale/pscnv/wiki/PFIFO
01:12imirkin: there's also a lot more at envytools.rtfd.org
01:17william_: when i say straight up telling the gpu: does it have predefined instructions that say draw polygon, or do you have to upload a program to it to make it draw polygons (similar to OpenCL)
01:18william_: ignore that. just got it
01:18imirkin: nvidia hardware maps fairly nicely to the OpenGL pipeline
01:20william_: so opengl commands are turned into programs on the GPU?
01:21imirkin: this is a map of the pipeline... as you can see, it's pretty involved: http://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGL44PipelineMap.pdf
01:25william_: ok...... so opengl commands (Like glBindTexture) to GL pipeLine (shi+ ton of stuff) to GPU command buffer
01:28imirkin: not *exactly* but... sort of close enough.
01:30william_: I guess shaders just get put in GPU command buffer after getting compiled
01:31imirkin: no, they get uploaded to memory and then a reference to where they are in memory is given in the command buffer
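imirkin's point — shader code lives in memory, the command buffer only carries a reference — can be pictured like this. The `SP_START_ID` name is borrowed loosely from the nvc0 method names as an illustration; treat the whole thing as a toy, not the actual programming sequence.

```python
# Toy illustration: compiled shader code is copied into GPU-visible memory,
# and the command buffer only carries a *reference* (an offset) to it.

vram = bytearray(1024)   # stand-in for the GPU code segment
pushbuf = []             # stand-in for the command buffer

def upload_shader(code, offset):
    """Copy a compiled shader binary into the code segment."""
    vram[offset:offset + len(code)] = code
    return offset

def emit(method, data):
    """Append one method/data pair to the command buffer."""
    pushbuf.append((method, data))

binary = b"\x12\x34\x56\x78"            # pretend compiled shader
start = upload_shader(binary, 0x100)    # shader body lives in memory...
emit("SP_START_ID", start)              # ...the pushbuf just points at it
```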
01:35william_: if programs are similar to command buffers, then can we just do some kind of parallelization between the GPUs?
01:35william_: *Automatic parallelization
01:36william_: and use SLI as a bridge?
01:40karolherbst: imirkin: there is also some kind of SLI between different gpus, like on pmoreau's laptop :/
01:41imirkin: karolherbst: that's not SLI
01:41imirkin: karolherbst: that's just 2 GPUs
01:41karolherbst: imirkin: sure it is :p
01:41william_: nvidia optimus?
01:41karolherbst: hybrid SLI
01:41imirkin: SLI is when you have 2 GPUs linked up and able to determine whether they are the SLI slave or master
01:42karolherbst: ohh wait, this is for the newer one
01:42imirkin: william_: "automatic" parallelization is tricky -- how do you do it? i have no clue what the right strategy is.
01:43karolherbst: ohh no, this is the right one
01:43karolherbst: it is just not implemented on mac os x
01:44karolherbst: pmoreau: maybe your gpus really don't support it after all :/ nvidia is kind of vague on this one
01:44karolherbst: but generally the technology is there
01:44Hoolootwo: it says it is not supported on hardware
01:45Hoolootwo: but that OS X does its own thing to accomplish similar things
01:45karolherbst: Hoolootwo: no, this is about Hybrid SLI
01:46karolherbst: like you have two different nvidia gpus: one chipset gpu (embedded on the motherboard chip) and one discrete one
01:46karolherbst: and both can do some SLI together to maximize performance
01:46karolherbst: as it seems there is only a limited set of gpus, which can actually do that
01:47Hoolootwo: hmm okay
01:47karolherbst: william_: first the SLI bits have to be understood before anyone can think about how to schedule all the commands between gpus
01:48william_: raise your hand if you've got an SLI graphics card?
01:48william_: I don't
01:48imirkin: karolherbst: SLI bits are understood.
01:48karolherbst: me neither
01:48karolherbst: imirkin: really? like, nouveau could actually use all SLI connected cards?
01:49imirkin: karolherbst: we know how to link GPUs together, we know how to send them commands s.t. only the master or slave processes them
01:49karolherbst: I see
01:49imirkin: but how to make effective use of that? who knows :)
01:49karolherbst: so what's missing is just testing things out on actual SLI setups?
01:50imirkin: no, what's missing is making use of that knowledge
01:50karolherbst: I see
01:51imirkin: karolherbst: http://envytools.readthedocs.org/en/latest/hw/fifo/dma-pusher.html#nv4-sli-conditional-command
01:54imirkin: but... what do you do? upload all the same resources to both GPUs and then have each one render half the image?
01:54imirkin: i won't even mention multi-stage renders...
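One answer to imirkin's "each one render half the image" question is a split-frame scheme: give every GPU the same resources but a different scissor rectangle. This is only a sketch of the band arithmetic; nothing here is nouveau API.

```python
# Divide a framebuffer into per-GPU horizontal bands for split-frame
# rendering. Purely the maths of the split, as a conceptual example.

def split_scissors(width, height, n_gpus):
    """Divide the framebuffer into n_gpus horizontal bands,
    giving any remainder rows to the last band."""
    band = height // n_gpus
    rects = []
    for i in range(n_gpus):
        y0 = i * band
        y1 = height if i == n_gpus - 1 else y0 + band
        rects.append((0, y0, width, y1 - y0))  # (x, y, w, h)
    return rects
```

Note this says nothing about the hard parts imirkin raises: resource duplication, load balancing, or multi-stage renders where one pass consumes another's output.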
01:56william_: is the command Scheduler in the Mesa nouveau?
01:57imirkin: not sure what you're referring to
02:02william_: ignore that. found what i mean
02:04william_: is there any facility to mind map this problem of SLI
02:06Psy-Q: i'll put an easy switch-between-nouveau-and-nvidia script on that other machine :D
02:06Psy-Q: i love debian for the alternatives system
02:09william_: what Licence is (nouveau) it under
02:09william_: don't know why I did the brackets
02:11hansg: imirkin, hi, would now be a good time to talk about opencl support ?
02:15william_: any news on the release date of Vulkan API recently?
02:39william_: is there a VoIP for Nouveau devs
02:41hansg: imirkin, I have to temp. drop of irc, I'll be back in a bit
02:46william_: do we have a GPU command list
02:52william_: how is Memory reclocking doing
02:53RSpliet: william_: fine, thank you. How is rasterizer today?
02:54RSpliet: (sorry, that one was really hard to resist :-D what do you want to know about it that isn't on Trello?)
02:56william_: cant find any documentation on envytools for Memory reclocking
02:56william_: am I blind
02:57RSpliet: that's because it's hardly understood, nouveau just tries to mimic the blob as well as possible
02:58RSpliet: it doesn't make sense writing documentation like "bit X in VBIOS table Y means bit Z on register R must be set to Q"; that doesn't provide any insight beyond the code
02:59RSpliet: hence: we only have a little bit of docs related to PLL control registers (nice diagrams for GT21x) and a lot of code in the kernel and envytools
03:00william_: can you give me a pointer
03:00william_: to them
03:01RSpliet: kernel: drivers/gpu/drm/nouveau/nvkm/subdev/fb/ram*.c , drivers/gpu/drm/nouveau/nvkm/subdev/fb/*ddr*.c , also see spec sheets from ram vendors to find documentation to understand the meaning of the registers generated in the latter set of source files
03:01RSpliet: (e)MR values are well documented
03:02william_: Ok thank you
03:02RSpliet: then there is [...]nvkm/subdev/bios/timing.c and [...]nvkm/subdev/bios/rammap.c for the VBIOS bits
03:03RSpliet: and those diagrams I told you about are on http://envytools.readthedocs.org/en/latest/hw/pm/gt215-clock.html - but not so relevant for Fermi and newer
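The GT215 clock diagrams RSpliet links describe PLLs, which generally follow the textbook model `out = ref * N / (M * 2^P)`. Here is a generic brute-force coefficient search under that model; the coefficient ranges are made up for the example, since the real limits come from the VBIOS PLL limits table.

```python
# Generic PLL coefficient search for out = ref * N / (M * 2**P).
# The N/M/P ranges below are invented; real hardware constrains them
# (and the VCO frequency) via VBIOS tables.

def best_pll(ref_khz, target_khz, n_range=range(8, 256),
             m_range=range(1, 16), p_range=range(0, 6)):
    """Return (N, M, P, actual_khz) minimising the error to target_khz."""
    best = None
    for p in p_range:
        for m in m_range:
            for n in n_range:
                out = ref_khz * n // (m << p)
                err = abs(out - target_khz)
                if best is None or err < best[0]:
                    best = (err, n, m, p, out)
    return best[1:]
```

For example, with a 27 MHz reference crystal and a 405 MHz target, the exact solution N=15, M=1, P=0 exists.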
03:03william_: any more for any more
03:04william_: what about Maxwell
03:04william_: try my luck
03:49RSpliet: hansg: it's always a good time to discuss OpenCL
03:49RSpliet: pmoreau: wake up :-P
04:23william_: is the vbios the same as the falcon firmware?
04:28william_: any one home
04:32william_: a program to sign fuc firmware. are you interested
05:33pmoreau: RSpliet: :p I had to switch to Windows… and I don't have an IRC client installed there… :/
05:34pmoreau: RSpliet: hansg wants to work on OpenCL? :-)
05:35pmoreau: We did some testing with Samuel yesterday evening/night, and… the compute support on NV50 needs some more work.
05:35hansg: pmoreau, AFAIK there is no compute support at all on any nouveau cards ?
05:36pmoreau: hansg: There is some for Fermi and Kepler, quite sure hakzsam used it when working on MP counters
05:36hansg: And yes the plan is for me to look into opencl support the coming weeks / months. I still need to get started though
05:36pmoreau: That's great to hear! What family will you be looking at?
05:36pmoreau: s/great/super awesome
05:38hansg: Hmm, interesting. So skeggsb and Lucas Stach want to leverage the existing opencl llvm frontend / compiler code and add a llvm backend which generates tgsi which can then be fed to the existing tgsi to nvidia-streamprocessor compiler code
05:38hansg: But I still need to talk to imirkin about this who may have other plans
05:38hansg: And I was not aware of the existing effort, I guess the existing effort does not use the llvm frontend ?
05:38pmoreau: What I started to work on, was taking a SPIR-V binary and converting it to NV50 IR
05:39pmoreau: SPIR-V binary which could either be provided by the application, or generated by LLVM (as Khronos is working on a LLVM IR <=> SPIR-V backend).
05:40hansg: pmoreau, yes I've heard of that effort, but that seems to be somewhat orthogonal, also why do native spir-v to nv50 and not do llvm-ir -> tgsi and then use existing tgsi -> nv?? code
05:41hansg: Anyways so this is where I'm currently at, my plan is to talk to various people, see what the best way forward is, and then make an actual real plan. IOW atm I have nothing.
05:42RSpliet: hansg: given where OpenCL is going (shipping SPIR-V "binaries" rather than source-based OpenCL kernels), I think starting off from SPIR-V is not such a bad idea
05:43pmoreau: RSpliet: That's one of the reasons why we decided (with imirkin) to go down that road. However, when the LLVM IR <=> SPIR-V backend lands, it should be able to handle SPIR-V binaries as well.
05:44pmoreau: But you avoid a few extra translations: going from SPIR-V => LLVM IR => TGSI => NV50 IR down to SPIR-V => NV50 IR
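Whatever the translation chain, a SPIR-V front end like the one pmoreau describes starts by validating the module's five-word header (magic, version, generator, id bound, reserved). The magic number 0x07230203 also reveals the word endianness. A minimal sketch, using a fabricated header blob:

```python
import struct

# Parse the 5-word SPIR-V module header. The magic number doubles as an
# endianness marker: if it doesn't match in little-endian, try big-endian.

SPIRV_MAGIC = 0x07230203

def parse_header(blob):
    """Return ((major, minor), generator, id_bound) for a SPIR-V module."""
    if len(blob) < 20:
        raise ValueError("not a SPIR-V module: too short")
    words = struct.unpack("<5I", blob[:20])
    if words[0] != SPIRV_MAGIC:
        words = struct.unpack(">5I", blob[:20])
    if words[0] != SPIRV_MAGIC:
        raise ValueError("bad magic")
    version = (words[1] >> 16 & 0xFF, words[1] >> 8 & 0xFF)
    return version, words[2], words[3]  # words[4] is reserved, must be 0

# A minimal fake header: SPIR-V 1.0, generator 0, id bound 8, reserved 0.
header = struct.pack("<5I", SPIRV_MAGIC, 0x00010000, 0, 8, 0)
```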
05:45RSpliet: I wonder what the added benefit for TGSI is in the whole compute pipeline, given how compute kernels are generally not comprised of 4-wide SIMD instructions and they are well hard to optimise :-)
05:46pmoreau: Right, TGSI is 4-wide
05:46pmoreau: Does it handle multiple entry points and modules comprised of multiple program types?
05:47pmoreau: There's probably a way to get around that though
05:48hansg: If I've understood Ben and Lucas correctly the main reason for doing a tgsi backend for llvm is that it should give us working opencl support for all (new enough) nouveau cards with relatively little effort, while at the same time also adding opencl support to etnaviv (one of the reasons why Lucas is advocating this approach)
05:48pmoreau: hansg: How do you define *new enough*? :-)
05:48hansg: Having a spir-v->nv50 compiler does not help for nvc0, nve0, etc. Sure it can be used as a base, but the long way around through tgsi theoretically gives us support for opencl on a much wider range of cards
05:49pmoreau: I hope Tesla cards are still in that definition :D
05:49RSpliet: I expect there might instead be a TGSI->SPIR-V translation in the nearby future
05:49pmoreau: hansg: Well, the NV50 IR is used for NVC0+ cards IIRC, but then gets lowered to the specific families
05:49RSpliet: and... oh god, too many IR's, NIR->SPIR-V?
05:50pmoreau: Like, there is no nvc0_ir_from_tgsi.cpp I think
05:50pmoreau: But there are nv50_lowered_nvc0.cpp or similarly named files
05:50hansg: I'm too new to this to properly define new-enough at this time; new-enough means that they at least need to support the necessary control flow ops.
05:50RSpliet: hansg: with the Cuda toolkit you can do OpenCL from NV50 onwards
05:52RSpliet: last time I checked, none of the families had support for anything newer than OpenCL 1.1, although I suspect that this outmodedness is mostly a political/resource motivated status
05:53pmoreau: RSpliet: The Titan X advertises OpenCL 1.2 with the latest driver version \o/
05:53RSpliet: ooo, progress! :-)
05:54pmoreau: hansg: You have nv50_ir_target_gm107, nv50_ir_lowering_gm107, nv50_ir_emit_gm107, and same for gk110, nvc0 and nv50.
05:55RSpliet: hansg: orthogonally, you might want to talk to imirkin about his business around getting atomic ops and memory fences working. That would benefit OpenCL support quite a bit as well
05:56pmoreau: hansg: So my understanding is that the NV50 IR is then adapted to each family later on. I'm really (trying) to do the same steps as the current code using TGSI, except I take SPIR-V as input.
05:56pmoreau: And hakzsam as well, with his MP counters
05:57pmoreau: (Except he is off until Sunday)
05:59pmoreau: It could be a good idea to have a discussion with everyone about OpenCL then, to converge on some path to follow. It would be stupid to be working on two different translation steps for compute, especially given how many developers there are on Nouveau. :-)
05:59pmoreau: I'm ready to go the LLVM IR -> TGSI path if we find it's a better solution.
06:00RSpliet: pmoreau: I think your SPIR-V work is not going to be a waste of time regardless
06:01RSpliet: (with vulcan round the corner, that is alleged to reveal SPIR-V as its preferred intermediate right?)
06:01pmoreau: ;-) Right
06:01pmoreau: And I did learn a few things in the process.
06:01RSpliet: I see SPIR-V as the most generic and standards-backed IR to hand-over between (GL, CL, Vulcan) front-ends and device-specific back-ends
06:01pmoreau: (And given how little progress I made, it's not like it's almost working)
06:03pmoreau: RSpliet: (Pssst, it's Vulkan, not Vulcan)
06:03RSpliet: I don't care if it were Klingon, we understand each other right? right? :-D
06:04pmoreau: How is Fermi going? I saw you commented on the Trello about it yesterday?
06:08hansg: pmoreau, I agree that it would be good to get everyone involved together for an irc meeting to discuss this and come up with a plan. This weekend I'm away with my family though, so I cannot make Sunday. I can probably do an irc meeting the weekend after that, or during pretty much any time between 8:00 and 22:00 CET on weekdays, except for Tuesday
06:09pmoreau: hansg: IRC meeting, or mails, though IRC might be better
06:10hansg: pmoreau, can you setup a doodle for an irc meeting for this (you likely know better whom to invite then I do)
06:11hansg: <pmoreau> But you avoid a few extra translations: SPIR-V => LLVM IR => TGSI => NV50 IR, down to SPIR-V => NV50 IR
06:11pmoreau: hansg: It might be easier to handle time ranges per mail (over doodle)?
06:12hansg: Getting back to that remark, yes you avoid some extra steps, at the cost of having to effectively write your own compiler, whereas with llvm in the middle most optimization steps will be done by llvm and you can just write a relatively simple llvm backend
06:12RSpliet: pmoreau: slowly but steadily
06:13hansg: pmoreau, Scheduling the meeting via email works for me too. Can you do an "invite" mail with some suggested time slots ?
06:13RSpliet: hansg: keep in mind that SPIR-V maps onto LLVM by design
06:13hansg: RSpliet, wouldn't that be all the more reason to not cut llvm out of the loop ?
06:14pmoreau: hansg: IIRC, SPIR-V binaries can have been optimised already by the OpenCL/CUDA => SPIR-V compiler (to a certain point). And if you take the OpenCL/CUDA code directly, you will be going through LLVM anyway to generate the SPIR-V.
06:14RSpliet: hansg: but what is the need for LLVM-IR? :-)
06:14RSpliet: I think we should aim for cutting TGSI out of the loop personally
06:15hansg: Heh, I keep hearing conflicting advice on this from various people, and I'm in no way an expert on this myself. I think we should just have a meeting with all the right people there, and then decide what is the best way forward
06:16pmoreau: I'm going to send an email
06:16hansg: pmoreau, thanks, please be sure to also include skeggsb and Lucas Stach, they are the ones advocating the llvm tgsi backend approach
06:16pmoreau: I was thinking of sending it to Ben, Ilia, Samuel, Roy and Hans. Anybody else I have forgotten?
06:16hansg: Lucas :)
06:16pmoreau: Yep :D
06:17RSpliet: oh and Francisco might be interested
06:17pmoreau: And Francisco
06:17RSpliet: hell, invite Martin as well, he can always decline :-P
06:17pmoreau: Do I include some of the clover guys as well?
06:17RSpliet: Tom Stellard?
06:17pmoreau: Yeah and EdB
06:18pmoreau: Maybe some of the NVIDIA guys? IIRC, they were looking at CUDA support for Tegra
06:18RSpliet: ... mlankhorst any opinions? :-P
06:19pmoreau: Like the whole gpu-doc list or just Alexandre, Thierry
06:19hansg: You can always add Alexandre Courbot to the list, or maybe just Cc the mailing list ?
06:19pmoreau: And if someone thinks someone else should be added, he can just add him to the mail
06:43EmperorDAZ: Kepler (at least GTX760) has some funny issue with monitor going to sleep
06:44EmperorDAZ: once it sleeps, lspci seems to break, monitor never wakes up and shutdown never happens
06:44EmperorDAZ: I tried swapping with a 9600GT/ G94 gpu and none of those issues happen again
06:48karolherbst: EmperorDAZ: can you ssh into the machine and check if there are any errors in dmesg?
06:49EmperorDAZ: dmesg is clean
06:49EmperorDAZ: with G94
06:49EmperorDAZ: the only line that might be something is this:
06:49EmperorDAZ: [ 950.665899] perf interrupt took too long (2509 > 2495), lowering kernel.perf_event_max_sample_rate to 50100
06:50EmperorDAZ: other than this line, dmesg has no references to nouveau
06:50EmperorDAZ: waking up the monitor works flawlessly
06:50EmperorDAZ: seems to be Kepler bound
06:50karolherbst: EmperorDAZ: with the kepler of course
06:51karolherbst: doesn't make sense to check with the g94 running for errors with the kepler one ;)
06:51EmperorDAZ: I do have an old dmesg log
06:51EmperorDAZ: of the kepler
06:51EmperorDAZ: I can plug back the kepler if its of significance
06:52karolherbst: is there something important in the log?
06:52EmperorDAZ: it mentions this
06:52EmperorDAZ: [ 360.173837] [<ffffffffa054f80c>] nouveau_pmops_runtime_suspend+0xcc/0xf0 [nouveau]
06:52EmperorDAZ: call trace
06:52karolherbst: this error again
06:52EmperorDAZ: I can upload the entire dmesg
06:53karolherbst: nah, it's fine
06:53karolherbst: I know what this is about
06:53EmperorDAZ: I have an old LGA775 mobo with cpu that I don't have any use for, I might set it up and plug the kepler on it just to run tests and patches if desired
06:53karolherbst: imirkin: do you think X might try to turn off the gpu when it stops pushing stuff to the displays?
06:54EmperorDAZ: I do not run an X server fyi
06:54EmperorDAZ: its just there for me to have display because the motherboard has no video out (VGA, etc)
06:55EmperorDAZ: I use it mostly through ssh
06:55EmperorDAZ: dont know if that above might be relevant or not
06:59karolherbst: EmperorDAZ: and what gets displayed on the display?
06:59EmperorDAZ: console mode
06:59EmperorDAZ: as for if you mean if I get any garbage, no, I dont get any garbage
07:00EmperorDAZ: everything is fine until the monitor goes to sleep
07:00karolherbst: yeah, but you said the monitor never wakes up?
07:00EmperorDAZ: permanent black screen
07:00karolherbst: anything before that is kind of unimportant for this issue
07:00EmperorDAZ: the monitor displays the yellow light
07:00EmperorDAZ: (no signal)
07:01EmperorDAZ: machine is usable through ssh
07:01EmperorDAZ: lspci will hang forever, not even with SIGKILL it will die
07:01EmperorDAZ: pretty much the only option to do after executing that through ssh is to close it and open another session
07:02EmperorDAZ: and even after that, "ps aux" will show the lspci process
07:03karolherbst: imirkin: do you think it might be possible, that nouveau tries to turn the card off, when there is no display attached (or turned off/sleep state), when there is nothing going on with the card?
07:06RSpliet: karolherbst: isn't that the purpose/point/goal of prime?
07:10pmoreau: RSpliet: Rather, the goal of vgaswitcheroo
07:10karolherbst: on a desktop system?
07:11karolherbst: I mean okay, it would be still nice to be able to turn the card off, when there is no real user
07:12pmoreau: I have no idea how vgaswitcheroo works on a desktop system, as it usually activates if two cards register themselves.
07:13karolherbst: there is some general runpm stuff going on though
07:15pmoreau: hansg, RSpliet: I sent a mockup email to you. If you have any comments, feel free :-)
08:32imirkin: EmperorDAZ: i wonder if you're hitting the issue fixed by this patch: http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?h=linux-4.3&id=f231976c2e8964ceaa9250e57d27c35ff03825c2
08:35EmperorDAZ: I am running 4.2.3
08:36EmperorDAZ: its probable but to really be sure I'd need to compile the latest kern
08:39imirkin: EmperorDAZ: i think that patch should apply ~cleanly to a 4.2 kernel
12:25imirkin_: mwk: do you know offhand if you can use "interp flat" on nv50 for a varying in the non-flat varying range?
12:44mwk: imirkin_: IIRC that's a very bad idea
12:44imirkin_: that's what i figured =/
12:44mwk: *maybe* it could work with a varying marked nonperspective
12:44imirkin_: mwk: what about using the VP/GP result maps to broadcast one VP/GP result to multiple FP inputs?
12:45mwk: umm... should work, I think
12:45imirkin_: ok, i might try that then
12:45imirkin_: basically i'm trying to figure out how to get flatshading to work
12:45imirkin_: the SHADE_MODEL flat thing is total BS
12:46mwk: , alone, doesn't suffice
12:46imirkin_: i want to get it to work by binary-patching the shader
12:46imirkin_: there's also the situation that one color might need one thing and the other color might need another
12:47imirkin_: which they don't super support since the two colors must be adjacent for the front/back replacement thing to work
12:48imirkin_: but i'm less worried about that use-case
13:06imirkin_: i guess i could get that to work by doing the replacement by hand, but that'd require more code in the shader.
13:24airlied: imirkin_: does the blob just not do it?
13:24imirkin_: airlied: blob fails in a lot of ways too
13:24airlied: so probably very few real apps ever use it
13:25imirkin_: airlied: but for plain flatshading i think it just moves the varyings into the flat "area" of the varyings
13:25imirkin_: airlied: however i'm attempting to be clever
13:25imirkin_: and want to avoid recompiling
13:25imirkin_: i've gotten it all to work on nvc0
13:26imirkin_: (where flat-shading does work, but if you flip to flatshading, you can't keep one of the colors interpolated, which is a thing with GL3)
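The binary-patching imirkin describes amounts to walking the shader's instruction words and rewriting the interpolation-mode field of the input loads. The opcode and bit positions below are invented for the example; the real nv50/nvc0 instruction encodings are documented in envytools.

```python
# Toy shader binary-patcher: force every "IPA" (interpolate input)
# instruction to flat or smooth mode. All encodings here are fabricated.

IPA_OP     = 0xA            # hypothetical "interpolate input" opcode
OP_SHIFT   = 28             # hypothetical opcode position (top nibble)
MODE_SHIFT = 8              # hypothetical 2-bit interpolation-mode field
MODE_MASK  = 0x3 << MODE_SHIFT
MODE_SMOOTH, MODE_FLAT = 0, 1

def patch_flatshade(code, flat):
    """Return a copy of the shader with every IPA instruction's
    interpolation mode forced to flat (or back to smooth)."""
    mode = MODE_FLAT if flat else MODE_SMOOTH
    out = []
    for insn in code:
        if (insn >> OP_SHIFT) & 0xF == IPA_OP:
            insn = (insn & ~MODE_MASK) | (mode << MODE_SHIFT)
        out.append(insn)
    return out

shader = [0xA0000000, 0x30000000, 0xA0000100]  # two IPAs, one other op
```

This also shows why patching back and forth is lossy, which is the Blender concern below: forcing everything to smooth forgets which inputs were flat to begin with, so a real implementation has to keep the original around.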
15:42glennk: imirkin, blender vertex selections flip between flatshaded and per vertex, if you need an actual app test case, fwiw
15:43imirkin_: glennk: yeah, i know that blender is broken on nv50
15:44imirkin_: glennk: but it doesn't do crazy per-color overrides that GL3 allows
15:44glennk: right, like airlied mentioned, not a lot of apps relying on that
15:44imirkin_: probably just 1
15:44imirkin_: piglit :)
15:45imirkin_: anyways, i want to maximize the number of working cases without actually breaking my back from bending over backwards
15:45glennk: i don't think any apps use point sprite replacement past the first 20 interpolants either
15:46imirkin_: that's not an issue that i have
15:46glennk: which is a good thing since that totally doesn't work on radeons ;-)
15:46imirkin_: for nv50, i can sprite-replace anything i want
15:46imirkin_: for nvc0, we use PIPE_CAP_TGSI_TEXCOORD
15:46imirkin_: (in fact, that cap was created explicitly for nvc0)
15:46glennk: and it had like max 8 of them?
15:47imirkin_: which is what GL allows
15:47imirkin_: with ARB_multitexture or whatever
15:47glennk: well, no problem then for radeons, since those coords end up first
16:11imirkin_: i guess if blender switches back and forth between flat/smooth shading then my current binary patching approach isn't great, but.... meh
17:14imirkin_: alright... that should be better -- only do flatshading fixups when i can't use SHADE_MODEL to control it. that should make blender happier again.
17:18glennk: well, the switch happens at interactive rates
17:18glennk: ie its user initiated
17:18imirkin_: well, it's too late for me to de-optimize it :)
17:19imirkin_: i've already folded the change and pushed out to github :)
17:20imirkin_: of course none of this fixes my nv50 issues
17:20imirkin_: ugh. those result maps give me a headache.
17:32imirkin_: glennk: can you imagine that someone would flip a shader from per-sample shading to non-per-sample shading?
17:34glennk: can't think of any obvious case one would do so
17:34imirkin_: me neither
17:34imirkin_: beyond the initial use maybe
17:35imirkin_: in that case my nvc0 fixes should all be pretty painless unless you're explicitly doing something dumb
17:35imirkin_: solution to that problem: don't do dumb things :)
17:36imirkin_: so now i think nvc0 is correct *and* shader variant-less.
17:40glennk: well, patching the shader is a variant, just a cheap one
17:42imirkin_: i only keep one at a time though
17:42glennk: well, there's several in the pipeline though
17:45glennk: if you are drawing with smooth shading, then switch to flat, the gpu will use two separate shaders, or do you stall in between?
17:45imirkin_: effectively stall
17:46imirkin_: assuming that it's a shader that i have to patch on switch between shade models
17:46glennk: ah, hopefully doesn't happen too often then
17:46imirkin_: right, that's why i was asking
17:47imirkin_: i guess there are only up to 4 possible variations, i could store all of them
17:47glennk: or just write a copy and release the old one
17:48imirkin_: well that's what i do
17:48imirkin_: but any shader upload effectively is a stall
17:48imirkin_: have to flush the code
17:48glennk: oh, on radeons its coherent so its fine as long as its not executing that segment
17:50imirkin_: BEGIN_NVC0(nvc0->base.pushbuf, NVC0_3D(MEM_BARRIER), 1);
17:50imirkin_: PUSH_DATA (nvc0->base.pushbuf, 0x1011);
17:50imirkin_: after every upload
17:50imirkin_: whatever that means
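The MEM_BARRIER push imirkin quotes fits the stale-instruction-cache problem discussed just above: the GPU fetches code through a cache, so after rewriting the code segment you must invalidate it or the next draw can still execute the old bytes. A toy model of that hazard (not the actual hardware semantics of the 0x1011 value):

```python
# Toy stale-instruction-cache model: writes to memory bypass the cache,
# so a fetch after an upload can still return the old code until the
# cache is flushed (which is what the barrier is modelling here).

class ToyGpu:
    def __init__(self):
        self.mem = {0: "old_shader"}
        self.icache = {}

    def fetch(self, addr):
        if addr not in self.icache:          # miss: fill from memory
            self.icache[addr] = self.mem[addr]
        return self.icache[addr]             # hit: possibly stale!

    def upload(self, addr, code):
        self.mem[addr] = code                # writes bypass the icache

    def flush_code(self):                    # the barrier, in this model
        self.icache.clear()

gpu = ToyGpu()
first = gpu.fetch(0)          # warms the cache with "old_shader"
gpu.upload(0, "new_shader")
stale = gpu.fetch(0)          # still "old_shader" without a flush
gpu.flush_code()
fresh = gpu.fetch(0)          # now sees "new_shader"
```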
21:38imirkin_: ugh. so many bugs.