01:08 imirkin: karolherbst: yes, for implementing KHR_blend_equation_advanced
02:34 AndrewR: imirkin, thanks for reviewing patch fixing postprocessing shaders, now they appear to work \o/
02:34 imirkin: cool
02:35 imirkin: i'm surprised people use those
06:33 pabs3: pmoreau, imirkin: I managed to replace the old GPU with a friend's spare GeForce GT 740. seems to work with nouveau, anything I should know about the support for it?
07:12 pmoreau: pabs3: You might run into this bug: https://bugs.freedesktop.org/show_bug.cgi?id=93629 But other than that, Kepler is nicely supported, and you should get OpenGL 4.3 + reclocking.
07:24 pabs3: pmoreau: is using the firmware the default these days?
07:25 pabs3: ah, reading through the bug it looks like yes
07:28 pmoreau: By default it should be using Nouveau’s own firmware. Using NVIDIA’s one improves the situation for some people, but not everyone.
07:28 pabs3: hmm, ok
07:29 pabs3: guess I'll leave it running for a while to see if the issue comes up
07:29 pabs3: is there anything definitive (versions/etc) to trigger it or determine if I might be affected?
07:31 pmoreau: Not that I know of.
07:32 pabs3: ok, thanks a lot for all the info and the friendly atmosphere on this channel :)
07:33 pmoreau: You’re welcome :-) Hopefully you’ll have better luck with this card regarding Nouveau support.
07:34 pabs3: it is a lot faster than the other one, so that would be excellent :)
07:36 pmoreau: Well, going from a G98 to a GK106, right? And you should be able to get Vulkan and the latest OpenGL someday with Nouveau.
07:39 pabs3: it was GT21x but yeah G98 seems to be the same family
07:39 pabs3: great :)
07:40 pmoreau: Ah, right
14:33 pendingchaos: is the zero query used for PIPE_QUERY_SO_OVERFLOW_PREDICATE useful? it's 64 bit so it uses nouveau's fences for synchronization, not the zero query
15:54 imirkin_: pendingchaos: i don't understand your question
15:55 imirkin_: note that i'm somewhat rusty on all the query stuff, so you may just need to provide more back-story to jostle my memory
15:55 imirkin_: pabs3: the firmware thing has only been found to be helpful on *some* GTX 660's. i don't think this affects you.
15:55 imirkin_: pabs3: the affected GPUs tend to run into problems fairly quickly
15:56 imirkin_: pabs3: note that you should have working reclocking -- if you run into issues, definitely let us know.
15:56 pendingchaos: on line 248 of nvc0_query_hw.c, QUERY_GET is used for a zero query to write QUERY_SEQUENCE
15:56 pendingchaos: this would be used in nvc0_hw_query_fifo_wait(), but PIPE_QUERY_SO_OVERFLOW_PREDICATE is 64 bit
15:57 pendingchaos: so it uses nouveau's fences, not QUERY_SEQUENCE
15:57 pendingchaos: I'm wondering if I'm correct here and if there's some other use of it I've missed
15:58 imirkin_: it's likely that the logic has changed throughout the ages.
15:58 imirkin_: SO_OVERFLOW_PREDICATE was never used before
16:00 pendingchaos: also: what's the exact meaning of QUERY_GET's UNIT and MODE? they seem to specify what to write at QUERY_ADDRESS, but I don't know why it's split into two fields
16:02 imirkin_: i don't have a clean explanation
16:02 imirkin_: try not to think about it :)
16:02 imirkin_: iirc one is short vs long
16:02 imirkin_: short is a 32-bit value
16:02 imirkin_: long is a 128-bit value
16:02 imirkin_: the 128-bit value includes the first 64-bit which is a timestamp, followed by 64-bit which is the value
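As a rough sketch of the two report formats described above: a "short" report is a single 32-bit value, while a "long" report is 128 bits. The field order below (timestamp first, then value) follows the description in the chat and is worth double-checking against rnndb / nvc0_query_hw.c before relying on it.

```c
#include <stdint.h>

/* "short" report: one 32-bit word, e.g. the sequence value compared
 * against by the semaphore-based wait. */
struct query_report_short {
    uint32_t value;
};

/* "long" report: 128 bits -- per the description above, a 64-bit
 * timestamp followed by the 64-bit counter value (order assumed). */
struct query_report_long {
    uint64_t timestamp;
    uint64_t value;
};
```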
16:03 imirkin_: i don't understand this stuff nearly as well as i ought to though, sorry
16:03 imirkin_: it was all implemented before my time, and has, for the most part, just kinda worked.
16:03 imirkin_: iirc i added the hq->is64bit style of wait
16:04 imirkin_: my guess is that the original SO_OVERFLOW_PREDICATE impl wanted to rely on the "regular" wait style
16:05 imirkin_: feel free to rip the whole thing apart and redo it
16:06 imirkin_: (note that i also added the get_query_result_resource logic)
16:06 imirkin_: perhaps we can get rid of the nouveau fence-based wait entirely
16:07 imirkin_: just have to emit the sequence when using pipeline stats and maybe a few others.
20:37 pendingchaos: imirkin_: I think I've got PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE working
20:37 pendingchaos: what gpus should it be tested on?
20:37 imirkin_: GF100+
20:37 imirkin_: but ... it should all be the same
20:37 imirkin_: if it works on one, should work on all
20:38 HdkR: +1
20:38 imirkin_: and any differences can be dealt with later
20:38 imirkin_: on rare occasions it can happen that some difference comes up, but as you can see in that query code, other than the perf counters, it's all shared
20:45 pendingchaos: should I include an addition of PUSH_KICK in nvc0_hw_query_wait() in the patch? or should I post it separately
20:45 imirkin_: pendingchaos: i'd do it separately. your call.
20:53 pendingchaos: I think I've confirmed that ACQUIRE_EQUAL should be ACQUIRE_GEQUAL
20:55 pendingchaos: adding PUSH_KICK(); nouveau_fence_ref(nvc0->screen->base.fence.current, &fence); PUSH_KICK(); in nvc0_hw_query_fifo_wait() causes the semaphore to never trigger when using ACQUIRE_EQUAL, unlike ACQUIRE_GEQUAL
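For context, the wait pendingchaos is poking at is roughly the following push-buffer sequence, loosely modeled on nvc0_hw_query_fifo_wait(); `push` and `hq` come from the surrounding driver code, extra trigger flags and buffer refs are omitted, and the method names are from nouveau's nv_object.xml.h. The fix under discussion is using ACQUIRE_GEQUAL instead of ACQUIRE_EQUAL, so the acquire still fires if the sequence has already advanced past the expected value:

```c
/* Simplified sketch: point the FIFO semaphore at the query's sequence
 * slot, then wait until it is >= the expected value. */
BEGIN_NVC0(push, SUBC_3D(NV84_SUBCHAN_SEMAPHORE_ADDRESS_HIGH), 4);
PUSH_DATAh(push, hq->bo->offset + hq->offset);  /* semaphore VA, high */
PUSH_DATA (push, hq->bo->offset + hq->offset);  /* semaphore VA, low */
PUSH_DATA (push, hq->sequence);                 /* value to wait for */
PUSH_DATA (push, NV84_SUBCHAN_SEMAPHORE_TRIGGER_ACQUIRE_GEQUAL);
```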
20:56 imirkin_: ok cool. that should definitely be a separate patch
20:56 imirkin_: basically when someone's thing breaks and they do a bisect, i don't want them to land on a large commit
21:05 pendingchaos: separate from the PUSH_KICK(), OVERFLOW_ANY_PREDICATE or both?
21:06 imirkin_: pendingchaos: everything separate
21:07 pendingchaos: *nods*
21:12 plutoo: does anyone know if const buffer is in dram?
21:12 plutoo: or if it's in some fast vram-thing inside the gpu?
21:12 imirkin_: dram != vram?
21:13 plutoo: s/vram/sram then:p
21:13 imirkin_: constbufs are ... special
21:13 HdkR: imirkin_: What is vram on a TX1? :P
21:13 imirkin_: in order to be used properly, they have to be uploaded in the cmdstream
21:13 imirkin_: the graphics unit will then stage those writes to the backing memory
21:13 imirkin_: but in a way that concurrent draws can have the same constbuf bound
21:14 imirkin_: but get different values
21:14 imirkin_: accessing these from shaders is meant to be pretty fast
21:14 imirkin_: but i don't have concrete metrics.
21:14 HdkR: confirms for very fast
21:15 imirkin_: it's a lot of memory, but it's shared across all threads
21:15 imirkin_: 64K * 16
21:15 imirkin_: i.e. 1MB
21:15 plutoo: but it's not dram?
21:15 imirkin_: constbuf data is ultimately backed by vram
21:16 imirkin_: however the way it's uploaded generally leads to it sitting in fast-ram
21:16 plutoo: right
21:16 imirkin_: i don't know how big the fast-ram is
21:16 plutoo: you bind a vram/dram ptr using CB_BIND
21:16 plutoo: but you can't write directly to that ptr, because there's a super highspeed cache that's inside the gpu
21:16 plutoo: ?
21:17 imirkin_: something like that.
21:17 imirkin_: you can bind a data buffer that was previously written to
21:17 imirkin_: and its data will get retrieved into the cache
21:17 imirkin_: i think as the shader uses it
21:17 imirkin_: not sure
21:17 imirkin_: in order for that to happen, iirc you have to do some kind of flush
21:18 plutoo: 0x43D has a bit to invalidate const buffers
21:18 plutoo: https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_3d.xml#L512
21:23 imirkin_: <reg32 offset="0x021c" name="MEM_BARRIER">
21:23 imirkin_: also there's that
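To make the cmdstream upload imirkin_ described more concrete, here is roughly what nouveau's nvc0 driver emits to bind a constbuf and push data through it inline. It is a simplified fragment, not the literal driver code: `push`, `bo`, `base`, `size`, `data`, `n`, `stage` and `index` stand in for state from the surrounding functions, and the method names come from nvc0_3d.xml.h.

```c
/* Set the backing buffer for the "current" constbuf... */
BEGIN_NVC0(push, NVC0_3D(CB_SIZE), 3);
PUSH_DATA (push, size);                    /* constbuf size in bytes */
PUSH_DATAh(push, bo->offset + base);       /* backing VA, high bits */
PUSH_DATA (push, bo->offset + base);       /* backing VA, low bits */
/* ...stream n dwords of data into it through the pushbuf... */
BEGIN_1IC0(push, NVC0_3D(CB_POS), 1 + n);
PUSH_DATA (push, 0);                       /* offset within the constbuf */
PUSH_DATAp(push, data, n);                 /* the actual values */
/* ...and bind it to slot `index` of one shader stage. */
BEGIN_NVC0(push, NVC0_3D(CB_BIND(stage)), 1);
PUSH_DATA (push, (index << 4) | 1);        /* index plus "valid" bit */
```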
21:33 pendingchaos: bugfix patches sent
21:35 plutoo: so as far as i can tell there are 5 const buffers
21:35 plutoo: each one has up to 16 "indices"?
21:36 plutoo: in the nintendo driver, every const buffer's index 0 points to the same blob
21:36 plutoo: but each const buffer's index 1 points to its own per-const-buffer region
21:41 imirkin_: plutoo: sounds about right.
21:41 imirkin_: there are 5 binding points
21:41 imirkin_: one for each shader stage
21:42 imirkin_: (compute sets up constbufs via a descriptor, and can only have up to 8 diff constbufs)
21:42 imirkin_: pendingchaos: those look good, i'll apply them tonight (not in front of a computer with keys right now)
21:45 plutoo: and the const buffer has a fixed layout? or is that programmable
21:45 plutoo: i'm seeing:
21:45 plutoo: 0x20 + 8*i: Textures
21:45 plutoo: 0x120 + 8*i: Images
21:45 imirkin_: plutoo: no, it can be whatever
21:46 imirkin_: but a particular driver will stick to something concrete for its "driver parameters" constbuf
21:46 imirkin_: since the shader will have to read it from a fixed location too
21:46 plutoo: what about texture description structs?
21:46 imirkin_: the only exception is that the first 8 words appear to have to be 0, 1, 2, 3, 4, 5, 6, 7 -- or else random things fail. no idea why.
21:47 imirkin_: i haven't played with it extensively
21:47 imirkin_: the TIC/TSC's live in buffers based on the TIC_ADDRESS and TSC_ADDRESS
21:47 imirkin_: the "texture" is a 32-bit thing that holds both the tic id and tsc id
21:47 imirkin_: which both index into those large tables
21:49 imirkin_: oh, unless LINKED_TSC is set
21:49 imirkin_: in which case it only reads the tic id, and assumes tic id == tsc id
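As an illustration of that 32-bit "texture" word: it packs the TIC index and the TSC index into one handle (with LINKED_TSC the TSC part is ignored and tsc id == tic id). The bit positions below are an assumption recalled from nouveau's nvc0_tex.c and should be verified there.

```c
#include <stdint.h>

/* Illustrative packing only: TIC index in the low 20 bits, TSC index
 * shifted above it (positions assumed -- check nvc0_tex.c / rnndb). */
static inline uint32_t
pack_tex_handle(uint32_t tic_id, uint32_t tsc_id)
{
    return (tic_id & 0xfffff) | (tsc_id << 20);
}
```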
21:50 pendingchaos: and feature patch sent
21:52 plutoo: is TIC_ADDRESS an offset into the const buffer?
21:52 plutoo: why is it 64 bit
21:52 imirkin_: TIC_ADDRESS is a vram address. it's 40-bit
21:52 imirkin_: or rather, it's a VA
21:52 imirkin_: vram or sysram, depending on what the PTE says :)
21:53 plutoo: so you're saying TIC/TSC are not in the constbuffer itself?
21:53 imirkin_: i am indeed saying that.
21:53 plutoo: nvm i should read better
21:55 imirkin_: pendingchaos: are you kidding me?! there was a single query for the "any" and i just missed it??
21:55 pendingchaos: it seems so? perhaps it should be tested with a pre-Pascal card
21:55 imirkin_: well, i have a fermi plugged in at home
21:55 imirkin_: so it'll be getting some testing ;)
21:56 imirkin_: given that it's f0, i wonder if it's a new thing
21:56 imirkin_: but we'll see
21:57 imirkin_: even if it's hw-specific, that's fine to enable partly too
21:57 plutoo: looks like TIC_ADDRESS is officially known as TexHeaderPoolOffset
21:57 imirkin_: sure, why not
22:19 plutoo: sometimes addresses are set to 4
22:19 plutoo: any clue what that means
22:19 imirkin_: what addresses?
22:19 plutoo: sorry, i mean methods that take gpu VA, sometimes gets value 4 written to them
22:20 imirkin_: like what methods
22:20 plutoo: instead of an actual gpu VA
22:20 imirkin_: (i'm guessing you're mistaken)
22:26 plutoo: yup
22:26 plutoo: it's 0x400000000, and it's being written to CODE_ADDR only
22:26 plutoo: i guess it might actually be a valid gpuva
22:27 imirkin_: that's a legal va
22:27 imirkin_: i'm guessing they put all the "huge" pages up there
22:28 plutoo: i think i got an address like this when i messed with gmmu allocator flags
22:28 imirkin_: yeah, so these gpus support 4K pages as well as "large" pages, which can be configured to be either 64K or 128K
22:29 imirkin_: with a global config switch :)
22:29 imirkin_: (thanks guys)
22:30 imirkin_: some later GPUs also support some even larger pages, but i forget if that's starting pascal or maxwell
22:30 imirkin_: might be pascal, with the 1MB pages
22:30 imirkin_: to match x86 cpu's
22:32 imirkin_: plutoo: btw, tell the switchbrew guys that they got TesselationMode slightly wrong
22:32 plutoo: i'm the switchbrew guy
22:32 plutoo: :)
22:32 imirkin_: you've been informed :p
22:32 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_program.c#n308
22:33 plutoo: big_page_size appears to be 128KB on the switch, fwiw
22:34 imirkin_: sounds right.
22:40 plutoo: TesselationMode describes how to construct primitives from the vertex buffer right?
22:41 plutoo: or that might be VERTEX_ATTRIB_FORMAT
22:42 plutoo: i think it's both.
22:45 imirkin_: mmm ... from the uniform domain that is being tessellated
22:46 imirkin_: specifies how to assemble the evaluated points into primitives
22:50 plutoo: i wonder if instead of bit8, bit9 it's a single field bit8-9
22:50 plutoo: that specifies the size of a primitive, in no. of points
22:50 imirkin_: it most likely is :)
22:50 imirkin_: well no
22:50 imirkin_: connected vs points
22:50 imirkin_: and cw vs ccw
22:50 plutoo: so if you put triangle with 2 points per triangle, that effectively makes it connected
22:50 imirkin_: lol
22:50 plutoo: and if you put a line with 1 point per line, that is connected
22:51 imirkin_: that's not what those mean
22:51 plutoo: i need to work out with pen and paper if it works out
22:51 imirkin_: but thank you for playing ;)
22:51 plutoo: lol
22:51 imirkin_: isolines / tris / quads has to do with the domain being tessellated
22:52 imirkin_: the output of isolines are lines, the output of tris/quads are tris
22:52 imirkin_: if connected = false, then the output is actually the individual points
22:52 imirkin_: cw/ccw specifies the winding of the triangles
22:52 imirkin_: but it has no meaning for lines
22:54 plutoo: cw for tris permutes (a,b,c) -> (a,c,b)
22:54 plutoo: and for lines (a,b) -> (b,a)
22:55 imirkin_: no
22:55 imirkin_: it has to do with the tessellation results
22:55 imirkin_: basically it evaluates a bunch of points
22:55 imirkin_: and then it's a question of how to build triangles out of them
22:55 imirkin_: cw/ccw affects the order in which those points are picked to create triangles
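Pulling that together, the TESS_MODE word is essentially a primitive-domain field plus separate "connected" and "cw" flag bits. The sketch below is purely illustrative: the enum names and exact bit positions are made up for the example; the real definitions live in nouveau's nvc0_3d.xml.h and the nvc0_program.c code linked above.

```c
#include <stdint.h>

/* Which uniform domain gets tessellated (isolines output lines,
 * tris/quads output triangles). */
enum tess_domain { TESS_ISOLINES, TESS_TRIANGLES, TESS_QUADS };

/* Hypothetical composition of the mode word; bit positions assumed. */
static inline uint32_t
tess_mode(enum tess_domain dom, int connected, int cw)
{
    uint32_t mode = (uint32_t)dom;   /* domain field */
    if (connected)
        mode |= 1u << 8;             /* emit connected prims, not points */
    if (cw)
        mode |= 1u << 9;             /* clockwise winding (tris only) */
    return mode;
}
```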
22:56 plutoo: is this config only for tesselation shaders?
22:56 imirkin_: yea
22:56 imirkin_: of course
22:56 plutoo: oh
22:57 imirkin_: what would it even mean without tess?
23:00 plutoo: i thought it was also used to describe how to interpret points when assembling primitives
23:00 plutoo: in general
23:00 imirkin_: that's the primitive that's given when starting a draw
23:01 imirkin_: and also the geometry shader output format :)
23:01 imirkin_: which can effectively change it towards the end of the pipeline
23:01 plutoo: right
23:06 plutoo: updated the wiki
23:06 plutoo: now that's a mess haha
23:07 imirkin_: yeah sorry
23:07 imirkin_: ;)
23:08 imirkin_: next time i design that hw, i won't make it so confusing :p
23:10 imirkin_: plutoo: anyways, most of those methods are documented and/or used in nouveau
23:10 imirkin_: you should refer to rnndb and the nouveau codebase
23:10 imirkin_: i'd also recommend normalizing your numbering with nouveau's
23:10 plutoo: yeah i've peeked in nouveau quite a few times
23:10 imirkin_: since it just causes confusion for no reason
23:11 imirkin_: at least add a column that's like X*4
23:11 plutoo: yeah i'll do that this weekend or something
23:11 imirkin_: i dunno which view of it is "right", or if there's a naturally correct thing there
23:12 plutoo: official code uses *4 when calculating dynamic offsets, then divides by 4 as a final step
23:12 imirkin_: well, the encoding into the pushbuf requires various things
23:12 plutoo: does it?
23:13 plutoo: not on nvc0 i think
23:13 imirkin_: i just mean it's bit shifts
23:13 imirkin_: not divisions
23:14 imirkin_: http://envytools.readthedocs.io/en/latest/hw/fifo/dma-pusher.html#gf100-commands
23:14 imirkin_: so like the M's are the method id's
23:15 imirkin_: they live in various places, depending on the exact command type
23:16 imirkin_: in the pre-GF100 format, there tended to be 2 bits of padding at the end
23:16 imirkin_: in the GF100 format, most commands don't have those
23:18 plutoo: what are S and X?
23:18 imirkin_: X = ignore
23:18 imirkin_: S = subchan
23:19 imirkin_: C = count of methods to call
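In other words, a GF100-style command header is just the method id (in 32-bit words), subchannel, and count packed into one 32-bit word with shifts. The helpers below mirror nouveau's NVC0_FIFO_PKHDR_SQ/NI macros from nvc0_winsys.h; treat them as a sketch of the encoding rather than a drop-in replacement.

```c
#include <stdint.h>

/* "increasing methods" header: the method id advances by one word
 * after each of the `count` data words that follow. */
static inline uint32_t
gf100_pkhdr_sq(unsigned subc, unsigned mthd, unsigned count)
{
    return 0x20000000u | (count << 16) | (subc << 13) | (mthd >> 2);
}

/* "non-increasing" header: all `count` data words go to one method. */
static inline uint32_t
gf100_pkhdr_ni(unsigned subc, unsigned mthd, unsigned count)
{
    return 0x60000000u | (count << 16) | (subc << 13) | (mthd >> 2);
}
```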
23:20 plutoo: man i feel like i'm staring at 100's of hours of work
23:20 imirkin_: probably.
23:21 imirkin_: it probably took me hundreds of hours to understand all this
23:21 imirkin_: so to figure it out in the first place was probably quite a bit more :)
23:23 imirkin_: skeggsb: any plans to refresh your fermi reclock series?
23:24 imirkin_: skeggsb: i have a Quadro 600 plugged in =]
23:24 skeggsb: imirkin_: you mean like, rebase it on latest code? i don't think that'll take much, i'll take a crack at it shortly if you need it
23:24 imirkin_: ideally with fixes...
23:24 imirkin_: but even a plain rebase would be nice if it's not a lot of trouble
23:24 skeggsb: the quadro 600 should work already, it's one of the boards i wrote it against...
23:25 imirkin_: should i expect a DP -> HDMI adapter to work in there?
23:25 imirkin_: (i haven't tried it, just want to know)
23:25 imirkin_: (passive)
23:26 skeggsb: i don't see why not
23:28 pabs3: imirkin_, pmoreau: after a short 0ad session and a day of idling in GNOME shell, no problems. so I think I'll switch over to that machine as my main system next reboot. thanks a lot for your nouveau work and help here
23:28 imirkin_: pabs3: glad it worked out ok
23:29 imirkin_: it's not too bad if you don't do anything too crazy
23:29 imirkin_: unfortunately "crazy" has become a lot harder to determine
23:29 imirkin_: as even the most basic desktop components think it's a good idea to do things with GL
23:29 imirkin_: pabs3: did the reclocking work ok?
23:29 imirkin_: i.e. were you able to reclock without crashing
23:30 pabs3: not sure how to check
23:30 imirkin_: then you probably didn't reclock in the first place :)
23:30 imirkin_: it's manual... cat /sys/kernel/debug/dri/0/pstate
23:30 imirkin_: that will show you a bunch of values, as well as the current clock state
23:30 imirkin_: you can echo those values to change the clock
23:31 imirkin_: should get you moar fps in 0ad
23:31 pabs3: it was pretty fast as-is, my monitor size isn't huge
23:32 imirkin_: probably boots to a middle-ish clock
23:32 pabs3: is the AC line the current clock state?
23:32 imirkin_: yes
23:32 pabs3: AC: core 324 MHz memory 648 MHz
23:32 pabs3: yeah, that is the lowest
23:32 pabs3: max is 0f: core 1071 MHz memory 5000 MHz
23:32 imirkin_: that should be faster.
23:32 imirkin_: because more mhz is more better :)
23:33 pabs3: so this? echo 0f > /sys/kernel/debug/dri/0/pstate
23:33 imirkin_: yup
23:33 imirkin_: save first
23:33 imirkin_: in case, you know, the unthinkable happens
23:33 pabs3: it's running a test install, I'm on another machine :)
23:34 imirkin_: might want to wait until an opportune time when you don't mind the system crashing and burning
23:34 pabs3: yeah, I've a few things to do today before I get distracted by this :)
23:34 imirkin_: (reclocking involves disconnecting vram while settings are changed... if it doesn't get reconnected properly, the video card becomes unhappy)
23:36 pabs3: does removing the power solve that?
23:36 imirkin_: yes
23:36 pabs3: that is fine then
23:36 imirkin_: any reboot fixes it
23:36 imirkin_: just ... well you have to reboot :)
23:36 pabs3: ah good. not an issue here, I don't care about the machine until I move the hard drive from my primary install across
23:37 imirkin_: then flip it over, and, uh, test it, with 0ad
23:37 pabs3: will do. I better go, thanks for the help :)
23:37 imirkin_: good luck
23:41 pabs3: oh, do GPUs support dynamic reclocking based on usage like CPUs support frequency adjustment at runtime?
23:41 imirkin_: not sure exactly how CPUs work. GPUs allow adjusting a number of things, but they don't do this automatically - it's done by the operating system directly.
23:42 imirkin_: there are a number of things that can be adjusted - e.g. memory reclocks are fairly heavy
23:42 imirkin_: but there are also shader clocks and so on which can be changed much more straightforwardly
23:43 imirkin_: (thing is, memory is used for scanout, so you have to do it within a vblank period, OR ELSE)
23:43 skeggsb: imirkin_: updated, but not tested
23:43 imirkin_: skeggsb: ok thanks =]
23:44 imirkin_: do you have plans around finishing it up and/or upstreaming the initial version so people can play with it?
23:49 skeggsb: imirkin_: yeah, it's a matter of finding a decent chunk of time to do it is all
23:49 skeggsb: i.e. at the very least make sure it doesn't regress kepler, more so than making fermi work any better
23:50 imirkin_: yeah
23:50 imirkin_: that's the bit i'd be most concerned by
23:50 imirkin_: and tesla too
23:50 imirkin_: it modifies some shared ddr3 stuff
23:50 skeggsb: not too concerned about tesla, it's a lot simpler
23:50 skeggsb: but yes
23:51 skeggsb: dealing with gf119's weirdness would be nice too, but not as important
23:51 imirkin_: gotta start somewhere =]
23:52 skeggsb: gf119 has some minor steps towards being more like kepler, but mostly the same as fermi still.. also, i think the kepler code is actually "gf117"
23:52 imirkin_: if only you could find one somewhere
23:52 skeggsb: those are *really* rare, only in a couple of laptops i think
23:52 imirkin_: yeah