03:42mwk: alright, so now I more or less understand the NV10-NV30 pipeline right until the edge setup stage...
03:43* mwk wonders how to observe the rasterizer without all these annoying later stages getting in the way
03:45mwk: or maybe I should just write lots of docs instead and leave that stuff for later
10:19HZun: Does nouveau support "nvidia-like" triple buffering, where unused frames are skipped? (as opposed to there being a long queue/latency)
14:04rhyskidd: sigh at late series GP10x
14:05rhyskidd: got another one of the runpm and gr init bug reports on a GP108
16:16mslusarz: how long demmt has been in broken state?
16:26imirkin: mslusarz: quite a while...
16:27imirkin: i have some local patches to semi-fix it
16:27imirkin: mslusarz: there's a problem in the chipset detection stuff coz the ioctl changed, so mmt doesn't record it properly anymore
16:28imirkin: and it's tricky to make it compatible with both new and old blobs
16:28pendingchaos: mslusarz: I think since around 380 series? 375.26 works fine. I think I might have tried 387.34 or 384.59 once and it didn't work
16:29mslusarz: I think you are probably ignoring envytools bot messages
16:29mslusarz: imirkin: ^
16:29imirkin: no, i saw them
16:29mslusarz: ah, ok
16:29imirkin: perhaps there's a new new new new way of getting the chipset
16:29mslusarz: I also pushed one change to mmt repo
16:29imirkin: but at least when i looked, the number of args changed or something
16:29imirkin: and i couldn't think of a way of making it work on both old and new blob versions
16:30mslusarz: now it just works (tm) ;)
16:30imirkin: oh, i see. you just call both.
16:30imirkin: did you test that this works on older blobs?
16:31mslusarz: on older blob versions the new method will fail, so the old one will also be issued
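The fallback mslusarz describes (issue the new chipset query, and if it fails, also issue the old one) can be sketched as below. Both query functions are hypothetical stand-ins; the real ioctl numbers and argument layouts are not given in this log.

```c
#include <stdint.h>

/* Shared shape of both (hypothetical) chipset queries: fill *chipset and
 * return 0 on success, or return nonzero if the driver rejects it. */
typedef int (*chipset_query_fn)(uint32_t *chipset);

/* Issue the newer query first; on an older blob it fails, so the older
 * query is issued as well. Returns 0 on success, -1 if both fail. */
static int get_chipset(chipset_query_fn new_query, chipset_query_fn old_query,
                       uint32_t *chipset)
{
    if (new_query(chipset) == 0)
        return 0;
    return old_query(chipset) == 0 ? 0 : -1;
}

/* Fake queries, used only to exercise the fallback path. */
static int query_fails(uint32_t *chipset) { (void)chipset; return -1; }
static int query_gives_0x134(uint32_t *chipset) { *chipset = 0x134; return 0; }
```

The cost of this approach is one extra failing call per lookup on older blobs, in exchange for not having to detect the blob version up front.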
16:31imirkin: mslusarz: i was also trying to get it to work again with nouveau ... quite a pain
16:31imirkin: also the way things are handled with channels is a bit off, unfortunately
16:31Subv: hey, is there some place where i can read more about the QUERY methods? (specifically the structures written to memory in short vs long mode)
16:32imirkin: Subv: short mode is 32-bit, long is 128-bit
16:32imirkin: Subv: in general, 128-bit is a pair of (timestamp, value) - 64bit for both
16:32mslusarz: imirkin: fyi, I have no plans to fix nouveau support
16:32imirkin: the short mode is just the value, no timestamp
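A minimal decoder for the two report sizes imirkin describes: short mode is one 32-bit value, and long mode is 16 bytes holding a 64-bit value and a 64-bit GPU timestamp. The order inside the long report is an assumption here (value first, timestamp second, which is how nouveau's nvc0 query code appears to read it); as imirkin says, check it experimentally.

```c
#include <stdint.h>

/* Little-endian 32-bit read from a byte buffer. */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Little-endian 64-bit read from a byte buffer. */
static uint64_t rd64le(const uint8_t *p)
{
    return (uint64_t)rd32le(p) | ((uint64_t)rd32le(p + 4) << 32);
}

/* Short mode: 4 bytes, just the value, no timestamp. */
static uint32_t decode_short_report(const uint8_t *buf)
{
    return rd32le(buf);
}

/* Long mode: 16 bytes. ASSUMPTION: value in bytes 0-7, timestamp in
 * bytes 8-15. Swap the offsets if experiments show the opposite order. */
struct long_report {
    uint64_t value;
    uint64_t timestamp;
};

static struct long_report decode_long_report(const uint8_t *buf)
{
    struct long_report r;
    r.value = rd64le(buf);
    r.timestamp = rd64le(buf + 8);
    return r;
}
```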
16:32imirkin: anyways, i started on it
16:32imirkin: but then saw something shiny
16:33imirkin: i had totally forgotten i was even doing this until just now :)
16:33Subv: if i understand correctly, GET(Write, Zero, ShortMode) writes the QUERY_SEQUENCE value to the QUERY_ADDRESS as a 32 bit value
16:34imirkin: Subv: sounds likely. tbh i don't remember such details.
16:34Subv: and GET(Write, Zero, ShortMode) would write [Timestamp64, 0u64] to the QUERY_ADDRESS?
16:34imirkin: you mean long?
16:34Subv: mm, i don't understand what this timestamp is supposed to be, though
16:34imirkin: well, not all queries support short and long
16:34imirkin: it's a GPU timestamp
16:34imirkin: kinda like rdtsc
16:34Subv: yes, sorry, i meant long mode in that second message
16:35imirkin: you'd have to experiment a bit to work out specifics
16:35Subv: i see, thanks!
16:35imirkin: Subv: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
16:36imirkin: there are some comments in there
19:19mwk: imirkin: is there some reasonable way to change pixel shader uniforms on nv30/nv40?
19:20imirkin: define 'reasonable'
19:20mwk: as in, not involving a whole pipeline flush
19:20imirkin: i guess i dunno for sure, but i didn't think it involved a pipeline flush...
19:20imirkin: just update the uniforms, and draw
19:21imirkin: how to which part?
19:21mwk: how to update the uniforms
19:21mwk: they are in memory, right?
19:22mwk: so you have to submit in-band memory write commands somehow
19:22imirkin: you write const via pushbuf anyways
19:22imirkin: you can't update them in-memory
19:23imirkin: this is true all the way up to the latest chips btw
19:23imirkin: trying to find it for nv30 -- i remember it's carefully hidden away, but unfortunately not where
19:23mwk: yeah, g80 has cb_upload, got it
19:24mwk: i'm looking for a similar mechanism on nv30/nv40
19:24imirkin: and while you CAN update the raw memory, you have to be super-careful with flushes
19:24imirkin: iirc there's no backing memory on nv30/nv40 at all
19:24imirkin: you just write the values to some methods
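"Write the values to some methods" means the constants travel inside the pushbuffer as method data instead of being stored in GPU memory. A sketch of building such a packet, assuming the NV04-style method header nouveau uses for these generations (count in bits 18-28, subchannel in bits 13-15, method byte address in bits 0-12). The method address below is a placeholder, not the real NV30/NV40 constant method.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* NV04-style incrementing method header. */
static uint32_t nv04_method_header(unsigned subc, unsigned mthd, unsigned count)
{
    return ((uint32_t)count << 18) | ((uint32_t)subc << 13) | (uint32_t)mthd;
}

/* HYPOTHETICAL method address for constant data; the real one is not
 * given in this log. */
#define HYPOTHETICAL_CONST_DATA_MTHD 0x1efc

/* Append one header plus a vec4 constant (four 32-bit IEEE-754 floats,
 * copied as raw bit patterns) to a push buffer. Returns words written. */
static size_t push_vec4_const(uint32_t *push, unsigned subc, const float v[4])
{
    push[0] = nv04_method_header(subc, HYPOTHETICAL_CONST_DATA_MTHD, 4);
    memcpy(&push[1], v, 4 * sizeof(float));
    return 5;
}
```

Because the data is consumed in command-stream order, no separate memory write or cache flush is needed for it, which is why this path is cheap compared to the fragment-program case discussed next.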
19:25imirkin: wait a sec
19:26imirkin: looks like consts are part of the program directly?
19:27imirkin: yeah. ok. looks like uniforms are part of the instruction stream
19:27mwk: vertex shader code+consts are methods
19:27imirkin: (stored at the end i guess)
19:27mwk: but pixel shader consts are inline in code
19:27mwk: which is in memory
19:28imirkin: yeah ok. i'm befuddled. sorry
19:28imirkin: i didn't realize VP / FP were done differently
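Since NV30/NV40 pixel shader constants are immediates inside the program code in memory, changing a uniform means patching the instruction words and re-uploading them. A toy sketch of that bookkeeping, similar in spirit to nouveau's nv30 fragment program code (record where each constant lives, patch, and only mark the program dirty if something changed). All names are hypothetical and the real instruction encoding is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Where one uniform's inline vec4 immediate lives in the program. */
struct const_slot {
    unsigned word_offset;  /* index into insn[] of the 4-word immediate */
    unsigned uniform;      /* index of the vec4 uniform feeding it */
};

struct toy_fragprog {
    uint32_t *insn;
    unsigned nr_consts;
    const struct const_slot *consts;
};

/* Copy current uniform values (vec4s, flat array) into the immediates.
 * Returns true if any word changed, i.e. the program memory must be
 * re-uploaded and the fragment program revalidated before the next draw. */
static bool patch_inline_consts(struct toy_fragprog *fp, const uint32_t *uniforms)
{
    bool dirty = false;
    for (unsigned i = 0; i < fp->nr_consts; i++) {
        uint32_t *dst = &fp->insn[fp->consts[i].word_offset];
        const uint32_t *src = &uniforms[fp->consts[i].uniform * 4];
        if (memcmp(dst, src, 4 * sizeof(uint32_t)) != 0) {
            memcpy(dst, src, 4 * sizeof(uint32_t));
            dirty = true;
        }
    }
    return dirty;
}
```

Skipping the upload when nothing changed matters here because, as the rest of the discussion shows, each re-upload drags a pipeline flush and cache maintenance along with it.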
19:31imirkin: mwk: https://hastebin.com/xikaxizehe.php
19:36mwk: imirkin: that's just the flush though, right?
19:37imirkin: and i don't see anything that makes sure that that memory is done being read
19:37mwk: what about the actual mem write?
19:37imirkin: that's in nv30_fragprog_upload
19:37imirkin: just a straight up cpu write
19:38imirkin: er, i guess not - it may use a transfer
19:38imirkin: which will in turn use a m2mf
19:38imirkin: not 100% sure
19:39mwk: so, a pipeline flush, class switch, and tex cache flush to boot
19:39mwk: sounds horrible
19:39imirkin: don't mess with perfection :p
19:40imirkin: triangles show up on the screen... what more can you want
19:41mwk: fwiw I'm quite convinced the only part of your paste that's necessary for flush is the offset method resubmit
19:42mwk: since that's what triggers the super-secret cache prefetch
19:42mwk: of first 8 words
19:44mwk: well, that and tex_cache_ctl
19:45imirkin: well, this is a general "fragprog needs revalidating" thing
19:45imirkin: not just for a uniform update
19:46mwk: fair enough
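mwk's explanation of why the offset method has to be resubmitted can be captured in a toy model: writing the offset makes the front end copy the first 8 words of the program into a private buffer, while later words are read from memory as needed. This is a model of the behavior as described in the chat, not verified hardware behavior; the names are hypothetical and the texture cache is not modeled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREFETCH_WORDS 8

struct toy_fp_fetch {
    const uint32_t *mem;               /* program memory as the GPU sees it */
    uint32_t prefetch[PREFETCH_WORDS]; /* private copy taken at offset write */
};

/* Emulates (re)submitting the program offset method: triggers the prefetch. */
static void submit_offset(struct toy_fp_fetch *f, const uint32_t *mem, size_t len)
{
    f->mem = mem;
    memset(f->prefetch, 0, sizeof(f->prefetch));
    memcpy(f->prefetch, mem,
           (len < PREFETCH_WORDS ? len : PREFETCH_WORDS) * sizeof(uint32_t));
}

/* Word i as the shader pipeline would see it. */
static uint32_t fetch_word(const struct toy_fp_fetch *f, size_t i)
{
    return i < PREFETCH_WORDS ? f->prefetch[i] : f->mem[i];
}
```

In this model, rewriting an immediate in the first 8 words without resubmitting the offset leaves the pipeline running the stale copy, while changes further into the program are picked up (subject to caching).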
20:00mwk: I have to say, the whole NV30/NV40 pixel shader architecture is deliciously trippy
20:01* mwk still doesn't quite understand what's going on
20:01mwk: but it's pretty... unique
20:01imirkin: i know next to nothing about it...
20:01imirkin: except immediates are interleaved
20:02mwk: oh, the ISA is only half the story
20:02imirkin: i also have yet to figure out how to operate certain bits of it
20:02imirkin: iirc i never got pointsize working
20:02mwk: you basically have three big pipeline stages
20:03mwk: shader top, texture, shader bottom
20:03imirkin: ah, you're looking at it from a "how does it actually get executed" perspective
20:03imirkin: i look at it from the "please, correct pixels, show up, pretty please"
20:04mwk: the first thing that happens is that the *vertex fetch* unit, whenever the active pixel shader changes, reads the first 8 words of it from memory and sends them down the pipeline
20:04mwk: aka the weird secret prefetch, which is why you need to resubmit the address when you change shader code
20:05mwk: then, the shader top gobbles up as many pixels as it can fit, executes the first one or two instructions on each pixel, sends them all to texture
20:06mwk: texture reads the whole batch, fetches more instructions from memory, executes another instruction, and sends all results to bottom
20:07mwk: and the bottom executes another 1 or 2 instructions, saves all the results from this phase to the register file [which only it has access to], then sends source operands and instructions for the next phase to the top
20:07mwk: or notices the end-of-shader and sends the results to rop instead
20:08mwk: and so the wheel goes round
20:08mwk: instruction decoding is scattered pretty much everywhere
20:09mwk: the first words of the shader are read through a completely different path than the following ones
20:09mwk: and top/tex/bottom each have their own subset of supported instructions
20:09mwk: that partially overlap
20:10mwk: it's a miracle the whole thing doesn't desync
20:13mwk: now that I said it, I'll probably find a dozen ways of desyncing it...
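The top -> texture -> bottom loop mwk describes can be approximated by counting how many trips around it a shader needs. In this toy model each instruction carries a mask of the units that support it, instructions run strictly in order, and per trip the top and bottom take up to 2 instructions and texture takes 1. Those slot counts are assumptions read off the chat ("one or two"), and batching, the register file and operand routing are ignored.

```c
#include <stddef.h>

/* Units of the NV30/NV40 pixel pipeline as described in the chat. */
enum { UNIT_TOP = 1, UNIT_TEX = 2, UNIT_BOT = 4 };

/* ASSUMED per-trip capacity of each unit, in pipeline order. */
static const int unit_order[3] = { UNIT_TOP, UNIT_TEX, UNIT_BOT };
static const int unit_slots[3] = { 2, 1, 2 };

/* unit_mask[i] says which units can execute instruction i. A unit stops
 * early when the next instruction is outside its subset. Returns the number
 * of trips around the loop, or -1 if some instruction can never execute. */
static int count_passes(const int *unit_mask, size_t n)
{
    size_t pc = 0;
    int passes = 0;
    while (pc < n) {
        size_t start = pc;
        for (int u = 0; u < 3; u++)
            for (int s = 0; s < unit_slots[u] && pc < n &&
                            (unit_mask[pc] & unit_order[u]); s++)
                pc++;
        if (pc == start)
            return -1;  /* no unit accepts this instruction: would spin */
        passes++;
    }
    return passes;
}
```

For example, with ALU ops allowed at top and bottom and texture ops only at texture, the sequence ALU, TEX, ALU, ALU, TEX, ALU takes 2 trips, while ten ALU ops in a row take 3 because the texture stage contributes nothing to them.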
23:15Subv: is it normal that queries take a rather long time to execute?
23:16imirkin: i guess it's relative... how long is long?
23:18Subv: i'm not submitting rendering commands, just a single query with configuration GET(Write2, SelectZero, Long), it takes more than a few seconds after i submit it for the backing memory to show any change
23:22Subv: i'm waiting 1 second after submitting the command, and then read the memory in a loop that reads and sleeps another second, the data isn't updated immediately but after some unspecified time
23:25imirkin: so ... that obviously should be pretty fast
23:25imirkin: but perhaps your submission isn't actually invoking a true submit
23:25imirkin: and so your commands sit in some queue
23:26imirkin: also depending on how you read the memory, caching can be annoying
23:26imirkin: i pointed you at how nouveau does stuff... get_query_result is what we do on the CPU to get at the data
23:27imirkin: check the wait case.
23:46Subv: oh i see
23:57Subv: oh, nouveau's is64Bit only means that the query doesn't write a sequence number in the first word, not that it's a long query
23:57Subv: that was confusing
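Putting the last two points together: a report that begins with a sequence number is complete once word 0 equals the sequence submitted with the query, while an is64Bit-style report has no sequence word, so completion has to be learned some other way (in nouveau, apparently a fence). Below is a sketch with a fence modeled as a plain flag; the function names are hypothetical. If the commands were never actually kicked off, as imirkin suspected, no amount of polling will make it ready, and the memory must be read through a mapping that is not served from a stale CPU cache.

```c
#include <stdbool.h>
#include <stdint.h>

/* Completion check: compare the sequence word for sequence-style reports,
 * otherwise rely on an externally tracked fence (modeled as a flag). */
static bool query_ready(const volatile uint32_t *report, bool is64bit,
                        uint32_t expected_seq, bool fence_signalled)
{
    if (is64bit)
        return fence_signalled;
    return report[0] == expected_seq;
}

/* Bounded poll of a sequence-style report. A real driver would wait on the
 * buffer or sleep between reads; this sketch just re-reads. */
static bool poll_query(const volatile uint32_t *report, uint32_t expected_seq,
                       unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++)
        if (query_ready(report, false, expected_seq, false))
            return true;
    return false;
}
```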