03:42 mwk: alright, so now I more or less understand the NV10-NV30 pipeline right until the edge setup stage...
03:43 mwk: wonders how to observe the rasterizer without all these annoying later stages getting in the way
03:45 mwk: or maybe I should just write lots of docs instead and leave that stuff for later
10:19 HZun: Does nouveau support "nvidia-like" triple buffering, where unused frames are skipped? (as opposed to there being a long queue/latency)
14:04 rhyskidd: sigh at late series GP10x
14:05 rhyskidd: got another one of the runpm and gr init bug reports on a GP108
16:16 mslusarz: how long has demmt been in a broken state?
16:26 imirkin: mslusarz: quite a while...
16:27 imirkin: i have some local patches to semi-fix it
16:27 imirkin: mslusarz: there's a problem in the chipset detection stuff coz the ioctl changed, so mmt doesn't record it properly anymore
16:28 imirkin: and it's tricky to make it compatible with both new and old blobs
16:28 pendingchaos: mslusarz: I think since around 380 series? 375.26 works fine. I think I might have tried 387.34 or 384.59 once and it didn't work
16:29 mslusarz: I think you are probably ignoring envytools bot messages
16:29 mslusarz: imirkin: ^
16:29 imirkin: no, i saw them
16:29 mslusarz: ah, ok
16:29 imirkin: perhaps there's a new new new new way of getting the chipset
16:29 mslusarz: I also pushed one change to mmt repo
16:29 imirkin: but at least when i looked, the number of args changed or something
16:29 imirkin: and i couldn't think of a way of making it work on both old and new blob versions
16:30 mslusarz: now it just works (tm) ;)
16:30 imirkin: oh, i see. you just call both.
16:30 imirkin: did you test that this works on older blobs?
16:31 mslusarz: on older blob versions the new method will fail, so the old one will also be issued
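The fallback mslusarz describes (issue the new query, and if it fails on an older blob, issue the old one too) could be sketched roughly as below. This is not the mmt code: `detect_chipset`, `query_new` and `query_old` are hypothetical names, and the two callables stand in for whatever ioctl-based lookups mmt really performs.

```python
from typing import Callable, Optional


def detect_chipset(
    query_new: Callable[[], Optional[int]],
    query_old: Callable[[], Optional[int]],
) -> Optional[int]:
    """Try the newer chipset query first, then fall back to the older one.

    Both callables are hypothetical stand-ins; each returns a chipset id,
    or None when the driver rejects that particular request.
    """
    chipset = query_new()
    if chipset is not None:
        # Newer blob: the new method answered, nothing else to do.
        return chipset
    # Older blob: the new method failed, so the old one is issued as well.
    return query_old()
```

Trying the new method first means no blob version check is needed; the failure of the new request is itself the signal to fall back.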
16:31 imirkin: mslusarz: i was also trying to get it to work again with nouveau ... quite a pain
16:31 imirkin: also the way things are handled with channels is a bit off, unfortunately
16:31 Subv: hey, is there some place where i can read more about the QUERY methods? (specifically the structures written to memory in short vs long mode)
16:32 imirkin: Subv: short mode is 32-bit, long is 128-bit
16:32 imirkin: Subv: on average, 128-bit is a pair of (timestamp, value) - 64bit for both
16:32 mslusarz: imirkin: fyi, I have no plans to fix nouveau support
16:32 imirkin: the short mode is just the value, no timestamp
16:32 imirkin: anyways, i started on it
16:32 imirkin: but then saw something shiny
16:33 imirkin: i had totally forgotten i was even doing this until just now :)
16:33 mslusarz: lol
16:33 Subv: if i understand correctly, GET(Write, Zero, ShortMode) writes the QUERY_SEQUENCE value to the QUERY_ADDRESS as a 32 bit value
16:34 imirkin: Subv: sounds likely. tbh i don't remember such details.
16:34 Subv: and GET(Write, Zero, ShortMode) would write [Timestamp64, 0u64] to the QUERY_ADDRESS?
16:34 Subv: ah
16:34 imirkin: you mean long?
16:34 Subv: mm, i don't understand what this timestamp is supposed to be, though
16:34 imirkin: well, not all queries support short and long
16:34 imirkin: it's a GPU timestamp
16:34 imirkin: kinda like rdtsc
16:34 Subv: yes, sorry, i meant long mode in that second message
16:35 imirkin: you'd have to experiment a bit to work out specifics
16:35 Subv: i see, thanks!
16:35 imirkin: Subv: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
16:36 imirkin: there are some comments in there
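As a rough illustration of the two result layouts discussed above (short mode: a single 32-bit value; long mode: 128 bits holding a 64-bit timestamp and a 64-bit value), here is a minimal decoding sketch. The names `QueryResult` and `decode_query` are made up for this example, little-endian byte order is assumed, and the order of the two 64-bit fields is not settled in the discussion, so it is left as a parameter to be confirmed by experiment.

```python
import struct
from typing import NamedTuple, Optional


class QueryResult(NamedTuple):
    value: int
    timestamp: Optional[int]  # None in short mode, which has no timestamp


def decode_query(buf: bytes, long_mode: bool,
                 timestamp_first: bool = True) -> QueryResult:
    """Decode a query result buffer (hypothetical helper, layout assumed)."""
    if not long_mode:
        # Short mode: just the 32-bit value.
        (value,) = struct.unpack_from("<I", buf, 0)
        return QueryResult(value, None)
    # Long mode: two 64-bit words, 128 bits in total.
    first, second = struct.unpack_from("<QQ", buf, 0)
    if timestamp_first:
        # Assumption: the field order is unverified, check it on hardware.
        return QueryResult(value=second, timestamp=first)
    return QueryResult(value=first, timestamp=second)
```

Keeping the field order as a parameter makes it easy to flip once the layout has been checked against what the hardware really writes.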
19:19 mwk: imirkin: is there some reasonable way to change pixel shader uniforms on nv30/nv40?
19:20 imirkin: define 'reasonable'
19:20 mwk: as in, not involving a whole pipeline flush
19:20 imirkin: i guess i dunno for sure, but i didn't think it involved a pipeline flush...
19:20 imirkin: just update the uniforms, and draw
19:21 mwk: how?
19:21 imirkin: how to which part?
19:21 mwk: how to update the uniforms
19:21 imirkin: sec
19:21 mwk: they are in memory, right?
19:22 mwk: so you have to submit in-band memory write commands somehow
19:22 imirkin: you write const via pushbuf anyways
19:22 imirkin: you can't update them in-memory
19:22 mwk: eh?
19:23 imirkin: this is true all the way up to the latest chips btw
19:23 imirkin: trying to find it for nv30 -- i remember it's carefully hidden away, but unfortunately not where
19:23 mwk: yeah, g80 has cb_upload, got it
19:24 mwk: i'm looking for a similar mechanism on nv30/nv40
19:24 imirkin: and while you CAN update the raw memory, you have to be super-careful with flushes
19:24 imirkin: iirc there's no backing memory on nv30/nv40 at all
19:24 imirkin: you just write the values to some methods
19:25 imirkin: wait a sec
19:26 imirkin: looks like consts are part of the program directly?
19:26 imirkin: is confused
19:27 imirkin: yeah. ok. looks like uniforms are part of the instruction stream
19:27 mwk: vertex shader code+consts are methods
19:27 imirkin: (stored at the end i guess)
19:27 mwk: but pixel shader consts are inline in code
19:27 mwk: which is in memory
19:28 imirkin: yeah
19:28 imirkin: yeah ok. i'm befuddled. sorry
19:28 imirkin: i didn't realize VP / FP were done differently
19:31 imirkin: mwk: https://hastebin.com/xikaxizehe.php
19:36 mwk: imirkin: that's just the flush though, right?
19:37 imirkin: yes
19:37 imirkin: and i don't see anything that makes sure that that memory is done being read
19:37 mwk: what about the actual mem write?
19:37 imirkin: that's in nv30_fragprog_upload
19:37 imirkin: just a straight up cpu write
19:38 imirkin: er, i guess not - it may use a transfer
19:38 imirkin: which will in turn use a m2mf
19:38 mwk: ugh
19:38 imirkin: not 100% sure
19:39 mwk: so, a pipeline flush, class switch, and tex cache flush to boot
19:39 mwk: sounds horrible
19:39 imirkin: don't mess with perfection :p
19:40 imirkin: triangles show up on the screen... what more can you want
19:41 mwk: fwiw I'm quite convinced the only part of your paste that's necessary for flush is the offset method resubmit
19:42 mwk: since that's what triggers the super-secret cache prefetch
19:42 mwk: of first 8 words
19:44 mwk: well, that and tex_cache_ctl
19:45 imirkin: well, this is a general "fragprog needs revalidating" thing
19:45 imirkin: not just for a uniform update
19:46 mwk: fair enough
20:00 mwk: well
20:00 mwk: I have to say, the whole NV30/NV40 pixel shader architecture is deliciously trippy
20:01 mwk: still doesn't quite understand what's going on
20:01 mwk: but it's pretty... unique
20:01 imirkin: i know next to nothing about it...
20:01 imirkin: except immediates are interleaved
20:02 mwk: oh, the ISA is only half the story
20:02 imirkin: i also have yet to figure out how to operate certain bits of it
20:02 imirkin: iirc i never got pointsize working
20:02 mwk: you basically have three big pipeline stages
20:03 mwk: shader top, texture, shader bottom
20:03 imirkin: ah, you're looking at it from a "how does it actually get executed" perspective
20:03 mwk: yep
20:03 imirkin: i look at it from the "please, correct pixels, show up, pretty please"
20:04 mwk: the first thing that happens is that the *vertex fetch* unit, whenever the active pixel shader changes, reads the first 8 words of it from memory and sends them down the pipeline
20:04 mwk: aka the weird secret prefetch, which is why you need to resubmit the address when you change shader code
20:05 mwk: then, the shader top gobbles up as many pixels as it can fit, executes the first one or two instructions on each pixel, sends them all to texture
20:06 mwk: texture reads the whole batch, fetches more instructions from memory, executes another instruction, and sends all results to bottom
20:07 mwk: and the bottom executes another 1 or 2 instructions, saves all the results from this phase to the register file [which only it has access to], then sends source operands and instructions for the next phase to the top
20:07 mwk: or notices the end-of-shader and sends the results to rop instead
20:08 mwk: and so the wheel goes round
20:08 mwk: instruction decoding is scattered pretty much everywhere
20:09 mwk: the first words of the shader are read through a completely different path than the following ones
20:09 mwk: and top/tex/bottom each have their own subset of supported instructions
20:09 mwk: that partially overlap
20:10 mwk: it's a miracle the whole thing doesn't desync
20:12 mwk: eh
20:13 mwk: now that I said it, I'll probably find a dozen ways of desyncing it...
20:21 imirkin: hehe
23:15 Subv: is it normal that queries take a rather long time to execute?
23:16 imirkin: no
23:16 imirkin: i guess it's relative... how long is long?
23:18 Subv: i'm not submitting rendering commands, just a single query with configuration GET(Write2, SelectZero, Long), it takes more than a few seconds after i submit it for the backing memory to show any change
23:22 Subv: i'm waiting 1 second after submitting the command, and then read the memory in a loop that reads and sleeps another second, the data isn't updated immediately but after some unspecified time
23:24 imirkin: right
23:25 imirkin: so ... that obviously should be pretty fast
23:25 imirkin: but perhaps your submission isn't actually invoking a true submit
23:25 imirkin: and so your commands sit in some queue
23:26 imirkin: also depending on how you read the memory, caching can be annoying
23:26 imirkin: i pointed you at how nouveau does stuff... get_query_result is what we do on the CPU to get at the data
23:27 imirkin: check the wait case.
23:46 Subv: oh i see
23:46 Subv: thanks
23:57 Subv: oh, nouveau's is64Bit only means that the query doesn't write a sequence number in the first word, not that it's a long query
23:57 Subv: that was confusing