01:39imirkin: pendingchaos: using push is going to be faster than copy for small bo's ... esp if they're ubo's
01:39imirkin: just think about it... one way the data is right there in the command buffer, the other it's gotta do a dma
01:39imirkin: (the command buffer gets prefetched, etc)
01:40imirkin: and yeah, i get confused with the derivAll / liveOnly thing. nfc what derivAll is then...
01:57rhyskidd: rounding up all the ioctl's ....
01:57rhyskidd: going to a party
03:12rhyskidd: some might be interested, more this Friday: https://videocardz.com/77895/the-new-features-of-nvidia-turing-architecture
08:02hakzsam: karolherbst: probably not moe
11:02alkisg: Hi, in Ubuntu 18.04 I found out that `Option "PageFlip" "off"` avoids a segfault in this card:
11:02alkisg: 01:00.0 VGA compatible controller : NVIDIA Corporation NV5 [Riva TNT2 Model 64 / Model 64 Pro] [10de:002d] (rev 15)
11:03alkisg: Is it a known issue with patches that I could test? Should I file a bug report? The xorg log is there: http://termbin.com/haem
11:03alkisg: It affects other models too, but I have this one handy for testing
11:40RSpliet: rhyskidd: Interesting... they reduced the number of FP units per warp scheduler.
11:43RSpliet: alkisg: I don't think NV5 has been tested in a long long long time. Yes, please do report bugs in this area
11:44RSpliet: Normally I'd say do please also test with a newer kernel (4.18), and if you have that opportunity please do... but I doubt you'd see much of a difference
12:00RSpliet: Ah, they had 16 FP units per warp scheduler on Volta as well. Judging by the SM diagrams, Turing is Volta without much of the FP64 logic?
12:06alkisg: RSpliet: thank you, will do, it was running fine 2 years ago in ubuntu 16.04. I'll file a bug report and provide all the info that I can.
13:18pendingchaos: imirkin: aren't the contents of the command buffer also read through dma?
13:18pendingchaos: prefetching might help, are there any other reasons why it would be faster?
13:18pendingchaos: is there some significant fixed overhead with using m2mf? why especially for ubos?
13:18pendingchaos: (I'm a little unsure how all of this is setup)
13:23imirkin: it's already being fetched through dma
13:23imirkin: there's a whole engine always running fetching the next command
13:24imirkin: here you're sending commands to that engine, so it must already be running
13:24imirkin: and you're telling it to configure ANOTHER engine
13:24imirkin: which should now set up its own dma
13:24imirkin: now, there's a limit of effectiveness there
13:24imirkin: doing this for a 1MB transfer won't be helpful
13:25imirkin: but doing it for a 1-byte transfer clearly will
13:25imirkin: we have a limit (which i know you found), for deciding where the trade-off is
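The trade-off imirkin describes above can be pictured with a toy cost model. This is only a sketch under stated assumptions, not nouveau code: the function names and every cost constant below are hypothetical, chosen to show why a fixed setup cost on the copy path favours pushing small uploads inline and why that advantage disappears for large ones.

```python
# Toy cost model. Every constant here is an illustrative assumption,
# not a measurement from any GPU or driver.
COPY_SETUP_COST = 1000.0   # hypothetical fixed cost of configuring a second engine
COPY_COST_PER_BYTE = 1.0   # hypothetical per-byte cost once the copy is running
PUSH_COST_PER_BYTE = 3.0   # hypothetical per-byte cost of inlining data in the command stream


def push_cost(size: int) -> float:
    """Cost of sending `size` bytes inline with the commands (no setup cost)."""
    return size * PUSH_COST_PER_BYTE


def copy_cost(size: int) -> float:
    """Cost of a separate copy: fixed setup plus a cheaper per-byte cost."""
    return COPY_SETUP_COST + size * COPY_COST_PER_BYTE


def crossover_size() -> float:
    """Size at which both paths cost the same under this model."""
    return COPY_SETUP_COST / (PUSH_COST_PER_BYTE - COPY_COST_PER_BYTE)


def cheaper_path(size: int) -> str:
    """Return "push" or "copy", whichever the model says is cheaper."""
    return "push" if push_cost(size) <= copy_cost(size) else "copy"
```

Under these made-up constants `cheaper_path(1)` picks the push path and `cheaper_path(1 << 20)` picks the copy path; where the real crossover sits is exactly what the benchmarking later in the log tries to find out.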
13:25imirkin: alkisg: ideally supply a backtrace with symbols
13:25imirkin: otherwise it's a bit hard to tell what went wrong
13:26imirkin: i did test NV5 a while back and it worked fine, so this is a regression
13:26imirkin: (while back == 2015 or so)
13:44alkisg: imirkin: thank you, I will test tomorrow morning, afaik it worked until ubuntu 16.04 here as well
14:00karolherbst: imirkin: I guess it would make sense to have that threshold adjustable at runtime. Depending on... whatever
14:01imirkin: nah, i'd just bump it up unconditionally
14:01imirkin: or maybe set differently for nv50 vs nvc0
14:09pendingchaos: imirkin: I think I understand
14:09pendingchaos: why especially for ubos though?
14:10imirkin: pendingchaos: well, ubo uploads via a copy engine require a pipeline stall
14:10imirkin: whereas using the dedicated push_cb uploads via magic
14:10imirkin: (a staging area on-chip, which gets written out to the ubo backing memory eventually, but also plays nice with concurrent draws)
14:21karolherbst: imirkin: okay. Maybe we should figure out where the sweet spot is?
14:21karolherbst: pendingchaos: were you trying higher values as well?
14:22pendingchaos: aside from the patch I posted through pastebin, I just set it to 1073741824 and found Deus Ex: Mankind Divided and Hitman to be faster
14:22pendingchaos: Hitman wasn't doing any large uploads while rendering I think
14:23pendingchaos: dunno about Deus Ex
14:23karolherbst: 1GB is quite a lot though
14:23pendingchaos: yeah, the limit should probably be lower in practice
14:25karolherbst: anyway, at some point the perf shouldn't increase anymore or maybe even drop
14:25karolherbst: if 256 increased perf that much, maybe 512 increases it a bit more?
14:26pendingchaos: in Hitman's case, I think all uploads after startup were below 256, so think 256, 512 and 1073741824 would all be the same
14:28karolherbst: imirkin: is there some problem when setting the threshold too high actually
14:28imirkin: loss of perf
14:28pendingchaos: I would assume p2mf is slower than m2mf for large copies
14:29karolherbst: so yeah, we might want to figure out where the sweet spot is indeed
14:29imirkin: and it prevents the fifo engine from doing other things
14:29imirkin: since it's just sending data to p2mf
14:29imirkin: based on zero information, i'd guess 512 or 1024 would be fine
14:30imirkin: or 256
14:30imirkin: what's it at now? 192 or something?
14:30karolherbst: 192, yes
14:30pendingchaos: 192 I think
14:30imirkin: no idea where that number came from
14:30imirkin: might have been calim optimizing a specific situation
14:31karolherbst: uhm, https://cgit.freedesktop.org/mesa/mesa/commit/?id=48a45ec24ae7
14:38someosdev: Hi. I have got a double G84 SLI card and wanted to play around with SLI and get basic AFR working. However nouveau does not work on my Dell M1730 laptop, which is a known problem.
14:38karolherbst: someosdev: why doesn't it work?
14:38someosdev: Once I load nouveau, the display of the laptop just iterates through the colors red, green, blue, black and white. However connecting an external display via DVI shows a valid image.
14:39someosdev: xf86-video-nv and the blob both work with the card. The nouveau log does not show anything interesting. Are there any known problems with LVDS? Any suggestions?
14:39imirkin: sounds like an eDP failure of some sort? this is a recent laptop?
14:39someosdev: No it's from ~2008
14:39imirkin: oh. i see. the laptop has dual G84's in it.
14:39someosdev: Btw every color is shown for a little longer than 1sec, then it switches to the next color.
14:40imirkin: yeah, i think that's a panel reset pattern
14:40imirkin: i'd recommend filing a bug and including your vbios in there
14:40imirkin: from both GPUs
14:40imirkin: these are available in /sys/kernel/debug/dri/*/vbios.rom
14:40karolherbst: someosdev: do you know if this is a regression or is it broken with nouveau since forever?
14:40imirkin: as well as a dmesg after boot
14:42someosdev: I tried to install Ubuntu about 3 years ago and it had the same issue back then.
14:45pendingchaos: running this microbenchmark while forcing p2mf/m2mf with a modified mesa: https://pastebin.com/raw/TZfY2Z4g, it seems p2mf suddenly becomes slower than m2mf at 32 MiB (compared to 31 MiB)
14:45someosdev: Here's another guy having the same issue back in 2012: https://askubuntu.com/questions/111011/how-can-i-install-on-a-dell-m1730-laptop
14:45pendingchaos: not sure how correct the benchmark is for this though
14:46imirkin: pendingchaos: if you like, i can run this on my G92 tonight
14:47imirkin: someosdev: yeah, we probably never got it right. it's a weird setup.
14:47imirkin: someosdev: good to know that xf86-video-nv works though
14:47imirkin: means it's something very dumb that we're missing
14:48pendingchaos: seems I forgot to disable some related changes in mesa, the 32 MiB thing might be very wrong
14:52someosdev: xf86-video-nv just copies the LVDS modeset info from some registers, maybe they contain something magic?
14:53imirkin: pendingchaos: yeah, i just mean come up with a benchmark, and i can test it on a G92 tonight. (and GM206, but that's less interesting.)
14:53imirkin: someosdev: -nv code is hard to read. it covers a lot of different generations, you want the g80_* files
14:53imirkin: someosdev: the vbios contains info on how to drive the LVDS
14:53imirkin: might be that it's hooked up to the "wrong" GPU somehow
14:54someosdev: I have written my own little g80 modeset driver, so I know what nv is doing :-)
14:54imirkin: oh ok
14:54imirkin: you're above average then :)
14:55imirkin: then you should be able to look at what nouveau is doing
14:55imirkin: and figure out where we go wrong.
14:55someosdev: I know that the second card has one valid output, maybe nouveau is trying to modeset that?
14:56imirkin: it definitely would try
14:56karolherbst: imirkin, pendingchaos: I can test on a gk106, gm204 and gp107, although maybe we get the same result on each?
14:56imirkin: it assumes they're both valid outputs
14:56imirkin: you can blacklist the other card's output
14:56imirkin: e.g. video=LVDS-2:d or something like that
15:03karolherbst: imirkin: oh, btw, we have somebody hitting the submitting stuff too fast error on a maxwell GPU. I kind of assumed this only happens for rather old GPUs, but maybe that was never the case
15:08karolherbst: ohh nvm, I was mistaken. Was looking at the log again, it got a ctxsw_timeout just seconds before the pushbufs were rejected
15:17pendingchaos: seems the number is around 64-128 KiB
15:17pendingchaos: (just glBufferSubData performance, no drawing involved)
15:18imirkin: surprising that it's more than can fit into a single pushbuf (8KB)
15:18imirkin: or rather ... a single command
15:18imirkin: not a single pushbuf
15:19imirkin: [using the repeating command type]
15:19karolherbst: pendingchaos: how much faster?
15:21pendingchaos: https://gist.github.com/pendingchaos/036407c96bcbfb760327e26871363457 < the benchmark
15:21pendingchaos: karolherbst: https://pastebin.com/raw/ed0cYuv1 < the results
15:22karolherbst: the perf penalty with higher sizes is surprisingly small
15:23karolherbst: which means small risk in having a too high value
15:23imirkin: so ... p2mf is only a thing on kepler+... there was some other thing on fermi. maybe sifc.
15:23imirkin: and it could be that with sifc the trade-offs are different
15:23karolherbst: so we probably only want to increase that value for kepler+ then
15:24karolherbst: and can go crazy and set it to like 32k or something ;)
15:24imirkin: well, we should just do a bit of due diligence
15:24karolherbst: although I would assume 8k would be good enough?
15:24imirkin: pendingchaos: your data's weird... wtf is up with 32K and 64K?
15:25imirkin: the numbers are weirdly non-linear
15:25pendingchaos: why they're commented? they were slow
15:25karolherbst: imirkin: I guess he started with the bigger numbers and 0.062 == 0.125 / 2
15:25imirkin: oh i see. fewer uploads.
15:25karolherbst: ohh, that
15:25karolherbst: yeah ;)
15:26imirkin: 50000 uploads vs 5000 uploads.
15:26karolherbst: took me a few seconds as well
15:26imirkin: pendingchaos: probably good to do it as MB/s :)
15:26imirkin: but whatever
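imirkin's suggestion to report MB/s instead of raw times removes the confusion caused by runs that use different upload counts (50000 vs 5000). A minimal sketch of that normalisation follows; the function name and the sample numbers in the usage note are hypothetical and are not taken from pendingchaos's results.

```python
def throughput_mb_per_s(upload_size_bytes: int, num_uploads: int, elapsed_s: float) -> float:
    """Convert a timed run of repeated uploads into MB/s (1 MB = 10**6 bytes)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    total_bytes = upload_size_bytes * num_uploads
    return total_bytes / 1e6 / elapsed_s
```

With made-up numbers, a run of 50000 uploads of 4096 bytes taking 1.0 s and a run of 5000 uploads of the same size taking 0.1 s come out at the same throughput, even though the raw times differ by a factor of ten.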
15:27karolherbst: pendingchaos: do you know of games which have bigger uploads than 256?
15:28pendingchaos: don't remember what the general upload sizes of anything other than Hitman was
15:28karolherbst: but look at the data, 4k seems to be the sweet spot
15:28pendingchaos: I tried a few other
15:28karolherbst: starting with 8k the time starts to increase
15:28karolherbst: so stalling could have a bigger impact on overall performance in real world scenarios
15:29karolherbst: 512 or 1k would be probably the safest values
15:29karolherbst: or, well 512
15:29karolherbst: I would go for 512 until we are able to test it
15:30karolherbst: imirkin: what do you think?
15:51pendingchaos: imirkin: fermi seems to use m2mf, tesla uses sifc (dunno what it is btw), kepler+ uses p2mf
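The per-generation picture pendingchaos reports, combined with the idea of raising the limit only on kepler+, can be sketched as a small lookup. This is an illustration only, not mesa code: the names are hypothetical, 192 is the current value quoted in the discussion, 512 is karolherbst's tentative and untested suggestion, and the `<=` comparison is an assumption.

```python
# Inline upload mechanism per generation, as reported in the discussion above.
INLINE_UPLOAD_PATH = {
    "tesla": "sifc",
    "fermi": "m2mf",
    "kepler+": "p2mf",
}

CURRENT_THRESHOLD = 192          # value quoted in the discussion
PROPOSED_P2MF_THRESHOLD = 512    # tentative suggestion from the discussion, untested


def inline_threshold(generation: str) -> int:
    """Hypothetical policy: raise the limit only where p2mf is used."""
    path = INLINE_UPLOAD_PATH[generation]  # raises KeyError for unknown names
    return PROPOSED_P2MF_THRESHOLD if path == "p2mf" else CURRENT_THRESHOLD


def use_inline_upload(generation: str, size: int) -> bool:
    """True if an upload of `size` would take the inline path under this policy."""
    return size <= inline_threshold(generation)
```

For example, a 256-unit upload would go inline on "kepler+" but not on "tesla" or "fermi" under this policy; whether that is the right split is what the proposed testing on other hardware would need to confirm.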
16:12karolherbst: oh, is it known what "fecs" stands for? because somebody told me it should mean "front end context switching"
17:29someosdev: I tried completely disabling the secondary GPU, however this does not change anything.
18:16RSpliet: karolherbst: That's the one
18:18karolherbst: yeah, I was not sure if we knew that name already... didn't find much about it anywhere
18:49RSpliet: I think I used it in my paper a few months ago ;-)
18:50RSpliet: It had been confirmed... somewhere
20:14pendingchaos: karolherbst: perhaps the test could be run on gk106 and gm204?
20:35karolherbst: pendingchaos: yeah... just not today or maybe not even tomorrow, something urgent came up