01:39 imirkin: pendingchaos: using push is going to be faster than copy for small bo's ... esp if they're ubo's
01:39 imirkin: just think about it... one way the data is right there in the command buffer, the other it's gotta do a dma
01:39 imirkin: (the command buffer gets prefetched, etc)
01:40 imirkin: and yeah, i get confused with the derivAll / liveOnly thing. nfc what derivAll is then...
01:57 rhyskidd: rounding up all the ioctl's ....
01:57 rhyskidd: going to a party
01:57 rhyskidd: weee!
03:12 rhyskidd: some might be interested, more this Friday: https://videocardz.com/77895/the-new-features-of-nvidia-turing-architecture
08:02 hakzsam: karolherbst: probably not moe
11:02 alkisg: Hi, in Ubuntu 18.04 I found out that `Option "PageFlip" "off"` avoids a segfault in this card:
11:02 alkisg: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation NV5 [Riva TNT2 Model 64 / Model 64 Pro] [10de:002d] (rev 15)
11:03 alkisg: Is it a known issue with patches that I could test? Should I file a bug report? The xorg log is there: http://termbin.com/haem
11:03 alkisg: It affects other models too, but I have this one handy for testing
11:40 RSpliet: rhyskidd: Interesting... they reduced the number of FP units per warp scheduler.
11:43 RSpliet: alkisg: I don't think NV5 has been tested in a long long long time. Yes, please do report bugs in this area
11:44 RSpliet: Normally I'd say do please also test with a newer kernel (4.18), and if you have that opportunity please do... but I doubt you'd see much of a difference
12:00 RSpliet: Ah, they had 16 FP units per warp scheduler on Volta as well. Judging by the SM diagrams, Turing is Volta without much of the FP64 logic?
12:06 alkisg: RSpliet: thank you, will do, it was running fine 2 years ago in ubuntu 16.04. I'll file a bug report and provide all the info that I can.
13:18 pendingchaos: imirkin: aren't the contents of the command buffer also read through dma?
13:18 pendingchaos: prefetching might help, are there any other reasons why it would be faster?
13:18 pendingchaos: is there some significant fixed overhead with using m2mf? why especially for ubos?
13:18 pendingchaos: (I'm a little unsure how all of this is setup)
13:23 imirkin: it's already being fetched through dma
13:23 imirkin: there's a whole engine always running fetching the next command
13:24 imirkin: here you're sending commands to that engine, so it must already be running
13:24 imirkin: and you're telling it to configure ANOTHER engine
13:24 imirkin: which should now set up its own dma
13:24 imirkin: now, there's a limit of effectiveness there
13:24 imirkin: doing this for a 1MB transfer won't be helpful
13:25 imirkin: but doing it for a 1-byte transfer clearly will
13:25 imirkin: we have a limit (which i know you found), for deciding where the trade-off is
13:25 imirkin: alkisg: ideally supply a backtrace with symbols
13:25 imirkin: otherwise it's a bit hard to tell what went wrong
13:26 imirkin: i did test NV5 a while back and it worked fine, so this is a regression
13:26 imirkin: (while back == 2015 or so)
13:44 alkisg: imirkin: thank you, I will test tomorrow morning, afaik it worked until ubuntu 16.04 here as well
14:00 karolherbst: imirkin: I guess it would make sense to have that threshold adjustable at runtime. Depending on... whatever
14:01 imirkin: nah, i'd just bump it up unconditionally
14:01 imirkin: or maybe set differently for nv50 vs nvc0
14:09 pendingchaos: imirkin: I think I understand
14:09 pendingchaos: why especially for ubos though?
14:10 imirkin: pendingchaos: well, ubo uploads via a copy engine require a pipeline stall
14:10 imirkin: whereas using the dedicated push_cb uploads via magic
14:10 imirkin: (a staging area on-chip, which gets written out to the ubo backing memory eventually, but also plays nice with concurrent draws)
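
A minimal sketch of the trade-off described above: small uploads go inline through the command buffer ("push"), larger ones are handed to a separate engine that sets up its own DMA ("copy"). The function name, the return strings, and the use of `<=` for the comparison are assumptions made for illustration and are not taken from Mesa; the default of 192 bytes is the figure quoted in this log as the current limit.

```python
# Hypothetical model of the push-vs-copy decision; not Mesa code.
# All names here are illustrative assumptions.

PUSH_THRESHOLD_BYTES = 192  # limit quoted in this log; the real check may differ


def choose_upload_path(size_bytes: int,
                       threshold: int = PUSH_THRESHOLD_BYTES) -> str:
    """Return "push" to inline the data in the command buffer,
    or "copy" to let a separate engine transfer it via its own DMA."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    # Small uploads: the data already sits in the prefetched command
    # stream, so no second engine has to be configured.
    if size_bytes <= threshold:
        return "push"
    # Large uploads: inlining would keep the command fetcher busy
    # feeding data, so a dedicated copy is assumed to win.
    return "copy"
```

Raising `threshold` moves more uploads onto the inline path, which is the knob being debated in the rest of the conversation.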
14:21 karolherbst: imirkin: okay. Maybe we should figure out where the sweet spot is?
14:21 karolherbst: pendingchaos: were you trying higher values as well?
14:22 pendingchaos: aside from the patch I posted through pastebin, I just set it to 1073741824 and found Deus Ex: Mankind Divided and Hitman to be faster
14:22 pendingchaos: Hitman wasn't doing any large uploads while rendering I think
14:23 pendingchaos: dunno about Deus Ex
14:23 karolherbst: mhh
14:23 karolherbst: 1GB is quite a lot though
14:23 pendingchaos: yeah, the limit should probably be lower in practice
14:25 karolherbst: anyway, at some point the perf shouldn't increase anymore or maybe even drop
14:25 karolherbst: if 256 increased perf that much, maybe 512 increases it a bit mroe?
14:25 karolherbst: *more
14:26 pendingchaos: in Hitman's case, I think all uploads after startup were below 256, so think 256, 512 and 1073741824 would all be the same
14:28 karolherbst: imirkin: is there some problem when setting the threshold too high actually
14:28 karolherbst: ?
14:28 imirkin: loss of perf
14:28 pendingchaos: I would assume p2mf is slower than m2mf for large copies
14:29 karolherbst: ahh
14:29 imirkin: correct
14:29 karolherbst: so yeah, we might want to figure out where the sweet spot is indeed
14:29 imirkin: and it prevents the fifo engine from doing other things
14:29 imirkin: since it's just sending data to p2mf
14:29 imirkin: based on zero information, i'd guess 512 or 1024 would be fine
14:30 imirkin: or 256
14:30 imirkin: what's it at now? 192 or something?
14:30 karolherbst: 192, yes
14:30 pendingchaos: 192 I think
14:30 imirkin: no idea where that number came from
14:30 imirkin: might have been calim optimizing a specific situation
14:30 karolherbst: 48a45ec24ae7
14:31 karolherbst: uhm, https://cgit.freedesktop.org/mesa/mesa/commit/?id=48a45ec24ae7
14:38 someosdev: Hi. I have got a double G84 SLI card and wanted to play around with SLI and get basic AFR working. However nouveau does not work on my Dell M1730 laptop, which is a known problem.
14:38 karolherbst: someosdev: why doesn't it work?
14:38 someosdev: Once I load nouveau, the display of the laptop just iterates through the colors red, green, blue, black and white. However connecting an external display via DVI shows a valid image.
14:39 someosdev: xf86-video-nv and the blob both work with the card. The nouveau log does not show anything interesting. Are there any known problems with LVDS? Any suggestions?
14:39 imirkin: sounds like an eDP failure of some sort? this is a recent laptop?
14:39 someosdev: No it's from ~2008
14:39 imirkin: oh. i see. the laptop has dual G84's in it.
14:39 someosdev: Btw every color is shown for a little longer than 1sec, then it switches to the next color.
14:40 karolherbst: weird
14:40 imirkin: yeah, i think that's a panel reset pattern
14:40 imirkin: i'd recommend filing a bug and including your vbios in there
14:40 imirkin: from both GPUs
14:40 imirkin: these are available in /sys/kernel/debug/dri/*/vbios.rom
14:40 karolherbst: someosdev: do you know if this is a regression or is it broken with nouveau since forever?
14:40 imirkin: as well as a dmesg after boot
14:42 someosdev: I tried to install Ubuntu about 3 years ago and it had the same issue back then.
14:45 pendingchaos: running this microbenchmark while forcing p2mf/m2mf with a modified mesa: https://pastebin.com/raw/TZfY2Z4g, it seems p2mf suddenly becomes slower than m2mf at 32 MiB (compared to 31 MiB)
14:45 someosdev: Here's another guy having the same issue back in 2012: https://askubuntu.com/questions/111011/how-can-i-install-on-a-dell-m1730-laptop
14:45 pendingchaos: not sure how correct the benchmark is for this though
14:46 imirkin: pendingchaos: if you like, i can run this on my G92 tonight
14:46 pendingchaos: sure
14:47 imirkin: someosdev: yeah, we probably never got it right. it's a weird setup.
14:47 imirkin: someosdev: good to know that xf86-video-nv works though
14:47 imirkin: means it's something very dumb that we're missing
14:48 pendingchaos: seems I forgot to disable some related changes in mesa, the 32 MiB thing might be very wrong
14:52 someosdev: xf86-video-nv just copies the LVDS modeset info from some registers, maybe they contain something magic?
14:53 imirkin: pendingchaos: yeah, i just mean come up with a benchmark, and i can test it on a G92 tonight. (and GM206, but that's less interesting.)
14:53 imirkin: someosdev: -nv code is hard to read. it covers a lot of different generations, you want the g80_* files
14:53 imirkin: someosdev: the vbios contains info on how to drive the LVDS
14:53 imirkin: might be that it's hooked up to the "wrong" GPU somehow
14:54 someosdev: I have written my own little g80 modeset driver, so I know what nv is doing :-)
14:54 imirkin: oh ok
14:54 imirkin: you're above average then :)
14:55 imirkin: then you should be able to look at what nouveau is doing
14:55 imirkin: and figure out where we go wrong.
14:55 someosdev: I know that the second card has one valid output, maybe nouveau is trying to modeset that?
14:56 imirkin: it definitely would try
14:56 karolherbst: imirkin, pendingchaos: I can test on a gk106, gm204 and gp107, although maybe we get the same result on each?
14:56 imirkin: it assumes they're both valid outputs
14:56 imirkin: you can blacklist the other card's output
14:56 imirkin: e.g. video=LVDS-2:d or something like that
15:03 karolherbst: imirkin: oh, btw, we have somebody hitting the submitting stuff too fast error on a maxwell GPU. I kind of assumed this only happens for rather old GPUs, but maybe that was never the case
15:08 karolherbst: ohh nvm, I was mistaken. Was looking at the log again, it got a ctxsw_timeout just seconds before the pushbufs were rejected
15:17 pendingchaos: seems the number is around 64-128 KiB
15:17 pendingchaos: (just glBufferSubData performance, no drawing involved)
15:18 imirkin: surprising that it's more than can fit into a single pushbuf (8KB)
15:18 imirkin: or rather ... a single command
15:18 imirkin: not a single pushbuf
15:19 imirkin: [using the repeating command type]
15:19 karolherbst: pendingchaos: how much faster?
15:21 pendingchaos: https://gist.github.com/pendingchaos/036407c96bcbfb760327e26871363457 < the benchmark
15:21 pendingchaos: karolherbst: https://pastebin.com/raw/ed0cYuv1 < the results
15:21 karolherbst: interesting
15:22 karolherbst: the perf penalty with higher sizes is surprisingly small
15:23 karolherbst: which means small risk in having a too high value
15:23 imirkin: so ... p2mf is only a thing on kepler+... there was some other thing on fermi. maybe sifc.
15:23 imirkin: and it could be that with sifc the trade-offs are different
15:23 karolherbst: yeah
15:23 karolherbst: so we probably only want to increase that value for kepler+ then
15:24 karolherbst: and can go crazy and set it to like 32k or something ;)
15:24 imirkin: well, we should just do a bit of due diligence
15:24 karolherbst: although I would assume 8k would be good enough?
15:24 imirkin: pendingchaos: your data's weird... wtf is up with 32K and 64K?
15:25 imirkin: the numbers are weirdly non-linear
15:25 pendingchaos: why they're commented? they were slow
15:25 karolherbst: imirkin: I guess he started with the bigger numbers and 0.062 == 0.125 / 2
15:25 imirkin: oh i see. fewer uploads.
15:25 karolherbst: ohh, that
15:25 karolherbst: yeah ;)
15:26 imirkin: 50000 uploads vs 5000 uploads.
15:26 karolherbst: took me a few seconds as well
15:26 imirkin: pendingchaos: probably good to do it as MB/s :)
15:26 imirkin: but whatever
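
A small sketch of the normalization suggested above: reporting each run as MB/s so that runs with different upload counts (50000 vs 5000) can be compared directly. The function and parameter names are hypothetical, and MB is taken here to mean 10^6 bytes, which is an assumption rather than anything stated in the log.

```python
# Hypothetical helper for comparing benchmark runs; not part of the
# benchmark linked in this log.

def throughput_mb_per_s(upload_size_bytes: int,
                        num_uploads: int,
                        elapsed_s: float) -> float:
    """Convert a timed run into megabytes per second (1 MB = 10**6 bytes)."""
    if upload_size_bytes <= 0 or num_uploads <= 0:
        raise ValueError("upload size and count must be positive")
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    total_bytes = upload_size_bytes * num_uploads
    return total_bytes / elapsed_s / 1_000_000
```

With made-up inputs (not measurements), a run of 5000 uploads of 1000 bytes in 1.0 s and a run of 50000 such uploads in 10.0 s both come out at the same rate, so the raw times no longer look non-linear just because the upload count changed.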
15:27 karolherbst: pendingchaos: do you know of games which have bigger uploads than 256?
15:28 pendingchaos: don't remember what the general upload sizes of anything other than Hitman was
15:28 karolherbst: but look at the data, 4k seems to be the sweet spot
15:28 pendingchaos: I tried a few other
15:28 karolherbst: *looking
15:28 karolherbst: starting with 8k the time starts to increase
15:28 karolherbst: so stalling could have a bigger impact on overall performance in real world scenarios
15:29 karolherbst: 512 or 1k would be probably the safest values
15:29 karolherbst: or, well 512
15:29 karolherbst: I would go for 512 until we are able to test it
15:30 karolherbst: imirkin: what do you think?
15:51 pendingchaos: imirkin: fermi seems to use m2mf, tesla uses sifc (dunno what it is btw), kepler+ uses p2mf
16:12 karolherbst: oh, is it known what "fecs" stands for? because somebody told me it should mean "front end context switching"
17:29 someosdev: I tried completely disabling the secondary GPU, however this does not change anything.
18:16 RSpliet: karolherbst: That's the one
18:18 karolherbst: yeah, I was not sure if we knew that name already... didn't find much about it anywhere
18:49 RSpliet: I think I used it in my paper a few months ago ;-)
18:50 RSpliet: It had been confirmed... somewhere
20:14 pendingchaos: karolherbst: perhaps the test could be run on gk106 and gm204?
20:35 karolherbst: pendingchaos: yeah... just not today or maybe not even tomorrow, something urgent came up
20:36 pendingchaos:nods