16:48 karolherbst: RSpliet: what should I test?
16:50 karolherbst: RSpliet: insn_sched branch or some special commit?
17:12 karolherbst: RSpliet: will test all of the GPUtest ones and then maybe unigine
17:25 karolherbst: RSpliet: big regression with JuliaFP32
17:25 mupuf: karolherbst: what GPUs do you want? Nve4 and maxwell 1? Or did you change your mind?
17:25 karolherbst: mupuf: nope, sounds good
17:26 karolherbst: RSpliet: volplosion fails to compile
17:26 karolherbst: RSpliet: JuliaFP64 has a slim perf win
17:28 karolherbst: RSpliet: pixmark piano +10%
17:30 karolherbst: RSpliet: Tessmark also sees a perf improvement
17:49 karolherbst: mupuf: I think we should start collecting apitraces which kill nouveau
17:49 mupuf: karolherbst: there is a repo for that, isn't there?
17:50 karolherbst: mupuf: I think so
17:50 karolherbst: mupuf: or was it for shaders only? we could use it for both, but I thought there is a separate one as well
17:50 mupuf: there were two, let me give you the url again
17:51 mupuf: I was about to say that I have vulkan traces that I could share too, but then I remembered...
17:51 karolherbst: ;)
17:52 mupuf: hmm, you might be right though
17:53 mupuf: here you go, it is named apitraces
17:54 karolherbst: lucky me that I have a 300Mbit connection here
17:55 karolherbst: mupuf: how much space do you have left there?
17:55 mupuf: half a terabyte
17:55 karolherbst: mhhh
17:55 karolherbst: better compress those then
17:55 karolherbst: can apitrace read compressed traces? like xz ones?
18:06 mupuf: karolherbst: probably not directly
18:07 mupuf: the traces are compressed already... but nothing like what xz could do
18:07 mupuf: you can expect a factor of three
18:10 karolherbst: mupuf: mhh, the plain traces usually compress with a factor of 5-10 from my experience
18:10 karolherbst: but it highly depends on the trace
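Since the gain "highly depends on the trace", the factor can be measured per file before committing disk space. A minimal sketch using Python's standard `lzma` module; the function name `xz_ratio`, the chunk size, and the preset are illustrative assumptions and not part of apitrace:

```python
import lzma
import sys


def xz_ratio(stream, chunk_size=1 << 20, preset=6):
    """Return raw_size / xz_size for a binary stream, read in chunks.

    Hypothetical helper: streams the data through an xz compressor so a
    multi-gigabyte trace never has to fit in memory.
    """
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=preset)
    raw = 0
    packed = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        raw += len(chunk)
        packed += len(compressor.compress(chunk))
    # flush() emits whatever the compressor was still buffering.
    packed += len(compressor.flush())
    if raw == 0:
        raise ValueError("empty input, ratio is undefined")
    return raw / packed


if __name__ == "__main__":
    # Usage (path is whatever trace file you want to check):
    #   python xz_ratio.py some.trace
    with open(sys.argv[1], "rb") as f:
        print(f"xz would shrink this file by a factor of {xz_ratio(f):.1f}")
```

The result only describes how much xz shrinks the file as it sits on disk; whether that lands near 3 or near 5-10 depends on the trace contents.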
18:16 mupuf: karolherbst: wonderful then
18:23 mupuf: karolherbst: sorry, can't plug the GTX 690 into reator
18:24 mupuf: I do not have the right power cables
18:24 mupuf: it requires two 2x4 connectors
18:24 karolherbst: :/
18:24 mupuf: I only have one
18:24 mupuf: sorry, 2x5
18:24 karolherbst: x5?... wirdos
18:24 karolherbst: *weirdos
18:27 mr_sm1th: I've really been living under a stone.
18:28 mr_sm1th: But Nouveau supports Pascal now?
18:28 karolherbst: kind of
18:28 mupuf: mr_sm1th: "supports"
18:28 mr_sm1th: No re-clocking...
18:28 mr_sm1th: ahh
18:28 mr_sm1th: Kepler still last generation with reclocking then?
18:28 karolherbst: maxwell
18:29 mr_sm1th: Maxwell, that's new for me.
18:29 karolherbst: well, there is only one issue
18:29 karolherbst: fan controls are locked down and we can only control the fans with a signed firmware, which we don't have
18:34 mupuf: karolherbst: so, what else do you want?
18:34 loonycyborg: didn't nvidia plan to release something to enable you to do this stuff?
18:34 karolherbst: mupuf: kepler with super high clocks
18:34 mupuf: titan?
18:34 karolherbst: loonycyborg: well yes, they "planned"
18:35 karolherbst: mupuf: it has low clocks
18:35 mupuf: memory clock or core clock?
18:35 karolherbst: core
18:35 karolherbst: best if 1.1GHz+
18:37 mupuf: karolherbst: https://www.gigabyte.com/Graphics-Card/GV-N660OC-2GD#ov almost 1.1 GHz
18:37 karolherbst: mupuf: looks perfect
18:37 karolherbst: it should boost even higher
18:38 karolherbst: mupuf: I think everything above 1023MHz or so is unstable, but I am not _quite_ sure about that, so I'd rather go with the extreme
18:40 mupuf: karolherbst: here you go, plugged and ready
18:40 mupuf: want me to turn it off?
18:40 mupuf: oh, and if you want me to update the kernel, it is easier now, I am leaving tomorrow
18:41 karolherbst: mupuf: what is the current kernel?
18:41 karolherbst: but usually I just rebase my branch until it compiles
18:41 mupuf: 4.10.0-rc4-ARCH
18:41 karolherbst: mupuf: ohh, you have that GPU?
18:41 karolherbst: mupuf: would make sense to update it
18:41 mupuf: hehe
18:41 mupuf: 4.12 is not enough of course :D
18:41 karolherbst: well, my current branch is based on 4.13
18:41 mupuf: oki docki
18:42 mupuf: well, arch already has the 4.13 kernel, I could update to that
18:43 karolherbst: a "glDrawRangeElements(mode = GL_TRIANGLES, start = 0, end = 23504, count = 68142, type = GL_UNSIGNED_SHORT, indices = 0x60)" call hangs the GPU
18:43 karolherbst: mupuf: sounds good
18:45 mupuf: 4.13.1-1-ARCH
18:45 mupuf: done
18:45 karolherbst: nice, thanks
18:45 karolherbst: mupuf: do you have the vbios uploaded of that GTX 660?
18:45 mupuf: yes
18:45 mupuf: nve6
18:46 karolherbst: ohh, okay
18:46 karolherbst: found it, sounds legit
18:47 karolherbst: I hope I won't need to undervolt the GPU to get it unstable due to high clocks
18:47 karolherbst: because then things might get tricky
18:47 karolherbst: because in doubt it might be unstable due to being undervolted
18:48 mr_sm1th: Is there anything Nvidia could do to hinder nouveau even more?
18:48 karolherbst: mr_sm1th: not providing signed gr firmwares
18:49 karolherbst: mr_sm1th: so maxwell reclocking is enabled on GM10X gpus, but not GM20x, which we could do theoretically on laptops or passively cooled GPUs
18:50 mr_sm1th: So Kepler is still best bet for a desktop GPU
18:51 karolherbst: yeah
18:51 karolherbst: a 780 ti is a pretty decent GPU still
18:51 mr_sm1th: Yes but it is very overpriced second-hand.
18:51 karolherbst: well yeah, but it's like a 1070
18:52 karolherbst: slower, but faster than a 1060
18:52 mr_sm1th: Really? I thought it was on par with a 1060
18:52 karolherbst: or maybe equally fast as the 1060
18:52 karolherbst: not _quite_ sure
18:52 mr_sm1th: ever so slightly faster I believe
18:52 karolherbst: the 1060 should have shitty memory speed compared to
18:53 Tom^: ok, order placed for a vega 56. karolherbst: so who wants this 780ti? it possibly has some vram/temp sensor or other weird wackiness going on, but it works and I'm using it daily, until once every second or third day it starts flickering and downclocking to ~100MHz core, even though it reports temps of 49C :p
18:53 Tom^: I'll pay the shipping
18:54 karolherbst: Tom^: we want it :p
18:54 karolherbst: mhh
18:54 Tom^: karolherbst: sure. I'll just need some address to send it to
18:54 karolherbst: Tom^: it downclocks through nouveau or was it nvidia being shitty?
18:54 karolherbst: ask mupuf
18:55 Tom^: it only downclocks and flickers on nvidia
18:55 karolherbst: although I think he has enough kepler already :D
18:55 Tom^: haven't had it happen on nouveau, unless those weird fps drops I had in cs:go ages ago are related.
18:55 karolherbst: Tom^: most likely we do something wrong on nouveau then :D
18:55 Tom^: no, it happens only on blob
18:55 Tom^: so you are doing it correct! :P
18:56 karolherbst: well in a perfect world, we should do things quite similar
18:56 karolherbst: and maybe we simply don't use some part of the GPU nvidia does
18:56 Tom^: tried google, seems quite a lot of 780tis suffered from bad vram over time.
18:56 karolherbst: and this part is broken
18:56 karolherbst: who knows
18:56 karolherbst: ha! NV50_PROG_OPTIMIZE=0 "fixed" the hang
18:56 karolherbst: my fav types of hangs
18:57 Tom^: mupuf: yeah, you need a sort of working 780ti?
18:57 karolherbst: Tom^: mhhh, or wait, maybe you could wait a few months and then I could give you an address
18:57 karolherbst: depending on what mupuf answers
18:57 Tom^: sure, I'll have it in the vega 56 box and pester you about it from time to time.
18:58 Tom^: I'll lay it in my closet until someone gets some use for it. and I'd rather send it to you for the good nouveau cause!
18:58 karolherbst: yeah like "debugging" games
18:58 karolherbst: :P
18:58 Tom^: =D
18:59 karolherbst: but now, it makes sense to have the most powerful kepler GPU to do benchmarking on
18:59 karolherbst: *no
18:59 karolherbst: best way to track down CPU bottlenecks as well
19:13 karolherbst: with mod folding disabled: ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp:282: void nv50_ir::CodeEmitterNVC0::emitPredicate(const nv50_ir::Instruction*): Assertion `i->getPredicate()->reg.file == FILE_PREDICATE' failed.
19:13 karolherbst: :(
19:24 karolherbst: anyhow, bug in MemoryOpt
19:37 karolherbst: something is odd while opting 4x export u32 to 1x export b128 or something similar, a little annoying
19:39 karolherbst: imirkin: are you aware of any possible issues related to that? the SSA version looks fine, so I assume something got messed up within RA
19:42 karolherbst: bad luck this happens within a geom shader with thousands of instructions
19:50 karolherbst: maybe one of you will notice something https://gist.github.com/karolherbst/c2ed0a32eaa6a8451ffa521e9254e214
21:30 RSpliet: karolherbst: thanks for all that. That was your Maxwell 1st gen? Do you know why volplosion compilation failed? RA or segfault?
21:34 karolherbst: RSpliet: I only have a nve6
21:34 karolherbst: RA
21:34 RSpliet: Ah ok, so I probably pushed GPR usage over a threshold with this sched work, and with the spilling problems around, that becomes a regression
21:35 RSpliet: Unfortunately I don't use liveness info in the heuristic, as the liveness analysis appears to be either pre-SSA or strongly tied into RA...
21:37 RSpliet: karolherbst: How big is Tesseract perf improvement? Are we talking ~5% or 0-2%?
21:37 karolherbst: improvements are usually quite big >5%
21:38 RSpliet: That's encouraging! I've observed more benefits on Kepler than Maxwell unfortunately, but there's room for more cleverness if I have another stab at it next weekend
21:40 RSpliet: (also, I treat a lot of tex instructions as loads, but tbh I don't have a clue what they do in practice. envydocs seemed to imply they are sort-of-load insns, but perhaps I should prioritise true ld over tex or vice versa... there's some room for playing around!)
21:41 karolherbst: RSpliet: well, they read from textures
21:43 RSpliet: I got that impression yes. Now to figure out what the implications are for perf
21:44 RSpliet: is it: textures are compressed, so transfers are relatively small -> lower latency on tex insns when compared with regular ld
21:44 RSpliet: or... textures need decompression, so although BW is high, latency is high too when compared to regular ld
21:44 RSpliet: or... for textures there's no coherence traffic on the caches, so latency of loads is lower compared to ld insn
21:44 RSpliet: :-P
21:59 glennk: last i checked on nv hardware its mostly alu limited, if you were talking about tesseract.gg
22:02 RSpliet: glennk: I presume that's with the official driver? I can imagine bottlenecks with nouveau will lie at different points
22:03 RSpliet: I'm pretty sure we don't utilise hw nearly as well as they do
22:03 glennk: right
22:03 RSpliet: it's an interesting observation nonetheless, it's nice to know that with a lot of manpower we should aim for being ALU-bound :-)
22:06 glennk: i did a fair amount of measuring on r600g, but on that hardware the main bottlenecks are in the fixed function hardware and i really couldn't do much about it
22:07 glennk: i forget if nouveau supports zcull yet?
22:08 glennk: but that and the stencil variant can save a bit of shader invocations
22:08 Marll: hello
22:08 Marll: why is that nouveau driver 1.5gb ? https://nouveau.pmoreau.org/
22:08 Marll: september 4
22:08 Marll: im a newb so definitely i misunderstand something
22:12 Marll: #archlinux
22:12 Marll: oops
22:12 Marll: wanted to join
22:13 pmoreau: Marll: It’s more than just the driver, it is a whole image, with a GUI and other software
22:13 Marll: so it's like a full distro?
22:13 pmoreau: I wouldn’t go that far either :-D
22:14 pmoreau: It’s a live CD/USB you can boot on to test the latest version of Nouveau, Radeon and Intel.
22:14 pmoreau: (It’s possible it’s not the very latest for Radeon and Intel, but at least it is for Nouveau.)
22:15 Marll: is there an installer to make it a stable installation on HDD?
22:15 pmoreau: It was mostly meant for users experiencing bugs on an older version of the kernel or Mesa, to try a newer version without having to build and install the software.
22:16 pmoreau: There is no installer.
22:16 Marll: I'm actually having troubles with Antergos myself
22:17 Marll: with nouveau but not really sure if the fault is on Nouveau side
22:17 Marll: might try that.
22:17 pmoreau: I guess you could download the image, and extract the packages for Mesa and Linux, if you don’t want to build them yourself. Or get the PKGBUILDs from the GitHub repository.
22:18 pmoreau: What kind of issues are you experiencing?
22:18 Marll: Icons are invisible and/or I think black and white colors are inverted in the DE
22:19 Marll: And in terminal
22:19 Marll: O_O
22:19 Marll: not on browser though for example
22:19 Marll: but I did get an error message when installing
22:19 Marll: so maybe there's that.
22:19 pmoreau: Do you remember the error?
22:20 pmoreau: I will have to go though, sorry :-/
22:20 Marll: It was something like "desktop environment has errors/problems running components" or something like that. Then I got 3 choices: Load default desktop, load fallback desktop? and 'cancel'
22:21 Marll: in fact I reinstalled the OS three times, each time with a different DE and each time I still got that error.
22:23 Marll: I just wonder why is that so? Isn't Antergos on latest <everything> ?
22:23 Marll: So why does Ubuntu 17.10 have an advantage in many places?
22:26 pmoreau: I won’t be able to look at it, but you can open a bug report, and attach the kernel and Mesa versions you are using, which GPU it is, as well as the output of dmesg and the content of Xorg.0.log.
22:27 pmoreau: Sorry, really need to go now.
22:28 Marll: np at all, thanks!
22:57 doctorpepper: hi guys
22:58 doctorpepper: can anyone help me please? I am trying to offload 3D rendering to the nouveau driver on my laptop (with an nvidia graphics card), but for some reason I don't get the right output from xrandr --listproviders
22:58 doctorpepper: Providers: number : 2
22:58 doctorpepper: Provider 0: id: 0x74 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 2 associated providers: 1 name:modesetting
22:58 doctorpepper: Provider 1: id: 0x3f cap: 0x5, Source Output, Source Offload crtcs: 0 outputs: 0 associated providers: 1 name:modesetting
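For comparing provider listings like the one pasted above, the lines can be split into fields. A minimal sketch in Python, assuming the single-line layout shown in the paste; `parse_providers` and the regular expression are illustrative and not part of xrandr:

```python
import re

# Matches one provider line in the layout pasted above (assumed layout).
PROVIDER_RE = re.compile(
    r"Provider\s+(?P<index>\d+):\s+id:\s+(?P<id>0x[0-9a-fA-F]+)\s+"
    r"cap:\s+(?P<cap>0x[0-9a-fA-F]+)(?:,\s*(?P<caps>.*?))?\s+"
    r"crtcs:\s+(?P<crtcs>\d+)\s+outputs:\s+(?P<outputs>\d+)\s+"
    r"associated providers:\s+(?P<assoc>\d+)\s+name:(?P<name>\S+)"
)


def parse_providers(text):
    """Return one dict per provider line found in `text`."""
    providers = []
    for m in PROVIDER_RE.finditer(text):
        caps = m.group("caps")
        providers.append({
            "index": int(m.group("index")),
            "id": int(m.group("id"), 16),
            "cap": int(m.group("cap"), 16),
            # Capability names as printed, e.g. "Source Offload".
            "caps": [c.strip() for c in caps.split(",")] if caps else [],
            "crtcs": int(m.group("crtcs")),
            "outputs": int(m.group("outputs")),
            "associated": int(m.group("assoc")),
            "name": m.group("name"),
        })
    return providers


if __name__ == "__main__":
    import subprocess
    listing = subprocess.run(["xrandr", "--listproviders"],
                             capture_output=True, text=True, check=True).stdout
    for p in parse_providers(listing):
        print(p["index"], p["name"], p["caps"])
```

On the paste above this yields two providers, both named `modesetting`; the second one lists only "Source Output" and "Source Offload" and has no CRTCs or outputs.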