04:40 gnurou: skeggsb: sent v2 of secboot refactoring your way - I'm happier with how this version turned out, much clearer IMHO (at least the end result - intermediate states are sometimes so-so)
05:05 kingsley: First of all, thanks for maintaining open source.
05:06 kingsley: I've been using nouveau for some time now, and it has definitely been workable.
05:08 kingsley: I recently stumbled across the image corruption shown at http://loaner.com/corrupted_texture.png
05:09 kingsley: It happens when I load a big 146 MB .png image file into the animation program named "blender".
05:11 kingsley: I originally reported the bug against blender at https://developer.blender.org/T49847
05:12 kingsley: However, two other people have been unable to elicit the bug. One uses Windows and the other Linux, with nVidia's driver.
05:13 kingsley: I worked around the bug with a smaller, lower resolution file. Instead of 10000 pixels horizontal resolution, it had "only" 4000.
05:14 kingsley: However, there's a bug, and I thought it might be helpful to give a heads up to the right people who can fix it.
05:15 kingsley: I checked the troubleshooting tips at https://nouveau.freedesktop.org/wiki/TroubleShooting, and the dependencies at https://nouveau.freedesktop.org/wiki/
05:17 kingsley: The image is corrupted with an nVidia G72 [GeForce 7200 GS / 7300 SE].
05:21 kingsley: I found some software requirements that are currently unsupported by my linux distro: basically Debian unstable. It only has version 4.7 of the kernel, but evidently 4.8 is required. It only has version 2.4.26-1 of libdrm-nouveau1a, but evidently 2.4.70 is required.
05:22 kingsley: I'm reluctant to ask Debian's developers to upgrade because I'm not certain it's a bug in Nouveau.
05:23 kingsley: So, that's basically what I think.
05:23 kingsley: I tailed log files while eliciting the bug and saw nothing.
05:24 kingsley: I suppose simply swapping out the video card for another might be an option too, but fixing the bug appeals to me.
05:24 kingsley: What do you think?
07:19 kingsley: I'm happy to report I seem to have worked around the bug by doing
07:19 kingsley: $ LIBGL_ALWAYS_SOFTWARE=1 blender
07:20 mupuf: kingsley: that means you have no acceleration at all :D
07:21 mupuf: GEForce3 :o
07:22 kingsley: mupuf: Yes, I concede it may not have the gripping drama of 100 FPS, but it gives me the will to live, and ...
07:22 kingsley: seems to me to point to a bug in LibGL instead of the app that uses it (blender).
07:22 mupuf: sure, it is quite likely
07:23 mupuf: it is also quite likely that blender is trying to use a non-power-of-two texture directly
07:23 mupuf: or it splits the textures into many textures but you end up not having enough binding IDs
07:24 mupuf: it is possibly a hw limitation we will have big troubles avoiding aside from reverting to using the CPU
07:24 mupuf: but that would require a lot of architecture work to allow the CPU to handle part of the rendering
07:24 mupuf: but thanks for providing a blend file, someone with more recent hw could test it
07:25 mupuf: imirkin_ would have access to such old hw and may be able to help
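mupuf suspects blender may hand the driver a non-power-of-two (NPOT) texture as-is. One common workaround on hardware with limited NPOT support is to pad each dimension up to the next power of two. Below is a minimal sketch of just the size computation. The helper names are hypothetical, and this is not what blender or Mesa actually do:

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n, for n >= 1."""
    if n < 1:
        raise ValueError("dimension must be >= 1")
    return 1 << (n - 1).bit_length()


def padded_texture_size(width: int, height: int) -> tuple:
    """Hypothetical helper: pad both texture dimensions up to powers of two."""
    return next_power_of_two(width), next_power_of_two(height)


print(padded_texture_size(4000, 3000))
```

Padding cannot get around a hard maximum texture size, though. The 10000-pixel-wide image mentioned earlier would pad to 16384.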
07:28 kingsley: mupuf: What would you say, to the hypothetical circumstance, where someone using the nick "kingsley", on a Freenode channel named "#nouveau", asked which, if any, video card(s) is/are particularly well supported, after finding that LIBGL_ALWAYS_SOFTWARE=1 worked around a bug? What would you say in that theoretical scenario?
07:28 mupuf: lol
07:29 kingsley: I suppose that's an answer too.
07:29 mupuf: first, wait for someone to check if a newer GPU handles this case
07:29 mupuf: after this, I assume you only have an AGP and PCI motherboard, right?
07:31 mupuf: kingsley: this would be a problem
07:31 mupuf: otherwise, just buying the cheapest nvidia gpu you can find would work
07:31 mupuf: you can get some for 30e, new
07:31 mupuf: or for free if you ask nicely some colleagues and friends
07:32 mupuf: and donate your geforce 3 to us, :p
07:34 kingsley: mupuf: I'll try to reply to your comments in the same order as you reasonably chose.
07:39 kingsley: Yes, I believe my ~9 year old MoBo has PCI. AGP I'm not so sure about.
07:40 kingsley: I believe my GeForce card already uses an nVidia GPU, so I think it would be hard to find a cheaper one. ;-)
07:42 kingsley: mupuf: Last, but not least, there's a cool used computer parts store near where I live, and it's a pretty straightforward matter to scrounge through its bins for bargain basement hardware. Walking out of there with a handful of $US 5 video cards is not out of the question.
07:44 mupuf: ah ah
07:45 mupuf: but yeah, PCI GPUs are hard to find nowadays
07:45 mupuf:would get a new machine at this point
07:45 mupuf: almost all of them from the recycling bin would be faster
07:46 mupuf:pesters at his skylake laptop for not compiling fast-enough
07:46 mupuf: I can't even imagine how frustrated I would be with your machine
07:53 mwk:managed to find a PCI GF119
07:53 kingsley: For what it's worth, an outfit named Raptor Engineering is crowdfunding open source computer hardware. It's called their Talos Secure Workstation. It uses IBM's Power8 CPUs, is reasonably fast, and has no suspicious "management engines" that might invade one's privacy like Intel and AMD, or my gazillion year old computer.
07:53 mwk: quite a surprise, but nice
07:53 mupuf: mwk: oh, nice!
07:54 kingsley: mwk: What's the nice surprise?
07:54 mwk: kingsley: such a new GPU on a PCI bus
07:54 mupuf: darn, that could be a good addition to reator!
07:55 mupuf: http://www.dx.com/p/nvidia-geforce-gt610-gf119-2048mb-64-bit-ddr3-pci-express-x16-graphic-card-black-157417
07:55 mwk: yup
07:55 mwk: any card that doesn't take up a PCIE/AGP slot is awesome :)
07:55 karolherbst: mupuf: how expensive :D
07:55 mwk: that one is pci-e though
07:55 karolherbst: and garbage
07:55 mupuf: $72 apparently
07:55 karolherbst: :p
07:55 mupuf: oh, right
07:56 karolherbst: 150 GFLOPS, yo
07:56 mwk: mine was made by zotac fwiw
07:57 mwk: mupuf: btw, if you have a free PCIE ×1 or ×4 slot in reator... these can also be filled :)
07:57 mupuf: oh oh
07:57 mwk: I got a PCIE×1 G86 too
07:57 mupuf: nice
07:58 mwk: it's low-profile though
07:58 mwk: but then, I doubt it's a problem for you :p
08:09 mupuf: it's great you mean
08:18 kingsley: An old thread at https://lists.freedesktop.org/archives/dri-devel/2011-March/009477.html suggests blender ignores an error returned by OpenGL when textures are larger than 4096 x 4096 pixels. I can't vouch for the rest of the thread, but I also worked around the bug by lowering the image's horizontal resolution to 4096 pixels. 4097 caused corruption.
09:02 mupuf: I see
09:10 mupuf: this is worse than I thought ;)
09:11 mupuf:thought blender would cut the texture into tiles on its own
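mupuf expected blender to tile large images itself. Below is a minimal sketch of that idea, covering a W x H image with rectangles no larger than the maximum texture size. The default of 4096 comes from kingsley's experiment above (4096 worked, 4097 corrupted). `split_into_tiles` and the 5000-pixel height are hypothetical, and real code would also have to deal with filtering across tile seams:

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height)


def split_into_tiles(width: int, height: int, max_size: int = 4096) -> List[Tile]:
    """Cover a width x height image with tiles of at most max_size per side."""
    tiles: List[Tile] = []
    for y in range(0, height, max_size):
        for x in range(0, width, max_size):
            # Edge tiles are clipped to whatever remains of the image.
            tiles.append((x, y, min(max_size, width - x), min(max_size, height - y)))
    return tiles


# A 10000-pixel-wide image needs three columns of tiles: 4096 + 4096 + 1808.
for tile in split_into_tiles(10000, 5000):
    print(tile)
```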
09:53 MiniJack: Hi, is imirkin here?
09:57 RSpliet: Probably not, it's 6 in the morning there
09:58 MiniJack: damn, I was reading a log about how this appeared after running a java application: java: pushbuf.c:727: nouveau_pushbuf_data: Assertion `kref' failed.
10:08 RSpliet: MiniJack: you can always dump your actual question here. imirkin reads logs, and so do others that might have useful input ;-)
12:05 karolherbst: uhhhhhh
12:05 karolherbst: I found what is causing that mmiotrace bug… I think
12:07 karolherbst: silly hash collisions
12:21 mupuf: :o
12:35 xexaxo1: RSpliet: the probable question/answer of the java + pushbuf.c assertion is = hello multithreaded GL, try ilia's locking branch ;-)
12:36 xexaxo1: speaking of which imirkin: as a quick check have you tried - dropping the nouveau_drm_screen_create export ?
12:37 xexaxo1: that will 'break' the gl-vdpau interop for dri2, but things should just work (at least on paper) with dri3.
13:07 karolherbst: mupuf: guess what, mmiotrace only hashes the lower 32 bits ;) I doubt though that is the issue we for example get while tracing nouveau, but yeah...
13:07 karolherbst: fun if you have multiple 1GB pages
13:15 mupuf: the lower 32 bits of what?
13:32 RSpliet: xexaxo1: ah that one, yeah...
13:32 mupuf: karolherbst: the lower 32 bits of what?
13:41 karolherbst: mupuf: the page offset
13:42 karolherbst: and then there are just 1024 values possible due to the low has table size
13:42 karolherbst: *hash
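karolherbst's finding is that mmiotrace keys its lookup on only the lower 32 bits, in a table of just 1024 buckets. The sketch below shows why that hurts with several 1 GiB mappings: addresses that differ only above bit 31 land in the same bucket. The bucket function (shift by an assumed 4 KiB page size, then modulo 1024) is a hypothetical stand-in, not the actual kernel code:

```python
PAGE_SHIFT = 12        # assumed 4 KiB pages
HASH_BUCKETS = 1024    # table size mentioned above


def bucket(addr: int) -> int:
    """Hypothetical bucket: drop bits above 31, then index by page number."""
    low32 = addr & 0xFFFF_FFFF
    return (low32 >> PAGE_SHIFT) % HASH_BUCKETS


# Two 1 GiB-aligned addresses exactly 4 GiB apart share their lower 32 bits.
a = 0x0_4000_0000
b = 0x1_4000_0000
print(hex(a), bucket(a))
print(hex(b), bucket(b))
print("collision:", bucket(a) == bucket(b))
```

With only 1024 buckets some collisions are unavoidable anyway, so a lookup still has to compare full 64-bit addresses; one that compares only the truncated value would confuse the two mappings.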
13:54 imirkin: xexaxo1: what would be the advantage of doing that?
13:57 xexaxo1: imirkin: less bug reports or crashes at least :-)
13:57 xexaxo1: either way it's not my call, just throwing some (workaround) ideas
13:57 imirkin: xexaxo1: haven't had coffee. could you explain it for a pre-coffee brain?
14:08 imirkin: (i.e. explain why this would cause less bug reports/crashes)
14:08 imirkin: s/less/fewer/
14:09 imirkin: stupid english and its mass nouns.
14:50 xexaxo1: imirkin: people will either have the dri3 working case or the dri2 non-working one.
14:51 xexaxo1: in the latter case one could just spam stderr/stdout with something like "XX not supported try YY"
14:51 xexaxo1: s/one could/mesa could/
14:54 xexaxo1:might be relying too much on people looking at those and the app not eating them ;-)
15:03 imirkin_: xexaxo1: i still don't understand what bug reports/crashes this would address
15:07 xexaxo1: reports -> closed as "not supported, support planned", crashes -> since the pushbuf is no longer shared, things should no longer crash.
15:09 xexaxo1: not 100% sure if the latter is correct, but shouldn't be too hard to test (considering one's nouveau system isn't out of commission :-\)
15:10 xexaxo1: obviously one might want to drop the symbol from the dri-vdpau.dyn and dri/vdpau.sym files...
15:10 xexaxo1: alongside demoting the symbol to private.
15:11 imirkin_: xexaxo1: i'm still unsure which issue you're talking about. the mpv thing?
15:11 imirkin_: xexaxo1: or the fact that vdpau is totally broken for nouveau right now?
15:11 imirkin_: if the latter, that's wholly unrelated to gl-vdpau interop
15:12 xexaxo1: former.
15:13 xexaxo1: at the same time, it might also help on the multithreaded GL front ...
15:13 xexaxo1: again, not 100% sure but it should be trivial to check ;-)
15:13 imirkin_: xexaxo1: so basically your suggestion is to just disable support for vdpau interop because one application happens to use it in a threaded environment?
15:14 xexaxo1: yes, nice one isn't it ;-)
15:14 imirkin_: ok. in that case, i disagree with that approach. and i don't see how dri2 vs dri3 fits into this.
15:14 xexaxo1: don't forget the MT GL wild idea ;-)
15:14 imirkin_: or were you just proposing that as a mechanism for performing the disabling of vdpau interop without killing vdpau entirely?
15:15 xexaxo1: fair enough, I'm not saying you have to agree.
15:15 xexaxo1: just something to try in 2-3 mins :-)
15:15 xexaxo1:goes to prod marek to send a patch for his take on the vdpau perf. degradation topic
16:34 netz: heyo :)
17:41 netz: fnodeuser: hello :)
17:42 imirkin_: hakzsam_: can you add a 0x130 for GP100 and make sure it's listed first of the GPxxx's?
17:42 fnodeuser: hi
17:42 imirkin_: hakzsam_: that way one can do GP100-
17:42 hakzsam_: just before 0x137?
17:43 imirkin_: hakzsam_: yes. also i'd flip the order around to be the logical one :)
17:43 imirkin_: not sure why you added things backwards like that.
17:43 hakzsam_: yeah, but the current order is chronological I guess, right?
17:43 imirkin_: GP107 didn't come first.
17:43 imirkin_: GP100 did
17:43 imirkin_: and then GP104
17:43 imirkin_: and then probably GP106, and then GP107 :)
17:43 imirkin_: dunno where GP102 is in that list
17:44 imirkin_: mostly it's to have logical chip ranges in variants
17:44 hakzsam_: ah
17:44 imirkin_: so you can say like variants=":GP100" or whatever
17:44 hakzsam_: my bad
17:44 hakzsam_: I read the wrong column :)
17:44 hakzsam_: I will re-order
17:44 imirkin_: thanks
17:45 hakzsam_: so, should be 0x104, 0x106, 0x102, 0x107 according to https://en.wikipedia.org/wiki/GeForce_10_series
17:46 hakzsam_: but 0x102, 0x104, 0x106 and 0x107 makes more sense to me
17:47 hakzsam_: s/0/3 or remove 0x, but you get the idea :)
17:47 hakzsam_: imirkin_: http://hastebin.com/wibetuqicu ok?
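The ordering matters because a variant range such as "GP100-" or ":GP100" is resolved against each chip's position in the list, not its release date. Below is a minimal sketch of that idea. It assumes "X-" selects X and everything listed after it, and ":X" selects everything listed before X; the exact rnndb parsing rules are not reproduced here, and the chip list is illustrative:

```python
CHIPS = ["GM107", "GM200", "GP100", "GP102", "GP104", "GP106", "GP107"]


def resolve_variants(spec: str, chips=CHIPS) -> list:
    """Resolve "X-" (X and later) or ":X" (before X) against list order."""
    if spec.endswith("-"):
        start = chips.index(spec[:-1])
        return chips[start:]
    if spec.startswith(":"):
        end = chips.index(spec[1:])
        return chips[:end]
    return [spec]  # a single named chip


print(resolve_variants("GP100-"))
print(resolve_variants(":GP100"))
```

With GP100 listed first among the Pascal chips, "GP100-" covers all of them. If the Pascal entries were in reverse order (GP107 first), the same spec would select only GP100.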
17:49 netz: so, is FeatureMatrix up to date for maxwell cards? installing gentoo on a spare disk for learning and fun and thinking to give nouveau a go on it
17:49 netz: specifically I have a GTX 750 ti
17:49 hakzsam_: no
17:49 hakzsam_: we need to update it
17:50 hakzsam_: tess and EXA are DONE
17:50 netz: yays.
17:51 hakzsam_: well, tess is part of mesa 13 and EXA will be pushed very soon by imirkin_ :)
17:53 hakzsam_: netz: I have just updated the page for tess
17:53 netz: hakzsam_: cool beans :D
17:54 netz: so, EXA vs XRender, since you've not mentioned the latter, what can I expect to not work optimally without XRender?
18:03 hakzsam_: no clue about that, but imirkin_ knows for sure
18:04 hakzsam_: but I guess XRender is also done
18:06 netz: oh cool :D
18:06 hakzsam_: as well as xv
18:07 hakzsam_: but you will need to build your own version of the DDX
18:07 hakzsam_: or wait for the next release
18:08 netz: ddx in this context is?
18:08 hakzsam_: xf86-video-nouveau
18:08 netz: ah gotcha.
18:09 netz: how's nouveau+wayland in general?
18:09 hakzsam_: the latest version is 1.0.13 and EXA for maxwell will be in 1.0.14
18:09 hakzsam_: I have never tested myself
18:12 karolherbst: huh, I don't get the ordering in rnndb of GP
18:12 netz: appreciate the honesty. I could probably use a dev version in gentoo, so that shouldn't be too much of an issue.
18:12 karolherbst: gp107 is clearly the youngest one
18:12 hakzsam_: karolherbst: yeah, the order is not correct, I read the wrong column when I wrote the patch :)
18:13 karolherbst: gp100, gp104, gp106, gp102, gp107 if you care about release dates
18:13 hakzsam_: I have a new one which will fix it
18:13 karolherbst: but honestly?
18:13 karolherbst: since kepler it doesn't matter anymore
18:13 karolherbst: and I would hope the same holds for pascal
18:13 hakzsam_: yeah, but I would prefer 102 after 100, to do GP100-
18:14 karolherbst: gp100 is the oldest anyway
18:14 hakzsam_: yup
18:14 karolherbst: so it really doesn't matter
18:14 karolherbst: why make it complicated if there is no benefit?
18:14 hakzsam_: 102 after 100 is not complicated ;)
18:14 karolherbst: true
18:15 karolherbst: but 102 was released after 106
18:15 hakzsam_: but wtvr
18:15 hakzsam_: we want to GP100- that's it
18:15 karolherbst: right
18:15 karolherbst: and I would order them numericly
18:15 karolherbst: *numerically
18:15 hakzsam_: yes
18:15 karolherbst: will check the gmail mmiotrace account, I hope for pascal traces
18:16 karolherbst: \o/ right pw at first try
18:16 hakzsam_: I want pascal traces with compute
18:16 netz: so regarding nvidia shelling out the firmware into the linux-firmware tree, what gpu generations/codenames/whatever does that cover?
18:17 hakzsam_: pmoreau: could you try again to record traces?
18:17 karolherbst: netz: maxwell2+
18:17 hakzsam_: imirkin_: updated
18:18 netz: karolherbst: 2+ here meaning?
18:18 karolherbst: gm20x
18:18 karolherbst: 2nd gen maxwell
18:19 netz: ah
18:19 karolherbst: :O
18:19 karolherbst: awesome
18:19 karolherbst: fermi SLI laptop traces + vbios
18:19 karolherbst: for nvc8
18:19 netz: NV117 (GM107) < too early, eh?
18:19 karolherbst: netz: you are better off without them
18:20 karolherbst: aka reclocking support
18:20 netz: oh?
18:20 karolherbst: well
18:20 karolherbst: guess what, we can reclock 2nd gen maxwell gpus
18:20 karolherbst: but guess what we can't do
18:20 karolherbst: controlling fans
18:20 netz: heh.
18:20 netz:busts out the waterblocks
18:20 karolherbst: so, yeah
18:21 netz: as I mentioned in my last visit here I'm prolly gonna be jumping ship to rx480 for my next build anyways, so it's moot in the long term.
18:22 karolherbst: netz: you could try to reclock your gpu though with current nouveau master
18:23 netz: sure. are you guys needing testers in general, then?
18:23 karolherbst: yes
18:23 karolherbst: maxwell reclocking support will come with 4.9
18:23 karolherbst: mhh
18:23 karolherbst: 4.10
18:23 karolherbst: and it isn't really much tested
18:24 netz: sure indeed. I have a fair amount of free time and I'm always down to further open source :)
18:25 karolherbst: netz: what kernel are you currently running?
18:27 karolherbst: I want pascal traces, not fermi stuff
18:28 netz: karolherbst: 4.8.sommat
18:28 karolherbst: netz:
18:28 karolherbst: netz: good
18:28 netz: 4.8.4
18:29 karolherbst: netz: then you can simply use my master_4.8 branch
18:29 karolherbst: https://github.com/karolherbst/nouveau/commits/master_4.8
18:29 karolherbst: run make inside drm
18:29 netz: karolherbst: atm I'm working on the gentoo install, should I target kernel 4.8 on that, then?
18:29 karolherbst: install the nouveau/nouveau.ko file
18:29 karolherbst: netz: it isn't a kernel
18:29 karolherbst: but an out of tree module
18:30 karolherbst: ohhh
18:30 karolherbst: I see what you mean
18:30 karolherbst: yeah, 4.8 would be best
18:30 karolherbst: nouveau master usually depends on the drm-next tree, which is a little bit more tricky to install
18:40 netz: noted.
18:41 Yoshimo: pmoreau: have you played with the loop karol suggested a few days ago when we were trying the getpmu code?
19:19 dcomp: maxwell reclocking for 4.10 confirmed?
19:20 netz: dcomp: yep, according to what karolherbst told me earlier
19:20 imirkin_: 4.10 merge window isn't open yet
19:21 imirkin_: that doesn't happen until 4.9 is released
19:21 imirkin_: lots of things can happen between now and then.
20:27 kingsley: What's a good PCI Express card?
20:27 AmarokNelg: kingsley: Since you're asking here, something nvidia :P
20:29 imirkin_: kingsley: get AMD. it's well-supported, unlike NVIDIA.
20:29 kingsley: A specific model number, or ideally numbers, would be convenient.
20:30 imirkin_: kingsley: that all depends on your parameters for 'good', and for AMD, you can ask in #radeon.
20:31 kingsley: imirkin: Duly noted.
20:33 kingsley: My idea of 'good' is a video card supported by my gear: A MoBo with a PCI Express port running open source software, namely, Debian Linux and Nouveau. Bonus points for fast.
20:34 imirkin_: GTX 780 Ti is the fastest NVIDIA gpu supported by nouveau (including reclocking).
20:34 imirkin_: (fastest consumer GPU... Tesla K40/K80 might be faster, but they're just a wee bit more expensive...)
20:35 karolherbst: the 780 ti is quite fast though
20:35 imirkin_: right. if those teslas are faster, it's not by much. they might have more vram and/or ecc vram, etc
20:36 imirkin_: however i would definitely recommend getting an AMD gpu if open source support is something that's important to you.
20:37 kingsley: imirkin: I'm checking out the GTX 780 Ti's technical specifications.
20:38 kingsley: It evidently uses the PCI Express 3.0 bus.
20:38 imirkin_: yep, should be a double-width pcie x16 board
20:41 kingsley: Do you happen to know if it would be compatible with my old MoBo's PCI Express 2.0 x16 slot?
20:41 imirkin_: yes, it is.
20:42 imirkin_: it'll be (very) marginally slower than it might be on a PCIe 3.0 bus
20:42 imirkin_: but i'm 99.9% sure it'll work on a PCIe 2.0 bus
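To put the PCIe 2.0 vs 3.0 difference in numbers: the theoretical per-direction bandwidth of an x16 link follows from the transfer rate and the line encoding (PCIe 2.0: 5 GT/s with 8b/10b; PCIe 3.0: 8 GT/s with 128b/130b). These are link maxima, not measured GPU performance:

```python
def link_bandwidth_gbytes(gt_per_s: float, payload_bits: int, coded_bits: int,
                          lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe link."""
    # Each transfer carries one coded bit per lane; encoding overhead removes some.
    bits_per_s_per_lane = gt_per_s * 1e9 * payload_bits / coded_bits
    return bits_per_s_per_lane * lanes / 8 / 1e9


pcie2 = link_bandwidth_gbytes(5.0, 8, 10)      # 8b/10b encoding
pcie3 = link_bandwidth_gbytes(8.0, 128, 130)   # 128b/130b encoding
print(f"PCIe 2.0 x16: {pcie2:.2f} GB/s")
print(f"PCIe 3.0 x16: {pcie3:.2f} GB/s")
```

That is about 8 GB/s versus about 15.75 GB/s. How much of the gap shows up in practice depends on how much data a workload streams across the bus rather than keeping in VRAM.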
20:43 kingsley:has a vague recollection that PCI Express buses are different than PCIe.
20:43 imirkin_: PCIe = PCI Express
20:45 kingsley: imirkin: Thank you very much for taking the time to share your detailed thoughts. Your generosity and know-how are both fine qualities.
20:45 imirkin_: note that it'll require a bunch of additional power
20:45 imirkin_: so make sure your PSU is up to the job
20:45 kingsley: If all goes well, I'll get a chance to root around for one in some used hardware bins later today.
20:46 imirkin_: mmmm ... i doubt you'll find one for less than like $200 or so
20:46 kingsley: OK, thanks for the power and price tips.
20:47 imirkin_: it's the highest-end consumer Kepler card
20:47 imirkin_: which is 2 generations behind the recently-released Pascal
21:00 karolherbst: one thing: more cores > higher clocks ;)
21:00 karolherbst: we have some issues with really high clocks
21:00 karolherbst: an OC 770 will most likely fail to reclock right
21:01 karolherbst: ohh, you decided on the 780 ti already :)
21:01 karolherbst: nice
21:01 karolherbst: if you have any issues to reclock on the recent master, feel free to ping me
21:02 hakzsam_: karolherbst: gm107 exposes more perf counters :)
21:02 hakzsam_: like warp_execution_efficiency
21:02 hakzsam_: I'm implementing MP perf counters in nvc0
21:06 karolherbst: yeah!
21:06 hakzsam_: http://hastebin.com/zevoxaxaxu.makefile
21:06 hakzsam_: this one is NICE
21:07 karolherbst: !!!
21:07 karolherbst: perfect for implement a scheduler
21:07 karolherbst: *implementing
21:07 imirkin_: or at least comparing scheduler perf
21:07 hakzsam_: that's exactly why I implement those perf counters
21:07 hakzsam_: yep
21:07 karolherbst: hakzsam_: mind checking the SR3/SR4 traces with that one?
21:07 hakzsam_: hey, not implemented yet
21:07 hakzsam_: WIP state :)
21:07 karolherbst: ohh I only have a SR3 trace on my google drive
21:08 karolherbst: yeah, okay, but this would be a good test
21:08 hakzsam_: sure
21:08 karolherbst: because nouveau is exceptionally bad there
21:08 imirkin_: orbea had a nice example of a situation where i saw one of those utilization metrics go to 10%
21:08 imirkin_: something in dolphin with a GS or something
21:08 karolherbst: hakzsam_: but do you know what we really need? A way to read out counters on nvidia and nouveau for the same trace and compare
21:09 imirkin_: i glanced at the shaders, it wasn't doing anything *obviously* crazy
21:09 karolherbst: mhh
21:09 karolherbst: I could imagine that zcull is really important for GS
21:09 hakzsam_: imirkin_: understanding what counters return is not trivial :/
21:09 imirkin_: hakzsam_: well ... it was at 40-50%
21:09 imirkin_: and then when the "slow" section came on, it dropped to 10%
21:10 imirkin_: i'm guessing it was related
21:10 hakzsam_: yes, but this can be related to many things
21:10 hakzsam_: that's why I think it's not easy
21:10 imirkin_: agreed
21:10 imirkin_: there are also various parameters around how things are dispatched
21:11 hakzsam_: karolherbst: with apitrace+perfkit that's doable
21:11 hakzsam_: imirkin_: yes
21:11 imirkin_: so it could just be that we're not configuring the dispatch of GS/etc optimally
21:11 imirkin_: [well, we're not configuring it at all]
21:11 orbea: imirkin_: it was the maps in metroid prime. here is the trace again http://ks392457.kimsufi.com/orbea/stuff/trace/dolphin-emu_metroid-prime-map.trace.xz
21:12 imirkin_: hakzsam_: feel free to take a look --^
21:12 hakzsam_: sure
21:12 imirkin_: and see if anything obvious pops out
21:12 hakzsam_: orbea: do you remember which metrics you tried?
21:12 imirkin_: i don't think he tried any. i did though. but i don't remember which :)
21:12 hakzsam_: oh okay
21:12 hakzsam_: you used perf counters, super-nice :)
21:12 orbea: yea, I didn't try anything like that
21:13 hakzsam_:still have to put info into the wiki
21:15 hakzsam_: imirkin_: but without graphics perf counters it's hard (still downloading the trace though)
21:15 hakzsam_: because the ones I currently expose are more compute-related
21:16 imirkin_: agreed
21:16 hakzsam_: but now we have compute shaders :)
21:16 hakzsam_: so we could eventually write piglit tests, enable AMD_perf_monitor and check the values
21:17 hakzsam_: to make sure the given counter is correctly configured
21:17 imirkin_: hakzsam_: something to figure compute shader dispatches might be nice
21:17 imirkin_: i wonder if we should just keep track of it "by hand"
21:17 imirkin_: since it's a pretty fixed number
21:18 hakzsam_: yeah
21:18 imirkin_: i.e. in ->dispatch_compute, just add to a local counter :)
21:18 imirkin_: well, screen-local
21:19 hakzsam_: I'm not 100% sure I understand what you mean. Do you want to profile compute shaders?
21:19 imirkin_: no
21:19 imirkin_: ARB_pipeline_statistics_query
21:19 imirkin_: or whatever
21:19 hakzsam_: ah right
21:19 imirkin_: introduces a counter for CS invocations
21:19 hakzsam_: that test still fails
21:19 imirkin_: which we don't have in hw
21:19 imirkin_: or at least don't know where to find
21:19 hakzsam_: we don't know :)
21:19 imirkin_: but the # of invocations should be pretty straightforward to compute
21:20 hakzsam_: yeah
21:20 imirkin_: oh hrm….. indirect dispatch can screw that up, huh
21:20 hakzsam_: well we can do that in launch_grid I think
21:20 imirkin_: what if num_groups is in an indirect buffer
21:20 hakzsam_: right
21:21 imirkin_: we could write some macro magic to accumulate it somewhere of course
21:21 hakzsam_: it's doable
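imirkin_'s idea for the compute-shader invocation counter of ARB_pipeline_statistics_query is to count in the driver instead of hardware: each direct dispatch adds block size times grid size to a screen-local counter. An indirect dispatch keeps num_groups in a GPU buffer, so the CPU cannot add it up at dispatch time. The sketch below models that in Python. `Screen`, `launch_grid`, and their fields are hypothetical stand-ins for the Mesa/nouveau structures, not their real layout:

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Tuple

Dim3 = Tuple[int, int, int]


@dataclass
class Screen:
    """Hypothetical screen-local state for the statistics counter."""
    cs_invocations: int = 0
    has_unknown_indirect: bool = False


def launch_grid(screen: Screen, block: Dim3, grid: Optional[Dim3]) -> None:
    """Count invocations for a dispatch; grid=None stands for an indirect dispatch."""
    if grid is None:
        # num_groups lives in a GPU buffer; the CPU cannot see it here, so this
        # case would need GPU-side accumulation (the "macro magic" idea).
        screen.has_unknown_indirect = True
        return
    screen.cs_invocations += prod(block) * prod(grid)


screen = Screen()
launch_grid(screen, block=(64, 1, 1), grid=(16, 16, 1))   # 64 * 256 invocations
launch_grid(screen, block=(8, 8, 1), grid=(4, 1, 1))      # 64 * 4 invocations
print(screen.cs_invocations, screen.has_unknown_indirect)
```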
21:21 hakzsam_: the best thing would be to understand how blob does it in hw
21:22 hakzsam_: but the MMT was weird...
21:22 hakzsam_: imirkin_: karolherbst full list of events+metrics on gm107 http://hastebin.com/acetevenab.nginx
21:23 hakzsam_: mupuf: you will like as well ^
21:24 hakzsam_: all the _utilization will be very nice to have
21:24 hakzsam_: but I bet I can't for now
21:31 hakzsam_: orbea: thanks for the trace, it's a very good example
21:32 mupuf: hakzsam_: you found the missing mux?
21:32 orbea: np, im glad its useful :)
21:32 hakzsam_: mupuf: yes,
21:33 mupuf: ahhh! How did you find it?
21:33 mupuf: gm107 plugged, btw
21:34 hakzsam_: looking at a MMT trace
21:34 hakzsam_: it's currently semi-working
21:34 hakzsam_: I still have to RE all perf counters
21:34 hakzsam_: and make sure my shader is correct
21:43 hakzsam_: imirkin: not sure what's wrong with that trace, I don't really have more time to investigate right now
21:43 hakzsam_: but I will store the trace somewhere :)
22:02 imirkin_: hakzsam_: ok, no worries. i have no clue what's wrong with it either. but it very clearly is wrong ;)
22:02 hakzsam_: right
22:16 karolherbst: hakzsam_: inst_issued0-2?
22:17 hakzsam_: ?
22:17 karolherbst: why are there 3 with numbers
22:17 karolherbst: I mean, single issue and dual issue makes sense, or does "0" mean, no issue?
22:17 imirkin_: yep
22:17 karolherbst: ahh
22:17 hakzsam_: yes
22:28 hakzsam_: inst_issued2 is 0 even with piano
22:28 hakzsam_: but not totally unexpected
22:29 karolherbst: on maxwell?
22:30 hakzsam_: yes
22:30 karolherbst: I see
22:30 karolherbst: mhh, maybe it has to be explicitly stated in the sched opcodes?
22:30 karolherbst: just like on kepler
22:30 imirkin_: karolherbst: it does.
22:30 imirkin_: stall 0 == dual-issue
22:30 karolherbst: ohh right
22:30 karolherbst: makes sense
22:31 karolherbst: hakzsam_: would it be possible to just implement the stall thing? Or is doing something bad _really_ bad?
22:31 hakzsam_: I already have the stall count thing
22:31 hakzsam_: but it's not enough
22:31 karolherbst: ohh
22:31 karolherbst: mhh odd
22:32 karolherbst: something else missing in the sched thing?
22:32 hakzsam_: maxwell sched codes are more complicated than kepler
22:32 hakzsam_: barriers mostly
22:32 karolherbst: yeah, I know
22:32 karolherbst: I see
22:32 karolherbst: hakzsam_: you could try to be super optimistic if stall == 0
22:32 karolherbst: and see what that does
22:34 karolherbst: imirkin_: mind keeping an eye out for join instructions and checking if their sched opcode is always 0x2f or sometimes also 0x4f?
22:34 hakzsam_: imirkin: ouch! with glxgears: inst_issued1 = 5700 vs inst_issued0 = 59000
22:34 hakzsam_: ....
22:34 karolherbst: well
22:34 karolherbst: expected I guess
22:35 karolherbst: furmark would be interssting as well
22:35 karolherbst: *interesting
22:35 hakzsam_: without a scheduler it's expected
22:35 imirkin_: hakzsam_: i think our st 0x0 is actually becoming st 0xf
22:35 hakzsam_: imirkin: yeah
22:35 hakzsam_: I think as well
22:35 imirkin_: (or equivalently bad thing)
22:35 mwk: wrt inst_issued
22:36 karolherbst: hakzsam_: try furmark with this enabled: https://github.com/karolherbst/mesa/commit/e86b5c86b19978b3f839073ec6ee3a8f522b328d
22:36 mwk: you extracted these signal names from somewhere in nv driver?
22:36 hakzsam_: yes
22:36 karolherbst: this is one benchmark where nouveau is actually faster than nvidia (with the patch, on my kepler)
22:36 mwk: then they may not mean what you think they mean
22:36 mwk: it's likely they're actually making a 3-bit number
22:36 karolherbst: mwk: the inst_issued things are right though, I can confirm this
22:37 karolherbst: not for maxwell obviously
22:37 mwk: how?
22:37 hakzsam_: mwk: I can't say they are obviously right without writing some micro benchmarks...
22:38 mwk: well
22:38 karolherbst: I wrote a pass to improve dual issuing and perf went up, inst_issued2 went up, inst_issued1 went down and 2*inst_issued2 + inst_issued1 == inst_issued
22:38 karolherbst: could be coincidence
22:38 karolherbst: but it is really likely right
22:38 karolherbst: maybe we interpret the numbers wrong, or not exactly right
22:38 hakzsam_: karolherbst: coz we are lucky for those ones
22:38 karolherbst: sure
22:38 mwk: hmm
22:38 karolherbst: not saying we know for all
22:38 karolherbst: but those are right
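karolherbst's sanity check relies on inst_issued1 counting single-issue events and inst_issued2 counting dual-issue events, so the aggregate should satisfy inst_issued = inst_issued1 + 2 * inst_issued2. Below is a small helper for that check, assuming the three raw counter values have already been read. The numbers in the example are made up, not measurements:

```python
def issued_total(inst_issued1: int, inst_issued2: int) -> int:
    """Instructions issued: one per single-issue event, two per dual-issue event."""
    return inst_issued1 + 2 * inst_issued2


def counters_consistent(inst_issued: int, inst_issued1: int, inst_issued2: int) -> bool:
    """True if the aggregate counter agrees with the per-width counters."""
    return inst_issued == issued_total(inst_issued1, inst_issued2)


# Made-up example values: 1000 single issues plus 250 dual issues.
print(counters_consistent(1500, 1000, 250))
```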
22:38 mwk: and what about inst_issued0?
22:38 hakzsam_: but if you want to be sure all counters return what they mean, you have to write micro benchs
22:39 hakzsam_: mwk: inst_issued0 is new since maxwell
22:39 mwk: oh, hm.
22:39 karolherbst: yeah
22:39 mwk: that's interesting
22:39 hakzsam_: mwk: "Number of cycles that did not issue any instruction
22:39 karolherbst: on my kepler I only played with 1 and 2
22:39 mwk: fair enough, if the docs say so
22:39 karolherbst: mwk: read the comment https://github.com/karolherbst/mesa/commit/30eaa15e32b6141a0074c0c427b57b8afee97f71.patch
22:40 mwk: I've seen a few examples of multi-bit "count" signals, where you're supposed to do 4 * signal2 + 2 * signal1 + signal0 to get the total count
22:40 mwk: thought it could be one of those
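mwk's alternative reading is a multi-bit "count" signal: each per-cycle count is split across several one-bit signals, and each signal's counter is weighted by its bit position, so total = 4 * signal2 + 2 * signal1 + signal0. The sketch simulates that encoding to show why the weighting recovers the total. The per-cycle counts are invented for illustration:

```python
def bit_sliced_counters(per_cycle_counts, bits=3):
    """For each bit position, count the cycles in which that bit of the count is set."""
    counters = [0] * bits
    for count in per_cycle_counts:
        for bit in range(bits):
            counters[bit] += (count >> bit) & 1
    return counters


def decode_total(counters):
    """Recombine: signal0 * 1 + signal1 * 2 + signal2 * 4 + ..."""
    return sum(value << bit for bit, value in enumerate(counters))


cycles = [0, 3, 5, 1, 7, 2]  # hypothetical events per cycle (0..7 fits in 3 bits)
signals = bit_sliced_counters(cycles)
print(signals, decode_total(signals), sum(cycles))
```

Under this encoding a lone "signal2" counter is meaningless on its own, which is why mwk cautions that the names may not mean what they seem to.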
22:40 karolherbst: it kind of is
22:40 karolherbst: if you want the total issued instructions
22:40 hakzsam_: mwk: well, some perf counters are really easy to check, like shared_store (just need to write a compute shader which writes N times) for example
22:40 hakzsam_: mwk: on fermi yes
22:40 karolherbst: total = 1*issued1 + 2*issued2
22:40 hakzsam_: mwk: some crazy modes :)
22:40 hakzsam_: mwk: LOGOP mode IIRC
22:41 mwk: yeah... and some are not :)
22:41 hakzsam_: mwk: but since kepler it's easier
22:42 hakzsam_: about inst_issued0 I think it will decrease a lot with my scheduler
22:42 karolherbst: hakzsam_: ohh you have one?
22:42 hakzsam_: work in progress
22:43 karolherbst: nice :)
22:44 hakzsam_: imirkin: inst_issued1 = 197M | inst_issued0 = 1.757B
22:44 hakzsam_: awesome :)
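Since inst_issued0 counts cycles that issued nothing, the share of cycles that issued at least one instruction is (inst_issued1 + inst_issued2) / (inst_issued0 + inst_issued1 + inst_issued2). That assumes the three counters partition the same set of scheduler cycles, which is an assumption about the hardware rather than something established here. Plugging in the two sets of numbers quoted above (inst_issued2 was 0 on Maxwell):

```python
def busy_fraction(inst_issued0: int, inst_issued1: int, inst_issued2: int = 0) -> float:
    """Fraction of cycles that issued at least one instruction (assumes a partition)."""
    busy = inst_issued1 + inst_issued2
    return busy / (inst_issued0 + busy)


print(f"glxgears: {busy_fraction(59_000, 5_700):.1%}")
print(f"22:44 run: {busy_fraction(1_757_000_000, 197_000_000):.1%}")
```

Both come out at roughly 9 to 10 percent of cycles issuing anything, in line with hakzsam_'s remark that this is expected without a scheduler.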
22:44 karolherbst: :D
22:45 karolherbst: hakzsam_: on kepler you have issued_ipc though
22:45 hakzsam_: same on gm107
22:45 karolherbst: ohh wait
22:45 karolherbst: issue_slot_utilization
22:45 karolherbst: with that you can fake inst_issued0
22:45 hakzsam_: it's a metric so yeah
22:46 hakzsam_: pretty sure blob already exposes a metric with inst_issued0 vs inst_issuedX
22:46 hakzsam_: anyway, I have what I wanted :)
22:46 karolherbst: :)
22:46 karolherbst: I really would like to see the results with furmark though
22:46 karolherbst: I got it to be like 8% faster than nvidia
22:47 karolherbst: the patch freezes some games though, so I did something wrong
22:49 karolherbst: anyway, bed time
23:10 netz: shite.