00:04 karolherbst: yay void nv50_ir::CodeEmitterNVC0::setImmediate(const nv50_ir::Instruction*, int): Assertion `(u32 & 0xfff00000) == 0 || (u32 & 0xfff00000) == 0xfff00000' failed.
00:06 karolherbst: seems like it will get messy with immediates
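As a reading aid, the condition in the failed assertion can be restated on its own: the top 12 bits of the 32-bit immediate must be either all zeros or all ones. The helper name `fitsShortImmediate` below is hypothetical and only mirrors the quoted expression; it is a sketch, not the code from `CodeEmitterNVC0::setImmediate`.

```cpp
#include <cstdint>

// Hypothetical helper that mirrors the quoted assertion:
// (u32 & 0xfff00000) == 0 || (u32 & 0xfff00000) == 0xfff00000
constexpr bool fitsShortImmediate(uint32_t u32)
{
    const uint32_t high = u32 & 0xfff00000u; // keep only the top 12 bits
    return high == 0u || high == 0xfff00000u;
}

// Small values and values whose top 12 bits are all set pass the check.
static_assert(fitsShortImmediate(0x000fffffu), "low 20 bits only");
static_assert(fitsShortImmediate(0xfffffff0u), "top 12 bits all ones");
// A pattern with mixed high bits, such as 0x3f800000, does not.
static_assert(!fitsShortImmediate(0x3f800000u), "mixed high bits");
```

Any constant that fails this check would trip the assertion if it reached that point unchanged.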
00:30 karolherbst: gnurou: any lock ups so far on your end?
00:31 gnurou: karolherbst: you mean about the nve6 FECS timeout and RSpliet's patch?
00:31 karolherbst: also with my reclocking stuff
00:57 mwk: phew, done with instruction formats... up to v4 at least
00:57 mwk: now would be a perfect time to throw in an asm parser and test they actually work
00:57 mwk: or maybe a disassembler, I wonder which one is easier
00:59 mwk: disassembler looks simpler
01:23 mwk: and we have a disassembler
02:03 wjx: Is this trello card still relevant? "Reduce performance / refuse to boot if not all the power connectors are plugged"
02:04 wjx: https://trello.com/c/admzDRvd
02:04 imirkin: theoretically
02:04 imirkin: in that nouveau presently does no such thing
02:06 wjx2: Will it be easy to achieve this? since it's labeled "easy"
02:07 imirkin: doubtful. i disagree with that categorization.
02:12 wjx2: Are there currently any easy tasks to try in the nouveau driver? I want to get my hands dirty on it
02:13 imirkin: what GPU is available to you?
02:13 wjx2: GTX750Ti
02:14 imirkin: ah, a GM107
02:15 imirkin: can you briefly describe your skillset and interests?
02:19 wjx2: I have read some nouveau and drm kernel code, as well as libdrm code, I know basic concept of drm, but don't have working experience on it
02:20 wjx2: I have some kernel experience, I can do c
02:56 mwk: clang is now on board :)
02:56 imirkin: wjx2: any particular area of nouveau you're interested in addressing? feature? etc?
03:06 wjx2: imirkin: I think any area is okay. Is power management suitable for a beginner to try?
03:07 imirkin: wjx2: well, it's mostly about interest... maxwell hasn't had a lot of reclocking effort done on it afaik
03:07 imirkin: wjx2: i assume you have GDDR5 vram?
03:07 imirkin: wjx2: either way, i'd recommend you sync up with karolherbst re his latest efforts in that area
03:08 imirkin: i'm more knowledgeable in the 3d driver
03:12 wjx2: imirkin: I have interest in 3d driver too, you can be my mentor :-)
03:13 wjx2: imirkin: is there any work I can try?
03:13 wjx2: or I can help test something
03:15 imirkin: well
03:15 imirkin: one task
03:16 imirkin: that has caused everyone i've ever given it to to disappear
03:16 imirkin: is to make tessellation work on maxwell
03:16 imirkin: however i've grown wiser and will not make an attempt to hand out such a task
03:16 imirkin: it appears to be too complex
03:17 imirkin: (there are a few notes about it here: https://trello.com/c/oaOt6jdd/118-gm100-tessellation)
03:18 imirkin: i actually can't think of any great 3d tasks for maxwell... you're probably better off trying to get reclocking up and running and being serviceable
03:18 imirkin: that will probably be the most help
03:31 wjx2: imirkin: I see this trello card, but don't know where to start for this, can you give me some recommendation?
03:33 imirkin: wjx2: for which? maxwell reclocking?
03:33 imirkin: wjx2: have a look at karol's tree here: https://github.com/karolherbst/nouveau/commits/maxwell_reclocking
03:34 imirkin: not sure if he has a more up-to-date branch
03:34 wjx2: this maxwell tessellation issue
03:34 imirkin: ah, for the tess thing... ignore it :)
03:34 imirkin: like 3-4 separate people have demonstrated to me it's not an appropriate project to start with
03:41 wjx2: imirkin: ok, at least I won't disappear for taking this :-)
03:41 imirkin: famous last words...
03:41 imirkin: ... said before one disappears
03:41 imirkin: heh
03:41 wjx2: imirkin: I will take a look at the reclocking things
03:42 imirkin: cool. karol's around a lot, although he's probably asleep right now. you should have no trouble finding him
03:43 wjx2: ok,thanks!
08:08 Calinou: is there any open source game/engine using tessellation yet? :|
08:08 Calinou: haven't seen any
08:08 Calinou: only AwesomeBump is an open source program that makes use of it, but it's not a game
08:11 gnurou: karolherbst: still stable on my end
08:32 karolherbst: imirkin: maxwell_reclocking branch ;)
08:32 karolherbst: wjx2: ^^
08:33 karolherbst: gnurou: awesome
08:34 karolherbst: imirkin: our vogl patch was merged :D
08:43 wjx2: karolherbst: hi, karol, I will take a look at the reclocking things, and will learn more from you :-)
08:44 karolherbst: wjx2: if you want to do something power management related I know a lot of tasks somebody could do
08:49 wjx2: karolherbst: yes, I would like to have a try, are there any beginner tasks?
08:50 karolherbst: wjx2: one is pretty easy I can think of, but nobody really got around to do that, because it isn't exactly important either
08:50 karolherbst: wjx2: on very high temperature (102°C+) the GPU automatically cuts the input clocks of the engines through something we call FSRM
08:51 karolherbst: wjx2: but currently we don't parse it and don't update the reported clocks by nouveau
08:52 karolherbst: wjx2: ohh, wait, I totally forgot: since gk110 the stuff works differently again and nobody took a look at this either
08:52 karolherbst: mupuf would know more about the situation here
08:54 karolherbst: wjx2: but maybe we could also do it the other way: is there any issue which annoys you the most with your maxwell card under nouveau?
08:55 wjx2: I
08:56 wjx2: karolherbst: I have a GTX750Ti, but haven't used it for a long time...
08:56 karolherbst: wjx2: oh, so you use something else as your main GPU?
08:57 wjx2: karolherbst: Currently using intel
08:58 karolherbst: wjx2: oh, so you have a desktop with an intel GPU
09:00 wjx2: yes, but I'm planning to plug the GTX750Ti back in
09:02 karolherbst: wjx2: do you have some knowledge regarding PCIe?
09:02 wjx2: some moderate knowledge
09:03 Calinou: I plan to get a Pascal card next month btw
09:03 Calinou: probably a GTX 1080
09:03 Calinou: my 570 seems to have died or something, also I want to build a new PC :p
09:06 karolherbst: Calinou: well if you get the 1080, would you mind sharing the vbios with us? :D
09:11 karolherbst: wjx2: well maybe you should start by using your card with nouveau and check what doesn't work as expected
09:11 karolherbst: wjx2: and work on that
09:13 Calinou: karolherbst: sure, I will run Antergos on the host PC
09:14 wjx2: karolherbst: ok, I will have a try
09:15 Calinou: karolherbst: not sure if I'll get a stock or custom one
09:15 Calinou: the Founders Edition is much more expensive (€700 instead of €600)
09:15 karolherbst: Calinou: doesn't matter
09:15 Calinou: I'll wait for the custom ones I guess
09:15 Calinou: ah, ok
09:15 Calinou: I want a quiet card, especially at idle
09:15 karolherbst: there will be differences and we want to know about them
09:15 Calinou: my 570 was noisy even at idle
09:15 karolherbst: that's all
09:15 Calinou: differences between what and what?
09:16 karolherbst: vbios of older cards
09:16 Calinou: right
09:16 Calinou: I think performance difference between 570 and 1080 is like 2x :p
09:17 Calinou: I'll buy 4K monitors next year too
09:35 karolherbst: Calinou: I think the difference is _way_ bigger
09:36 karolherbst: the 570 isn't really that fast
09:36 karolherbst: Calinou: the 780 Ti is already more then 3 times faster
09:36 karolherbst: *than
09:37 karolherbst: Calinou: specs:570 1405.4 GFLOPS, 1080 ~9000
09:37 Calinou: I mean, in games
09:37 Calinou: maybe 2.5x
09:37 karolherbst: yeah, me too
09:39 karolherbst: memory alone seems to be 4x faster
09:39 Calinou: also, 1080 will have 8 GB of RAM
09:39 Calinou: :D
09:40 karolherbst: yeah
09:40 Calinou: so it should be plenty for 4K gaming
09:40 karolherbst: and like super high clocked clocks
09:40 karolherbst: 1607 MHz is the base clock
09:40 karolherbst: usually before that 1GHz was considered "high" clocked
09:41 Calinou: GTX 570 is 732 MHz
09:43 karolherbst: yeah
09:44 karolherbst: and usually that's not important because more cores are important
09:44 karolherbst: but then again 480 vs 2560 cores
09:45 Calinou: memory bus is less wide on the 1080 though
09:45 Calinou: 256 bit vs 320 bit
09:45 Calinou: but that doesn't matter
09:45 Calinou: what about power usage?
09:49 karolherbst: 1080 should need less power
09:49 Calinou: good :)
09:49 karolherbst: Calinou: and the bus width is smaller, because the 1080 uses quad pumped GDDR5X
09:49 karolherbst: so 256 bit GDDR5X equals 512 bit GDDR5
09:50 karolherbst: that's the reason why they are clocked as 10GHz (4x 2.5GHz)
09:53 Calinou: also I will be able to use Vulkan :)
09:53 Calinou: but again, no open source program uses it yet :P
09:53 Calinou: and I haven't even started learning OpenGL
10:12 RSpliet: gnurou: skeggsb appears to prefer his version of the patch for simplicity. We agree to disagree on which is more elegant, but that's just a matter of preference in the end
10:13 RSpliet: I notified him though to run the new patch past you, as you tested v1 but not his v2. Logically there shouldn't be a difference, but formally it's wrong to slap "tested by" on something you didn't test bit-by-bit :-)
10:14 mupuf: imirkin: reading one GPIO and disabling 3D accel if it is not set does seem super easy in my opinion
10:14 mupuf: mwk: impressive work!
10:48 gnurou: skeggs's patch seems to work just as fine
10:53 RSpliet: gnurou: thanks
11:07 karolherbst: RSpliet: awesome work for finding this :)
11:10 karolherbst: now we need to take care of the linebuffer and reclocking should be in a really good state on kepler
11:11 mupuf: karolherbst: yeah, I would like to look into this linebuffer
11:11 mupuf: I have access to people who know this sort of things :)
11:11 mupuf: AKA, implemented it for Intel
11:11 karolherbst: ahh
11:11 mupuf: it is not gonna be the nicest :s
11:11 karolherbst: very nice
11:11 karolherbst: mhh
11:12 karolherbst: we have to somewhat know which lines to buffer, right?
11:12 mupuf: but data-driven approach, I will need to get a ton of data
11:12 mupuf: because it has so many inputs
11:12 mupuf: and conditions
11:12 mupuf: what?
11:12 mupuf: linebuffer == inline buffer
11:12 mupuf: it has nothing to do with the CRTC lines
11:13 karolherbst: ohhh
11:13 mupuf: it is before the CRTC
11:13 mupuf: it is the thing that will issue requests to read the framebuffer
11:13 karolherbst: I thought it is a buffer for the displays in the case where we lose memory access or something
11:13 mupuf: and the watermarks are when to read and/or increase the priority
12:11 karolherbst: slowly it is getting better: https://gist.github.com/karolherbst/b9cf8ac6aad10ed28e1a8f747ad2cb7a
12:33 Calinou: does AMD still have their watermark with fglrx by the way?
12:33 Calinou: the "unofficial card/driver" one?
12:41 RSpliet: mupuf: I don't expect the configuration to be so painful, it's a matter of finding the right numbers
12:41 RSpliet: in the end, it's just a fifo
12:44 RSpliet: I'd expect scanout to have the highest priority of them all (shouldn't be harmful, it's rate limited by the pixel-clock)
12:55 karolherbst: now we are talking: https://gist.github.com/karolherbst/b9cf8ac6aad10ed28e1a8f747ad2cb7a :)
12:55 karolherbst: -0.09% sounds much better than the -0.06% I got before :D
13:06 RSpliet: karolherbst: that slight reduction in GPR is interesting
13:06 RSpliet: I take it that's for eliminating pointless branches right?
13:09 karolherbst: RSpliet: no
13:09 karolherbst: RSpliet: RA
13:09 karolherbst: that's the usual RA noise you get if you change live ranges
13:10 karolherbst: RSpliet: sometimes RA also adds movs to align d/t/q reg uses for some instructions
13:10 RSpliet: karolherbst: yes I get that it's the RA who does the assignment, but what change gives you this noise, and why?
13:10 karolherbst: because prior to the new opts it laid out the stuff differently
13:10 karolherbst: RSpliet: RA isn't exactly smart when it comes to reg allocation for d/t/q accesses
13:10 karolherbst: so it just happens randomly
13:11 RSpliet: well, surely it follows a deterministic algorithm
13:11 karolherbst: right, but as I said: if you change live value ranges it changes stuff
13:11 RSpliet: sure
13:12 RSpliet: so what causes you to change live ranges?
13:12 karolherbst: I remove unary ops
13:12 karolherbst: or well
13:12 karolherbst: DCE removes them
13:13 RSpliet: oh I thought you were showing the results of the "eliminate empty basic blocks" pass
13:14 karolherbst: noo. this pass is pointless
13:14 karolherbst: :D
13:14 karolherbst: RSpliet: example where GPR count increases: https://gist.github.com/karolherbst/4a19f2c39cd03b4e4d6d4900bcddf166
13:14 karolherbst: BB:14 gets completely optimized away
13:15 RSpliet: keep that on-line for a bit longer please, I'll read through it tonight
13:15 karolherbst: and the phis in BB:16 replaced by slcts
13:15 RSpliet: tnx
13:15 karolherbst: and then RA simply allocates registers differently
13:15 karolherbst: huh
13:15 karolherbst: slct u32 $r8 eq $r5 $r63 $r0...
13:15 karolherbst: yeah well
13:17 karolherbst: slct u32 $r8 eq $r5 $r63 $r0 => max $r8 $r5 $r63?
13:17 karolherbst: ohhh no, it is the other way around
13:17 karolherbst: if $r0 is 0 $r5 gets set, otherwise 0
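The reading of `slct` reached above can be written out as a small model: the third source is compared against zero, and the result is the first source if the comparison holds, otherwise the second. This is a sketch under assumptions: the function and enum names are made up for illustration, only integer operands are modelled, and `$r63` is treated as reading zero, which is what "otherwise 0" implies.

```cpp
#include <cstdint>

// Illustrative subset of condition codes.
enum class Cond { Eq, Lt, Ge };

// Model of "slct dst cc src0 src1 src2":
//   dst = (src2 <cc> 0) ? src0 : src1
inline uint32_t slct(Cond cc, uint32_t src0, uint32_t src1, int32_t src2)
{
    bool taken = false;
    switch (cc) {
    case Cond::Eq: taken = (src2 == 0); break;
    case Cond::Lt: taken = (src2 < 0);  break;
    case Cond::Ge: taken = (src2 >= 0); break;
    }
    return taken ? src0 : src1;
}

// "slct u32 $r8 eq $r5 $r63 $r0" with $r63 assumed to read as 0:
// r8 = slct(Cond::Eq, r5, 0, r0), i.e. r5 when r0 == 0, else 0.
```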
13:26 karolherbst: huh... that's super odd
13:26 karolherbst: hurt gpr ../nvidia_shaderdb/bioshock_infinite/1958.shader_test - 0 22 -> 23
13:27 karolherbst: but
13:27 karolherbst: $r9 is the highest gpr accessed
13:28 karolherbst: ohh right
13:28 karolherbst: SSO
13:38 RSpliet: Sanitary Sewer Overflow?
13:38 karolherbst: separate shader objects
13:39 karolherbst: if there is only a vertex shader in the shader_test and SSO isn't explicitly enabled, then a lot of outputs are just dropped
13:39 karolherbst: and a lot of code can be optimized away
13:39 karolherbst: that's why my empty branch elim pass removed like 10% away in those
13:39 karolherbst: because they needed SSO and shader-db-run can't handle that
13:40 RSpliet: ah
13:40 karolherbst: well there is a patch which scans the extensions declared in the [require] part
13:40 karolherbst: but piglit uses SSO ENABLED and the patch checks for the extension
13:40 karolherbst: but I think there will be a proper patch in the future
13:41 karolherbst: because two other devs ran into the same trap as me
13:41 karolherbst: optimizing empty branches away and wondering why it doesn't affect the applications although 10% of the instructions were cut away
13:42 karolherbst: mhhh
13:42 karolherbst: mov with mod encountered
13:42 karolherbst: yeah well..
13:47 karolherbst: mov u32 %r1107 f32 neg %r1089 hehe
13:52 karolherbst: " slct ftz u32 %r1108 ge f32 neg %r1090 %r1090 %r1089 (0)" ==> "mov u32 %r1108 f32 neg %r1090"
13:52 karolherbst: by AlgebraicOpt
13:52 karolherbst: mhh
13:52 karolherbst: maybe it is my fault
13:53 karolherbst: nope, it isn't
14:22 Tom^: would it be much work implementing a vram usage into nouveau's uh "gallium hud support"
14:29 chithead: Tom^: doesn't GALLIUM_HUD="requested-VRAM" work?
14:30 Tom^: chithead: https://gist.github.com/anonymous/a866184a372de1a415070f198901bb4e i dont have that :(
14:37 chithead: interesting. this is what I get on r600g http://dpaste.com/38P4DMS
14:38 chithead: maybe this is related https://lists.freedesktop.org/archives/mesa-commit/2013-January/041536.html
14:39 chithead: oops wrong link, https://cgit.freedesktop.org/mesa/mesa/commit/?id=fb5cf3490ebbc173211b6c04c869e3fb9f4dbecc
14:43 imirkin: chithead: those are radeon-specific things
14:43 imirkin: Tom^: we kinda have that already i thought... might need a debug build to see it though.
14:43 Tom^: ah hm
14:44 imirkin: (one of the driver queries)
14:44 Tom^: now that you mention it you told me this before :P
14:44 chithead: imirkin: yes, but that might give an idea how much work it would be
14:44 Tom^: debug enables bunch of more "gallium huds"
14:44 imirkin: Tom^: a bunch more metrics... but yes.
14:50 Tom^: imirkin: shared_load and/or local_load perhaps?
14:51 Tom^: or _store hm
14:51 imirkin: ?
14:52 Tom^: with debug i get these metrics https://gist.github.com/anonymous/51f5212f8ab3461d9276b5e413e2669f
14:52 imirkin: drv-tex_obj_current_bytes and drv-buf_obj_current_bytes_vid
14:52 imirkin: those two == vram usage
14:52 Tom^: ok
14:53 imirkin: and drv-buf_obj_current_bytes_sys == gart
14:53 Tom^: time to hunt the fps drops in cs:go
14:53 imirkin: i thought they got magically fixed?
14:54 Tom^: so far with its ingame console commands ive sort of managed to figure out that it only thinks i got a total 512mb vram, and if i set to high settings i go above it. and thats when things starts to lag
14:54 Tom^: and the reason it magically got fixed was because i ran at lower settings and never went above it
14:54 imirkin: hehe
14:57 Tom^: imirkin: if im allowed to wildly theorise its a bit related to the same thing where you have to load each scene in unigine heaven for shaders to "cache" or load
14:58 Tom^: imirkin: when cs:go starts running above the limit it thinks is 512mb it starts dumping and loading frequently to sort of save itself
15:18 Tom^: imirkin: which perhaps a disk shader cache could solve. *shrug* enough theorising. :P
15:19 imirkin: yeah... that's pretty unlikely.
15:19 imirkin: shader cache helps if time is being spent compiling shaders
15:21 Tom^: yea and thats what i thinking it does when it goes to its limit. unloads models and textures and what not. loads them backup when used and drops something else but then requires to spend a little time compiling shaders again
15:22 Tom^: idk, i barely know what gets placed in vram or what shaders is. haha il just stop.
15:30 Tom^: but then again valve should just fix cs:go to properly detect my vram.
16:02 imirkin_: Tom^: models and textures are things that are stored in objects in vram
16:02 imirkin_: Tom^: shaders, while also stored in vram, make up like ... 0.00000001% of it all
16:03 Tom^: i see
16:05 imirkin_: we allocate a fixed ... 64k? area for shaders
16:05 imirkin_: if that ever overflows, we evict 'em all and reupload what's needed
16:05 imirkin_: perhaps you're hitting that limit
16:05 imirkin_: since a reupload would cause a stall
16:06 Tom^: would all hell break loose if i increase that area to like 128k or similar
16:06 imirkin_: nope
16:06 imirkin_: it's just wasted vram in many cases
16:06 imirkin_: this is esp important if you have, say, 128MB of vram
16:06 imirkin_: you don't want to have 1MB of it go to shaders that never end up getting used
16:07 Tom^: yea
16:07 imirkin_: when you have 4GB, it's easier to make that sacrifice
16:07 Tom^: where is the limit set in the code? and il fiddle around a bit
16:08 Tom^: or well "limit" rather the allocation size.
16:08 imirkin_: nvc0_screen.c
16:08 imirkin_: ret = nouveau_bo_new(dev, NV_VRAM_DOMAIN(&screen->base), 1 << 17, 1 << 20, NULL,
16:08 imirkin_: i guess we alloc 1MB already
16:09 imirkin_: but the sky's the limit ;)
16:09 Tom^: :)
16:09 imirkin_: just make sure you also adjust the nouveau_heap_init right below it
16:09 Tom^: ok
16:09 imirkin_: (and the 1<<17 is 128kb to make sure it gets alloc'd to a large page)
16:17 karolherbst: imirkin_: maybe we should increase the space for shaders? Because in eon based games I usually hit this
16:17 imirkin_: karolherbst: in actuality, what we should do is just double it if we ever hit it, dynamically
16:17 karolherbst: imirkin_: and always reserve 0.01% of entire at minimum
16:18 imirkin_: since our genius plan to dealing with it is just evicting everything
16:18 imirkin_: we might as well just allocate a new bo and move on with life
16:18 imirkin_: instead of trying to synchronize with the old one
16:18 imirkin_: and as we do that, we may as well make one that's 2x as big
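The idea sketched in these lines (on overflow, allocate a fresh buffer twice as large instead of evicting into the old one) can be modelled without any driver calls. Everything below is hypothetical bookkeeping: `CodeArea` and `place` are illustrative names, the starting size of 1 MB is taken from the `1 << 20` in the quoted `nouveau_bo_new` call, and the real buffer allocation and re-upload are only represented by counters.

```cpp
#include <cstdint>

// Toy model of a shader code area that doubles when it overflows.
struct CodeArea {
    uint64_t capacity = 1ull << 20; // 1 MB starting size
    uint64_t used = 0;              // bytes handed out in the current buffer
    unsigned reallocations = 0;     // how many times a bigger buffer was "allocated"

    // Returns the offset at which a shader of `size` bytes is placed.
    uint64_t place(uint64_t size)
    {
        if (used + size > capacity) {
            // Stand-in for allocating a new buffer object: double until the
            // request fits, then start filling the new buffer from offset 0.
            // Previously placed shaders would have to be re-uploaded on demand.
            do {
                capacity *= 2;
            } while (size > capacity);
            used = 0;
            ++reallocations;
        }
        const uint64_t offset = used;
        used += size;
        return offset;
    }
};
```

With this policy the stall from re-uploading still happens on overflow, but each overflow makes the next one less likely, since the area grows geometrically.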
16:19 karolherbst: yeah, maybe
16:19 karolherbst: but maybe we can reserve a reasonable amount from the beginning if the gpu has a lot of VRAM already
16:19 imirkin_: unfortunately there are a few annoyances here... i think we probably have to still throw in a synchronize
16:19 imirkin_: er, serialize
16:19 imirkin_: karolherbst: 1MB is reasonable.
16:20 karolherbst: maybe it was 5 years ago
16:20 karolherbst: it will only get worse
16:20 karolherbst: and in 5 years we might have to evict all shaders in every game
16:20 imirkin_: there are a few confounding factors
16:20 karolherbst: I would say min(1MB, 0.1% of VRAM)
16:20 imirkin_: one of them is that i think a bunch of these "different" shaders are actually identical
16:21 imirkin_: and it would behoove us to identify that
16:21 karolherbst: mhh right
16:21 karolherbst: that would make sense
16:21 imirkin_: since it would mean (a) fewer compiles and (b) less shader uploading/switching/etc
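One way to picture "identifying identical shaders" is a cache keyed on shader content, so that a repeated request reuses the earlier result instead of compiling and uploading again. This is only a sketch: `ShaderCache` is a made-up class, keying on the source text is a simplification (the log suggests the duplicates may only become identical after linking or compilation), and the compile step is replaced by a counter.

```cpp
#include <string>
#include <unordered_map>

// Toy cache: identical shader text maps to one compiled variant.
class ShaderCache {
public:
    // Returns an id for the compiled shader; "compiles" only on first sight.
    int get(const std::string &source)
    {
        auto it = byText_.find(source);
        if (it != byText_.end())
            return it->second;           // duplicate: no compile, no upload
        const int id = compileCount_++;  // stand-in for compile + upload
        byText_.emplace(source, id);
        return id;
    }

    int compiles() const { return compileCount_; }

private:
    std::unordered_map<std::string, int> byText_;
    int compileCount_ = 0;
};
```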
16:21 karolherbst: especially with all those SSO shaders
16:21 imirkin_: actually, it's the non-SSO shaders that concern me more
16:22 karolherbst: funny that it only ever happens to me in eon games
16:22 karolherbst: and they use SSO
16:22 imirkin_: since every time you take a non-SSO fs and link it with a different vs (or vice-versa)
16:22 imirkin_: that generates a fresh fs
16:22 imirkin_: well, maybe there's some additional idiocy going on there too
16:22 karolherbst: mhh
16:22 imirkin_: like... maybe they make multiple copies of identical sso shaders
16:22 imirkin_: that'd be dumb, but certainly within the limits of their stupidity
16:22 karolherbst: with ST_DUMP_SHADERS usually there are like 6k+ shaders
16:22 karolherbst: and most of them are dups
16:23 imirkin_: ok, so they're just assholes then
16:23 karolherbst: most likely
16:23 imirkin_: or... we do something dumb in mesa
16:23 karolherbst: well
16:23 imirkin_: there's a world of possibilities :)
16:23 karolherbst: I think when they get combined
16:23 imirkin_: someone should look at what's actually going on
16:23 karolherbst: or maybe the engines don't care, because nvidia handles that already
16:23 imirkin_: rather than blindly go about increasing limits
16:23 karolherbst: they also compile/link every frame
16:24 karolherbst: so I think they are just assholes here
16:24 imirkin_: maybe, maybe not. someone will have to look at a trace and analyze.
16:24 karolherbst: and if you look at the 5000 most CPU expensive gl calls, 60% of the top 500 are compiles/links
16:24 imirkin_: i def don't want to go around blaming until i've taken a look
16:25 hakzsam: this eviction issue can be reproduced with a tomb raider trace IIRC
16:25 imirkin_: if you want to go around blaming, that's fine by me ;)
16:25 karolherbst: and then 2k are like glClear
16:25 karolherbst: hakzsam: any eon game does, too
16:25 hakzsam: okay
16:25 hakzsam: maybe we should improve that
16:26 karolherbst: imirkin_: well I ran my sr3 trace through the cpu profiler of apitrace
16:26 imirkin_: right
16:26 imirkin_: but is it mesa's fault or sr3's fault?
16:26 karolherbst: they just compile a lot
16:26 karolherbst: like I said: in every frame there are compiles and links
16:26 imirkin_: ok
16:26 karolherbst: so either mesa recompiles too much
16:26 imirkin_: that's not necessarily bad.
16:26 karolherbst: or they reupload new shaders every time
16:26 imirkin_: right. so like i said... someone will have to look at the trace to see what's going on.
16:27 imirkin_: (i guess i should just start hitting up-up-up-up-enter)
16:27 karolherbst: imirkin_: like something like this: https://cgit.freedesktop.org/mesa/mesa/commit/?id=595d56cc866638f371626cc1d0137a6a54a7d0f8
16:28 karolherbst: but in this case the divinity engine replaces valid shader code with some stupid hex strings like "423acd3d320d93028d0d8c90e09d" or something
16:28 karolherbst: and mesa tried to recompile and linking failed
16:29 imirkin_: no part of what you're saying is "i analyzed the trace in detail and here is exactly what's going on"
16:29 imirkin_: now, you may not be inclined to do that, which is fine. but you can't propose solutions when you don't know the problem.
16:30 karolherbst: right, didn't go into detail with the eon games
16:33 karolherbst: mhh apitrace overhead is damn high
16:33 karolherbst: but replaying the trace I made with nouveau on nvidia results in misrendering
16:33 karolherbst: and still poor performance
16:41 Tom^: imirkin_: since its a bit unreliable to test it i cant say definitely but setting it to 1 << 22 and an entire DM game i didnt get the occasional microstutter nor the major fps drops. but il continue playing see if it comes back
16:41 imirkin_: Tom^: yes... continue, uh, "testing" :)
16:41 Tom^: :p
16:47 Tom^: ah there we go, second game. fps drop back :P. time to increase it to something obscene then
17:03 karolherbst: ohh nice, found another bug after running some opts multiple times
17:08 karolherbst: Tom^: do you also have the feeling that loading times are faster with a bigger area for the code?
17:09 Tom^: no idea, bit hard to tell when the game is on a ssd
17:09 Tom^: it loads quite fast already
17:21 Tom^: yea even with 1 << 29 it starts lagging after ~2 deathmatches
17:22 imirkin_: right, so ... the underlying problem needs to be fixed
17:22 imirkin_: rather than just increasing the count
17:22 Tom^: right, i need more vram.
17:22 Tom^: xD
17:22 imirkin_: hehehe
17:22 karolherbst: mh "slct ftz u32 %r917 lt f32 neg %r910 %r910 %r888" => "neg ftz u32 %r917 f32 %r910"
17:23 karolherbst: that looks somehow wrong
17:23 imirkin_: depends what %r888 is
17:23 karolherbst: a mul
17:24 karolherbst: ohh I think I know what is going on
17:25 karolherbst: "slct ftz u32 %r917 lt f32 neg %r910 %r910 %r888" => "mov ftz u32 %r917 f32 neg %r910" => "neg ftz u32 %r917 f32 %r910"
17:25 karolherbst: yeah, can't do the first opt if there are mods on the sources
17:25 karolherbst: *different
17:26 imirkin_: right.
17:27 karolherbst: AlgebraicOpt::handleSLCT is a bit too optimistic
17:27 imirkin_: sounds like it.
17:27 karolherbst: yeah, I already had to add stuff in it today
17:28 karolherbst: because it generated movs with mods on the source
17:28 imirkin_: SLCT's don't really come up that often
17:28 karolherbst: I know
17:28 imirkin_: did it not allow mod's before and you allowed them?
17:28 karolherbst: no
17:28 imirkin_: hm ok
17:28 karolherbst: I translated the mod to the op
17:28 karolherbst: so instead of mov neg it did a neg
17:28 imirkin_: right
17:28 imirkin_: if both ops have the same mod it's ok though :)
17:28 imirkin_: er, both src's
17:29 karolherbst: if (slct->src(0).mod != slct->src(1).mod) return
17:29 imirkin_: yup
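The guard quoted just above can be illustrated in isolation. When both value sources of a `slct` name the same register, the selection no longer matters and the instruction can become a plain move, but only if the two sources also carry the same modifier; `neg %r910` and `%r910` are different values. The types and the modifier encoding below are made up for the sketch and are not nv50_ir's `Modifier` class.

```cpp
#include <cstdint>

// Illustrative source operand: a register number plus modifier bits.
// Assumed encoding for this sketch only: bit 0 = neg, bit 1 = abs.
struct Src {
    int reg;
    uint8_t mod;
};

// A select between a and b can be folded into a move of either operand
// only when both operands are the same register with the same modifiers.
inline bool canFoldToMov(const Src &a, const Src &b)
{
    return a.reg == b.reg && a.mod == b.mod;
}
```

For the case from the log, `canFoldToMov({910, 1}, {910, 0})` is false, which is exactly the situation the early return is meant to catch.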
17:29 imirkin_: hm
17:29 imirkin_: looking at target_nvc0
17:30 imirkin_: { OP_SLCT, 0x4, 0x0, 0x0, 0x0, 0x6, 0x2 }, // special c[] constraint
17:30 imirkin_: which means that neg should only ever be allowed on src(2)
17:30 imirkin_: and no other mods allowed
17:31 imirkin_: and as expected, those mods do nothing in emit
17:31 karolherbst: even neg?
17:31 imirkin_: so... yeah. it's just a highly disallowed situation
17:31 imirkin_: which i'm guessing you created
17:31 imirkin_: neg only allowed on src(2)
17:31 karolherbst: I meant the emit drops the mods
17:32 karolherbst: ohh, the disallowed ones do nothing
17:32 karolherbst: mhh no, actually my pass is fine
17:32 imirkin_: does your pass check that there are no mods?
17:32 karolherbst: just when I run another algebraic opt after it, it causes misrendering
17:32 imirkin_: ah, so you're feeding unexpected data in.
17:33 karolherbst: yeah
17:33 karolherbst: but good to know that only neg is allowed
17:33 karolherbst: because I also allow abs currently
17:33 imirkin_: and only on src(2)
17:33 imirkin_: not on src0/src1
17:33 karolherbst: same goes for selp I guess?
17:33 imirkin_: i would assume so
17:34 karolherbst: okay, then only movs/negs are okay :) on source 2
17:34 Calinou: why are you doing so much assembly, I thought graphic drivers used C?
17:34 Calinou: is it for shaders?
17:35 imirkin_: Calinou: the graphic drivers, written in C, have to emit shader code that the GPU can execute.
17:36 imirkin_: which means compiling the GLSL or whatever input into the GPU's ISA
17:36 imirkin_: the bulk of modern graphics drivers is the compiler.
17:36 Calinou: ah
17:36 Calinou: I dream of the day all software can be written in Python :>
17:36 Calinou: that'd need CPUs like 10 times faster than today
17:37 imirkin_: unlikely
17:37 imirkin_: because then someone would write software in C and it'd be 10 times faster.
17:37 karolherbst: :D
17:38 karolherbst: (or results in better battery lifetime)
17:39 karolherbst: imirkin_: but I already had enough fun with immediates today, because they can also only be emitted on source 2 and only a limited range of values
17:40 karolherbst: selp and slct are really hard to deal with :/
17:41 karolherbst: but I think that is to be expected on instructions with three sources
17:41 imirkin_: karolherbst: look at the target tables which describe all this stuff
17:41 karolherbst: imirkin_: I use target->insnCanLoad already
17:41 imirkin_: ah ok
17:41 imirkin_: that should be sufficient
17:41 karolherbst: yeah
17:41 imirkin_: if it's not, improve it :)
17:42 karolherbst: it is just messy that you need to create the instructions before
17:42 imirkin_: well it's designed for a diff usecase
17:42 karolherbst: would be nice to add an overload with op/type
17:42 imirkin_: feel free.
17:43 karolherbst: yeah, you would remark that in the patch anyway, because I create the instructions, when something fails => I delete it again :D
17:43 imirkin_: but really you shouldn't be worrying about stuff like that in algebraicopt
17:43 imirkin_: never merge it in
17:43 imirkin_: create the op
17:43 imirkin_: and then loadpropagation will fix it right up
17:43 karolherbst: hu? No, the SEL Pass I write is a completely new one
17:43 karolherbst: it's too complex to merge the opt in the other ones
17:44 imirkin_: right...
17:44 karolherbst: it is also more of a phi -> selp/slct pass
17:44 imirkin_: you're trying to do too much at once.
17:44 imirkin_: don't worry about merging the values directly into the slct
17:44 imirkin_: throw in a mov
17:44 imirkin_: and then do a single pass of the LoadPropagation thing
17:44 imirkin_: and if that feels like merging things in, great
17:44 imirkin_: otherwise wtvr
17:44 karolherbst: mhh
17:45 karolherbst: I create a selp first anyway
17:45 karolherbst: this already made the pass much easier
17:45 karolherbst: the selp=>slct is just on top of that
17:45 imirkin_: ah ok
17:45 karolherbst: phi => selp is just used to merge the sources in a non phi way and the selp=>slct part is used to merge the predicate away
17:46 karolherbst: the problem with the phi=> selp part is, that if I do it too often, I might add new instructions
17:46 karolherbst: because selp=>slct isn't always possible
17:47 karolherbst: imirkin_: the AlgebraicOpt::slct fixup: https://github.com/karolherbst/mesa/commit/73082a0e1b6b46a436173e4cd4ee174cc9a965a2
17:48 karolherbst: I've added the "slct->src(0).mod = slct->src(0).mod ^ Modifier(slct->op);" part in case of two mods being present
17:48 karolherbst: no idea how "slct->src(0).mod.getOp();" handles if the source has like neg abs
17:50 karolherbst: Calinou: by the way, even python gets translated into some sort of ISA and they need to do a lot of optimizations to have a fast JIT
17:51 karolherbst: Calinou: just look at how insane all those javascript JITs things are
17:51 imirkin_: karolherbst: it makes it into a OP_CVT
17:52 karolherbst: and then Java also does stuff like hot patching to create more representations of the same method if one method is called with the same set of parameters to further optimize
17:52 karolherbst: imirkin_: ahh, okay
17:53 karolherbst: I am still a bit dissappointed though: https://gist.github.com/karolherbst/b9cf8ac6aad10ed28e1a8f747ad2cb7a
17:53 karolherbst: :/
18:07 mupuf: karolherbst: less hurts!
18:07 imirkin_: less helps too
18:07 imirkin_: a bit shy of the original -5% estimate ;)
18:08 karolherbst: mupuf: the amount of hurts only came because I enabled immediates and more distance between phi and sources
18:09 karolherbst: mupuf: last version was around -0.06% without hurts
18:09 karolherbst: imirkin_: :D
18:09 karolherbst: imirkin_: well, the nvidia GPUs have those fancy predicated instructions
18:09 karolherbst: main reason why this pass doesn't do much
18:09 karolherbst: can't remove any branching :/
18:10 karolherbst: imirkin_: which -5% estimate do you mean? :D
18:10 mupuf: karolherbst: the first numbers you gave us
18:11 karolherbst: mupuf: do you mean from the empty branch elim Pass?
18:12 karolherbst: aka this https://gist.github.com/karolherbst/9806873e8d843cbead3635b9eb49d7cb
18:13 mupuf: yeah, looks about what I remembered
18:13 mupuf: and what I think Ilia remembers too
18:14 karolherbst: well this is a completely different thing
18:14 karolherbst: also
18:14 karolherbst: those were SSO shaders and shader-db doesn't handle it
18:14 karolherbst: so all vertex exports were removed
18:14 karolherbst: and a bunch of code can be removed
18:14 karolherbst: so it was basically wasted time
18:15 karolherbst: mupuf: the new numbers are from a phi -> selp/slct pass
18:15 karolherbst: like the SELPropagation pass for intel
18:15 karolherbst: *SELPeephole
18:15 karolherbst: basically to optimize value = cond ? v1 : v2; away
18:15 karolherbst: or into something smarter than branched code
18:16 mupuf: ok!
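For readers following along, the two-step idea karolherbst describes (phi => selp, then selp => slct when possible) can be sketched on a toy IR. This is only an illustration in Python, not nv50_ir code: the `Phi` and `SetCmp` classes, the tuple encoding of instructions, and both function names are hypothetical, and the sketch assumes that slct picks a source by comparing a third operand against zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetCmp:
    """Hypothetical record of how a predicate was computed: lhs <cond> rhs."""
    cond: str      # e.g. "lt", "ge"
    lhs: str       # register name
    rhs: object    # register name or immediate integer


@dataclass(frozen=True)
class Phi:
    """value = pred ? if_true : if_false, merged at the join block."""
    dst: str
    if_true: str
    if_false: str
    pred: str


def phi_to_selp(phi):
    # Step 1: merge the two sources without a phi, still keyed on the predicate.
    return ("selp", phi.dst, phi.if_true, phi.if_false, phi.pred)


def selp_to_slct(selp, defs):
    # Step 2: fold the predicate away, but only when it is a compare against 0.
    # Otherwise the selp is returned unchanged (the step is not always possible).
    _, dst, a, b, pred = selp
    cmp = defs.get(pred)
    if isinstance(cmp, SetCmp) and cmp.rhs == 0:
        return ("slct", cmp.cond, dst, a, b, cmp.lhs)
    return selp


# Toy predicate definitions: p0 compares against zero, p1 against a register.
defs = {"p0": SetCmp("lt", "r3", 0), "p1": SetCmp("lt", "r3", "r4")}
folded = selp_to_slct(phi_to_selp(Phi("r5", "r1", "r2", "p0")), defs)
kept = selp_to_slct(phi_to_selp(Phi("r6", "r1", "r2", "p1")), defs)
```

The second case stays a selp, which is the situation mentioned earlier in the log where doing phi => selp too eagerly can add instructions.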
18:20 karolherbst: mhh funny, now that I don't cover the air holes on the bottom of my laptop, my CPU is like 8°C cooler
18:22 mupuf: how surprising
18:23 karolherbst: but seriously, why are those on the bottom
18:23 karolherbst: on a laptop
18:45 Tom^: karolherbst: because aesthetics, not logics.
18:49 karolherbst: nah, the eye candy of that laptop is somewhere else :D
18:50 mupuf:agrees
18:50 mupuf: having seen the laptop in question, I can attest that it should not even be called a laptop
18:50 mupuf: unless you are OK with having a skin cancer in the next few months
18:52 karolherbst: :D
18:54 mupuf: it is more like a transportable tabletop :D
18:55 mupuf: I would make the same choice if I could not have both a laptop and a desktop pc
18:55 mupuf: but luckily, I seem to get laptops every 3 years
18:55 mupuf: from school or from work
18:56 mupuf: that's how I got started with nouveau btw
18:56 mupuf: my school-provided laptop had an nvidia gpu
18:56 mupuf: and /me wanted to have fun with the kernels
18:56 mupuf: and the nvidia driver was getting in my way
18:57 karolherbst: yeah, well desktop+performant laptop is a bit more expensive in total
18:57 mupuf: yep
18:57 karolherbst: and I wanted to spare me the hassle of a synching data
18:57 karolherbst: *-a
19:03 karolherbst: but as a bonus I have a removable CPU and GPU, so I don't need to replace everything when something breaks.. well getting a new motherboard might be tricky, but besides that
19:04 imirkin_: or a new screen
19:04 mupuf: imirkin: don't jinx it!
19:04 Tom^: you guys seen this btw, MSVC new invention. https://0x0.st/Nxu.png lets add phone home to everything compiled.
19:05 mupuf: Tom^: I do not care about how insane MSVC is as long as it follows the code standard
19:05 mupuf: don't use a proprietary compiler, it is worse than a proprietary application
19:06 Tom^: indeed
19:07 karolherbst: imirkin_: well a screen is usually replaceable except when the manufacturer is an asshole :D
19:07 karolherbst: mupuf: performance of ICC though :O
19:09 karolherbst: mupuf: but what if every gcc in binary form already has a backdoor which gets added every time gcc gets compiled :O
19:10 karolherbst: Tom^: #include "stdafx.h"
19:10 mupuf: karolherbst: hehe
19:10 karolherbst: Tom^: that's why I wouldn't trust this picture anyway
19:11 karolherbst: mupuf: this is a serious problem though
19:15 Calinou: I like having a fast laptop, it's good for developing games, or just gaming, or compiling software
19:15 Calinou: too bad quad-core i7 + Intel graphics laptops are very uncommon
19:15 Calinou: (that is, without a dedicated card)
19:16 karolherbst: Calinou: iris
19:16 karolherbst: but yeah
19:16 karolherbst: there are only a few
19:16 Calinou: this laptop seems nice, but a bit expensive: www.amazon.com/ZenBook-UX501VW-Touchscreen-i7-6700HQ-Thunderbolt/dp/B01CQRNBJG
19:16 Calinou: (but apparently, the battery life is decent for such a powerful laptop)
19:16 Calinou: much better than my Acer according to reviews
19:20 karolherbst: a bit expensive for the perf
19:20 karolherbst: ohh 4k and touch
19:20 karolherbst: well
19:23 karolherbst: but maybe the 960M is good enough, because you might be able to fully reclock it with nouveau now
19:23 karolherbst: but then a 780M is usually faster (and more stable)
19:39 mupuf: karolherbst: env_dump support to dump all the information from hwmon? :p
19:39 mupuf: now, the problem is ... that's a lot of data!
19:39 mupuf: and then, what do we do with it?
19:40 mupuf: so. instead, maybe, I would need to create one file per GPU used
19:40 mupuf: and one file for the cpu
19:40 mupuf: the problem is to know what driver to look for
19:41 mupuf:did not think about that at 3am a few days ago
19:43 mupuf: well, I will do something smarter later
19:45 karolherbst: mupuf: isn't hwmon data like never constant?
19:45 mupuf: sure, it is dumping it every 100ms
19:45 karolherbst: ahh
19:45 mupuf: metrics support
19:45 karolherbst: right
19:45 karolherbst: but why put this into env_dump?
19:45 karolherbst: ohh
19:46 karolherbst: to time start/stop?
19:46 mupuf: yeah, but also to be able to dump the data of the stuff you actually use
19:46 karolherbst: mhh right
19:46 mupuf: temperature and fan speed is FYI
19:46 mupuf: but power and cpu usage should not be
19:46 karolherbst: well one issue
19:46 karolherbst: you might have multiple hwmon entries providing the same data
19:46 mupuf: ?
19:47 karolherbst: my sensors output for example: https://gist.github.com/karolherbst/8c62396f175808144f6e048939545ce2
19:47 karolherbst: and I have 4 hwmon entries for real
19:48 mupuf: yep, same here
19:48 mupuf: but... what do you want to do here?
19:48 karolherbst: but I meant that like 3 entries could provide the same data
19:48 karolherbst: but with a different accuracy
19:48 mupuf: I could always pick the highest temperature of all and write this one down
19:49 mupuf: yes
19:49 karolherbst: highest value has one problem: maybe one of those entries is really slow in updating the value
19:49 karolherbst: and then you collect garbage
19:50 mupuf: we are talking about temperature here, still not sure what to do with it
19:50 mupuf: I could collect alerts though
19:50 karolherbst: no, I would just collect it like power consumption
19:50 karolherbst: just collect everything and let the user decide what to do with that
19:50 karolherbst: maybe somebody wants to benchmark temperature with window opened vs window closed or something stupid like that
19:51 mupuf: yeah, but I also want to bisect based on some metrics
19:51 karolherbst: or maybe somebody wants to tweak the fans and let them run slower
19:51 karolherbst: mhh
19:52 mupuf: cpu usage and power efficiency being two
19:52 karolherbst: well, temperature is a data point which is really hard to deal with usually
19:52 karolherbst: because there are too many factors from outside affecting it
19:52 mupuf: yeah, only useful to debug why stuff was slow
19:52 karolherbst: right
19:52 mupuf: it is also not instantaneous
19:52 karolherbst: maybe it is enough to just collect throttles
19:52 mupuf: unlike cpu and power usage
19:52 karolherbst: right
19:53 karolherbst: but in any case, there is a better way to collect throttles
19:53 karolherbst: or time throttled
19:54 karolherbst: well I can't think of anything useful currently
19:55 karolherbst: I would just leave it as a data point you collect, but maybe don't use it for bisecting stuff, because there is always better data to do that
19:57 mupuf: well, I can always filter by unit
19:57 mupuf: never bisect on anything else but cpu usage or power
19:57 mupuf: well, cpu usage would have the unit %, which is not really helping
19:58 mupuf: but it may be good-enough
19:58 mupuf: time will tell!
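The "just collect everything" approach from the hwmon discussion above could look roughly like the following sketch. It is not env_dump code: the function name `read_hwmon` and the `root` parameter are hypothetical, and it only assumes the standard Linux sysfs layout, where each `hwmonN` directory has a `name` file and sensor readings in `*_input` files. Keying each reading by device and chip name keeps duplicate sensors apart, so the user can decide later which one to trust.

```python
import glob
import os


def read_hwmon(root="/sys/class/hwmon"):
    """Return {(device, chip, sensor): raw_value} for every readable *_input file.

    Values are returned as the raw integers found in sysfs; no unit
    conversion and no merging of duplicate sensors is attempted.
    """
    samples = {}
    for dev in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        # The chip name tells the user which driver a reading came from.
        try:
            with open(os.path.join(dev, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = "unknown"
        for path in sorted(glob.glob(os.path.join(dev, "*_input"))):
            try:
                with open(path) as f:
                    raw = int(f.read().strip())
            except (OSError, ValueError):
                continue  # sensor unreadable or not an integer: skip it
            sensor = os.path.basename(path)[: -len("_input")]
            samples[(os.path.basename(dev), chip, sensor)] = raw
    return samples
```

Calling `read_hwmon()` in a loop with a short sleep would give the periodic dump mentioned above; filtering by sensor prefix (for example `power`) is left to whoever consumes the data.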
19:58 TheRealJohnGalt: Is the 460se supported by nouveau? Also does anyone know where I can check? I'm asking for a friend.
19:58 mupuf: 460se? never heard of it, but yeah, it should
19:58 mupuf: it would be Fermi or kepler
19:59 TheRealJohnGalt: And DRI3 support?
19:59 mupuf: I forgot if we support DRI3 actually
20:03 Calinou: karolherbst: thing is, 970M/980M are almost only in gaming laptops
20:03 Calinou: with shit battery life, usually TN displays…
20:29 dcomp: I think I may have found something in the gm108 mmiotraces
20:30 dcomp: they seem to write to 10a1c0 ... of which I can only find reference to in the gm200 devinit
21:11 dcomp: i give up, looking at nvbios
21:11 dcomp: MEM TYPE table at 0x79e, version 10, 16 entries
21:11 dcomp: Detected ram type: DDR2
21:11 imirkin_: that's all lies though
21:12 imirkin_: the way nvbios decides is based on additional files
21:12 imirkin_: which you probably didn't provide
21:13 dcomp: im pretty sure the mmiotrace is loading pmu data
21:13 imirkin_: probably.
21:13 dcomp: but for the life of me cant figure it out
21:15 karolherbst: dcomp: try my new maxwell_reclocking branch
21:15 imirkin_: karolherbst: oh, did you talk to that guy with the GM107 who wanted to work on kernel stuff?
21:15 karolherbst: imirkin_: a little
21:16 karolherbst: imirkin_: I told him to get the card working with nouveau first and find issues which annoy him most :D
21:16 karolherbst: but it is hard to find good and easy tasks for the gm107
21:16 karolherbst: because everything is still somewhat messy