06:16 imirkin: skeggsb: what do you think about https://bugs.freedesktop.org/attachment.cgi?id=119726 ?
06:16 imirkin: someone said it helps a bit, but ... not completely
06:18 imirkin: iirc there were a ton more differences between nouveau and xf86-video-nv...
06:18 imirkin: esp as it related to those IGPs
06:19 imirkin: oooh - just saw you pushed out a fix for third DP port on NVS 510 - iirc there was a bug about that one
11:16 RSpliet: mupuf: what do you currently have in the reator?
11:18 karolherbst: RSpliet: last time I checked it was gk110 + either gf119 or something else
11:19 RSpliet: ah! ... hmm, I could use access for a good 7 seconds :-P
11:21 mupuf: RSpliet: you know what to do, right?
11:21 RSpliet: mupuf: not really
11:22 RSpliet: does it involve dancing?
11:22 mupuf: ok, I need to create an account so you can boot/shutdown the machine
11:22 mupuf: but I can boot it for you now
11:26 RSpliet: thanks
11:28 mwk: mupuf: next project, create a remotely controlled robotic arm to change PCIE cards
11:29 mupuf: mwk: hey hey :D
11:29 mupuf: but it would be less problematic to just have more machines
11:29 mupuf: I freed up some space in the cupboard that holds most of the electronics
11:30 mupuf: I should have space for a TX1 and a few laptops
11:30 mupuf: that will allow us to connect to a lot of machines already
11:30 mupuf: and although I plan to use them for CI, we could always use them when necessary
11:30 mwk: CI is awesome :)
11:30 mupuf: hehe, right
11:31 mupuf: especially when it has auto bisection :p
11:31 mupuf: which I have :)
11:31 RSpliet: mupuf: here's some inspiration for such a robot arm: https://www.youtube.com/watch?v=Q0Ca5JiEd5I&t=390s
11:38 mupuf: RSpliet: oh yeah :D
11:39 mupuf: prodigy + hackers => so 90s :D
11:44 mwk: mupuf: have you seen Kung Fury? :)
12:16 mupuf: mwk: oh yeah!
12:23 karolherbst: mupuf: I can give you my laptop for mxm cards when I get a new one :p
13:38 RSpliet: mwk: you wouldn't happen to know whether the MMCTX fifos attached to the FECS and GPC falcons write directly to memory, or rather through an intermediate last-level cache?
13:40 mupuf: karolherbst: would be nice
13:49 karolherbst: mupuf: well I am sure it won't happen this year though :D
13:50 karolherbst: also replacing those mxm cards is a bit... tricky, because my cooling system can handle around 70W and there are much stronger cards out there
14:18 mwk: RSpliet: I'm going to guess they have a store queue and work asynchronously...
14:18 mwk: but a cache as in "read through MMCTX will return stale data", I doubt that
14:19 mwk: for one, I'd guess you need a STOP_TRIGGER to flush the queue
14:57 RSpliet: mwk: hmm, I sort of assumed the MMCTX fifos are the store queue. Gather writes until they are big enough for a burst write operation and fingers crossed that the row is still open... it would sort of explain why a ctxswitch takes 25μs for 68KB
15:00 mwk: yeah, that's how it works, probably
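As a rough check on the figure quoted above, 68 KB in 25 μs works out to a little under 3 GB/s of effective throughput. A minimal sketch of the arithmetic, assuming 1 KB = 1024 bytes (the log does not say which unit was meant); the function name is hypothetical:

```python
def effective_bandwidth(num_bytes, seconds):
    """Return the average throughput in bytes per second."""
    return num_bytes / seconds


# Figures quoted in the discussion: a 68 KB context saved in 25 microseconds.
# Assumption: 1 KB = 1024 bytes.
ctx_bytes = 68 * 1024
switch_seconds = 25e-6

bandwidth = effective_bandwidth(ctx_bytes, switch_seconds)
print(f"{bandwidth / 1e9:.2f} GB/s")
```

This is only the average rate implied by the two numbers; it says nothing about where in the path (FIFO, store queue, DRAM row handling) the time is actually spent.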
15:13 Leftmost: Haven't had a chance to work on GM tess this week, but I haven't run away screaming yet.
15:54 gregory38: Hello
15:54 tobijk: so continue :D
15:55 gregory38: lol
15:55 gregory38: I'm trying to profile Nouveau a bit with my app but I have the feeling that GPU reclocking doesn't work as expected
15:55 gregory38: I have a kepler GPU and a kernel 4.4, I booted with the option nouveau.pstate=1
15:55 gregory38: And then echo "0f" > /sys/class/drm/card0/device/pstate
15:55 karolherbst: gregory38: check dmesg
15:55 karolherbst: gregory38: do you by any chance get a voltage error -22?
15:55 gregory38: yes
15:56 karolherbst: okay
15:56 karolherbst: then you want to use my nouveau tree
15:56 karolherbst: I fixed up most of the kepler reclocking things
15:56 gregory38: kernel? drm?
15:56 karolherbst: kernel
15:56 karolherbst: but you would need 4.6
15:57 gregory38: I guess I will recompile it.
15:57 karolherbst: https://github.com/karolherbst/nouveau.git
15:57 karolherbst: branch: stable_reclocking_kepler_v5
15:57 karolherbst: and run make inside the drm/ folder
15:57 tobijk: karolherbst: you noticed some regression with latest mesa and your TR2013 traces lately?
15:58 gregory38: Ok thanks for the info
15:58 karolherbst: tobijk: no idea if those are regressions
15:58 karolherbst: gregory38: what gpu do you have by the way?
15:58 karolherbst: gregory38: if you cat your current pstate file, you should see the memory clock go up, but the core clock stays the same, right?
15:59 gregory38: a GTX760
15:59 karolherbst: tobijk: but which issues did you encounter?
16:00 tobijk: karolherbst: not the thing in the traces, it does not trace them anymore at all :/
16:00 gregory38: cat /sys/class/drm/card0/device/pstate
16:00 gregory38: 07: core 405 MHz memory 648 MHz
16:00 gregory38: 0a: core 405-967 MHz memory 1620 MHz AC DC *
16:00 gregory38: 0e: core 405-1306 MHz memory 6008 MHz
16:00 gregory38: 0f: core 405-1306 MHz memory 6008 MHz
16:00 gregory38: DC: core 966 MHz memory 1620 MHz
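The pstate listing above is regular enough to read programmatically, which can help when comparing the requested state against the `DC:` (current) line. A minimal sketch, assuming the file format is exactly what this paste shows; the regex and the `parse_pstate` helper are illustrative and not part of nouveau:

```python
import re

# Sample copied from the `cat /sys/class/drm/card0/device/pstate` output above.
SAMPLE = """\
07: core 405 MHz memory 648 MHz
0a: core 405-967 MHz memory 1620 MHz AC DC *
0e: core 405-1306 MHz memory 6008 MHz
0f: core 405-1306 MHz memory 6008 MHz
DC: core 966 MHz memory 1620 MHz
"""

LINE_RE = re.compile(
    r"^(?P<id>\w+): core (?P<lo>\d+)(?:-(?P<hi>\d+))? MHz "
    r"memory (?P<mem>\d+) MHz(?P<flags>.*)$"
)


def parse_pstate(text):
    """Map each state id to its core range, memory clock and active flag."""
    states = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a state line
        lo = int(m.group("lo"))
        hi = int(m.group("hi")) if m.group("hi") else lo
        states[m.group("id")] = {
            "core_min": lo,
            "core_max": hi,
            "memory": int(m.group("mem")),
            "active": "*" in m.group("flags"),  # '*' marks the selected state
        }
    return states


states = parse_pstate(SAMPLE)
print(states["DC"])  # the clocks the hardware is currently running at
```

With the sample above, the `DC` entry shows the memory clock of state `0a` while the core stays at 966 MHz, which matches what the discussion describes.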
16:00 karolherbst: tobijk: uhh, current mesa master?
16:00 tobijk: from yesterday
16:00 gregory38: [10921.616562] nouveau 0000:02:00.0: clk: failed to raise voltage: -22
16:00 gregory38: [10921.616569] nouveau 0000:02:00.0: clk: error setting pstate 3: -22
16:01 gregory38: hum, when I set 0xa, I don't get the error
16:01 karolherbst: gregory38: right, clocking to 0a should work, because the voltage requirement for 967 isn't too high, on 0e/0f that's a different issue
16:01 karolherbst: gregory38: well the vbios is a bit messed up basically
16:02 karolherbst: gregory38: those 1306MHz clock states require a voltage which is above the vbios voltage limit and then we fail to set the voltage
16:02 gregory38: stupid factory overclocking
16:02 karolherbst: gregory38: nah, this is fairly normal for most mid-high end gpus
16:03 karolherbst: it's just something we don't deal with yet, but I fixed that on my branch
16:03 imirkin: gregory38: more like "stupid undocumented vbios"
16:03 gregory38: ok
16:03 karolherbst: gregory38: well you can clock to 0a first and then to 0f
16:03 karolherbst: gregory38: this should give you a high enough core clock and high memory
16:04 karolherbst: gregory38: but nouveau may be unstable without my patches in that case
16:04 karolherbst: depends on your vbios though
16:04 gregory38: well it will only run for a couple of seconds for the profiling so instability is maybe not a big deal
16:04 karolherbst: well, it depends on your card
16:04 karolherbst: some cards seem to crash really fast
16:05 karolherbst: some even just by putting load on them and you get an instant crash
16:05 imirkin: gregory38: also, someone reported that PCSX2 crashes nouveau with some but not all games
16:05 imirkin: so ... be ready :)
16:05 gregory38: where ?
16:05 gregory38: it could be the latest image addition
16:05 imirkin: nah, this was before nouveau exposed images
16:06 imirkin: and it was nouveau that crashed, not PCSX2
16:06 imirkin: iirc the gpu wedged itself or something
16:06 imirkin: i think it was orbea reporting, he might remember which game it was so you can stay away from it for now
16:06 karolherbst: mhh odd
16:07 karolherbst: I am sure pcsx2 ran just fine on my gpu
16:07 imirkin: karolherbst: but presumably you didn't try *every* game ... this was one specific one
16:07 karolherbst: well I tried the most demanding one :D
16:07 gregory38: well it really depends on the game, I have lots of (too many) shader combinations
16:08 gregory38: potentially it could have been a bug of mine that caused GPU havoc :)
16:08 karolherbst: we shouldn't crash the gpu though
16:09 gregory38: well you know with latest openGL stuff, you could do dirty thing
16:09 gregory38: but yes GPU ought to survive
16:09 imirkin: or at least recover :)
16:09 imirkin: unfortunately we don't do so hot in the recovery department
16:10 karolherbst: gregory38: well you can check if the 07->0a->0f workaround works good enough for you, if not you can always build nouveau yourself
16:11 imirkin:&
16:11 gregory38: No luck on the workaround but PC is still alive :p So I will compile a 4.6 kernel and then your branch
16:13 tobijk: imirkin: you happen to know if there are traces of the blob for lets say arb-viewport array? (i'm still unlucky in getting the blob to work with my prime system *duh*)
16:15 karolherbst: gregory38: well the core clock should be at 960MHz or something right?
16:15 gregory38: DC: core 966 MHz memory 6007 MHz
16:15 karolherbst: yeah, you won't reach 1306 anyway
16:16 karolherbst: maybe something around 1150MHz is possible
16:16 gregory38: that's strange, it ought to be enough
16:18 karolherbst: what do you mean?
16:18 gregory38: I have awful perf with PCSX2
16:19 gregory38: I don't have the number of Nvidia without the thread optimization
16:19 karolherbst: well. that's to be expected
16:19 karolherbst: well with crappy core clocks that is
16:20 karolherbst: you should get much better perf with that increased core clock
16:20 gregory38: perf is better at 900Mhz vs 450
16:21 gregory38: but I didn't push PCSX2 (no upscaling) on the GPU to better profile the CPU impact
16:21 Calinou:wonders how well Nouveau will manage Maxwell/Pascal cards in 1.5-2 years
16:21 Calinou: just for fun :)
16:21 Calinou: maybe 60%+ of proprietary driver performance?
16:22 Calinou: (in a stable manner)
16:22 gregory38: I ought to have 70% of my GPU, should be enough to achieve 100 fps
16:22 Calinou: my 570 was 20% :P
16:24 karolherbst: Calinou: yeah well, thats fermi :p
16:25 gregory38: anyway, could be the way I wrote my shader. I put all ubo (even unused) as a glsl header
16:25 gregory38: maybe I just have too much validation
16:25 gregory38: I would try to profile it with current setup
16:26 gregory38: If I've time I will upgrade to 4.6
16:26 gregory38: + your branch
16:26 karolherbst: okay
16:26 gregory38: thanks for all the info
16:34 karolherbst: gregory38: I guess you are currently in the process of cleaning up a lot of stuff as I heard?
16:35 gregory38: ie?
16:35 gregory38: you mean in PCSX2?
16:35 karolherbst: gregory38: I tried "shadow of the colossus" once, perf wasn't as good as expected ;)
16:35 karolherbst: yeah
16:36 karolherbst: gregory38: someone told me that the merge of the OpenGL code was kind of a mess, no idea if that's true though
16:36 gregory38: who, so I can shoot it :p
16:36 karolherbst: :p
16:36 gregory38: honestly sotc is potentially faster with openGL
16:36 gregory38: the issue of this game is the VU part
16:37 karolherbst: like I would ever reveal my informants :p
16:37 gregory38: the vertex shader part of the PS2 is done on separate vectorial units
16:37 karolherbst: yeah
16:37 gregory38: and those unit aren't standard float
16:37 karolherbst: I dug a bit into the technical specs
16:37 karolherbst: and it sounds like a big hack
16:37 karolherbst: the entire PS2 that is
16:37 gregory38: so they are just a pain to emulate
16:37 karolherbst: non IEEE conformant floating point operations as I heard?
16:38 gregory38: well it makes sense for a GPU
16:38 gregory38: in particular an old GPU
16:38 karolherbst: but I guess games depend on that
16:38 gregory38: In order to achieve the 150Mhz (yes I know)
16:39 gregory38: sotc uses lots of primitives, so there are lots of operation on the VU
16:39 karolherbst: yeah, I figured
16:39 gregory38: game uses few texture but uses directly fragment color (cell shading?)
16:39 karolherbst: but I was CPU bound, not GPU bound
16:39 gregory38: well did you try with nvidia or with nouveau
16:40 karolherbst: both
16:40 karolherbst: always like 400% cpu load
16:40 gregory38: it is likely the core (Emotion engine and VU)
16:40 gregory38: depends of the rendering scene too but I could replay part of the game in GSdx only
16:41 gregory38: except if you put a high upscale, I can achieve 180 fps
16:41 karolherbst: mhh
16:41 gregory38: On nvidia
16:41 karolherbst: but I am also running 1.4.0
16:42 gregory38: on some game latest git is a bit faster, but I don't think sotc will be too impacted
16:42 gregory38: potentially on GSdx, you could have some issues with accurate blending
16:43 gregory38: I'm using texture barrier to do some kind of SW blending
16:43 karolherbst: gregory38: I need like 2 VU Cycle Stealing
16:43 karolherbst: with 0 it is _really_ slow
16:43 gregory38: otherwise EE could potentially be slower due to removal of MMX
16:44 gregory38: yes
16:44 gregory38: too much polygon to process
16:44 gregory38: I didn't profile it but the non-standard float stuff is awful
16:45 gregory38: So to answer your initial question, I'm trying to fix various rendering issue on GSdx
16:45 karolherbst: ahh okay
16:45 karolherbst: so not directly performance related
16:46 karolherbst: or performance too?
16:46 gregory38: Mostly depth related effect
16:46 karolherbst: I see
16:46 gregory38: It depends
16:46 gregory38: I recently replaced an heavy effect with a basic shader
16:47 gregory38: initially it was like 500/1000 draw calls
16:47 gregory38: to emulate this kind of stuff
16:47 gregory38: Apply a brightness correction
16:47 karolherbst: maybe I should switch to the git version then
16:48 gregory38: On snowengines games it is much faster (you need to enable an option)
16:48 gregory38: fast texture invalidation
16:49 gregory38: But globally dev is chaotic sometimes I'm fixing rendering, sometimes I try to improv perf
16:49 gregory38: it depends on my mood
16:49 karolherbst: well I also found some visual artefacts with nouveau on sotc
16:50 gregory38: small or big one ?
16:51 karolherbst: big but trivial
16:51 gregory38: trivial?
16:51 karolherbst: it looks like the screen is divided into 4 big triangles, and some of them are just black
16:51 karolherbst: I mean it doesn't seem to have anything to do with what I see...
16:52 karolherbst: maybe I make a screenshot with the git version and show you
16:53 gregory38: Ah, I know what people mean by cleanup. I recently removed the old fallback extension support. It was to support Mesa but now it is useless
16:54 gregory38: PS2 often send sprite for postprocessing. Which are translated to 2 triangles by the geometry shader
16:54 gregory38: geometry shader can be disabled if I can help
16:55 gregory38: it* can help
16:55 karolherbst: tobijk: no issues playing the game
17:03 tobijk: me neither today, don't know what caused it yesterday
17:04 tobijk: karolherbst: but the traces are still broken somehow
17:08 karolherbst: gregory38: ohh another thing: is there a reason why pcsx2 is 32bit?
17:08 gregory38: because I have only 2 arms ;)
17:09 karolherbst: I see
17:09 gregory38: lots of the code is recompiler
17:09 gregory38: so quite lots of work to do
17:09 karolherbst: no idea how much potential it would have to port it over to 64bit
17:09 gregory38: well on the EE it will help
17:10 gregory38: but I have the feeling the real issue is the VU
17:10 gregory38: (by the way, you could try MTVU (multithread VU) for sotc)
17:10 gregory38: 64 bits will be useless here
17:10 gregory38: well there are extra reg but
17:10 gregory38: op will be bigger
17:11 gregory38: but code will remain heavy
17:11 orbea: imirkin: gregory38 I dont recall reporting any games that outright crash with pcsx2 + nouveau, only some graphical issues nad perf problems...
17:11 orbea: *and
17:11 gregory38: whereas on the EE it would be easier to emulate 64 bits operation with real 64 bits opcode :)
17:12 gregory38: but yeah 64 bits wasn't a priority
17:13 gregory38: PCSX2 is quite heavy on the driver. It is good to benchmark validation stuff (and report nice bug report :p )
17:14 gregory38: (PCSX2 code could likely be improved too, to improve perf)
17:15 orbea: its mostly those suspectingly poorly coded games like Xenosaga 1, granted its gotten relatively better too since I first noticed it
17:15 karolherbst: gregory38: already tried the MT thing
17:16 gregory38: orbea: oh, maybe I know what happen.
17:16 gregory38: The game uses a bitmask on the alpha so I emulate it (poorly) with a shader blending + texture barrier.
17:16 gregory38: I just draw triangle by triangle so it was quite heavy on the driver
17:17 orbea: the first 10 minutes are good and then it just lags the moment Kos-Mos appears...
17:18 gregory38: Later I did it differently to flush the texture only once and to draw the full draw call at once too
17:18 gregory38: Otherwise it could be a framebuffer conversion
17:19 orbea: If it helps I could do a pcsx2 issue later
17:19 gregory38: GS format isn't linear so when you want to sample the framebuffer as a texture you need to convert the texture with a shader. It is slow
17:19 gregory38: Well, is it slow with Nvidia ?
17:20 gregory38: ideally you will want to compare with Nvidia but MT optimization disabled
17:20 orbea: I haven't tried, I've been burned enough by nvidia that I'm hesitant to bother even installing it
17:20 gregory38: I mean this stuff LD_PRELOAD="libpthread.so.0 libGL.so.1" __GL_THREADED_OPTIMIZATIONS=1
17:21 gregory38: currently nouveau does lots of extra invalidation
17:21 gregory38: it must be fixed first
17:21 gregory38: orbea: do you know if you're CPU or GPU bound ?
17:22 orbea: Not sure?
17:23 gregory38: upscaled or native ?
17:23 orbea: native resolution
17:24 gregory38: ok so it is likely a cpu issue
17:24 gregory38: anyway, on my partially reclocked GPU the perf is awful
17:25 orbea: it occurs both with hw and sw rendering, worse with the former
17:25 gregory38: sw renderer basically render on the cpu and then send the final picture on the GPU
17:25 gregory38: so yeah it is slow but it is normal
17:27 gregory38: it feels like Nouveau isn't happy with vertex/texture streaming
17:28 karolherbst: gregory38: we also have really big perf issues with eon titles, maybe we really do too much invalidation
17:29 gregory38: karolherbst: https://bugs.freedesktop.org/show_bug.cgi?id=96355
17:29 karolherbst: yeah I already saw that
17:29 gregory38: recently it became worse
17:29 gregory38: on perf
17:29 karolherbst: gregory38: so what did you do to get better perf?
17:29 gregory38: + 3.42% 0.01% pcsx2_GSReplayL libdrm_nouveau.so.2.0.0 [.] nouveau_pushbuf_validate
17:29 gregory38: + 3.34% 0.37% pcsx2_GSReplayL libdrm_nouveau.so.2.0.0 [.] pushbuf_validate
17:29 gregory38: + 3.33% 1.83% pcsx2_GSReplayL libdrm_nouveau.so.2.0.0 [.] pushbuf_kref
17:30 gregory38: karolherbst: better perf on what ?
17:30 karolherbst: gregory38: ohh so you only traced but didn't verify?
17:30 gregory38: on the report, I traced, I checked that validation was called
17:31 gregory38: but I didn't check the performance impact
17:31 gregory38: because I was trying to get the full speed of my gpu
17:32 gregory38: sotc (above perf) uses lots of polygon. So it relies on persistent buffer
17:32 gregory38: to stream vertex
17:33 gregory38: it could be related to the drm validation
17:33 gregory38: but it could be normal value
17:33 gregory38: I'm not even sure my perf is working
17:34 gregory38: + 9.14% 0.80% pcsx2_GSReplayL libGSdx.so [.] GSRendererHW::Draw
17:34 gregory38: Sorry
17:34 gregory38: I mean
17:34 gregory38: + 6.97% 0.00% pcsx2_GSReplayL nouveau_dri.so [.] nouveau_screen_get_name
17:34 gregory38:
17:34 gregory38: nouveau_screen_get_name is on top of my chart but function is quite small
17:34 karolherbst: those are local percentages, right?
17:34 gregory38: first is children and 2nd one is self
17:35 karolherbst: yeah
17:35 karolherbst: then nouveau_screen_get_name isn't hardly called
17:35 karolherbst: *is
17:37 gregory38: is there a way to know if I have PCIe gen3 enabled ?
17:37 karolherbst: gregory38: lspci -v
17:37 karolherbst: or was it -vv?
17:37 karolherbst: gregory38: -vv and then on the GPU: LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
17:38 karolherbst: now I have to think
17:38 karolherbst: ...
17:38 karolherbst: nope
17:38 karolherbst: pcie stuff was merged in 4.5
17:38 karolherbst: so for you it should be at 2.5
17:39 gregory38: LnkSta: Speed 2.5GT/s,
17:39 gregory38: yes
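The `LnkSta` line from `lspci -vv` reports the negotiated transfer rate, and the rate maps directly to a PCIe generation: 2.5 GT/s is gen1, 5 GT/s is gen2 and 8 GT/s is gen3. A minimal sketch of that mapping, using the two lines quoted above as input; the `pcie_gen` helper is hypothetical:

```python
import re

# Negotiated per-lane transfer rate (GT/s) -> PCIe generation.
GEN_BY_SPEED = {2.5: 1, 5.0: 2, 8.0: 3}


def pcie_gen(lnksta_line):
    """Return the PCIe generation for an lspci 'LnkSta:' line, or None."""
    m = re.search(r"Speed ([\d.]+)GT/s", lnksta_line)
    if not m:
        return None
    return GEN_BY_SPEED.get(float(m.group(1)))


# Lines as they appear in the discussion.
print(pcie_gen("LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+"))
print(pcie_gen("LnkSta: Speed 2.5GT/s,"))
```

So a link reporting 2.5 GT/s, as on the 4.4 kernel here, is running at gen1 speed even if both the card and the slot support gen3.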
17:39 karolherbst: it shouldn't matter that much though
17:39 karolherbst: well shouldn't
17:39 karolherbst: it is quite easy to change the pcie link speed by hand
17:39 karolherbst: but well
17:40 gregory38: (well for the record, I'm working on some PCIe hardware IP currently)
17:40 karolherbst: well, and 8.0 uplink is already implemented :p
17:40 karolherbst: just not on 4.4
17:40 karolherbst: got merged in 4.5 ;)
17:41 gregory38: that's a shame, I upgraded yesterday to 4.4
17:41 karolherbst: well
17:41 gregory38: I wanted stability
17:41 karolherbst: with 4.4 you got working gddr5 reclocking :)
17:42 gregory38: I will blacklist your computer from PCSX2 so you will have more time to work on nouveau ;)
17:43 karolherbst: first upgrade to a useful nouveau branch :p
17:43 gregory38: yeah
17:43 gregory38: reclocking + pcie 3 have potential to improve my situation
17:43 karolherbst: imirkin: is there an easy way to disable the entire ssbo thing?
17:44 gregory38: yes
17:44 karolherbst: gregory38: well I could check if pcie 3 does much here
17:44 karolherbst: gregory38: but when you install envytools, you can uplink quite easy yourself
17:44 gregory38: I just removed the flag from the structure
17:44 karolherbst: gregory38: where?
17:44 gregory38: never dirty, never called ;)
17:45 gregory38: http://pastebin.com/gcWiss8L
17:45 gregory38: it will break any program that relies on it
17:45 gregory38: but I think it is enough for benchmark
17:45 gregory38: otherwise, I think some if (shader->buffer == 0) return in some place could work too
17:46 gregory38: but I'll let the experts handle it
17:47 karolherbst: let's see
17:50 karolherbst: well, it seems like it doesn't matter that much
17:52 gregory38: you mean PCIE3
17:53 gregory38: or the hack above?
17:53 karolherbst: nope, the validation thing
17:54 gregory38: may I ask you what did you test ?
17:54 karolherbst: nothing in PCSX2
17:54 karolherbst: I can't set the analog button :O
17:54 gregory38: see I told you that I will blacklist you
17:55 gregory38: dual shock ?
17:55 karolherbst: xbox360
17:55 karolherbst: :p
17:56 gregory38: I dunno, how we manage to make pad management more complex than GPU !
17:57 karolherbst: wow
17:57 karolherbst: 50% speed
17:57 karolherbst: 240% cpu load
17:59 karolherbst: gregory38: internal resolution should also be fit to screen _if_possible_ ;)
17:59 gregory38: let me restart a couple of times so I can install 4.6 + your branch
17:59 gregory38: IR is a troll topic :)
18:00 gregory38: see you
18:06 tobijk: meh, why does my vulkan build not find my wayland headers :/
18:09 gregory38: ok back. 4.6 is running
18:10 gregory38: Need to load the good nouveau module now
18:10 gregory38: is there a better alternative to overwrite my 4.6 nouveau.ko module
18:10 karolherbst: not really
18:12 gregory38: ok.
18:12 gregory38: module overwritten, I will reboot again
18:18 gregory38: Ok, I have full speed, good job :)
18:18 karolherbst: let me guess, 1150 MHz?
18:18 karolherbst: or a bit more?
18:18 karolherbst: ohh wait
18:18 karolherbst: boost isn't enabled yet
18:18 gregory38: DC: core 1084 MHz memory 6007 MHz
18:18 karolherbst: check out the boost file
18:19 gregory38: but still slow as an ass
18:19 gregory38: I have gen3 too
18:21 karolherbst: pcie has like no effect on the speed here
18:21 karolherbst: gregory38: but there is something odd
18:22 gregory38: yeah but I don't think the speed issue is related to the missing 200Mhz
18:22 karolherbst: gregory38: 38% speed, EE: 44%, GS: 46%, VU: 26%
18:22 gregory38: I have the feeling that I'm running on an intel igp
18:22 karolherbst: maybe the GPU speed isn't important at all
18:22 karolherbst: and it is a CPU bottleneck
18:22 karolherbst: my GPU is quite bored a bit
18:23 gregory38: the above time is the time spent on thread and children / total slice time
18:23 gregory38: 46% means GS thread is sleeping too
18:24 gregory38: is there a way to know where buffers are located in memory (gart and things like that, to compare with Nvidia)
18:25 karolherbst: I really don't think that the gpu is the performance killer here
18:25 gregory38: hum let me try without GL_CLIENT_STORAGE_BIT
18:25 karolherbst: I get the same performance on the lowest clock too
18:25 gregory38: yeah
18:26 karolherbst: well
18:26 gregory38: that's why I wasn't sure reclocking was working or not
18:26 karolherbst: 8x native res killed perf a bit
18:26 karolherbst: I can go up to 3x native res without perf impact
18:27 karolherbst: yep, on 3x native res, gpu clock affects performance
18:27 gregory38: T:Error ID:2292 S:High => GL_INVALID_OPERATION in glValidateProgramPipeline failed to validate the pipeline
18:27 karolherbst: so on native, the GPU is _really_ bored
18:27 karolherbst: well, let me trace stuff
18:28 gregory38: got this message 28659 times for 174 frames
18:29 karolherbst: gregory38: how do you run perf?
18:30 gregory38: perf_3.18 record -g -- $HOME/pcsx2/pcsx2_GSReplayLoader $GS/testsuite/perf/sotc_big.gs.xz
18:30 gregory38: you could generate gs dump with shift-ctrl f8 and keep shift pressed (not too long)
18:31 gregory38: The easiest way to use it is to use env variable
18:31 gregory38: * GSDUMP_SO will be the .so file
18:31 gregory38: * GSDUMP_CONF will be the path of a directory that contains the GSdx.ini file
18:31 gregory38: Then you can run the dump with
18:31 gregory38: replay_exe your_gsdump.gs
18:32 gregory38:  1948 if (!ctx->_Shader->CurrentProgram[MESA_SHADER_FRAGMENT]) {
18:32 gregory38:  1949    if (ctx->FragmentProgram.Enabled && !ctx->FragmentProgram._Enabled) {
18:32 gregory38: >1950       _mesa_error(ctx, GL_INVALID_OPERATION,
18:32 gregory38:  1951                   "%s(fragment program not valid)", where);
18:32 gregory38:  1952       return GL_FALSE;
18:32 gregory38:  1953    }
18:32 gregory38: the mesa error
18:34 gregory38: wait I have likely the wrong line
18:35 karolherbst: gregory38: ever thought about implementing a opengl worker thread?
18:35 karolherbst: *an
18:35 gregory38: I was hoping that you will do it in Mesa ;)
18:35 karolherbst: well
18:35 karolherbst: the thing is, mesa doesn't know which generated data you need
18:36 karolherbst: or where your syncpoints would be
18:36 gregory38: well, depends of what you called worker thread
18:37 gregory38: I was thinking at the Nvidia stuff
18:37 karolherbst: how much does it help?
18:38 gregory38: on some games it is near a factor 2
18:38 karolherbst: insane
18:38 gregory38: others game is 10%
18:39 karolherbst: let me try
18:39 gregory38: the mesa error seem related to my latest addition
18:40 gregory38: ah no
18:40 gregory38: day was too long
18:42 karolherbst: okay for sotc threaded optimisation doesn't change a thing
18:44 gregory38: because the core slow you down
18:47 gregory38: I have the source of the error but strangely there is a quote on gl ES3.1
18:47 gregory38: 1465 /* Section 7.4.1 (Shader Interface Matching) of the OpenGL ES 3.1 spec
18:47 gregory38: 1466  * says:
18:47 gregory38: 1467  *
18:47 gregory38: 1468  *    - An output variable is considered to match an input variable in
18:47 gregory38: 1469  *      the subsequent shader if:
18:47 gregory38: 1470  *
18:47 gregory38: 1471  *        - the two variables match in name, type, and qualification; or
18:47 gregory38: 1472  *
18:47 gregory38: 1473  *        - the two variables are declared with the same location
18:47 gregory38: 1474  *          qualifier and match in type and qualification.
18:47 gregory38: 1475  */
18:48 karolherbst: perf is somewhat useless today, it prints nothing...
18:48 karolherbst: stupid perf
18:48 karolherbst: useless tool...
18:49 gregory38: the validation of the pipeline could explain the perf impact
18:49 gregory38: There are loops over all resources (50)
18:50 gregory38: * 2 and potentially all stages
18:51 gregory38: you know what I will remove the error :p
18:54 gregory38: not better
18:57 karolherbst: the heck perf...
18:58 karolherbst: seriously, those profiling tools suck big time
18:58 karolherbst: all of them
18:59 gregory38: yeah it is often faster to try to remove what you think has a perf impact and check the impact
19:00 karolherbst: maybe gperftools are useful
19:01 karolherbst: yeah, course, I have only the 64bit version of those...
19:02 gregory38: SSO is broken
19:02 gregory38: (gdb) p consumer_var->name
19:02 gregory38: $13 = 0x8ae8e68 "SHADER[2].c"
19:02 gregory38: (gdb) p var->name
19:02 gregory38: $14 = 0x8b27b80 "SHADER.c"
19:03 gregory38: I need to check if consumer is geometry shader
19:04 karolherbst: why is sso broken?
19:05 gregory38: yes vertex => geometry
19:05 gregory38: because it works everywhere except in mesa
19:05 gregory38: basically there is a mesa check
19:05 karolherbst: well there are some games which use sso
19:05 karolherbst: and they work
19:05 gregory38: that match input and output name with a str cmp
19:06 gregory38: input of geometry shader is an array
19:06 gregory38: output of a vertex shader is scalar
19:07 gregory38: "SHADER[2].c" <= due to GS there is 2
19:07 gregory38: but not in VS : "SHADER.c"
19:07 gregory38: so input and output don't match
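The mismatch described above comes down to comparing two resource names as plain strings when one side carries the per-vertex array subscript that geometry shader inputs have. A minimal sketch of the problem, using the two names from the gdb output; this is an illustration only, not Mesa's validation code, and `strip_outer_array` is a hypothetical helper:

```python
import re


def strip_outer_array(name):
    """Drop the first '[N]' subscript, e.g. 'SHADER[2].c' -> 'SHADER.c'."""
    return re.sub(r"\[\d+\]", "", name, count=1)


producer_name = "SHADER.c"      # vertex shader output, as printed by gdb
consumer_name = "SHADER[2].c"   # geometry shader input, as printed by gdb

# A plain string comparison (what a strcmp-style check does) reports a mismatch.
print(producer_name == consumer_name)

# Ignoring the outer array level on the consumer side makes the names agree.
print(producer_name == strip_outer_array(consumer_name))
```

Whether stripping the subscript is the right fix inside Mesa is a separate question; the sketch only shows why the two strings fail to compare equal.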
19:08 karolherbst: well I am no SSO expert
19:08 karolherbst: no idea who implemented sso in mesa though
19:09 gregory38: me :p
19:09 gregory38: well partially at least
19:09 gregory38: and a long time ago
19:09 gregory38: compilation was already separated
19:10 gregory38: Ian did a part
19:10 gregory38: and Tim too
19:10 gregory38: and someone at Igalia too
19:11 karolherbst: well then you know how to fix it most likely :p
19:11 gregory38: no because the io validation is a royal mess
19:12 gregory38: and it seems to be recent code
19:12 karolherbst: git blame then :p
19:12 gregory38: let's first open an issue so we have some history
19:14 gregory38: yeah 19th may by Ian
19:15 karolherbst: the heck perf... the heck
19:15 karolherbst: when I profile too long, perf report won't print anything
19:16 gregory38: reduce frequency polling
19:16 karolherbst: then the data becomes useless
19:17 gregory38: yeah
19:17 karolherbst: I think perf just can't handle 300MB big perf.data files
19:18 karolherbst: [ perf record: Captured and wrote 33.744 MB perf.data (424613 samples) ]
19:18 karolherbst: okay, looks good
19:18 karolherbst: but still it won't read that damn file
19:20 karolherbst: gregory38: you know what really sucks perf in sotc? Blending Unit Accuracy >=medium is required
19:21 gregory38: for shadows ?
19:21 karolherbst: for the entire game
19:21 karolherbst: usually sunlight
19:21 karolherbst: even the main menu is already broken
19:21 gregory38: ah yeah
19:21 gregory38: but honestly it is very hard to emulate
19:21 gregory38: GS does computation in integer
19:22 gregory38: I think they have implemented a kind of sinus effect based on the integer rounding
19:22 karolherbst: why do they use integer?
19:22 gregory38: I guess it was 10 times faster than float
19:22 karolherbst: ...
19:23 karolherbst: maybe 10 years ago
19:23 gregory38: well PS2 was done before 2000 :p
19:23 karolherbst: well
19:23 karolherbst: let me check something
19:23 gregory38: Honestly, it would be nice if I used integer texture everywhere
19:23 gregory38: with manual blending
19:24 gregory38: it will be free with future hardware
19:24 gregory38: (maxwell and +)
19:24 gregory38: well free enough
19:25 karolherbst: was the GS 32bit only?
19:25 gregory38: for color
19:25 karolherbst: or were there also 64bit operations?
19:25 gregory38: it is fixed function unit
19:26 gregory38: you can do (fragment_color * texture_color >> 7)
19:26 gregory38: blending is something like (Cs + Cd) * As >> 7 + Cs
19:27 gregory38: it is four 8 bits values (channel)
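The `>> 7` in both expressions means that 0x80 (128) acts as 1.0 in this fixed-point scheme. A minimal sketch of the two formulas as quoted (they are given as "something like", so the exact operands are approximate), applied to one 8-bit channel. The explicit parentheses reflect the presumably intended grouping, and the clamp to 255 is an assumption, not something stated in the discussion:

```python
def modulate(fragment_color, texture_color):
    """fragment_color * texture_color >> 7, for one 8-bit channel."""
    # Assumption: the result saturates at 255.
    return min(255, (fragment_color * texture_color) >> 7)


def blend(cs, cd, alpha_s):
    """((Cs + Cd) * As >> 7) + Cs, as quoted, for one 8-bit channel."""
    # Assumption: the result saturates at 255.
    return min(255, (((cs + cd) * alpha_s) >> 7) + cs)


# With 0x80 as the multiplier the input passes through unchanged.
print(modulate(200, 0x80))
# Half intensity: 64 behaves like 0.5.
print(modulate(200, 64))
print(blend(10, 20, 0x80))
```

Because everything is integer arithmetic with truncation, results differ slightly from a floating-point blend, which is part of why exact emulation on a GPU is awkward.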
19:27 gregory38: there is also a RGB5A1 format
19:27 gregory38: with various surprises
19:27 gregory38: input texture are often 4 bits or 8 bits index of palette
19:28 gregory38: Seriously, GS is just crazy
19:32 gregory38: ah come on, people don't have the same mail on bugzilla and git
20:09 karolherbst: gregory38: ever thought about using compute shaders?
20:09 gregory38: so far, I'm trying to support older GPUs too
20:09 karolherbst: well
20:10 karolherbst: you don't have to stick to the same path for every gpu
20:10 gregory38: Gabest tried to implement something with openCL
20:10 karolherbst: mhhh
20:10 gregory38: yes but rendering is quite complex
20:10 karolherbst: well
20:10 gregory38: adding more path will explode my head
20:10 gregory38: by the way, I'm not sure what I will gain
20:10 gregory38: what can be done with compute shader that can't be done with a fragment shader
20:10 karolherbst: I don't think OpenCL will help you here much, because you use the same devices you render on
20:11 karolherbst: gregory38: compute shaders are like OpenCL, just in OpenGL and with all OpenGL limitations ;)
20:11 karolherbst: I am sure it is easier to compute stuff through compute shaders than to use fragment shaders
20:12 karolherbst: currently reading that "Explanation of impossible blend" article of yours
20:12 gregory38: the plan is to use in-order atomic fragment shader
20:12 gregory38: You don't have guarantee of compute shader order, do you
20:13 karolherbst: compute shaders aren't part of the usual pipeline
20:14 gregory38: I think CS could be used to convert texture
20:14 karolherbst: I am sure hakzsam knows more about this
20:15 gregory38: I still need rasterization, depth support and so on ;)
20:15 gregory38: but yeah the conversion of depth to a color could use a compute shader
20:16 gregory38: So I don't know what I did
20:16 gregory38: but I'm at 11 fps now with Nouveau
20:16 gregory38: ...
20:18 karolherbst: mhhh
20:19 karolherbst: I get more
20:19 karolherbst: :D
20:19 gregory38: oh get back to the previous perf 26
20:19 gregory38: fps
20:19 gregory38: on Nvidia I have 180fps
20:19 karolherbst: how should I build pcsx2?
20:19 karolherbst: I am sure somewhere we have a massive CPU bottleneck on the mesa side
20:20 karolherbst: but I simply don't get this thing traced with perf...
20:20 gregory38: ./build.sh --prof
20:20 gregory38: it will keep the frame pointer
20:21 gregory38: at least on PCSX2 side
20:21 gregory38: -DBUILD_REPLAY_LOADERS=TRUE
20:21 gregory38: it will build the replayer so you can replay a gs dump without PCSX2
20:22 gregory38: need to go
20:22 gregory38: good luck
23:42 imirkin: tobijk: why are you curious about ARB_viewport_array? it should work just fine - check what nouveau does, it's correct.
23:43 imirkin: tobijk: there are some mmt's at https://people.freedesktop.org/~imirkin/traces/nve6/
23:43 imirkin: although i suspect they're in the old format, so you can't use demmt with them
23:43 imirkin: you can use dedma though
23:45 tobijk: imirkin: i hoped to find the right registers for guardband clipping when looking at them
23:45 tobijk: as they are resizing the viewport, hopefully the guardband is resized as well
23:45 tobijk: that was my hope at least :)
23:47 tobijk: if somebody has a better idea, cry now :D
23:57 karolherbst: does tesla have something like dual issuing?
23:58 karolherbst: ahh indeed it has
23:58 karolherbst: or is it limited to mad?