04:14imirkin: pmoreau: when you get a chance, try my old patch for viewports, and try adding IMMED_NVC0(push, SUBC_3D(0xf60), 1); somewhere in the init. if that does nothing, try 0xf10
05:24imirkin: hakzsam: this seems to help me slightly, but not enough, for UE4: https://github.com/imirkin/mesa/commit/428bdf7a9c5ed293a57625d02bb9080c446f975d
07:35pmoreau: imirkin: Ok, will try that when I get home.
07:42mac-: kernel: [ 6994.926472] nouveau 0000:01:00.0: fifo: DMA_PUSHER - ch 1 [X] get beef0200 put 0001bad4 state c002018c (err: MEM_FAULT) push 00000000
07:42mac-: shit, again
07:48mac-: can it be fault of outdated nouveau driver ?
07:48mac-: -rw-r--r-- 1 500 500 87800 Sep 6 2013 xf86-video-nouveau-1.0.9-x86_64-2.txz
07:55karolherbst: mac-: could be
07:55karolherbst: well 1.0.9 is a bit old, yes
07:57mac-: it doesn't look good
07:58mac-: but wait a sec
07:59mac-: I tried to compile kernel module, but 4.6 should be quite new and I only need xf86 for xorg
08:04karolherbst: mac-: upstream nouveau will only compile against linux-next (and also drm-next)
08:05mac-: what is the current version of nouveau?
08:05karolherbst: mac-: but well, your xorg driver is old
08:05karolherbst: xorg driver is 1.0.12
08:05mac-: -rw-r--r-- 1 500 500 95912 Dec 16 03:38 xf86-video-nouveau-1.0.12-x86_64-1.txz
08:05mac-: there is the current version in slack-current
08:05mac-: lets check it
08:13hakzsam: imirkin, this might help according to what the blob does but we don't know exactly what this method is for
08:14hakzsam: I think I have made some progress yesterday, will try later today on kepler
08:14hakzsam: but still definitely not fixed yet
08:25mac-: ok, I had to compile xf86 because of course the current Slackware64 package required a newer x.org
08:25mac-: but it started
08:25mac-: how can I check version
08:25mac-: it is in x.0.log ?
08:30pmoreau: mac-: Yes: right after X loads the module, two lines below should say something like: "compiled for 1.18.3, module version = 1.18.3"
08:30mac-: compiled for 1.14.3, module version = 1.0.12
08:31mac-: I'm curious if nv34 will suck the same as it did on the old driver ?
08:45karolherbst: 1.14.3 is also a bit old
08:46karolherbst: and unsupported
08:51mac-: they should release next slackware on days, then I will upgrade
09:06mac-: why they wrote 'OpenGL: 1.5 (2.1)' there ?
09:33karolherbst: mac-: because the missing bits are implemented in software
09:33karolherbst: mac-: but the hardware should only be able to reach 1.5
10:01mac-: ok, understand
10:02mac-: nvs280 handles 1.5 and by driver 2.1 is handled but only in software way by CPU
10:03mac-: I see that nvs300 are cheap and many times better, I think I get one
10:03mac-: and return all my nvs280 for recycling
10:19karolherbst: mac-: what about NVS 510?
10:20karolherbst: they are expensive
10:20mac-: I wish to get one with passive cooling
10:20mac-: and hw accel
10:20mac-: nvs300 is highest with passive cooling
10:20karolherbst: well any reasons it has to be a quadro?
10:20mac-: only the support and low TDP
10:21Calinou: TDP means nothing
10:21karolherbst: it somehow does
10:21Calinou: it's been proven, the manufacturer will put what they want to please integrators there
10:21karolherbst: because low TDP means less cooling required
10:22mac-: nvs280 has 13W, nvs285 20W and nvs300 has 21W
10:22mac-: nvs290 sth around 20W
10:23karolherbst: well honestly, if you intend to use nouveau, quadros won't give you anything
10:23mac-: what do you mean ?
10:23karolherbst: same situation with Xeons
10:23karolherbst: some people say it make sense for usual desktop experience, but it really doesn't
10:24karolherbst: quadros usually get only fancy software features, but usually desktop GPUs have the same stuff on the chip
10:24karolherbst: but nvidia just doesn't enable it in software
10:24karolherbst: like power consumption readings
10:24karolherbst: and you get better support from nvidia
10:25karolherbst: and quadro cards tend to have more ports I think
10:25mac-: those have DMS-59
10:25Calinou: what about GT 710?
10:25Calinou: can drive 3 monitors, I think
10:26karolherbst: yeah well
10:26karolherbst: 3 is nothing
10:26karolherbst: there are cards with 8 ports
10:26karolherbst: mac-: do you need DMS-59?
10:28karolherbst: but this is the same thing as "having more ports", DMS-59 that is
10:29mac-: I use it because it has it
10:29mac-: so far I have one screen and don't have any plans for more
10:29karolherbst: well, there is no advantage in DMS-59 over VGA/DVI if you just plug in one display anyway
10:30karolherbst: well it is your choice in the end, but if you don't know why you need a quadro, then maybe you really don't need one ;)
10:30mac-: just need well supported cheap graphics with low power consumption and hw accel for 2D, video and some GPU for ff accel
10:31mac-: ideally on PCI or PCIe x1
10:31karolherbst: well do you want to use nvidia or nouveau?
10:32karolherbst: if you want nouveau and some decent performance, kepler or tesla are the best choices, but not all gpus can be fully reclocked, so there is a bit of pain here
10:32karolherbst: like there is no reclocking support for gddr3 kepler
10:33mac-: I don't play games or sth, just need GPU accel for things like acceleration in ff and etc
10:36karolherbst: well if your biggest concern is stability and you want to use the open drivers, AMD cards are your best choice for now, unless you want to help nouveau out, or really don't like ATI/AMD cards
10:37mac-: I just have changed to the latest nouveau and will check, if it will be stable then will stay on it
10:38mac-: I don't need top performance or sth, just basic hw support
10:38mac-: to avoid software processing of everything
11:43magic_sam: Hi all :)
11:44magic_sam: I'd like some info regarding nouveau.pstate=1 kernel module parameter
11:44magic_sam: I'm trying to lower the speed of my Nvidia 8800M GTX inside a laptop computer
11:45magic_sam: It's overheating, fans make a lot of noise
11:46magic_sam: how do I specify a new performance level at boot time ?
11:47magic_sam: "/sys/class/drm/card0/device" reads as follows
11:47magic_sam: "03: core 200 MHz shader 400 MHz memory 100 MHz"
11:48magic_sam: I guess that mode would be nice
11:51magic_sam: I forgot to say I'm running Gallium 0.4 on NV92 / 3.0 Mesa 10.3.2
11:51magic_sam: which is shipping with Debian 8.0 stable x86_64
11:52magic_sam: Kernel is 3.16.0-4-amd64
11:53pmoreau: G92 doesn’t have reclocking support in recent kernel, but maybe it did in 3.16, though I doubt it.
11:54pmoreau: Before changing perf lvl at boot time, have you tried changing perf lvl at runtime?
11:55magic_sam: @pmoreau: thanks for answering
11:55magic_sam: no I didn't try at runtime
11:55magic_sam: I don't really know how to do that :)
11:55pmoreau: Instead of cat'ing that file, echo the perf lvl into it
11:56magic_sam: OK, I understand
11:56pmoreau: After that, you should get a "*" at the end of the corresponding perf lvl entry in the file, and the last line should reflect the clocks from the new perf lvl
11:57magic_sam: OK, right now the line with "*" is
11:57magic_sam: "--: core 275 MHz shader 550 MHz memory 300 MHz *"
11:58pmoreau: To answer your initial question, you set "nouveau.config=NvClkMode=xx", with xx being the perf lvl in decimal, not hex
11:58pmoreau: See https://nouveau.freedesktop.org/wiki/KernelModuleParameters/
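The decimal requirement pmoreau mentions is easy to trip over, because the performance level labels in the pstate file (such as "03" or "0f") appear to be hex. A minimal Python sketch of the conversion follows; the helper name `nvclkmode_param` is made up for illustration, and the option string is simply the one quoted above.

```python
def nvclkmode_param(perf_level_hex):
    """Build the kernel parameter for a perf level label as shown in pstate.

    Assumption: the label in the pstate file is hexadecimal, while
    NvClkMode expects the same value written in decimal.
    """
    level = int(perf_level_hex, 16)  # e.g. "0f" -> 15
    return f"nouveau.config=NvClkMode={level}"


print(nvclkmode_param("03"))
print(nvclkmode_param("0f"))
```

The resulting string would go on the kernel command line; whether the level is actually applied still depends on reclocking support for the chipset.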
11:59magic_sam: I have read that page already, and since there were parts I didn't understand, I preferred to check with you guys first
11:59magic_sam: would "echo 03 > /sys/class/drm/card0/device" work ?
11:59magic_sam: at runtime ?
12:00pmoreau: If reclocking is supported on that card, yes
12:00pmoreau: Otherwise, you should get a message like "not implemented"
12:00magic_sam: OK, I'm going to try this right now
12:00magic_sam: thanks for your time @pmoreau
12:01pmoreau: You’re welcome
12:04pmoreau: magic_sam: Looking at the code from 3.16, it looks like only Tesla’s NVAA+ and Kepler’s NVE0+ advertise reclocking
12:05magic_sam: "echo "03" > /sys/class/drm/card0/device/pstate"
12:05magic_sam: write error: Function not implemented
12:05magic_sam: you are right
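The pstate lines quoted in this exchange have a regular shape: a level label, a list of clock domains in MHz, and a trailing "*" on the active entry. A rough Python sketch of reading them follows. The function name `parse_pstate` and the returned dictionary layout are illustrative choices, not part of nouveau, and the sample text is taken from the lines pasted above. Writing a level back into the file needs root and a chipset with reclocking support; otherwise it fails with "Function not implemented", as seen here.

```python
import re

# Matches "core 200 MHz" as well as ranges such as "core 270-708 MHz".
CLOCK_RE = re.compile(r"(\w+) (\d+)(?:-(\d+))? MHz")


def parse_pstate(text):
    """Parse the contents of /sys/class/drm/card0/device/pstate.

    Returns a list of dicts with the level label, a mapping of clock
    domain -> (min_mhz, max_mhz), and whether the line ends with "*".
    """
    levels = []
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        clocks = {}
        for name, low, high in CLOCK_RE.findall(rest):
            clocks[name] = (int(low), int(high or low))
        levels.append({
            "id": label.strip(),
            "clocks": clocks,
            "active": rest.rstrip().endswith("*"),
        })
    return levels


sample = (
    "03: core 200 MHz shader 400 MHz memory 100 MHz\n"
    "--: core 275 MHz shader 550 MHz memory 300 MHz *\n"
)
for level in parse_pstate(sample):
    print(level["id"], level["clocks"], "active" if level["active"] else "")
```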
12:07magic_sam: Any idea which kernel I should use in order to be able to use that option ?
12:07pmoreau: You could grab a more recent kernel, and modify https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nvkm/subdev/clk/g84.c: replacing `device->chipset >= 0x94` by `device->chipset >= 0x92`
12:08gnurou: karolherbst: sorry, could not really follow the discussion - are you interested in Kepler or Maxwell dual-issue information?
12:09pmoreau: And you probably want an additional patch which improves stability, but which hasn’t made it into Linus’ tree yet
12:11magic_sam: @pmoreau: https://nouveau.freedesktop.org/wiki/PowerManagement/
12:12magic_sam: could I play with memory clock, thermal sensors and fanspeed control ?
12:12magic_sam: with my current setup
12:13pmoreau: RSpliet: I can’t find your recent reclocking patch which improves stability. Am I imagining things? Or was it karolherbst maybe?
12:15pmoreau: magic_sam: I think you can set the fan control to manual, and change the fanspeed, but I don’t know how, nor which chipsets are supported. mupuf should know that
12:16pmoreau: As for thermal sensors, if you run `sensors`, it should retrieve that data from Nouveau
12:17pmoreau: And for memory clock, apart from moving from perf lvl to perf lvl, you do not have the ability to enter a clock frequency and have Nouveau try to switch to that one.
12:18magic_sam: OK, I see
12:18magic_sam: thanks for your explanations
12:20magic_sam: regarding the fan, would "nouveau.config="NvFanPWM=1"" work ?
12:28pmoreau: I don’t know what the parameters to NvFanPWM mean. Probably 1: auto-detect, 0: manual, -1: forced auto?
12:28karolherbst: gnurou: yeah
12:29imirkin: magic_sam: if you have GDDR3 vram, a few G92 users were able to successfully use the current upstream code (after manually modifying it to allow G92's)
12:29pmoreau: Err, -1: auto-detect, 1: force auto
12:29imirkin: magic_sam: otherwise you can try to see if a pre-3.13 kernel will work for you
12:29karolherbst: pmoreau: RSpliet has a patch for fixing some gr firmware stuff, but skeggsb had another one which he upstreamed
12:29pmoreau: karolherbst: Ahhhh! Ok
12:30gnurou: karolherbst: so by your answer I suppose you want Kepler AND Maxwell
12:30karolherbst: gnurou: well, the current code is good enough now though
12:30pmoreau: gnurou: *AND* Pascal! ;-)
12:30karolherbst: gnurou: well, for kepler we have done a lot already
12:30gnurou: I have a bug opened to release that info actually... need to push it a bit
12:30karolherbst: gnurou: pixmark_piano: inst_issued2: 217M -> 271M :D
12:31karolherbst: gnurou: 1019 -> 1054 score
12:31gnurou: and for Maxwell maxas' information seems to be pretty complete from what I have seen
12:31karolherbst: okay, so we just would have to implement it then
12:31karolherbst: yeah, so kepler would be good so that we can verify our stuff
12:33imirkin: gnurou: unfortunately maxas's information is in a giant perl program. i've been unable to decipher it.
12:34gnurou: argh, Perl...
12:35imirkin: skeggsb: do you remember anything "funny" about clipping on nv30? i don't mean clipdistance clipping ... i mean vertex shader output clipping. i guess that everything not in -1..1 is supposed to get clipped, but i'm not seeing that (i think)
12:36karolherbst: wow, dual issuing _makes_ a big difference. I would have expected the difference to be in the 10% range, but it is more like 20%
12:37karolherbst: mhh 15% in pixmark_piano
12:37karolherbst: gnurou: ohh, and for fermi as well :D
12:37imirkin: hakzsam: any comments on "nvc0/ir: handle a load's reg result not being used for locked variants"?
12:38imirkin: pmoreau: did you get a chance to play around with your GM206?
12:38karolherbst: gnurou: there should be something like dual issuing there, but it is done implicitly on the hardware
12:38magic_sam: @pmoreau and @imirkin: thanks for your help, it's really appreciated
12:38karolherbst: gnurou: but if we know what fermi likes to dual issue, we can schedule the instructions for that a bit
12:38gnurou: karolherbst: fermi and kepler should do mostly fine without explicit scheduling... maxwell is another story though
12:38imirkin: or alternatively, gnurou, would it be a waste of my time trying to get docs on how to do this or that on GM206? (like writing viewport index/mask from non-geometry shaders?)
12:38pmoreau: imirkin: Not yet: it’s 14:38 here, so I’m still at work ;-) Will ping you as soon as I have tested it
12:39magic_sam: @imirkin: I'll try to contribute some mmio traces, I promise :)
12:39karolherbst: gnurou: well, you misunderstood me. On Kepler we have to fill the dual issues in the sched opcodes, on fermi there are none, so we don't care, yet
12:39imirkin: magic_sam: well, RSpliet is the tesla reclocking guru. he could provide more info on how to properly collect them.
12:39karolherbst: gnurou: but I have a PostRA pass to move some instructions around to improve dual issuing, which only works for kepler so far
12:39imirkin: karolherbst: just coz the info's not explicit doesn't mean it's not there
12:39karolherbst: gnurou: because we have no dual issuing information for fermi
12:39gnurou: imirkin: I hope to be able to release that info soon... it's just that the process takes a while
12:39gnurou: imirkin: so I would not spend too much time on it for now
12:40imirkin: gnurou: which info?
12:40gnurou: imirkin: if I were you
12:40karolherbst: imirkin: well, we still have to know what we can dual issue on fermi :)
12:40gnurou: imirkin: scheduling
12:40hakzsam: imirkin, will do later today, sorry
12:40imirkin: gnurou: all the method docs?
12:40karolherbst: gnurou: ohh, so complete scheduling docs?
12:40imirkin: gnurou: ah, nice!
12:40gnurou: imirkin: ah sorry, misunderstood your message
12:40magic_sam: I need to go now, see you all later :)
12:40magic_sam: cheers, Magic Sam
12:41gnurou: karolherbst: hope to be able to release it, yes
12:41karolherbst: awesome :)
12:41gnurou: but you know how this works... :/
12:41gnurou: slowly and painfully
12:41karolherbst: yeah, I know
12:41imirkin: hakzsam: airlied said it fixed his test-case (well, the CTS test case)
12:41karolherbst: well at least dual issuing is something which is rather easy to RE
12:41imirkin: hakzsam: easily repro'd by doing an atomicExchange() and ignoring its result
12:41gnurou: same for the Kepler clocking information, if only I was in Santa Clara I could just go to people's desk >_<
12:41karolherbst: gnurou: can you say if dual issuing is the same on fermi and kepler?
12:42imirkin: gnurou: get one of those virtual presence thingies
12:42gnurou: karolherbst: I have no idea about that, sorry
12:42karolherbst: :D okay
12:42gnurou: karolherbst: intuitively I would say that if the ISA did not change much, then scheduling is probably close too
12:42hakzsam: imirkin, okay, i'll have a look later, I can't right now
12:42karolherbst: hakzsam: do we have the inst_issued2 counters on fermi too?
12:43hakzsam: karolherbst, yeah
12:43imirkin: hakzsam: yea no worries
12:43karolherbst: mhh, would anybody with a fermi mind testing something?
12:44hakzsam: karolherbst, send instructions by email, I could test tonight
12:54imirkin: hakzsam: btw, my next move is going to be to rip out the NVE4_CP's and replace them with NVC0_3D's for all the upload stuff.
12:55hakzsam: any reasons to do that?
12:55imirkin: fear of the gathering darkness?
12:55imirkin: i'm concerned that stuff gets desync'd between the channels
12:55imirkin: even though they supposedly all go to the same engine in the end
12:56hakzsam: blob uses NVE4_CP IIRC
12:56imirkin: what does blob know :p
12:56imirkin: blob also uses a separate channel, no?
12:56hakzsam: seems like
12:56imirkin: so that makes sense then
12:56imirkin: anyways, it's something to try
12:57hakzsam: are you guessing that could fix the UE4 issue (on kepler of course)?
12:58imirkin: i'm getting TEX faults
12:58imirkin: which means that it's trying to access a texture that's no longer there
12:58imirkin: either we messed up
12:58imirkin: or it's reading stale TIC info
12:59hakzsam: yeah, a strong reason for trying that thing then :)
13:25npnth: Is there any documentation for nouveau_object_new available?
13:31imirkin: npnth: it creates a new nouveau object :p what kind of documentation were you hoping for?
13:32imirkin: npnth: all that stuff has been rewritten a few times, btw - make sure to look at the latest stuff, in 4.3+
13:32npnth: imirkin: Well, hopefully something along the lines of "If you've got a call that works with a NVC0 card, but returns -22 with a NVE0 card, here's how you need to fix it."
13:33imirkin: what's -22?
13:33npnth: The return value of nouveau_object_new :p
13:34imirkin: fine, i'll look it up myself
13:34imirkin: #define EINVAL 22 /* Invalid argument */
13:34imirkin: there ya go.
13:34imirkin: you supplied some sort of invalid param.
13:34imirkin: you fix it by supplying a valid param.
13:34npnth: Huh, I didn't actually think that it would return errnos.
13:34imirkin: hope you enjoyed those docs :)
13:35npnth: Very enlightening, thanks.
13:35imirkin: i should write a book.
13:35npnth: I'd buy it.
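The lookup imirkin does by hand can be reproduced with a few lines of Python. This is only a sketch of the kernel-style convention where a negative return value is a negated errno; the helper name `describe_ret` is invented here, and the message text comes from the local C library, so its wording may differ between systems.

```python
import errno
import os


def describe_ret(ret):
    """Map a kernel-style return value to (errno name, message)."""
    if ret >= 0:
        return ("OK", "success")
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return (name, os.strerror(code))


print(describe_ret(-22))
```

The same decoding explains the "Function not implemented" write error seen earlier with pstate, which is the message attached to ENOSYS.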
13:40karolherbst: imirkin: between which values are "sat fadd" instructions saturated?
13:41imirkin: karolherbst: 0..1
13:41imirkin: karolherbst: "saturate" means clamp to 0..1
13:41karolherbst: so this is 1? sat add ftz f32 %r12412 %r12407 1.000000
13:41imirkin: (and don't ask me what it does with NaN coz i don't remember)
13:41imirkin: only if %r12407 is guaranteed to be non-negative
13:41karolherbst: ohhh right
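The point about the sign of the first operand can be checked with a tiny model of a saturating add. This is a plain Python sketch of "clamp the sum to 0..1" as imirkin describes it, not the hardware semantics; NaN handling and the ftz flag are deliberately left out, and `sat_fadd` is a name made up for this example.

```python
def sat_fadd(a, b):
    """Add two floats and clamp the result to the range 0..1."""
    return min(max(a + b, 0.0), 1.0)


# With a non-negative first operand, adding 1.0 always saturates to 1.0.
print(sat_fadd(0.25, 1.0))
# With a negative first operand, the result can be anything in 0..1.
print(sat_fadd(-0.5, 1.0))
print(sat_fadd(-2.0, 1.0))
```

So folding `sat add %r12407 1.0` into the constant 1 is only valid when the source is known to be non-negative.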
14:00karolherbst: uhh TR2013 update
14:25tobijk: hey there, with 4.7 nouveau started to complain about some sensor: nouveau 0000:01:00.0: iccsense: Unknown sensor type 30, power reading disabled
14:25tobijk: is this only a new debug print or is something likely to be broken? :)
14:26karolherbst: tobijk: nope, just your vbios contains something we don't expect
14:26karolherbst: gk107 was it, right?
14:27karolherbst: ahh, ADS1112
14:27karolherbst: tobijk: well, there is no harm in that message
14:27karolherbst: we can't read the power consumption on your GPU anyway
14:27tobijk: thought so , just making sure :) thx
14:28tobijk: maybe this should be hidden for the "expected" fails then
14:29karolherbst: if we would expect it to fail we would disable it
14:29karolherbst: but we can't be sure
14:32karolherbst: uhh the white fog issue is back.. I thought the patches got merged?
14:35tobijk: white fog? is it mesa or kernel related?
14:37karolherbst: mesa glsl
14:37karolherbst: tessellation is broken in TR2013 as well
14:38karolherbst: or maybe I broke it, let's see
14:40karolherbst: huh "fifo: PBDMA0: 00040000 [PBENTRY] ch 2 [00bf890000 TombRaider] subc 0 mthd 0000 data 00000000"
14:40imirkin_: karolherbst: it must do things in parallel. the dream is over.
14:41karolherbst: well I could play it with nouveau
14:41karolherbst: just like in 20% of all cases the window stays black on start
14:41karolherbst: okay, I broke the tessellation thing
14:41karolherbst: I think
14:42tobijk: will check soon, i have that one available
14:44imirkin_: karolherbst: i don't think so...
14:44karolherbst: well I really broke it myself, because I tested with my opts
14:45imirkin_: don't need others' help anymore? can break it all by yourself now? :)
14:46tobijk: imirkin_: i have a little bit of free time on my hands right now, can i be of service? :D
14:46imirkin_: tobijk: what hw you got again?
14:47imirkin_: well, (a) make sure that reclocking works on it, if you haven't already
14:47karolherbst: well you can always use my branch for reclocking :D
14:47tobijk: it does work fine
14:47tobijk: already does for a while now
14:47karolherbst: tobijk: my branch or stock?
14:47tobijk: stock 4.7
14:48karolherbst: tobijk: did you reclock?
14:48karolherbst: ohh wait
14:48karolherbst: I see
14:48karolherbst: -- ID = 2, mode: 1, link: 50, voltage_min = 1087500, voltage_max = 1087500, volt = 1087500 [µV]
14:48karolherbst: not much we can do wrong here on stock
14:48imirkin_: when there's a will, there's a way
14:49karolherbst: and the link is -- ID = 50, mode: 1, link: 55, voltage_min = 0, voltage_max = 12500, volt = 64000854 + (-69906 * T * 5^6) >> 10 [µV]
14:49imirkin_: tobijk: welll.... how much time are you looking to invest?
14:49karolherbst: so yeah. -0.0125V in the worst case
14:49karolherbst: so who cares
14:49imirkin_: tobijk: there are a few things that can be done, but none of them are 5-minute items
14:49tobijk: 07: core 270-405 MHz memory 810 MHz
14:49tobijk: 0f: core 270-708 MHz memory 1800 MHz AC DC *
14:49tobijk: AC: core 708 MHz memory 1800 MHz
14:49imirkin_: [if they were, i would have done them already]
14:49tobijk: i'm fine :)
14:49karolherbst: tobijk: I have your vbios ;)
14:49tobijk: imirkin_: i got the weekend mostly :)
14:49imirkin_: well, you could look into guardband clipping
14:49imirkin_: the basic idea of guardband clipping, as it was explained to me
14:50imirkin_: is that *actual* clipping is expensive
14:50imirkin_: so you want to avoid it as much as possible
14:50imirkin_: and GPUs are designed to rasterize onto 16kx16k grids
14:50imirkin_: but if your viewport is less than that
14:50imirkin_: then a bunch of that grid remains unused
14:51imirkin_: instead of doing "real" clipping (i.e. cutting of the primitive, etc)
14:51imirkin_: you just take the primitive and rasterize it onto the grid
14:51imirkin_: and you know which bits of the grid are out of bounds and which aren't
14:51imirkin_: and so you can trivially discard those fragments
14:51karolherbst: of course, my "stupid SmartCSE" pass broke it
14:51imirkin_: now, this only works if the primitive fits fully onto the grid
14:52imirkin_: so you set up a "guard band" to define an area outside the viewport but that is still rasterizable
14:52imirkin_: and so then the GPU can clip a lot faster than it otherwise would
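The decision imirkin walks through can be modelled on bounding boxes. The following Python sketch is only an illustration of that reasoning under stated assumptions: boxes are (xmin, ymin, xmax, ymax) tuples, the guard band extents are arbitrary example numbers, and the three result labels are invented here. Real hardware makes this choice in fixed-function logic.

```python
def inside(inner, outer):
    """True if box `inner` lies entirely within box `outer`."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def classify(prim_bbox, viewport, guardband):
    """Decide how a primitive is handled under guardband clipping."""
    if inside(prim_bbox, viewport):
        # Fully visible: nothing to clip or discard.
        return "no_clip_needed"
    if inside(prim_bbox, guardband):
        # Fits on the rasterizable grid: rasterize as-is and throw away
        # the fragments that land outside the viewport.
        return "discard_fragments"
    # Too large for the grid: the primitive has to be cut geometrically.
    return "real_clip"


viewport = (0, 0, 1920, 1080)
guardband = (-8192, -8192, 8192, 8192)  # hypothetical extents

print(classify((10, 10, 100, 100), viewport, guardband))
print(classify((-50, 10, 100, 100), viewport, guardband))
print(classify((-20000, 0, 100, 100), viewport, guardband))
```

The larger the guard band relative to the viewport, the fewer primitives fall through to the expensive third case.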
14:52karolherbst: see anything what I do wrong here? https://github.com/karolherbst/mesa/commit/02874c9ee7511f6c713fe95c45292e09a48ce24e
14:53tobijk: nvidia actually has something of use for this: http://developer.download.nvidia.com/assets/gamedev/docs/Guard_Band_Clipping.pdf
14:53imirkin_: so there are a bunch of registers to set this up
14:53imirkin_: but ... i'm not sure where they are
14:53imirkin_: i thought we knew
14:54imirkin_: there's the CLIP_RECT stuff - but i don't think that's what that's about...
14:55imirkin_: probably a bit in VIEW_VOLUME_CLIP_CTRL to flip it on, and then more bits elsewhere
14:57karolherbst: okay, I mess with the phis the wrong way... that explains the rather big effect
15:01karolherbst: imirkin_: phis can have more than 2 sources? :O
15:01imirkin_: as many as you want
15:01imirkin_: well, not as many as you want
15:01imirkin_: but as many as there are incoming edges
15:01karolherbst: I merged two phis with src0 and src1 equal but src2 and src3 unequal
15:01imirkin_: not a good idea :)
15:02karolherbst: yeah, I noticed :)
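The mistake described here comes from comparing only the first two sources of each phi. A small Python sketch of the difference follows; phis are modelled as plain lists of source names, one per incoming edge, which is a simplification and not the nv50_ir data structure, and both function names are made up for the example.

```python
def phis_equal_first_two(phi_a, phi_b):
    """Buggy check: assumes every phi has exactly two sources."""
    return phi_a[:2] == phi_b[:2]


def phis_equal(phi_a, phi_b):
    """Correct check: every source, edge by edge, must match."""
    return len(phi_a) == len(phi_b) and all(
        a == b for a, b in zip(phi_a, phi_b)
    )


# Two phis in a block with four incoming edges.
phi_1 = ["%r1", "%r2", "%r3", "%r4"]
phi_2 = ["%r1", "%r2", "%r5", "%r6"]

print(phis_equal_first_two(phi_1, phi_2))  # wrongly says they can merge
print(phis_equal(phi_1, phi_2))            # correctly refuses
```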
15:02karolherbst: but does it help RA when we can eliminate phis in preRA?
15:03imirkin_: look at what GlobalCSE does
15:06karolherbst: okay :)
15:08karolherbst: PBENTRY again :/
15:15karolherbst: so on ultra settings you can play TR2013 without issues if you don't care about SSAA :D
15:15tobijk: karolherbst: my dl is finished, is there still need to check TR2013?
15:15karolherbst: with a 780 Ti that is
15:15karolherbst: tobijk: well, I am sure Kayden wrote patches for the white fog issue
15:16karolherbst: but I thought they got merged...
15:16tobijk: can you give me an example where to find it, so i can check it
15:16tobijk: or is it obvious? :D
15:18karolherbst: tobijk: I don't know anymore
15:19tobijk: oh wow, bump tess and it gets bright
15:19karolherbst: maybe this? https://cgit.freedesktop.org/~kwg/mesa/log/?h=glsl-copyprop
15:19karolherbst: tobijk: nope
15:19karolherbst: tobijk: depth of field ultra + post processing
15:21karolherbst: no idea why that game is 32bit only...
15:22karolherbst: okay,wrong branch
15:23tobijk: damnit where are the config files for TR2013
15:23karolherbst: tobijk: :D
15:23karolherbst: you can navigate with the arrow keys as well
15:23tobijk: i cant see anything literally
15:23karolherbst: the menus you see
15:24karolherbst: just hope you hit the right thing :D
15:24karolherbst: I will guide you, wait a second
15:25karolherbst: 4 down, enter, 1 down, enter
15:27tobijk: heh ok lets try that
15:27tobijk: 4 down, enter - done
15:28karolherbst: tobijk: tombraider* are the branches called :D
15:29karolherbst: tombraider-2 as the current one I think
15:29tobijk: too easy :D
15:31tobijk: karolherbst: fyi: $HOME/.local/share/feral-interactive/Tomb Raider/
15:32karolherbst: tobijk: well, you could do that also ingame you know :D
15:33karolherbst: tobijk: you only need the two newest commits from tombraider-2
15:34tobijk: if that resolves it i'm happy
15:37karolherbst: perf is really bad
15:38tobijk: i cant determine at all :)
15:38tobijk: *main menu*
15:39karolherbst: tobijk: recent mesa master?
15:39tobijk: fairly recent
15:41karolherbst: well the two newest for me were enough
15:41karolherbst: maybe the game didn't pick it up
15:41karolherbst: how did you launch the game?
15:42karolherbst: if you launch it from cli, it invokes xdg-open steam://....
15:42karolherbst: you have to remove that
15:43tobijk: nah i started through the click-colorful client :D
15:44tobijk: ("click-bunti") ;-)
15:44karolherbst: now I am curious
15:44karolherbst: why does it work with my compiler opts? :D
15:44tobijk: i havent pulled the tr branch fixes
15:44tobijk: it is just damn bright :D
15:44karolherbst: ohh wait
15:45karolherbst: I also didn't have the fixes applied anymore
15:45karolherbst: yeah you need the top two commits of tombraider-2 branch ;)
15:50tobijk: imirkin_: for the VIEW_VOLUME_CLIP_CTRL is there more info somewhere, outside envytools? or does that have to be completely RE'd first?
15:51tobijk: no to 1 or 2? :)
15:58karolherbst: in the TR2013 benchmark: 13.0 => 13.2 with my opts, seems like there is nothing really wrong there anymore :)
15:59hakzsam: imirkin, both patches are good :)
15:59tobijk: karolherbst: with the phi joining / elimination?
15:59hakzsam: imirkin_, I was wondering about the maxwell logic, but actually your thing doesn't affect this part
16:03karolherbst: tobijk: nah, with a bunch of opts
16:03karolherbst: tobijk: they aren't really cleaned up and not well thought through and not well tested
16:03karolherbst: tobijk: but I really should go through them all and upstream the simple ones
16:03karolherbst: like this one: https://github.com/karolherbst/mesa/commit/d29a2569ed3cc2ec9ca6cd76c279eacb843e47f0 :D
16:04karolherbst: but I am sure there is a mod combination which messes that up
16:06tobijk: not that i can name one directly now, but that does not say anything :D
16:07tobijk: karolherbst: if you dont break something with it, i guess this is ok to go with
16:07tobijk: something = piglit
16:28imirkin_: hakzsam: yeah :)
16:32karolherbst: sadly you can't select older versions in steam :/
16:32karolherbst: for TR, every build is available as a beta branch
16:33karolherbst: windows only...
16:34karolherbst: I get some "GL_INVALID_OPERATION in glUseProgramStages(program not linked)" and stuff like that
16:55tobijk: karolherbst: what are those errors about exactly?
16:57karolherbst: tobijk: shader failing to compile I assume
16:58tobijk: karolherbst: yeah well, hopefully mesa puts out why :)
18:57karolherbst: nvidia really can't replay the tomb raider traces
18:57karolherbst: 0(20) : error C7548: 'layout(location)' requires "#extension GL_ARB_enhanced_layouts : enable" before use
18:57karolherbst: 0(20) : error C0000: ... or #extension GL_ARB_separate_shader_objects : enable
18:57karolherbst: 0(20) : error C0000: ... or #version 440
18:58imirkin_: karolherbst: weird...
18:59imirkin_: depends what's in that shader
18:59karolherbst: they also ported against mesa
18:59imirkin_: they could be missing something
19:00imirkin_: although mesa is reasonably good at not enabling features
19:00imirkin_: without them having been explicitly enabled
19:01karolherbst: I will check it
19:01imirkin_: ARB_separate_shader_objects was in GL 4.1
19:01imirkin_: so that should probably be #version 410
19:01imirkin_: and that's where they fail
19:02karolherbst: I will check the shader
19:04karolherbst: #version 420 core
19:04karolherbst: and then #extension GL_ARB_shader_storage_buffer_object, GL_ARB_shader_image_size, GL_ARB_fragment_layer_viewport, GL_ARB_texture_query_levels, GL_ARB_shader_texture_image_samples and GL_EXT_shader_integer_mix
19:05imirkin_: yeah, i think nvidia's wrong
19:05imirkin_: tobijk_: oh, another thing you could do is add support for multi-sampled images
19:05imirkin_: hakzsam and i kinda wimped out on that
19:10tobijk_: imirkin_: the gb clipping sounds more promising to be honest, problem is i dont know how to trace to the right reg(s)
19:11karolherbst: imirkin_: mhhh, maybe 4.4 is right
19:11karolherbst: imirkin_: because it complains about GL_ARB_enhanced_layouts
19:11imirkin_: karolherbst: no, it says that ARB_enhanced_layouts adds location, which it does
19:11imirkin_: karolherbst: but sso also adds location
19:12karolherbst: now I get it
19:12imirkin_: see 126.96.36.199
19:12karolherbst: I see
19:13karolherbst: so it is an nvidia compiler bug :)
19:13imirkin_: [ugh, that's the stupid one with changes... but you get the idea]
19:13tobijk_: imirkin_: those regs have to be huge i guess as nvidia states: -100,000,000 to +100,000,000 both horizontally and vertically (for geforce 256)
19:13tobijk_: in pxl
19:13imirkin_: tobijk_: uhhhhh... dunno
19:13imirkin_: tobijk_: it's probably a 64kx64k viewport
19:15mwk: yay, I have a full falcon3 disassembler testsuite, and llvm-mc passes :)
19:15tobijk_: well i guess RE'ing on the basis of the viewport_array piglits and hoping for the best...
19:16mwk: so, time for assembler
19:20hakzsam: imirkin_, so, textures/samplers seem to be aliased on fermi, I have just "fixed" the UE4 fail without using the HUD, but I have to think more for the correct fix
19:21hakzsam: imirkin_, the workaround is hastebin.com/barukukeqo
19:21hakzsam: I will probably rework a bit the invalidation thing between 3d and cp on fermi
19:21hakzsam: but it looks buggy :)
19:22imirkin_: hakzsam: ok. note that just setting dirty isn't enough... you also have to set the dirty bit
19:22imirkin_: i.e. in nvc0->dirty_3d
19:22hakzsam: yup, as I said it's not the correct fix
19:22imirkin_: i'm not surprised that they're aliased though
19:22hakzsam: setting the dirty bit doesn't change anything, it works without it
19:22imirkin_: everything else is :)
19:25karolherbst: if somebody wants to see some TR2013 flickering: https://drive.google.com/open?id=0B78S7GSrzebISkJnMHJXTUNTRXc
19:49karolherbst: hakzsam: got some time now?
20:14hakzsam: karolherbst, not sure, I should leave in few minutes
20:14karolherbst: no worries, I send you the stuff via email anyway
20:14hakzsam: yeah, got it
20:14karolherbst: shouldn't take you more than 3 minutes (if you ignore compiles) :)
20:17tobijk_: karolherbst: i would like to experience the TR 2013 flicker, but apitrace does not like me at all, it always dies :D
20:17karolherbst: why does it die?
20:17imirkin_: are you using an older apitrace
20:18imirkin_: that doesn't handle glCopyImageSubData properly?
20:18tobijk_: imirkin_: i had the newest pulled some mins ago
20:18imirkin_: ah ok
20:18tobijk_: now i checked out 7.1 that seems to work
20:18imirkin_: and that issue was fixed a while back
20:18imirkin_: ah, fun
20:18imirkin_: well, not fixed. but hacked around "good enough"
20:19imirkin_: if anyone ever feeds it a renderbuffer, it won't work
20:19imirkin_: but no one ever does
20:20tobijk_: yay and another "unhandled exception" O.o
20:25tobijk_: ah apitrace and waffle do not like each other
20:25imirkin_: but waffles are so delicious
20:25tobijk_: now i get hungry again :)
20:26tobijk_: karolherbst: where does it flicker?
20:26tobijk_: it is a bit bright, but else it looks fine
20:26karolherbst: the plants
20:26tobijk_: mh game bug ;-)
20:27karolherbst: but shouldn't be
20:27karolherbst: tobijk_: which mesa version do you have?
20:28tobijk_: up to https://cgit.freedesktop.org/mesa/mesa/commit/?id=c52e92ec3a37c9ab3fb35132e62e1ddf6a770c27
20:28tobijk_: but now i see the flickering
20:29tobijk_: was just not watching close enough inbf
20:29karolherbst: there you go :D
20:39karolherbst: uhh, other stuff also broken now...
20:44tobijk_: if qapitrace wouldnt always die on me...grr
20:49karolherbst: yay! I did it. I got the windows frozen window issue, just in TR2013! and just for graphics!
20:50tobijk: karolherbst: i got a system frozen with your trace, not that i like it :/
20:51tobijk: after replaying it several times though
20:54tobijk: karolherbst: did you narrow it down to a call already? or some calls?
20:55karolherbst: I will try to bisect first
20:55karolherbst: because I am sure this didn't happen before
20:55karolherbst: maybe it did...
20:55karolherbst: who knows
20:58karolherbst: well another trace incoming :D
20:58karolherbst: did I really say I didn't have any issues....
21:00pmoreau: imirkin_: I’m back. Is this the old patch you were talking about? http://hastebin.com/eruxelejeq.m
21:00hakzsam: karolherbst, I'm here actually, but I'm fixing the UE4 issue on fermi, will test your stuff a bit later
21:01imirkin_: pmoreau: no, there were more lines of code :)
21:01imirkin_: pmoreau: although they were largely involved with setting the layer thing
21:01pmoreau: That one then? http://hastebin.com/efekomahih.diff
21:02imirkin_: except s/prog/last/ in that func
21:02pmoreau: Ok. Since that was the last patch you gave me, I was not considering it to be the "old" one :-)
21:03hakzsam: imirkin_, this should be enough for now http://hastebin.com/okutawosik
21:03hakzsam: reworking the invalidation state will take more time
21:04imirkin_: hakzsam: dirty = valid
21:04imirkin_: or did we decide that was a bad idea?
21:04hakzsam: no _valid for textures/samplers
21:04imirkin_: really? o well
21:04imirkin_: ~0 worksforme
21:04imirkin_: this fixes it for UE4?
21:05hakzsam: maybe, we should introduce a valid array for textures and samplers...
21:05imirkin_: ohhhhh wait... it occurs to me ...
21:05hakzsam: I tried with the realistic demo and with the reflects one
21:05imirkin_: i wonder if ->num_samplers/num_textures being off causes issues
21:05hakzsam: will try with elemental once it is downloaded
21:05pmoreau: imirkin_: What was the init you were talking about?
21:06imirkin_: pmoreau: i told you to try setting some reg to some value in init
21:06imirkin_: in like ...
21:06imirkin_: just stick IMMED_NVC0(push, SUBC_3D(thing), 1) or whatever i said earlier
21:06imirkin_: i've long forgotten :)
21:06pmoreau: Wut? I love the function name!
21:07imirkin_: all the unk's in one place :)
21:07pmoreau: I still have the command to insert, I was just missing which init you were referring to.
21:07hakzsam: imirkin_, I wonder if improving the validation stuff could improve performance (if we reduce the number of flushes without breaking anything of course)
21:08imirkin_: hakzsam: probably
21:08imirkin_: hakzsam: step 1 - get a CLEAR understanding of how it works
21:08karolherbst: hakzsam: :)
21:08hakzsam: imirkin_, I think I get the idea :)
21:08imirkin_: hakzsam: once you do, i will tell you in advance, you will have gotten it wrong. figure out why you're not understanding it properly, and try re-reading all the code again
21:08imirkin_: once you do that a few times, you might start to get it :)
21:09hakzsam: I have already started ;)
21:09hakzsam: does the patch look good to you?
21:10imirkin_: hakzsam: seems reasonable. but you need to be more careful with bufctx's
21:10imirkin_: you're going to start growing bufctx's without bound like that
21:10hakzsam: need to call nouveau_bufctx_reset() sometimes
21:10tobijk: karolherbst: does the flickering happen even in the benchmark?
21:11hakzsam: oops, my HDD is full
21:11pmoreau: imirkin_: viewport_index-render still fails, with both `IMMED_NVC0(push, SUBC_3D(0xf60), 1);` and 0xf10.
21:12imirkin_: so it wasn't that one.
21:12imirkin_: o well.
21:13imirkin_: someone with the hw will have to find which of the 40 unknown regs it is
21:15hakzsam: imirkin_, speaking about bufctx, I think we need to clear it before validating surfaces on kepler, exactly like we do on fermi
21:16imirkin_: actually the fermi fix was wrong
21:16imirkin_: we need to clear it when we set the dirty flags
21:17karolherbst: this is a funny issue: https://drive.google.com/open?id=0B78S7GSrzebITUlEeThKS1RkYmM
21:17karolherbst: tobijk: no clue
21:17karolherbst: tobijk: I saw it sometimes, but it seems really rare
21:17tobijk_: karolherbst: the benchmark looks fine with my own TR
21:18tobijk_: before the system froze again :O
21:18hakzsam: imirkin_, sounds like it's the way to go
21:21mupuf: nice! You figured out the bug with the demos :)
21:21hakzsam: imirkin_, I'm stupid, clearing the bufctx on kepler is just useless because we don't invalidate between 3d and cp
21:22hakzsam: and it's already cleared when we bind the images state
21:22hakzsam: mupuf, yep
21:23tobijk_: karolherbst: wow nice fire effect :D
21:24karolherbst: I know :D
21:25tobijk_: [ 627.750933] nouveau 0000:01:00.0: fifo: read fault at 012ac9f000 engine 00 [GR] client 03 [GPC0/L1_1] reason 00 [PDE] on channel 2 [007fa02000 TombRaider]
21:25tobijk_: [ 627.750938] nouveau 0000:01:00.0: fifo: gr engine fault on channel 2, recovering...
21:26tobijk_: karolherbst: didnt you have this one a few hours ago? -^
21:27karolherbst: tobijk_: nope, I had something else
21:27tobijk_: but a honest wow to the recovering, it worked :)
21:28karolherbst: yeah, as long as the GPU isn't messed up and it is only the graph engine, it works pretty well most of the time
21:33hakzsam: imirkin_, what about this one http://hastebin.com/miyewekome ?
21:33hakzsam: we don't use bins for samplers
21:33imirkin_: right, coz samplers use no resources
21:33imirkin_: or rather, reference no resources
21:34imirkin_: i wonder what happens
21:34imirkin_: when you have, say, a frag shader
21:34imirkin_: with 1 texture
21:34imirkin_: and a compute shader with 2 textures
21:34imirkin_: and you switch from compute to frag
21:35imirkin_: i BET that the frag one needs to invalidate the "second" texture
21:35imirkin_: but it won't, coz num_textures is 1
21:35imirkin_: i think as part of this, you need to set num_textures[s] to MAX(num_textures[s], other thing)
21:35imirkin_: or something like that... but hopefully you see where i'm going with this
21:36imirkin_: anyways, what you have there is probably fine for now? although i don't really like the placement of the surface bufctx_reset's
21:36hakzsam: seems fine yeah
21:36hakzsam: it fixes that issue at least
21:37hakzsam: but could be improved as you said
21:37imirkin_: yeah, you probably need to adjust graph_state->num_textures/samplers
21:38hakzsam: this seems to be already set
21:38imirkin_: yes, but
21:38imirkin_: the IDEA is that nvc0_graph_state represents GPU state
21:38imirkin_: so that we can emit only the updates necessary
21:39imirkin_: but when setting compute/graphics textures, we end up messing up other bits of gpu state
21:39imirkin_: so i think we need to model that in the graph_state struct
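The scenario imirkin_ describes above (a frag shader with 1 texture, a compute shader with 2, then a switch back) can be sketched as a toy model in C. This is only an illustration under stated assumptions, not nouveau code: `hw_slots`, `shadow_state`, `validate_textures` and `note_other_engine` are hypothetical names, and the model simply assumes that 3D and compute share one set of texture slots.

```c
#include <assert.h>

#define MAX_SLOTS 32

/* Hypothetical model: one set of texture slots aliased by 3D and compute. */
struct hw_slots {
    int bound[MAX_SLOTS];   /* texture id programmed in each slot, 0 = none */
};

/* Hypothetical shadow of what one engine's code believes is programmed. */
struct shadow_state {
    unsigned num_textures;  /* slots that may still hold state */
};

/* Program n textures, then clear leftover slots up to the tracked count. */
static void validate_textures(struct hw_slots *hw, struct shadow_state *shadow,
                              const int *tex, unsigned n)
{
    unsigned i;

    assert(n <= MAX_SLOTS);
    for (i = 0; i < n; i++)
        hw->bound[i] = tex[i];
    for (i = n; i < shadow->num_textures; i++)
        hw->bound[i] = 0;
    shadow->num_textures = n;
}

/* The MAX(num_textures, other thing) step: record that the other engine
 * has written n_other aliased slots since we last validated. */
static void note_other_engine(struct shadow_state *shadow, unsigned n_other)
{
    if (n_other > shadow->num_textures)
        shadow->num_textures = n_other;
}
```

In this model, validating the frag state after compute without calling `note_other_engine` first leaves the second slot holding the compute texture, because the tracked count is still 1. Calling it with the compute count beforehand makes the cleanup loop reach that slot.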
21:39tobijk_: karolherbst: are you using the dof ultra setting for your traces? :O
21:39hakzsam: imirkin_, part of the invalidation improvements :)
21:40imirkin_: you can push your patch, it's fine
21:40tobijk_: karolherbst: as hard as i try now to make an own small trace, those bugs do not occur in my game
21:40hakzsam: I guess this graph state was designed for 3d only
21:40imirkin_: imperfections can be trued up later.
21:40hakzsam: imirkin_, and probably needs to be adjusted with compute
21:40imirkin_: by yours truly.
21:41imirkin_: or actually i guess i just refactored it
21:41karolherbst: tobijk_: no, lowest possible
21:41hakzsam: imirkin_, so, Rb?
21:41imirkin_: hakzsam: sure.
21:41hakzsam: I will just check with elemental
21:41karolherbst: tobijk_: well there was an update today
21:42tobijk_: karolherbst: ah i run with highest, except the dof thing
21:43karolherbst: tobijk_: but which issue can't you reproduce?
21:44tobijk_: karolherbst: both
21:44hakzsam: imirkin_, elemental replays just perfectly :)
21:44tobijk_: it shows in the trace, but not in the actual game
21:44imirkin_: hakzsam: fantastic
21:44karolherbst: tobijk_: well the flickering one is really rare, and for the fire one you have to enable the instincts first
21:44karolherbst: tobijk_: maybe it only happens at that one place, who knows
21:44hakzsam: imirkin_, but performance is ... catastrophic
21:44tobijk_: karolherbst: actually i am at the one fireplace by chance :D
21:45karolherbst: mhh weird
21:45imirkin_: hakzsam: performance is catastrophic either way...
21:45imirkin_: i.e. GL 3.2 and GL 4.3
21:45imirkin_: it's pretty smooth at 720x480 on my GK208
21:45hakzsam: imirkin_, and especially on fermi
21:45imirkin_: and really smooth at like 512x384 or whatever
21:45imirkin_: and REALLY smooth at 1x1 ;)
21:46hakzsam: I used 480x320 ...
21:46tobijk_: imirkin_: only lies :P
21:49hakzsam: well, the UE4 demos will be good for improving the invalidation state
21:49hakzsam: because they use both 3d and cp a lot
21:53karolherbst: I always thought they are smooth for me in any case :O
21:54tobijk_: karolherbst: bragging much? ;-)
22:05Ryushin: I have a new Thinkpad P70 with a Quadro M5500 card, which I believe is NV110. I'm having a heck of a time trying to get the Debian nvidia binary driver to work. It pretty much gives me a kernel panic when I select discrete in the bios. How well implemented is the nouveau driver at this point? Power management is also a bit important to me as I use my laptop on my lap and I don't wish to sterilize myself.
22:08imirkin_: Ryushin: nouveau will successfully power off that gpu
22:09imirkin_: Ryushin: as long as you don't actually want gpu power, it'll work great :)
22:09Ryushin: imirkin_: So performance should be good and I should no longer have to even worry about trying to get the nvidia binary driver to work. Do I need any kind of binary firmware from the nvidia driver?
22:10karolherbst: well "good"
22:10imirkin_: Ryushin: performance should not be good.
22:10imirkin_: Ryushin: but you should be able to keep your reproductive organs in order
22:10karolherbst: Ryushin: sure it is a nv110?
22:11karolherbst: I am pretty sure it is a nv120
22:11Ryushin: karolherbst: I believe so. It's not listed in the matrix though.
22:11karolherbst: Ryushin: what does lspci say?
22:12karolherbst: it is nv124
22:13karolherbst: okay, so performance is bad
22:13imirkin_: karolherbst: performance would be bad either way :)
22:13imirkin_: Ryushin: you should default to the intel gpu too
22:13karolherbst: imirkin_: well, not so much, because maxwell1 is reclockable
22:13imirkin_: karolherbst: not upstream.
22:13karolherbst: but the situation looks better than maxwell2 ;)
22:14Ryushin: Yea, it defaults to the intel as it is. Trying to get the m5000 to run, well, because, and well, it added another $1000 to the cost of the laptop.
22:14karolherbst: Ryushin: well, you want to use bumblebee in that case
22:14karolherbst: nouveau can't reclock GM20x
22:14Ryushin: Actually, I rather stay away from it. Rather just have it run native if possible.
22:14imirkin_: Ryushin: learn from your mistakes.
22:14karolherbst: which isn't even nouveaus fault, but nvidias :D
22:15karolherbst: Ryushin: well, talk to gnurou :D
22:15Ryushin: imirkin_: Mistake in what regard? I'm figuring out the kernel panic right now.
22:16imirkin_: Ryushin: getting an nvidia gpu and trying to use it in linux
22:16karolherbst: Ryushin: anyway, your nvidia gpu will stay at lowest clocks, which means the intel one will be faster
22:17karolherbst: Ryushin: and you want recent kernels + recent mesa if you plan to use nouveau
22:17Ryushin: karolherbst: Unless I use the binary driver?
22:17karolherbst: Ryushin: right, but then you simply would use bumblebee
22:17Ryushin: Running 4.5 right now.
22:17karolherbst: Ryushin: and bumblebee will power off the gpu (if there is no odd issue)
22:18karolherbst: Ryushin: it's just, that we lack proper firmware to also reclock the memory on those cards
22:18karolherbst: Ryushin: and 4.5 is too old
22:18karolherbst: or is it?
22:18karolherbst: no, I think maxwell2 support got merged in 4.6
22:19Ryushin: Don't have 4.6 zfs support yet in Debian. Soon though so I will be moving to 4.6 shortly.
22:19karolherbst: so, you won't be able to use hw accel on 4.5 with nouveau then
22:19karolherbst: 4.6 + recently new linux-firmware is required
22:20Ryushin: Maybe it's best to just use the Intel for the next six months or so and let nouveau cook a bit.
22:20karolherbst: and mesa-11.2 whatever
22:20karolherbst: Ryushin: well
22:20karolherbst: Ryushin: it took a while until we got the other firmwares
22:20karolherbst: gnurou should know more though, even if he can't say anything
22:21Ryushin: Good to know. I'll ask gnurou when it comes back.
23:49huelter: just here to thank you guys, kernel 4.6 solved my issues on 600 ti. Closed the bug https://bugs.freedesktop.org/show_bug.cgi?id=95031
23:49huelter: 660 ti*
23:50imirkin_: don't jinx it :)
23:50huelter: yeah =P
23:50huelter: but something has definitely improved
23:51huelter: was not expecting it, I think no major changes were introduced
23:51huelter: reclocking was not a factor