01:42imirkin: gr. ILLEGAL_SPH_INSTR_COMBO.
02:29dboyan: imirkin: After digging through the code, I guess the problem I'm having lies in latency calculation. I was using getLatency() in targets, but they only calculate issue latency, not real latency. If I understand correctly.
02:29imirkin: you should assume that the functions for getLatency() and getThroughput() return bogus values
02:33dboyan: e.g., the real latency between tex and texbar should be much greater than what I got from getLatency() right?
02:33imirkin: dunno what getLatency() returns. but it should be ... i dunno, 10000?
02:34imirkin: that's probably high
02:35imirkin: but you get the gist
03:04imirkin: w00t, looks like it all works... at least the handful of piglits.
03:05imirkin: hakzsam: we had a pretty nasty bug with indirect image handling before :(
03:05imirkin: [see if you can spot it... nv50_ir_from_tgsi.cpp]
04:25sedrosken_: I was told to ask about reclocking support for my GT730 in here?
04:38imirkin: sedrosken_: just use the pstate file...
04:39sedrosken_: forgive my ignorance but I'm not sure what that means
04:39imirkin: depends which GPU it is actually... if it's a GF108 you're SOL, but if it's GK107 or GK208 should be fine
04:39imirkin: sedrosken_: /sys/kernel/debug/dri/0/pstate
04:39sedrosken_: hmm. Not sure which one it is exactly, I should check
04:40sedrosken_: but thanks
04:40imirkin: as mentioned briefly at https://nouveau.freedesktop.org/wiki/
04:40imirkin: sedrosken_: lspci -nn -d 10de:
04:40sedrosken_: it's much appreciated
04:40imirkin: should tell you which one
04:41imirkin: hakzsam: well, hard hang with DOW3 ... no info from that, so dunno if it's something in bindless, or an unrelated issue. how did one run hitman to enable bindless?
04:52imirkin: hakzsam: skeggsb_: this is my latest version that enables bindless - https://github.com/imirkin/mesa/commits/cts
10:41pmoreau: karolherbst: I don’t remember: were you planning to make more changes to "nv50/ir: improve POW lowering"?
10:47karolherbst: pmoreau: no
11:32hakzsam: imirkin: in the preferences file, add this "<value name="EnableBindlessTexture" type="integer">1</value>" into "IndirectX/Direct3D/Config"
11:33hakzsam: you will have to bump the limit because hitman uses a ton of handles
11:33hakzsam: I recommend you test dirt rally actually
13:44pmoreau: karolherbst: For `pow(x, 6)`, you said that replacing it by MULs was not beneficial due to an increase of GPRs usage. How many more GPRs does it use, and perf-wise, is there any difference?
14:37imirkin: hakzsam: ok... will it just use bindless, or do i need to enable it somehow?
14:38hakzsam: imirkin: if you add the above line in your preferences file, it should enable it itself
14:39imirkin: "preferences file"?
14:39imirkin: [i mean for dirt rally]
14:39hakzsam: imirkin: ~/.local/share/feral-interactive/Dirt Rally/preferences
14:39imirkin: ah got it
14:40imirkin: hmmmm ... i don't have such a directory
14:40imirkin: i have it for other games... should i create it?
14:40imirkin: or will it get created first time i run it?
14:41imirkin: ah yeah, there it is
14:41hakzsam: yeah, launch the game and it should create the dir
14:46hakzsam: does it work?
14:48imirkin: let's see...
14:49imirkin: well, DOW3 crashed while loading the benchmark
14:49imirkin: (crashed the box)
14:49imirkin: hitman crashed too, without enabling bindless
14:49hakzsam: on gm107?
14:51karolherbst: pmoreau: shader-db stats inside the commit ;)
14:53pmoreau: karolherbst: So… I guess 4 programs used 1 less GPR, and 12 gained 1?
14:53imirkin: fifo: read fault at 00002b5000 engine 00 [GR] client 10 [PD] reason 02 [PTE] on channel 7 [007f8e0000 X]
14:56imirkin: this is on GK208 btw
14:58imirkin: oh wait
14:59imirkin: i think i may have made a little mistake....
14:59hakzsam: dirt rally used to work, no?
14:59hakzsam: forgot to add the resident buffers to the context? :p
15:00imirkin: no. forgot to bump up the size of the uniform_bo after increasing the size of the driver constbuf
15:02hakzsam: ah yeah, little mistake
15:06imirkin: nope. still fail.
15:06imirkin: same exact one, too
15:07hakzsam: does it fail with master?
15:07hakzsam: I mean, does hitman/dirt rally work without bindless and master?
15:08imirkin: iirc my hitman crash was with 17.1 or whatever
15:08imirkin: let me try disabling bindless
15:12imirkin: hakzsam: yeah it loads without bindless
15:12imirkin: with the same tree
15:12imirkin: (dirt rally)
15:13hakzsam: I guess all arb_bindless_texture piglits pass?
15:16imirkin: gr: DATA_ERROR 0000000c [INVALID_BITFIELD] ch 8 [007f840000 X] subc 0 class a197 mthd 1b00 data 20020385
15:16imirkin: that's nice...
15:16imirkin: wtf is 1b00? sounds important...
15:16imirkin: oh. QUERY_ADDRESS_HIGH?
15:16imirkin: yeah, that's way invalid.
15:18imirkin: OGL_Dispatch_33: segfault at 0 ip 00007fa9bac912a5 sp 00007fa98bffe7e0 error 6 in nouveau_dri.so[7fa9ba590000+b7d000]
15:18imirkin: i win!
15:18imirkin: heh. when switching preset modes from ultra low to medium
15:19imirkin: hakzsam: and yeah, all the bindless piglits pass.... i think
15:19imirkin: i'll double-check
15:20imirkin: [29/29] skip: 1, pass: 28
15:40karolherbst: pmoreau: there is no information regarding the amount
15:41karolherbst: pmoreau: basically what we do here is, that we replace some SFU instructions with muls
15:41karolherbst: and this makes dual issuing easier as well
15:42karolherbst: and reduces instruction count, because we lower pow to 3 instructions if the exponent is an immediate value
15:43karolherbst: imirkin: cool
15:44pmoreau: karolherbst: I do understand that part. :-) It is just, IIRC, that you said that replacing `pow(x, 6)` by MULs was not worth it.
15:45pmoreau: karolherbst: I was interested by the data behind that claim.
15:46karolherbst: I think it was around 34 hurt with gprs
15:46karolherbst: I could check again
15:46karolherbst: but it was more like a "meh, not worth it"
15:47karolherbst: would rather try it out again after we got proper scheduling
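[editor's note] A minimal sketch of the `pow(x, 6)` trade-off discussed above, assuming the MUL expansion is done by repeated squaring. The helper name `pow6_by_muls` is hypothetical and this is not the nouveau lowering code. It shows why the MUL form can cost a register: the squared value has to stay live while the fourth power is computed.

```cpp
// Hypothetical illustration only, not code from nv50_ir.
// Computes x^6 with three multiplies instead of a pow() call.
static float pow6_by_muls(float x)
{
    float x2 = x * x;    // MUL 1: x^2
    float x4 = x2 * x2;  // MUL 2: x^4 (x2 is still needed afterwards)
    return x4 * x2;      // MUL 3: x^6, consumes both temporaries
}
```

Both `x2` and `x4` are live at the third multiply, which is one plausible source of the extra GPR use mentioned in the chat; the actual numbers would have to come from shader-db.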
15:56imirkin: hakzsam: ok, so with my tree, dirt rally appears to work fine without bindless.
16:08imirkin: wow, we really really suck at indirect
16:08imirkin: how did any of this stuff ever work
16:10imirkin: going to rebase those fixes out and do a full sweep
16:11karolherbst: imirkin: could you push your stuff somehow if there are fixes? Maybe those help out with those silly OOR_ADDR issues in hitman
16:12imirkin: karolherbst: well, i just discovered that we mess this up ALL OVER THE PLACE
16:12imirkin: so i'm doing a proper fix
16:12imirkin: and will send that out irrespective of the bindless stuff
16:12karolherbst: yeah, okay, that's basically why I asked
16:13karolherbst: do you have something I could try out to verify it?
16:13imirkin: oooh! found one place where we didn't mess it up. how nice.
16:15imirkin: but of course 5 lines below... back to messing it up
16:19imirkin: ok, now the rebase fun begins
16:28imirkin: karolherbst: https://github.com/imirkin/mesa/commit/32ccee4c1813a06ad2e7fdb63cff87fbfa712f77
16:28imirkin: should apply on top of master
16:31pmoreau: I’ll probably go back to getting images to work in OpenCL, trying out your current WIP. :-)
16:33imirkin: should go back and fix the way that maxwell images work too -- they shouldn't have to refer to things in the constbuf, they can be read out of the TIC
16:34imirkin: i guess i should retest dirt rally with this fix, since i fixed a few places unrelated to images but potentially might get hit with bindless textures even (as a result of additional code)
16:38imirkin: huh. it loads.
16:39karolherbst: imirkin: okay, your patch at least doesn't fix the OOR_ADDR traps in hitman
16:39imirkin: can't fix 'em all
16:39karolherbst: maybe I get to figure those out, but those kind of issues are hard to track without a trap handler
16:41imirkin: oh wait lol... i didn't have bindless actually turned on in that build
16:41imirkin: let's try that again...
16:48imirkin: yeah, still dies. o well.
16:56imirkin: harry_x: you want to try to get ahold of gnurou or skeggsb_ and ask them for advice on how to debug your situation
16:56harry_x: Thanks. Will do that :)
16:57imirkin: gnurou was previously at NVIDIA and contributed all the secboot stuff to nouveau
16:57imirkin: and skeggsb_ knows everything
16:58imirkin: harry_x: if you're up for debugging, basically the issue is that there's a timeout waiting for something to happen. it'd be nice to get some information about the state of the gpu when that timeout happens.
16:58imirkin: harry_x: for starters, boot with nouveau.runpm=0 so that it doesn't turn off automatically making things much harder to analyze
16:59harry_x: imirkin: Okay, I will take look at the source code and try to figure out how to get state out of it :)
16:59harry_x: I already tried the runpm=0 with little results... But I can try again... One thing I have noticed is it looks like there is a difference between restart and power off/on cycle (which is quite strange - I would expect that it makes no difference)
17:00imirkin: harry_x: well runpm=0 is just to ease debugging
17:00imirkin: that's pretty common actually
17:00imirkin: warm boot vs cold boot
17:00imirkin: state gets left over on the gpu
17:02harry_x: imirkin: Okay, I will do new dmesg with that flag... Really? I had expected that BIOS post will reset the card to some default state... I have explicitly disabled any fast boot optimizations
17:02imirkin: the BIOS can't affect external devices
17:02imirkin: the GPU doesn't wipe itself fully i guess
17:03harry_x: Kk. Gotcha. Okay will do full dmesg using runpm=0 and power off/on cycle so it isn't affected by that
17:04rhyskidd: harry_x / imirkin: I'm around as well with my GP107
17:04rhyskidd: worth me getting a mmio trace of the blob booting with and without a HDMI connected?
17:04rhyskidd: and diffing that?
17:04harry_x: Why not. At least we can compare if you're facing the same issue or not, might be helpful
17:05rhyskidd: ok, I'm fixing up some comments I received to an envytools PR, then can work on an mmio trace
17:05harry_x: rebooting to get that dmesg :-)
17:11harry_x: Now this is very, very strange. I tried a power off/on cycle twice with runpm=0 and it worked every time. It didn't work before, so maybe that earlier failure was down to it being just a reboot.
17:12harry_x: Full power off/on without the flag still results in timeout, that I have verified several times
17:14harry_x: Power consumption is disaster according to powertop, but I guess that's expected...
17:16harry_x: Couldn't that be related to the fact that by default it is enabled on Optimus devices? And this is Optimus laptop, so it gets enabled. But it works more similar to classic GFX card (because the HDMI output is directly connected to the NVIDIA GPU)
17:20imirkin: karolherbst: ok, i figured out that issue with something lingering in the bufctx, i think
17:20imirkin: stupid index buffer. i even knew it might happen, but didn't anticipate the issue wrt clear. i originally had figured it'd be fine
17:35imirkin: hmmm ... looks like i may have messed up some ordering things wrt bindless
17:35imirkin: first have to add to bufctx THEN validate
17:36imirkin: oh i had gotten it right for graphics, wrong for compute
17:39imirkin: oooh. KHR-GL44.cull_distance.functional works now! (well, it fails, but no longer a compile failure!)
17:46imirkin: so that fails, and i think this is the same issue.
17:46imirkin: it would appear as though the "special" 0x2c0/0x2d0 slots aren't working for TCS <-> TES communication
17:47imirkin: karolherbst: could you get me a mmt of that piglit test on blob?
17:47imirkin: i want to see if we're doing something wrong, or if they use some other method of getting that data
17:48imirkin: [or anyone else with a blob setup who knows how to operate mmt]
18:05imirkin: interesting. modifying that test for TES getting the gl_ClipDistance from VS, it seems to work.
18:05imirkin: so it's really just TCS writing the clip distance that's not working
18:15imirkin: hmmm ... yeah, that TCS compilation appears buggered...
18:16imirkin: and sure enough, with NV50_PROG_OPTIMIZE=0 it passes
18:16imirkin: and same for the CTS test
18:18imirkin: karolherbst: ok, so no need for mmt trace then.
18:19imirkin: MemoryOpt is what's messing it up (unsurprising)
18:21karolherbst: imirkin: ohh I will look at your CTS fix
18:23karolherbst: imirkin: is there some special tests you are working on? everything bindless related or random stuff now?
18:47imirkin: karolherbst: "things i notice"
18:47imirkin: so kinda random
18:48imirkin: the first thing was coz i wanted to run some indexing tests in CTS, and that issue hit
18:48imirkin: then i ran the cull distance thing out of curiosity
18:54imirkin: there ya go
19:00karolherbst: did you hit that "KHR-GL44.pipeline_statistics_query_tests_ARB.functional_compute_shader_invocations" assertion already?
19:07imirkin: yeah... bug in their tests as far as i can tell
19:43karolherbst: imirkin: I see
19:51imirkin: urgh. having to go back and properly understand how MemoryOpt works
19:51imirkin: i have a hack-fix for it, but i don't think it's correct
20:25karolherbst: imirkin: did you notice which tests are generating "OOR_ADDR" traps?
20:32imirkin: didn't see any
20:33imirkin: but i'm still trying to figure out this MemoryOpt thing
20:34karolherbst: yeah, I seem to come further when I run it inside gdb
20:34imirkin: file it
20:35karolherbst: it passes though
20:35karolherbst: yeah, I create a card
20:48imirkin: ok. i think i fixed it.
20:52imirkin: gonna do some archaeology... guessing that bug's always been there
20:56karolherbst: mhh "st b128 a[0x80] $r4:$r5:$r6:$r7 $r0 unk39"
20:56karolherbst: what's that unk39 thing?
20:56imirkin: right, so all this stuff has been there since 57594065c30f
20:56imirkin: karolherbst: blob does it.
20:56karolherbst: I see
20:56imirkin: bit39. and it's unknown ;)
21:14imirkin: urgh. no. my fix went too far again.
21:22Lyude: Was it both maxwell 1&2 that locked up fan control, or just 1
21:23Lyude: and on that note if it is just 1, do we have working reclocking for maxwell1?
21:23Lyude: *just 1
21:25karolherbst: Lyude: we have working reclocking on maxwell1
21:25karolherbst: why you ask?
21:26Lyude: karolherbst: had a friend complaining about the nvidia blob breaking after updating again and was curious if they could actually start getting decent performance with nouveau now
21:27imirkin: define 'decent'
21:27Lyude: imirkin: better than before
21:27karolherbst: it's decent enough for most things
21:27Lyude: e.g. before we had reclocking at all, lol
21:27imirkin: Lyude: like before when accel didn't load and you were stuck on software rasterizer? very much improved.
21:28Lyude: well i mean obviously
21:28Lyude: maxwell was probably around by the time we had some sort of reclocking
21:28karolherbst: it was never enabled
21:28imirkin: maxwell reclocking was turned on in 4.11 iirc
21:28imirkin: and kepler only became stable-ish in 4.10
21:28karolherbst: imirkin: same time as the kepler fixes
21:29Lyude: sorry i'm not being clear. "some kind of reclocking on some kind of gpu maybe not necessarily maxwell"
21:29Lyude: i wasn't really working on nouveau at that point i'm pretty sure
21:29karolherbst: Lyude: just try it out
21:29Lyude: yeah! that's what I'm going to suggest :)
21:29imirkin: Lyude: kinda. but def not on kepler... and not turned on for the masses. and it's since been removed.
21:29imirkin: [and then reinstated, partly]
21:31imirkin: reclocking has a long and colorful history
21:31imirkin: in its current incantation, kepler and maxwell1 reclocking generally works
21:32imirkin: however it's still nouveau generating the programs and cmdstreams, so subject to the usual shittiness
21:33imirkin: i would def not recommend nouveau for someone who's looking to have a stable desktop experience and never think about graphics drivers again
21:33harry_x: what is the usual shittiness? :-)
21:33Lyude: this is just for a prime setup, their main GPU is i915 anyway
21:33imirkin: harry_x: general lack of stability and failure to recover properly
21:33Lyude: but yeah, i mentioned to them to try it first and see how they like it
21:33imirkin: harry_x: additionally there are a number of visual issues in various games
21:34imirkin: [as well as plenty which just hang the GPU and often the computer entirely when started]
21:34imirkin: [but that fits under the first umbrella]
21:35imirkin: additionally nouveau ends up with considerably lower performance than blob under fairly identical conditions
21:35imirkin: the rule of thumb is 60-80% of the perf, but it varies from app to app
21:35Lyude: that's still a far cry from what it used to be when i first started linux
21:35Lyude: *started using linux
21:36imirkin: was nvidia making GPU's then?
21:36harry_x: imirkin: That is actually pretty good...
21:36Lyude: hehe, i am only 22 :P. so yes
21:36imirkin: ah :)
21:37rhyskidd: harry_x: can you upload your GP107 vbios to https://bugs.freedesktop.org/show_bug.cgi?id=100228
21:37harry_x: Hahaha when I started using linux it was just 3DFx cards and barely any drivers for anything... :-) So 60-80% is fcking good to my eyes :)
21:37harry_x: rhyskidd: Sure
21:37rhyskidd: cat /sys/kernel/debug/dri/0/vbios.rom > vbios.rom
21:38rhyskidd: I'd like to take a look at any differences between the GTX 1050 and GTX 1050 Ti that the vbios references
21:39rhyskidd: btw rhyskidd == Echelon9
21:40harry_x: Ah :-)) Gotcha.
21:42rhyskidd: harry_x: do you happen to have envytools on that laptop?
21:42rhyskidd: might be good to get the strap_peek as well
21:43rhyskidd: if you know how
21:43harry_x: I can have them in a hour :-) I need go afk for a moment then I will be back :)
21:43harry_x: Sure, will do
21:47rhyskidd: btw, I tried booting with nouveau on this GP107 with a HDMI connected -- still had the MMIO read timeout issue :(
21:55hx_mobile: That's strange.. It helped on my machine. And runpm=0 helped even more. Must be something different on our machines
22:32harry_x: Confirming that with runpm=0 it runs well without HDMI connected. But it hanged during suspend. I will investigate that and post a bugreport with details (if there isn't one already)
22:34imirkin: this means it's all interrelated with ACPI and PCIE PM stuff. sad.
22:54pmoreau: imirkin: I am not super familiar with file index works, but can’t they still overlap, even with your patch? For example if indexFile=0 points at address 0x0, with a size of 0x20, and indexFile=1 points at address 0x10, with a size of 0x10.
22:54pmoreau: *with how file index works
22:56pmoreau: About "nv50/ir: adjust overlapping logic to take fileIndex-relative offsets"
23:13imirkin: pmoreau: yes, that's the "TODO" comment i have
23:13imirkin: pmoreau: it's intended for e.g. constbufs or otherwise separate areas
23:13imirkin: although it's not like constbufs would get stores
23:14imirkin: but yes, it assumes that diff fileIndex's are non-overlapping
23:20pmoreau: imirkin: Mmh, I am not sure I see what `this->rel == that.rel` brings then
23:24pmoreau: Anyway, I need to get some sleep, and will look at all the patches tomorrow.
23:34imirkin: pmoreau: extremely little
23:34imirkin: pmoreau: this thing says "if two fileIndex's are different, don't overlap"
23:35imirkin: i've refined it to "if two fileIndex's are different AND their relative offsets are the same", it doesn't overlap
23:42imirkin: since if two fileIndex's are different but they have different relative offsets, then they might actually end up being the same
23:42imirkin: it's a purely hypothetical situation
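[editor's note] The rule imirkin describes can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the patch: the `Access` struct, its field names, and `mayOverlap` are hypothetical, and treating differing relative offsets as "may overlap" is a conservative assumption that the chat does not spell out.

```cpp
#include <cstdint>

// Hypothetical stand-in for a memory access record; names are illustrative.
struct Access {
    int fileIndex;     // which buffer / separate area is addressed
    const void *rel;   // identity of the indirect (relative) offset, nullptr if none
    uint32_t offset;   // constant byte offset
    uint32_t size;     // access size in bytes
};

static bool mayOverlap(const Access &a, const Access &b)
{
    // Different fileIndex AND same relative offset: assumed disjoint areas.
    if (a.fileIndex != b.fileIndex && a.rel == b.rel)
        return false;
    // Differing relative offsets: ranges cannot be compared, so stay
    // conservative (assumption made for this sketch).
    if (a.rel != b.rel)
        return true;
    // Same fileIndex and same relative offset: plain interval test.
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

With pmoreau's example (fileIndex 0 at 0x0 with size 0x20, fileIndex 1 at 0x10 with size 0x10, no relative offset), this sketch reports no overlap, which is exactly the assumption he questions: it only holds if different fileIndex values really refer to non-overlapping areas.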