00:03 mwk: shuffle2: you got my attention, talk to me when I sober up in 12 hours or so
00:03 shuffle2: cool :)
00:47 imirkin: shuffle2: it matters in that the falcon's interface to the surroundings is likely different
00:52 shuffle2: yup
01:30 imirkin: Lyude: patch sent. and accidentally to pmoreau instead of tobijk. oops.
01:30 Lyude: imirkin: sure thing, I'll try it first chance I get
01:31 imirkin: no rush
01:32 imirkin: like i said - pretty simple patch :)
01:32 imirkin: that said, i might have missed something.
01:32 imirkin: ah crap. yeah. i did.
02:10 Lyude: imirkin: NAK on the patch, fails to recognize my GTX1060 and ends up loading modeset https://paste.fedoraproject.org/paste/Ka-KqRvTa1EskjWCYc7Yzl5M1UNdIGYhyRLivL9gydE=
02:50 imirkin: Lyude: did you grab the v2 i sent?
02:53 imirkin: Lyude: i don't think that log is from a system with either my patch or a GTX1060
02:53 Lyude: that's definitely a gtx1060, but I'll check again and make sure the right ddx got loaded onto there
02:54 imirkin: then why does it say NV126?
02:54 imirkin: perhaps you got fooled by systemd?
02:55 imirkin: hm no, that's a recent log
02:56 Lyude: imirkin: jfyi https://paste.fedoraproject.org/paste/bTjxYU679mQ6AOu49g3v015M1UNdIGYhyRLivL9gydE=
02:57 imirkin: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] [10de:1c02] (rev a1)
02:58 imirkin: [ 3604.727] (--) PCI:*(0:1:0:0) 10de:13c2:19da:1366 rev 161, Mem @ 0xfa000000/16777216, 0xd0000000/268435456, 0xce000000/33554432, I/O @ 0x0000dc00/128, BIOS @ 0x????????/65536
02:58 imirkin: see my confusion?
02:58 Lyude: yeah that is, weird
02:58 imirkin: so i dunno what you're doing, but ... it ain't matching up :)
02:58 imirkin: looks like you're running X on the machine with the GTX 960 in it
02:59 imirkin: er, GTX 970 from the looks of it.
02:59 Lyude: I've only got one machine running right now and it's the one with the GTX1060
02:59 Lyude: oh
02:59 Lyude: -NOW- it loads
03:00 imirkin: ok, then a robber came and switched up the logs while you weren't looking, and then vanished into the darkness
03:00 Lyude: imirkin: i have no idea what was happening before but the patch works fine now
03:00 imirkin: maybe a stale log? dunno what TZ you're in, but it's listed as 3:42pm
03:01 Lyude: yeah, that must be it
03:01 Lyude: sometimes i get the wrong xorg log because RHEL and Fedora use different locations, and the one fedora uses depends on how you start X :(
03:01 imirkin: convenient.
03:01 imirkin: can you verify visually that things are generally working?
03:01 imirkin: e.g. start an xterm, and maybe play a xv video?
03:02 imirkin: doesn't have to be exhaustive
03:02 imirkin: also, if you could pastebin the "good" log, that'd be great - want to double check there's not something disabled that should be enabled
03:02 imirkin: (e.g. in my original GM20x bringup, i messed up the copy engine stuff)
03:04 Lyude: imirkin: https://paste.fedoraproject.org/paste/3FdZLmstJtEyZ20wvYan~15M1UNdIGYhyRLivL9gydE=/ if the copy engine doesn't look like it's working in that log though I have a hunch that one might be the fault of the nouveau version I've got installed on here
03:05 Lyude: getting this in my kernel: Mar 21 22:56:02 LyudeCowCube kernel: nouveau 0000:01:00.0: DRM: failed to create kernel channel, -22
03:05 imirkin: boooo
03:05 imirkin: if you could retest it with the stuff in drm-next, that'd be ideal
03:05 Lyude: sure thing
03:05 imirkin: otherwise i may have to check with ben as to wtf changed
03:06 imirkin: (or work it out myself)
03:10 dboyan_: imirkin: tarceri said he won't land the shader_cache api change, so I'll send out a v2 of tgsi cache enablement soon
03:13 imirkin: Lyude: wait what ... "error creating GPU channel"?
03:13 imirkin: Lyude: i think that means you don't have the gr firmware
03:13 Lyude: ugh, i was hoping I did because I didn't see a firmware error
03:14 imirkin: Lyude: or that the firmware loading has otherwise failed
03:14 imirkin: can you check? /lib/firmware/nvidia/gp106
03:14 imirkin: if that dir exists, you're set
03:15 Lyude: got gp100 but not gp106
03:15 imirkin: yeah, you need to update
03:24 Lyude: nice, now accel is definitely working. https://paste.fedoraproject.org/paste/QU3zfk1OFVvI5OhKm6~ZwV5M1UNdIGYhyRLivL9gydE=/ imirkin
03:26 Lyude: feel free to put my rb on
03:27 imirkin: Lyude: yay, thanks for testing - mind just poking around at a few things on the display and make sure it looks OK?
03:27 Lyude: yep, already did :) youtube seems to load fine as does terminal
03:28 imirkin: ok thanks
03:28 imirkin: i won't ask what that "event3" device is...
03:34 Lyude: imirkin: haha, it's a project I've got for making a remotely controllable USB keyboard
03:35 Lyude: mainly to configure machines remotely even if they're not booted into an OS
03:49 dboyan_: imirkin: https://lists.freedesktop.org/archives/mesa-dev/2017-March/149128.html
04:08 imirkin: dboyan_: thanks. i'm applying it to my local repo, but i'll wait until tomorrow in case there are comments.
04:11 Lyude: imirkin: other than the FILL_RECTANGLE register write, did you notice anything else on that mmt trace from the nvidia blob by chance?
04:11 imirkin: Lyude: the one you sent? tbh i've lost track of it =/
04:12 Lyude: no just the one from https://trello.com/c/Fc6BO64w/151-gm200-nv-fill-rectangle
04:12 imirkin: mind linking it again?
04:12 imirkin: oh, that one - no - i don't think i did see anything else obvious
04:12 imirkin: but let me take another look given the new info
04:12 imirkin: (i.e. that it doesn't work)
04:13 Lyude: alright, I'm thinking of trying to get nouveau mmt tracing working so I can more easily compare the differences in reg writes between us and the blob
04:13 imirkin: ok, well the quick way to do it
04:14 imirkin: is to ... git grep nvif, and then set it to false, in nouveau_drm_winsys.c? something like that
04:14 imirkin: hm crap, hold on
04:15 imirkin: Lyude: revert a8c474760268f2ebdddd655cea06dbef4500236a
04:15 imirkin: that should make a trace that demmt can almost decode
04:15 imirkin: (in mesa)
04:15 Lyude: gotcha
04:16 imirkin: so looking at that trace
04:16 imirkin: there are just not a ton of options
04:17 imirkin: Lyude: https://hastebin.com/wazosifeto.http
04:18 imirkin: Lyude: it's ONE of those...
04:18 imirkin: Lyude: point me at your latest patches again?
04:19 Lyude: imirkin: https://github.com/Lyude/mesa/tree/wip/NV_fill_rectangle
04:20 imirkin: Lyude: for shits and giggles, mind moving the FILL_RECTANGLE setting above the MACRO_POLYGON_MODE_FRONT?
04:20 Lyude: imirkin: already tried :(
04:21 imirkin: what about adding a setting to set CONSERVATIVE_RASTER=0
04:21 Lyude: hm, lemme see
04:22 imirkin: (you'll have to add a #define for it too)
04:22 Lyude: i figured, i've tried a couple different registers so far
04:23 imirkin: 0x452 << 2
04:23 Lyude: hm?
04:23 imirkin: 0x1148 = RASTERIZE_ENABLE
04:24 imirkin: er
04:24 imirkin: [or i could have looked it up in rnndb, but the trace was closer]
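The `0x452 << 2` above is plain arithmetic: shifting a 32-bit word index left by two gives the byte offset `0x1148`. A minimal C sketch of that conversion, assuming the index/offset relationship implied by the messages above; the helper names are hypothetical and are not taken from mesa or envytools.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit word index -> byte offset. */
static inline uint32_t mthd_index_to_offset(uint32_t index)
{
    return index << 2;
}

/* Hypothetical helper: byte offset -> 32-bit word index. */
static inline uint32_t mthd_offset_to_index(uint32_t offset)
{
    return offset >> 2;
}
```

Calling `mthd_index_to_offset(0x452)` reproduces the `0x1148` quoted in the conversation.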
04:26 Lyude: also, how'd you figure out it was that section exactly?
04:26 imirkin: all the stuff from the previous draw
04:26 imirkin: so from the previous END_GL to the next BEGIN_GL
04:26 Lyude: ahhh
04:26 Lyude: i'll keep that in mind for the future
04:26 imirkin: and i cut out a couple things i was pretty sure were irrelevant
04:27 imirkin: like ... the fence setting that it does to keep track of when things complete, etc
04:27 Lyude: alright, it's not the conservative raster
04:29 imirkin: could be something random
04:29 imirkin: from the previous initialization
04:30 imirkin: which we're missing but makes this feature work
04:30 imirkin: like PB: 0x00006420 GM204_3D.0xa18 = 0x6420
04:30 imirkin: (just picking one at random)
04:30 Lyude: yeah I played with a18 a little bit
04:30 Lyude: that does do something but I'm not sure what
04:30 Lyude: setting it to 0x6420 causes the window to just go blank
04:31 imirkin: that's good
04:32 Lyude: Interestingly enough, I made mesa set that for every render operation by accident and it had some interesting effects on X. I saw what looked like previous contents of the vram, except no visible garbage. Just the desktop and the outlines of all of the application windows that were open with nothing in them
04:32 imirkin: ok, so that probably messes around with rast
04:32 imirkin: did you set a1c to 0?
04:32 Lyude: didn't try that one, hold on
04:32 Lyude: oh wait, a1c. yeah I did and that one doesn't do anything
04:33 imirkin: i mean set both of them though...
04:34 Lyude: think I did but lemme double check
04:34 imirkin: and just once, somewhere up top
04:34 imirkin: like in the screen init
04:36 imirkin: it also appears to set 0x11f0 and 0xfb4 to 0
04:38 Lyude: imirkin: which one in screen init (or both?), and would that be nvc0_screen_create in nvc0_screen.c?
04:38 imirkin: throw everything you can think of in there
04:38 imirkin: and yes, it's that... there's a init_magic() or something
04:38 imirkin: stick things in there
04:41 Lyude: glad to hear I was at least on the right path with sticking all the weird writes I saw in there in random places :)
04:42 shuffle2: btw, pretty sure this one:
04:42 imirkin: when in doubt...
04:42 shuffle2: fb 83 1c 01 mpopunk [unknown: 00 80 1c 01]
04:42 shuffle2: should be mpopret r8 0x11c
04:42 shuffle2: judging by the prologue
04:43 shuffle2: mpopaddret*
04:44 shuffle2: perhaps even lmpopaddret :P
04:46 imirkin: Lyude: that said, even though i do that, it's very rarely worked out =/
04:47 Lyude: worth a shot at least, although I think I'll get a lot more insight when I try comparing the reg traces between nouveau and nvidia side by side
04:47 imirkin: and there's other random settings, like 1138=1, fb8=2
04:48 Lyude: also interesting, sticking the a18 write in screen init didn't kill everything
04:50 Lyude: i've got a couple more ideas to try but I didn't realize how late it was, so I think I'm gonna call it a night. thanks for the help imirkin!
04:50 imirkin: good luck, and good night :)
04:55 imirkin: shuffle2: seems likely, esp given where it is in the opcode map
09:36 pmoreau: imirkin: I could try the Pascal patch, but I’d need to either set up everything on my work computer, or bring the GPU back home (which is doable as I did it yesterday :-D)
11:48 dboyan_: karolherbst: Can you test if this branch works? https://github.com/dboyan/mesa/tree/nvc0-cache-test
11:48 dboyan_: karolherbst: If it doesn't work, please find out the first bad commit within the last three
13:28 RSpliet: pmoreau: how are barriers and texbars handled on the codegen side in the light of optimisations? Do they somehow end a BB? Are the analysis passes clever about this?
13:29 RSpliet: is it trying to force an order on memory(/tex?) read/write instructions, or rather on all instructions?
13:31 karolherbst: RSpliet: afaik, texbars can be anywhere, and they usually end up before the first read, but that's pretty much everything I know
13:34 RSpliet: karolherbst: from what little knowledge I have, I presume texbars are synchronisation instructions to make sure the texture unit has the data you're waiting for. You can't move tex read/write ops over them because that breaks functionality
13:34 karolherbst: exactly
13:35 RSpliet: but that's not my question
13:35 karolherbst: I think they even get inserted _after_ RA
13:35 karolherbst: so the SSA passes don't need to know anything about it
13:35 RSpliet: ah okay, that partially answers my question
13:35 karolherbst: but not _quite_ sure
13:35 RSpliet: however, SSA's do need to know about regular barrier instructions - they are inserted by the programmer
13:36 RSpliet: and the mechanism is largely the same
13:36 karolherbst: I am pretty sure that passes are dumb regarding this, and we just happen to not messing things up
13:36 karolherbst: *mess
13:37 karolherbst: RSpliet: "NVC0LegalizePostRA::insertTextureBarriers"
13:38 karolherbst: so yeah, post RA
13:38 karolherbst: the only pass even aware of OP_TEXBAR is the FlatteningPass, which runs after RA as well
13:39 karolherbst: RSpliet: no clue about regular barrier instructions. All I can tell you is, that texbars are inserted after RA
13:41 RSpliet: karolherbst: I assume that some operations can safely cross the barrier insn - as long as they don't read or write mem. Presumably we don't currently have optimisations that could move mem writes forward so it's not an issue in practice
13:42 karolherbst: I hit this issue toying around with some post RA passes
13:42 karolherbst: we have a method which checks for overlapping src/dest in post RA
13:42 karolherbst: which have to be called for every instruction jumped over
13:42 karolherbst: *has
13:43 karolherbst: but yeah, basically the texbar is directly before the read/write
13:44 RSpliet: karolherbst: interesting to know this, thanks
13:44 karolherbst: RSpliet: regarding opts that move mem operations: no we don't have any, I toyed with that as well, but I didn't get any significant enough benefit
13:47 RSpliet: pmoreau: still curious how we handle barriers in program analysis :-)
13:47 RSpliet: karolherbst: btw, if you want to try and improve RA, here's two explicit tasks that may help
13:47 RSpliet: 1) Allocate from large to small to avoid holes in the register file due to misalignment
13:48 karolherbst: RSpliet: 1. yes.... please :D
13:48 karolherbst: RSpliet: output layout first though
13:49 karolherbst: I think somebody mentioned, that it is smarter to do this from bottom to top or so
13:49 karolherbst: anyway
13:50 RSpliet: 2) order allocation following heuristic explained on page 11 of http://www.cl.cam.ac.uk/teaching/1617/OptComp/notes.pdf
13:54 RSpliet: Once that is done, 3) apply heuristic to use the space in the first N registers as broadly as possible (round robin-esque). That can create more freedom to do post-RA scheduling
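Task 1 above (allocate from large to small) can be illustrated with a toy model. This is a minimal sketch, assuming a simple bump allocator where a value of size s must start at a register index that is a multiple of s; it is not the nv50_ir register allocator, and all names are hypothetical.

```c
#include <stdlib.h>

/* Bump-allocate n values. A value of size s (a power of two) needs s
 * consecutive registers starting at a multiple of s. Returns the number
 * of registers consumed, alignment holes included. */
static unsigned regs_used(const unsigned *sizes, size_t n)
{
    unsigned next = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned s = sizes[i];
        next = (next + s - 1) & ~(s - 1); /* align up to the value's size */
        next += s;
    }
    return next;
}

/* qsort comparator: larger sizes sort first. */
static int cmp_desc(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x < y) - (x > y);
}

/* Same as regs_used(), but allocates from large to small.
 * Sorts the caller's array in place. */
static unsigned regs_used_large_first(unsigned *sizes, size_t n)
{
    qsort(sizes, n, sizeof(*sizes), cmp_desc);
    return regs_used(sizes, n);
}
```

With sizes {1, 2, 1, 4} in that order the toy allocator consumes 12 registers because of padding before the 2-wide and 4-wide values; sorted large to small the same values fit in 8.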
13:56 RSpliet: I've looked at RA not too long ago ( https://github.com/RSpliet/mesa/commits/bank-aware-RA ) - it's not that bad really
13:56 karolherbst: nice
13:56 karolherbst: I want to fix the two main issues for hitmanpro first though (after I upstreamed my other opts)
13:57 karolherbst: there are two main issues: 1. we run out of tic/tsc space 2. OOR errors most likely due to compute shaders
13:57 RSpliet: Sure, it's your time :-)
13:57 karolherbst: and I may end up writing a shader trap handler
13:57 RSpliet: OOR? Out Of R...eeses cups?
13:57 karolherbst: exactly
13:57 RSpliet: sounds fatal
13:58 karolherbst: it isn't
13:58 RSpliet: oh, out-of-registers
13:58 karolherbst: no
13:58 karolherbst: range
13:58 RSpliet: oh
13:58 karolherbst: I think it breaks perf in hitman
13:58 RSpliet: You'd have to explain more later... back to my study now
13:58 karolherbst: cause after I removed the printks inside the kernel handler, fps went up by like 0,5%
14:04 karolherbst: side note to myself: don't start new things before finishing up already started stuff :O
14:13 nyef: karolherbst: That's a problem that I tend to have as well. Too many projects "in-flight" at once. /-:
15:08 imirkin: RSpliet: as has been mentioned, texbars are inserted post-ra. the only opt pass that's barrier-aware is the MemoryOpt pass (which combines memory loads, elides stores/duplicate loads, etc). it just flushes its cache when it hits a barrier.
15:09 imirkin: RSpliet: barrier instructions are also marked as "fixed", which i believe prevents some opts across them too, like CSE and whatnot
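The "flushes its cache when it hits a barrier" behaviour described above can be sketched with a toy pass. This is a simplified illustration in C, not the actual MemoryOpt pass from nv50_ir; the instruction struct, the opcode names, and the conservative flush on stores are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_op { TOY_LOAD, TOY_STORE, TOY_BARRIER };

struct toy_insn {
    enum toy_op op;
    uint32_t    addr;      /* ignored for TOY_BARRIER */
    bool        redundant; /* set by the pass on loads it could elide */
};

#define TOY_CACHE_SLOTS 16

/* Marks a load as redundant when the same address was already loaded
 * since the last barrier or store. Returns how many loads were marked. */
static size_t mark_redundant_loads(struct toy_insn *code, size_t n)
{
    uint32_t seen[TOY_CACHE_SLOTS];
    size_t nseen = 0, marked = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_insn *in = &code[i];
        in->redundant = false;

        if (in->op != TOY_LOAD) {
            /* Barrier: nothing known before it may be reused after it.
             * Store: treated conservatively as aliasing every cached load. */
            nseen = 0;
            continue;
        }

        bool hit = false;
        for (size_t j = 0; j < nseen; j++)
            if (seen[j] == in->addr)
                hit = true;

        if (hit) {
            in->redundant = true;
            marked++;
        } else if (nseen < TOY_CACHE_SLOTS) {
            seen[nseen++] = in->addr;
        }
    }
    return marked;
}
```

For the sequence load, load, barrier, load, load (all on the same address), the second and fifth instructions are marked, while the fourth is kept because the barrier emptied the cache.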
15:31 Tom^: you guys have some games to collect if you didnt already notice http://www.feralinteractive.com/en/news/752/
15:33 imirkin: yeah, now i have one less excuse =/
15:33 imirkin: although the excuse that my GPUs are slow still stands
15:33 Tom^: hehe
15:34 imirkin: but what they lack in speed, they more than make up for in quantity
15:34 imirkin: just have to get enough TNT2's in parallel, i'm sure they can beat the pants off your 780Ti
15:35 Tom^: pfft you dont have enough PCI-E for that
15:36 imirkin: not an issue - TNT2 was AGP/PCI :)
15:36 Tom^: oh i see
15:36 imirkin: https://en.wikipedia.org/wiki/RIVA_TNT2
15:37 Tom^: ah Release date 1998
15:37 Tom^: i was only 9 then :P
15:37 imirkin: ah... so i can tell your age by what season of simpsons it is?
15:38 Tom^: LOL ive never even thought about that
15:38 Tom^: but yes
16:29 karolherbst: ... https://www.feralinteractive.com/en/news/752/
16:30 karolherbst: well, I doubt I have 25 commits yet, so it should be fine I guess :D
16:31 Leftmost: 22 March will forever be the day that development on mesa stopped.
16:33 imirkin_: of course i bet like 95% of these need 32 textures... really need to make that happen on fermi.
16:38 karolherbst: most likely
16:38 karolherbst: and we need to fix that OOR issue as well
16:39 karolherbst: I bet it is only getting worse with newer titles
16:39 karolherbst: with >32 textures you mean that tsc/tic issue?
16:39 imirkin_: linked tsc
16:39 imirkin_: we have to use linked tsc mode on fermi to get 32 textures
16:39 imirkin_: well
16:39 imirkin_: to get 32 samplers
16:40 imirkin_: (or textures? i never remember)
16:40 karolherbst: ohh I see
16:40 karolherbst: but seriously
16:40 karolherbst: without working reclocking on fermi we don't really need to concern ourselves with getting those titles to run there
16:40 imirkin_: i do - right now fermi's my main board
16:40 karolherbst: I see
16:42 RSpliet: imirkin_: the DDR3 NVC8?
16:42 imirkin_: DDR3 NVC1 ;)
16:42 imirkin_: i don't think they made DDR3 NVC8's
16:43 RSpliet: equally unfortunate from a DRAM clock changing perspective :-P
16:44 imirkin_: yes
16:44 imirkin_: well, maybe once you land the gddr5 stuff, i'll see if ddr3 can be easily thrown in there for good luck
16:45 RSpliet: yeah... my experience with GT21x is that GDDR5 differs a *lot* from the other DRAM standards
16:45 RSpliet: to a point that if I ever go about implementing that, I'd build a new script rather than if-then-else the hell out of the existing one
16:46 karolherbst: sane
16:47 RSpliet: if my 14 year old monitor had DP I'd take a look at it myself at some point, but unfortunately...
16:48 imirkin_: RSpliet: not sufficiently familiar with the details to comment ... just would be nice to finally get reclocking everywhere
16:49 imirkin_: RSpliet: in other news, i have a G92 FX3700 coming to me soonish
16:49 RSpliet: karolherbst: I had a partial success in that area recently. Lots to do, but 324MHz on my NVC4 works
16:50 karolherbst: RSpliet: \o/
16:50 RSpliet: Portal perf x4, from 17 to 74 fps on the timedemo.
16:50 karolherbst: nice
16:50 imirkin_: i bet 1800mhz will be even faster.
16:51 karolherbst: not much though
16:51 fernandogudar: RSpliet: what made it do so?
16:51 RSpliet: I wonder where you got that suspicion from...
16:51 imirkin_: RSpliet: more mhz = more better :)
16:51 RSpliet: fernandogudar: lots of bashing my head against a keyboard, monitor, wall...
16:52 RSpliet: bbl, real work
16:52 karolherbst: nouveau is real work :p
16:53 RSpliet: not until it comes with a real pay
16:53 imirkin_: i'll give you one shiny penny...
16:54 karolherbst: RSpliet: since when is all real work paid one?
16:55 fernandogudar: https://arrayfire.com/demystifying-ptx-code/
16:56 fernandogudar: they demonstrate little better how pointers work, it's consistent as to what i thought, but i mess it up in describing
17:03 fernandogudar: should had pointed just one link, if someone was ever struggling with those, instead of lengthy shit, i forgat there is much cuda and ptx examples
17:10 fernandogudar: rest of the details about them i dunno, it's just good example to recap the things, at least how to call them, not sure about perf at all, but they can be used to do tricky things basically whatever comes to mind
17:10 imirkin_: fernandogudar: first and last warning.
17:13 fernandogudar: two terrible clueless fags in pair with biscuits in front, isn't that terrible?
17:20 Tom^: kattana: there are no fermi boards reclocking?
17:21 Tom^: uh karolherbst i meant, tabfail. and seems he isnt here either :P
17:21 imirkin_: Tom^: Roy has been playing around with it, sounds like with some success
17:21 imirkin_: Tom^: but not upstream, no
17:21 Tom^: oki
18:28 Lyude: imirkin_ btw, reverting that mesa commit didn't seem to help much with getting demmt to read the trace from nouveau
18:30 imirkin_: Lyude: it's step 1
18:30 imirkin_: there's a step 7 too
19:41 karolherbst: imirkin: the OOR error is global memory related
19:42 karolherbst: as a bold move I just disabled emitting global memory access stuff
19:42 karolherbst: no errors
19:44 imirkin_: oh
19:44 imirkin_: just occurred to me
19:45 imirkin_: could be a very sad-face thing
19:45 karolherbst: hum
19:45 karolherbst: after loading time the errors come back
19:45 imirkin_: which is global memory in the same place as the stupid local/shared memory window
19:45 karolherbst: but yeah, it's a lead for now
19:45 karolherbst: maybe there are two OOR types of issues
19:46 karolherbst: hum
19:46 karolherbst: okay
19:46 karolherbst: how to fix that?
19:46 imirkin_: we need to reserve that space with the kernel
19:46 imirkin_: so it doesn't place buffers there
19:47 karolherbst: ohh I see
19:47 imirkin_: or do what we have to do for vulkan, and let userspace just manage that stuff directly
19:48 karolherbst: mhh, which places in the code does it touch? Never done something related to this
19:49 imirkin_: well mostly you'd have to create a new BO-allocation ABI
19:50 karolherbst: in addition to the existing one or a completely new one?
19:51 imirkin_: erm
19:51 imirkin_: is there a difference between "addition to existing one" and "completely new one"?
19:51 karolherbst: well, adding stuff may be able to keep things backwards compatible
19:51 imirkin_: oh
19:51 imirkin_: like extending the ioctl? dunno
19:51 karolherbst: yeah
19:51 imirkin_: would have to look. i'm guessing not.
19:51 imirkin_: i'd rather just make a new one
19:52 imirkin_: it's not difficult to support both driver-side
19:52 karolherbst: well, we could do a switch on the version or so
19:52 karolherbst: I see
19:52 imirkin_: we'd bump the drm version
19:52 karolherbst: but I guess this would take some time to implement and everything
19:52 karolherbst: oh well
19:52 imirkin_: yes... that's likely.
19:53 karolherbst: I thought ben wanted to work on vulkan anyway
19:53 karolherbst: so I would rather just do the faster fix
19:53 imirkin_: Lyude: sorry, i realize i didn't fully answer your question - if you can provide me your trace, i can provide you with a demmt patch to decode it.
19:53 imirkin_: [got distracted and forgot]
19:54 imirkin_: karolherbst: first check that that's what's happening. basically bo->addr == 0xfe000000 .. 0x100000000 are disallowed.
19:54 karolherbst: okay
19:54 karolherbst: so at bo creating time I just check for the addr, l
19:54 karolherbst: k
19:55 karolherbst: nouveau_mm_allocate calls?
19:55 imirkin_: i'd recommend just doing it in libdrm_nouveau
19:56 karolherbst: I would just add a gdb break I guess
19:56 imirkin_: just assert on it :)
19:56 karolherbst: for that I need to compile libdrm, which I haven't set up yet ... :D
19:56 Lyude: imirkin_: sure thign
19:56 Lyude: *thing
19:56 karolherbst: oh well
19:56 karolherbst: should be easy
19:57 Lyude: Also unrelated but, every time I rsync my mesa directory from my main desktop to the machine I'm testing it on (e.g. the one with the nvidia card in it) no matter what I do running `make install` seems to always relink everything on the test machine, is there any way I can get it to stop doing that?
19:58 Lyude: by the time I sync mesa to the test machine everything's already been built, so there shouldn't be any reason for it to relink everything
19:58 imirkin_: libtool likes to relink against system libs
19:58 imirkin_: tbh i'm not sure about these details
19:58 karolherbst: imirkin_: was it nouveau_bo_new or something else?
19:58 imirkin_: feel free to discuss them with xexaxo1 who has become quite the build system expert
20:05 Lyude: imirkin_: https://lyude.net/~lyudess/tmp/gm204-test-trace.mmt.xz
20:06 karolherbst: imirkin_: okay, and how do I get the addr? struct nouveau_bo just has an offset
20:06 imirkin_: offset == addr
20:06 karolherbst: k
20:08 karolherbst: imirkin_: :) HitmanPro: ../../nouveau/nouveau.c:648: nouveau_bo_new: Assertion `bo->offset < 0xfe000000 && bo->offset > 0x100000000' failed.
20:08 imirkin_: uhm
20:08 imirkin_: i should think so.
20:08 imirkin_: i think you might have gotten that one a little wrong.
20:08 karolherbst: uhhh
20:08 karolherbst: I see
20:08 imirkin_: boolean logic is hard :p
20:09 karolherbst: :D
20:10 imirkin_: Lyude: grrr... i forgot how to fix it. this will take a minute.
20:10 Lyude: take your time!
20:10 imirkin_: haha just kidding. found it.
20:11 imirkin_: Lyude: https://hastebin.com/vuvetateqe.cs
20:21 Lyude: imirkin_: beautiful! thanks
20:26 imirkin_: np
20:26 imirkin_: at some point someone needs to fix up the logic for how stuff is set
20:26 imirkin_: i think it should stop looking at channels with nouveau
20:28 karolherbst: imirkin_: this is correct, right? "assert(!(bo->offset >= 0xfe000000 && bo->offset < 0x100000000));"
20:28 karolherbst: assert doesn't get hit
20:28 imirkin_: might need a ULL
20:28 imirkin_: on the 100000000000000 thing
20:28 karolherbst: well, doesn't get hit anyway
20:28 imirkin_: ok
20:28 imirkin_: well it was a thought.
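The two assertion conditions quoted above differ only in their boolean logic. A minimal C sketch comparing them, with the window bounds taken from the conversation; the function and macro names are hypothetical, and `UINT64_C` is used only to make the 64-bit width of the constants explicit.

```c
#include <stdbool.h>
#include <stdint.h>

#define WINDOW_START UINT64_C(0xfe000000)
#define WINDOW_END   UINT64_C(0x100000000) /* exclusive */

/* Condition from the first assertion: no offset is both below
 * 0xfe000000 and above 0x100000000, so this is false for every input
 * and the assertion always fires. */
static bool first_attempt(uint64_t offset)
{
    return offset < WINDOW_START && offset > WINDOW_END;
}

/* Condition from the second assertion: true when the offset lies
 * outside [0xfe000000, 0x100000000). */
static bool outside_window(uint64_t offset)
{
    return !(offset >= WINDOW_START && offset < WINDOW_END);
}
```

Using `assert(outside_window(bo->offset))` would match the second form; it only checks the start address of the buffer.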
20:29 imirkin_: Lyude: uhhhh
20:29 imirkin_: PB: 0x8001044f GM204_3D.FILL_RECTANGLE = { 0x1 }
20:29 imirkin_: oops?
20:29 imirkin_: Lyude: i think we both suck
20:29 Lyude: imirkin_: I noticed that
20:29 imirkin_: <bitfield pos="1" name="ENABLE"/>
20:29 Lyude: :|
20:29 imirkin_: i think you want a 0x2 there =]
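Why `0x2` rather than `0x1`: a single-bit field at position 1 has the value `1 << 1`. A minimal C sketch, assuming only the `pos="1"` from the bitfield line quoted above; the macro and helper names are made up for illustration.

```c
#include <stdint.h>

/* Bit position of ENABLE as quoted in the log; the macro name is hypothetical. */
#define FILL_RECTANGLE_ENABLE_POS 1

/* Value of a single-bit field at the given position. */
static inline uint32_t bit(unsigned pos)
{
    return UINT32_C(1) << pos;
}
```

`bit(FILL_RECTANGLE_ENABLE_POS)` evaluates to `0x2`, whereas the value written in the trace, `0x1`, is `bit(0)`.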
20:29 Lyude: oh no
20:29 Lyude: i knew that was coming
20:30 Lyude: lololol
20:30 imirkin_: sorry, i should have picked up on that =/
20:30 Lyude: nah it's okay that looks like something I completely would have missed myself
20:30 Lyude: regardless it helped me learn about tracing things :), but let's see if the proper value here actually works
20:30 imirkin_: but i was like "oh yeah, 1, that's right, enable fill rectangle. that should be fine. definitely don't check the docs i wrote earlier."
20:31 imirkin_: ok, so the low bit is "rasterize whole fb", interesting
20:32 imirkin_: that could be nice for blits
20:33 karolherbst: imirkin_: I think I will check if this is caused by one certain shader or by a certain address, wish me luck
20:33 imirkin_: =]
20:35 Lyude: THERE WE GO!!! a perfectly healthy, well sized rectangle
20:35 karolherbst: ValueRef.get.reg.id?
20:35 imirkin_: karolherbst: that'd be OOR_REG
20:35 karolherbst: ohh true
20:35 karolherbst: offset?
20:36 karolherbst: I guess so
20:36 imirkin_: my claim is that it'll happen based on the bound CB's
20:36 imirkin_: and the addresses being accessed
20:36 karolherbst: might be
20:36 imirkin_: if the address is further than the CB was sized at in the binding, ka-boom
20:36 karolherbst: I could AND the value until it doesn't go kaboom
20:40 imirkin_: Lyude: congrats =]
20:40 Lyude: hehe, thank you for noticing that
20:40 imirkin_: Lyude: you still need to cleanup/rearrange those patches, but sounds like you're getting there
20:40 Lyude: yep! the rest should be pretty simple
20:40 imirkin_: now to undo the umpteen hacks :)
20:41 Lyude: also talked with airlied and a few others, we want to try implementing NV_fill_rectangle for some other hardware as well so we can use it more in glamor
20:41 imirkin_: mmmm... i don't think it's a thing on other hw
20:41 imirkin_: other hw has these rectangle primitives
20:41 Lyude: yeah, that's what we were considering looking at playing around with
20:41 Lyude: supposedly radeon and intel has them?
20:41 Lyude: *have
20:41 imirkin_: and adreno
20:41 Lyude: yep
20:42 imirkin_: but it's a different mechanism
20:42 imirkin_: i guess it could be implemented on nvidia by doing triangles + that flag
20:42 imirkin_: yeah, that'd work
20:42 imirkin_: it'd work slightly differently in the undefined cases of those rect primitives, but wtvr - they're undefined
20:43 imirkin_: interpolation is also a little questionable - not sure how it works
20:44 imirkin_: Lyude: https://en.wikipedia.org/wiki/Barycentric_coordinate_system -- good to understand in general for how triangle vertex values become interpolated values at frag shader time
20:45 Lyude: alright, I'll take a look at that
20:45 imirkin_: Lyude: don't get into the details, they're largely irrelevant unless you're making your own interpolator
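For readers who do want a concrete picture of barycentric interpolation, here is a minimal C sketch. It assumes plain 2D screen-space weights with no perspective correction, and it is not a description of how the hardware interpolator works; all names are hypothetical.

```c
struct vec2 { double x, y; };

/* Twice the signed area of triangle (a, b, p). */
static double edge(struct vec2 a, struct vec2 b, struct vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Interpolate per-vertex values va, vb, vc at point p using
 * barycentric weights of the triangle (a, b, c). */
static double interpolate(struct vec2 a, struct vec2 b, struct vec2 c,
                          double va, double vb, double vc, struct vec2 p)
{
    double area = edge(a, b, c);      /* twice the full triangle area */
    double wa = edge(b, c, p) / area; /* weight of vertex a */
    double wb = edge(c, a, p) / area; /* weight of vertex b */
    double wc = edge(a, b, p) / area; /* weight of vertex c */
    return wa * va + wb * vb + wc * vc;
}
```

For the triangle (0,0), (1,0), (0,1) with vertex values 0, 4 and 8, the point (0.25, 0.25) has weights 0.5, 0.25 and 0.25, giving an interpolated value of 3.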
20:59 karolherbst: imirkin_: duh.... I smell a silly bug: applying a mask of "0x1af" to the address causes the error to disappear at loading time...
21:02 RSpliet: karolherbst: isn't that... going to cause aliasing?
21:02 karolherbst: it causes the game to misrender, if that's what you mean
21:04 RSpliet: well, that's not what I mean, but it *is* the likely outcome of aliasing
21:05 RSpliet: If this problem really is related to the "local mem window", I don't see why that has to be a big problem
21:05 imirkin_: it's not, since the asserts never triggered
21:06 imirkin_: i guess that assert wouldn't catch a bo that spanned that window on both sides
21:06 imirkin_: but it's ... 32MB iirc. big for a BO
21:06 RSpliet: hopefully stating the obvious, but is mesa built with debug? :-P
21:06 imirkin_: but not inconceivable
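The gap mentioned above (a BO that starts below the window and ends above it) would be caught by an interval-overlap test rather than a check on the start address alone. A minimal C sketch, assuming the same window bounds as in the earlier assertion; the function name and the `size` parameter are hypothetical and not part of libdrm_nouveau.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when [offset, offset + size) intersects [0xfe000000, 0x100000000). */
static bool bo_overlaps_window(uint64_t offset, uint64_t size)
{
    const uint64_t start = UINT64_C(0xfe000000);
    const uint64_t end   = UINT64_C(0x100000000); /* exclusive */

    return size != 0 && offset < end && offset + size > start;
}
```

A 64 MB buffer at `0xfd000000` starts outside the window, so a start-address check passes, yet it covers the whole window and this test reports the overlap.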
21:06 imirkin_: well, libdrm
21:06 imirkin_: i think those asserts are there even in release builds
21:06 imirkin_: there's no way to shut them off via configure
21:08 karolherbst: it's just about really small offsets
21:09 karolherbst: <0x500
21:09 imirkin_: karolherbst: instead of your hack, what about NV50_PROG_OPTIMIZE=0
21:09 karolherbst: doesn't help
21:09 imirkin_: hm ok. i was afraid that some of the indirect propagation logic was a little off
21:09 karolherbst: I could try it again, but I already disabled every pass
21:10 karolherbst: while testing
21:10 imirkin_: nah
21:10 imirkin_: well
21:10 imirkin_: wtvr, i'll let you work it out :)
21:10 karolherbst: mask is now "0xffffffaf"
21:10 imirkin_: "mask"?
21:10 karolherbst: well I apply &= 0xffffffaf on the offset
21:10 karolherbst: OOR error disappears
21:11 imirkin_: what offset
21:11 karolherbst: src.get()->reg.data.offset
21:11 karolherbst: CodeEmitterNVC0::setAddressByFile
21:11 imirkin_: hm
21:11 karolherbst: for FILE_MEMORY_GLOBAL
21:13 RSpliet: the original mask was ?
21:14 karolherbst: there was none
21:14 karolherbst: or do you mean the mask of all values?
21:15 karolherbst: now we are getting somewhere
21:15 karolherbst: if (src.get()->reg.data.offset < 0x80) src.get()->reg.data.offset &= 0xffffffaf;
21:15 karolherbst: fixes it as well
21:16 karolherbst: classical out of bound?
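What the `&= 0xffffffaf` experiment does to an offset: the mask clears bits 4 and 6 (`0x50`), so offsets that differ only in those bits collapse onto the same value, which is the aliasing raised earlier. A minimal C sketch of just the arithmetic; the helper name is hypothetical.

```c
#include <stdint.h>

/* Apply the experimental mask from the log: clears bits 4 and 6. */
static inline uint32_t apply_mask(uint32_t offset)
{
    return offset & UINT32_C(0xffffffaf);
}
```

For example, `0x00`, `0x10`, `0x40` and `0x50` all map to `0x00`, so four distinct locations end up being accessed as one.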
21:38 Lyude: imirkin_: is it weird that this breaks if I change the 0.26 to 0.25 and the 0.49 to 0.50? https://paste.fedoraproject.org/paste/WlVdZtSUPDmbxjwlv-Z3-V5M1UNdIGYhyRLivL9gydE=
21:42 karolherbst: imirkin_: okay... it happens for exactly three addresses: 0x1c, 0x48 and 0x7c. I guess masking them to 0x10, 0x40 and 0x70 _might_ "resolve" the error, and that would hint to a classical out of bounds I figure. At least it shouldn't be hard to track down now
22:03 imirkin_: Lyude: probe rgb 0 0 0.0 0.0 0.0 -- that's what fails
22:04 imirkin_: Lyude: er nevermind
22:04 imirkin_: Lyude: yes, that's weird.
22:04 imirkin_: off-by-one of some sort?
22:05 Lyude: imirkin_: not sure, should I check to see if the blob does the same thign?
22:05 Lyude: *thing
22:06 imirkin_: Lyude: i wouldn't worry about it
22:06 Lyude: alright
22:08 karolherbst: 0x7c -> 0x78, 0x48 -> 0x10, 0x1c -> 0x10: after those changes, the OOR errors at loading time are gone
22:08 karolherbst: and the other OOR errors aren't global memory related afaik
22:08 imirkin_: ok, so there's actually another error that might happen
22:08 imirkin_: which is "unaligned mem access"
22:09 imirkin_: karolherbst: can i see the instructions that are hitting those?
22:09 karolherbst: in a second
22:09 xexaxo1: Lyude, imirkin: during the initial build (make) all the components are built and linked such that they can work from their current location
22:09 imirkin_: perhaps newer GPUs made that part of OOR_ADDR
22:09 xexaxo1: as one installs (make install) they are relinked to honour the updated location.
22:10 Lyude: xexaxo1: so if I just made sure I copied the source to the exact same spot on my test machine, it wouldn't relink?
22:10 xexaxo1: there are other ways to manage that, yet libtool has chosen this route
22:10 xexaxo1: I've not seen a way to disable it, and in all honesty you don't want that
22:10 xexaxo1: since the final (installed) binaries will simply not work correctly
22:11 karolherbst: imirkin_: the odd thing is, after I made my fixups, the rendering is totally broken, so I assume the hw can somehow "fix" it
22:11 Lyude: alright
22:11 imirkin_: karolherbst: could be yea
22:11 imirkin_: karolherbst: could you try rounding them up to the nearest 0x10?
22:11 karolherbst: or the hw just reads/writes out of bounds and doesn't care any further
22:11 imirkin_: instead of down
22:11 imirkin_: could be legitimately going out of bounds
22:11 xexaxo1: Lyude: once you "make install" you can copy stuff around
22:11 imirkin_: since you're decreasing them
22:12 Lyude: xexaxo1: ahh, that is useful to know
22:12 karolherbst: imirkin_: I chose the highest values that don't cause OOR errors, but I'll check what I can do
22:12 xexaxo1: yw.
22:13 karolherbst: imirkin_: uhhh, 0x7c -> 0x80 also doesn't generate the error
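For reference, a minimal sketch of rounding an offset down versus up to the nearest 0x10, which is the experiment being discussed. The helper names are made up for illustration, and this only shows the arithmetic, not what the hardware does with the result.

```python
# Sketch: round an offset down or up to a power-of-two boundary.
def round_down(offset, align=0x10):
    """Largest multiple of align that is <= offset (align must be a power of two)."""
    return offset & ~(align - 1)


def round_up(offset, align=0x10):
    """Smallest multiple of align that is >= offset (align must be a power of two)."""
    return (offset + align - 1) & ~(align - 1)


for off in (0x1C, 0x48, 0x7C):
    print(hex(off), "down:", hex(round_down(off)), "up:", hex(round_up(off)))
```

Rounding up takes 0x7c to 0x80, matching the value tried above.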
22:13 xexaxo1: Lyude: handy tip - imagine you want to install to /usr/ (--prefix=/usr/)
22:13 imirkin_: karolherbst: ok, figured that might be the case
22:13 imirkin_: karolherbst: can i see the shader
22:13 karolherbst: I'll try
22:13 imirkin_: (and the instruction being changed)
22:13 imirkin_: my guess is there's something illegal going on
22:13 karolherbst: obviously
22:13 imirkin_: although with NV50_PROG_OPTIMIZE=0 it should have gone away
22:14 imirkin_: but perhaps we still do the large accesses?
22:14 xexaxo1: but you don't have root access to the system... then all you need is $ DESTDIR=/some/temporary/location/where/I/have/access/to make install
22:14 imirkin_: dunno
22:14 karolherbst: we'll see
22:14 xexaxo1: you can consider DESTDIR as "/"
22:14 imirkin_: karolherbst: anyways, the gist of it is that 64-bit loads have to be 64-bit aligned, 128-bit loads have to be 128-bit aligned
22:15 karolherbst: makes sense
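The alignment rule can be sketched as a simple check. This assumes the base register is itself suitably aligned, so only the immediate offset is tested; `is_aligned` is an illustrative name, not a function from the codegen.

```python
# Sketch: an N-bit load needs its address to be a multiple of N/8 bytes.
def is_aligned(offset, load_bits):
    """True if offset is a multiple of the load size in bytes."""
    size_bytes = load_bits // 8
    return offset % size_bytes == 0


for off in (0x1C, 0x48, 0x7C):
    print(hex(off), {bits: is_aligned(off, bits) for bits in (32, 64, 128)})
```

All three offsets are fine for 32-bit loads. Only 0x48 would also be fine for a 64-bit load, and none of them would be for a 128-bit load.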
22:15 Lyude: xexaxo1: and then just copy the files from that directory to /, right?
22:15 xexaxo1: Lyude: with this you can build on a fast server where you don't have root and scp over to your system
22:16 xexaxo1: Lyude: precisely - nifty, no ?
22:16 Lyude: yeah that's pretty much exactly what I was looking for, thanks!
22:16 xexaxo1: yw, enjoy
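A small sketch of the path arithmetic behind a staged install: each file ends up at DESTDIR followed by its final installed path, which is why the staging tree can be copied onto "/" afterwards. The paths and the `staged_path` helper are hypothetical examples, not taken from any real build.

```python
# Sketch: where a file lands when installing with DESTDIR set.
from pathlib import PurePosixPath


def staged_path(destdir, final_path):
    """Return DESTDIR joined with the final (absolute) install path."""
    relative = PurePosixPath(final_path).relative_to("/")
    return PurePosixPath(destdir) / relative


# Hypothetical example: --prefix=/usr, DESTDIR=/tmp/stage
print(staged_path("/tmp/stage", "/usr/lib/libfoo.so"))
```

Copying the contents of the staging directory to "/" on the target machine then puts every file at its final path.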
22:18 xexaxo1:recalls something funny about cmake - it also does a relink
22:18 xexaxo1: yet nobody mentions it ;-)
22:18 Lyude: doesn't seem like most things about cmake are ever mentioned :P
22:20 xexaxo1: I'll tell you this - the cmake manual is the most difficult thing to understand that I've read, period.
22:21 xexaxo1: I was reading a friend's thesis a while back and it was an easier read :-\
22:21 xexaxo1: tl;dr; in terms of build systems - they all suck ;-)
22:22 xexaxo1: or s/suck/have issues/
22:22 karolherbst: imirkin_: how do you want to have those shaders? the nv50ir output?
22:22 imirkin_: karolherbst: wtvr
22:25 Mortiarty: imirkin_, let's say you were to find out the solution to the block artefacts in vdpau - where would i find the code?
22:26 xexaxo1: imirkin_: wondering on a highly hypothetical case - not looking to start a drama or anything like that
22:26 xexaxo1: if tomorrow someone comes with a NIR backend for nouveau, that is ~same in both perf. and feature/bugfree -wise as codegen...
22:27 xexaxo1: what would be the deciding factor(s) to consider the work ?
22:28 xexaxo1: if I were in your shoes I'd have this as a minimum - established contributor, and one that is unlikely to "drop the code and run away"
22:28 xexaxo1: btw. it's kind of a touchy topic, so do ignore if you want
22:29 imirkin_: Mortiarty: can you rephrase your question?
22:29 RSpliet: xexaxo1: 1) code quality, 2) flexibility to generate from SPIR-V as well (as I think this will be the OpenCL route as taken by pmoreau)
22:29 imirkin_: xexaxo1: i've actually considered making a nir -> nvir translation layer in order to improve the perf of *translating*, i.e. disable the max possible of nir opts as well.
22:30 imirkin_: xexaxo1: all things being equal, i'd welcome it as an experiment
22:30 imirkin_: (i.e. assuming it resembled something that worked, etc)
22:31 imirkin_: [coz some st_glsl_to_tgsi stuff is ... pretty slow.]
22:31 Mortiarty: imirkin_, how would i find out you have a solution to the blocking artefacts in vdpau - and in which repository would i get them?
22:31 xexaxo1: RSpliet: yes, code quality (and consistency is a serious one), there is a SPIRV to NIR already, right?
22:32 imirkin_: Mortiarty: it's in the mesa repo. i'd tell you if you were around :)
22:32 imirkin_: xexaxo1: NIR doesn't do all that SPIR-V does
22:32 xexaxo1: had no idea, thanks - one could say "details, details" ;-)
22:33 imirkin_: xexaxo1: and consuming SPIR-V directly would avoid yet-another conversion layer
22:33 imirkin_: [which is what i see NIR as]
22:33 RSpliet: just like the smile on the mona lisa is a detail
22:33 imirkin_: at the same time, i have NO interest in writing a glsl_to_nvir backend
22:34 xexaxo1: wait is it a smile - I always considered it a smirk
22:34 imirkin_: so i'd welcome using $IR as the intermediate between GLSL and NVIR. right now that's TGSI, which is mature and well-tested. an NIR adaptation would be a ton of work to get to feature parity
22:34 imirkin_: you might notice that freedreno/vc4 use nir, but you might similarly notice that they don't support any of the serious GL4 features
22:35 karolherbst: doesn't intel support those?
22:35 imirkin_: it does, but not via gallium
22:35 karolherbst: true
22:35 xexaxo1: was under the impression that mostly because of HW capability.
22:35 imirkin_: all the resource stuff has to be lined up
22:35 imirkin_: xexaxo1: indeed it is
22:35 RSpliet: intel also has an army of code monkeys ;-)
22:35 imirkin_: xexaxo1: but my point is that there's a lot of unwired stuff left
22:36 imirkin_: [actually a4xx could support a lot of that stuff, and the nir thing is a big part of the reason i haven't been touching it]
22:36 xexaxo1:copy/pastes Roy's reply to #intel-gfx :-P
22:36 xexaxo1: ack, wasn't aware. thank you
22:37 xexaxo1: btw, have I mentioned my observations on the "glitches" in some H264 videos when using VDPAU?
22:37 imirkin_: obviously at the nir level, it supports all that stuff, but when you say "resource #5" in nir, it has to match up to the properly bound resource #5 via the api calls
22:37 imirkin_: xexaxo1: probably. it's a well-known issue though.
22:37 imirkin_: xexaxo1: my observation is that it happens during high-motion scenes
22:37 RSpliet: xexaxo1: which reminds me, I should throw some bananas over the fence as a bounty for fixing Haswell stability issues :-P
22:37 imirkin_: (i.e when it's likely that there are a TON of motion-vectors)
22:37 imirkin_: which i think will happen while zooming, or slow camera panning
22:38 xexaxo1: imirkin_: my layman explanation was "HW cannot keep-up"
22:38 imirkin_: xexaxo1: i think it's more like "some stupid buffer isn't big enough, or not being cleared often enough"
22:38 karolherbst: I just wanted to say: throw more memory at it :O
22:39 RSpliet:throws a DDR3 DIMM at karolherbst
22:39 imirkin_: but i've had this idea before, and i've tried 10x'ing various buffers, and to no avail
22:39 karolherbst:ducks
22:40 karolherbst: imirkin_: did you try to 10x buffers _and_ reclock to max?
22:40 imirkin_: hehe
22:40 karolherbst: the video engines also get clocked differently
22:41 RSpliet: karolherbst: my MCP79 is always at full speed
22:41 RSpliet: same issues
22:41 karolherbst: I mean, not that you try that in like 2 years because nothing else is left, and then it's fixed :O
22:41 karolherbst: well MCP79
22:41 RSpliet: should be able to decode a 720p video
22:41 xexaxo1: are we reclocking the hardware behind the VP1/2.... bsp/vp/ppp engines...
22:41 xexaxo1: * these days
22:41 karolherbst: well most GPUs should be able to run some games at full speed as well, but sometimes nouveau just sucks
22:42 karolherbst: xexaxo1: sure
22:42 RSpliet: xexaxo1: all clocks we observe NVIDIA setting, we set
22:42 xexaxo1: ack, ty
22:42 RSpliet: karolherbst: that's because we still suck at shader compilation, shouldn't be an issue on HW video dec
22:42 imirkin_: another thought is that there's some NAL type or bit that we don't properly decode.
22:42 karolherbst: imirkin_: it's really messy to collect _all_ shaders having g access at the same offset :/ I can't tell what the right one is, so I collect all
22:42 karolherbst: RSpliet: not quite sure about this
22:43 karolherbst: sure, it's _one_ issue, but by far not the only one
22:43 karolherbst: some games just run at crappy fps with engine loads below 10%
22:43 imirkin_: it does all work properly for MPEG4p2 and VC-1
22:43 karolherbst: hardly a compilation issue
22:43 imirkin_: although much less tested with VC-1
22:44 RSpliet: karolherbst: do engine stalls account for load in your measurement?
22:44 karolherbst: RSpliet: yes
22:45 karolherbst: core waiting on memory operations to finish counts as busy
22:45 karolherbst: not quite sure about stalls in the instruction pipeline, but.... the heck if not
22:53 karolherbst: imirkin_: https://gist.githubusercontent.com/karolherbst/3bcab7f252eb2617981f0e0dabf8c158/raw/5f54b83255310786d28512c93ccaee8450120b01/gistfile1.txt
22:53 karolherbst: imirkin_: search for g[*0x1c]
22:53 karolherbst: I mean as a pattern
22:53 karolherbst: there is usually something like "g[$r4d+0x1c]" in post RA
22:54 imirkin_: so all those are 32-bit loads
22:54 karolherbst: :O
22:54 karolherbst: yeah, seems like it
22:54 karolherbst: and they were c1[0x1c] in pre SSA
22:55 imirkin_: i was thinking there might be some 64-bit loads
22:55 imirkin_: or 128-bit loads
22:55 imirkin_: with funny address offsets
22:56 karolherbst: mhhh
22:57 imirkin_: but that doesn't seem to be the case
22:57 karolherbst: could it be, that something is wrong related to the c* -> g conversion?
22:57 imirkin_: anything's possible
22:57 karolherbst: ld u64 %r571d c7[0x120] + ld u32 %r574 g[%r571d+0x1c]
22:57 imirkin_: but i'm not sure what OOR_ADDR would mean for a gmem load
22:57 imirkin_: i still think this is something to do with constbufs
22:57 karolherbst: and if c7[0x120] contains garbage -> oops
22:58 imirkin_: sure, that'd be unfortunate
22:58 karolherbst: hum
22:59 karolherbst: I am sure it is something super silly
23:01 karolherbst: what's the best way to print the entire shader from gdb?
23:01 imirkin_: p func->print()
23:01 imirkin_: (or is there no func->print? gr)
23:02 karolherbst: I doubt it
23:02 karolherbst: mhh, I will figure something out then
23:02 imirkin_: easy to write one :)
23:02 imirkin_: look at how the debug stuff does it
23:07 karolherbst: it hits twice on 0x1c while actually running the game
23:07 karolherbst: instance 1: https://gist.github.com/karolherbst/cfdb40249073de8cfc867d640a95f634
23:11 karolherbst: would be funny if the emitter is wrong...
23:14 karolherbst: instance2: https://gist.github.com/karolherbst/caf96c54925181d29e0f787cb446789c
23:15 imirkin_: hilarious
23:15 karolherbst: you found something?
23:16 imirkin_: no. just responding to the "would be funny if emitter was wrong" comment
23:16 karolherbst: ohh :(
23:21 karolherbst: seems like nvdisasm is also happy
23:21 karolherbst: @!P0 LD.E R0, [R0+0x1c];
23:22 karolherbst: envydis says "(not $p0) ld b32 $r0 ca g[$r0d+0x1c]"
23:22 imirkin_: yeah
23:22 imirkin_: so ... the ONE thing that i've never been sure about
23:23 imirkin_: is whether it's ok to do LD R0, [R0]
23:23 imirkin_: but seems like that really should work
23:23 karolherbst: hopefully I guess?
23:24 karolherbst: I mean, it could be the issue
23:24 karolherbst: but this would be rather odd
23:24 karolherbst: like super odd
23:25 imirkin_: i remember running into an issue like that with multi-word loads
23:25 imirkin_: but not with single-word loads
23:26 kattana: is this doable with nouveau, or will it ever be?? --> http://phoronix.com/scan.php?page=news_item&px=Intel-XenGT-2016Q4
23:26 kattana: I need it badly
23:27 imirkin_: maybe... would need someone to work on it
23:27 karolherbst: imirkin_: you know, shall we check that just to be sure?
23:27 imirkin_: go for it.
23:27 karolherbst: I wouldn't know how
23:27 imirkin_: sec
23:27 karolherbst: or rather where the place is inside RA
23:28 imirkin_: yeah, looking for it. hold on.
23:30 imirkin_: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp#n2277
23:30 imirkin_: see that addHazard stuff? remove the typeSizeof() bit of the conditional.
23:32 karolherbst: well, the error still appears
23:34 karolherbst: this error makes like no sense :/
23:36 karolherbst: I think it isn't limited to those addresses, it's just that the others don't happen to be executed due to the predicates
23:37 karolherbst: but then again, why does it fix stuff when I change the address
23:40 karolherbst: maybe it just happens to fix something else
23:44 karolherbst: imirkin_: "LDL R57, [RZ];" this means load from local 0x0?
23:44 karolherbst: or what is that RZ for?
23:44 karolherbst: ohh reg zero
23:44 karolherbst: I hope
23:46 karolherbst: imirkin_: I think it is most likely that accesses to s[] go out of bounds
23:46 karolherbst: cause it is only 0x200 big
23:46 karolherbst: and has tons of reg+offset accesses
23:53 Plagman: l