03:20imirkin: Subv: precisely.
04:09sigod: karolherbst, any progress on nve06?
04:10sigod: nve6 i mean
05:09karolherbst: sigod: not really
05:10karolherbst: I tried to look into it, but..
05:10karolherbst: it is hard to be pretty sure the issue I encounter is indeed caused by the firmware
15:19stsquad: can anyone point me to where the fifo errors are printed in the kernel?
15:19stsquad: e.g.: fifo: read fault at 0001080000 engine 1b [CE2] client 18 [GR_CE] reason 02 [PTE] on channel 2 [003fbfa000 Xorg]
15:26BootI386: stsquad: https://github.com/skeggsb/nouveau/search?q=fault+at+engine+client
15:32stsquad: is the type of fifo being used exposed somewhere? dmesg | gk and dmesg | gf give nothing
15:32stsquad: s/|/| grep /
15:39imirkin_: "type of fifo"?
15:40imirkin_: oh, you mean how to read the code? it's impossible.
15:40imirkin_: don't even try.
15:40imirkin_: in the name of code sharing, it's a giant maze of calls between countless files
15:43stsquad: imirkin_: well in the first instance I just want to find out if the fault information includes the size of faulting access - in which case I know the problem
15:46imirkin_: it's the copy engine which is faulting
15:47imirkin_: the one built into the fifo
15:48imirkin_: at the start of some page, so ... either a resource isn't mapped or you're going too far
15:51stsquad: imirkin_: or (in my case) it's doing a wide access on a broken PCI maybe... or at least that's what I want to prove/disprove
15:52imirkin_: it's a read fault because the PTE isn't there
15:52imirkin_: if it were a broken PCI, it'd be a bus error
15:52imirkin_: (well, there's various levels of broken pci... but if it's just a dma access failing, it'd be reported as such)
15:53stsquad: imirkin_: ahh ok - is this a PTE in the card or system PTE?
15:53imirkin_: in the gpu
15:53imirkin_: the gpu has a vm, which allows you to, among other things, point VA at vram or sysmem
15:53imirkin_: now, the fifo is generally a cpu-side buffer, so if the dma's were getting messed up, that would mean all kinds of bs commands are going through
15:54imirkin_: you can force it to live in vram, with nouveau.vram_pushbuf=1 iirc
15:54imirkin_: but if dma is messed up, it could just as easily get messed up on upload into that vram area :)
15:54imirkin_: moral of the story: dma is important.
15:56stsquad: imirkin_: well I can try to see if that changes things. The other option would be some sort of missing barrier but that would depend on something being set up on one core and assumed to be ready by a second core.
15:56imirkin_: it's a gpu side error ...
15:57imirkin_: the gpu reports that it received some commands to operate the copy engine. it did that. the copy engine tried to access some memory that was not mapped (for read) in the gpu vm. so you got the error.
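The fault line quoted earlier can be picked apart mechanically. The following is a minimal sketch, assuming the message always has the shape of the one example in this log; the regular expression, the field names, and the 4 KiB page size are illustrative assumptions, not taken from the nouveau source.

```python
import re

# Pattern derived only from the single example line in this log (assumption).
FAULT_RE = re.compile(
    r"fifo: (?P<access>read|write) fault at (?P<addr>[0-9a-f]+) "
    r"engine (?P<engine_id>[0-9a-f]+) \[(?P<engine>[^\]]*)\] "
    r"client (?P<client_id>[0-9a-f]+) \[(?P<client>[^\]]*)\] "
    r"reason (?P<reason_id>[0-9a-f]+) \[(?P<reason>[^\]]*)\] "
    r"on channel (?P<chid>-?\d+) \[(?P<inst>[0-9a-f]+) (?P<name>[^\]]*)\]"
)

PAGE_SIZE = 0x1000  # assumed 4 KiB GPU page; larger page sizes also exist


def parse_fifo_fault(line):
    """Return a dict of fault fields, or None if the line does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    # The faulting GPU virtual address is printed in hex without a 0x prefix.
    info["addr"] = int(info["addr"], 16)
    # Offset 0 means the access faulted at the very start of a page.
    info["page_offset"] = info["addr"] & (PAGE_SIZE - 1)
    return info


if __name__ == "__main__":
    sample = ("fifo: read fault at 0001080000 engine 1b [CE2] client 18 "
              "[GR_CE] reason 02 [PTE] on channel 2 [003fbfa000 Xorg]")
    print(parse_fifo_fault(sample))
```

For the sample line this reports engine CE2, reason PTE and a page offset of 0, which matches the reading given above: the copy engine hit an unmapped page in the GPU VM right at a page boundary.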
15:57imirkin_: note that i've seen these things happen for seemingly unexplained reasons
15:57imirkin_: when are you seeing it?
15:59stsquad: imirkin_: mpv -vo=opengl or even browsing in firefox. I have successfully played a whole movie with mpv -vo=xv
15:59imirkin_: and are you on a funny setup?
15:59imirkin_: are you one of the arm pcie gpu guys?
15:59stsquad: imirkin_: arm64 with a broken pcie
16:00imirkin_: welll ... doing GL things is generally a good way to get failures with nouveau
16:00imirkin_: things like firefox and mpv tend to trigger them
16:00stsquad: imirkin_: I have a hack that in theory keeps everything 32 bit wide - but I'm trying to confirm
16:00imirkin_: so i don't think you're seeing anything unexpected
16:01stsquad: imirkin_: fair enough - if it's just general driver flakiness then there isn't much I can do
16:01imirkin_: esp if you combine it with things like kde or gnome
16:02stsquad: 4.17 kernel
16:03imirkin_: that helps :)
16:06stsquad: well it was really unstable on 1.19 but the latest xorg at least meant I could get past the greeter/login
16:06stsquad: of course falling back to the proprietary driver isn't an option for us ;-)
16:06imirkin_: hmmmm surprising
16:07imirkin_: actually 1.20 is where people have had issues
16:07imirkin_: are you using xf86-video-nouveau?
16:07imirkin_: or are you trying modesetting?
16:07orbea: only xf86-video-nouveau and DRI2 work for me after xorg-server 1.20 here
16:08orbea: I've still been meaning to bisect it...
16:08imirkin_: orbea: meaning what -- that modesetting doesn't work, or meaning that DRI3 doesn't work?
16:08orbea: modesetting just gets stuck with any GL
16:08orbea: and forces DRI3
16:09orbea: the master is a little better, only gets stuck if I use compton
16:09stsquad: imirkin_: well I have x11-drivers/xf86-video-nouveau-1.0.15-r1 installed but I think we have to use modesetting to get a console when booting up
16:09imirkin_: stsquad: xorg, confusingly, ships a driver called "modesetting"
16:09imirkin_: which uses GL to provide X acceleration
16:10orbea: DRI3 + the nouveau ddx always seems to have been flakey here, but it seems worse last time I tried...
16:10imirkin_: this has no connection to kernel modesetting
16:10imirkin_: except for the fact that it uses kernel modesetting.
16:11stsquad: imirkin_: AFAICT from Xorg.0.log it loads both drivers
16:11stsquad: one sec, let me join from that machine
16:13orbea: imirkin_: the nouveau ddx + DRI2 seems very stable at least, just slightly sluggish after using modesetting for a while :P
16:13imirkin_: orbea: it's sluggish? should be faster...
16:13orbea: that might be DRI3?
16:13orbea: that made it feel faster
16:14stsquad: stsquad_on_arm64: poke
16:15stsquad: hmm can't talk for some reason
16:16imirkin_: we get annoying spammers (is there any other kind)
16:16imirkin_: who spam annoyingly with their spam
16:16orbea: pm the log to the registered user? :P
16:16imirkin_: well, the pastebin url hopefully :)
16:17stsquad_on_arm: tries again
16:17stsquad_on_arm: imirkin_: ^
16:18imirkin_: ok yeah, it's all happy - loaded nouveau and everything
16:18stsquad_dodgyarm: who knew there was a nick length limit...
16:18imirkin_: probably the spammers :)
16:19imirkin_: otherwise you end up with a 1MB nick...
16:20imirkin_: anyways, by the sounds of it, the 32-bit limit has helped. i thought you had trouble mapping BAR's in the first place...
17:12stsquad:leaves glxgears running as a soak test
17:13imirkin_: use glxspheres - it's much heavier
17:13imirkin_: although not so much on pci traffic i guess?
18:24stsquad: imirkin_: hmm not part of mesa-progs?
18:25karolherbst: stsquad: virtualgl
18:26stsquad:unmasks and builds
18:43stsquad: 59FPS, 74 Mpixels/sec
18:44imirkin_: otherwise it syncs to vblank
18:44stsquad: imirkin_: kernel param or application param?
18:44imirkin_: env var
18:44imirkin_: i.e. vblank_mode=0 glxspheres
18:45stsquad: 206FPS, 255 Mpixels/sec
18:52stsquad: so I guess I just need to figure out how to stop firefox hanging my system
18:52imirkin_: try chrome :)
18:52imirkin_: or disable the gpu accel bs in firefox
18:52karolherbst: imirkin_: is that the same issue we have with plasma?
18:52imirkin_: no clue
18:53karolherbst: anyway, I want to fix that issue now
18:53karolherbst: because it is getting rather annoying
18:53imirkin_: so fyi, the way i've tracked these things down before is to find an apitrace to repro with
18:53karolherbst: on the 660 I can trigger that issue with plasma + 2x glxspheres and I am not sure if this is context switching or that or both
18:54karolherbst: or maybe it is the same issue all along...
18:54karolherbst: who knows
18:54karolherbst: maybe our firmware is just super slow compared to nvidias and triggers those issues faster
18:54karolherbst: yeah, me neither
18:54imirkin_: i've generally stayed away from such issues
18:54karolherbst: anyway, fixing one issue is a good starting point anyway
18:55imirkin_: which isn't a great approach, but ...
18:55karolherbst: personal health is also quite important :p
18:55karolherbst: anyway, I can trigger an issue quite fast on my machine, so this should be a good starting point if you can tell with 90%+ confidence you improved the situation
18:56imirkin_: the last "hard" issue i debugged was the program code segment switch thing which triggered with dirt rally + bindless.
18:56stsquad: imirkin_: is the apitrace a tool for tracing gl requests? part of the nouveau tools or a general purpose one?
18:56imirkin_: stsquad: general purpose
18:57imirkin_: it intercepts all GL calls and knows how to record appropriate data for later replay
18:57karolherbst: imirkin_: ohh, btw, did you keep track on what we are missing for 4.4 CTS?
18:57imirkin_: i thought you were doing that
18:57karolherbst: it is basically just 3d images, no?
18:57imirkin_: for fermi/kepler, 3d images is the big-ticket feature
18:57karolherbst: well yeah, but I didn't check much in the last months
18:57imirkin_: i'm sure there's little stuff
18:57imirkin_: for maxwell, 3d images should be fine
18:57imirkin_: there are some issues with how i did bindless though
18:57karolherbst: the last run on kepler2 for 4.5 was "Failed: 8/7445 (0.1%)"
18:58karolherbst: I think I will work on that as well, when I have some time left
18:58karolherbst: maybe even focusing on 4.4 only for now and see how much work would be 4.5 on top of that
18:58karolherbst: when I get 4.4 to pass
22:19Subv: what would be a good task for someone new who wants to contribute to nouveau? (i've never used nouveau before but i'd like to help)
22:26karolherbst: Subv: something on the compiler is always nice
22:26karolherbst: because you aren't involved in most of the headaches
22:26karolherbst: or a bug you encounter
22:27gnarface: i have a specific request if you're really bored Subv, but most people here have already given up on it
22:27imirkin_: "make it work"
22:27karolherbst: gnarface: what request?
22:27gnarface: well, yes but something *specific*
22:28gnarface: karolherbst: https://bugs.freedesktop.org/show_bug.cgi?id=82835
22:29Subv: debugging hangs is way out of my league for now, sorry
22:29karolherbst: mhhh, I have 8800 GTS cards, but those are g80
22:30karolherbst: Subv: well, fix bugs you encounter or are annoying to you
22:30karolherbst: usually the best way to get started
22:30imirkin_: gnarface: ha ha ha ha yeah
22:30gnarface: (the one about G92 hanging on h264 decoding is clutch. getting the same cards to reclock properly would be effectively full functionality for me)
22:30imirkin_: mwk: if you ever happen to have any ideas about that btw... -^
22:31karolherbst: ohh right, speaking about reclocking, I should finally get the patches ready ... but I really don't know what to do with them, because they are fine (tm)
22:31gnarface: i received two anonymous tips about the video decoding thing actually but i was unable to do anything with the information
22:31karolherbst: tips as in workarounds or as in important information?
22:32gnarface: hints towards a fix
22:33Subv: karolherbst: the thing is, i don't use linux as a daily driver and only have an 860m card on my laptop, i was wondering if there were specific bugs/improvements/unimplemented things i could try my hand at to familiarize myself with the nouveau code
22:34gnarface: not an actual literal fix, but more like a description of what the hardware is probably waiting on when it appears to freeze, and also a tip that the firmware not needing to match the driver version is wrong as per the freedesktop.org page
22:35gnarface: but also that even they haven't tested it since the legacy driver that supported this device so...
22:35gnarface: ... mmiotraces of the firmware version freedesktop.org uses on any driver recent enough to recognize it as valid firmware will probably yield bupkus
22:36gnarface: (which, as i've been told here is the case)
22:40gnarface: i reached a catch-22 on this at the point that my test case was the Steam client for Linux which refused to acknowledge the legacy driver
22:40gnarface: (otherwise this may all have worked out of the box)
22:45HdkR: Subv: If you get a Maxwell card then it can serve two purposes for you ;)
22:46Subv: isn't that a GM107 card though
22:46imirkin_: gnarface: the firmware differences are the encryption keys, which are necessary to decrypt encrypted video streams
22:47imirkin_: i'm 99.999% sure that other than those few bytes, the uploaded firmware is the same
22:47HdkR: Subv: 860M? Depends on which one you've got. There are two cards under that name, Kepler and Maxwell
22:47gnarface: imirkin_: from my recollection, they were themselves less sure of that
22:47gnarface: imirkin_: it kind of seemed like they didn't know, like they hadn't even looked at it
22:51Subv: oh interesting, any way to check?
22:51HdkR: Should say
22:54Subv: 640 cuda cores, i think this one's the maxwell version
22:54HdkR: That it would be
22:55HdkR: Fantastic. Optimize Dolphin's shaders in Nouveau :P
22:56karolherbst: yeah, improving shader optimizations is always helpful
22:56karolherbst: still have some pending stuff
22:58karolherbst: that would be nice to get cleaned up :)
22:59karolherbst: HdkR: I could imagine that an iadd3 might benefit dolphin?
22:59karolherbst: or is dolphin float only?
23:00HdkR: karolherbst: Dolphin is heavily integer based in the fragment shaders
23:00karolherbst: so iadd3 may help indeed
23:00HdkR: Not sure how often iadd3 might help
23:00HdkR: imad optimizations would definitely help though :)
23:01karolherbst: well, we have that already :p
23:01karolherbst: but no iadd3 support
23:01HdkR: combining mul and add = optimization?
23:02karolherbst: yeah, why not?
23:02HdkR: Oh yea, first good step
23:03karolherbst: you have to start somewhere :p
23:03HdkR: Sprinkle in some xmad for me :)
23:03karolherbst: but iadd3 is kind of magic on nv hardware
23:03karolherbst: because you can use that for c = add neg a neg b as well
23:04karolherbst: normally you would end up with a neg + add instruction
23:04karolherbst: iadd3 can take two neg modifiers
23:04karolherbst: add only one
23:04karolherbst: and other fun stuff
23:05karolherbst: you could just make iadd3 and add identical in the first place and only have a special case when having 3 operands really
23:05karolherbst: but well
23:06karolherbst: I guess add is a bit faster than iadd3
23:06karolherbst: but neg+add slower than iadd3?
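The negate-modifier argument above can be illustrated with a toy lowering function. This is only a sketch of the reasoning in the chat, assuming (as stated there) that add accepts one neg modifier and iadd3 accepts two; the mnemonics and operand syntax are made up for illustration and are not real assembler or nouveau codegen output.

```python
def lower_add(neg_a, neg_b, have_iadd3):
    """Return a list of pseudo-instructions computing c = (+/-a) + (+/-b)."""
    a = "-a" if neg_a else "a"
    b = "-b" if neg_b else "b"
    if not (neg_a and neg_b):
        # At most one negated source: a plain add can encode it directly.
        return ["add c, {}, {}".format(a, b)]
    if have_iadd3:
        # Both sources negated: iadd3 takes two neg modifiers, so a single
        # instruction suffices (the third source is a literal zero here).
        return ["iadd3 c, -a, -b, 0"]
    # Without iadd3, one negation needs a separate neg instruction.
    return ["neg t, a", "add c, t, -b"]


if __name__ == "__main__":
    for have_iadd3 in (False, True):
        print(have_iadd3, lower_add(True, True, have_iadd3))
```

Whether the single iadd3 is actually faster than neg plus add is the open question raised in the chat; the sketch only shows the instruction-count difference.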
23:06pendingchaos: imirkin_: the low bit of the high word of a bindless handle is 1 in nouveau? why?
23:06imirkin_: coz 0 = error :)
23:06imirkin_: i probably could have fought it, but it wasn't worth it
23:07imirkin_: but basically handle == 0 means "allocation error"
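A small sketch of why that bit is set: if every valid 64-bit handle has the low bit of its high word set, no valid handle can equal 0, so 0 stays free as the error value. The choice of putting a slot index in the low word is an assumption made for illustration; only the "low bit of the high word is 1" and "0 means allocation error" parts come from the chat.

```python
VALID_BIT = 1 << 32          # low bit of the high 32-bit word
LOW_WORD_MASK = 0xFFFFFFFF   # assumed: the low word carries a slot index


def make_handle(slot):
    """Build a 64-bit handle that is non-zero even for slot 0."""
    if not 0 <= slot <= LOW_WORD_MASK:
        raise ValueError("slot does not fit in the low word")
    return VALID_BIT | slot


def is_valid(handle):
    """A handle of 0 is reserved to mean 'allocation error'."""
    return handle != 0


def slot_of(handle):
    """Recover the (assumed) slot index from the low word."""
    return handle & LOW_WORD_MASK


if __name__ == "__main__":
    h = make_handle(0)
    print(hex(h), is_valid(h), slot_of(h))
```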
23:07karolherbst: imirkin_: this patch looks ready to be merged though, no? https://github.com/karolherbst/mesa/commit/a12472f107209000ac013c2f23423ca272d129c4
23:07karolherbst: should probably run piglit, but...
23:09imirkin_: karolherbst: probably fine... you can't use a short immediate btw?
23:09karolherbst: I can: https://github.com/karolherbst/mesa/commit/56d13872868547c90cd0412b732ecb693cdfdbcc
23:09karolherbst: but check the patch ;)
23:09imirkin_: karolherbst: i.e. shladd a << b + shortimm
23:09imirkin_: well, based on your code, no
23:09karolherbst: the benefit is huge
23:09karolherbst: nah, the other patch
23:09imirkin_: oh i see.
23:10karolherbst: codegen lacks some interfaces to really check for that
23:10karolherbst: because we can only check for current instruction + load
23:10imirkin_: target has some stuff
23:10imirkin_: insnCanLoad and whatnot
23:10karolherbst: not take that combination and tell me it works
23:10imirkin_: kinda sorta though
23:10karolherbst: yeah, doesn't work here
23:10karolherbst: because we have no iadd3 instruction
23:10karolherbst: not yet
23:10imirkin_: what's iadd3 got to do with it?
23:11karolherbst: because insnCanLoad can only check the given instruction + a load
23:11imirkin_: it'd be inconvenient
23:11imirkin_: but doable
23:11karolherbst: that's what I meant with lacking an interface
23:11karolherbst: we really want to do something like: op + srcs and ask the target if this works out
23:12karolherbst: uhm + defs
23:14karolherbst: ohh wait, right, this was iscadd instead of iadd3
23:14karolherbst: but with iadd3 I have the same issue in another patch
23:15karolherbst: I think with that iscadd + iadd3 patches I got to - 0.5% instructions, so it might be worth to look into that
23:16karolherbst: huh, https://github.com/karolherbst/mesa/commit/e94edbb4b4f8bfb94db2b9891d69d8a24d1c373c
23:16karolherbst: this patch though
23:17karolherbst: I really should take a look at all my branches and finish the more trivial things
23:18karolherbst: https://github.com/karolherbst/mesa/commit/65387c57110882e9c82033504b96b1f9d6319e0e ..
23:20imirkin_: karolherbst: put a branch together
23:20imirkin_: for me to review
23:20karolherbst: the thing is
23:20karolherbst: I never know which patch is correct and which not
23:21pendingchaos: imirkin_: so for nve4 images you're suggesting https://hastebin.com/geketehaze.txt?
23:21karolherbst: I just need time to check each patch and run piglit
23:21karolherbst: and check the games affected
23:21imirkin_: pendingchaos: yes, something like that
23:21imirkin_: pendingchaos: hopefully the assumption holds!
23:21imirkin_: it might not, but if not, things can be laid out differently. i think they're both %64 though.
23:22pendingchaos: NVC0_CB_AUX_MP_INFO is in the way
23:22imirkin_: no reason not to make both tables adjacent, i imagine
23:22imirkin_: the order of things in there is largely historical
23:26pendingchaos: I think I can do that, it would be nice if someone with a kepler card could test it though
23:26pendingchaos: as for the NVIDIA thing for the piglit tests, I'm pretty sure the tests are correct and that any failure on the blob would be its bug, though I think I'll run them on it anyway tomorrow
23:27pendingchaos: in case one of the tests isn't correct
23:29imirkin_: pendingchaos: well esp with images, there's a lot of funny business with types
23:29imirkin_: like ... you can't just have "image2D foo", i don't think
23:29imirkin_: you have to do "writeonly image2D foo"
23:29imirkin_: or "layout (rgba8) image2D foo"
23:30imirkin_: just coz it's bindless doesn't absolve you of the format/etc requirements
23:31imirkin_: so i'm surprised that your foo() worked without complaining that it wasn't marked writeonly
23:32pendingchaos: yeah, it probably should be marked as writeonly
23:34imirkin_: normally there's logic in the glsl compiler which validates this stuff
23:34imirkin_: but i guess it wasn't being triggered
23:34imirkin_: this whole bindless thing is, unfortunately, a pit of infinite sadness
23:34pendingchaos: loadSuInfo32() seems to wrap bindless
23:35imirkin_: and all the effort is largely for naught -- there are no conformance tests, and nobody uses it like this
23:35pendingchaos: handles around 512
23:35imirkin_: someone was being careful!
23:35pendingchaos: we would probably have to do something about that for the suInfoBase - bindlessBase hack
23:36imirkin_: i guess :(
23:36imirkin_: how about...
23:36imirkin_: we just combine them into the same table
23:36imirkin_: which we size at 512
23:36imirkin_: and move on with life
23:36imirkin_: the first 8 slots are "bound"
23:37pendingchaos: sounds good
23:38imirkin_: just have to be careful not to allocate more than 504 handles. seems like that should be easy.
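The combined-table idea can be sketched as a simple slot allocator: one table of 512 entries, the first 8 reserved for bound images, and at most 504 bindless handles handed out. The class and method names below are hypothetical and only model the bookkeeping described in the chat, not the actual driver code.

```python
class SurfaceInfoTable:
    """Toy model of one shared 512-entry table (names are hypothetical)."""

    TABLE_SIZE = 512   # total slots, per the chat
    NUM_BOUND = 8      # slots 0..7 are reserved for bound images

    def __init__(self):
        # Free list of bindless slots, ordered so that pop() yields 8 first.
        self._free = list(range(self.TABLE_SIZE - 1, self.NUM_BOUND - 1, -1))

    def alloc_bindless(self):
        """Return a free slot in 8..511, or None when all 504 are in use."""
        if not self._free:
            return None
        return self._free.pop()

    def free_bindless(self, slot):
        """Give a bindless slot back; bound slots can never be freed here."""
        if not self.NUM_BOUND <= slot < self.TABLE_SIZE:
            raise ValueError("not a bindless slot")
        if slot in self._free:
            raise ValueError("slot is already free")
        self._free.append(slot)


if __name__ == "__main__":
    table = SurfaceInfoTable()
    slots = [table.alloc_bindless() for _ in range(504)]
    print(slots[0], slots[-1], table.alloc_bindless())
```

Returning None on exhaustion keeps the "never allocate more than 504 handles" rule in one place, so callers can turn it into the error handle value.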
23:38HdkR: Doesn't EXT_shader_image_load_formatted make it so you don't need a layout qualifier on read?
23:38imirkin_: it does
23:39imirkin_: but that's (a) not supported in mesa right now and (b) not enabled in the shaders in question
23:39HdkR: One or the other should be fixed then :P