00:00imirkin: basically i have no interest in creating a backend IR where i do load propagation and whatever else
00:01imirkin: but if i can take something "off the shelf" which supports nv30's (meager) capabilities, i'd be interested in switching it over.
00:02robclark: admittedly all the drivers using nir have actual backend IR's.. *but* w/ some configurable guidance about what things to leave in SSA when doing ssa->regs we might be able to better accommodate simple drivers.. which seems like a useful thing..
00:02robclark: (and if the things you want to load-prop are left in SSA then that is a real trivial thing for the backend compiler to deal with)
00:04imirkin: ... but RA relies on that
00:04imirkin: er rather, RA relies on those temps getting eliminated
00:05imirkin: anyways, don't worry about it, at least from a nv30 perspective - it's (sorta) fine as is, definitely not worth putting a ton of thought/effort towards it
00:06robclark: right.. so it would need to be configurable enough to know what backend can eliminate.. if there are constraints like on ir3 where instruction can take const/in in only certain src args or only certain # of src's can be const, etc.. then maybe it needs to be callback fxn sorta thing..
00:06imirkin: in nv50 ir, we have insnCanLoad()
00:07imirkin: which determines whether an instruction can load a value at a particular position
00:07imirkin: e.g. https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp#n264
00:07robclark: imirkin, anyways, afaiu etnaviv is in a pretty similar boat as far as what its backend needs..
00:08robclark: so if there are things that would make it easier for simple backends and would be useful to both, I can try to find some time
00:08robclark: thought nv50 was completely different from nv30?
00:09imirkin: it is
00:09robclark: or is it similar restrictions?
00:09imirkin: no, totally different
00:09imirkin: but in the nv50 backend compiler
00:09imirkin: we have that concept ;)
00:09imirkin: since the restrictions are different between nv50 and nvc0, but those share a single IR/compiler core
00:10imirkin: all that is abstracted out via targets
00:10imirkin: i suppose i could look at extending codegen to support all this ...
00:10imirkin: seems painful, esp the revectorize bit of it. might be easier than worrying about nir though.
00:11robclark: well, anyways, for nir pass to know whether or not to leave something in ssa, it needs to examine all the consumers.. maybe a simple 'bool (*can_load_prop)(nir_instruction *, int srcn)' type callback that the nir pass calls for each consumer is sufficient?
00:12robclark: reinventing revectorization doesn't sound easier to me ;-)
00:12imirkin: seems easier than reinventing IRs
00:13imirkin: that function would have to take the load instruction too
00:13imirkin: since some values might be propagatable and some not
00:13imirkin: and also you'd have to figure out a way to represent those arguments properly
00:13imirkin: also you have to actually propagate
00:13imirkin: you can't just ask
00:14imirkin: since for example you might be able to propagate either a or b but not both together
00:16robclark: well, a callback that took the whole instruction does have visibility into everything..
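A minimal C sketch of the callback idea discussed above, assuming a toy IR. Every name here (toy_instr, toy_load, can_load_prop_fn, try_load_prop) and the example restriction (at most one const source per instruction, inputs only in src 0) is hypothetical; none of it is NIR's or codegen's API. The hook takes the load as well as the consumer, and each propagation is committed before the next query, so the "either a or b but not both" case is handled by the order of the calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy IR for illustration only; not NIR and not nv50 ir. */
enum src_kind { SRC_TEMP, SRC_CONST, SRC_INPUT };

struct toy_src {
    enum src_kind kind;
    int index;                 /* temp number, const slot, or input number */
};

struct toy_instr {
    int num_srcs;
    struct toy_src src[3];
};

/* A load copies a const or an input into a temp. */
struct toy_load {
    int dst_temp;
    struct toy_src from;
};

/* Backend hook: may `load` be folded into source `srcn` of `consumer`,
 * given what has already been folded into that consumer? */
typedef bool (*can_load_prop_fn)(const struct toy_instr *consumer, int srcn,
                                 const struct toy_load *load);

/* Made-up backend rule: one const source per instruction at most,
 * and inputs are only readable from src 0. */
bool toy_can_load_prop(const struct toy_instr *consumer, int srcn,
                       const struct toy_load *load)
{
    assert(srcn >= 0 && srcn < consumer->num_srcs);
    if (load->from.kind == SRC_INPUT)
        return srcn == 0;
    if (load->from.kind == SRC_CONST) {
        for (int i = 0; i < consumer->num_srcs; i++)
            if (i != srcn && consumer->src[i].kind == SRC_CONST)
                return false;
        return true;
    }
    return false;
}

/* Ask and commit in one step, so a later query sees the updated consumer. */
bool try_load_prop(struct toy_instr *consumer, int srcn,
                   const struct toy_load *load, can_load_prop_fn can)
{
    assert(srcn >= 0 && srcn < consumer->num_srcs);
    struct toy_src *s = &consumer->src[srcn];
    if (s->kind != SRC_TEMP || s->index != load->dst_temp)
        return false;          /* this source does not read the load's temp */
    if (!can(consumer, srcn, load))
        return false;          /* leave it in a temp; backend emits the load */
    *s = load->from;           /* propagate */
    return true;
}
```

With an add whose two sources both come from const loads, the first call folds one const in and the second call is refused, so that source stays in a temp.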
00:16imirkin: hm. so on second thought, this whole codegen thing doesn't SOUND so bad ... might do that
00:17imirkin: just have to think about how to write a vectorize pass
00:17robclark: (otoh, tgsi way of assuming you can load-prop everything is worse ;-))
00:17imirkin: should be easily adaptable from nir
00:17imirkin: [and it has the advantage of me not having to learn anything more about nir... always a plus]
00:18robclark: idk, nir is much nicer than tgsi ;-)
00:18imirkin: i see the two as totally unrelated
00:18imirkin: they do have a small amount of overlap, but tgsi is largely useful for communicating programs, and nir is largely useful for manipulating them
00:18robclark: they are not the same thing, true.. but if I had a choice of which one to consume I'd take nir
00:19imirkin: well, codegen already takes tgsi
00:19imirkin: so it's 0 extra work
00:20robclark: (for the gen's that already use codegen, I agree)
00:20imirkin: and it has RA, and it can represent all the various instructions
00:20imirkin: and it supports multiple register files (like address register, condition, etc)
00:21robclark: ra where all you have is vec4's vs scalar w/ multiple register classes isn't exactly the same thing.. although, idk, maybe codegen's ra is configurable enough..
00:21imirkin: codegen's ra is fairly configurable
00:21imirkin: that's the biggest question though
00:22imirkin: (nv50 is weird, so it supports ... some amount of weird)
00:22imirkin: the other question is around control flow
00:22robclark: weird is a multi-dimensional space ;-)
00:23imirkin: not sure how well the simplified control flow maps onto the more generic stuff
00:23imirkin: but it'll be a fun weekend project to find out
00:24imirkin: going through it in my head, it does seem like it could work
00:25imirkin: but it's not extremely concrete
00:25imirkin: and i'm not _so_ familiar with nv30 restrictions, so ... good times.
00:25imirkin: seems a lot easier than figuring out why talos randomly shows green/red walls
00:26robclark: anyways, if that turns out to be a pita, I think the extra things that a simple compiler backend would need to make nir useful are not so much work, and sounds useful to etnaviv.. (and really a2xx as well, although I think the likelihood of me getting around to changing a2xx compiler is low.. plus it actually needs a real IR for some things)
00:26imirkin: well, if nv30 can run on codegen, chances are etnaviv can run on codegen :)
00:27imirkin: and i sure do like the idea of codegen being useful to more people
00:28imirkin: maybe rename it to nvir -- newly-vectorized ir :)
00:29robclark: (really not the easiest thing to pronounce, tho)
00:30imirkin: virn? vectorized ir -- new!
00:32robclark: imirkin, sounds like: http://epguides.com/HeyVernItsErnest/cast.jpg :-P
00:32imirkin: quality show - Sep 1988 to: Dec 1988
00:33imirkin: and the year after that, the simpsons came out
00:33imirkin: seem to be doing much better.
00:39robclark: yeah, you'll have to come up with some backronym to call it 'bart' :-P
00:40imirkin: or 'homer'
00:46imirkin: mupuf_: do you have a nv3x?
00:47imirkin: mupuf_: if so, would you be willing to set up an environment where you can load blob drivers for it (173.x series i believe), and grab some mmt's?
00:52imirkin: mupuf_: actually i'm gonna try to route my board through a VM... will have to find a distro with userspace old enough for that driver to work
00:53imirkin: (debian-latest? heh)
00:54imirkin: oooh. xorg 1.15. fancy.
06:03mupuf_: imirkin: i do not think I have an nv3x
06:03mupuf_: and I still have to work on the alarm bug
09:01zeq: RSpliet: I tried nvapeek 0x100220 0x20 but I get "PCI init failure". Does it not work with the nvidia kernel module(s) loaded?
09:20mwk: zeq: as root?
09:20mwk: what distro are you using?
09:20zeq: yes. it works with my nvc0 on my laptop (nouveau)
09:21zeq: not with the nv50 (NVIDIA)
09:21mwk: I'm having problems as well, it seems that some recent change in nvidia drivers conflicts with libpciaccess
09:21mwk: nv50 is now legacy driver only, right?
09:24zeq: yeah, 340.96
09:24zeq: it's the oldest series still maintained I believe
09:25mwk: and what kernel version are you using?
09:25zeq: I want to use nouveau instead, but I'm trying to figure out how to get memory reclocking to work. gpu reclocking works okay.
09:26zeq: (obviously NVIDIA reclocks it just fine)
09:27zeq: it's a stupid really, the BIOS doesn't support multiple pstates, just the one + boot clocks (there is no pstate for the boot clocks)
09:28zeq: ^ bit stupid
09:29zeq: whereas the nvidia-settings utility allows arbitrary clocks to be set (with "coolbits" enabled)
09:30zeq: I was trying to figure out if I could read the register changes applied through the overclocking options
09:30mwk: seems nvidia has discovered IORESOURCE_EXCLUSIVE
09:30mwk: #define IORESOURCE_EXCLUSIVE 0x08000000 /* Userland may not map this resource */
09:30mwk: well, fuck you too
09:31zeq: is that in the kernel shim module?
09:31mwk: it should be in there somewhere
09:31mwk: look for a call to some function with _exclusive in name
09:32mwk: or hmm...
09:34zeq: I need to try to work this out for the fermi in my laptop too... it's even worse there, the boot clocks are ridiculous
09:34mwk: zeq: are you using arch linux?
09:34mwk: that's much better
09:34mwk: take a look at your kernel config, disable CONFIG_STRICT_DEVMEM
09:34mwk: probably CONFIG_IO_STRICT_DEVMEM as well
09:35mwk: let me know if it helps
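A rough C sketch of how the flag and the two config options mentioned above might interact. The IORESOURCE_EXCLUSIVE value is the one quoted in the log; the IORESOURCE_BUSY value and the decision order are an assumption based on a reading of the kernel's resource checks from around that time, and userland_may_map is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits as found in the kernel's include/linux/ioport.h. */
#define IORESOURCE_BUSY      0x80000000UL  /* driver has claimed the resource */
#define IORESOURCE_EXCLUSIVE 0x08000000UL  /* userland may not map it */

/* Hypothetical model of the decision (an assumption, not kernel code):
 *   strict_devmem    stands for CONFIG_STRICT_DEVMEM
 *   io_strict_devmem stands for CONFIG_IO_STRICT_DEVMEM */
bool userland_may_map(unsigned long flags, bool strict_devmem,
                      bool io_strict_devmem)
{
    if (!strict_devmem)
        return true;            /* no strict checks at all */
    if (!(flags & IORESOURCE_BUSY))
        return true;            /* nobody has claimed the region */
    if (io_strict_devmem || (flags & IORESOURCE_EXCLUSIVE))
        return false;           /* claimed and protected from userland */
    return true;
}
```

Under this model, a BAR that a driver has claimed and marked exclusive stops being mappable from tools such as nvapeek as soon as CONFIG_STRICT_DEVMEM is on, which would match the "PCI init failure" going away once the option is disabled.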
09:36zeq: will do. I couldn't find anything in the nvidia kernel module source
09:37zeq: yeah, perhaps I have that set differently on my laptop
09:38zeq: compiling new kernel now
09:39mwk: arch has this setting set in the default kernel. that's... unfortunate
09:40zeq: I don't use arch because I need to be able to easily tweak these things :-)
09:46zeq: mwk: that's better! :-D
09:49zeq: RSpliet: you were right there's no change in those registers :-(
09:56zeq: That said, there also isn't any difference between the boot contents and/or before/after loading the NVIDIA module or X!?! Does that mean the memory always has the voltage set appropriate for high frequencies and just runs clocked right down at boot?
09:58zeq: I wonder if this card *just* needs the memory clock set appropriately?
10:02zeq: is rammap the mapping between ram timings and VDD?
10:04zeq: if so, from above perhaps this card doesn't provide any mapping because the VDD for the memory is never touched? Would a fake entry be possible reading the boot VDD?
10:04zeq: Could this actually be the "right" fix for this specific card?
10:31RSpliet: zeq: those registers are only a tiny fraction of the full RAM configuration (eg. the actual timings)
10:32zeq: RSpliet: okay, but those certainly don't change. Could there be any validity to the idea that the RAM VDD is static for this specific card?
10:33RSpliet: most of the code for changing clocks w/ nouveau is probably already in the kernel, there's just a few details that are different for G80 vs the G94 that we do somewhat support
10:33zeq: So wouldn't it be possible to create a mapping using the boot VDD?
10:33RSpliet: and since the devil lives in the details, your card is likely to crash when you blindly try to use that code
10:34zeq: I *blindly* said true to reclocking ;) it doesn't crash, but it does report that there is no rammap
10:34zeq: the GPU clocks do reclock successfully
10:35zeq: I'm just wondering if the reclocking code may be trying to do too much for this card
10:35zeq: It only has 1 pstate after all
10:36zeq: I meant conceptually, but wanting to be able to calculate an appropriate VDD.
10:38zeq: obviously by default it doesn't do *anything* at all to reclock ;-)
10:39RSpliet: afaik, if no rammap is reported, it doesn't change memory clocks at all
10:40RSpliet: so yes, the current code is not built for the permissible situation where there's no rammap table
10:41RSpliet: as it wasn't designed for cards older than G94
10:42RSpliet: anyway, I'm currently very busy on other stuff, and my non-existent spare time currently is poured into Fermi reclocking
10:43zeq: I certainly can't complain about that, since my laptop is eagerly awaiting it! :-)
10:43RSpliet: (for which, before I get anyone's hopes up too high, no working code exists)
10:43RSpliet: about 70% done for GDDR5 cards
10:43RSpliet: DDR3 haven't started
10:44RSpliet: but should be easier
10:46zeq: I'll see if I can figure out how the rammap code works and hack something up to make it permit no rammap table
10:59vita_cell1: RSpliet: how is Fermi reclocking going?
12:34RSpliet: vita_cell1: stalled, no time
12:35vita_cell1: yes I understand you
12:36RSpliet: last time I could test, I had nouveau generate a near-identical script to change my one NVCE to its middle perflvl's memclk... it didn't work, presumably a handful of details need polishing
12:37RSpliet: but they involve REing one of the newer NVIDIA pdaemon firmware to work out two unknown script opcodes
12:38RSpliet: (eg. a solid weekend of focussed work to RE, useless to try and do in spare half hours)
12:39RSpliet: and then there's the making sure it can change back to the lowest, followed by work on the switching between simple postdiv and PLL to be able to support full speeds
12:41RSpliet: don't get excited about the stuff I pushed forward for inclusion in 4.8, that's all just minor tested stuff that was otherwise accumulating dust in my tree
15:06imirkin: robclark: hm, i thought a little more about how RA would work for vec4's ... "painfully". i don't see an obvious way to properly integrate writemasking. will have to think some more.
15:08imirkin: hm. maybe i don't have to... ... ... needs more thought.
15:14robclark: hmm, all the load intrinsics have a writemask.. but not alu.. they just operate on a vecN (1..4)
15:15robclark: (but tbh I haven't thought in vec4 in a long time.. and haven't looked at how i965 handles it.. which was why I was suggesting to just hack something up so you could experiment w/ different passes and then nir_print it at the end)
17:06imirkin_: mogorva: any other issues with the GK208? [other than the TR:U and Deus Ex ones that I've yet to work out]
17:06mogorva: imirkin_: nothing serious at the moment :)
17:09imirkin_: mogorva: btw, i (re?)discovered an issue on nv50 where alpha tests don't work for all formats. dunno exactly how that'd manifest itself.
17:10imirkin_: [haven't pushed a fix yet... i have an initial version locally but i need to redo it to not be so horrible]
17:11mogorva: on NV50 there are more existing issues (glitches) than on the GK208
17:11mvaenskae: cheers, i have an nvidia 320m using nouveau on a macbook pro mid 2010 (macbook 7.1 i believe) on an mcp89 controller running linux-4.6.2-gentoo and get the following errors logged in my syslog: https://bpaste.net/show/5094dab40788
17:11mogorva: also, it's much more stable than the NV50 (which always crashed my system after 15-30 minutes gameplay)
17:12imirkin_: robclark: i think if we're willing to settle for slightly worse RA [which i am!] i think we can ignore writemasking. if we just allocate registers as if they weren't vec4 at all [but aligned], we can just use swizzles and "implicit" writemasks based on allocated register positions.
17:12imirkin_: mogorva: yeah, there were unidentified issues with NV50 which ... remain unidentified
17:13imirkin_: mvaenskae: can you elaborate how you managed to accomplish this feat?
17:13mvaenskae: can someone help me out on getting these errors a little lessened as it begins glitching some of my xorg apps (e.g. pdf viewer is completely useless until i do a restart of X)
17:13imirkin_: BEGIN_END_ACTIVE means that we messed something up royally
17:13imirkin_: mvaenskae: what version of mesa are you on?
17:14mvaenskae: imirkin_: running gentoo with mesa-11.2.2
17:14robclark: imirkin, it actually might not be hard to just do register classes, so (for example) either two vec2 or one vec4 can be assigned to the same reg..
17:14robclark: regalloc supports that
17:14robclark: (I should have thought of that sooner)
17:14imirkin_: robclark: right. it's just tricky to deal with cleverness like having a .xw writemask
17:15imirkin_: but non-holey ones are easy
17:15robclark: tbh, I'm not sure what .xw ends up looking like in nir (for alu instrs)
17:15robclark: but yeah, .xw is annoying ;-)
17:15robclark: (but I guess you just ignore it and treat it as a vec4 dst)
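A small C sketch of the "allocate as if scalar, but aligned" idea from this exchange, covering only contiguous (non-holey) masks. All names (vec_loc, alloc_value, implicit_writemask, source_swizzle) are hypothetical, and the bump allocator is a stand-in for a real RA; it is not how codegen or NIR's regalloc actually works. Values get scalar slots aligned to 1, 2, or 4 so they never straddle a vec4 register, and the writemask and source swizzle are then derived from where the values landed.

```c
#include <assert.h>

/* Scalar slot s maps to channel s % 4 of vec4 register s / 4. */
struct vec_loc {
    unsigned reg;     /* vec4 register number */
    unsigned first;   /* first channel used: 0 = x ... 3 = w */
    unsigned ncomp;   /* number of contiguous channels, 1..4 */
};

/* Alignment in slots: scalars anywhere, vec2 on .xy or .zw, vec3/vec4 on .x */
unsigned slot_align(unsigned ncomp)
{
    assert(ncomp >= 1 && ncomp <= 4);
    return ncomp == 1 ? 1 : ncomp == 2 ? 2 : 4;
}

/* Trivial bump allocator standing in for a real register allocator. */
struct vec_loc alloc_value(unsigned *next_slot, unsigned ncomp)
{
    unsigned align = slot_align(ncomp);
    unsigned slot = (*next_slot + align - 1) & ~(align - 1);
    struct vec_loc loc = { slot / 4, slot % 4, ncomp };
    *next_slot = slot + ncomp;
    return loc;
}

/* Writemask implied by the allocated position: bit 0 = x ... bit 3 = w. */
unsigned implicit_writemask(struct vec_loc dst)
{
    return ((1u << dst.ncomp) - 1u) << dst.first;
}

/* swz[c] is the source channel to read when writing dest channel c.
 * Channels outside the writemask just repeat the source's first channel;
 * a narrower source is clamped to its last channel (scalar broadcast). */
void source_swizzle(struct vec_loc dst, struct vec_loc src, unsigned swz[4])
{
    for (unsigned c = 0; c < 4; c++) {
        int inside = c >= dst.first && c < dst.first + dst.ncomp;
        unsigned i = inside ? c - dst.first : 0;
        if (i >= src.ncomp)
            i = src.ncomp - 1;
        swz[c] = src.first + i;
    }
}
```

For example, a vec2 that lands in the .zw half of a register gets writemask 0xc, and reading a vec2 stored in .xy of another register needs a swizzle whose z and w entries select x and y. Slots skipped for alignment are simply wasted here, which is the "slightly worse RA" trade-off.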
17:15imirkin_: mvaenskae: hmmmmm ok. pretty sure that shouldn't be happening.
17:16imirkin_: mvaenskae: i do remember NVAF having extra-special issues, but i don't remember this being one of them
17:16mvaenskae: imirkin_: what is the cause of these errors?
17:16imirkin_: robclark: right - i'm talking about a vectorize pass
17:16imirkin_: mvaenskae: well, the most immediate cause is that an illegal command was issued between a BEGIN and END command
17:17imirkin_: mvaenskae: however that most likely points to something wrong with the command processing
17:17imirkin_: and/or channel switching
17:17mvaenskae: imirkin_: looks like a race condition then due to something doing multithreading?
17:17imirkin_: mvaenskae: more likely the channel switching logic is subtly broken for NVAF
17:18mvaenskae: what is NVAF? :)
17:18imirkin_: mvaenskae: your gpu
17:18imirkin_: mvaenskae: it gets reported as 0xaf i think
17:18imirkin_: here's the ctx switching logic: https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/engine/gr/ctxnv50.c
17:19mvaenskae: hm, any way i can get you guys proper logs that you can work with?
17:19imirkin_: well, unfortunately MCP89 is only included in (some) macbooks, so it's fairly rare
17:19mvaenskae: it is a macbook so erm... yeah :/
17:19imirkin_: and (maybe even more unfortunately) i haven't the faintest clue how to go about debugging any of this
17:20imirkin_: there's probably ~2 people who would have any clue, and they're largely onto bigger and better things than looking at nv50-era hw
17:20mvaenskae: i feel as if i am forced to using nvidia-proprietary to not have these issues :/
17:21imirkin_: pretty much.
17:21imirkin_: moral of the story: buy hw from a company that plays nice with open-source
17:21mvaenskae: well, i got this machine gifted
17:22imirkin_: anyways, if you stop using unnecessary fanciness in your desktop, it's much more likely to work well
17:22mvaenskae: but i couldn't really afford a nice laptop right now, not at least one which is also repairable :(
17:22mvaenskae: i am using i3
17:23imirkin_: just remove nouveau_dri.so
17:23imirkin_: and see if that improves the situation
17:23imirkin_: rm `locate nouveau_dri.so`
17:24mvaenskae: what would it fall back to?
17:24imirkin_: well, you still get accel from the X server
17:24imirkin_: just no GL accel
17:24imirkin_: the X server's stuff is much more conservative than what the GL lib does
17:24imirkin_: and all sorts of random software has decided it'd be hilarious to use GL for rendering
17:25mvaenskae: i think it is my pdf renderer then
17:25mvaenskae: mupdf has opengl support
17:25imirkin_: right, well nuking nouveau_dri.so will avoid the situation
17:26imirkin_: if something _really_ needs GL, it'll render using llvmpipe
17:28mvaenskae: oh, i haven't included llvm support for mesa so i believe i will use other methods
17:29imirkin_: well, there's also softpipe
17:29imirkin_: which is _really_ slow
17:29imirkin_: but yea
17:29imirkin_: (and classic swrast which gets you GL 2.1)
17:29imirkin_: (but isn't quite as incredibly slow as softpipe... i think it uses rtasm?)
17:33mvaenskae: thanks for the help, i will try and see how it will perform and i did disable opengl support in mupdf
17:37mvaenskae: imirkin_: one more question though; is there a way to flush all contents of the graphical buffer?
17:37imirkin_: not sure i understand your question
17:38mvaenskae: i am not sure how a gpu works but i think that the glitching is happening because it is mapping to incorrect regions where renderings are stored
17:38mvaenskae: could i purge all non-active regions which are not used in this very instance?
17:39imirkin_: the glitching is happening because the command buffer and contexts get screwed up
17:40mvaenskae: what methods of recovery would i have?
17:40imirkin_: or sometimes restarting X might be enough
17:40mvaenskae: but while X is running nothing can be done?
17:40imirkin_: practically speaking, no
17:40gouchi: sorry to bother you I was wondering xrandr --output HDMI-1 --set "underscan hborder" X --set "underscan vborder" X with KMS
17:41gouchi: we can achieve the same result by using modetest ? https://wiki.linaro.org/xinliang/libdrm/modetest right ?
17:41imirkin_: gouchi: i think you're missing some words
17:41imirkin_: sure, modetest can set those KMS properties
17:42gouchi: imirkin_: https://01.org/linuxgraphics/gfx-docs/drm/drm-kms-properties.html because nouveau has those properties
17:42gouchi: nice !
17:42gouchi: imirkin_: thank you
17:43imirkin_: i think dithering mode and underscan are swapped in those docs
17:44imirkin_: 6bpc/8bpc makes sense for dithering mode, not underscan
17:46mvaenskae: imirkin_: thanks a lot for your help though
17:46mvaenskae: also for the quick help at that
17:46imirkin_: mvaenskae: sorry there's no great answer
17:46imirkin_: mvaenskae: in *general* fermi/kepler tend to be more stable
17:46imirkin_: but hardly perfect
17:46mvaenskae: it is good to know that there is not much hope and i will look at having no nvidia card in my next laptop
17:47mvaenskae: but as mentioned, this was a gift :)
17:47mvaenskae: i will try going with intel only from now on
17:48imirkin_: amd tends to be pretty decent too if you require something with a bit more oomph
17:49mvaenskae: i just want something to render my screen and watch some youtube vids
17:49mvaenskae: no gaming, no rendering of blender stuff
17:49imirkin_: right. don't need a lot of oomph for that.
17:51imirkin_: gouchi: note that modetest will need to become KMS master in order to modify those properties (i think)
17:51imirkin_: gouchi: i dunno what you're trying to do though
17:52imirkin_: [and i also haven't the faintest clue whether those properties persist between "masters"]
17:52gouchi: imirkin_: we tried to fix this issue https://github.com/libretro/Lakka/issues/408
17:53imirkin_: what's Lakka?
17:54gouchi: imirkin_: linux distribution based on openelec and we are using drm/kms egl on PC
17:55imirkin_: i don't think we enable underscan by default. dunno - maybe we do.
17:55imirkin_: unfortunately the bug reporter doesn't mention what GPU he has..
17:56imirkin_: anyways, it'd be trivial to write a more directed application like modetest that's just designed to set properties. modetest is really designed around testing...
17:56imirkin_: presumably you already have software which interacts with kms - should just teach it how to configure properties
17:57gouchi: I see right
17:58imirkin_: basically modetest is for "hey, i just implemented this feature, and want to test out if it works, but don't want to teach the whole stack about it quite yet"
17:58imirkin_: which means it ends up supporting fiddling with basically everything in kms :)
17:59gouchi: imirkin_: he has NVIDIA G72 (246300b1) | bios: version 05.72.22.76.00
18:00imirkin_: aha. so old-school dispnv04. super.
18:03imirkin_: looks like that starts out with overscan = "50" for TV outputs (i.e. composite/s-video)
18:04imirkin_: and there's a different tv_overscan property
18:05imirkin_: the docs you saw were for nv50+ connectors
18:06imirkin_: oh no. it's called "overscan"
18:07imirkin_: and it defaults to 50, but i haven't the faintest clue what that means
18:07imirkin_: pixels? random units?