03:05 mooch: mwk: how does the ref register work in pio mode?
03:40 imirkin: skeggsb: oh wow. we weren't doing WC on BAR2?
03:56 _xvilka_: tarragon: may be a bit late, but if your game without symbols/etc - you may want to try 1) radare2 2) frida 3) both
03:56 tarragon: _xvilka_: thanks
03:56 _xvilka_: gdb is best as a source level debugger, but hardly usable for "raw" debugging and reversing
03:57 tarragon: frida not in portage
03:57 _xvilka_: tarragon: both have irc channels on freenode
03:57 orbea: _xvilka_: ppsspp is free software, can just build it with debugging symbols
03:57 _xvilka_: tarragon: and both I recommend to use from it
03:57 _xvilka_: orbea: ah, ok
03:57 _xvilka_: tarragon: *to use from git, sorry
03:57 orbea: it's cmake, so that is not too hard
03:58 mooch: does anybody have any BIG details on pfifo's pio mode?
03:58 tarragon: _xvilka_: do you have a quick one liner to use radare2?
03:59 _xvilka_: we had a cheatsheet, 1sec
03:59 _xvilka_: tarragon: https://radare.gitbooks.io/radare2book/content/refcard/intro.html
04:00 _xvilka_: tarragon: + a migration guide from GDB/IDA users - https://radare.gitbooks.io/radare2book/content/debugger/migration.html
04:00 tarragon: oh man, that's too long, I need something to get nouveau errors.
04:00 imirkin: _xvilka_: he just needs gdb.
04:01 imirkin: some application is crashing. he needs to see where.
04:01 imirkin: no need for radare or ida
04:01 _xvilka_: no problem, I thought it was a closed-source one
04:02 orbea: well, you're kind of right...the game is not free, but the emulator it's running in, which we are debugging, is :P
04:02 imirkin:&
04:03 tarragon: I think it would be a good idea to have these debug utilities with ready examples for users to test right away.
04:03 tarragon: such as perf.
04:17 _xvilka_: btw, if you are using GDB - you may want to consider installing the voltron gdb extension to ease debugging/crash analysis with it
04:45 mooch: mwk: okay, i've got commands sending, dafuq is object class 0x00 on nv3?
05:21 mooch: mwk: i need your code for rendering primitives on nv3 again ;w;
11:22 RSpliet: skeggsb: whoa, that's a nice code-drop!
11:38 pmoreau: skeggsb: Indeed, very nice code-drop! And quite a bit of a speed-up for the resume path for imem: just a bit shy of 30x. :-D
12:12 RSpliet: yeah, wonder if there are similarly trivial optimisations possible elsewhere on the boot path
15:10 karolherbst: skeggsb: are you aware, that current envytools broke compiling the PMU sources?
15:11 mwk: karolherbst: what happened?
15:12 imirkin_: should fix envytools imo
15:12 imirkin_: depends on the issue i suppose
15:12 karolherbst: mwk: fuc5 cleanups
15:12 karolherbst: movw
15:12 karolherbst: imirkin_: I think envytools was actually fixed and the PMU source is simply broken
15:13 karolherbst: I encountered weirdness with movw before
15:13 karolherbst: and ported some source away from using movw
15:13 imirkin_: ok, so then the code should be fixed since it's broken anyways ;)
15:13 mwk: are we talking v5 source?
15:13 karolherbst: yeah, but I want mwk's opinion on that
15:13 imirkin_: i don't remember the deal with movw on fuc stuff
15:13 karolherbst: mwk: yes
15:13 imirkin_: but mwk sure does :)
15:14 karolherbst: mwk: there are many places where we use movw in fuc5 paths still
15:14 mwk: then it's most likely broken
15:14 mwk: movw doesn't exist on v5...
15:14 karolherbst: mwk: thanks for confirming
15:14 mwk: let me just look at my test harness...
15:15 mwk: yep
15:15 karolherbst: mwk: example source: https://github.com/karolherbst/nouveau/blob/master_4.13/drm/nouveau/nvkm/subdev/pmu/fuc/memx.fuc#L83
15:15 karolherbst: the else branch
15:16 mwk: if you try to use the old mov immediate opcode (which movw is an alias for), it will have the semantics of "set destination register to 0"
15:16 mwk: on v5
15:16 karolherbst: yeah, I remember
15:16 imirkin_: heh. that's not quite right ;)
15:16 karolherbst: but I think it was actually weirder than that? maybe not
15:16 mwk: that code should just be changed to use mov
15:17 karolherbst: well, guess I'll fix it today then
15:17 karolherbst: mwk: mov is fine for 0xffff values?
15:17 mwk: since the imm fits in 16 bits anyhow
15:17 karolherbst: k
15:17 mwk: for -0x8000 thru 0x7fff
15:17 imirkin_: it sign-extends?
15:17 mwk: yes
15:17 imirkin_: does envyas know that?
15:17 mwk: it should, yes
15:17 imirkin_: i.e. if you try to do mov $r5 0x8000 will it complain?
15:18 mwk: if not, we have a bug (not unlikely)
15:18 mwk: yes
15:18 imirkin_: good
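To make the range rule concrete, here is a minimal Python sketch of what mwk describes: a 16-bit immediate that the hardware sign-extends to 32 bits, so a range-checking assembler should accept only -0x8000 through 0x7fff and reject `mov $r5 0x8000`. The function names are hypothetical; this models the behaviour as stated in the chat, not envyas's actual code.

```python
# Sketch of the falcon "mov $rN imm" rule as described in the chat:
# a 16-bit immediate that is sign-extended to 32 bits.
IMM16_MIN, IMM16_MAX = -0x8000, 0x7FFF


def sext16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def check_mov_imm(imm: int) -> int:
    """Return the 32-bit register value "mov $rN imm" would produce.

    Raises ValueError for immediates a range-checking assembler should
    reject, e.g. 0x8000 or 0xffff.
    """
    if not IMM16_MIN <= imm <= IMM16_MAX:
        raise ValueError(f"immediate {imm:#x} does not fit a signed 16-bit mov")
    # Sign extension to 32 bits is just the two's-complement view of imm.
    return imm & 0xFFFFFFFF


print(hex(check_mov_imm(0x1620)))
print(hex(check_mov_imm(-0x8000)))
```

As a side effect, `sext16(0xfff3)` shows that 0xfff3, read as a plain signed 16-bit immediate, is -0xd.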
15:18 imirkin_: mwk: does mov work on v3?
15:18 mwk: ok, just tested it, it works
15:18 mwk: imirkin_: sure does
15:18 mwk: it's the exact same instruction as movw, except it verifies the argument to be in-range
15:18 imirkin_: i.e. will mov $r5 0x1620 work on both v3 and v5? so then what good is movw? oh, coz it leaves the upper bits of the dst reg untouched?
15:19 mwk: ie. "movw $r0 0xdeadbeef" will really be assembled as "mov $r0 0xffffbeef"
15:19 mwk: imirkin_: the point of movw is to be used in macros that load a full 32-bit value
15:20 mwk: "movw $r0 CONST ; sethi $r0 (CONST & 0xffff0000)"
15:20 imirkin_: right ok
15:20 imirkin_: coz you can do movw $r0 0xffff
15:20 imirkin_: but you can't do mov $r0 0xffff
15:21 mwk: yes
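The reason movw exists, per the discussion, is the two-instruction imm32 idiom. The sketch below models it under two assumptions taken from the chat: movw writes the sign-extended low 16 bits (so `movw $r0 0xdeadbeef` yields 0xffffbeef), and sethi replaces the upper 16 bits while keeping the lower 16. The helper names are made up for illustration.

```python
def movw(imm: int) -> int:
    """Model movw: same encoding as mov, so only the low 16 bits of imm
    are kept and they are sign-extended (0xdeadbeef -> 0xffffbeef)."""
    low = imm & 0xFFFF
    return (low | 0xFFFF0000) if low & 0x8000 else low


def sethi(reg: int, high: int) -> int:
    """Model sethi: replace the upper 16 bits of reg, keep the lower 16.
    `high` is written the way the macro writes it: CONST & 0xffff0000."""
    return (reg & 0xFFFF) | (high & 0xFFFF0000)


def load_imm32(const: int) -> int:
    """The imm32 idiom: movw $r CONST ; sethi $r (CONST & 0xffff0000)."""
    reg = movw(const)
    return sethi(reg, const & 0xFFFF0000)


print(hex(load_imm32(0xDEADBEEF)))
```

The sethi is what repairs the sign-extended upper half; without it, movw behaves like an unchecked mov, which is why it has no business outside such macros.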
15:21 karolherbst: mwk: so in this place it is fine to use mov without the sethi bit?
15:21 mwk: karolherbst: yes
15:21 imirkin_: gotcha. so outside of macros, movw should never be used anyways
15:21 mwk: correct
15:21 imirkin_: i thought it was somehow special
15:21 imirkin_: didn't realize it was the same op
15:21 mwk: nah, it's just an ugly assembler hack
15:22 imirkin_: so envydis should never produce a movw right?
15:22 mwk: correct
15:22 imirkin_: karolherbst: anyways, sounds like s/movw/mov/ everywhere is what we want
15:22 karolherbst: imirkin_: yeah
15:22 imirkin_: (except the imm32 macro, and maybe a couple other very odd cases)
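For the s/movw/mov/ cleanup, a rough Python sketch of a line-based rewriter. It assumes instructions appear one per line as `movw $rN <literal>`, leaves any movw immediately followed by a sethi on the same register alone (the imm32 case), and refuses to rewrite literals outside -0x8000..0x7fff, since envyas should reject those as mov operands. Symbolic immediates are not matched and stay untouched. This is a hypothetical helper, not part of the nouveau or envytools trees.

```python
import re

# Hypothetical patterns: "movw $rN <literal>" with optional indentation and
# an optional trailing // comment; symbolic operands are not matched.
MOVW_RE = re.compile(
    r"^(\s*)movw(\s+)(\$r\d+)(\s+)(-?(?:0[xX][0-9a-fA-F]+|\d+))(\s*(?://.*)?)$"
)
SETHI_RE = re.compile(r"^\s*sethi\s+(\$r\d+)\b")


def parse_imm(text: str) -> int:
    """Parse a decimal or 0x-prefixed hex literal, optionally negative."""
    neg = text.startswith("-")
    body = text[1:] if neg else text
    value = int(body, 16) if body.lower().startswith("0x") else int(body, 10)
    return -value if neg else value


def movw_to_mov(lines: list[str]) -> tuple[list[str], list[int]]:
    """Rewrite standalone movw into mov.

    Returns the new lines plus the 1-based numbers of lines left alone
    because their literal would not fit mov's signed 16-bit range.
    """
    out: list[str] = []
    skipped: list[int] = []
    for i, line in enumerate(lines):
        m = MOVW_RE.match(line)
        if m is None:
            out.append(line)
            continue
        indent, sp1, reg, sp2, imm_text, rest = m.groups()
        nxt = SETHI_RE.match(lines[i + 1]) if i + 1 < len(lines) else None
        if nxt is not None and nxt.group(1) == reg:
            out.append(line)  # low half of an imm32 load: keep movw
            continue
        if not -0x8000 <= parse_imm(imm_text) <= 0x7FFF:
            skipped.append(i + 1)  # needs a human decision
            out.append(line)
            continue
        out.append(f"{indent}mov{sp1}{reg}{sp2}{imm_text}{rest}")
    return out, skipped
```

Skipped lines such as `movw $r5 0xffff` need a manual decision: without a sethi the hardware value is 0xffffffff, which `mov $r5 -1` expresses directly.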
15:23 karolherbst: currently still busy with getting all my important macros to work on that machine :D
15:23 karolherbst: I fix it until it compiles... and then check if something is left over we have to fix
15:36 karolherbst: mwk: is it also fine if envyas generates different opcodes for earlier gens?
15:36 mwk: hm?
15:36 mwk: v3 and v5 mov opcodes are different, if that's what you mean...
15:36 karolherbst: I thought you said that mov gets compiled to movw anyhow
15:36 karolherbst: on earlier gens
15:36 mwk: yes
15:36 karolherbst: well, the generated code changed
15:36 karolherbst: I hoped it would stay the same
15:37 mwk: can you give an example?
15:39 karolherbst: mwk: https://gist.github.com/karolherbst/914b2056536e2b24dd62a261b916df24
15:40 mwk: karolherbst: the first instruction of memx_func_wait_vblank_head1 is "movw $r7 0x20", correct?
15:41 mwk: congrats, you have size-optimized the code by a few bytes
15:41 mwk: envyas always selects 16-bit form for movw, since it's such an ugly hack, but it selects 8-bit or 16-bit form for mov, whichever is shorter
15:41 karolherbst: k
15:42 karolherbst: also there is one "movw $r2 0xfff3" in com.fuc for the ce falcon
15:42 karolherbst: not that it matters, but I would like to fix it up as well
15:42 mwk: that's worrying.
15:42 karolherbst: why?
15:42 karolherbst: it is fuc3 only
15:43 mwk: ah, it's immediately followed by a sethi, good
15:44 karolherbst: can I convert that to mov $r2 -0xb?
15:44 karolherbst: or is 0xfffc fine?
15:44 karolherbst: ohh wait, you talked about it
15:44 mwk: there's a macro for that, isn't there?
15:44 karolherbst: imm32 right
15:45 karolherbst: mwk: but we don't share macros between falcons
15:45 karolherbst: so the ce one doesn't have it
15:45 mwk: meh
15:45 karolherbst: right
15:45 mwk: then let it stay as movw
15:45 karolherbst: exactly my thought
15:45 karolherbst: k
15:49 karolherbst: I guess this should be fine: https://github.com/karolherbst/nouveau/commit/82b1b7b722bc396a84d70f8bbdbdc494a2b63a06#diff-46074e1ca987aa006e3217871b21caaa
16:15 karolherbst: imirkin_: currently I am wondering, why reclocking worked that well on fuc5 hardware so far....
16:53 imirkin_: what's the op to load an imm32 on fuc5 again?
16:54 karolherbst: mov
16:54 karolherbst: you don't need to do that sethi anymore
16:54 imirkin_: oh, and movw used to select an op form that no longer exists on fuc5?
16:54 karolherbst: right
16:54 karolherbst: which set the register to 0
16:55 imirkin_: i'll reply with a better commit message :)
16:55 karolherbst: imirkin_: see my related patches here: https://github.com/karolherbst/nouveau/commits/master_4.13/drm/nouveau/nvkm/subdev/pmu/fuc
16:56 karolherbst: mhh, why didn't I write proper messages there
16:56 karolherbst: meh
17:17 naptastic: I just wanted to say, using a 4.11 Linux kernel, the tunables for fan and frequency control are freaking awesome, and the developers working on it deserve a big round of applause.
17:18 imirkin_: glad it works for you
17:19 imirkin_: depending on your GPU, you may have some benefits from updating to 4.12 to enable boosting
17:19 imirkin_: (or was that already there in 4.11? i forget)
17:19 naptastic: I've been trying to get 4.13 (because BTRFS) for a long time. So far, of the 8 hosts I manage, 4.13 has worked on zero of them.
17:20 imirkin_: as a result of nouveau, or unrelated issues?
17:20 naptastic: Which frankly might be a Nouveau problem, but I haven't isolated it because the only information I get is "loading initramfs..." and then a hard freeze. But that's not any of your problem.
17:20 naptastic: IOW, inconclusive, and if it turns out to be Nouveau, I'll come here to complain. ;)
17:21 orbea: wouldn't be before nouveau loads?
17:21 orbea: *that be
17:21 imirkin_: yeah, sounds like your initrd is messed up
18:45 Lyude: ergh, it's been a while. How do I change the current clock frequency for nouveau again?
18:45 Lyude: or perf level
18:45 imirkin_: echo foo > pstate
18:46 Lyude: but where is pstate
18:46 imirkin_: /sys/kernel/debug/dri/0/pstate
18:46 Lyude: ah, cool
18:54 orbea: Lyude: if you are having trouble remembering you could stick this in your root's .bashrc or w/e you're using. Make sure the actual values match what is in your pstate file though, I'm not sure how variable that is? http://dpaste.com/10MXTZX
18:56 tobijk: orbea: it depends, some don't have 0a, but the other cases are common imho
18:56 imirkin_: within a gpu generation it's similar
18:57 imirkin_: but it can vary a lot across gens
18:57 orbea: ah, I guess that is good to know :)
18:57 imirkin_: like nv4x's have stuff in the 2x's
18:57 tobijk: but yeah, just adapt to your pstate output :)
18:57 Lyude: orbea: thanks! although I would use grep on pstate to figure out the valid states, yeah
18:57 imirkin_: they're just id's, there's no intrinsic meaning to any of them
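Following Lyude's suggestion of grepping pstate for the valid IDs, a small Python sketch. It assumes each selectable state is printed on its own line starting with a two-digit lowercase hex ID and a colon (e.g. `0f: core ...`), and that other lines, such as one describing the current clocks, should be ignored; the exact format may differ between kernel versions. Writing the file needs root and debugfs mounted at /sys/kernel/debug.

```python
import re
from pathlib import Path

# Path from the chat; the "0" is the DRM card index and may differ.
PSTATE_PATH = Path("/sys/kernel/debug/dri/0/pstate")

# Assumed format: one selectable state per line, "<2 lowercase hex digits>:".
_ID_RE = re.compile(r"^([0-9a-f]{2}):", re.MULTILINE)


def list_pstate_ids(text: str) -> list[str]:
    """Return the pstate IDs found in the contents of the pstate file."""
    return _ID_RE.findall(text)


def set_pstate(pstate_id: str, path: Path = PSTATE_PATH) -> None:
    """Equivalent of `echo <id> > pstate`, refusing IDs the file doesn't list."""
    valid = list_pstate_ids(path.read_text())
    if pstate_id not in valid:
        raise ValueError(f"{pstate_id!r} is not one of {valid}")
    path.write_text(pstate_id + "\n")
```

Since the IDs vary across GPU generations, validating against the file's own contents is safer than hard-coding them in a shell alias.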
18:58 tobijk: imirkin_: do you have some recent piglit result around, all i found on your acc are from 2015, thats a bit old
18:58 orbea: Lyude: yea, I could make it better :)
18:58 imirkin_: tobijk: on my pc at home, yea
18:58 imirkin_: but not more recent than 2016
18:58 tobijk: mhm k
18:59 tobijk: somebody with a kepler around wanting to run piglit for once? :D
19:01 orbea: are there instructions?
19:03 tobijk: orbea: yeah, get a recent mesa, clone build and install waffle and build piglit, follow: https://people.freedesktop.org/~imirkin/
19:03 imirkin_: (and let me know if that needs to get updated)
19:04 tobijk: imirkin_: still works for me, i have just added -x fp-long-alu on top
19:05 orbea: I can try it today, but I need to eat first and then build a few things. :)
19:05 orbea: im on mesa git, but haven't updated for a week or two?
19:06 tobijk: orbea: that is recent enough for nouveau
19:06 orbea: alright, then just waffle it seems
19:06 tobijk: i just want a general overview
19:07 tobijk: orbea: yeah and a recent piglit if possible
19:07 tobijk: the test change quite a bit
19:07 tobijk: *tests
19:07 orbea: yea, i would use the master, slackware doesn't seem to provide it all yet...
19:07 orbea: and this would be with a GK110B
19:08 tobijk: orbea: that'd be nice
19:08 Lyude: to whoever documented all of the SLCG registers on nv108 that were certainly not there the last time I did a scan through the vbios repo, thank you a ton wow
19:09 Lyude: I was starting to think slcg just wasn't a thing with kepler
19:09 tobijk: karolherbst: piglit results? :D
19:09 orbea: tobijk: i'll get back to you later today with the results :)
19:09 Lyude:begins SLCG work for kepler
19:09 tobijk: orbea: no worries, take your time
19:09 karolherbst: Lyude: nice!
19:15 Lyude: mupuf: poke, do you know if the newly added SLCG register documentation on envytools for kepler (specifically nv108) has been verified?
19:16 imirkin_: i think it's mostly from nvgpu headers
19:20 Lyude: sweet
19:21 imirkin_: in practice i don't think any of this stuff can be *verified*
19:21 imirkin_: (without access to the actual docs)
19:21 Lyude: of course
19:22 imirkin_: you'd have to find the appropriate test points, if they're even exposed
19:22 imirkin_: well beyond the skillset of an average idiot software person like us :)
19:22 Lyude: test points == voltage testing points on the board, correct?
19:23 imirkin_: yea
19:23 imirkin_: more literally, "place where to stick probe"
19:23 Lyude: ah, I figured. it's extremely unlikely I'll try this (but who knows ;)) but has anyone actually found any of them?
19:24 imirkin_: not aware of it... it's all BGA nowadays anyways, this stuff is not designed to be debugged
19:24 imirkin_: they do hw debugging on specialized boards
19:24 imirkin_: where this stuff is explicitly made available for peeking and poking
19:25 airlied: where do I plug my jtag in?
19:25 imirkin_: jtag is the next level... already removed from the literal poking
19:25 imirkin_: there's some chance that jtag access is possible
19:29 tarragon: imirkin_: that's dangerously close to industrial espionage
19:32 imirkin_: tarragon: if they leave it enabled on the boards they ship, i don't see how it can be interpreted as anything but an invitation for everyone to use the feature
19:33 tarragon: joking :D
19:34 Lyude: hehe, I bet they still have the entry points somewhere based on my experiences (to clarify, none of these experiences happened with nv pre-prod hardware. i really wish they did)
19:34 Lyude: probably just need to solder something onto them
19:35 Lyude: also curious, are you guys ever planning to use meson with envytools?
19:36 mwk: use what?
19:36 Lyude: meson, the build system everyone with fdo.org has been dropping autotools for
19:36 mwk: say, why would I drop cmake for meson?
19:37 Lyude: it uses a non-Turing-complete scripting language, it's dead simple, uses ninja and is very fast. but cmake is not terrible
19:37 tobijk: Lyude: cmake works fine there, why bother working to get meson to build envytools?
19:37 Lyude: tobijk: just curious
19:37 Lyude: my scripts work with either one so i don't mind a whole ton
19:37 tobijk: Lyude: if i'm not wrong you can have at least ninja with cmake as well
19:38 Lyude: yep, that's usually what I do
19:39 Lyude: on that note, does cmake have any way to list the configuration options for a project like ./configure --help or meson configure <build_dir> does?
19:39 mwk: Lyude: any hope of simplifying our CMakeLists if they're converted to meson?
19:40 tobijk: Lyude: not sure, i just use ccmake to find out all the time :>
19:40 Lyude: mwk: most likely yeah. I'd take a look at xserver/mesa's meson.build files and see what you think
19:40 tobijk: and later have those opts use with plain cmake :>
19:40 Lyude: tobijk: that is useful to know, thanks!
19:50 Lyude: also, was there any progress on the iso-hub from anyone else while I was gone?
19:50 karolherbst: Lyude: nope
19:51 Lyude: ah, I thought I saw more documented registers but maybe I'm misremembering
19:51 karolherbst: ohh well, that might happen from time to time
19:52 Lyude: well it certainly will help once I get to it :)
19:54 karolherbst: Lyude: feel free to ping me about any review you want to get though, usually I won't work when you do, but I could take care of such stuff when I start until having lunch
19:56 Lyude: karolherbst: sure thing! to be honest I've been thinking about where I left off last time and I think what I'm going to do is try to see if I can get at least one BLCG and one SLCG implementation for kepler done so I have a good idea of how the design around nvkm will really work, e.g. if there are any other hooks I might need to add to enable/disable the various gating levels as needed
19:56 Lyude: if you still have the link around (if not I can get it again) I'll be pushing most of my wip stuff to github
19:57 karolherbst: Lyude: sounds perfect! Best to send review requests to my "new email"
19:57 tobijk: did someone already work on the proposed resizable-pci bars for nouveau? (it is not yet merged into mainline, yet looks promising) https://lwn.net/Articles/736740/
19:58 imirkin_: is that a thing on nvidia gpu's?
19:58 mwk: imirkin_: it is
19:59 mwk: to an extent
19:59 imirkin_: mwk: can you resize BAR2?
19:59 imirkin_: or whichever one the vram hole is on
19:59 mwk: BAR1
20:00 mwk: yeah
20:00 mwk: you can select from 64-512MB
20:00 imirkin_: why would it ever be != 512?
20:00 mwk: to avoid running out of 32-bit bus address space
20:00 imirkin_: oh, coz it eats up precious 4GB memory on 32-bit
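The trade-off is easy to quantify. Assuming the selectable sizes are the powers of two between the 64 MB and 512 MB mwk mentions, this computes how much of the 4 GiB 32-bit bus address space each choice would consume.

```python
MIB = 1 << 20
ADDR_SPACE_32 = 1 << 32  # 4 GiB of 32-bit bus address space


def bar1_sizes_mib(min_mib: int = 64, max_mib: int = 512) -> list[int]:
    """Power-of-two aperture sizes between min_mib and max_mib inclusive."""
    sizes, size = [], min_mib
    while size <= max_mib:
        sizes.append(size)
        size *= 2
    return sizes


for size in bar1_sizes_mib():
    share = size * MIB / ADDR_SPACE_32
    print(f"{size:4d} MiB -> {share:.2%} of the 32-bit address space")
```

At the top end a single 512 MiB aperture is an eighth of everything a 32-bit system can address, before counting RAM and other devices.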
20:01 tobijk: yeah and compatibility as i take it from the article
20:02 tobijk: but maybe skeggsb is on it already, he pushed a new "vmm" yesterday to his branch ~ 60 commits or something :D
20:03 imirkin_: i don't think that BAR sizing was in there
20:03 imirkin_: there *was* a thing where apparently half the BAR is inaccessible if some random bit is set somewhere
20:03 imirkin_: but that was a little while ago
20:04 mwk: that was BAR3 though, wasn't it?
20:04 imirkin_: could be.
20:16 Lyude: also, does anyone know if the different android versions of the tegra drivers are worth scanning through, or will I be fine with just the latest one? android-tegra-dragon-3.18-o-preview-4
20:17 mwk: Lyude: what are you doing?
20:17 mwk: the gk20a driver has removed some reg defines in newer versions FWIW
20:19 Lyude: mwk: confirming some registers that look suspiciously like BLCG and SLCG
20:19 Lyude: already got 3 new SLCG ones for kepler :D
20:19 mwk: then unfortunately you need to look further than the latest version
20:19 Lyude: no problem, that's why we have git grep :3
20:20 sooda: also note that for tegra drivers there is more than just the dragon repo
20:20 Lyude: there are a lot of new people here, huh
20:20 sooda: i mean https://nv-tegra.nvidia.com/gitweb/?p=linux-nvgpu.git;a=summary and linux-nvgpu-t18x
20:20 Lyude: sooda: that is very helpful, thanks
20:21 sooda: (surprisingly old code there though)
20:22 sooda: but r28 is the newest public apparently https://developer.nvidia.com/embedded/linux-tegra
20:22 Lyude: btw, are these all just different remotes for the same repo or different repos entirely
20:23 sooda: dragon is a fork for pixel c afaik
20:24 sooda: and others; pixel c has nouveau, not nvgpu. but it's still there i guess
20:24 karolherbst: sooda: not upstream nouveau, though, right?
20:24 Lyude: there is some downstream nouveau
20:24 sooda: not quite upstream :D
20:25 Lyude: which, brings an enormous amount of very annoyed sounding questions in my head, but I digres...
20:25 Lyude: *digress
20:28 karolherbst: sooda: thanks for reminding me though
20:28 Lyude: sooda: that's really weird... I can't seem to find any remote address to clone from here
20:29 Lyude: on the linux-nvgpu branches from nvidia I mean
20:30 karolherbst: Lyude: well it won't help you unless you have a tegra soc gpu, basically. There are some other patches, but I think most of the stuff is already fixed in upstream nouveau. I actually planned to go through all those downstream repositories and check what we can still use
20:33 sooda: i just have git://nv-tegra.nvidia.com/<reponame>.git configured
20:34 sooda: the tegra gpu is pretty much identical to the big ones if you don't count clocks and power and boot, which i believe are quite significant for you :/
20:35 Lyude: i think the power gating registers are more or less the same though
20:35 Lyude: like, the SLCG/BLCG registers I'm seeing (I think) seem to match up perfectly between kepler and nvgpu, and nvgpu lists them as whatever I thought they were (BLCG/SLCG)
20:35 Lyude: most of them anyway
20:35 sooda: regs are likely the same; how they're used by sw differs
20:36 Lyude: oh of course, but I've also got all of the mmio traces on hand so it's just a matter of copying the values
20:36 sooda: :)
20:36 Lyude: anyway, does anyone know how I can get a remote for that nvidia repo that I could actually clone in git?
20:37 sooda: i just said it
20:37 Lyude: oh
20:37 Lyude: sorry lol, didn't notice
21:12 mastermart: i am getting more paranoid when i get closer to doing some business , that everything will be stolen from me again, and forced into medical stay with court order, however RSpliet did not much elaborate what kind of scheduling he had in mind, so i go public with it, pointers can be used with constant indices which can be changed by ring methods
21:13 mastermart: so, although the reg indices of indirect movs are not changeable, then with constant literal this can be done
21:14 mastermart: i don't elaborate more on this one, i hope better brains can just google the methods
21:14 mastermart: however actually the hw circuit holds all operands and opcodes of a pc order in register file, by giving alu bus address
21:15 mastermart: to the absolute address in reg file, so when the absolute address changes, the alu refires again, so... what can be done is go slightly faster with bypassing fetch & decode circuitry
21:18 mastermart: it should though, save energy and add performance at the same time, however the code is simple but little bit weird and complex to read
21:18 mastermart: i mean thin, but complex
21:20 mastermart: so anyone wanting to know what hw exactly does then MIAOW is some of intelligence, i for instance read it all through, and hence got confirmation to my theories
21:25 Lyude: karolherbst: btw if you ever end up needing something like this, just threw together a quick script to grep through the source tree of all of the nvidia repos I've got cloned and all of their branches
21:26 karolherbst: Lyude: how many branches are there?
21:26 Lyude: a lot
21:26 pmoreau: WTF, the code with optimisations produces the correct result, but fails with no optimisations. --"
21:26 karolherbst: well right, but like 1k lot or more like 100k lot?
21:26 Lyude: I've got both the android tegra repos and nvgpu repos
21:26 Lyude: oh not THAT many
21:26 karolherbst: pmoreau: ;)
21:26 Lyude: about 20
21:26 karolherbst: pmoreau: you are not the first to say that
21:26 karolherbst: Lyude: ohh, k
21:26 Lyude: it just seems like a lot because searching through each branch takes a little bit
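Lyude's script wasn't posted, so this is only a guess at the approach: a Python sketch that lists every remote-tracking ref with `git for-each-ref` and hands them all to a single `git grep`, which accepts multiple trees and prefixes each hit with the ref it came from. The repository path and pattern are placeholders.

```python
import subprocess


def list_remote_refs(repo: str) -> list[str]:
    """All remote-tracking branches, minus symbolic */HEAD entries."""
    result = subprocess.run(
        ["git", "-C", repo, "for-each-ref", "--format=%(refname)", "refs/remotes/"],
        check=True, capture_output=True, text=True,
    )
    return [r for r in result.stdout.splitlines() if r and not r.endswith("/HEAD")]


def grep_command(repo: str, pattern: str, refs: list[str]) -> list[str]:
    """One git grep over every ref; hits print as ref:path:line:text."""
    return ["git", "-C", repo, "grep", "-n", "-e", pattern, *refs]


def grep_all_branches(repo: str, pattern: str) -> str:
    refs = list_remote_refs(repo)
    if not refs:
        return ""
    result = subprocess.run(grep_command(repo, pattern, refs),
                            capture_output=True, text=True)
    # git grep exits with 1 when nothing matched, which is not an error here.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr)
    return result.stdout
```

For example, `grep_all_branches("linux-nvgpu", "17e050")` would search every fetched branch of a clone in one pass; a single invocation avoids checking out each branch, though it still reads every tree, so it is not instant.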
21:27 pmoreau: I’m sure I’m doing the wrong thing, and somehow Nouveau manages to fix my mess, some of the time
21:27 karolherbst: pmoreau: I ran into the same issues
21:27 imirkin_: pmoreau: could be an op that's being emitted improperly
21:27 imirkin_: pmoreau: but that gets optimized out :)
21:28 pmoreau: Maybe, I need to investigate it more :-)
21:29 pmoreau: Trying to get one more test of the OpenCL CTS to pass.
21:45 pmoreau: Wow, interesting: post-RA, I get ` 30: and u32 $r2 %r79 $r2 (8)`, which ends up as `and b32 $r2 0x0 0x0`. First, no idea why I still have a `%r79` post-RA, that sounds quite bad! And how did the second $r2 become 0x0.
21:45 pmoreau: (Full debug output: https://hastebin.com/enocubowar.pl)
21:48 imirkin_: that happens when RA fails
21:48 imirkin_: or when you did something VERY weird
21:50 imirkin_: 34: ld u8 %r78 g[%r76d+0x1] (0)
21:50 imirkin_: 35: mov u8 %r79 %r78 (0)
21:50 imirkin_: i'd recommend making that second mov a mov u32
21:51 imirkin_: also ... who bothers with the and 0xff on the store?
21:51 pmoreau: I’m not even sure why I have it in the first place. Oh probably, a result from the cvt.
21:51 imirkin_: is it you or the source code?
21:53 pmoreau: I need to check; I thought it was due to a cvt that could be simplified, but there is no such cvt in the first place.
21:54 pmoreau: For the AND, I think it is because I can end up with 4 chars in the same reg.
21:55 imirkin_: ok
21:56 imirkin_: i wonder if there's a store variant to store the various bytes
21:57 pmoreau: Ah, no, I know where the mov comes from: it’s from an OpVectorShuffle instruction. Let’s see if I change the mov.u8 to a regular mov.u32.
21:57 imirkin_: those shuffle ops have always confused me so much
21:57 imirkin_: (i don't mean in spir-v, i mean in general)
21:58 pmoreau: Yeah, they are a bit confusing.
21:58 imirkin_: so there's no RIG_Node for %79
21:58 imirkin_: which is why it's not getting a reg
21:59 imirkin_: i don't think our code deals properly with u8 regs
22:01 pmoreau: It doesn’t IIRC.
22:03 mastermart: i have no reliable connection, there are many ways, in hw wavefront is allocated/scheduled and deallocated/descheduled in the resource cam tables in hw, so as most people have read some patents and science it will work such that, absolute address of units is tid+regID in the wavefront vgpr space
22:05 mastermart: i see several solutions to make the code, but i stick with couple of them for demonstration purposes
22:07 pmoreau: imirkin_: Yep, replacing the mov.u8 and not creating an 8-bit register helped. :-)
22:07 imirkin_: yay
22:07 Lyude: what was that
22:07 Lyude: was that a markov chain bot trying to look like a nouveau contributor
22:07 imirkin_: Lyude: crazy people with nothing better than to annoy us
22:08 Lyude: man, how does someone feel good with themselves about spamming support channels for an open source project
22:08 imirkin_: dunno, but it's been fueling this guy for years
22:08 Lyude: that's like feeling smug because you stole candy from kids
22:08 Lyude: heh
22:09 pmoreau: imirkin_: Thanks for the help! :-)
22:09 imirkin_: pmoreau: np!
22:10 imirkin_: pmoreau: do you have things worth upstreaming?
22:12 pmoreau: There is always the shared memory patches that are on the ML ;-p
22:13 pmoreau: But otherwise, I really want to clean up that memory management mess, simplify it and fix all those 8-bit register things.
22:14 pmoreau: I was looking at the CTS as a way to test the code more, but at the same time, it also makes me implement new features. :-D
22:15 imirkin_: hehe
22:16 imirkin_: well, i think it should be moderately easy to pipe ARB_gl_spirv through... dunno if there are CTS tests for that
22:16 Lyude: sweet, i think i found the last obvious slcg registers for kepler (17e050, 17ea98, 13cc04, 13c824)
22:16 Lyude: i think there might be one or two more blcg regs as well
22:17 pmoreau: Nice!
22:18 pmoreau: No idea for ARB_gl_spirv. Hopefully someone might implement it on AMD or Intel.
22:18 imirkin_: i think there are patches already
22:19 imirkin_: they rely on using nir to convert the spirv, but we don't have to use that
22:21 pmoreau: I would probably need to change a few things to adapt to the GLSL memory model.
22:24 mastermart: Lyude: cross-mutation of parrot and ape without brains are not welcome doing nouveaus patches, you are the reason for broken code
22:26 Lyude: imirkin_: also add *mastermart*!*@*
22:26 imirkin_: you think this is my first time?
22:26 Lyude: oh right, sorry :P
22:27 imirkin_: he'll just change nicks
22:27 imirkin_: ip addresses seems to be most reliable
22:28 Lyude: yeah. honestly we need a better network than freenode that has ops that actually do proper akills on spammers like this :\
22:28 imirkin_: he uses proxies
22:28 karolherbst: Lyude: there is always a way
22:28 imirkin_: lots of them
22:28 Lyude: i wonder how many of them aren't getting picked up by dronebl
22:29 imirkin_: somehow his comment reminded me of https://frinkiac.com/img/S12E09/1181555.jpg from the simpsons
22:30 karolherbst: Lyude: thing is, you always have to keep in mind the overall damage by overblocking. Always about finding the right balance
22:32 Lyude: karolherbst: yeah, we ended up banning someone's cell phone from ponychat a couple of times by accident...
22:32 imirkin_: it's just this one guy
22:32 imirkin_: he's quite persistent though
22:32 Lyude: they tend to be
22:33 Lyude: we've got one spammer at ponychat that still comes every now and then, and he's been doing it for years now
22:33 megari: imirkin_: Is this the same Estonian fellow who rambles on the fdo/mesa channels from time to time?
22:33 imirkin_: yes.
22:55 mwk: well, a few channels did end up with a +b *!*@*.ee because of this guy...
22:56 airlied: yeah what has estonia ever done for us!
23:17 Lyude: in demmio, what is RAMIN supposed to be?
23:17 imirkin_: vram accesses
23:18 imirkin_: BAR1 accesses
23:18 imirkin_: but demmio keeps track of the way the window is configured
23:18 imirkin_: and gives you proper vram physical addresses
23:20 mwk: BAR3 actually
23:20 imirkin_: one day i'll get it right.
23:21 imirkin_: i'll just start averaging them out and saying BAR2 :)
23:21 mwk: well
23:21 mwk: to make matters more confusing, RAMIN *was* BAR2 on NV40 :)
23:21 mwk: Lyude: for demmio, RAMIN{8,16,32} means accesses to BAR3
23:21 skeggsb: nvidia still call it bar2, it's represented as 3 because bar1 becomes 64-bit on later boards and i guess takes up two "slots"
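skeggsb's explanation of the numbering can be worked through: a 64-bit memory BAR occupies two consecutive BAR registers, so every later BAR's index is pushed up by one. The sketch assumes a common NVIDIA layout of one 32-bit MMIO BAR followed by two 64-bit VRAM apertures; under that assumption, the aperture NVIDIA calls BAR2 lands at index 3.

```python
def bar_indices(widths: list[int]) -> list[int]:
    """Map each BAR, in order, to the BAR register index it starts at.

    A 64-bit memory BAR uses two consecutive 32-bit BAR registers, so it
    pushes every later BAR up by one extra slot.
    """
    indices, slot = [], 0
    for width in widths:
        if width not in (32, 64):
            raise ValueError(f"unsupported BAR width {width}")
        indices.append(slot)
        slot += 2 if width == 64 else 1
    return indices


# Assumed layout: MMIO (32-bit), VRAM aperture "BAR1" (64-bit),
# management aperture nvidia calls "BAR2" (64-bit).
print(bar_indices([32, 64, 64]))  # [0, 1, 3]
```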
23:22 mwk: which is a BAR that is used to access VRAM, like BAR1
23:22 mwk: nvidia driver uses BAR1 for "userspace" accesses, and BAR3 [aka BAR2] for "management structure" accesses
23:23 mwk: on NV50+, the offsets in this BAR are virtual addresses translated via the GPU's MMU, which demmio attempts to decode, with mixed results [demmio attempts to keep track of page table contents based on earlier VRAM accesses... sometimes it works well, sometimes not so much]
23:24 Lyude: so is it possible some of the ramin access addresses could really just be corresponding to mmio regs since they're virtual addresses?
23:25 mwk: no
23:25 mwk: the MMU can only map virtual addresses to VRAM addresses or system RAM addresses
23:26 mwk: I mean, if you were truly evil, you could map something to a "system" address that aims at some MMIO space, but that doesn't happen in practice
23:26 mwk: and if it's mapped thru the RAMIN window, it's almost certainly mapped to some VRAM address
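The demmio behaviour mwk describes, learning mappings from earlier page-table writes and using them to turn BAR3 offsets into VRAM or system-RAM addresses, can be sketched as a shadow page table. This is not demmio's code: the 4 KiB page size, the flat dictionary, and the `observe_pte_write` interface are simplifying assumptions, and real GPU page tables are multi-level with their own PTE format. A lookup that misses corresponds to the "sometimes not so much" case, where the relevant PTE write was never seen in the trace.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SHIFT = 12  # assumption: 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


@dataclass(frozen=True)
class Mapping:
    target: str     # "vram" or "sysram": the only targets the MMU maps to
    phys_page: int  # physical page frame number


class ShadowPageTable:
    """Hypothetical tracker rebuilt from observed page-table writes."""

    def __init__(self) -> None:
        self.entries: dict[int, Mapping] = {}

    def observe_pte_write(self, virt_page: int, target: str, phys_page: int) -> None:
        if target not in ("vram", "sysram"):
            raise ValueError(f"unexpected target {target!r}")
        self.entries[virt_page] = Mapping(target, phys_page)

    def translate(self, virt_addr: int) -> Optional[tuple[str, int]]:
        """Return (target, physical address), or None if never observed."""
        mapping = self.entries.get(virt_addr >> PAGE_SHIFT)
        if mapping is None:
            return None
        return mapping.target, (mapping.phys_page << PAGE_SHIFT) | (virt_addr & PAGE_MASK)
```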
23:27 skeggsb: i don't think that actually works anyway, i believe i actually tried it for some reason i can't remember, and it failed
23:27 mwk: it works... as long as you aim it at a *different* GPU :)
23:27 skeggsb: ah, right :)
23:28 imirkin_: lol
23:28 mwk: or for pre-PCIE cards
23:28 skeggsb: yeah, i tried on the same GPU