00:00nyef: I did get that impression, yes. The problem is that it might not be doing so correctly in this case, and it only mattering for one of my four panels.
00:01skeggsb: nyef: i don't suppose tegra reveals the bits to disable hw interference?
00:01nyef: An alternate, and somewhat unlikely, possibility is that it's generating a bogus audio stream, somehow.
00:02nyef: The code for tegra seems to imply that tegra is sufficiently different that the HDA driver has to bounce the infoframe data through a couple of scratch registers and have the GPU driver configure the infoframes from that.
00:03nyef: There's a bit, I think it's bit 16, in the "generic" or "vendor" infoframe control which seems to kill the automatic audio infoframe. Sense inverts at some point, either for gk104 or gf119.
00:04nyef: But IIUC, this bit means that you then have to use the generic/vendor infoframe registers to supply the audio infoframe.
00:14nyef: Hrm. Using hda-verb to spam the AFG node shows non-zero responses on verbs 0xf82 and 0xfa1.
00:22nyef: Spammed with 0xff, same result. Spam with 0 again, get 0xffffffff responses on verbs 0xfa6 and 0xfab.
00:22nyef: ... Might be poking at the wrong codec. /-:
00:23nyef: Ah! These are a couple of the tegra registers!
00:39nyef: Well, they behave like the tegra scratch registers, at least as far as stuffing data into them and reading it back out.
00:39nyef: But, again, hacking up the infoframe seems to do a whole lot of no good.
02:37nyef: Hrm... Does nouveau have HDMI input support working on anything?
02:39dboyan: does any nvidia card have hdmi input?
02:41imirkin: don't think so
02:41imirkin: there were some boards with tv tuners
02:41nyef: I don't know if it's actually the nvidia card, or the rest of the system, but apparently it's affected by nvidia driver version...
02:41imirkin: and maybe even composite/s-video input, dunno
02:41nyef: Some alienware laptops, such as the M17xR4, do.
02:43dboyan: There is an ancient nv40 class card at my home, which has some sort of video input
02:44dboyan: but that machine hardly boots
02:44nyef: So, if I want this to have a chance at working, I can expect to be doing some RE work, huh?
02:51nyef: Overall experience with the HDMI input in windows is *rough*. As in "suspend and resume the machine in order to switch back to the integrated output" level rough.
02:52nyef: Also, sound isn't working, but I don't know if that's because I monkeyed the infoframe controls or for some other reason.
02:55imirkin: what are you doing? getting that 4th monitor to work?
02:56nyef: Sortof? Trying to find a cheap HDMI analyzer.
02:57nyef: If I have an HDMI input, and it can pick out the infoframes, then I have the hardware for an analyzer, don't I?
02:58nyef: Also, apparently this counts as a fifth panel, and possibly my second that doesn't support audio on this hardware.
03:03imirkin: does it support audio on other hardware?
03:06nyef: No luck yet.
03:06nyef: The EDID (and, therefore, ELD) claim an audio capacity, though.
03:07imirkin: are there speakers/line out?
03:08nyef: It's a laptop, so yes, speakers.
03:19imirkin: i mean on the monitor
03:22nyef: Oh, on the monitor that doesn't do audio output on gt215? Yes, it has speakers.
03:24imirkin: and have you verified that they work?
03:24imirkin: [sorry for asking dumb questions, but ... sometimes the most obvious can be elusive when you're heads down into debugging]
03:25nyef: I know. The blindingly obvious is so named because it can't be easily seen when you're trying to find it. (-:
03:36nyef: So, all five of my HDMI sinks, bar one, have been observed to have working speakers using at least one HDMI source. The exception reports that it will sink audio, and definitely has speakers, but is generally messed up and likely an entirely separate project to figure out.
03:37nyef: The MCP89, running nouveau, will source audio that works with three out of four "known good" sinks.
03:38nyef: And I have spent about a day trying to get the blob to run on the MCP89, but have not yet succeeded.
04:22Horizon_: anyone have their ham radio license?
04:34nyef: ... Figured out part of what's going on with that HDMI input: The display drivers for that machine aren't loading, so it can't resume the display from blank.
04:39nyef: On the other hand, none of the nVidia windows installers recognize the display card.
04:40nyef: (PCI ID 10de / 119a)
04:52nyef: Current plan: Reinstall windows on that machine. But not tonight.
04:56Horizon_: nyef: trying it from linux or something?
04:57nyef: That's... actually a decent point. I should also try with Linux.
04:58Horizon_: try a popular, more 'testing' type distro
04:59nyef: Clearly, I've gotten too used to stuff not working in Linux. /-:
05:01Horizon_Brave: a lot can work in linux as long as there's decent enough support for it
05:03nyef: "A lot can work in linux as long as it works in linux"?
05:05Horizon_Brave: haha, hey I didn't say it's logical, or easy
05:05nyef: Also, my record is strongly against me here: I keep ending up with hardware that *doesn't* work right in Linux.
05:06Horizon_Brave: well, you know what that means right?
05:06Horizon_Brave: write your own drivers
05:06Horizon_Brave: learn C and kernel programming
05:07nyef: Umm. I do write my own drivers. I already do C and kernel programming.
05:10nyef: Still, it just makes my point: I've gotten too used to stuff not working in Linux.
05:10Horizon_Brave: put your brain into my head
05:11Horizon_Brave: so I can learn C inside and out
06:10gnarface: even when stuff doesn't work in Linux, i still usually can find out a lot more about the hardware with the stock built-in tools than with windows
10:39karolherbst: exporting "MESA_NO_ERROR=1" globally sounds like a good idea to me
10:42mupuf: karolherbst: lol
10:46karolherbst: nothing really cares about those errors anyway. I wouldn't be surprised that the most crashing applications are those which are toolkit based with OpenGL backends
10:48karolherbst: mhh interesting, with KHR_no_error we don't need to enable those robustness bits anymore
10:51karolherbst: imirkin: do you think it is a good idea to remove those array buffer bound checks when KHR_no_error is enabled?
11:17pmoreau: imirkin, airlied: I am hoping as well for the SPIR-V stuff to be merged one day, but there isn’t much activity on it at the moment.
11:17pmoreau: Heh, it’s still even based on LLMV 3.6
11:19pmoreau: As for CUDA to SPIR-V, I am hoping you can do something like CUDA -> LLVM IR -> SPIR-V. It might need to add a CUDA extension to SPIR-V, like there is for OpenCL, and standardising that extension might be the hardest part.
12:14rxr: so yesterday, regarding the old EFI macbook, with vbios extracte fmor debugfs, on a good old CD boot, sigh:
12:14rxr: kernel: nouveau 0000:01:00.0: enabling device (0002 -> 0003)
12:14rxr: kernel: fb: switching to nouveaufb from EFI VGA
12:14rxr: kernel: Console: switching to colour dummy device 80x25
12:14rxr: kernel: nouveau 0000:01:00.0: NVIDIA G84 (084700a2)
12:14rxr: kernel: nouveau 0000:01:00.0: bios: version 60.84.49.03.00
12:14rxr: kernel: nouveau 0000:01:00.0: fb: 128 MiB GDDR3
12:14rxr: kernel: [TTM] Zone kernel: Available graphics memory: 2022792 kiB
12:14rxr: kernel: [TTM] Initializing pool allocator
12:14rxr: kernel: [TTM] Initializing DMA pool allocator
12:14rxr: kernel: nouveau 0000:01:00.0: DRM: VRAM: 128 MiB
12:14rxr: kernel: nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
12:14rxr: kernel: nouveau 0000:01:00.0: DRM: TMDS table version 2.0
12:14RSpliet: rxr don't paste in the chan like that
12:14rxr: kernel: nouveau 0000:01:00.0: DRM: DCB version 4.0
12:14RSpliet: use a pastebin website
12:14rxr: kernel: nouveau 0000:01:00.0: DRM: DCB outp 00: 01000123 00010034
12:14rxr: kernel: nouveau 0000:01:00.0: DRM: DCB outp 01: 02011210 00000028
12:14rxr: kernel: nouveau 0000:01:00.0: DRM: DCB outp 02: 02011212 00010030
12:14rxr: kernel: nouveau 0000:01:00.0: DRM: DCB outp 03: 01011211 0080c070
12:14rxr: kernel: nouveau 0000:01:00.0: DRM: DCB conn 00: 0040
12:14rxr: kernel: nouveau 0000:01:00.0: DRM: DCB conn 01: 1120
12:15rxr: kernel: nouveau 0000:01:00.0: DRM: unknown connector type 20
12:15rxr: If I can help debug anything to make this efi booted macbooks work without bios extraction please let me know
12:15rxr: sure, next time ;-)
12:50karolherbst: pmoreau: I know what you can do on the Pascal, the same I told Lyude. Basically use mupufs counter faker tool to get nvidia to upclock in little steps and trace that.
12:51Teklad: Always nice to see everyone hard at work. :p
12:51pmoreau: karolherbst: I am not sure I will have time to do that though… :-/
12:52karolherbst: no worries. If Lyude creates one such a trace, I will have enough to start already, more are just there to check for differences or for verifiying stuff
12:52pmoreau: And, starting from Saturday, I will be away from home/work for 4 months.
12:53karolherbst: already found your replacement? Otherwise you can't go you know? :p
12:53Teklad: Welp, now I need to mark up my calendar for the Pascal updates to 2 years + 4 months.
12:53karolherbst: I think "substitute" is the better word here? No idea
12:53pmoreau: Well, I will stil be working, but at a different place (for an internship), which might not have a Pascal.
12:54pmoreau: Eh, I will continue working on Nouveau! No worries!
12:54karolherbst: I thought you were slacking of for 4 months
12:54pmoreau: nooooooo! :-D
12:55karolherbst: pmoreau: would be an idea though
12:55karolherbst: taking off for 6 months and get only paid 80% for 2 years after/before
12:55pmoreau: I am planning to spend more free time on Nouveau during the internship. Especially once I got this clover pull request out. Need to talk with xexaxo1 later.
12:56karolherbst: having a job where one could work on nouveau would be perfect :( oh well
12:56pmoreau: Yeah… :-/
12:56Teklad: I'm almost curious enough to start actually trying to help.
12:57karolherbst: don't think, just do
12:57Teklad: I have a pascal... so I'm already ready!
12:58Teklad: and I can program!
12:58Teklad: Oh wait... too much excitement.
12:58Teklad: Anyhow, I don't mind helping...
12:58Teklad:has lots of free time since he lost his job yesterday
12:58karolherbst: create mmiotraces :p
12:58karolherbst: hum... just like that?
12:58vedranm: Teklad: what are your mad skillz
12:59karolherbst: you must have done something terribly silly or the company went bankrupt :O
12:59vedranm: "programming" is too generic
12:59Teklad: vedranm: I do mostly C.
12:59karolherbst: (or you live where companies can do whatever they want, and fire on a whim :p )
13:00Teklad: karolherbst: Na, the company just sucked... and yes I live in America.
13:00vedranm: Teklad: don't worry I heard there is going to be jobs, jobs, jobs now
13:00Teklad: vedranm: If you quote Trump one more time I'll probably die.
13:00karolherbst: leave until it's too late!
13:00vedranm: but seriously, driver programming might look mystic, but it's doable
13:01vedranm: I grasped parts of it within a few months
13:01karolherbst: vedranm: you make it sound as if it would be terribly hard :O
13:01vedranm: even provided some useful patches
13:01Teklad: I'm an extremely quick study to foreign things.
13:01Teklad: So it shouldn't be an issue.
13:01vedranm: well, when I was folowing ALSA devs ~2007. it looked like magic
13:01xexaxo1: pmoreau: I'm having a dull moment - what clover PR are you talking about
13:01karolherbst: code quality matters
13:01vedranm: Teklad: sounds great
13:01vedranm: did you do any hw programming?
13:01pmoreau: xexaxo1: Something I haven’t sent yet ;-)
13:01karolherbst: vedranm: I assume the Alsa code base was terrible in 2007?
13:02Teklad: vedranm: I haven't done much outside of poking at xboxdrv/xpad a while back... but thats pretty simple.
13:02pmoreau: xexaxo1: About adding SPIR-V support to clover, to be later piped to Nouveau for having OpenCL support; what I have been working on lately. I want to add an external SPIR-V validator, to avoid writing a new one.
13:02vedranm: karolherbst: no, but my coding skills were, I was one of those CS college-educated "who needs to know C/C++ well when I can do Python/PHP" kiddos
13:03xexaxo1:swipes forehead "I'm not loosing my mind, yet"
13:03vedranm: but yeah, I know most of what I know thanks to FOSS projects
13:03pmoreau: xexaxo1: But I don’t know how to add external dependencies in Mesa. And since you have been doing a lot of build related improvements, thought you could help
13:03karolherbst: vedranm: oh well, my starting position wasn't any better
13:03vedranm: CS colleges where I live teach you very little, not sure if USA is better but I hope it is given the price/year
13:03Teklad: I have absolutely 0 formal training in terms of CS.
13:04vedranm: even better
13:04Teklad: I just taught myself all the things... or poked at something until I understood it.
13:04xexaxo1: pmoreau: definitely, just poke with with a link to the external project + mesa branch and I'll look into it
13:04karolherbst: you don't learn how to code in the unviersity
13:04karolherbst: you just don't
13:04vedranm: you learn how to be smug and quote Chomsky
13:04xexaxo1: quote who?
13:04karolherbst: some guy
13:05xexaxo1:must have missed that class
13:05vedranm: some language scientist who is famous in anarchist circles
13:05karolherbst: ohhh, right
13:05vedranm: xexaxo1: formal languages and compiler design usually
13:05karolherbst: that part
13:05karolherbst: no idea why he is especially famous there
13:05vedranm: me neither, I just briefly encountered his language stuff
13:05Teklad: Linus is the most famous butthole you'll ever see.... he spits fire.
13:06Teklad: pew pew pew
13:06vedranm: not in Unis tho
13:06vedranm: in Unis professors have a different hierarchy of importance
13:06xexaxo1: vedranm: did not have either of these classes ;-(
13:06karolherbst: Teklad: Linus just shout at people he thinks deserve it :p
13:06karolherbst: and newcommer one of those
13:07Teklad: karolherbst: I think he has an anger problem.
13:07karolherbst: actually he hasn't
13:07pmoreau: xexaxo1: Oh, thanks! The tool is https://github.com/KhronosGroup/SPIRV-Tools/. Since nothing would use SPIR-V besides clover, for now, I think the linking could be only for clover. Modified version of Mesa: https://phabricator.pmoreau.org/diffusion/MESA/repository/clover_binary/
13:07vedranm: xexaxo1: tbh spending equal amount of time reading thru LLVM docs, related books and source will give you far more knowledge
13:07karolherbst: Teklad: it would be a problem if he would shout at people randomly, but he doesn't
13:07karolherbst: I doubt he shouts anyway
13:07vedranm: pmoreau: wait, is Clover now running on nouveau?
13:07karolherbst: just writing in Upper case afaik
13:08Teklad: karolherbst: Of course, I'd be an angry person too if someone told me to do really stupid things, like rewrite the kernel in C++.
13:08Teklad:loves reading his rants.
13:09Teklad: Okay, so I've never done an mmiotrace.
13:09Teklad: Quick crash course.
13:12Teklad: Oh nevermind... I found what I was looking for.
13:12pmoreau: vedranm: Yes, to some extent: I take SPIR-V as input and convert it to the IR used by Nouveau.
13:13Teklad: One sec... gonna throw my bouncer onto a tty then I'll run this.
13:14pmoreau: vedranm: So, either you need https://github.com/KhronosGroup/SPIRV-LLVM and then you can use clCreateProgramWithSource which will generate the SPIR-V, or feed it SPIR-V through clCreateProgramWithBinary, or clCreateProgramWithIL from OpenCL 2.1, which I added.
13:16vedranm: pmoreau: stock LLVM works?
13:16pmoreau: Nope, as it does not have the hability to generate SPIR-V.
13:17vedranm: pmoreau: damn, the machine where I have a 9500 is super slow and it's infeasible to compile stuff
13:17vedranm: pmoreau: what does run?
13:17pmoreau: You need the fork from Khronos. They are planning to upstream it, but it needs to be rebased on a more recent version of LLVM, and the LLVM folks wanted the current code to be a backend.
13:19pmoreau: vedranm: OpenCL-wise? Not that much. Conditionals, loops to an extent, read/writes to global/local, add/mul & co, get_global_id() & co (though, not all of them)
13:20pmoreau: Atomics, and function calling as well
13:20vedranm: all of them? even cmpswap?
13:20RSpliet: do Pascals do 64-bit integer arithmetic already?
13:21pmoreau: vedranm: Not all of them, the missing ones are: OpAtomicFlagClear, OpAtomicFlagTestAndSet, OpAtomicCompareExchangeWeak, OpAtomicLoad and OpAtomicStore.
13:21vedranm: I see
13:21pmoreau: RSpliet: I doubt it
13:22Teklad: Alrighty.modprobe nvidia_drm
13:22pmoreau: It is not in the "Throughput of Native Arithmetic Instructions" table: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#maximize-instruction-throughput
13:23RSpliet: pmoreau: it's so easy to waste time on pointer arithmetic
13:24RSpliet: converting five instances like size_t x = get_global_id(0) to unsigned int x = get_global_id(0)
13:24pmoreau: RSpliet: True, but I guess you can avoid some of it thanks to the imm in ld/st.
13:24RSpliet: saves 6% on the entire kernel
13:24karolherbst: witht hat table we could get better sched opcodes on kepler, right?
13:24karolherbst: if we assume this is right
13:25pmoreau: RSpliet: Who would go with the size_t version though? Except if you are going to launch that many threads.
13:25RSpliet: pmoreau: yeah, but if you need to calculate array offsets with them you're paying a high penalty on emulating 64-bit arith
13:25RSpliet: pmoreau: well, the specs define that the return type of get_global_id() returns a size_t object. Using unsigned int means you're doing an implicit conversion
13:26pmoreau: RSpliet: True enough
13:26RSpliet: (just pretend that was correct English)
13:27RSpliet: in my kernel that overhead was 36 instructions
13:28pmoreau: After submitting a paper last year, I realised I had some doubles which had sneaked in my CUDA code, through a define of pi… I don’t remember the exact numbers, but I wouldn’t be surprised if it resulted in ~10% improvements by switching it to floats.
13:28RSpliet: ... although I somehow can imagine they don't want a 64*64 multiplier in hw
13:30RSpliet: gheh, yeah, the FPU capable of doubles isn't replicated quite as often as the FPU for just singles oddly enough :-D
13:31pmoreau: RSpliet: Odd, isn’t it? :-D
13:32RSpliet: well, with DNN inference downscaling to 8-bit integer arithmetic... hardly :-P
13:32pmoreau: Eh eh :-)
13:33pmoreau: I would love the non-GP100 cards to get proper support for 16-bit float arithmetic.
13:33pmoreau: We will see
13:42Teklad: I got that mmiotrace.
13:47pmoreau: vedranm: BTW, I was willing to help a bit fixing bugs in clover, but I guess most of the existing ones are "specific" to Radeon/LLVM backend, rather than bugs in the API implementation?
13:50vedranm: pmoreau: hmm, lemme quickly check
13:51vedranm: pmoreau: well, a couple of them are missing functions in libclc
13:53vedranm: https://bugs.freedesktop.org/show_bug.cgi?id=93977 but I think that one is fixed
13:53pmoreau: vedranm: I have never really understood what libclc was. It seems to implement some functions, but why would the compiler not generate that code?
13:53vedranm: feel free to go through the list at https://bugs.freedesktop.org/show_bug.cgi?id=99553, which reminds me I have to continue submitting stuff I fixed
13:54vedranm: pmoreau: it could technically be a part of LLVM
13:55pmoreau: I’ll have a look at the list, thanks.
13:55karolherbst: Teklad: compress it with xz -9 and upload it somewhere
13:55vedranm: pmoreau: there is something I would love if you could look at, and it's performance _critical_
13:55pmoreau: vedranm: Ah?
13:55vedranm: we don't support async read/write in Clover
13:56karolherbst: Teklad: also get your vbios from /sys/kernel/debug/dri/0/vbios.rom when running nouveau
13:56vedranm: we do something between sync and async
13:56Teklad: karolherbst: On it.
13:56vedranm: and despite beating AMD driver in GROMACS kernel execution performance we suck in async read/write and overall we lose in ns/day
13:57vedranm: pmoreau: if you are interested I can paste an attempt which did not work
13:57pmoreau: vedranm: That could be fun to look into! :-)
13:58pmoreau: pmoreau: It should be something I can easily test with Nouveau, since I don’t have any AMD card lying around.
13:59vedranm: pmoreau: https://ghostbin.com/paste/a53e7
13:59vedranm: well, if you have read/write working, you can
14:00vedranm: (btw, on an unrelated note, I have clEnqueueFillBuffer done, I just have to clean it up and submit it)
14:01pmoreau: Thanks for the paste!
14:03pmoreau: Oh, nice for clEnqueueFillBuffer. It seems like an advanced memset, right? Rather than filling with a single byte you repeat, you fill with a multi-byte pattern.
14:03vedranm: pmoreau: yes, and I guess I could do some fancy optimization of that eventually
14:04vedranm: like copy the pattern to GPU and then do GPU-GPU memory copies
14:05vedranm: right now it's naive but it works
14:06pmoreau: Way better than not working :-)
14:12vedranm: pmoreau: true! :-D
14:13Teklad: karolherbst: http://184.108.40.206/
14:26Teklad: I think I'm back.
14:30karolherbst: Teklad: not sure yet?
14:31Teklad: karolherbst: I'm back. You get those files?
14:31karolherbst: will look at those after I arive at home
15:03dboyan: imirkin_, just scanned through the instruction scheduler in i965. Only post-RA pass takes "real" latency into account. Of the 3 pre-RA passes, 2 of them seems focusing on reducing register pressure.
15:04imirkin_: that's surprising.
15:04imirkin_: although adjusting register pressure is a big part of this.
15:04dboyan: cwabbott has a branch that make pre-RA aware of latency, haven't looked carefully yet
15:04karolherbst: you can be smart at RA time
15:04imirkin_: i.e. using more registers in some cases, or fewer registers in other cases
15:04imirkin_: no. RA is smart enough. leave RA itself alone :)
15:05karolherbst: I think we need to rewrite RA if the only comment about RA is "RA is fine, don't touch it" :p
15:06karolherbst: dboyan: why do they have 3 passes by the way?
15:08dboyan: their names are SCHED_PRE, SCHED_PRE_NON_LIFO and SCHED_PRE_LIFO, in the order of execution
15:08dboyan: oh, it seems the latter ones used to reduce register pressure is only run when spilling occurred
15:09dboyan: that's more reasonable
15:10dboyan: but the latency of each instructions is set to 1 in pre-RA
15:10dboyan: so it's mainly tracking dependencies I guess
15:11karolherbst: most likely
15:13karolherbst: imirkin: some AMD guy once told me, that it isn't important to move instructions far away on AMD hardware, maybe it's just for Nvidia where there should be big gaps between definition and usage? Maybe the one person told me wrong stuff, dunno
15:15imirkin_: well, i don't know anything about AMD hw
15:15dboyan: does AMD still do instruction reordering in HW? :p
15:15jvesely: no, I don't think they ever did
15:16jvesely: although, different pipelines are independtent so older hw could execute insturctions from one WF in parallel
15:17dboyan: yeah, that's another way to hide latency
15:17jvesely: I think most of the latency hiding is done by having 10 active wf one one SIMD unit, so you get to execute instructions only every ~40 cycles
15:22dboyan: well, from kepler on nvidia hw needs explicit sched info like "wait n cycles before scheduling the following instruction"
15:23dboyan: def-uses certainly affect those numbers
15:23imirkin_: it's more a thing with maxwell
15:24imirkin_: at least kepler has training wheels
15:24imirkin_: i.e. you can flip a switch to not feed it sched info at all
15:24karolherbst: you can also feed from stuff
15:24imirkin_: or if you get the sched info wrong, you generally don't get f'd for it
15:24karolherbst: more or less
15:24karolherbst: if you do something bad on purpose, perf can drop by 90%
15:25dboyan: then I might want to learn maxwell's term. quite hard on the first look
15:26imirkin_: perf can drop, but it still executes and the output is correct
15:26imirkin_: whereas on maxwell ... no such luck
15:26karolherbst: I need to figure out how the delay has to be calculated by the way. I want to tweak the delay calculation to adjust dual issueing a little
15:26imirkin_: hence the "training wheels"
15:27dboyan: I guess I must learn it if I were to port my fp64 code to maxwell. Pure guess works for kepler, but probably not for maxwell
15:28karolherbst: dboyan: I wanted to write a tool to compare sched opcodes with nvidia
15:28karolherbst: or do you want to write that?
15:30dboyan: karolherbst: can you detail that a bit more?
15:30karolherbst: dboyan: my first thought was to get a shader through mmt and push it through codegen and compare the sched opcodes codegen generated with the ones Nvidia put in
15:31karolherbst: but I am sure there are better ways to do that
15:31dboyan: then we need a standalone sched calculator :p
15:32karolherbst: or just put an assembler in front of codegen
15:32dboyan: yeah, I think so. asm -> sort of nv50 ir
15:32karolherbst: why "sort of"?
15:33dboyan: I mean bypassing things like opt and ra
15:33karolherbst: could be part of an API or so
15:34karolherbst: but yeah
15:34karolherbst: I think this would be actually helpful
15:34karolherbst: also for DDX code and the embedded binaries
15:35dboyan: DDX code?
15:35karolherbst: Xorg driver
15:35nyef: Device Dependent X, wasn't it?
15:35imirkin_: it was.
15:36imirkin_: well jamm was working on adding sched info to the ddx shaders
15:36karolherbst: I think I told jamm the same already
15:42dboyan: I think it's not a hard one, but it's quite a piece of work
15:43karolherbst: kind of, yes
16:02nyef: Linux (4.10-rc3+whatever I had going locally) + HDMI input: no sound, no video... but swapping back to the Linux screen works.
16:03RSpliet: blob sched codes are a RNG?
16:04RSpliet: I'm rather surprised to see that the first instruction of two programs, in both cases mov b32 $r1 c0[0x44], get different sched codes
16:04RSpliet: 0x2f in one case, 0x0 in the other
16:06karolherbst: 0x0 means: dunno
16:07RSpliet: karolherbst: it implies that to determine sched codes, the blob knows a lot about the instruction that follows
16:07karolherbst: RSpliet: it depends on the next instruction as well
16:08karolherbst: so the first sched value defines what between instruction a and b happens, the second value between b and c
16:08karolherbst: so yeah, of course it can be different if the two first instructions are the same
16:08karolherbst: cause it actually depends on the third as well
16:08RSpliet: and if b is dual-issued with c, a should consider b and c presumably
16:08RSpliet: I know
16:08RSpliet: however... wait, let me pastebin the differences
16:08RSpliet: it's interesting
16:09karolherbst: but no
16:09karolherbst: if you can dual issue c with b
16:09karolherbst: that also means it makes no difference for a if there is b or c next
16:10karolherbst: otherwise you couldn't dual issue c with b
16:10RSpliet: the only *true* difference is c0[0x60] vs c0[0x158]
16:11karolherbst: $ctaidz is important here
16:11RSpliet: yes, but those two are dual-issued
16:11karolherbst: the sched opcode is 0x0 in the first case, because the compiler doesn't know what to do with the $ctaidz mov
16:11karolherbst: because two movs can always be dual issued
16:12RSpliet: so.... ;-)
16:12karolherbst: it makes perfectly sense to me
16:12RSpliet: after the mov to r1, in *both* cases ctaidz is loaded
16:12RSpliet: in parallel with c0[0x60] vs. c0[0x158]
16:13karolherbst: with 0x44 in the first case
16:13karolherbst: the 0x04 opcode says: this insturction can be dual issued _now_ with the previos instruction
16:13karolherbst: it would make no sense otherwise
16:13RSpliet: envydocs disagrees
16:14RSpliet: 0x04 means dual-issue with the *next* instruction
16:14karolherbst: then the sched opcodes here are totally wrong
16:14imirkin_: the sched calculator is the "right" thing, afaik
16:14karolherbst: and the first has to be 0x04
16:14RSpliet: which is consistent with never finding 0x04 in the last sched slot
16:14karolherbst: there are sometimes 0x04 in the last slot
16:14karolherbst: I saw those
16:15karolherbst: or do you mean the last of the entire shader?
16:15RSpliet: no, the last of the 7
16:15karolherbst: I saw 0x04 there
16:15karolherbst: and it doesn't matter
16:15karolherbst: those groups are purly unimportant
16:15karolherbst: you can dual issue the 7th and the 1st instruction of two blocks
16:15karolherbst: no issues
16:16RSpliet: karolherbst: I'll have to believe you on your word for that, but of 37 sched blocks printed out in front of me, none have 0x4 at the end
16:16karolherbst: I thought the same ocne and wrote a patch for it
16:16karolherbst: but then I saw counter examples
16:17karolherbst: I think nvidias compiler is just optimized to only look at the groups
16:17imirkin_: afaik there's nothing special about the groups
16:18imirkin_: but i also haven't investigated the matter extensively.
16:18RSpliet: neither have I, so as I said, just have to take his word on it
16:19karolherbst: mhh, but envydocs is indeed right. it indicates that the instruction can be dual issued with the next one
16:19karolherbst: at least codegen does the same
16:19RSpliet: imirkin_: could there be a difference in issue delay for mov from c0 close to the previous vs. further away?
16:20imirkin_: there are a few things going on here
16:20karolherbst: RSpliet: let me check something
16:20imirkin_: (a) there is a difference between MOV R0, c[0x0][0x0] and LDC R0, c[0x0][0x0]
16:21imirkin_: (b) ... i should probably think of a (b) before i type (a).
16:21jamm: someone said sched blocks?
16:21jamm: is it maxwell?
16:21karolherbst: okay, if there is one MOV in a or b, it can be dual issued
16:21karolherbst: jamm: kepler
16:21RSpliet: ah... which is obscured by (my version of) demmt :-D
16:21jamm: oh, interesting
16:21imirkin_: RSpliet: entirely possible. maybe one has a b32 the other doesn't? dunno.
16:22karolherbst: imirkin: doesn't matter
16:22imirkin_: also perhaps on some isa's it's not a thing, but it is a thing on others
16:22karolherbst: imirkin: anything with mov can be dual issued though
16:22karolherbst: or is there no mov at all
16:22imirkin_: karolherbst: yeah, but LDC can't :)
16:22RSpliet: imirkin_: they both do, but the opcodes are quite different despite both printed as mov
16:22karolherbst: those are all loads
16:22karolherbst: k, got it
16:22imirkin_: RSpliet: feel free to hook up a thing to have demmt call out to nvdisasm
16:22imirkin_: i've contemplated such a thing a few times, but never got around to implementing. it'd obviously be super-slow
16:23karolherbst: "// no loads and stores accessing the same space"
16:23karolherbst: and the world makes sense again!
16:23RSpliet: karolherbst: that's why they pair up ctaidz and c0 for dual issue
16:24karolherbst: but the 0x0 is odd
16:24karolherbst: 0x0 basically means: leave me alone I don't know it
16:24RSpliet: Yeah, would've been more productive to say 0x3f I guess...
16:24imirkin_: note that ctaidz is also not a mov, it's a S2R
16:24karolherbst: or is the first one something from nouveau?
16:25RSpliet: no, both blob 367.27
16:25karolherbst: imirkin: seems to be counted as a load as well
16:30RSpliet: leaves me just wondering why they chose a different operation for the mov b32 $r0 c0[0x158] vs mov b32 $r2 c0[0x60]
16:36imirkin_: yeah, i got nothin' for that
16:37imirkin_: could be that one was originally indirect and got recorded as a LDC internally
16:37imirkin_: (but then they removed the indirection but didn't flip the op? who knows)
16:49pmoreau: If I am not mistaken, LDC needs a barrier because it does not have a fixed latency, whereas MOV does not.
17:03jamm: hakzsam: updated: https://pastebin.com/sVxiQASi. Hopefully I'll get more time to work on this in ~20 hours. Cheers :)
17:08pmoreau: RSpliet: Invalidating perf lvl entries in the VBIOS was easier than expected. :-)
17:14nyef: And... testing with the blob (on Linux 4.4.39 or so) gets me working HDMI input, at least as far as video goes. So, mmiotrace is the next step there... And it is very likely that I will not be able to use this as an analyzer, but now that I know that it exists and semi-works, I want to have it fully working. (-:
17:18hakzsam: jamm: I will annotate the paste
17:24hakzsam: imirkin: https://pastebin.com/sVxiQASi --> first ipa $r3 is useless, just noticed
17:25hakzsam: break time :)
17:28imirkin_: hakzsam: it's useful. look closelier.
17:40pmoreau: RSpliet: I got something, hopefully I didn’t make too many mistakes.
17:46pmoreau: RSpliet: https://phabricator.pmoreau.org/F129718
19:16hakzsam: imirkin: sure. it's a 64-bit addr
19:19Lyude: imirkin_: got back to working on viewport index stuff, got the test to pass!
19:19Lyude: will prep and send patches in just a little bit
19:20hakzsam: jamm: https://pastebin.com/QKKGuKfL
19:29karolherbst: oh no :(
19:30karolherbst: pmoreau: nvidia reclocks on the PMU
19:30karolherbst: but with normal SEQ scripts
19:30karolherbst: I think this will be easier than we think
19:37pmoreau: karolherbst: On Pascal?
19:38karolherbst: or everything is somewhere else
19:39karolherbst: there is nothing really going on in PCLOCK
19:39karolherbst: but the regs are still there
19:39pmoreau: So, nothing would have changed compared to Maxwell then?
19:39pmoreau: (regarding reclocking)
19:39karolherbst: the vbios is completly different
19:40karolherbst: and the volting reg moved
19:40karolherbst: they use another PWM now
19:40pmoreau: :-) So, still a few changes. But not regarding the scripts
19:41karolherbst: ohhh no
19:41karolherbst: they moved the speedo as well?
19:41karolherbst: I think they just do everything on the PMU now
19:42pmoreau: Makes sense, I guess
19:42karolherbst: Lyude: got some time and want to try something out on your pascal with nouveau?
19:44karolherbst: mhh, there isn't even a patch required, can be done through envytools
19:48karolherbst: can somebody execute this on a pascal? "nvapoke 0x021000 0x40040001 && nvapoke 0x122634 0x00000000 && nvapeek 0x0214a8 && nvapoke 0x122634 0x00000041 && nvapoke 0x021000 0x40040000"
19:49Lyude: karolherbst: sure
19:49Teklad: WARN: Can't probe 0000:01:00.0
19:49Teklad: PCI init failure!
19:50karolherbst: as root of course
19:50Teklad: Same output.
19:50karolherbst: open a root shell and try again
19:50karolherbst: Lyude: the nvapoke/nvapeek commands should be enough for now
19:52Teklad: It consistently fails for me no matter what I do.
19:52Teklad: That's odd.
19:52Teklad: While running the nvidia driver, right?
19:52karolherbst: doesn't matter
19:52karolherbst: huh wait
19:53karolherbst: the kernel needs to boot with "iomem=relaxed"
19:54Teklad: That's what I'm missing then, lol.
19:56Teklad: 000214a8: badf510c
19:56Teklad: I'm assuming that's bad.
20:02Lyude: karolherbst: just run those?
20:03Teklad: My GP106 is annoying.
20:03Teklad: It doesn't let me poke at it enough.
20:08karolherbst: ohh nice
20:08karolherbst: the PCIe bits are the same
20:14karolherbst: this will be annoying, either it's all inside the PMU or placed somewhere else
20:14Lyude: karolherbst: 000214a8: badf510c
20:15Teklad: oo, same output as me
20:15Teklad: oh... wrong channel for that
20:16karolherbst: okay, so they probably moved it or maybe it is secured? no idea how it looks like if a secured register is accessed
20:16karolherbst: but probably it's just somewhere else
20:17Lyude: karolherbst: btw, if you've got something specific we need to figure out for that i can poke around once I clear the other cards I've got on trello
20:17karolherbst: no clue yet
20:17karolherbst: this reg was used for calculating the proper voltage starting with fermi. It's basically some kind of quality factor fused into the GPU at production time? or something like that
20:18karolherbst: basically the higher the value was, the lower the required voltage got
20:18karolherbst: super important for proper engine reclocking
20:20karolherbst: okay, so step one is propably to look at the vbios tables until they make sense or to figure out how the voltage is doing? mhhh okay, another idea
20:21karolherbst: Lyude, Teklad: nvapoke 0x20340 8 please
20:21karolherbst: ohh no
20:21karolherbst: nvapeek 0x20340 8
20:21Teklad: 00020340: 000000a0 00000034
20:21karolherbst: okay, cool
20:24karolherbst: does nvidia-smi prints the voltage?
20:24karolherbst: I highly doubt that, just asking just in cae
20:26karolherbst: Teklad: okay, could you put some load on the GPU and do the peek again?
20:26Teklad: karolherbst: Sure.
20:26Teklad: 00020340: 000000a0 0000004f
20:26karolherbst: that's more like it
20:27karolherbst: so the volting actually stays the same
20:27Teklad: Maybe... I could try a heavy load
20:28Teklad: Same output
20:28Teklad: So yea it does.
20:28Teklad: Thanks openmw!
20:29karolherbst: nvapoke 00020344 00000040 && nvapeek 00000040
20:30karolherbst: nvapoke 00020344 00000040 && nvapeek 00020344
20:30Teklad: [root@crethias test]# nvapoke 00020344 00000040 && nvapeek 00000040
20:30Teklad: 00000040: bad00200
20:30karolherbst: yeah I know, the peek was wrong
20:31Teklad: [root@crethias test]# nvapoke 00020344 00000040 && nvapeek 00020344
20:31Teklad: 00020344: 00000034
20:31karolherbst: do it a few times?
20:31Teklad: Its the same... except!
20:31Teklad: When I put some load on the GPU
20:32Teklad: [root@crethias test]# nvapoke 00020344 00000040 && nvapeek 00020344
20:32Teklad: 00020344: 0000004f
20:32karolherbst: this is bad
20:32karolherbst: this is super bad
20:32karolherbst: mupuf: ^^ :( :(
20:32karolherbst: screw those guys .... :/
20:32Teklad: What did they do?
20:32karolherbst: I think they secured the voltage register
20:33Teklad: That sounds... painful.
20:33karolherbst: maybe they secured everything now
20:33Teklad: I don't see the need to secure everything like that.
20:33Teklad: It's a little excessive.
20:34karolherbst: Teklad: this might crash your machine though: nvascan 00020344
20:34Teklad: Good thing I'm not doing anything important.
20:35Teklad: [root@crethias test]# nvascan 00020344
20:35Teklad: 020344: 00000034 00000034 00000034
20:35karolherbst: Lyude: some register can only be changed by secured falcons booted in LS/HS mode
20:35Lyude: also, just setup a dedicated machine for doing this pascal stuff
20:35karolherbst: light secure and high secure
20:35Teklad: With load:
20:35Teklad: [root@crethias test]# nvascan 00020344
20:35Teklad: 020344: 0000004f 0000004f 0000004f
20:35karolherbst: Lyude: basically HS falcons can boot other falcons into LS mode
20:36karolherbst: crap, this is bad
20:36pmoreau: karolherbst: Only having the fan secured was weird, so they wanted to balance that? :-/
20:36karolherbst: pmoreau: apperantly?
20:37Teklad: So not only are we going to have to reverse engineer it... basically going to have to decrypt the entire thing too...
20:37karolherbst: moke like getting nvidias private key
20:37Teklad: Ew... that's like trying to steal someone's ssh key.
20:37Teklad: sounds evil
20:37pmoreau: Except if they release the firmwares… some time before the end of the universe
20:38Lyude: thanks, nvidia
20:39Lyude: so there's definitely nothing else we can do here?
20:39Teklad: Brute forcing the key would probably take end of the universe - 1 year.
20:39Teklad: So there's that.
20:39karolherbst: we could try to get the PMU image from nvidias binaries
20:40Teklad: karolherbst: You could just email nvidia about it.... I'm sure they'll be nice enough to supply the information.
20:41karolherbst: :/ meh that is no fun
20:41karolherbst: I don't want to simply have to re some silly PMU interface :D
20:42imirkin_: hakzsam: well, it's a 2d coordinate, so it's x and y. the $r3 is the "y".
20:43hakzsam: should be displayed as $r2:$r3...
20:44Teklad: karolherbst: I highly doubt they'd be so nice about it anyways... companies enjoy being super private about all the things.
20:44imirkin_: hakzsam: it isn't though, for the gm107 envydis
20:44hakzsam: imirkin: yeah, I know
20:44imirkin_: hakzsam: because it's modelled after nvdisasm which shows no such thing
20:50karolherbst: bad nvidia
20:50karolherbst: very bad
20:51Lyude: also, I am assuming the 4f in the output of those commands means it's locked?
20:52karolherbst: 0x4f is the duty of the PWM
20:52karolherbst: 0xa0 being the div
20:52imirkin_: nvascan writes 0x000 and then 0xffff and then reads back and sees which bits got set
20:53karolherbst: Lyude: basically the 0x4f means the PWM is putting 0x4f/0xa0 of the dynamic voltage part on top of the fixed based part
20:53karolherbst: for me it is 0x60 and 0x26
20:53imirkin_: 0xbadf.... means that the read failed (and an error was reported to the bar unit or something)
20:53karolherbst: which is 0.2375V + 0x6V (base) => 0x8375V
20:54karolherbst: no idea what is the base and dynamic part on your GPUs
20:54karolherbst: it's inside the vbios somewhere
21:02karolherbst: I think I found it
21:02karolherbst: it just doesn't make sense
21:11Lyude: imirkin_: patches posted
21:27mardinator: mart says , he is leaving the scenes, i still had some couple of braver ones who did continue chatting with me in pm
21:29mardinator: overtried couple times, but have not got much to read anymore those days, gui guy i was too with nano editor, some brain damage did not get me close to using proper editor even
21:38mardinator: if i did not use proper editor, then yeah i'd say actually instead of practician i worked harder as theoretician, because theory comes first, but as late go when finally theory came together i no longer even care about practice
21:45mardinator: yeah i'd describe as ten year old i remember i would had had similar hunches if given a chanche to read, it is like you see right away that is suppose to work so that you got to read, you bet that it works so as ten year old, and as a 15year old going over thousounds of patents you can say to others how it works, but the pitty is that i am 34 now instead
21:46mardinator: so my life went all wrong, so bye
21:47imirkin_: oh. well that solved itself nicely.
21:57Lyude: imirkin_: jfyi, didn't add the signed-off-by since you hadn't reviewed it at all
21:58Lyude: imirkin_: sorry, for the patch adding amd_vertex_shader_layer/viewport/etc stuff that I just sent out
21:58imirkin_: right, where you put yourself as the author of my patch? :p
21:59imirkin_: [i'm not particularly sensitive about this stuff, was just amusing more than anything]
21:59Lyude: hehe, i didn't think you would be
21:59Lyude: just wanted to clarify
21:59Lyude: will respin the patches in a little bit
21:59imirkin_: k. did my comment on the first one make sense?
22:00imirkin_: i think it just needs to be a 1-line diff, where you add the && type == GP to the condition for not doing the viewport export.
22:03Lyude: ah, cause the vertex shader sets gl_ViewportIndex which then gets passed to the fragment shader, so we aren't technically setting the viewport in the fragment shader
22:03Lyude: just changing it so it doesn't do the emit stuff for the vertex shader and outputs the viewport correctly right?
22:33imirkin_: viewport index is an input into the rasterizer
22:33imirkin_: so it has to be provided pre-frag shader :)
22:33Lyude: yay, i'm understanding things
22:38nyef: ... I'm looking through a demmio'd mmiotrace log, and it's kindof choked with timer interrupts and such, and I'm not entirely certain what I'm looking for. Is there a "smarter" tool for processing these traces that can say "$n$ timer interrupts, now one of the GPIO interrupts went off", basically do some filtering and picking out the potentially interesting bits from the regular heartbeat bits?
22:38imirkin_: sometimes i grep -v certain things (like TIMER)
22:39imirkin_: i'm not aware of a tool substantially smarter than demmio
22:41nyef: I might have to try and write something, then.
22:42nyef: I wonder if I'd find anything useful by applying probablistic graphical model techniques to a trace, or a collection of traces?
22:54PyroSamurai: pmoreau: seems I have trouble crossing paths with you. I'm here to help the OpenCL implementation. Awaiting orders so I can start learning and working on it.
22:54nyef: I definitely don't have the patience tonight to try and figure out how the HDMI input switchover works on this machine, even with what should be a sufficient trace.
22:55imirkin_: Lyude: that's a lot of S-o-b's ... i'll fix it up though :)
22:59RSpliet: PyroSamurai: pmoreau is currently in Sweden
22:59RSpliet: on his way to France next week, that's CEST
22:59imirkin_: PyroSamurai: may be good if you stated your TZ as well.
23:01PyroSamurai: he is 6hrs ahead of me then. I have family in Denmark so yeah, we have a fairly small window to speak.
23:01imirkin_: not 100% sure this is right, but just browsing around on his phabricator instance, i found this: https://phabricator.pmoreau.org/diffusion/MESA/history/clover_binary/
23:04PyroSamurai: I saw that link in the chat logs as well. Doesn't actually help me understand what that repo is. Still reading the backlog btw.
23:19jamm: hakzsam: thanks a lot! will update the patch again and send it.