01:15 nyef: ... and the penny drops: m->v.blankus is one of the bits that I tweaked with the HDMI 3D stuff... And then while reviewing the patch I see that I used the "wrong" commit comment style.
02:15 dboyan_: seems that pascal is using GP104_COMPUTE register space to hold launch descriptor
02:18 imirkin: dboyan_: yeah, pascal compute is a bit different. i think skeggsb was going to figure it out later. it's semi-documented in nvidia's stuff
02:24 imirkin: specifically the bits over at ftp://download.nvidia.com/open-gpu-doc/ -- compute class and qmd
02:36 dboyan_: imirkin: Found it ftp://download.nvidia.com/open-gpu-doc/Compute-Class-Methods/1/clc1c0.h
02:36 dboyan_: "LoadInlineQmdData"
02:38 dboyan_: A space of size 0x100
02:56 imirkin: ah makes sense - load the compute descriptor in there
02:56 imirkin: instead of giving an address to it in memory
03:21 dboyan_: imirkin: The strangest thing is that I can't find where it launches compute. It's no longer using 0x2bc
08:49 Marex: karolherbst: the link is in 2.5 GT/s x4 still, maybe I screwed something up, but I don't think so
08:49 karolherbst: Marex: did you change the pstate?
09:07 Marex: karolherbst: ha
09:07 Marex: karolherbst: sec
09:09 Marex: karolherbst: yeah, that puts me at 5GT/s x4
09:13 karolherbst_: Marex: nice
09:13 karolherbst_: Marex: mind checking the Xrandr offloading perf with this?
09:14 karolherbst_: I doubt it will be perfect, but it would be good to know how big the overhead actually is
09:14 karolherbst_: ~1.2GBit/s are required, and now we are at 2GBit/s
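A rough back-of-the-envelope check of the link numbers above, assuming the 8b/10b line encoding used by PCIe 1.x and 2.x (10 bit times on the wire per payload byte) and ignoring packet overhead. The function name is made up for illustration. The results come out in bytes per second, so the figures quoted in the chat line up with GB/s rather than Gbit/s: 2.5 GT/s x4 falls short of ~1.2, while 5 GT/s x4 clears it.

```python
def pcie_payload_gbytes_per_s(gt_per_s, lanes):
    """Usable bandwidth of a PCIe 1.x/2.x link in GB/s, before protocol overhead."""
    # 8b/10b encoding: every payload byte costs 10 bit times on the wire.
    return gt_per_s * lanes / 10.0


for speed in (2.5, 5.0):
    bandwidth = pcie_payload_gbytes_per_s(speed, 4)
    print(f"{speed} GT/s x4 -> {bandwidth:.1f} GB/s")
```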
09:16 Marex: hrm, this 4.10rc6 has some odd bugs ...
09:16 Marex: I triggered two by now
09:20 karolherbst_: :(
09:21 Marex: they might be fixed by now ... we're in 4.11rc5 I think
09:21 karolherbst_: might be
09:21 Marex: but if I use xrandr, the kernel crashes (I have the backtrace)
09:21 Marex: if I drag windows across GPUs in xinerama setup, the kernel also crashes
09:22 Marex: the computers hate me, nothing ever works and my user experience is abysmal :(
09:22 Marex: I don't quite understand how other people do it that the computers work for them
09:33 skeggsb: have you put this backtrace anywhere?
09:45 karolherbst_: Marex: yeah, xinerama is kind of each GPU does its own thing and moving stuff from one GPU to the other one is kind of messy
09:45 karolherbst_: Xrandr doesn't do this, so it's simpler in its design
09:59 Marex: karolherbst: all right, gotta do some real work now
09:59 karolherbst: okay
12:36 tarzeau: of 100 workstations, i've switched 30 to nouveau, and it's all great! :)
12:37 tarzeau: some can't be switched because they need cuda, right?
12:41 RSpliet: tarzeau: correct, Nouveau as it stands does not support Cuda or OpenCL
12:41 RSpliet: the latter is a WIP, but by a hobbyist who doesn't do this full-time ;-)
13:01 pmoreau: tarzeau, RSpliet: Once OpenCL 1.0–1.2 is reached (that’s not going to happen overnight), I would be interested in adding support for CUDA.
13:01 RSpliet: pmoreau: Is there a Cuda->SPIR-V pass?
13:01 pmoreau: Nope :-D
13:01 tstellar: pmoreau: I've started prototyping a cuda state tracker.
13:01 RSpliet: tstellar!!!
13:01 imirkin: there's a cuda -> ptx though afaik
13:02 pmoreau: tstellar: Oh, really?! Cool stuff! :-)
13:02 RSpliet: tstellar: where you at nowadays? No longer with AMD right?
13:02 karolherbst: tarzeau: if you want, we could prioritize implementing a feature to reduce power consumption a little
13:03 pmoreau: imirkin: And CUDA -> LLVM IR as well, I think. So it might be possible to have CUDA -> LLVM IR -> SPIR-V, but it will most likely need an extension of SPIR-V to support all the CUDA stuff, like there is an extension of SPIR-V for GLSL 450 and OpenCL.
13:04 imirkin: ah. well i haven't the faintest clue what CUDA actually supports, and how it differs from OpenCL. i know the language is quite different, but no clue featurewise.
13:04 pmoreau: tstellar: Do you have links to your prototype? Not that I have time to play with it (especially at the moment: time to sleep would be great already).
13:04 tstellar: pmoreau: I have no hardware, but I'm doing this to learn how clang's cuda frontend works. I'll probably stop once I get enough functions stubbed out to link against a cuda program.
13:05 pmoreau: tstellar: Ok. I wanted to help on clang’s CUDA frontend, add support for textures and surfaces, but… lack of time as always
13:05 karolherbst: ohh I have an idea: we should put such quotes on the nouveau website: "of 1000 workstations, i've switched 900 to nouveau, and it's all much better now!!! :)"! :O
13:05 tstellar: RSpliet: Red hat.
13:05 tstellar: fwiw I think cuda->ptx would be best/easiest.
13:06 RSpliet: tstellar: hmm... aren't there cuda programs out there that already contain ptx? I guess you'd need a ptx->${gallium_supported_ir} pass at some point
13:07 imirkin: and then ptx -> spir-v or something :)
13:07 pmoreau: imirkin: IIRC, some shuffle functions between lanes in a warp, possibility to inline PTX assembly (though, we shouldn’t need to care about that I think)
13:07 imirkin: pmoreau: ah, i bet those get added as spirv exts soon enough
13:07 tstellar: RSpliet: Yes, when clang compiles cuda programs, they contain what I think is native NV ISA and also the PTX.
13:07 imirkin: they already exist as exts to glsl
13:07 pmoreau: Being able to understand PTX is probably needed, as the PTX is usually included within the app if I am not mistaken
13:08 pmoreau: Right
13:08 imirkin: tstellar: afaik nothing generates native ISA except nv tools and nouveau
13:08 tstellar: imirkin: Yeah clang invokes ptxas as its assembler step.
13:09 pmoreau: imirkin: Maybe? I know that the shuffle functions available in OpenCL are a subset of those available in CUDA, so it might be the same for SPIR-V?
13:09 imirkin: tstellar: oh heheh, ok. that's cheating :)
13:09 dboyan: pmoreau: I guess the GL counterpart is ARB_shader_ballot, but the functionality there is quite limited
13:09 tstellar: imirkin: It depends on the nv tools for that and also some of the fatbinary creation.
13:09 imirkin: dboyan: no. there's a NV_something
13:10 imirkin: which exposes all the SHFL variants
13:11 dboyan: NV_shader_thread_shuffle?
13:11 imirkin: sounds right
13:11 pmoreau: Ah, indeed
13:12 pmoreau: Not a SPIR-V extension, yet
13:13 imirkin: all in good time :)
13:13 dboyan: well, the ARB_shader_ballot is a somewhat compromised thing i guess
13:13 imirkin: ARB_* is stuff that works for multiple vendors (&& is approved by the ARB)
13:13 pmoreau: It is equivalent to the GLSL counterpart from what I have seen
13:14 dboyan: I found readFirstInvocation is implemented as readInvocation(ffs(ballot(true)))
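A small software model of that lowering, assuming a 32-lane warp. The `Warp` class and its method names are hypothetical stand-ins for the GLSL built-ins, and `find_lsb` is 0-based here (a POSIX-style 1-based `ffs` would need a `- 1`). It only simulates the semantics on the CPU and says nothing about how the hardware or the compiler actually emits it.

```python
WARP_SIZE = 32


def find_lsb(mask):
    """0-based index of the lowest set bit, or -1 if mask is 0."""
    return (mask & -mask).bit_length() - 1


class Warp:
    def __init__(self, values, active_mask):
        assert len(values) == WARP_SIZE
        self.values = values            # one value per lane
        self.active_mask = active_mask  # bit N set => lane N is executing

    def ballot(self, predicate):
        """Bitmask of active lanes for which predicate(lane) is true."""
        mask = 0
        for lane in range(WARP_SIZE):
            if (self.active_mask >> lane) & 1 and predicate(lane):
                mask |= 1 << lane
        return mask

    def read_invocation(self, lane):
        return self.values[lane]

    def read_first_invocation(self):
        # ballot(true) is just the set of active lanes; its lowest set bit
        # identifies the first active lane. At least one lane is assumed active.
        first = find_lsb(self.ballot(lambda lane: True))
        return self.read_invocation(first)


warp = Warp(list(range(100, 100 + WARP_SIZE)), active_mask=0b1100)
print(warp.read_first_invocation())  # lane 2 is the first active lane
```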
13:15 pmoreau: tstellar: Do you have a public repo with your WIP?
13:16 tstellar: pmoreau: Not yet.
13:16 pmoreau: ok
13:17 dboyan: imirkin: I found there is no GP104_COMPUTE.LAUNCH in my pascal compute traces. But there is only a GP104_COMPUTE.GRAPH.SERIALIZE and memory barrier after that. no idea how it launches the compute program
13:18 imirkin: it must write some address somewhere
13:18 pmoreau: Might be that with the pre-emption capabilities they added to Pascal, they had to rework how they launched compute programs
13:18 imirkin: i think there's a ring of these descriptors
13:18 imirkin: they might reference one another
13:20 dboyan: one interesting thing is that an address is still set for the launch descriptor even though it is passed inline
13:21 dboyan: but I've never seen the address elsewhere
13:22 imirkin: could be in some ioctl, written somewhere else, or ... dunno.
13:36 karolherbst: dboyan, imirkin: another GSoC proposal wants to deal with stuff like this. Please talk with that person first before starting to implement/work (on) stuff
13:49 skeggsb: dboyan: the standard way of launching compute works on pascal still fwiw, i've done it
13:49 skeggsb: doesn't mean there's not new ways though
13:51 dboyan: then compute should not be hard on pascal
13:51 dboyan: though the way blob launches compute is beyond my understanding
13:52 skeggsb: no, it shouldn't be. just adding the new qmd format would be the biggest chunk of work
13:52 skeggsb: iirc pascal extended the isa a bit for fp16 too, which could be worth reverse-engineering
13:53 skeggsb: i modified a simple libdrm-based test app meant for kepler to work on pascal without too much effort
13:53 dboyan: that's cool
13:59 skeggsb: dboyan: oh, i was also using gp100... it's not inconceivable that the subsequent pascal boards change too
14:00 skeggsb: nvidia has a bit of a history of the "0" boards being slightly different to the rest of the series
14:01 imirkin_: and slightly powerhungry :)
14:04 dboyan: mine is a gp107, I guess I still need to hack the source before compiling drm-next
14:05 skeggsb: i'll try and push support for that tomorrow too
14:05 skeggsb: make sure you get gnurou's updated linux-firmware too
14:05 dboyan: oh cool
14:05 skeggsb: the original gp107 fw was rejected by the hw
14:05 skeggsb: i just have to find out why suspend doesn't work on mine, and it's ready to go
14:06 dboyan: But I'm reluctant to use new kernels recently. 4.10 breaks my intel card
14:06 dboyan: but i might give drm-next a shot
14:07 imirkin_: dboyan: do the intel folk know?
14:08 dboyan: i submitted a report yesterday, but they haven't responded yet
14:08 imirkin_: ah, submitting reports to their bugzilla is sometimes ... less-than-effective.
14:08 imirkin_: i like to ping people on #intel-gfx
14:09 imirkin_: the problem is that they get tons of idiots submitting really poor reports, and it's hard to tell who's who :)
14:10 dboyan: okay, i'll do that some time tomorrow
14:38 leberus: hi!
14:44 Satchelboi: karolherbst: Finally up, didn't get a ton of time yesterday to look at patches, but I think I have an idea
14:45 karolherbst: Satchelboi: nice :)
14:52 Satchelboi: I don't have a ton of time right now between class projects, so I'm going to look through and do some style fixes to help cut space
14:54 karolherbst: Satchelboi: okay
14:58 karolherbst: Satchelboi: we can think of other things later after you got your first patch(es) merged to kind of bring you to a level where you can actually work on actual stuff. I don't think that having only a "style fix" patch prior to beginning the GSoC project is a good starting point
14:58 karolherbst: if that's woky for you
14:58 karolherbst: *okay
14:59 Satchelboi: karolherbst: If there's something you think would be better please tell me. I didn't have much time yesterday either to check options, so i just went with the low hanging fruit
15:00 karolherbst: Satchelboi: I was actually talking of something you could also do after that. I will try to think of something until tomorrow, but I was also thinking maybe you have enough time before the GSoC project to work on some other things already
15:03 Satchelboi: That's an option yeah. I'll still need to make the deadline for the test patch however too
15:04 dboyan: imirkin_, if we want to handle 64-bit system values (in ARB_shader_ballot), we just need to OP_MERGE the high and lo part of that value. Is this right?
15:04 karolherbst: Satchelboi: yeah, it was just a thought on my side
15:05 dboyan: imirkin_: The high part is always 0 since warp size is 32 though
15:38 imirkin_: dboyan: no need to merge... just set the high bits to 0 :)
15:38 imirkin_: i.e. treat it as 2 distinct values
15:41 dboyan_: imirkin_: I mean here: https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp#n2044
15:41 imirkin_: oh, just do ... if sysval is one of those and idx > 0, return 0
15:41 imirkin_: just like the threadid logic
15:42 imirkin_: er, s/idx/swz/
15:43 dboyan_: um, okay. I'm not really familiar with tgsi things, I guess.
15:44 imirkin_: well, there are 4 components for each slot, xyzw
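A toy model of that suggestion, assuming the ballot-style system values are exposed as 64-bit masks while the warp is only 32 lanes wide, so only the first component (x, swizzle 0) carries data. The sysval names, the function and the `read_component` callback are hypothetical and do not mirror the actual nv50_ir_from_tgsi.cpp code.

```python
COMPONENTS = "xyzw"  # each slot has four components, selected by swizzle 0..3

# Illustrative names only; the real enum values live in the TGSI headers.
MASK_SYSVALS = {"EQ_MASK", "GE_MASK", "GT_MASK", "LE_MASK", "LT_MASK"}


def fetch_sysval(name, swz, read_component):
    """Return the value of one component of a system value.

    read_component(name, swz) stands in for whatever actually reads the
    hardware value for that component.
    """
    if not 0 <= swz < len(COMPONENTS):
        raise ValueError(f"bad swizzle {swz}")
    if name in MASK_SYSVALS and swz > 0:
        # High half of a 64-bit lane mask on a 32-wide warp: always zero,
        # so no merge of low/high parts is needed.
        return 0
    return read_component(name, swz)


# Usage: pretend the hardware returns 0x0000000f for every read.
print(hex(fetch_sysval("LT_MASK", 0, lambda n, s: 0xF)))  # low 32 bits
print(hex(fetch_sysval("LT_MASK", 1, lambda n, s: 0xF)))  # forced to zero
```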
19:17 mooch: um wtf is this repo?: https://github.com/envytools/scans
19:17 mooch: there's nothing in it, AND no description
19:17 imirkin_: probably for mwk's scans
19:17 mooch: oh, and it hasn't been updated since 2014
19:18 mooch: can i just delete this?
19:18 imirkin_: of registers, object availability, etc
19:18 imirkin_: if you didn't create it, don't delete it.
19:18 mooch: then why is there nothing in it? lmao
19:18 mooch: fine
19:19 * RSpliet grumbles a little bit
19:20 RSpliet: imirkin_: before I mess with the 8800 I'm first shuffling some of the GDDR5 training code around. Seems that NVA3 needs it too... mostly
19:20 imirkin_: yay
19:20 mooch: tbh i actually found some vp1 test failures on nv4e
20:15 Lyude: hm, so I don't think the GM204_3D.LAYER = { IDX = 0 | USE_GP } has anything to do with https://trello.com/c/IAqZXMmt/154-gm200-amd-vertex-shader-layer-viewport-index
20:16 Lyude: got curious and compared the register dumps from that shader to the ones for fill_rectangle and that gets written to the same value there despite definitely not using that extension
20:17 Lyude: imirkin_: ^ does that sound at all right from your investigation of this?
20:18 imirkin_: Lyude: it has to do with gl_Layer, but not with gl_ViewportIndex
20:18 imirkin_: if you don't set it to USE_GP, it'll always use a zero layer
20:18 imirkin_: that said, the blob *always* enables USE_GP
20:18 imirkin_: whereas nouveau does not
20:18 Lyude: jfyi, your patch doesn't seem to enable USE_GP either ;P
20:18 imirkin_: but i can assure you, if you disable USE_GP, then it won't work
20:18 imirkin_: it used to.
20:19 Lyude: hm
20:19 karolherbst: what was the last nvidia driver supporting Tesla? 340 or 304?
20:19 imirkin_: + BEGIN_NVC0(push, NVC0_3D(LAYER), 1);
20:19 imirkin_: + PUSH_DATA (push, prog_selects_layer ? NVC0_3D_LAYER_USE_GP : 0);
20:19 imirkin_: karolherbst: 340
20:19 imirkin_: karolherbst: 304 was for GeForce 6/7 series
20:19 karolherbst: ohh, k
20:19 imirkin_: Lyude: so yeah. check nvc0_layer_validate in my patch.
20:20 karolherbst: nice, vbios REing today then!
20:20 karolherbst: ohh wait, no useable internet, ups
20:20 imirkin_: why do you need internet?
20:20 karolherbst: download packages
20:20 karolherbst: I have no internet connection at my apartment yet
20:20 karolherbst: and the wifi on my mac mini doesn't work
20:21 imirkin_: Lyude: iirc with that, gl_Layer did work. but gl_ViewportIndex needed some additional kick in the butt, and i never found which butt to kick
20:21 karolherbst: oh well, then I RE that pcie speed byte on my kepler for now
20:22 Lyude: imirkin_: yeah, according to this prog_selects_layer evaluates false so I wonder if it's just not enabling the register when we thought it was. What is the hdr thing in nvc0_program though?
20:22 imirkin_: Lyude: uhm, what? in which example is it not true?
20:22 imirkin_: the hdr is ... the shader header
20:22 imirkin_: that bit is the "are we outputting a layer" bit
20:23 Lyude: ah
20:24 Lyude: hold on, I remember I had to modify your patch a bit to get it to apply so I think I might have just missed something in the process
20:25 imirkin_: pastebin a diff
20:25 imirkin_: and tell me what's not working, specifically
20:28 Lyude: imirkin_: my current diff https://paste.fedoraproject.org/paste/w8ySEgP43RPlap9CdI24NF5M1UNdIGYhyRLivL9gydE= for the mesa patch, although i also had to modify the shader_test to get it to run (made it similar to work/piglit/tests/spec/arb_fragment_layer_viewport/viewport-vs-write-simple.shader_test): https://paste.fedoraproject.org/paste/oEaSGxAF7-O3nAFZ-XkapF5M1UNdIGYhyRLivL9gydE=
20:28 imirkin_: so... that shader doesn't write a layer, right?
20:28 Lyude: the second shader test wouldn't run without it
20:29 Lyude: no, but it fails on nouveau
20:29 Lyude: and def not the blob
20:29 imirkin_: but why would you expect prog_selects_layer to be true?
20:29 imirkin_: either way, feel free to force it to true. pretty sure that does nothing.
20:29 imirkin_: could be i missed such an obvious theory though
20:30 Lyude: imirkin_: that was the only additional register write I saw in your patch other than removing some of the writes in nvc0_gmtyprog_validate()
20:30 Lyude: which just wrote to that reg anyway
20:30 imirkin_: right, but it was just to make sure that the LAYER was written properly
20:30 imirkin_: since previously only the geometry shader could set a layer
20:30 imirkin_: while here i needed to account for other stages, and base the decision on the last stage
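The "base the decision on the last stage" idea can be sketched as follows. This is an illustrative model only: the stage ordering, the `writes_layer` flag (standing in for the shader-header bit mentioned above) and the `USE_GP` placeholder value are assumptions, not the contents of the actual patch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

USE_GP = 1 << 16  # placeholder; not claimed to be the real NVC0_3D_LAYER_USE_GP encoding


@dataclass
class Program:
    name: str
    writes_layer: bool  # models the "are we outputting a layer" header bit


def layer_reg_value(stages: Sequence[Optional[Program]]) -> int:
    """Value to write to the LAYER register.

    stages is ordered vertex, tess ctrl, tess eval, geometry; None = unbound.
    Only the last bound stage before rasterization decides.
    """
    last = next((p for p in reversed(stages) if p is not None), None)
    prog_selects_layer = last is not None and last.writes_layer
    return USE_GP if prog_selects_layer else 0


vs = Program("vs", writes_layer=True)
gs = Program("gs", writes_layer=False)
print(hex(layer_reg_value([vs, None, None, None])))  # VS is last: USE_GP set
print(hex(layer_reg_value([vs, None, None, gs])))    # GS is last: cleared
```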
20:36 Lyude: ooooh, okay I see. I'm mostly sure I just misread your messages about this from the day before
20:38 imirkin_: but for viewport index, i could never get it to work
20:39 Lyude: okay, yeah I understand where I got confused now :). And yeah, setting USE_GP unconditionally doesn't make much difference
20:40 imirkin_: ok. just as i thought.
20:43 Lyude: imirkin_: just one other question: what was with the requirement on the GL_NV_viewport_array2 thing in that shader test?
20:44 imirkin_: i was dicking around with gl_ViewportMask? or perhaps i'm not even the original author of this?
20:46 RSpliet: skeggsb: HeadSetRasterVertBlankDmi can have my tested-by if you wish
20:47 RSpliet: (hooray, seamless DRAM clock change on Fermi)
20:47 RSpliet: imirkin_: when you have time, could you boot my branch on your GDDR5 GT240 just to double-check I didn't screw up the memory pattern upload in ways that prevents nouveau from loading?
20:48 RSpliet: mmiotrace'd be golden, but if that's tricky... boot test for now will surely do :-)
20:49 imirkin_: RSpliet: mmiotrace of what? my board?
20:49 imirkin_: RSpliet: https://people.freedesktop.org/~imirkin/traces/nva3/nva3-gddr5.log.xz
20:52 RSpliet: imirkin_: got a vbios with that?
20:52 imirkin_: https://people.freedesktop.org/~imirkin/traces/nva3/nva3-gddr5.rom
20:52 imirkin_: courtesy of july, 2014 :)
20:54 RSpliet: imirkin_: ok that seems to double the number of samples I have for GDDR5 train pattern upload
20:54 imirkin_: ;)
21:02 RSpliet: imirkin_: tnx. If you're able to perform a nouveau boot test w/ my branch on *that GT240* to double-check that nothing breaks, I can do some final regression tests on other NVA3 boards soon and send the training part to Ben for review
21:02 imirkin_: ok, will do. hopefully that gt240 still works :)
21:02 imirkin_: it has suffered through a move in the meanwhile, so we'll see.
21:03 RSpliet: Gheh, my GPUs will retroactively have crossed EU borders :-P
21:04 imirkin_: what should i expect btw? switch pstate? or just boot test with no extra params?
21:04 RSpliet: imirkin_: nothing to expect. Just boot, nothing special
21:04 imirkin_: oh, because it does the training on boot no matter what?
21:04 RSpliet: nah, I don't think training is necessary for the lower perflvls
21:05 imirkin_: so what am i testing then?
21:05 RSpliet: or, at least, not the training that requires these patterns
21:05 RSpliet: just whether I didn't screw up the logic and fail module load as a result
21:05 RSpliet: so boring boot
21:05 imirkin_: hehe
21:05 imirkin_: ok
21:06 imirkin_: well you definitely messed up the "don't split strings across lines" rule
21:06 imirkin_: + nvkm_info(subdev, "missing link training data, not uploading "
21:06 imirkin_:  + "patterns\n");
21:06 imirkin_: 
21:06 RSpliet: oh yeah... that one trumps the 80-char limit nowadays doesn't it
21:07 imirkin_: always :)
21:08 karolherbst: In the past it would have fit :O
21:13 RSpliet: karolherbst: if Brexit teaches me one lesson, it's not to get stuck in the past too long...
21:14 karolherbst: !
21:14 karolherbst: pro tip: 80 char is the past, just assume everybody does 120 now
21:14 RSpliet: right, time to take my desktop back in time a decade
21:14 karolherbst: RSpliet: but hey, at least we keep scotland and northern ireland :D and wales too?
21:15 RSpliet: Gibraltar is the most interesting bargaining chip here...
21:15 karolherbst: hrhr
21:15 imirkin_: no, 80 chars is awesome
21:15 karolherbst: it belongs to spain, I thought that was made clear enough already
21:16 imirkin_: coz then i can fit 2x 80 char windows on my 1200-wide 24" rotated screen
21:16 imirkin_: 120 is the worst =/
21:16 karolherbst: let me guess, your 24 screen can only do 1920x1080?
21:16 imirkin_: 1200x1920
21:17 karolherbst: that doesn't make it better
21:17 karolherbst: try 1440x2560 :p
21:17 imirkin_: lets me have 2x 80char emacs buffers
21:17 imirkin_: with a visible font size
21:19 Plagman: i have 1600 on my portrait monitor and i still wouldn't use 120 wide
21:19 Plagman: (1600x2560)
21:19 Plagman: 80 is still best for two side-by-side legible emacs buffers
21:19 karolherbst: meh :( I am too young for all this
21:20 Plagman: (and a file quickbar)
21:20 RSpliet: imirkin_: 8800 GT boots
21:20 imirkin_: RSpliet: yay. G92?
21:20 karolherbst: I never used emacs in my life
21:21 RSpliet: imirkin_: ack
21:21 imirkin_: karolherbst: doesn't matter what editor...
21:21 RSpliet: one pstate, which isn't boot
21:21 imirkin_: emacs has just a scroll bar in terms of chrome
21:21 imirkin_: RSpliet: ok, that's kinda expected. should still be useful.
21:21 karolherbst: my ancestors would turn in their graves if they knew what editors I use :D
21:22 RSpliet: notepad.exe?
21:22 karolherbst: some might say I use something even worse, because if you use notepad.exe you most likely don't know any better
21:22 imirkin_: debug.com? :)
21:22 karolherbst: what's that?
21:22 imirkin_: before your time ;)
21:23 imirkin_: a way to create binaries from the cmdline
21:23 karolherbst: I see
21:23 imirkin_: you just fed hex into it iirc and it made an executable (com) file
21:23 karolherbst: sounds like something nobody would use anymore
21:23 imirkin_: it came standard with ms-dos
21:23 Plagman: you'd use echo into xxd -r probably
21:23 Plagman: same idea
21:23 imirkin_: [something else no one would use anymore]
21:24 Plagman: you'd think so
21:24 imirkin_: Plagman: it's so long ago that i don't really remember how it worked. i think it had some additional niceties...
21:24 imirkin_: like you could use it to place an int op at a random place, etc
21:24 Plagman: but i still run across mainboards and oem systems that want to do fw upgrades from dos
21:24 Plagman: (usually engineering revisions but still..)
21:24 imirkin_: Plagman: ah yeah. use freedos for that :)
21:24 RSpliet: imirkin_: empty timing table, empty ram configuration strings...
21:25 karolherbst: dos only updates are the worse
21:25 imirkin_: RSpliet: is that expected?
21:25 karolherbst: *worst
21:25 Plagman: yeah it's sad
21:25 RSpliet: imirkin_: it reduces the problem to "set PLL"
21:25 imirkin_: RSpliet: so it should just work then? :)
21:26 RSpliet: as soon as all code paths expect this
21:26 imirkin_: well, don't go breaking it all at once
21:28 karolherbst: Marex: sorry, but yours have to wait :(
21:29 RSpliet: Fine, I'll generate a trace...
21:31 karolherbst: mhh, I could have switched over & 0x21
21:31 karolherbst: meh
21:38 karolherbst: pmoreau: do you have your nve7 plugged?
21:38 karolherbst: uhm, this was a macbook, right?
21:39 karolherbst: mupuf: can you plug the nv108 into reator?
21:39 RSpliet: karolherbst: did you already have PCIe bus width config for G92?
21:39 RSpliet: (and more importantly, speed)
21:39 imirkin_: i think he does yea
21:39 RSpliet: cool
21:39 karolherbst: pcie speed is done for g9x+
21:39 karolherbst: I just enabled g92 in 4.10
21:39 karolherbst: I missed it
21:39 karolherbst: for whatever reason
21:40 karolherbst: but it was enabled for g94+ since the first merge
21:40 karolherbst: (in fact, g8x is done as well, but the GPUs don't like it, so they just crash)
21:41 Marex: karolherbst: what ?
21:41 karolherbst: maybe there are some pre v2 spec pcie boards at nvidia
21:41 karolherbst: and it would work with those
21:41 karolherbst: no idea
21:41 karolherbst: would be fun to try that out though
21:42 karolherbst: Marex: pcie speed stuff on yours
21:42 Marex: karolherbst: I have working X11, I can do my kernel work, good enough
21:42 Marex: karolherbst: so thanks
21:42 karolherbst: k
21:43 karolherbst: RSpliet: it just isn't parsed out if the PM_Mode table has a wrong version
21:43 karolherbst: RSpliet: do you have a g92?
21:47 RSpliet: I just uploaded a G92 VBIOS
21:47 karolherbst: I think I know where the byte is for older table versions
21:48 RSpliet: I wasn't making any assertions about whether it works or not, just... asking for a friend of a friend
21:48 karolherbst: well, in works technically
21:48 karolherbst: *it
21:48 RSpliet: but only on a perf lvl change?
21:48 karolherbst: yes
21:48 karolherbst: well
21:48 karolherbst: if we can parse it out of the vbios
21:48 karolherbst: it only works for PM_Mode tables with ver 0x40
21:49 karolherbst: which isn't used on all teslas
21:49 RSpliet: nope, this is 0x35
21:49 karolherbst: yeah
21:49 karolherbst: I think it is in bytes 0x11 and 0x12
21:49 karolherbst: of the header
21:49 RSpliet: link width is displayed by nvbios already
21:49 RSpliet: speed isn't
21:49 karolherbst: yeah
21:49 karolherbst: I know
21:49 karolherbst: the width should be always the same though
21:49 karolherbst: and always the same the card boots at
21:50 karolherbst: most GPUs even crashed when trying to change the width
21:50 karolherbst: it worked on one or two GPUs in total
21:50 karolherbst: and they booted with the highest width already
21:50 karolherbst: so meh
21:50 RSpliet: not bytes 0x11 and 0x12 for sure
21:51 RSpliet: unless 00 00 means 5GT/s x16
21:51 karolherbst: let me check again
21:51 karolherbst: RSpliet: did you check my patch?
21:52 karolherbst: there you see how insane this is
21:52 karolherbst: I expect 0x35 to not be any more sane
21:52 RSpliet: haven't seen it no
21:53 RSpliet: but you haven't seen insanity until you've seen Ben
21:53 karolherbst: :D
21:53 RSpliet: ... figuring out the ram training patterns
21:53 karolherbst: ohh I saw enough
21:54 karolherbst: I am sure it makes all sense after it was REed
21:56 karolherbst: I am sure we parse the width wrongly as well
21:56 RSpliet: imirkin_: Oh I know how to tackle this G92. With several layers of tediousness
21:57 RSpliet: something for another day
21:58 karolherbst: and seriously, who did this
22:02 RSpliet: karolherbst: They reconfigure the PCIe link speed in my trace several seconds before the first reclock
22:02 karolherbst: is it always 5.0?
22:02 karolherbst: on every perf level?
22:02 RSpliet: there is only one perflvl
22:02 karolherbst: mhhh
22:03 RSpliet: which has no bits free that can possibly encode the link speed
22:03 karolherbst: you would be surprised
22:03 RSpliet: unless 00 means 5GT/s
22:03 karolherbst: on 0x40: 0x01 means 2.5, 0x20 means 5.0 anything else is 8.0
22:03 karolherbst: as a bitmask
22:03 karolherbst: so 0x31 is still 2.5
22:04 karolherbst: and 0x00 of course 8.0
22:04 karolherbst: but so does 0xde mean 8.0
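A minimal sketch of the decoding rule as described here for version 0x40 PM_Mode tables. It is based only on the chat description, not on the nouveau source. The function name is made up, and the check order (bit 0x01 before bit 0x20) is inferred from the remark that 0x31 still means 2.5.

```python
def decode_pcie_speed(byte):
    """Link speed in GT/s encoded by the per-perflvl byte (table version 0x40)."""
    if byte & 0x01:   # bit 0 set wins, even if 0x20 is also set
        return 2.5
    if byte & 0x20:
        return 5.0
    return 8.0        # everything else, including 0x00


for value in (0x01, 0x20, 0x31, 0x00, 0xDE):
    print(f"0x{value:02x} -> {decode_pcie_speed(value)} GT/s")
```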
22:04 RSpliet: yeah, but _all_ non-0 bits are taken in the perflvl by other functions already
22:04 karolherbst: but to be fair, that byte also changes some other pcie config stuff as well
22:04 karolherbst: I thought there are some free ones
22:04 karolherbst: I am pretty sure there is
22:05 karolherbst: what about 0xe+0xf?
22:05 RSpliet: 00 00
22:06 karolherbst: ohh, they are most likely clocks anyway
22:06 karolherbst: 0x12 as well
22:06 RSpliet: it's u8 id, u8 unused mask (ff), u16 script, followed by a bunch of clocks in u16, finished with 0x10 == link width 16
22:07 karolherbst: what about the last byte?
22:07 RSpliet: 0x10 == link width 16 - ?
22:07 karolherbst: 0x10 isn't the last one
22:08 RSpliet: it is in my VBIOS... unless you're looking at the header
22:08 karolherbst: entries are of size 0x1d
22:09 karolherbst: I am looking at the g92 vbios
22:09 RSpliet: 0xbff6: 0f ff a1 d2 00 00 64 00 94 02 40 06 84 03 00 00 00 00 00 00
22:09 RSpliet: c00a: 00 00 00 00 00 00 00 00 10
22:09 RSpliet: How is that last byte not 0x10?
22:09 karolherbst: ohhh
22:09 karolherbst: I thought you meant index 0x10
22:09 RSpliet: ah!
22:10 RSpliet: :-D
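The mix-up between "the byte 0x10" and "index 0x10" is easy to see when the pasted entry is parsed using the layout RSpliet describes. This is a sketch under assumptions: little-endian fields, and everything between the script pointer and the final byte treated as u16 values without claiming which of them are clocks or what units they use. The function and key names are made up.

```python
import struct

# The 0x1d-byte perflvl entry pasted above (starting at 0xbff6).
ENTRY = bytes.fromhex(
    "0f ff a1 d2 00 00 64 00 94 02 40 06 84 03 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 10"
)


def parse_perflvl_entry(entry):
    """Split an entry into id, unused mask, script, u16 fields and link width."""
    entry_id, unused_mask, script = struct.unpack_from("<BBH", entry, 0)
    n_u16 = (len(entry) - 5) // 2  # 4 header bytes in front, 1 width byte at the end
    u16_fields = struct.unpack_from(f"<{n_u16}H", entry, 4)
    return {
        "id": entry_id,
        "unused_mask": unused_mask,
        "script": script,
        "u16_fields": u16_fields,
        "link_width": entry[-1],
    }


parsed = parse_perflvl_entry(ENTRY)
print(len(ENTRY) == 0x1D)    # entry size matches the 0x1d mentioned above
print(parsed["link_width"])  # value of the last byte (index 0x1c)
print(ENTRY[0x10])           # the byte at *index* 0x10 is a different thing
```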
22:10 karolherbst: stick a 0xff in the byte before
22:10 karolherbst: maybe it goes only to 2.5 then
22:10 RSpliet: I suspect, since there's an 8 second gap between PCIe config and reclock - that pre 0x40 it's not a per-perflvl setting
22:11 karolherbst: doesn't have to be
22:11 karolherbst: do they read the pcie state?
22:11 karolherbst: the pcie config is always checked on boot as well and adjusted
22:11 karolherbst: completely uncoupled from the clock
22:12 RSpliet: I'll xz the trace and push it for you
22:12 karolherbst: they do fancy things like v1 -> v2 version jump
22:12 karolherbst: the trace won't help me
22:12 karolherbst: I need to know which byte from the vbios to read
22:12 karolherbst: if none changes the link speed, then there is none
22:13 karolherbst: :O lol
22:13 karolherbst: unity is dropped
22:14 RSpliet: good riddance
22:17 karolherbst: 18.04 LTS will come with gnome instead
22:37 mwk: mooch: the scans repo was meant to store reg scans of various reg ranges on many cards
22:37 mwk: I kind of made a lot of them
22:38 mwk: but when I attempted to actually push it, github rejected it because of too long files
22:47 RSpliet: heh, no point testing today's work on a card that reclocks fine if you just omit link training altogether
22:54 RSpliet: imirkin_: let me know when you've successfully given your gt240 a whirl. nvapeek 0x10f8c0 and 0x10f900 if you're curious whether the pattern's been uploaded
23:01 mooch: mwk: oh
23:12 Lyude: just a crazy theory here, what is the chance https://trello.com/c/IAqZXMmt/154-gm200-amd-vertex-shader-layer-viewport-index might somehow rely on GL_NV_viewport_array2?
23:13 Lyude: It's mentioned in that shader test, and removing it doesn't seem to change the test at all on the nv blob
23:13 Lyude: but then again I've also managed to get very broken shaders to run on nv without any complaints...