02:54 Leftmost: imirkin, I don't suppose you have a trace of the tess sanity test from Kepler lying around.
02:55 imirkin: Leftmost: mmmmmaybe
02:56 imirkin: but it won't help you
02:56 imirkin: Leftmost: why kepler?
02:57 imirkin: Leftmost: https://people.freedesktop.org/~imirkin/traces/gk208/ - the quads one is there
02:57 Leftmost: Mostly wanted it as a reference. I wanted to get a handle on how the existing code implements what the trace shows.
02:58 imirkin: well, it's quite different
03:04 Leftmost: Even the sanity test's trace has a lot of data to sift. I found what I believe is the control shader, but by and large all of this is pretty opaque.
03:05 Leftmost: To me, at least.
03:05 imirkin: search for TCP
03:05 imirkin: and TEP
03:05 imirkin: TCP = tess control program
03:05 imirkin: TEP = tess eval program
03:05 imirkin: and yeah, maxwell is very different
03:06 imirkin: it tries to find that stupid address
03:10 airlied: imirkin: how different is maxwell tess?
03:13 Leftmost: Yeah, found the TCP and also the tessLevel{Outer,Inner} bits. Honestly, though, it's all making me feel a bit stupid.
03:14 Leftmost: I mean, I'll soldier through, but I'm not sure where to turn for guidance.
03:16 imirkin: airlied: the way that the outputs/inputs are read in in TCP/TEP programs requires more manipulation
03:17 imirkin: airlied: i suspect it'd be a full day of concentrated work for me with access to 2 separate systems. probably more time for people less familiar with the ins and outs of codegen.
03:17 imirkin: also i have access to 0 systems with maxwell (although i guess reator is one)
03:17 imirkin: Leftmost: it's a lot to take in
03:18 imirkin: Leftmost: if you can complete the project in under a week of full time work, i'd be rather surprised.
03:19 Leftmost: That would be a miracle. :) Not many paths in but the deep end, though, I suppose.
03:19 imirkin: Leftmost: basically there are load/store instructions. they take an offset "inside" the vertex, as well as a "lane" parameter. the way that lane parameter is computed needs to be worked out. also i don't remember exactly, but some/all tcs outputs might be just directly written to memory "by hand"
03:20 imirkin: airlied: my caring level about maxwell is fairly low though, since it's a largely locked down platform
03:20 imirkin: it's just the GM10x's, and my theory is there aren't too many of those running around
03:23 Leftmost: Do the GM20x chips behave yet differently? My chip is GM204.
03:24 dcomp: hey I have a GM10x (x being 8)
03:25 imirkin: Leftmost: GM20x are most likely the same
03:25 imirkin: dcomp: yes, sorry. one of the few :)
03:26 Leftmost: I can run some traces tomorrow.
03:27 imirkin: anyways... i can't get excited about doing it. which is why it has remained not-done for the past ... while.
03:27 imirkin: (6 months i guess?)
03:30 Leftmost: I can get excited about doing it, but it means that poor imirkin and hakzsam will spend a lot of time answering stupid questions.
03:31 imirkin: maybe. so far i can't remember you asking one such question
03:42 Leftmost: Off for the night. Thanks for your help today.
10:12 karolherbst: mupuf_: I totally forgot that I still set the voltage every second in my reclocking patches. But I also have no good idea how to prevent this. And there are several problems
10:13 karolherbst: 1. set sw voltage != set hw voltage
10:13 karolherbst: 2. computer/GPU suspend changes the set voltage
10:18 karolherbst: mupuf_: I was thinking I just save what the code writes into the hardware and reset the value every time suspend happens and only set the voltage if we would actually change it
10:18 karolherbst: but maybe you have a better idea how to deal with that?
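A minimal sketch of the caching idea described above: remember the last value actually written to hardware, drop that memory whenever a suspend happens, and skip writes that would not change anything. The class, the method names, the write callback and the optional force argument are hypothetical and are not taken from the nouveau patches.

```python
from typing import Callable, Optional


class CachedVoltage:
    """Remembers the last voltage written to hardware (hypothetical helper)."""

    def __init__(self, write_hw: Callable[[int], None]) -> None:
        self._write_hw = write_hw                  # assumed callback that programs the hardware
        self._last_written: Optional[int] = None   # None means the hardware state is unknown

    def set(self, microvolts: int, force: bool = False) -> bool:
        """Write only if the value changes or the cache is invalid; return True on a write."""
        if not force and self._last_written == microvolts:
            return False
        self._write_hw(microvolts)
        self._last_written = microvolts
        return True

    def on_suspend(self) -> None:
        """Suspend may change the voltage behind the driver's back, so forget the cached value."""
        self._last_written = None


writes = []
volt = CachedVoltage(writes.append)
volt.set(900_000)    # first write reaches the hardware
volt.set(900_000)    # unchanged value, skipped
volt.on_suspend()    # cache invalidated
volt.set(900_000)    # written again after resume
```

Caching what was written, rather than what was requested, sidesteps the first problem in the list, since the two can differ.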
10:53 mwk: ok guys
10:53 mwk: I'm going to create envytools/llvm + envytools/clang repos
10:57 mwk: I'm going to be working on the "falcon" branch until the dust settles and the compiler is actually usable
10:59 mwk: you're welcome to look, but it's far from ready for use (for one, there's no support for anything involving stack yet)
10:59 mwk: Weaselweb: you've expressed interest in the port, here you are
11:01 mwk: oh, and I'm going to do lots of rebasing, so... careful
11:02 karolherbst: mwk: do you think it is already possible to install interrupt handlers and communicate with the host?
11:02 mwk: no
11:02 karolherbst: okay
11:02 mwk: it's really, really WIP
11:02 pmoreau: Rebasing is awesome! :-)
11:03 mwk: "anything involving stack" includes saving GPRs on function entry
11:03 karolherbst: mwk: ahh, I see
11:03 mwk: also, there's a major problem of not having a linker yet
11:03 karolherbst: mwk: well I was just thinking if it would be already possible to upload it to the falcons and play with that and just try out really really basic stuff
11:04 mwk: linker is kind of crucial to that :)
11:04 karolherbst: well we need a linker if we have multiple object files, right?
11:04 mwk: you need a linker if you have object files
11:04 mwk: including 1 object file
11:04 karolherbst: ohh right
11:04 mwk: matter of fact, emitting object files is not so good either
11:05 karolherbst: can't we just create a simple static binary?
11:05 karolherbst: and ar them together?
11:05 mwk: nope, llvm has no support for static binary output
11:05 karolherbst: uhh
11:05 mwk: seriously, we need a linker
11:05 karolherbst: okay
11:05 mwk: and object file emitter, for that matter... it currently doesn't support relocations
11:06 karolherbst: yeah well, in the end I can still write some C code I would want to write and see how well the compile goes :D
11:06 mwk: have fun, I use clang --target=falcon3 -S -O1 x.c -o /dev/stdout
11:07 mwk: don't use anything other than falcon3, it's not supported anyway
11:07 mwk: and don't forget -O1, otherwise clang will put every local variable on the stack...
11:08 karolherbst: ahh okay
11:08 mwk: if you get a selection error, this means I'm missing an instruction - most likely selectcc
11:08 karolherbst: is stack support so messy to implement? I thought there are just those push/pop instructions, or is there more?
11:09 karolherbst: (besides limited stack size)
11:09 mwk: karolherbst: it's not particularly messy, but it needs to be done
11:09 karolherbst: ahh I see
11:09 mwk: I just created a repo on github so that I don't lose my work
11:09 mwk: I've spent like 5 days of work on this thing so far, don't expect too much from it yet :)
11:10 karolherbst: right :)
11:10 mwk: and I have different things to take care of before the stack frames
11:10 mwk: for one, I'd like to stabilize and review the MC component first
11:10 mwk: ie. assembler / disassembler
11:10 karolherbst: yeah, I am not that good with general compiler stuff and have no clue what has to be done :D
11:11 mwk: well... lots of things :)
11:11 mwk: the target stuff in lib/Target/Falcon is made of 6 components
11:11 mwk: AsmParser, Disassembler, InstPrinter, MCTargetDesc, TargetInfo, and CodeGen
11:12 karolherbst: mwk: any suggestions how to build llvm and clang so that it does only contain the falcon stuff?
11:12 mwk: TargetInfo is trivial, it just says "hi I'm Falcon"
11:12 mwk: MCTargetDesc mostly describes the binary format... ELF customization, relocations, etc.
11:12 mwk: Disassembler... I think you can figure out
11:13 mwk: what AsmParser does is obvious, and it's used for inline assembly and for the standalone assembler
11:13 karolherbst: although I would have thought that the Disassembler would come for free if you described all the instructions and the target already :/
11:14 mwk: and InstPrinter is for clang -S, ie. assembly output from compiler
11:14 mwk: karolherbst: it mostly is
11:14 mwk: everything relating to ISA is defined by FalconInstrInfo.td
11:14 mwk: eg.
11:14 mwk: def DIVrri16 : InstUBRRI16<0xc, "div", [(set gpr32:$R2, (udiv gpr32:$R1, imm32zx16:$I16))]>;
11:15 pmoreau: karolherbst: LLVM_TARGETS_TO_BUILD="falcon3" would be my guess. maybe in capital letters
11:15 mwk: LLVM_TARGETS_TO_BUILD="Falcon" actually
11:15 mwk: falcon3 is technically a subarch
11:15 karolherbst: yeah well, but I think one could throw out a lot more
11:15 karolherbst: make I just check ccmake
11:15 pmoreau: Makes sense
11:15 karolherbst: *maybe
11:16 mwk: anyhow... this says DIVrri16 (internal name of instruction) is of format InstUBRRI16, subop 0xc, mnemonic div, and lists the operands
11:16 mwk: tablegen is an awesome tool that converts that all to tables used by assembler, disassembler, and instruction selection
11:17 karolherbst: "The target `Falcon' does not exist."
11:17 mwk: the last argument is a pattern, and ISel will automatically find matching sub-DAGs in the code to compile, and convert them to this instruction
11:17 mwk: the only problem is... I have to define what InstUBRRI16, gpr32, imm32zx16 mean
11:17 mwk: and some of that is done only for ISel right now, but not for disassembler
11:17 mwk: or not for MC object emitter
11:18 mwk: so once I do that, all instructions that use the relevant operands will magically work in disassembler
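A toy sketch of why one declarative description can feed both the assembler and the disassembler, which is the role tablegen plays here. The byte layout below (one opcode byte, two 4-bit register fields, a 16-bit little-endian immediate) is invented for illustration and is not the real Falcon encoding; only the name DIVrri16 and the mnemonic div come from the example above, and the second table entry is made up.

```python
import struct
from typing import NamedTuple


class InstrDesc(NamedTuple):
    name: str       # internal name, e.g. "DIVrri16"
    opcode: int     # toy opcode byte, NOT the real Falcon encoding
    mnemonic: str


# One table, consulted in both directions.
TABLE = [
    InstrDesc("DIVrri16", 0x0C, "div"),
    InstrDesc("MULrri16", 0x0D, "mul"),   # made-up second entry
]
BY_MNEMONIC = {d.mnemonic: d for d in TABLE}
BY_OPCODE = {d.opcode: d for d in TABLE}


def assemble(mnemonic: str, dst: int, src: int, imm: int) -> bytes:
    desc = BY_MNEMONIC[mnemonic]
    # Toy layout: opcode byte, (dst << 4 | src) byte, 16-bit little-endian immediate.
    return struct.pack("<BBH", desc.opcode, (dst << 4) | src, imm)


def disassemble(code: bytes) -> str:
    opcode, regs, imm = struct.unpack("<BBH", code)
    desc = BY_OPCODE[opcode]
    return f"{desc.mnemonic} $r{regs >> 4} $r{regs & 0xF} {imm:#x}"


encoded = assemble("div", 2, 1, 0x10)
print(disassemble(encoded))
```

Adding a row to the table makes the new instruction available to both functions at once, which is the "magically work in disassembler" effect in miniature.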
11:18 mwk: karolherbst: hold on
11:18 mwk: first off, the directory structure
11:18 mwk: you have to clone the llvm repo somewhere
11:18 mwk: and then clone the clang repo to <llvm repo>/tools/clang
11:19 mwk: as for cmake
11:19 mwk: please use a separate build directory
11:19 mwk: I have this option: -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="Falcon"
11:20 mwk: I think I missed some list somewhere, it's not picking up Falcon as a "normal" target :(
11:20 karolherbst: ahh
11:20 karolherbst: yeah that works
11:20 mwk: I have llvm compiled for all targets, but I think -DLLVM_TARGETS_TO_BUILD="" could work to disable the rest?
11:20 mwk: also
11:20 mwk: you may want to compile llvm in release mode
11:21 mwk: debug builds take a ridiculous amount of space and time
11:21 mwk: -DCMAKE_BUILD_TYPE=Release
11:21 karolherbst: well I will disable like everything anyway in ccmake
11:21 karolherbst: assertions for example
11:21 mwk: then use Release mode
11:21 mwk: it'll take care of it for you
11:22 mwk: also, you may want -DBUILD_SHARED_LIBS=On
11:22 mwk: it speeds up rebuilding
11:22 mwk: it does introduce a few minor bugs, but nothing we care about
11:23 mwk: [eg. any program will display options for all programs in the suite - harmless]
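The build options collected in this exchange can be gathered in one place. A minimal sketch that only assembles the cmake command line from the flags mentioned above; the function name and the ../llvm source path are placeholders, and dropping the other targets with an empty LLVM_TARGETS_TO_BUILD is mwk's guess rather than something confirmed here.

```python
import shlex
from typing import List


def cmake_args(source_dir: str, only_falcon: bool = True) -> List[str]:
    """Build the cmake invocation from the options discussed in the chat."""
    args = [
        "cmake",
        "-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=Falcon",  # Falcon is not picked up as a normal target
        "-DCMAKE_BUILD_TYPE=Release",                   # debug builds cost a lot of space and time
        "-DBUILD_SHARED_LIBS=On",                       # speeds up rebuilding
    ]
    if only_falcon:
        # Unverified guess from the chat: an empty list may disable all other targets.
        args.append("-DLLVM_TARGETS_TO_BUILD=")
    args.append(source_dir)
    return args


# Meant to be run from a separate build directory; "../llvm" is a placeholder path.
print(shlex.join(cmake_args("../llvm")))
```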
11:24 karolherbst: I wonder now if it would be more work to add support inside gcc
11:24 mwk: karolherbst: yes.
11:24 mwk: I've worked on gcc before
11:24 mwk: it's no comparison
11:24 karolherbst: okay
11:24 mwk: also, if you go with gcc, you also get to write a linker + assembler on your own
11:25 karolherbst: well my thinking was that if we had gcc support, it might actually be easier to compile those falcon files while building the kernel
11:25 mwk: bfd sucks like a thousand Vaxen
11:25 mwk: haha, no
11:25 mwk: gcc has one major flaw
11:25 mwk: you can only configure it for one target at a time
11:25 karolherbst: ...
11:25 karolherbst: right
11:26 karolherbst: I totally forgot about that
11:26 karolherbst: that sucks big time
11:26 mwk: so the only way to get a gcc capable of targeting Falcon is to explicitly make a cross-compiler
11:26 mwk: otoh, if you have clang, you can target all supported platforms just by passing --target
11:26 mwk: *and* you don't need a separate assembler at all
11:27 mwk: (linker is a different matter, but I think lld will be enough for our purposes)
11:27 karolherbst: I am already wondering what would happen if gcc would become unused later
11:27 mwk: I'll open champagne
11:27 karolherbst: yeah, I was more speaking about the fsf
11:28 mwk: then they'll get what they deserve
11:28 karolherbst: GPLv4: has to be compilable with gcc :D
11:28 karolherbst: or rather with a "free" compiler
11:28 mwk: they purposefully rejected integration and plugin systems for a long time
11:28 karolherbst: I know
11:28 karolherbst: researchers were annoyed big time
11:30 karolherbst: maybe if the kernel can be compiled with llvm it may change the situation a lot
11:31 mwk: yeah
11:32 mwk: but then, kernel devs can be as boneheaded as fsf
11:33 mwk: not compiling with clang isn't exactly clang's fault
11:33 karolherbst: right
11:33 karolherbst: but I don't think they will mind a lot
11:34 karolherbst: because there are already some changes made to improve the situation
11:36 Weaselweb: mwk: sounds great. any URL to clone from?
11:36 karolherbst: mwk: I was sometimes thinking about what one could do with the userspace page fault handling support. Maybe we could report those mmiotrace page faults to an llvm based process and disassemble all the things nicely or something :/
11:36 pmoreau: Weaselweb: https://github.com/envytools/clang https://github.com/envytools/llvm
11:36 karolherbst: but I also never checked what the userspace page fault handler does
11:39 Weaselweb: pmoreau: thanks a lot
11:39 mwk: I'm pretty sure handling kernel page faults in userspace is a bad idea
11:39 Weaselweb: btw: which hardware uses that falcon architecture? are there any docs?
11:40 karolherbst: mwk: llvm/tools/clang/lib/Parse/ParseStmtAsm.cpp:576:33: error: no matching function for call to ‘llvm::MCObjectFileInfo::InitMCObjectFileInfo(const llvm::Triple&, bool, llvm::CodeModel::Model, llvm::MCContext&)’
11:40 karolherbst: any ideas?
11:42 karolherbst: mwk: well mmiotrace also doesn't handle page faults
11:42 karolherbst: mwk: mmiotrace just wants to know which instructions access the mmio region
11:43 karolherbst: but currently the repeat instructions aren't supported
11:43 karolherbst: and we get them with recent nvidia driver releases
11:43 mwk: karolherbst: are you using branch falcon on both repos?
11:44 mwk: Weaselweb: file:///home/mwk/envytools/docs/_build/html/hw/falcon/index.html
11:44 mwk: errr wait
11:44 karolherbst: mwk: ohhhh, right
11:44 karolherbst: mwk: wasn't using it for clang :D
11:44 mwk: Weaselweb: http://envytools.readthedocs.io/en/latest/hw/falcon/index.html
11:45 Weaselweb: mwk: thanks
13:30 mwk: hmm, all signs point to Falcon v4 having interrupt vector #2
13:31 mwk: there are enable bits for that, and a breakpoint condition
13:32 mwk: but... no new vector registers, and no way I can find that would actually trigger the interrupt
13:33 mwk: maybe they reused the exception vector...
14:20 mupuf_: karolherbst: does resume restore the clocks?
14:20 karolherbst: mupuf_: with my patches, yes
14:21 karolherbst: you can even set the pstate while the gpu is off without issues
14:21 karolherbst: so I took care of all those runpm issues while reclocking too
14:22 karolherbst: mupuf_: well I've added a force flag exactly for that, so that the code reclocks even when the code thinks the right freqs are already set
14:22 karolherbst: *driver
14:22 karolherbst: maybe I could do the same for the voltage
14:23 mwk: another interesting thing
14:23 mwk: one of the unknown optional features of v4 Falcon is called "imem auto fill" according to gk20a headers
14:24 mwk: code paging on demand?
14:25 karolherbst: mwk: another thing which is kind of in my head: I think the nvenc falcon might have direct access to the fb data, but that should be unrelated to anything you do
14:25 mwk: direct access to fb data?
14:25 karolherbst: or display data
14:25 mwk: what kind of direct?
14:25 karolherbst: mhh well on windows you have that feature called shadowplay
14:25 karolherbst: where you can tell the driver to buffer h.264 encoded video frames of the games you play
14:25 karolherbst: for like 30 minutes
14:26 karolherbst: and get a video file for the past 30 minutes
14:26 karolherbst: that's done throught nvenc
14:26 karolherbst: *through
14:26 mwk: and...?
14:26 mwk: nothing special so far
14:26 karolherbst: yeah, but I was curious how nvenc gets the data
14:26 mwk: all Falcons can access VRAM...
14:26 karolherbst: ahh
14:26 karolherbst: didn't know that
14:26 mwk: that's kind of part of the job description
14:27 mwk: the access is rather unwieldy (you have to DMA it to Falcon memory and back), but quite well-known
14:28 karolherbst: I see
14:28 karolherbst: so there shouldn't be any problems for the falcons to get the data for each frame displayed?
14:29 mwk: karolherbst: you have too much faith in marketing documentation
14:30 mwk: "shadowplay" is a driver feature
14:30 karolherbst: yeah I know
14:30 mwk: it translates to *nothing* in hardware
14:30 karolherbst: I was just curious how nvenc gets the video data
14:30 karolherbst: either if the host reads it out and pushes it back to the gpu
14:30 mwk: basically, every time someone renders a frame, some process sends commands to nvenc to encode it
14:30 karolherbst: or if the nvenc engine can somehow directly access it
14:30 mwk: passing it the virtual address of the frame in some address space
14:31 mwk: ie. exactly nothing interesting
14:31 mwk: exactly the same mechanism as the one that renders to the frame in the first place
14:32 karolherbst: ahh okay
14:32 karolherbst: so it would even be possible to implement this on older falcons with decreased performance, because you would have to implement an h.264 encoder
14:32 karolherbst: and don't get fancy video instructions
14:32 mwk: implementing a software H.264 encoder on a Falcon is not exactly a good idea
14:33 mwk: the whole point of nvenc is that it's fast
14:34 mwk: encoding real-time H.264 of reasonable quality is beyond the reach of software-only Falcon
14:34 karolherbst: yeah I know
14:35 mwk: if you really wanted to implement this on older cards, it'd be a better bet to slurp it back to the CPU
14:35 mwk: it has better chances of dealing with it than a Falcon
14:36 imirkin_: or do it in compute shaders :)
14:36 mwk: or just use the graph engine
14:36 mwk: yep
14:36 imirkin_: didn't they have a demo that did h264 in compute?
14:36 imirkin_: not as fast as hw, but still fast-ish
14:36 karolherbst: I think there are plenty cl based h.264 encoders
14:36 mwk: the thing is, it slows down the game itself
14:37 karolherbst: yeah
14:37 imirkin_: can't win 'em all
14:37 karolherbst: nvenc doesn't affect gaming performance
14:37 karolherbst: or not significantly
14:37 imirkin_: coz it's a hw encoder
14:37 karolherbst: right
14:37 karolherbst: for streaming it is really useful :D
14:38 mwk: hmm, nvenc is one of the few Falcons without the auto-fill cap, btw
14:38 mwk: along with PSEC and the unknown 1C3 one
14:39 mwk: perhaps it actually has reasonably-sized code? :)
14:39 karolherbst: I would expect that it is rather big
14:39 karolherbst: because you can fine tune the encoder a lot
14:39 mwk: 32704 bytes
14:39 karolherbst: it's not like you only get 3 presets you can choose from or something
14:40 mwk: 0x6d00 bytes out of that contain what looks like Falcon code
14:40 mwk: and the code RAM is... 0x4000 bytes
14:41 Calinou: I wonder if we can get VP9 hardware-accelerated
14:41 Calinou: or VP8 (mostly for encoding)
14:41 Calinou: Daala will probably suffer from the same hurdle of HW acceleration :(
14:41 mwk: compare that to PVLD, which has 0x2000 bytes of RAM, and twice that amount of code
14:42 mwk: I guess nvenc doesn't need that much code paging
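The sizes quoted above are easier to compare side by side. A small sketch of the arithmetic, using only the figures from the chat (0x6d00 bytes of code against 0x4000 bytes of code RAM for nvenc, and twice 0x2000 bytes of code against 0x2000 bytes of RAM for PVLD); the helper name is made up.

```python
def code_to_ram_ratio(code_bytes: int, ram_bytes: int) -> float:
    """How many times larger the firmware code is than the Falcon's code RAM."""
    return code_bytes / ram_bytes


NVENC_CODE, NVENC_RAM = 0x6D00, 0x4000
PVLD_RAM = 0x2000
PVLD_CODE = 2 * PVLD_RAM   # "twice that amount of code"

nvenc_ratio = code_to_ram_ratio(NVENC_CODE, NVENC_RAM)
pvld_ratio = code_to_ram_ratio(PVLD_CODE, PVLD_RAM)

print(f"nvenc: {NVENC_CODE} bytes of code, {nvenc_ratio:.2f}x its code RAM")
print(f"pvld:  {PVLD_CODE} bytes of code, {pvld_ratio:.2f}x its code RAM")
```

Both firmwares are larger than their code RAM, but nvenc overshoots by a smaller factor, which fits the guess that it needs less code paging.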
14:42 karolherbst: well I think it only supports h.264 anyway
14:42 karolherbst: maybe more on newer chips
14:43 karolherbst: oh 2nd gen maxwell got h.265 support
14:43 karolherbst: and 4k support for h.264
14:43 imirkin_: only GM206
14:43 imirkin_: not GM204 apparently
14:43 karolherbst: odd
14:44 karolherbst: uhh
14:44 karolherbst: pascal got 10 bit encoding
14:47 Leftmost: Calinou, I was under the impression that a fair few chips do have VP9 decode support, at least.
14:49 Calinou: what about NVIDIA ones? and desktop chips in general?
14:49 Calinou: that's kind of needed if you want free formats to really succeed at least on the desktop :/
14:50 Leftmost: According to Wikipedia, GM206 supports it and GP104 will as well. No source listed, though.
20:48 karolherbst: gnurou: are you there and got some time?
20:56 pmoreau: karolherbst: Still a bit early for him I would say: it’s 5:55am in Tokyo. :-D
20:56 karolherbst: uhhh
20:56 karolherbst: mhh
20:57 karolherbst: stupid 6am
20:57 karolherbst: it is always either too early or too late
20:57 pmoreau: :-)
21:02 pmoreau: RSpliet: Which VRAM type does not reclock so well for Tesla cards?
21:02 RSpliet: 2nd gen Tesla has some trouble with DDR2
21:02 imirkin_: DDR2 for older cards, GDDR5 for GT215
21:02 RSpliet: first gen tesla has no implementation for DDR2
21:02 pmoreau: Ok, thanks for the info!
21:03 RSpliet: GT215 GDDR5 is equally nonexistent, although stuff in my tree wrt generating MRs could be helpful there
21:03 imirkin_: RSpliet: let me know if you want me to test
21:03 pmoreau: For some reason, I was thinking of DDR3 or GDDR3… --"
21:03 RSpliet: 1st gen Teslas don't have DDR3 to the best of my knowledge
21:39 hakzsam: karolherbst, do you have the realistic one demo from UE4?
21:39 hakzsam: I would like to know how it works on your kepler
21:40 karolherbst: mhh the elemental demo?
21:40 hakzsam: I'm trying to fix the rendering issue on gf119
21:40 hakzsam: nope
21:40 imirkin_: one of the other demos
21:40 karolherbst: uhh, nope, then I don't have it
21:40 hakzsam: karolherbst, http://ue4linux.raxxy.com/realistic_rendering_demo.tar.bz2
21:40 hakzsam: if you have time to test, that could be nice :)
21:40 karolherbst: yeah, I have time
21:41 hakzsam: did you already clone the gl43's branch of ilia?
21:41 karolherbst: hakzsam: yeah
21:41 hakzsam: cool
21:42 hakzsam: imirkin_, is the lmem_size fix part of your branch?
21:42 karolherbst: hakzsam: so I should test with the gl43 branch?
21:42 hakzsam: yep
21:42 karolherbst: k
21:43 hakzsam: karolherbst, make sure to update the branch
21:43 hakzsam: maybe ilia has rebased
21:43 karolherbst: I checked already
21:43 imirkin_: hakzsam: not pushed
21:43 hakzsam: karolherbst, and you have to apply https://lists.freedesktop.org/archives/mesa-dev/2016-May/117382.html
21:44 hakzsam: which should prevent the MEM_OUT_BOUNDS thing
21:44 karolherbst: also the first patch?
21:44 hakzsam: no
21:44 karolherbst: well either way, it doesn't work
21:44 hakzsam: the first patch is fermi only
21:44 hakzsam: please elaborate :)
21:44 hakzsam: what are you seeing?
21:45 karolherbst: disco lights :D
21:45 karolherbst: ahh now it is white
21:45 hakzsam: cool, same issue
21:45 hakzsam: wait I'll send you a patch
21:46 karolherbst: well compute shaders were also broken in TR, so I will retest there too I guess if something gets fixed
21:47 hakzsam: http://hastebin.com/ediwopijik
21:47 hakzsam: this should get rid of the disco mode
21:47 hakzsam: for really weird reasons
21:48 karolherbst: well
21:48 karolherbst: okay
21:48 karolherbst: it looks better
21:48 hakzsam: don't you have a black screen?
21:48 karolherbst: nope
21:48 karolherbst: it seems to work
21:48 karolherbst: but
21:49 karolherbst: in the upper left corner there is sometimes a colored rectangle
21:49 karolherbst: sometimes redish
21:49 karolherbst: sometimes yellowish
21:49 karolherbst: and sometimes it covers the entire screen
21:49 hakzsam: on gf119 only the first frame is correctly rendered
21:49 karolherbst: transparently though
21:49 karolherbst: nope, it looks fine besides the colored rectangle
21:49 karolherbst: I can more and look around
21:49 karolherbst: *move
21:49 hakzsam: could you please take a screenshot?
21:50 karolherbst: ohhh
21:50 karolherbst: "fifo: read fault at 00199a4000 engine 00 [GR] client 01 [GPC2/T1_0] reason 02 [PTE] on channel 2 [00bf890000 RealisticRender[3571]]"
21:50 karolherbst: after moving around for a while
21:50 hakzsam: well, don't move for now :D
21:50 karolherbst: :D
21:50 karolherbst: it is fine though
21:51 hakzsam: so, we have multiple issues
21:51 karolherbst: I count two
21:51 karolherbst: https://i.imgur.com/fFJ39GR.png
21:51 hakzsam: thanks
21:51 hakzsam: actually, 3
21:51 hakzsam: because there is a different one on fermi
21:52 hakzsam: imirkin_, well, I think the tex constraints thing is broken on kepler (for surface ops)
21:52 karolherbst: ahh
21:52 karolherbst: it seems lighting related
21:52 hakzsam: imirkin_, that might explain the rendering issue
21:53 hakzsam: one issue at a time :)
21:53 karolherbst: yeah, it is related to the sun lighting
21:53 karolherbst: if I look at the door, no issues :D
21:53 hakzsam: thanks for testing, I'll have a look at the first issue (the disco mode)
21:54 karolherbst: mm the colored rectangle seems to be some kind of texture
21:57 hakzsam: imirkin_, 1412: not $p1 sustp 2D $r0 $s0 f32 # u8 $r0d c15[0x84] $p0 $r4 (8)
21:57 hakzsam: (kepler)
22:02 imirkin_: hakzsam: is there a question?
22:06 hakzsam: nope, should be $r4q
22:06 hakzsam: I know why it fails, just have to find the correct fix :)
22:10 imirkin_: oh
22:10 imirkin_: is c15[0x84] right? wtf is that arg?
22:10 hakzsam: format
22:11 imirkin_: oh, is this output from nouveau_compiler?
22:11 imirkin_: which doesn't set up the bases?
22:11 hakzsam: yeah
22:11 imirkin_: hehe ok
22:15 mwk: karolherbst: you're guilty of hanging my machine :(
22:16 mwk: I noticed you added ethernet support to nva and decided to take a peek
22:16 mwk: ... on a diskless machine
22:17 imirkin_: hehehe
22:17 imirkin_: like running tcpdump over ssh :)
22:17 imirkin_: or gdb'ing into X from an xterm
22:18 RSpliet: reading 8250 debugging information over UART
22:18 karolherbst: mwk: lol? :D
22:20 karolherbst: mwk: but why does it matter that it is diskless? :/
22:20 karolherbst:feels stupid not getting the joke if there is one?
22:21 imirkin_: diskless means it's nfsroot
22:21 imirkin_: nfsroot means it runs over ethernet
22:22 karolherbst: okay, but why should the machine hang?
22:22 imirkin_: machine's fine
22:22 imirkin_: just ... network died
22:22 karolherbst: mhh okay, but why should that happen?
22:22 imirkin_: because he went poking around in it
22:22 karolherbst: ahh :D
22:22 karolherbst:is really stupid indeed
22:23 karolherbst: mwk: also mcp79?
22:23 karolherbst: I added that stuff cause I REed some of those WoL bits
22:24 karolherbst: but the forcedeth source code doesn't match what I found out
22:24 karolherbst: so either I am stupid or the source code is simply wrong
22:24 karolherbst: and maybe it is different for each chipset
22:26 mjg59: Wow, forcedeth
22:26 mjg59: That's a blast from the past
22:27 karolherbst: mjg59: yeah well, it is actually more fun to RE than the GPUs, because there is like no prop driver anymore :D
22:27 karolherbst: mjg59: nvidia figured forcedeth is better already and just dropped their driver
22:28 karolherbst: and stuff like WoL is completely broken there
22:28 karolherbst: so you basically do stuff and hope for the best
22:31 mwk: karolherbst: nope, MCP55
22:32 mwk: my MCP79 appears to have lost its CMOS memory and won't boot
22:32 mwk: err wait, that's my MCP77
22:32 mwk: my MCP79 is fine :)
22:32 karolherbst: :)
22:33 karolherbst: if you are interested in WoL I could hack something together, but it is a bit messy to implement :/
22:33 mwk: I might, actually
22:33 karolherbst: well
22:33 mwk: it would be nice to have this fleet netbootable
22:33 karolherbst: but the hardware has a real good wol support though
22:33 karolherbst: much more advanced than the linux one
22:33 mwk: and they're mostly nv ethernets
22:34 karolherbst: the ethernet device does some masked hashing on the eth frame
22:34 karolherbst: so you configure a mask (up to 256 bytes or something?) and then add a hash for that
22:34 mjg59: karolherbst: Intel hardware supports that
22:34 mjg59: Linux doesn't
22:35 karolherbst: yeah well, linux has shitty wol support
22:35 mjg59: Basically lets you do things like configure WoL on connections to port 22 and so on
22:35 mjg59: Although you need something to do proxy ARP for you
22:35 karolherbst: :/
22:35 karolherbst: the heck?
22:35 mwk: huh, fun stuff
22:35 karolherbst: well, if you need a proxy ARP, the wol support is useless
22:36 karolherbst: seriously
22:36 mwk: *shrug* it'd work for my configuration
22:36 mjg59: karolherbst: IIRC Microsoft have a spec for getting routers to do that
22:36 karolherbst: like I don't care?
22:36 karolherbst: if you have to configure the router, then it is broken by design
22:36 karolherbst: simply put
22:36 mwk: but then... I'm perfectly happy sshing to the puppet master and telling it to shout target MAC
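"Shouting the target MAC" is the classic Wake-on-LAN magic packet: six 0xFF bytes followed by the target MAC address repeated sixteen times, usually sent as a broadcast. A minimal sketch follows; the MAC address is a placeholder, and UDP port 9 with the limited broadcast address is a common convention rather than anything stated in the chat.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must be exactly 6 bytes")
    return b"\xff" * 6 + raw * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the packet on the local network (port 9 is a convention, not a requirement)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


packet = magic_packet("00:11:22:33:44:55")  # placeholder MAC
```

Because the packet is a broadcast, it needs no ARP entry for the sleeping machine, but it has to be sent by a host on the same network segment.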
22:36 mjg59: How else are you going to do it?
22:37 mjg59: No way you're getting a unicast packet to a sleeping machine otherwise
22:37 karolherbst: mjg59: send an IP packet to the suspended machine and let it wake up via WoL?
22:37 karolherbst: why should the router care
22:37 mwk: karolherbst: before you can send an IP packet, you need to perform ARP
22:37 mwk: someone has to do this
22:37 mjg59: I'm at work. My machine at home is asleep. How do I get a packet to it?
22:37 mwk: you could do a wake on ARP, I suppose
22:37 karolherbst: yeah well
22:37 mwk: but then you can't make a rule that says "only on port 22"
22:37 mjg59: mwk: Yeah, but that loses the granularity
22:37 mjg59: Right
22:38 karolherbst: well I got it working on my system
22:38 mwk: so that leaves you with proxy ARP
22:38 mjg59: The idea was basically "Let the router know you're going to sleep just before you go to sleep, tell it you're back when you wake up"
22:38 karolherbst: nope, I didn't have to configure a thing on my router
22:38 karolherbst: it just caches the ARP entry forever (basically)
22:39 mjg59: karolherbst: Yeah if your router has an infinite arp cache it'll work
22:39 mjg59: But there's typically no guarantees around that
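The disagreement here comes down to how long the router keeps its ARP entry for the sleeping machine. A toy model, not router code: the class, its names and the 60 second timeout are invented, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Optional, Tuple


class ArpCache:
    """Toy ARP cache; ttl=None models a router that never expires entries."""

    def __init__(self, ttl: Optional[float]) -> None:
        self.ttl = ttl
        self._entries: Dict[str, Tuple[str, float]] = {}

    def learn(self, ip: str, mac: str, now: float) -> None:
        self._entries[ip] = (mac, now)

    def lookup(self, ip: str, now: float) -> Optional[str]:
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if self.ttl is not None and now - learned_at > self.ttl:
            # Entry expired: a fresh ARP reply is needed, which a sleeping host cannot send.
            del self._entries[ip]
            return None
        return mac


forever = ArpCache(ttl=None)
expiring = ArpCache(ttl=60.0)          # invented timeout
for cache in (forever, expiring):
    cache.learn("192.0.2.10", "00:11:22:33:44:55", now=0.0)

print(forever.lookup("192.0.2.10", now=3600.0))   # still knows where to send the frame
print(expiring.lookup("192.0.2.10", now=3600.0))  # entry gone, unicast cannot be delivered
```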
22:39 karolherbst: right
22:39 karolherbst: yeah, but anything else is a big hack
22:39 karolherbst: some even suggested iptables rules....
22:39 mjg59: A big hack that can provide actual guarantees :p
22:39 karolherbst: well you can still wake up on ARP stuff
22:39 karolherbst: and go to sleep a second later or something
22:40 karolherbst: you can catch broadcasts really easy if the router doesn't try to be uber smart
22:41 mjg59: Well, you can also set the filter to only match your address for the ARP
22:41 mjg59: This would work better if we actually gave userspace any way to know what caused the wakeup
22:41 karolherbst: mhhh, I think the nvidia ethernet devices can do some basic stuff while the host is asleep though
22:43 karolherbst: I played around with the wake up on any physical packet and got annoyed why my machine wakes up like every 5 minutes. Some IPTV broadcasting stuff got through....
22:43 mwk: hehe
22:43 mwk: that's going to be fun to RE
22:43 karolherbst: yeah well
22:43 karolherbst: for MCP79 I am mostly done
22:44 karolherbst: there are 6 slots
22:44 karolherbst: where you can put 6 masks + hashed value
22:44 mwk: care to stuff the doc in envytools?
22:44 mjg59: Huh. Pretty much the same as Intel.
22:44 karolherbst: and if one ethernet frame matches -> wakeup
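A sketch of the matching scheme as described in the chat: a handful of slots, each holding a mask and a hash, with a wakeup when any slot matches an incoming frame. The actual hash function and the mask granularity are not given here, so zlib.crc32 and a one-flag-per-byte mask are stand-ins, and all names are hypothetical.

```python
import zlib
from typing import List, NamedTuple

MAX_SLOTS = 6        # slot count mentioned for MCP79
MAX_MASK_LEN = 256   # "up to 256 bytes or something" in the chat


class WakeSlot(NamedTuple):
    mask: bytes      # one flag per frame byte: nonzero means the byte is hashed
    digest: int      # expected hash of the selected bytes


def masked_hash(frame: bytes, mask: bytes) -> int:
    selected = bytes(b for b, m in zip(frame, mask) if m)
    return zlib.crc32(selected)   # stand-in; the real hardware hash is unknown here


def make_slot(pattern: bytes, mask: bytes) -> WakeSlot:
    if len(mask) > MAX_MASK_LEN:
        raise ValueError("mask too long")
    return WakeSlot(mask, masked_hash(pattern, mask))


def should_wake(frame: bytes, slots: List[WakeSlot]) -> bool:
    """Wake up if any of the (at most six) slots matches the frame."""
    return any(masked_hash(frame, s.mask) == s.digest for s in slots[:MAX_SLOTS])


# Example: wake only on frames whose destination MAC (first 6 bytes) is ours.
our_mac = bytes.fromhex("001122334455")   # placeholder MAC
slot = make_slot(our_mac, mask=b"\x01" * 6)
print(should_wake(our_mac + b"rest of frame", [slot]))
```

Storing only a mask and a hash keeps each slot small, at the cost of possible false wakeups when unrelated bytes happen to hash to the same value.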
22:44 mjg59: Wonder if Microsoft have some minimal requirements around that.
22:44 mjg59: karolherbst: Are you planning on trying to add some sort of ethtool interface for this?
22:44 karolherbst: mjg59: do I look insane? :D
22:44 mjg59: I imagine trying to get new userspace interfaces through netdev will be a great use of your time
22:45 mjg59: Haha
22:45 mjg59: Yeah basically
22:45 mwk: karolherbst: you're reverse engineering an old NIC
22:45 karolherbst: well first off, I don't think the forcedeth devs are around
22:45 karolherbst: 2. I expect from nvidia to open the hw docs
22:45 karolherbst: cause they just removed their driver
22:45 karolherbst: ...
22:45 hakzsam: karolherbst, do you have the reflection demo?
22:45 karolherbst: hakzsam: nope
22:46 hakzsam: this one has the same rendering issue
22:46 hakzsam: maybe you can test with my patch?
22:46 mwk: karolherbst: nvidia will open their docs at the rate of one MMIO register per month
22:46 karolherbst: mjg59: does intel hardware also have a 256 byte mask?
22:46 mwk: maybe in 10 years you can actually write a driver
22:46 karolherbst: mwk: well, there are only 0x1000 :D
22:46 mwk: ... if there are still forcedeths around by then
22:46 mwk: 0x400 :p
22:46 karolherbst: ohh for me it is 0x1000
22:47 mjg59: karolherbst: I can't remember the precise details, but something like that
22:47 mjg59: Also either 5 or 6 slots
22:47 mwk: yep, and each reg takes 4 bytes :)
22:47 RSpliet: karolherbst: are they 1-byte wide?
22:47 karolherbst: nope, 4
22:48 RSpliet: then there's max. 0x400 individually addressable regs in your 0x1000-byte sized MMIO window?
22:48 mwk: probably much less than that
22:48 karolherbst: yeah, didn't check how many are empty
22:48 karolherbst: but that wol range is quite big already
22:48 karolherbst: this covers like 30 or 40 regs alone
22:48 karolherbst: I think even more
22:49 mwk:resists the temptation to check
22:49 RSpliet: maybe insert a hard drive first :-D
22:49 karolherbst: unionfs and mirror into tmpfs :D
22:50 karolherbst: mwk: but yeah, basically we could add all those forcedeth things into rnndb and try to verify everything :/
22:51 karolherbst: mwk: but I doubt I care enough to actually do that
22:52 hakzsam: karolherbst, could you please give a shot at this patch http://hastebin.com/yaqorufesi ? and remove the old one
22:52 hakzsam: it should work as expected and remove the disco mode as before
22:54 mwk: karolherbst: I'd need to find a keyboard and display...
22:54 karolherbst: mhh, not really
22:54 mwk: also, the machine doesn't exactly have a lot of RAM either :p
22:54 hakzsam: karolherbst, mmh?
22:56 karolherbst: hakzsam: so I need imirkin's and your latest paste, right?
22:56 hakzsam: yep
22:56 karolherbst: mhh
22:57 karolherbst: I think it looks a bit different? but still totally wrong
22:57 hakzsam: the realistic demo?
22:57 karolherbst: yeah
22:57 hakzsam: does it look better though?
22:58 karolherbst: well more whitish with a lot of colored dots
22:58 karolherbst: but some disco lights at the start
22:58 karolherbst: hard to tell if I can call that "better"
22:58 karolherbst: I can somehow recognize the window though
22:58 hakzsam: okay well, I see
22:59 hakzsam: my latest should fix an issue anyway, but there is a second one
22:59 karolherbst: ohh going too far toward the window causes gpu hangs
23:00 hakzsam: well thanks, will have a look tomorrow, time to sleep
23:01 karolherbst: hakzsam: mhh okay, I think it is better with your latest patch
23:01 karolherbst: with your patch there is nothing "blinking" anymore
23:01 hakzsam: should be
23:01 hakzsam: but still the disco mode or not?
23:02 karolherbst: well yeah, but if you move the camera it turns into being only small dots with a white background
23:03 hakzsam: okay, we will see tomorrow :)
23:15 Hijiri: I've been using a fermi card, and I know that reclocking support for it is thin. However, I was able to get writing to pstate to work once this morning, but it did not work before and it hasn't worked since (I echoed a couple performance levels to test it out, played a game with much improved performance, and then reclocking stopped working)
23:15 Hijiri: Is there any reason it would work occasionally but not always?
23:16 imirkin: no
23:16 imirkin: unless you were on a kernel that happened to allow it, in which case it'd always be allowed
23:16 Hijiri: If it doesn't work, it gives the "Function not implemented" error
23:16 imirkin: right
23:17 Hijiri: I was doing some other things in between it not working and working, so I'm not sure what could have affected it
23:17 Hijiri: Oh yeah, I'm also using PRIME on a laptop
23:17 imirkin: chances are you were switching between the intel gpu and the nvidia gpu
23:17 imirkin: chances are the intel gpu is going to be faster than the GF108 without reclocking
23:18 Hijiri: yeah, it was faster before I reclocked
23:18 Hijiri: After reclocking though, the nvidia card was much more performant
23:19 karolherbst: Hijiri: what application did you test with?
23:19 imirkin: were you on an experimental branch which enabled reclocking on fermi?
23:19 Hijiri: Rabi-Ribi
23:19 Hijiri: I'm using the packages from Debian stretch
23:19 imirkin: my guess is that no reclock ever happened in the first place
23:19 imirkin: and you're just mistaken about what happened
23:19 karolherbst: let me check something
23:19 karolherbst: imirkin: could be pcie?
23:20 imirkin: karolherbst: is that upstream yet?
23:20 Hijiri: I'm pretty sure I was using the nouveau drivers
23:20 karolherbst: imirkin: yeah, for some time actually
23:20 imirkin: karolherbst: ah ok.
23:20 Hijiri: I used DRI_PRIME=1 with glxinfo, after setting nouveau as an offload provider, and it gave me nvidia info
23:20 karolherbst: but I have no idea if that would work on fermi
23:20 imirkin: but reclocking wouldn't be hooked up at all
23:20 Hijiri: And then I used DRI_PRIME=1 with the game
23:20 Hijiri: I don't know how I could tell while running the game if the game was using nouveau for sure
23:21 karolherbst: imirkin: pcie is in since 4.5
23:21 Hijiri: I'm on 4.4
23:21 imirkin: karolherbst: not _that_ long :p
23:21 Hijiri: if you mean kernel
23:21 karolherbst: right
23:21 karolherbst: well
23:21 karolherbst: Hijiri: I would expect that something went wrong
23:21 karolherbst: and that intel ran the game
23:22 Hijiri: It was faster than when I run it with intel, and intel also crashes frequently on the game
23:22 karolherbst: very odd
23:22 karolherbst: you can verify with /sys/kernel/debug/dri/1/clients
23:22 karolherbst: it should list it there if it runs on the nvidia one
23:22 karolherbst: but mhh
23:22 karolherbst: it shouldn't be faster than intel
23:23 karolherbst: no way, unless it is some high end gpu
23:23 Hijiri: it's not
23:23 imirkin: well, i'm guessing this is sandybridge or ivybridge paired with a GF108?
23:23 Hijiri: it's ivy bridge
23:23 Hijiri: let me check the graphics card again
23:23 imirkin: ivb ain't exactly a speed demon either
23:23 Hijiri: yeah, GF108M
23:24 karolherbst: well
23:24 karolherbst: gf108 isn't fast either
23:24 Hijiri: I was thinking of getting a kepler card for my desktop anyway, since it was still lagging a little with reclocking
23:24 imirkin: no, i have one - it's horribly slow.
23:24 Hijiri: What's a decent kepler card with stable reclocking support?
23:24 karolherbst: well the 550m doesn't sound as bad :D
23:25 imirkin: Hijiri: unfortunately reclocking isn't "stable" by model, but by specific board
23:25 imirkin: also to get more stable reclocking, you need to use some not-yet-upstream patches
23:25 karolherbst: there are some really odd crashes on some cards we can't really find the cause for
23:27 Hijiri: I don't really know that much about graphics cards, I'm guessing the board is different based on manufacturer/batch?
23:28 imirkin: yeah
23:28 imirkin: what type of ram chips are used, etc
23:28 Hijiri: Could I expect to see used cards list enough information to know if it will work well?
23:28 karolherbst: no clue
23:28 imirkin: so it's not like "GTX 770 works, GTX 760 doesn't". it's much more subtle than that.
23:28 karolherbst: chances are pretty decent actually though
23:29 karolherbst: still not good enough that I would say "most likely it will work"
23:29 karolherbst: it's getting better though
23:30 Hijiri: Ok, imirkin, karolherbst, thanks for all the help
23:30 Hijiri: I'll have to do more research about any cards I might decide to get
23:30 karolherbst: Hijiri: well, you can enable some reclocking bits on fermi
23:30 karolherbst: but it won't do much
23:30 karolherbst: you may get like 25% more perf?
23:31 karolherbst: more on prime due to faster pcie link
23:31 karolherbst: well
23:32 karolherbst: maybe the pcie link doesn't matter as much on slower card
23:32 Hijiri: pstate has one performance state with 752MHz/1569MHz core clock/memory clock vs the low performance state with 202/324
23:32 karolherbst: because the card has to render a lot of frames for it to be significant
23:32 karolherbst: Hijiri: well, you could unlock the core with a kernel patch, but with bad memory speed you still get bad perf
23:35 Hijiri: Where could I find information on how well reclocking works on different boards?
23:36 karolherbst: nowhere
23:36 Hijiri: Ah
23:36 karolherbst: we are at a point where it really doesn't matter on kepler. It is mostly luck ... well more bad luck that it doesn't work
23:37 karolherbst: I doubt there is much reclocking related going wrong though
23:37 Hijiri: Alright
23:37 karolherbst: although there is one issue (or more) which I can't track down yet