03:32 Subv: hey, has there been any recent attempt at providing an offline GLSL compiler for nvidia cards?
03:45 HdkR: A Nouveau based offline compiler? :D
03:46 HdkR: A mesa one would be kind of cool. Pass in shader code + state object or something and it compiles for whatever target
03:47 imirkin: Subv: compiler from glsl to what?
03:47 HdkR: Binary + disassembler I presume
03:47 Subv: ^indeed
03:48 imirkin: hm, i guess not
03:48 imirkin: nouveau_compiler will take TGSI -> binary
03:48 Subv: is there any way to obtain the input TGSI from GLSL?
03:48 imirkin: and glsl_compiler will take glsl to glsl IR
03:49 imirkin: but nothing will do an offline run of glsl ir -> tgsi
03:49 imirkin: not that it couldn't be done, just ... not useful enough
03:49 imirkin: to worry about it
03:49 Subv: it sounds like a useful tool for people who maintain the optimization passes though
03:49 imirkin: anyways, all the pieces are there, all one needs is the will to assemble them
03:49 imirkin: that'd be me. i don't find it useful.
03:50 imirkin: starting from tgsi is a lot more useful
03:50 imirkin: and if i really need to start from glsl, i dump the tgsi while the thing is running
03:54 Subv: i see
03:58 HdkR: imirkin knows the instruction assembly by heart, doesn't need an offline compiler and disassembler to inspect things :)
03:59 imirkin: debug - that's all you need to write code
04:00 imirkin: them assemblers... that's just for wusses!
04:02 HdkR: I just write code without bugs </s>
04:03 imirkin: that's really the best approach
04:03 imirkin: and i hardly believe that's the end of your (or my) sarcasm...
04:04 HdkR: Bottomless pit mate
04:04 imirkin: quite so
05:08 Subv: mmm, in maxwell (and perhaps others), shouldn't "[float a, b] bool p0 = a > b || isnan(a) || isnan(b)" be trivially convertible to FSETP_GTU?
05:09 imirkin: sure, something like that
05:10 HdkR: Does nvcc even optimize that case?
05:10 imirkin: dunno. we sure don't
05:11 imirkin: we don't have a fp-safe optimizer either
05:23 Subv: HdkR: nvcc doesn't
05:23 Subv: nouveau doesn't, either
05:24 Subv: what do you mean by an fp-safe optimizer?
05:24 Subv: performing only optimizations that don't affect the IEEE floating point conformance?
05:28 imirkin: yes
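The predicate Subv describes is an "unordered greater-than": true when a > b, or when either operand is NaN. Under IEEE 754 comparison rules that is the same as negating the ordered test a <= b, which is why a single unordered compare could cover it. A minimal sketch of the equivalence, using the hypothetical names gtu_explicit and gtu_negated; it says nothing about what nouveau or nvcc actually emit.

```python
import math

def gtu_explicit(a: float, b: float) -> bool:
    """The expression from the chat, spelled out."""
    return a > b or math.isnan(a) or math.isnan(b)

def gtu_negated(a: float, b: float) -> bool:
    """Same predicate as one negated ordered compare.

    a <= b is False whenever either operand is NaN, so its negation
    is True for NaN inputs and equals a > b otherwise.
    """
    return not (a <= b)

# Compare both forms over ordinary values, infinities and NaN.
values = [-1.0, 0.0, 1.0, math.inf, -math.inf, math.nan]
mismatches = sum(
    gtu_explicit(a, b) != gtu_negated(a, b)
    for a in values
    for b in values
)
print("mismatches:", mismatches)
```

An optimizer that is not fp-safe may assume NaN never occurs and fold the isnan() checks away, which changes the result for exactly the inputs this predicate exists to catch.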
10:56 RSpliet: karolherbst, Sarayan, HdkR: x86 processors have plenty of general purpose registers. Look up "register renaming" if you're interested, greatly reduces the occurrence of false hazards
10:56 karolherbst: for some definitions of "plenty" I figure?
10:57 RSpliet: From a lecture 6 years ago I recall "more than 50"
10:57 RSpliet: But the numbers are not public
10:58 karolherbst: I highly doubt you have that many
10:59 karolherbst: I am sure you have 50 registers
10:59 karolherbst: but those aren't gprs
10:59 RSpliet: No, you really do
10:59 RSpliet: Pardon, x86_64
11:00 karolherbst: RSpliet: you have 16 full gprs in x86_64
11:00 karolherbst: everything else isn't a gpr
11:00 RSpliet: karolherbst: Those are architectural registers, not physical registers
11:00 RSpliet: as I said, look up register renaming
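A rough sketch of the renaming idea RSpliet refers to, assuming a simplified model: every instruction is a (dst, src1, src2) triple of architectural register numbers, and every write is given a fresh physical register. The function name rename and the instruction format are made up for illustration; real hardware also recycles physical registers through a free list, which is left out here.

```python
def rename(instrs, num_arch_regs):
    """Map architectural registers to physical ones.

    instrs is a list of (dst, src1, src2) architectural register numbers.
    Returns the same instructions expressed in physical register numbers.
    """
    # Initially each architectural register lives in the physical
    # register with the same number.
    mapping = {r: r for r in range(num_arch_regs)}
    next_phys = num_arch_regs
    renamed = []
    for dst, src1, src2 in instrs:
        # Sources read whatever version is current at this point.
        p1, p2 = mapping[src1], mapping[src2]
        # Every write gets a fresh physical register, so it can never
        # clobber a value an earlier instruction still needs.
        mapping[dst] = next_phys
        renamed.append((next_phys, p1, p2))
        next_phys += 1
    return renamed

program = [
    (1, 2, 3),  # r1 = r2 op r3
    (2, 4, 5),  # r2 = r4 op r5   (write-after-read on r2)
    (1, 2, 6),  # r1 = r2 op r6   (write-after-write on r1, true dep on r2)
]
print(rename(program, num_arch_regs=8))
```

After renaming, the second instruction writes a register the first one never reads, so the two can be reordered freely; only the true dependency from the second to the third instruction remains.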
11:02 karolherbst: sure, but that's also kind of magic you start to do inside hardware
11:02 karolherbst: it all sucks
11:03 karolherbst: my point was, you don't want to have that inside hardware
11:03 karolherbst: but with x86 you are screwed and have to
11:03 RSpliet: Yeah, but the magic inside hardware matters. The magic is either in hardware or in your compiler, and sticking your magic into a compiler means you have wider instructions (using up more DRAM bandwidth) and no backwards compatibility
11:03 karolherbst: you want that magic inside your compiler
11:04 RSpliet: No. You can't run a Kepler program on Fermi. That's fine for a GPU because the compiler is tied to the API. If you can't run a Ryzen program on Bulldozer you'll get an absolute hell w/ distributing software
11:05 karolherbst: yeah
11:05 karolherbst: that's why x86 sucks
11:05 RSpliet: That's why CPUs suck?
11:05 karolherbst: well, you can solve that issue
11:05 karolherbst: it's just a question of whether you find a solution you want to go with
11:06 RSpliet: How? There's a bootstrap problem.
11:06 karolherbst: and?
11:06 RSpliet: Alright, come up with a solution! Go go go!
11:06 karolherbst: you can do a lot of that stuff offline
11:07 karolherbst: normally your CPU doesn't change that often
11:07 karolherbst: you could have fat binaries containing the generic code + the optimized one for your CPU
11:07 karolherbst: this can be even done as a step in the installation process
11:07 karolherbst: still sucks though
11:07 karolherbst: you could break compatibility faster
11:08 karolherbst: you do so regarding SSE/AVX anyway
11:08 karolherbst: or other extensions
11:09 karolherbst: there are applications out there you simply won't run on 10 year old CPUs
11:10 RSpliet: Compatibility is not something people want to sacrifice when you get into anything but desktop workloads. For a desktop it'd be fine, Firefox, *Office and *OS are maintained well enough to fix that quickly. The kind of software that businesses run doesn't have that kind of manpower behind it
11:10 karolherbst: or maybe you just add a second kind of CPU and do the GPU approach for applications
11:10 karolherbst: you have to break compatibility at some point anyway
11:10 karolherbst: if you want to leave the x86 mess behind you
11:10 RSpliet: karolherbst: I'm not a fan of x86, but I think you over-estimate the mess
11:11 karolherbst: most of today's CPU bugs are caused by all that, because you care about perf, but don't want to get limited by what x86 is
11:11 karolherbst: we either stick with the insecure total mess that x86 is, or move to something new
11:11 karolherbst: I don't see how we can make x86 secure and fast
11:12 karolherbst: it's fast, but not secure
11:12 RSpliet: Most bugs are not due to the x86(-64) specification, but due to the implementation. If you want a fast processor, you do out of order, speculation, etc. etc.
11:12 karolherbst: well, not even fast
11:12 karolherbst: yes, because you don't want to have a slow CPU
11:12 karolherbst: remove all that, you get a slow CPU
11:13 karolherbst: or maybe you can get it to be fast
11:13 karolherbst: I don't know
11:13 RSpliet: Exactly. Remove multiple-issue and out-of-order and you'll never get past 1 instruction per cycle
11:13 karolherbst: current situation is, you add those things to make CPUs faster
11:13 karolherbst: yes, and that's why the current CPU situation just sucks
11:13 karolherbst: it isn't even that much better with ARM
11:14 karolherbst: they just break compatibility more explicitly and more often
11:14 karolherbst: and people seem to be fine with it
11:14 RSpliet: Because businesses don't run ARM
11:15 karolherbst: right
11:15 karolherbst: but business is where security matters even more
11:15 karolherbst: and I don't have to tell you that's a total fuckup right now anyway
11:15 karolherbst: not even caused by insecure CPUs
11:15 karolherbst: that's just a tip
11:15 karolherbst: *the
11:16 RSpliet: Oh totally. Security is better than it's ever been... and it's pretty poor
11:16 karolherbst: yeah
11:16 karolherbst: some people think that with newer languages it will be better though
11:16 karolherbst: I have my doubts
11:16 karolherbst: doesn't really matter if the hw is insecure
11:16 RSpliet: Security is as strong as the weakest link
11:16 karolherbst: yeah
11:17 karolherbst: so to get more secure we have to break literally all business applications anyway
11:17 karolherbst: that's a sacrifice I am willing to make
11:17 karolherbst: if they aren't able to pay for it, then they should just get down
11:17 karolherbst: I don't care
11:18 karolherbst: don't write software if you aren't able to maintain it
11:18 karolherbst: there is still this widespread opinion, that you are able/want to write software once and be able to use it forever
11:18 karolherbst: which is just pure bs
11:18 RSpliet: Don't build a house if you aren't able to maintain it? Truth is everything in life has a life expectancy, and we always exceed it without considering the consequences for cost reasons
11:19 karolherbst: 1. you aren't 2. you won't
11:19 karolherbst: right, but for online services, business should be more considerate about all that ;)
11:20 RSpliet: They are harming themselves by not being considerate, I'm fine with that
11:20 karolherbst: they are harming their customers as well
11:20 karolherbst: and it is questionable if they harm themselves
11:20 karolherbst: which case do we have, where that was the case?
11:21 karolherbst: in the last few years
11:21 RSpliet: Plenty of companies losing lots of money on ransomware attacks
11:21 karolherbst: and?
11:21 RSpliet: Losing money == harm
11:21 karolherbst: doesn't mean they have less than they would have if they would invest in security
11:22 karolherbst: anyway, their business usually doesn't get hurt fundamentally
11:22 karolherbst: sure, there are some costs
11:22 karolherbst: but that's all part of the cost/benefit calculation
11:22 karolherbst: business as usual if you are willing to say that
11:22 karolherbst: you risk stuff, you get hurt, but you might be better off getting hurt than spending money on security measures
11:23 karolherbst: companies do this every day
11:23 RSpliet: I can't name names, but I definitely know of some companies that lost boatloads of money for not securing properly... because projects were delayed
11:23 karolherbst: and IT security isn't even the only point
11:23 karolherbst: right, but that's still business as usual
11:23 karolherbst: bad things happen
11:24 karolherbst: what I meant is, there isn't a real incentive to try to be as secure as possible or necessary
11:24 karolherbst: you do some bits
11:24 karolherbst: and sometimes you do less than what people suggested to you
11:24 karolherbst: and then some get hurt
11:24 karolherbst: but
11:24 karolherbst: if you create insecure routers willingly, you should go out of business
11:24 karolherbst: like literally
11:25 karolherbst: you shouldn't be allowed to sell stuff anymore
11:25 RSpliet: Yes. Anyway, we're derailing massively. x86-64 processors have more general purpose registers than you can address, and they resolve a lot of the hazards introduced by register allocation without extending the opcode format or breaking backwards compatibility.
11:25 karolherbst: (cisco has been a total security fuckup for years and nothing happens)
11:27 karolherbst: RSpliet: right, I just don't think it's fundamentally what we want in the future though
11:27 karolherbst: or well, we won't be able to solve the important issues sticking to x86
11:31 * Sarayan waves
11:32 Sarayan: karol: most people don't see or care about x86 assembly in their work though
11:32 Sarayan: I mean, javascript, right?
11:32 karolherbst: well
11:32 karolherbst: someone has to update the JIT
11:32 karolherbst: but yeah, that would be manageable
11:32 Sarayan: sure, and someone has to write the compiler, or the OS
11:33 karolherbst: right
11:33 Sarayan: but the ratio of compiler-os-jit people to other programmers is insane
11:33 karolherbst: yeah
11:33 karolherbst: that's something we could improve :p
11:33 RSpliet: karolherbst: Oh, ARM processors do the same thing... Power, RISC-V. Any processor that aims for an IPC beyond 1 does this stuff. Renaming isn't even speculation, just dealing with the limitations of an aligned opcode format
11:34 karolherbst: RSpliet: well, renaming isn't all that bad anyway
11:34 karolherbst: out of order/speculation are where the trouble starts
11:34 karolherbst: out of order alone causes big headaches
11:34 karolherbst: like, how to write constant time crypto with a CPU doing out of order stuff
11:34 karolherbst: you start to work around it by adding more instructions
11:35 karolherbst: and now you have to set barriers and everything
11:35 RSpliet: Heh, yes, the classic. Timing is never going to be predictable on general purpose CPUs
11:35 karolherbst: well, it isn't exactly better on GPUs either though
11:35 karolherbst: and you can fix it by rewriting your crypto lib
11:36 RSpliet: That's the point I make with my PhD thesis ;-)
11:36 karolherbst: best solution: you take every path every time
11:36 Sarayan: what saves GPUs is that the half-life of programs on them is a week at most :-)
11:36 karolherbst: and select the path at the end in constant time
11:36 karolherbst: or well, select the correct result
11:36 Sarayan: Quantum!
11:36 Sarayan: (sorry)
11:37 karolherbst: *sigh* :p
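The "take every path, then select the correct result" approach karolherbst describes is usually written with a bit mask instead of a branch. A sketch of the shape of that technique, with the hypothetical names ct_select and process. Note the loud assumption: Python gives no timing guarantees at all, so this only illustrates the logic; real constant-time code would be written in C or assembly and checked against the generated instructions.

```python
MASK32 = 0xFFFFFFFF

def ct_select(cond: int, if_true: int, if_false: int) -> int:
    """Pick one of two 32-bit values without branching on cond.

    cond must be 0 or 1. The mask is all ones when cond is 1 and
    all zeros when cond is 0.
    """
    mask = (-cond) & MASK32
    return (if_true & mask) | (if_false & ~mask & MASK32)

def process(secret_bit: int, x: int) -> int:
    # Hypothetical example: both "paths" are always computed ...
    doubled = (x * 2) & MASK32
    incremented = (x + 1) & MASK32
    # ... and only the final selection depends on the secret bit.
    return ct_select(secret_bit, doubled, incremented)

print(process(1, 20), process(0, 20))
```

The cost is that the work of both paths is paid every time, which is the trade-off the chat is pointing at.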
11:37 RSpliet: karolherbst: And then there's the unpredictable timing arising from sharing your DRAM/PCI-e/USB bus with multiple cores
11:37 karolherbst: right
11:37 karolherbst: but
11:37 karolherbst: that doesn't matter that much
11:37 karolherbst: if your first run is faster/slower than the second one means nothing if you can't predict what path was taken
11:37 Sarayan: you can ensure that your L1 is hot before the timing-critical code
11:38 karolherbst: if the timing variation doesn't depend on the content, but on other factors, you might be good to go
11:38 karolherbst: but
11:38 karolherbst: for most libraries it isn't the case
11:38 RSpliet: I think computers should get to a point that there are *so many* factors of unpredictable timing that you can't infer (leak) information based on timing anymore. Spectre shows how you can isolate out the chaos, so I think we need more chaos ;-)
11:38 karolherbst: :D
11:38 karolherbst: or you can just write secure code :p
11:39 karolherbst: simple, isn't it?
11:39 Sarayan: the problem with chaos is that differential analysis is damn powerful
11:39 karolherbst: anyway, most developers are crap in regards to writing secure code
11:39 RSpliet: "Timing predictable crypto code" is an academic pastime from 20 years ago, I wouldn't dare to rely on its safety
11:39 karolherbst: that's just a given
11:40 karolherbst: and most developers are overestimating their ability to write secure code
11:40 RSpliet: 20% secure, as usual.
11:40 Sarayan: Rule #1 about writing crypto libs: don't
11:40 karolherbst: ;)
11:41 karolherbst: not even crypto libs is the issue
11:41 karolherbst: even using crypto libs can go horribly wrong
11:41 karolherbst: simple task: write a secure login function
11:41 karolherbst: "simple"
11:41 Sarayan: mwahahaha
11:41 karolherbst: using whatever libraries there are out there
11:42 Sarayan: can the attacker rewrite the hard-drive firmware to change /etc/shadow on the fly?
11:42 karolherbst: internet login stuff
11:43 RSpliet: Okay, so what I'd do is allocate a 32 byte buffer, then read keyboard input into it until I detect a carriage return. Compare the input to the contents of password.txt and...
11:43 karolherbst: you are stupid and want to write your own oauth server or something
11:43 Sarayan: web stuff, answering "new connection, who that?" is hard in the first place
11:43 karolherbst: RSpliet: ;)
11:43 RSpliet: Sarayan: sure, your WD hard-drive just contains a bunch of ARM cores.
11:43 karolherbst: RSpliet: first version: 50 loc, secure version: 1k loc
11:43 karolherbst: :p
11:43 Sarayan: not fucking up session cookies in the first place is non trivial :-)
11:44 karolherbst: :D
11:44 karolherbst: true
11:44 karolherbst: you compare byte by byte, right?
11:44 karolherbst: with early exit
11:44 karolherbst: has to be correct
11:44 karolherbst: fun fact, most string compares actually do that
11:44 karolherbst: like bailing on first mismatch
11:44 karolherbst: but you can't use that for that stuff
11:45 Sarayan: karol: funnier fact, the wii compared the signature hashes with strcmp
11:45 karolherbst: you always have to compare $max_token_length chars every time
11:45 karolherbst: :D
11:45 karolherbst: crap
11:45 karolherbst: I assume this could be exploited?
11:45 Sarayan: which means bailing out at the first zero
11:45 Sarayan: this has been exploited a lot :-)
11:45 karolherbst: of course
11:46 Sarayan: http://wiibrew.org/wiki/Signing_bug
11:46 karolherbst: ohh crap
11:46 karolherbst: that's stupid
11:46 karolherbst: or well
11:46 karolherbst: not stupid, as most devs don't know better
11:47 Sarayan: they also won't notice in testing
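The comparison pitfalls discussed above can be sketched side by side: an early-exit compare, a strcmp-style compare that stops at the first zero byte, and a compare that touches every byte. The function names and the sample "hashes" are made up for illustration; hmac.compare_digest is the standard-library call intended for comparing secrets.

```python
import hmac

def early_exit_equal(a: bytes, b: bytes) -> bool:
    """Bails on the first mismatch, so run time leaks how many
    leading bytes matched. Fine for ordinary strings, not for secrets."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def strcmp_like_equal(a: bytes, b: bytes) -> bool:
    """Emulates C strcmp() on binary data: comparison stops at the
    first zero byte, so everything after it is ignored."""
    return a.split(b"\x00", 1)[0] == b.split(b"\x00", 1)[0]

def fixed_work_equal(a: bytes, b: bytes) -> bool:
    """Accumulates differences over every byte instead of bailing early."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y
    return diff == 0

# Two different 20-byte "hashes" that both start with a zero byte.
expected = bytes([0x00]) + bytes(range(1, 20))
forged = bytes([0x00]) + bytes(19)

print(strcmp_like_equal(expected, forged))    # the bug: treated as equal
print(fixed_work_equal(expected, forged))
print(hmac.compare_digest(expected, forged))  # stdlib constant-time compare
```

With a string compare, an attacker only needs a forged hash whose first byte is zero, which takes a few hundred tries on average, instead of matching all 20 bytes.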
11:48 Sarayan: sony did worse though :-)
11:48 karolherbst: they all do
11:49 Sarayan: They signed all their stuff with ECDSA and never changed the random value
11:49 karolherbst: :D
11:49 karolherbst: hey, we have a standard for that though
11:49 karolherbst: called eTLS
11:49 karolherbst: ..........
11:49 karolherbst: don't look it up
11:49 Sarayan: For those who don't know, sign two things with the same ecdsa key and the same random value and you can recompute the ecdsa private key
11:50 Sarayan: "oops"
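Why nonce reuse gives the key away: an ECDSA signature has s = k^-1 * (z + r*d) mod n, where d is the private key, k the random value, z the message hash and r is derived from k. Two signatures with the same k share r, so subtracting them yields k, and then d. A toy sketch of just that algebra, assuming Python 3.8+ for pow(x, -1, n); all numbers are invented, n merely stands in for a prime group order, and no actual elliptic curve is involved.

```python
# Toy parameters: r would normally be the x-coordinate of k*G, but the
# algebra below only needs it to be the same nonzero value in both
# signatures. All numbers here are made up for illustration.
n = 2**61 - 1          # a prime, standing in for the group order
d = 123456789          # private key (the secret to recover)
k = 987654321          # the random value that was wrongly reused
r = 555555555          # shared r, since both signatures used the same k

def sign(z: int) -> int:
    """Return the s half of an ECDSA-style signature over hash z."""
    return pow(k, -1, n) * (z + r * d) % n

def recover_private_key(z1: int, s1: int, z2: int, s2: int) -> int:
    """Recover d from two signatures that share r (and therefore k)."""
    # s1 - s2 = k^-1 * (z1 - z2)  =>  k = (z1 - z2) / (s1 - s2)
    k_rec = (z1 - z2) * pow((s1 - s2) % n, -1, n) % n
    # s1 = k^-1 * (z1 + r*d)      =>  d = (s1*k - z1) / r
    return (s1 * k_rec - z1) * pow(r, -1, n) % n

z1, z2 = 1111, 2222
s1, s2 = sign(z1), sign(z2)
print(recover_private_key(z1, s1, z2, s2) == d)
```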
11:50 karolherbst: it would hurt you big time
11:50 karolherbst: I give you the short summary
11:50 Sarayan: t'was in the ps3
11:50 karolherbst: data center people want to have those MID firewalls, but that won't work with TLS 1.3
11:50 karolherbst: so now there is eTLS ;)
11:51 karolherbst: uhm
11:51 karolherbst: mitm
11:51 karolherbst: not mid
11:51 Sarayan: they call that "packet inspection", right?
11:51 karolherbst: right
11:51 karolherbst: and with https you have to do a mitm thing, and let users install your CA cert and so on
11:51 Sarayan: or require people to use a proxy
11:52 Sarayan: that's what they do here
11:52 karolherbst: but with tls 1.3 which forces ephemeral DH ciphers, this doesn't work anymore
11:52 karolherbst: yeah...
11:52 karolherbst: I once worked in such a company where they installed that later
11:52 karolherbst: my home server wasn't accessible anymore
11:52 karolherbst: that's how I noticed
11:52 Sarayan: at least they don't pretend they don't inspect stuff
11:52 karolherbst: because.. you know, the FW bails if it can't decrypt it
11:52 karolherbst: what a relief
11:53 karolherbst: not working for such companies is the best thing you can do
11:53 karolherbst: the day they install such a firewall, without notice, because you don't agree
11:53 Sarayan: Well, I know the admins, I know the level of control they want, it's reasonable
11:54 karolherbst: and you have to agree because this requires some IT policies internally in the company
11:54 karolherbst: so you have to leave
11:54 karolherbst: simple
11:54 karolherbst: it's not reasonable
11:55 karolherbst: that firewall did even block steam login
11:55 karolherbst: because it does crypto stuff inside JS
11:55 karolherbst: and sent out crypto stuff via HTTPS
11:56 karolherbst: the problem isn't even why they do that, it's just that those firewalls are inherently insecure as well
11:56 karolherbst: it's child's play to attack those
11:57 Sarayan: cnow yeah that's a damn real problem
11:58 karolherbst: you make it even simpler
11:58 karolherbst: because everything which goes through that FW is "secure"
11:58 karolherbst: because of that CA cert
11:58 karolherbst: some of those didn't even verify that the web server had a valid cert
11:58 karolherbst: "rookie mistake"
11:59 Sarayan: "woops"
14:59 karolherbst: \o/ HdkR did you know about that flickering sun issue in dolphin with zelda wind waker?
14:59 karolherbst: under nouveau?
15:01 karolherbst: anyway, will send out the first patches today to get that multi context stuff fixed
15:01 karolherbst: some cleanup stuff which should be easy to get done
15:07 HdkR: karolherbst: I did not know that nouveau had that issue
15:37 AndrewR: hi all. I tried to recompile mesa with opencl support, and to my surprise clinfo showed something even for nv92 ?! https://pastebin.com/H298uzF5
15:37 karolherbst: AndrewR: doesn't work though
15:37 AndrewR: karolherbst, :)
15:37 karolherbst: but yeah
15:38 karolherbst: currently clover doesn't really check if it would work or not
15:39 karolherbst: or maybe it checks for TGSI?
15:39 karolherbst: something like that
15:39 AndrewR: karolherbst, anyway, i was reading pdf presentation from early 2018, and it seems this generation GPU will need even more rework for even trying this OpenCL stuff ..
15:39 karolherbst: most likely
15:39 karolherbst: but
15:39 karolherbst: maybe it just works?
15:39 karolherbst: dunno
15:39 karolherbst: didn't test myself
18:46 BootI386: karolherbst: Note that IA64 was a huge flop, whereas it could have been a decent x86 replacement
18:47 Sarayan: no, there's no way to make ia64 decent
18:47 karolherbst: yeah, ia64 was a failure
18:47 joepublic: Would it? It had poor, dirt-slow x86 emulation, which ironically was a great example of how good its application performance was
18:50 HdkR: I don't believe a VLIW architecture is a sane x86 replacement
18:51 BootI386: Well, it had no speculative execution and shifted all the opt task to the compiler
18:52 karolherbst: which is generally a sane thing to do
18:52 karolherbst: but doesn't help if you have to emulate x86
18:52 BootI386: ofc
18:53 HdkR: I'm currently in the process of learning x86-64 for the first time :P
18:53 karolherbst: HdkR: I bet you would rather continue working on GPU ISAs :p
18:54 BootI386: AMD64 is crazy (actually less than x86), and that's why i like it
18:54 BootI386: But
18:56 BootI386: Craziness is not a very good idea when it comes to security :)
18:56 HdkR: Yes, x86-64 is disgusting
18:56 HdkR: but it is a project I find interesting so I should become fairly familiar with the isa
18:56 karolherbst: the FPU thing is just messy
18:56 HdkR: FPU register stack
18:57 BootI386: Forget everything except SSE4
18:57 karolherbst: HdkR: do you know why there is no floating point stuff inside the kernel?
18:57 Sarayan: x86-64 doesn't have a register stack anymore
18:57 Sarayan: karol: because saving fpu state is costly
18:57 Sarayan: so it's done lazily
18:57 HdkR: Yea, probably that
18:57 karolherbst: wouldn't be an issue if you wouldn't have any fpu state to save :p
18:58 HdkR: I've done FPU code in the kernel before. Was fun
18:58 HdkR: but that was on a custom ARMv7 target
18:58 Sarayan: you have to save state before, restore after, and manage if you're context-switched out
18:58 BootI386: Lol
18:59 Sarayan: I think there's primitives for that though, like kernel_fpu_begin() / kernel_fpu_end()
18:59 karolherbst: and then there GPUs where all that doesn't matter
18:59 HdkR: (and the FPU wasn't exposed to userspace so I didn't have to deal with context saving)
19:00 karolherbst: HdkR: no FPU in userspace sucks though
19:01 HdkR: Userspace didn't require it in this case :D
19:01 HdkR: Wasn't running user generated code
19:01 karolherbst: but at least for x86, SSE is the non crappy x87 FPU stuff
19:01 HdkR: aye
19:01 BootI386: Forget x87
19:01 HdkR: I only learned about x87's register stack crap last night
19:02 BootI386: It never existed
19:02 BootI386: Erased from history
19:02 Sarayan: BootI386: what, you don't like 80-bits floats?
19:02 HdkR: Or hardware BCD support? :D
19:03 BootI386: Staaaap. :x
19:03 karolherbst: I guess the good thing about x87 back in those days was, it was faster than software emulation
19:04 BootI386: *the *only* good thing